Abstract

A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained with a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
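To make the two signal-processing ingredients named in the abstract more concrete, the following is a minimal sketch of a one-level discrete wavelet transform followed by forward linear prediction on each resolution. It is not the authors' implementation: the Haar wavelet, the prediction order of 1, the synthetic test signal, and all function names are assumptions chosen for illustration, since the abstract does not specify them. The idea shown is that a linear predictor models a decaying tone well, so its forward prediction error stays near zero except where a new note begins.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT (assumed wavelet): returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, n, 2)]
    return approx, detail

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of x (no normalisation)."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns a[0..order] with a[0] = 1, so that the forward prediction
    error is e[n] = sum_k a[k] * x[n - k].
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 1e-12:  # silent or perfectly predicted input
            break
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a

def forward_prediction_error(x, a):
    """Filter x with the prediction-error filter; samples before x[0] count as zero."""
    p = len(a) - 1
    return [sum(a[k] * x[n - k] for k in range(min(p, n) + 1))
            for n in range(len(x))]

def peak_index(values):
    """Index of the sample with the largest magnitude."""
    return max(range(len(values)), key=lambda n: abs(values[n]))

# Synthetic input (an assumption): 50 samples of silence, then a decaying "note".
signal = [0.0] * 50 + [0.9 ** n for n in range(150)]

# Full-resolution forward prediction error.
coeffs_full = levinson_durbin(autocorrelation(signal, 1), 1)
error_full = forward_prediction_error(signal, coeffs_full)

# Same analysis on the half-rate approximation band.
approx, detail = haar_dwt(signal)
coeffs_approx = levinson_durbin(autocorrelation(approx, 1), 1)
error_approx = forward_prediction_error(approx, coeffs_approx)

onset_full = peak_index(error_full)
onset_approx = peak_index(error_approx)
```

In this toy case the error peaks at the sample where the note starts, at both resolutions (the half-rate band reports the onset at half the index). In the approach described above, such errors and the wavelet coefficients would be features passed to the recurrent network rather than thresholded directly.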

Keywords

Computer science, Artificial neural network, Artificial intelligence, Wavelet, Pattern recognition (psychology), Speech recognition, Linear prediction, Term (time), Machine learning

Publication Info

Year: 2014
Type: article
Pages: 2164-2168
Citations: 91
Access: Closed

Citation Metrics

OpenAlex: 91
Influential: 2
CrossRef: 55

Cite This

Erik Marchi, Giacomo Ferroni, Florian Eyben et al. (2014). Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2164-2168. https://doi.org/10.1109/icassp.2014.6853982

Identifiers

DOI: 10.1109/icassp.2014.6853982
