TY - CHAP
T1 - A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks
AU - Lehner, Bernhard
AU - Widmer, Gerhard
AU - Böck, Sebastian
PY - 2015/12/22
Y1 - 2015/12/22
AB - Singing voice detection aims at identifying the regions in a music recording where at least one person sings. This is a challenging problem that cannot be solved without analysing the temporal evolution of the signal. Current state-of-the-art methods combine timbral with temporal characteristics, by summarising various feature values over time, e.g. by computing their variance. This leads to more contextual information, but also to increased latency, which is problematic if our goal is on-line, real-time singing voice detection. To overcome this problem and reduce the necessity to include context in the features themselves, we introduce a method that uses Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). In experiments on several data sets, the resulting singing voice detector outperforms the state-of-the-art baselines in terms of accuracy, while at the same time drastically reducing latency and increasing the time resolution of the detector.
KW - music information retrieval
KW - recurrent neural nets
KW - singing voice detection
UR - https://www.mendeley.com/catalogue/304cf74b-d050-3565-99d7-6e87b39b8ba9/
DO - 10.1109/EUSIPCO.2015.7362337
M3 - Chapter
SN - 9780992862633
T3 - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
SP - 21
EP - 25
BT - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
PB - Institute of Electrical and Electronics Engineers Inc.
ER -