A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks

Bernhard Lehner; Gerhard Widmer; Sebastian Böck

doi:10.1109/EUSIPCO.2015.7362337

Embedded AI (eAI)

Research output: Conference proceeding/Chapter in Book/Report › Chapter › Peer-reviewed

Abstract

Singing voice detection aims at identifying the regions in a music recording where at least one person sings. This is a challenging problem that cannot be solved without analysing the temporal evolution of the signal. Current state-of-the-art methods combine timbral with temporal characteristics, by summarising various feature values over time, e.g. by computing their variance. This leads to more contextual information, but also to increased latency, which is problematic if our goal is on-line, real-time singing voice detection. To overcome this problem and reduce the necessity to include context in the features themselves, we introduce a method that uses Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). In experiments on several data sets, the resulting singing voice detector outperforms the state-of-the-art baselines in terms of accuracy, while at the same time drastically reducing latency and increasing the time resolution of the detector.
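The latency trade-off described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. The 10 ms hop size, the 51-frame window, the feature and hidden sizes, and all names (`centred_window_latency_ms`, `ToyLSTMCell`, `stream_detect`) are hypothetical, and the LSTM is untrained with random weights. The first function computes how much look-ahead a summary statistic (such as a variance) needs when its window is centred on the frame being classified. The rest is a minimal unidirectional LSTM that consumes one feature frame at a time, so the score for frame t depends only on frames up to t.

```python
import math
import random

# Illustrative sketch only: the hop size, window length and model sizes
# below are hypothetical values, not settings taken from the paper.
HOP_MS = 10.0  # assumed frame hop in milliseconds


def centred_window_latency_ms(window_frames: int, hop_ms: float = HOP_MS) -> float:
    """Look-ahead (in ms) a summary statistic needs when its window of
    `window_frames` frames is centred on the frame being classified."""
    return (window_frames // 2) * hop_ms


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ToyLSTMCell:
    """Minimal LSTM cell in pure Python with random, untrained weights."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden

        def gate_weights():
            # One row per hidden unit over [input, previous hidden state, bias].
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden + 1)]
                    for _ in range(n_hidden)]

        self.w_i = gate_weights()  # input gate
        self.w_f = gate_weights()  # forget gate
        self.w_o = gate_weights()  # output gate
        self.w_g = gate_weights()  # candidate cell update

    @staticmethod
    def _affine(weights, z):
        return [sum(w * v for w, v in zip(row, z)) for row in weights]

    def step(self, x, h, c):
        """Advance the cell by one frame; returns the new (h, c)."""
        z = list(x) + list(h) + [1.0]  # input, recurrent state, bias term
        i = [sigmoid(a) for a in self._affine(self.w_i, z)]
        f = [sigmoid(a) for a in self._affine(self.w_f, z)]
        o = [sigmoid(a) for a in self._affine(self.w_o, z)]
        g = [math.tanh(a) for a in self._affine(self.w_g, z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def stream_detect(frames, cell, w_out, b_out=0.0):
    """Return one singing-voice score in (0, 1) per frame, each computed as
    soon as that frame arrives (no look-ahead)."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    scores = []
    for x in frames:
        h, c = cell.step(x, h, c)
        scores.append(sigmoid(sum(w * hk for w, hk in zip(w_out, h)) + b_out))
    return scores


if __name__ == "__main__":
    rng = random.Random(1)
    frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]  # fake features
    cell = ToyLSTMCell(n_in=4, n_hidden=3)
    scores = stream_detect(frames, cell, w_out=[1.0, -1.0, 0.5])
    print(len(scores), centred_window_latency_ms(51))  # prints: 20 250.0
```

With the assumed 10 ms hop, a centred 51-frame window must wait 25 frames (250 ms) before it can label a frame, whereas the recurrent detector emits a score for each frame as it arrives and carries temporal context in its cell state instead. That the paper's network is strictly causal in this way is an assumption of the sketch.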

Original language	English
Title of host publication	2015 23rd European Signal Processing Conference, EUSIPCO 2015
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	21-25
Number of pages	5
ISBN (Print)	9780992862633
DOIs	https://doi.org/10.1109/EUSIPCO.2015.7362337
Publication status	Published - 22 Dec 2015

Publication series

Name	2015 23rd European Signal Processing Conference, EUSIPCO 2015

Keywords

music information retrieval
recurrent neural nets
singing voice detection

Access to Document

10.1109/EUSIPCO.2015.7362337

Cite this

Lehner, B., Widmer, G., & Böck, S. (2015). A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks. In 2015 23rd European Signal Processing Conference, EUSIPCO 2015 (pp. 21-25). (2015 23rd European Signal Processing Conference, EUSIPCO 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/EUSIPCO.2015.7362337

@inproceedings{26e4a172630a421dbd06f06169fcef7a,

title = "A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks",

abstract = "Singing voice detection aims at identifying the regions in a music recording where at least one person sings. This is a challenging problem that cannot be solved without analysing the temporal evolution of the signal. Current state-of-the-art methods combine timbral with temporal characteristics, by summarising various feature values over time, e.g. by computing their variance. This leads to more contextual information, but also to increased latency, which is problematic if our goal is on-line, real-time singing voice detection. To overcome this problem and reduce the necessity to include context in the features themselves, we introduce a method that uses Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). In experiments on several data sets, the resulting singing voice detector outperforms the state-of-the-art baselines in terms of accuracy, while at the same time drastically reducing latency and increasing the time resolution of the detector.",

keywords = "music information retrieval, recurrent neural nets, singing voice detection",

author = "Bernhard Lehner and Gerhard Widmer and Sebastian B{\"o}ck",

year = "2015",

month = dec,

day = "22",

doi = "10.1109/EUSIPCO.2015.7362337",

language = "English",

isbn = "9780992862633",

series = "2015 23rd European Signal Processing Conference, EUSIPCO 2015",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "21--25",

booktitle = "2015 23rd European Signal Processing Conference, EUSIPCO 2015",

address = "United States",

}

Lehner, B, Widmer, G & Böck, S 2015, A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks. in 2015 23rd European Signal Processing Conference, EUSIPCO 2015. 2015 23rd European Signal Processing Conference, EUSIPCO 2015, Institute of Electrical and Electronics Engineers Inc., pp. 21-25. https://doi.org/10.1109/EUSIPCO.2015.7362337

A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks. / Lehner, Bernhard; Widmer, Gerhard; Böck, Sebastian.
2015 23rd European Signal Processing Conference, EUSIPCO 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 21-25 (2015 23rd European Signal Processing Conference, EUSIPCO 2015).

TY  - CHAP
T1  - A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks
AU  - Lehner, Bernhard
AU  - Widmer, Gerhard
AU  - Böck, Sebastian
PY  - 2015/12/22
Y1  - 2015/12/22
N2  - Singing voice detection aims at identifying the regions in a music recording where at least one person sings. This is a challenging problem that cannot be solved without analysing the temporal evolution of the signal. Current state-of-the-art methods combine timbral with temporal characteristics, by summarising various feature values over time, e.g. by computing their variance. This leads to more contextual information, but also to increased latency, which is problematic if our goal is on-line, real-time singing voice detection. To overcome this problem and reduce the necessity to include context in the features themselves, we introduce a method that uses Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). In experiments on several data sets, the resulting singing voice detector outperforms the state-of-the-art baselines in terms of accuracy, while at the same time drastically reducing latency and increasing the time resolution of the detector.
AB  - Singing voice detection aims at identifying the regions in a music recording where at least one person sings. This is a challenging problem that cannot be solved without analysing the temporal evolution of the signal. Current state-of-the-art methods combine timbral with temporal characteristics, by summarising various feature values over time, e.g. by computing their variance. This leads to more contextual information, but also to increased latency, which is problematic if our goal is on-line, real-time singing voice detection. To overcome this problem and reduce the necessity to include context in the features themselves, we introduce a method that uses Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN). In experiments on several data sets, the resulting singing voice detector outperforms the state-of-the-art baselines in terms of accuracy, while at the same time drastically reducing latency and increasing the time resolution of the detector.
KW  - music information retrieval
KW  - recurrent neural nets
KW  - singing voice detection
UR  - https://www.mendeley.com/catalogue/304cf74b-d050-3565-99d7-6e87b39b8ba9/
U2  - 10.1109/EUSIPCO.2015.7362337
DO  - 10.1109/EUSIPCO.2015.7362337
M3  - Chapter
SN  - 9780992862633
T3  - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
SP  - 21
EP  - 25
BT  - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  - 
