TY - CONF
T1 - Improving voice activity detection in movies
AU - Lehner, Bernhard
AU - Widmer, Gerhard
AU - Sonnleitner, Reinhard
PY - 2015
Y1 - 2015
N2 - Voice Activity Detection in movies is a non-trivial and challenging task. The different emotional states of the speakers, as well as the variety of soundscapes and noises, contribute to the complexity of the task. In this paper, we propose a set of lightweight features that are specifically designed to perform under such conditions, while at the same time preventing confusion of singing voice with speech. For evaluation, we use four full-length movies, previously unseen to the system and painstakingly annotated. We compare our detector to a state-of-the-art reference system. The new approach performs better, yielding just about half the Equal Error Rate (EER). Furthermore, since the ground truth annotation task is extremely tedious, and to help advance this topic, we release the annotations of all four movies to the research community.
AB - Voice Activity Detection in movies is a non-trivial and challenging task. The different emotional states of the speakers, as well as the variety of soundscapes and noises, contribute to the complexity of the task. In this paper, we propose a set of lightweight features that are specifically designed to perform under such conditions, while at the same time preventing confusion of singing voice with speech. For evaluation, we use four full-length movies, previously unseen to the system and painstakingly annotated. We compare our detector to a state-of-the-art reference system. The new approach performs better, yielding just about half the Equal Error Rate (EER). Furthermore, since the ground truth annotation task is extremely tedious, and to help advance this topic, we release the annotations of all four movies to the research community.
KW - Speech detection
KW - Voice activity detection
UR - https://www.mendeley.com/catalogue/88b679d9-685a-31e0-9f4c-1d0f119e53cd/
U2 - 10.21437/interspeech.2015-455
DO - 10.21437/interspeech.2015-455
M3 - Paper
ER -