Improving voice activity detection in movies

Bernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner

Research output: Contribution to conference (No Proceedings), Paper, peer-review

Abstract

Voice activity detection in movies is a challenging task. The varied emotional states of the speakers, together with the wide range of soundscapes and noises, add to its complexity. In this paper, we propose a set of lightweight features specifically designed to perform under such conditions while also preventing singing voice from being confused with speech. For evaluation, we use four full-length movies that were previously unseen by the system and painstakingly annotated. We compare our detector to a state-of-the-art reference system. The new approach performs better, yielding roughly half the Equal Error Rate (EER). Furthermore, because ground-truth annotation is extremely tedious, and to help advance research on this topic, we release the annotations of all four movies to the research community.
Original language: English
Number of pages: 5
Publication status: Published - 2015

Keywords

  • Speech detection
  • Voice activity detection
