Improving voice activity detection in movies

Bernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner

Publication: Conference contribution, paper, peer-reviewed

Abstract

Voice Activity Detection in movies is a challenging task. The different emotional states of the speakers, as well as the variety of soundscapes and noises, contribute to its complexity. In this paper, we propose a set of lightweight features that are specifically designed to perform under such conditions, while at the same time preventing confusion of singing voice with speech. For evaluation, we use four full-length movies, previously unseen by the system and painstakingly annotated. We compare our detector to a state-of-the-art reference system. The new approach performs better, yielding roughly half the Equal Error Rate (EER). Furthermore, since the ground truth annotation task is extremely tedious, and to help advance research on this topic, we release the annotations of all four movies to the research community.
Original language: English
Number of pages: 5
Publication status: Published - 2015
