A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification

Hamid Eghbal-zadeh; Bernhard Lehner; Matthias Dorfer; Gerhard Widmer

doi:10.23919/EUSIPCO.2017.8081711


Research output: Conference proceeding/Chapter in Book/Report › Conference Paper › peer-review

Abstract

In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.
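The abstract names score fusion as the step that turns the i-vector system and the CNN into one hybrid classifier, but it does not state the fusion rule. As a minimal sketch, assuming a weighted average of each system's softmax-normalized class scores (a common late-fusion choice, not necessarily the one used in the paper), the combination for a single audio clip could look like this. The function names `fuse_scores` and `predict`, the softmax normalization, and the `weight` parameter are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Map one system's raw class scores to a probability vector (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(ivector_scores: Sequence[float],
                cnn_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of the two systems' class posteriors.

    `weight` is the share given to the i-vector system; the CNN receives 1 - weight.
    Both score vectors must list the same scene classes in the same order.
    """
    if len(ivector_scores) != len(cnn_scores):
        raise ValueError("both systems must score the same set of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    p_ivec = softmax(ivector_scores)
    p_cnn = softmax(cnn_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_ivec, p_cnn)]


def predict(fused: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the class with the highest fused score."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


if __name__ == "__main__":
    # Illustrative scores for three of the DCASE-2016 scene classes (made-up numbers).
    labels = ["beach", "bus", "cafe/restaurant"]
    ivec = [2.0, 0.5, 1.0]
    cnn = [0.1, 3.0, 0.2]
    print(predict(fuse_scores(ivec, cnn, weight=0.5), labels))
```

Setting `weight` to 1.0 or 0.0 reduces the hybrid to the i-vector system or the CNN alone; any intermediate value would presumably be tuned on held-out data, which is where complementary information between the two systems can pay off.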

Original language	English
Title of host publication	2017 25th European Signal Processing Conference (EUSIPCO)
Pages	2749-2753
Number of pages	5
DOIs	https://doi.org/10.23919/EUSIPCO.2017.8081711
Publication status	Published - 2 Sept 2017
Externally published	Yes
Event	2017 25th European Signal Processing Conference (EUSIPCO), Kos, 28 Aug 2017 → 2 Sept 2017

Conference

Conference	2017 25th European Signal Processing Conference (EUSIPCO)
Period	28/08/17 → 2/09/17

Keywords

Feature extraction
Mel frequency cepstral coefficient
Adaptation models
Training
Computational modeling
Neural networks

Access to Document

10.23919/EUSIPCO.2017.8081711

https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8081711

Cite this

@inproceedings{1a5ad497de834bf285294167064f4bff,
  title = "A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification",
  abstract = "In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.",
  keywords = "Feature extraction, Mel frequency cepstral coefficient, Adaptation models, Training, Computational modeling, Neural networks",
  author = "Hamid Eghbal-zadeh and Bernhard Lehner and Matthias Dorfer and Gerhard Widmer",
  year = "2017",
  month = sep,
  day = "2",
  doi = "10.23919/EUSIPCO.2017.8081711",
  language = "English",
  isbn = "978-1-5386-0751-0",
  pages = "2749--2753",
  booktitle = "2017 25th European Signal Processing Conference (EUSIPCO)",
  note = "2017 25th European Signal Processing Conference (EUSIPCO) ; Conference date: 28-08-2017 Through 02-09-2017",
}

Eghbal-zadeh, H, Lehner, B, Dorfer, M & Widmer, G 2017, A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification. in 2017 25th European Signal Processing Conference (EUSIPCO), 8081711, pp. 2749-2753, 2017 25th European Signal Processing Conference (EUSIPCO), 28/08/17. https://doi.org/10.23919/EUSIPCO.2017.8081711

TY  - GEN
T1  - A hybrid approach with multi-channel i-vectors and convolutional neural networks for acoustic scene classification
AU  - Eghbal-zadeh, Hamid
AU  - Lehner, Bernhard
AU  - Dorfer, Matthias
AU  - Widmer, Gerhard
PY  - 2017/9/2
Y1  - 2017/9/2
N2  - In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.
AB  - In Acoustic Scene Classification (ASC) two major approaches have been followed. While one utilizes engineered features such as mel-frequency-cepstral-coefficients (MFCCs), the other uses learned features that are the outcome of an optimization algorithm. I-vectors are the result of a modeling technique that usually takes engineered features as input. It has been shown that standard MFCCs extracted from monaural audio signals lead to i-vectors that exhibit poor performance, especially on indoor acoustic scenes. At the same time, Convolutional Neural Networks (CNNs) are well known for their ability to learn features by optimizing their filters. They have been applied on ASC and have shown promising results. In this paper, we first propose a novel multi-channel i-vector extraction and scoring scheme for ASC, improving their performance on indoor and outdoor scenes. Second, we propose a CNN architecture that achieves promising ASC results. Further, we show that i-vectors and CNNs capture complementary information from acoustic scenes. Finally, we propose a hybrid system for ASC using multi-channel i-vectors and CNNs by utilizing a score fusion technique. Using our method, we participated in the ASC task of the DCASE-2016 challenge. Our hybrid approach achieved 1st rank among 49 submissions, substantially improving the previous state of the art.
KW  - Feature extraction
KW  - Mel frequency cepstral coefficient
KW  - Adaptation models
KW  - Training
KW  - Computational modeling
KW  - Neural networks
UR  - https://ieeexplore.ieee.org/document/8081711/
U2  - 10.23919/EUSIPCO.2017.8081711
DO  - 10.23919/EUSIPCO.2017.8081711
M3  - Conference Paper
SN  - 978-1-5386-0751-0
SP  - 2749
EP  - 2753
BT  - 2017 25th European Signal Processing Conference (EUSIPCO)
T2  - 2017 25th European Signal Processing Conference (EUSIPCO)
Y2  - 28 August 2017 through 2 September 2017
ER  -
