A Layer-wise Score Level Ensemble Framework for Acoustic Scene Classification (original) (raw)

2018, 2018 26th European Signal Processing Conference (EUSIPCO)

Scene classification based on acoustic information is a challenging task due to various factors such as the nonstationary nature of the environment and multiple overlapping acoustic events. In this paper, we address the acoustic scene classification problem using SoundNet, a deep convolution neural network, pre-trained on raw audio signals. We propose a classification strategy by combining scores from each layer. This is based on the hypothesis that layers of the deep convolutional network learn complementary information and combining this layer-wise information provides better classification than the features extracted from an individual layer. In addition, we also propose a pooling strategy to reduce the dimensionality of features extracted from different layers of SoundNet. Our experiments on DCASE 2016 acoustic scene classification dataset reveals the effectiveness of this layer-wise ensemble approach. The proposed approach provides a relative improvement of approx. 30.85% over the classification accuracy provided by the best individual layer of SoundNet.