Ensemble of Deep Neural Networks for Acoustic Scene Classification
Related papers
A Layer-wise Score Level Ensemble Framework for Acoustic Scene Classification
2018 26th European Signal Processing Conference (EUSIPCO), 2018
Scene classification based on acoustic information is a challenging task due to various factors such as the non-stationary nature of the environment and multiple overlapping acoustic events. In this paper, we address the acoustic scene classification problem using SoundNet, a deep convolutional neural network pre-trained on raw audio signals. We propose a classification strategy that combines scores from each layer, based on the hypothesis that the layers of a deep convolutional network learn complementary information, and that combining this layer-wise information provides better classification than features extracted from any individual layer. In addition, we propose a pooling strategy to reduce the dimensionality of the features extracted from different layers of SoundNet. Our experiments on the DCASE 2016 acoustic scene classification dataset reveal the effectiveness of this layer-wise ensemble approach, which provides a relative improvement of approximately 30.85% over the classification accuracy of the best individual layer of SoundNet.
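For illustration, the layer-wise score ensemble described above can be sketched as follows. This is a minimal sketch using scikit-learn; the SoundNet feature extraction is assumed to have happened elsewhere, and the average-pooling and SVM choices are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def pool_features(layer_feats):
    """Average-pool each (frames, dims) activation map over time,
    reducing every recording to a fixed-length vector."""
    return np.stack([f.mean(axis=0) for f in layer_feats])

def layerwise_score_ensemble(train_feats, y_train, test_feats):
    """train_feats/test_feats: lists over layers, each a list of
    (frames, dims) arrays, one per recording (hypothetical layout)."""
    scores = []
    for tr, te in zip(train_feats, test_feats):
        # one classifier per SoundNet layer
        clf = SVC(probability=True).fit(pool_features(tr), y_train)
        scores.append(clf.predict_proba(pool_features(te)))
    # combine the complementary layer-wise information by averaging scores
    return np.mean(scores, axis=0).argmax(axis=1)
```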
CDNN-CRNN JOINED MODEL FOR ACOUSTIC SCENE CLASSIFICATION Technical Report
2019
This work proposes a deep learning framework for Acoustic Scene Classification (ASC), targeting DCASE2019 Task 1A. The front end combines three types of spectrogram: Gammatone (GAM), log-Mel, and Constant Q Transform (CQT). The back-end classification uses a joint learning model combining a CDNN and a CRNN. Our experiments on the development dataset of the DCASE2019 challenge Task 1A show a significant improvement of 11.2% over the DCASE2019 baseline of 62.5%. The Kaggle leaderboard reports a classification accuracy of 74.6% when we train on the full development dataset.
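A sketch of the multi-spectrogram front end is shown below, using librosa for the log-Mel and CQT branches. The Gammatone branch would require a third-party filterbank package and is left as a comment; the file name and parameters are assumptions.

```python
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=None)  # hypothetical input file

# log-Mel spectrogram branch
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel)

# Constant-Q transform branch (magnitude in dB)
cqt = librosa.amplitude_to_db(np.abs(librosa.cqt(y=y, sr=sr)))

# A Gammatone (GAM) spectrogram would be computed analogously with a
# gammatone filterbank; each representation feeds its own back-end model,
# whose predictions the paper then joins.
```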
A Convolutional Neural Network Approach for Acoustic Scene Classification
This paper presents a novel application of convolutional neural networks (CNNs) to the task of acoustic scene classification (ASC). We propose the use of a CNN trained to classify short sequences of audio, represented by their log-mel spectrogram. We also introduce a training method that can be used under particular circumstances in order to make full use of small datasets. The proposed system is tested and evaluated on three different ASC datasets and compared to other state-of-the-art systems which competed in the "Detection and Classification of Acoustic Scenes and Events" (DCASE) challenges held in 2016 and 2013. The best accuracy scores obtained by our system on the DCASE 2016 datasets are 79.0% (development) and 86.2% (evaluation), which constitute improvements of 6.4% and 9% with respect to the baseline system. Finally, when tested on the DCASE 2013 evaluation dataset, the proposed system reaches 77.0% accuracy, improving on the challenge winner's score by 1%.
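A minimal PyTorch sketch of a CNN over short log-mel segments is given below; the layer sizes, segment shape, and class count are illustrative assumptions, not the paper's exact topology.

```python
import torch
import torch.nn as nn

class SceneCNN(nn.Module):
    def __init__(self, n_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, n_classes)
        )

    def forward(self, x):  # x: (batch, 1, mel_bins, frames)
        return self.classifier(self.features(x))

# classify a batch of 8 short log-mel segments (60 mel bins x 43 frames)
logits = SceneCNN()(torch.randn(8, 1, 60, 43))
```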
Deep Learning Frameworks Applied For Audio-Visual Scene Classification
2021
In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features, as well as their combination, affect SC performance. Our extensive experiments, conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification accuracies of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and combined audio-visual input, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, represents an improvement of 16.5% over the DCASE baseline.
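The ensemble step amounts to late fusion of the per-class probabilities from the two branches; a minimal sketch follows, with equal weighting as an assumption.

```python
import numpy as np

def late_fusion(p_audio, p_visual, w=0.5):
    """p_audio, p_visual: (n_clips, n_classes) predicted probabilities
    from the audio-based and visual-based frameworks."""
    p = w * p_audio + (1.0 - w) * p_visual
    return p.argmax(axis=1)  # fused class decision per clip
```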
AUDIO SCENE CLASSIFICATION USING ENHANCED CONVOLUTIONAL NEURAL NETWORKS FOR DCASE 2021 CHALLENGE
This technical report describes our system proposed for Task 1B (Audio-Visual Scene Classification) of the DCASE 2021 Challenge. Our system focuses on classification based on the audio signal. Its architecture combines Convolutional Neural Networks with OpenL3 embeddings. The CNN consists of three stacked 2D convolutional layers that process the log-Mel spectrogram parameters obtained from the input signals. Additionally, OpenL3 embeddings of the input signals are calculated and merged with the output of the CNN stack. The resulting vector is fed to a classification block consisting of three fully connected layers. The mixup augmentation technique is applied to the training data, and binaural data is also used as input to provide additional information. In this report, we describe the proposed systems in detail and compare them to the baseline approach using the provided development datasets.
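Mixup (Zhang et al.) blends pairs of training examples and their labels; a minimal numpy sketch follows, where the alpha value and batch layout are assumptions.

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2):
    """x: (batch, ...) features; y_onehot: (batch, n_classes) labels."""
    lam = np.random.beta(alpha, alpha)      # mixing coefficient
    idx = np.random.permutation(len(x))     # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[idx]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[idx]
    return x_mix, y_mix
```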
A Low-Complexity Deep Learning Framework For Acoustic Scene Classification
2021
In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filters, Gammatone filters and the Constant Q Transform (CQT) to transform the raw audio signal into spectrograms, where both frequency and temporal features are presented. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), classifying into ten urban scenes. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, which are conducted on DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events)...
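One common form of decomposed convolution replaces a k x k convolution with a depthwise k x k plus a pointwise 1 x 1 convolution, cutting parameters and multiply-adds; the PyTorch sketch below shows this decomposition, though the paper's exact scheme may differ.

```python
import torch.nn as nn

def decomposed_conv(c_in, c_out, k=3):
    """Depthwise-separable replacement for nn.Conv2d(c_in, c_out, k)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),  # depthwise
        nn.Conv2d(c_in, c_out, 1),                               # pointwise
    )
```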
A Robust Framework for Acoustic Scene Classification
Interspeech 2019, 2019
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter, and constant Q transform. At the back end we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers as a post-trained feature adaptation model and classifier. We also explore the use of a data augmentation technique for these features that effectively generates a variety of intermediate data, reinforcing model learning abilities, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and remaining highly competitive with state-of-the-art systems.
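A sketch of the two-stage back end follows: the pre-trained CNN is frozen as a feature extractor and a small DNN is trained on top. The architectures, feature dimension, and class count are illustrative assumptions.

```python
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self, cnn_extractor, feat_dim=512, n_classes=15):
        super().__init__()
        self.extractor = cnn_extractor       # assumed to emit (batch, feat_dim)
        for p in self.extractor.parameters():
            p.requires_grad = False          # stage 1: frozen, pre-trained CNN
        self.dnn = nn.Sequential(            # stage 2: post-trained adaptation
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):
        return self.dnn(self.extractor(x))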
Acoustic Scene Analysis and Classification Using Densenet Convolutional Neural Network
In this paper we present an account of the state of the art in Acoustic Scene Classification (ASC), the task of classifying environmental scenarios through the sounds they produce. Our work aims to classify 50 different outdoor and indoor scenarios using environmental sounds. We use the ESC-50 dataset from the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), comprising 2000 different environmental audio recordings. In this method the raw audio data is converted into a Mel-spectrogram along with other features such as Tonnetz, Chroma, and MFCCs. The generated Mel-spectrogram is fed as input to the neural network for training. Our model follows a convolutional neural network structure composed of convolution and pooling layers. With a focus on real-time environmental classification, and to overcome the problem of low generalization in the model, the paper introduces augmentation, producing noise-modified audio by adding Gaussian white noise. Active research...
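The white-noise augmentation can be sketched as adding Gaussian noise at a target signal-to-noise ratio; the SNR value below is an assumption.

```python
import numpy as np

def add_white_noise(audio, snr_db=20.0):
    """Return a copy of the waveform with Gaussian white noise
    added at the requested signal-to-noise ratio (in dB)."""
    sig_power = np.mean(audio ** 2)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise
```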
ArXiv, 2018
This paper describes an acoustic scene classification method which achieved the 4th-ranked result in the IEEE AASP challenge on Detection and Classification of Acoustic Scenes and Events 2016. To accomplish this task, several methods are explored in three aspects: feature extraction, feature transformation, and score fusion for the final decision. For feature extraction, several features are investigated for effective acoustic scene classification. To resolve the issue that the same sound can be heard in different places, a feature transformation is applied to achieve better class separation. From these, several systems based on different feature sets are devised for classification. The final result is determined by fusing the individual systems. The method is demonstrated and validated by experiments conducted on the challenge database.
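The abstract does not name the transform; a hypothetical sketch using Linear Discriminant Analysis as the discriminative projection is given below, since LDA is one standard choice for improving class separation. The report's actual transform may differ.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def transform_features(X_train, y_train, X_test):
    """Project features so that scenes sharing similar sounds
    separate better before classification (LDA is an assumption)."""
    lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
    return lda.transform(X_train), lda.transform(X_test)
```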