Comparative Study of MFCC Feature with Different Machine Learning Techniques in Acoustic Scene Classification
Related papers
PERFORMANCE ACCURACY OF CLASSIFICATION ON ENVIRONMENTAL SOUND CLASSIFICATION (ESC_50) DATASET
IJCIRAS, 2020
The classification of audio datasets aims to distinguish between different sources of audio, such as indoor, outdoor, and environmental sounds. The environmental sound classification (ESC-50) dataset is composed of a labeled set of 2000 environmental recordings. The spectral centroid method is applied to extract audio features from the ESC-50 dataset in waveform audio file (WAV) format. Because decision trees are easy to implement and fast to fit and predict, the proposed system uses coarse and medium trees as classifiers. Five-fold cross-validation is applied to evaluate classifier performance. The system is implemented in MATLAB. The classification accuracy of the coarse tree is 63.8%, whereas the medium tree achieves 58.6% on the ESC-50 dataset.
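As a rough illustration of the feature-extraction step described above, the spectral centroid of an audio frame is the magnitude-weighted mean of its FFT bin frequencies. The sketch below uses NumPy on a synthetic tone; the sample rate and frame length are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean of the FFT bin frequencies of one frame."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# A pure 1 kHz tone should have its spectral centroid at about 1 kHz.
sr = 16000
t = np.arange(2048) / sr
centroid = spectral_centroid(np.sin(2 * np.pi * 1000.0 * t), sr)
```

Stacking such per-frame centroids over a clip yields the feature vectors on which a decision-tree classifier could then be trained.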
Acoustic Scene Classification: A Competition Review
2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), 2018
In this paper we study the problem of acoustic scene classification, i.e., categorization of audio sequences into mutually exclusive classes based on their spectral content. We describe the methods and results discovered during a competition organized in the context of a graduate machine learning course; both by the students and external participants. We identify the most suitable methods and study the impact of each by performing an ablation study of the mixture of approaches. We also compare the results with a neural network baseline, and show the improvement over that. Finally, we discuss the impact of using a competition as a part of a university course, and justify its importance in the curriculum based on student feedback.
Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification
Interspeech 2018, 2018
Acoustic scene classification (ASC) is an audio signal processing task where mel-scaled spectral features are widely used by researchers. These features, considered a de facto baseline in speech processing, traditionally employ Fourier-based transforms. Unlike speech, environmental audio spans a larger range of audible frequency and might contain short high-frequency transients and continuous low-frequency background noise simultaneously. Wavelets, with a better time-frequency localization capacity, can be considered more suitable for dealing with such signals. This paper attempts ASC by a novel use of wavelet transform based mel-scaled features. The proposed features are shown to possess better discriminative properties than other spectral features while using a similar classification framework. The experiments are performed on two datasets, similar in scene classes but differing in dataset size and length of the audio samples. When compared with two benchmark systems, one based on mel-frequency cepstral coefficients and Gaussian mixture models, and the other based on log mel-band energies and a multi-layer perceptron, the proposed system performed considerably better on the test data.
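The companion journal version of this work names the Haar function, so a single analysis level of the Haar discrete wavelet transform can serve as a minimal sketch of the wavelet stage; the full mel-scaled feature pipeline is not reproduced here.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).

    The input length must be even; because the transform is orthonormal,
    the signal energy is split exactly between the two coefficient sets.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass: scaled local averages
    detail = (even - odd) / np.sqrt(2.0)  # high-pass: scaled local differences
    return approx, detail

x = np.array([4.0, 2.0, 5.0, 5.0])
approx, detail = haar_dwt(x)
```

Applying this recursively to the approximation coefficients gives the multi-resolution decomposition that wavelet-based features are built from.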
Identification of Recorded Audio Location Using Acoustic Environment Classification
International Journal of Advance Research and Innovative Ideas in Education, 2015
Many artifacts and distortions are present in such recordings. Reverberation depends on the volume and surface properties of the room and causes degradation of the recording. Background noise depends on the unwanted audio source activity present in the evidentiary recording. For audio to be admissible as evidence in court, its authenticity must be verified. A blind deconvolution method based on FIR filtering and the overlap-add method is used to estimate the reverberation time. Particle filtering is used to estimate the background noise. Feature extraction is performed using the MFCC approach. The 128-dimensional feature vector is the concatenation of features from acoustic reverberation, background noise, and higher-order statistics. An SVM classifier is used to classify the environments. The performance of the system is evaluated on a dataset of audio recordings. The SVM classifier provides the best results for the trained dataset and moderate results for the untrained dataset.
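The MFCC extraction step mentioned above follows a standard recipe: power spectrum, triangular mel filterbank, log compression, then a DCT-II. A self-contained NumPy sketch for a single frame is shown below; the filter count, coefficient count, and frame size are illustrative choices, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):
            fbank[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fbank[i, k] = (hi - k) / max(hi - mid, 1)
    log_energy = np.log(fbank @ power + 1e-10)
    # DCT-II decorrelates the log filterbank energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1) / (2.0 * n_filters)))
    return dct @ log_energy

sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
```

In the paper these cepstral features are combined with reverberation and background-noise statistics before SVM classification.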
A Robust Framework for Acoustic Scene Classification
Interspeech 2019, 2019
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter, and constant-Q transform. At the back-end we evaluate an effective two-stage model that exploits a convolutional neural network for pre-trained feature extraction, followed by deep neural network classifiers as a post-trained feature adaptation model and classifier. We also explore the use of a data augmentation technique for these features that effectively generates a variety of intermediate data, reinforcing model learning abilities, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and highly competitive with state-of-the-art systems.
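The abstract does not specify how the "intermediate data" are generated; one common technique that matches the description is mixup-style interpolation between pairs of training examples, sketched below purely as an assumption rather than the authors' exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two feature/label pairs into one intermediate training example."""
    lam = rng.beta(alpha, alpha)  # interpolation weight in (0, 1)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_a, y_a = np.zeros(8), np.array([1.0, 0.0])  # a class-0 example
x_b, y_b = np.ones(8), np.array([0.0, 1.0])   # a class-1 example
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b)
```

The blended pair lies between the two originals in both feature and label space, which is what reinforces learning on marginal cases.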
Hybrid Computerized Method for Environmental Sound Classification
IEEE Access, 2020
Classification of environmental sounds plays a key role in security, investigation, and robotics, since the study of the sounds present in a specific environment can yield significant insights. The lack of standardized methods for automatic and effective environmental sound classification (ESC) creates an urgent need. In response to this limitation, this paper proposes a hybrid model for automatic and accurate classification of environmental sounds. Optimum allocation sampling (OAS) is used to elicit the informative samples from each class. The representative samples obtained by OAS are turned into spectrograms containing their time-frequency-amplitude representation using a short-time Fourier transform (STFT). The spectrograms are then given as input to pre-trained AlexNet and Visual Geometry Group (VGG)-16 networks. Multiple deep features are extracted using the pre-trained networks and classified using several classification techniques, namely decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse, and weighted kernel), support vector machine, linear discriminant analysis, bagged tree, and softmax classifiers. ESC-10, a ten-class environmental sound dataset, is used for evaluation of the methodology. Accuracies of 90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% are obtained with the decision tree, k-nearest neighbor, support vector machine, linear discriminant analysis, bagged tree, and softmax classifier, respectively. The proposed method proved to be robust, effective, and promising in comparison with other existing state-of-the-art techniques on the same dataset.
Index Terms: Environmental sound classification, optimal allocation sampling, spectrogram, convolutional neural network, classification techniques.
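The spectrogram stage of the pipeline above can be sketched with a Hann-windowed short-time Fourier transform; the FFT size and hop length below are illustrative, and the subsequent AlexNet/VGG-16 feature extraction is not reproduced.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    frames = [signal[s:s + n_fft] * window
              for s in range(0, len(signal) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq_bins, time)

sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 1000.0 * t))
# 1 kHz falls in bin 1000 / (8000 / 256) = 32.
peak_bin = int(spec.mean(axis=1).argmax())
```

In the paper this time-frequency image, rendered as an RGB plot, is what the pre-trained CNNs consume.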
ArXiv, 2018
This paper describes an acoustic scene classification method which achieved the 4th ranking result in the IEEE AASP challenge of Detection and Classification of Acoustic Scenes and Events 2016. In order to accomplish the ensuing task, several methods are explored in three aspects: feature extraction, feature transformation, and score fusion for final decision. In the part of feature extraction, several features are investigated for effective acoustic scene classification. For resolving the issue that the same sound can be heard in different places, a feature transformation is applied for better separation for classification. From these, several systems based on different feature sets are devised for classification. The final result is determined by fusing the individual systems. The method is demonstrated and validated by the experiment conducted using the Challenge database.
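The abstract leaves the fusion rule unspecified; one simple possibility, shown here as an assumption, is a weighted average of the per-system class-score matrices (late fusion).

```python
import numpy as np

def fuse_scores(score_matrices, weights=None):
    """Weighted average of per-system class-score matrices (late fusion)."""
    stack = np.stack(score_matrices)  # (n_systems, n_clips, n_classes)
    if weights is None:
        weights = np.ones(len(score_matrices)) / len(score_matrices)
    return np.tensordot(weights, stack, axes=1)

# Toy scores from two systems over two clips and two scene classes.
sys_a = np.array([[0.7, 0.3], [0.4, 0.6]])
sys_b = np.array([[0.6, 0.4], [0.2, 0.8]])
fused = fuse_scores([sys_a, sys_b])
pred = fused.argmax(axis=1)  # final decision per clip
```

Per-system weights could instead be tuned on validation data to favour the stronger feature sets.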
Robust features for environmental sound classification
2013 IEEE International Conference on Electronics, Computing and Communication Technologies, 2013
In this paper we describe algorithms to classify environmental sounds with the aim of providing contextual information to devices such as hearing aids for optimum performance. We use signal sub-band energy to construct a signal-dependent dictionary and matching pursuit algorithms to obtain a sparse representation of a signal. The coefficients of the sparse vector are used as weights to compute weighted features. These features, along with mel-frequency cepstral coefficients (MFCC), are used as feature vectors for classification. Experimental results show that the proposed method gives an accuracy as high as 95.6% while classifying 14 categories of environmental sound using a Gaussian mixture model (GMM).
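The sparse-representation step can be sketched with the generic matching pursuit iteration: repeatedly select the unit-norm dictionary atom most correlated with the residual and subtract its contribution. The toy orthonormal dictionary below is only an illustration; the paper builds a signal-dependent dictionary from sub-band energies.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10):
    """Greedy sparse coding over a dictionary of unit-norm column atoms."""
    residual = signal.astype(float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        corr = dictionary.T @ residual          # correlation with every atom
        k = int(np.abs(corr).argmax())          # best-matching atom
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]  # remove its contribution
    return coeffs, residual

# Dictionary of 4 unit-norm atoms in R^4 (identity, for illustration only).
D = np.eye(4)
x = np.array([0.0, 3.0, 0.0, 1.0])
c, r = matching_pursuit(x, D, n_iter=2)
```

The resulting sparse coefficient vector c is what the paper uses as weights when computing its weighted features.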
Audio feature extraction and analysis for scene classification
Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing
Understanding the scene content of a video sequence is very important for content-based indexing and retrieval in multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the non-speech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
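Two low-level features typical of this line of work are the frame RMS volume and the zero-crossing rate; the paper's full feature set is larger, so the sketch below is only a representative sample, not the authors' exact definitions.

```python
import numpy as np

def frame_features(frame):
    """Two classic low-level clip features: RMS volume and zero-crossing rate."""
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))
    # Fraction of adjacent sample pairs whose signs differ.
    zcr = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
    return rms, zcr

# A frame alternating between +1 and -1 crosses zero at every step.
frame = np.array([1.0, -1.0] * 8)
rms, zcr = frame_features(frame)
```

Comparing such feature vectors between adjacent clips is what makes the scene-break detection described above possible.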
Analysis and classification of acoustic scenes with wavelet transform-based mel-scaled features
Multimedia Tools and Applications, 2020
Analysis of audio from real-life environments and its categorization into different acoustic scenes can make context-aware devices and applications more efficient. Unlike speech, such signals have overlapping frequency content while spanning a much larger audible frequency range. They are also less structured than speech or music signals. The wavelet transform has good time-frequency localization ability owing to its variable-length basis functions; consequently, it facilitates the extraction of more characteristic information from environmental audio. This paper attempts to classify acoustic scenes by a novel use of wavelet-based mel-scaled features. The design of the proposed framework is based on experiments conducted on two datasets which have the same scene classes but differ with regard to sample length and amount of data (in hours). It outperformed two benchmark systems, one based on mel-frequency cepstral coefficients and Gaussian mixture models and the other based on log mel-band energies and a multi-layer perceptron. We also present an investigation of the use of different train and test sample durations for acoustic scene classification.
Keywords: DCASE, Environmental sounds, Haar function, MFCC, SVM