Robust features for environmental sound classification

Audio classification based on sparse coefficients

Sensor Signal Processing for Defence (SSPD 2011), 2011

Audio signal classification is usually done using conventional signal features such as mel-frequency cepstral coefficients (MFCC), line spectral frequencies (LSF), and short-time energy (STE). Learned dictionaries have been shown to have promising capability for creating sparse representations of a signal and hence have the potential to be used for the extraction of signal features. In this paper, we consider using sparse features for audio classification of music and speech data. We use the K-SVD algorithm to learn separate dictionaries for the speech and music signals to represent their respective subspaces and use them to extract sparse features for each class of signals using Orthogonal Matching Pursuit (OMP). Based on these sparse features, Support Vector Machines (SVM) are used for speech and music classification. The same signals were also classified using SVM based on conventional MFCC coefficients, and the classification results were compared to those of the sparse coefficients. It was found that at low signal-to-noise ratio (SNR), sparse coefficients give far better classification results than the MFCC-based classification.
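The sparse-coding stage described in this abstract can be sketched as follows. This is a minimal, illustrative Orthogonal Matching Pursuit in NumPy, not the authors' code: the dictionary here is random rather than K-SVD-trained, and the sizes are arbitrary.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: approximate x as a sparse
    combination of the columns (atoms) of dictionary D."""
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # least-squares fit over the selected atoms
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs[support] = sol
    return coeffs

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
x = 2.0 * D[:, 3] - 1.5 * D[:, 40]      # signal built from two atoms
c = omp(D, x, n_nonzero=2)
print(np.nonzero(c)[0])                 # indices of the selected atoms
```

The resulting sparse coefficient vector `c` is what would be fed to the SVM in place of MFCCs; in the paper, one such vector is computed against each class-specific dictionary.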

Evaluation of sound classification algorithms for hearing aid applications

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Automatic program switching has been shown to be greatly beneficial for hearing aid users. This feature is mediated by a sound classification system, which is traditionally implemented using simple features and heuristic classification schemes, resulting in unsatisfactory performance in complex auditory scenarios. In this study, a number of experiments are conducted to systematically assess the impact of more sophisticated classifiers and features on automatic acoustic environment classification performance. The results show that advanced classifiers, such as the Hidden Markov Model (HMM) or Gaussian Mixture Model (GMM), greatly improve classification performance over simple classifiers. This change does not require a great increase in computational complexity, provided that a suitable number (5 to 7) of low-level features are carefully chosen. These findings indicate that advanced classifiers can be feasible in hearing aid applications.
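As an illustration of the kind of low-level frame features such a classifier consumes, the sketch below computes two classic ones, short-time energy and zero-crossing rate, per frame. The frame length and hop size are arbitrary choices for the example, not values from the paper.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Per-frame short-time energy and zero-crossing rate."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        # fraction of consecutive samples whose sign changes
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return np.array(feats)

t = np.linspace(0, 1, 8000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)        # 440 Hz tone at 8 kHz
f = frame_features(tone)
print(f.shape)                            # (n_frames, 2)
```

Sequences of such frame vectors are exactly what an HMM or GMM models; the paper's point is that a handful of well-chosen features of this kind already suffices.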

On feature selection in environmental sound recognition

Given a broad set of content-based audio features, we employ principal component analysis for the composition of an optimal feature set for environmental sounds. We select features based on quantitative data analysis (factor analysis) and conduct retrieval experiments to evaluate the quality of the feature combinations. Retrieval results show that statistical data analysis gives useful hints for feature selection. The experiments show the importance of feature selection in environmental sound recognition.
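A minimal sketch of the PCA analysis underlying such a study, on a synthetic feature matrix (illustrative only; the paper's actual audio features and selection protocol are not reproduced here):

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each feature
    # eigendecomposition of the sample covariance matrix
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]            # descending variance
    components = eigvecs[:, order[:n_components]]
    explained = eigvals[order] / eigvals.sum()   # variance ratios
    return Xc @ components, explained

rng = np.random.default_rng(1)
# 200 samples of 10 features driven by only 3 underlying factors
Z = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(200, 10))
proj, explained = pca(X, n_components=3)
print(explained[:3].sum())   # close to 1.0: three components suffice
```

Inspecting how quickly the explained-variance ratios decay is the usual quantitative hint for how many features (or combinations) to keep.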

Hybrid Computerized Method for Environmental Sound Classification

IEEE Access, 2020

Classification of environmental sounds plays a key role in security, investigation, and robotics, since the study of the sounds present in a specific environment can yield significant insights. The lack of standardized methods for automatic and effective environmental sound classification (ESC) creates an urgent need. In response to this limitation, this paper proposes a hybrid model for automatic and accurate classification of environmental sounds. Optimum allocation sampling (OAS) is used to elicit informative samples from each class. The representative samples obtained by OAS are turned into spectrograms containing their time-frequency-amplitude representation by using a short-time Fourier transform (STFT). The spectrograms are then given as input to pre-trained AlexNet and Visual Geometry Group (VGG)-16 networks. Multiple deep features are extracted using the pre-trained networks and classified using multiple classification techniques, namely decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse, and weighted kernel), support vector machine, linear discriminant analysis, bagged tree, and softmax classifiers. ESC-10, a ten-class environmental sound dataset, is used for the evaluation of the methodology. Accuracies of 90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% are obtained with the decision tree, k-nearest neighbor, support vector machine, linear discriminant analysis, bagged tree, and softmax classifiers, respectively. The proposed method proved to be robust, effective, and promising in comparison with other existing state-of-the-art techniques using the same dataset.
Index Terms: Environmental sound classification, optimum allocation sampling, spectrogram, convolutional neural network, classification techniques.
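The spectrogram front-end of such a pipeline can be sketched with a basic hand-rolled STFT. The FFT size and hop are arbitrary example values, and the AlexNet/VGG-16 feature-extraction stage is omitted.

```python
import numpy as np

def stft_magnitude(x, n_fft=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        frame = x[start:start + n_fft] * window
        # keep the non-negative frequency bins only
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T   # shape: (n_fft // 2 + 1, n_frames)

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz tone
S = stft_magnitude(x)
peak_bin = int(np.argmax(S[:, 0]))
print(peak_bin * sr / 256)                # peak at 1000.0 Hz
```

In the paper's pipeline such a time-frequency image (rendered as a color spectrogram) is resized to the input resolution the pre-trained CNN expects.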

Sound Classification in Hearing Aids Inspired by Auditory Scene Analysis

EURASIP Journal on Advances in Signal Processing, 2005

A sound classification system for the automatic recognition of the acoustic environment in a hearing aid is discussed. The system distinguishes the four sound classes "clean speech," "speech in noise," "noise," and "music." A number of features that are inspired by auditory scene analysis are extracted from the sound signal. These features describe amplitude modulations, spectral profile, harmonicity, amplitude onsets, and rhythm. They are evaluated together with different pattern classifiers. Simple classifiers, such as rule-based and minimum-distance classifiers, are compared with more complex approaches, such as Bayes classifier, neural network, and hidden Markov model. Sounds from a large database are employed for both training and testing of the system. The achieved recognition rates are very high except for the class "speech in noise." Problems arise in the classification of compressed pop music, strongly reverberated speech, and tonal or fluctuating noises.

Towards an optimal feature set for environmental sound recognition

Feature selection for audio retrieval is a non-trivial task. In this paper we aim at identifying an optimal feature combination for environmental sound recognition. The feature combination is constructed from a broad set of features. Additionally to state-of-the-art features, we evaluate the quality of audio features we previously introduced for another domain. We examine the properties of features by quantitative data analysis (factor analysis) and identify candidates for feature combinations. We verify the quality of the combination by retrieval experiments. The optimal solution yields Recall and Precision values of 87% and 88%, respectively.

Automatic Sound Classification Inspired by Auditory Scene Analysis

2000

A sound classification system for the automatic recognition of the acoustic environment in a hearing instrument is discussed. The system distinguishes the four sound classes 'clean speech', 'speech in noise', 'noise', and 'music' and is based on auditory features and hidden Markov models. The employed features describe level fluctuations, the spectral form, and harmonicity. Sounds from a large database are employed for both training and testing of the system.

Environmental Sound Recognition: A Survey

While research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory account of recent developments in three parts: i) basic environmental sound processing schemes, ii) stationary ESR techniques, and iii) non-stationary ESR techniques. Finally, concluding remarks and future research and development trends in the ESR field are given.
Index Terms: Environmental Sound Recognition (ESR), Mel filter, audio features, Mel-frequency cepstral coefficients.

Soft computing based feature selection for environmental sound classification

The topic of this thesis is soft computing based feature selection for environmental sound classification. Environmental sound classification systems have a wide range of applications, such as hearing aids, handheld devices, and auditory protection devices. Sound classification systems typically extract features which are learnt by a classifier. Using too many features can reduce performance by causing the learning algorithm to learn wrong models. The proper selection of features for sound classification is a non-trivial task. Soft computing based feature selection methods have not been studied for environmental sound classification, yet they are very promising: they can handle uncertain information efficiently using simple set-theoretic functions, and they are closer to perception-based reasoning. This thesis therefore investigates different feature selection methods, including soft computing based feature selection as well as classical information-, entropy-, and correlation-based approaches. The results of this study show that the rough set neighborhood based method performs best in terms of the number of features selected, recognition rate, and consistency of performance. The resulting classification system also performs robustly in the presence of reverberation.

Environmental acoustic transformation and feature extraction for machine hearing

IOP Conference Series: Materials Science and Engineering, 2019

This paper explores the transformation of environmental sound waveforms and feature sets into a parametric representation to be used in analysis, recognition, and identification for auditory analysis in machine hearing systems. Research in sound recognition has generally concentrated on the music and speech domains, while non-speech environmental sound recognition has received limited attention. We analyzed and evaluated different current feature algorithms and methods for the acoustic recognition of environmental sounds, namely Mel filterbank energies (FBEs) and Gammatone spectral coefficients (GSTC), and used a convolutional neural network (CNN) to classify the sound signal. The results show that GSTC performs well as a feature compared to FBEs, but FBEs tend to perform better when combined with other features. This shows that a combination of feature sets is promising for obtaining higher accuracy than a single feature in environmental sound classification, which is helpful in the development of machine hearing systems.
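A minimal sketch of computing Mel filterbank energies (FBEs) from a magnitude spectrum. The filterbank construction follows the standard Mel formula with triangular filters; the filter count, FFT size, and sample rate are illustrative, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for b in range(lo, mid):           # rising slope
            fb[i, b] = (b - lo) / max(mid - lo, 1)
        for b in range(mid, hi):           # falling slope
            fb[i, b] = (hi - b) / max(hi - mid, 1)
    return fb

sr, n_fft = 16000, 512
fb = mel_filterbank(26, n_fft, sr)
spectrum = np.abs(np.fft.rfft(np.random.default_rng(2).normal(size=n_fft)))
fbe = np.log(fb @ (spectrum ** 2) + 1e-10)   # log filterbank energies
print(fbe.shape)                              # one energy per filter
```

Stacking such log-energy vectors over time yields the FBE "image" that a CNN front-end like the one in this paper consumes; the Gammatone variant replaces the triangular Mel filters with gammatone-shaped ones.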