Analysis and classification of acoustic scenes with wavelet transform-based mel-scaled features
Related papers
CLASSIFICATION ENVIRONMENTAL SOUND WITH WAVELET
Sound signals can deliver important information about the environment, so recognizing and assessing these signals is very useful for surveillance purposes. Methods that combine the highest accuracy with the lowest computation time are the ideal. Wavelet methods are commonly used in environmental sound classification. In this paper, we apply both the continuous and the discrete wavelet transform. First, three features are extracted from the signal by computing its scalogram; second, we decompose the signal with a Daubechies mother wavelet at 4 levels, compute the B-spline coefficients, and then extract three features from them. Finally, we evaluate this method on 80 sound segments from two airplanes, collected from a public database. Processing a 0.2-second sound segment takes 1.3 s. Combining these methods helps reach a higher accuracy than previous works.
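As a rough illustration of the discrete-wavelet half of this pipeline, the sketch below decomposes a signal over four levels and pulls simple statistics from each subband. It substitutes a hand-rolled Haar transform for the paper's Daubechies wavelet and B-spline coefficients, and the feature choice (energy, mean, standard deviation) is an assumption, not the authors' exact feature set:

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]                      # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def wavelet_features(x, levels=4):
    """Decompose for `levels` levels; return (energy, mean, std) per subband."""
    feats = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append((np.sum(detail ** 2), detail.mean(), detail.std()))
    feats.append((np.sum(approx ** 2), approx.mean(), approx.std()))
    return feats

# Example: features from a 0.2 s synthetic segment sampled at 8 kHz
t = np.arange(int(0.2 * 8000)) / 8000
signal = np.sin(2 * np.pi * 440 * t)
features = wavelet_features(signal, levels=4)
print(len(features))  # 4 detail subbands + 1 final approximation
```

A 4-level decomposition yields five subbands, so three statistics per subband gives a compact 15-dimensional feature vector per segment.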
Environmental Sound Recognition: A Survey
While research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to the signals' temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory review of recent developments. It includes three parts: i) basic environmental sound processing schemes, ii) stationary ESR techniques and iii) non-stationary ESR techniques. Finally, concluding remarks and future research and development trends in the ESR field are given. Index Terms: Environmental Sound Recognition (ESR), Mel filter, Audio Features, Mel-Frequency Cepstral Coefficients.
Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification
Interspeech 2018, 2018
Acoustic scene classification (ASC) is an audio signal processing task where mel-scaled spectral features are widely used by researchers. These features, considered de facto baseline in speech processing, traditionally employ Fourier based transforms. Unlike speech, environmental audio spans a larger range of audible frequency and might contain short high-frequency transients and continuous low-frequency background noise, simultaneously. Wavelets, with a better time-frequency localization capacity, can be considered more suitable for dealing with such signals. This paper attempts ASC by a novel use of wavelet transform based mel-scaled features. The proposed features are shown to possess better discriminative properties than other spectral features while using a similar classification framework. The experiments are performed on two datasets, similar in scene classes but differing by dataset size and length of the audio samples. When compared with two benchmark systems, one based on mel-frequency cepstral coefficients and Gaussian mixture models, and the other based on log mel-band energies and multi-layer perceptron, the proposed system performed considerably better on the test data.
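The paper does not publish its implementation, but the general idea of mel-spaced wavelet analysis can be sketched with a bank of Morlet-style wavelets whose centre frequencies are placed on the mel scale. `mel_wavelet_features` and all of its parameters below are hypothetical stand-ins, not the authors' feature extractor:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_wavelet_features(x, sr, n_bands=26, fmin=50.0, fmax=None, cycles=6.0):
    """Log energies of a Morlet-style wavelet filterbank whose centre
    frequencies are spaced on the mel scale (an illustrative stand-in
    for wavelet-based mel-scaled features)."""
    fmax = fmax or sr / 2.0
    centres = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_bands))
    feats = np.empty(n_bands)
    for i, fc in enumerate(centres):
        sigma = cycles / (2 * np.pi * fc)            # wavelet width in seconds
        t = np.arange(-4 * sigma, 4 * sigma, 1.0 / sr)
        psi = np.exp(-t ** 2 / (2 * sigma ** 2)) * np.exp(2j * np.pi * fc * t)
        band = np.convolve(x, psi, mode="same")      # wavelet response
        feats[i] = np.log(np.mean(np.abs(band) ** 2) + 1e-10)
    return centres, feats
```

Because each wavelet's length scales inversely with its centre frequency, low bands get long windows (good frequency resolution for background noise) and high bands get short windows (good time resolution for transients), which is the localization argument made in the abstract.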
Hybrid Computerized Method for Environmental Sound Classification
IEEE Access, 2020
Classification of environmental sounds plays a key role in security, investigation, and robotics, since studying the sounds present in a specific environment can yield significant insights. The lack of standardized methods for automatic and effective environmental sound classification (ESC) creates a pressing need. In response to this limitation, this paper proposes a hybrid model for automatic and accurate classification of environmental sounds. Optimum allocation sampling (OAS) is used to elicit the informative samples from each class. The representative samples obtained by OAS are turned into a spectrogram containing their time-frequency-amplitude representation by using a short-time Fourier transform (STFT). The spectrogram is then given as input to pre-trained AlexNet and Visual Geometry Group (VGG)-16 networks. Multiple deep features are extracted using the pre-trained networks and classified using multiple classification techniques, namely decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse and weighted kernel), support vector machine, linear discriminant analysis, bagged tree and softmax classifiers. The ESC-10, a ten-class environmental sound dataset, is used for the evaluation of the methodology. Accuracies of 90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% are obtained with the decision tree, k-nearest neighbor, support vector machine, linear discriminant analysis, bagged tree and softmax classifiers, respectively. The proposed method proved to be robust, effective, and promising in comparison with other existing state-of-the-art techniques on the same dataset. INDEX TERMS Environmental sound classification, optimal allocation sampling, spectrogram, convolutional neural network, classification techniques.
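The front end of this pipeline (audio → STFT spectrogram image) can be reproduced with `scipy.signal.spectrogram`. The window and overlap values below are illustrative guesses, not the paper's settings, and the AlexNet/VGG-16 stage is only indicated in a comment:

```python
import numpy as np
from scipy.signal import spectrogram

# 1 s of synthetic audio: a 1 kHz tone over light white noise, 16 kHz rate
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t) + 0.05 * np.random.randn(sr)

# STFT magnitude spectrogram (time-frequency-amplitude representation);
# in the paper this image would be resized and fed to AlexNet / VGG-16
freqs, times, Sxx = spectrogram(x, fs=sr, nperseg=512, noverlap=384)
log_spec = 10 * np.log10(Sxx + 1e-12)   # dB scale, as typically plotted
print(log_spec.shape)                   # (frequency bins, time frames)
```

With `nperseg=512` the frequency axis has 257 bins, and the 128-sample hop gives roughly 8 ms time resolution, which is in the usual range for spectrogram-image classification.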
Factors in the identification of environmental sounds
2001
Environmental sounds have been little studied in comparison to the other main classes of naturally-occurring sounds, speech and music. This dissertation describes a systematic investigation into the acoustic factors involved in the identification of a representative set of 70 environmental sounds. The importance of various spectral regions for identification was assessed by testing the identification of octave-width bandpass-filtered environmental sounds on trained listeners. The six filter center frequencies ranged from 212 to 6788 Hz. The poorest identifiability was in the lowest filter band, at 31% correct, whereas in the four highest filters performance was consistently between 70% and 80% correct (chance was 1.4%). The contribution of temporal information to the identifiability of these sounds was estimated by using 1-Channel Event-Modulated Noises (EMN), which have the amplitude envelopes of the environmental sounds used but nearly uniform spectra. Six-Channel EMN, which contained some coarse-grained spectral information, were also utilized. The identification of both sets of EMN was tested on both experienced and naive listeners. With the 1-Channel EMN, naive listeners performed poorly, achieving only 22% correct, whereas experienced listeners fared much better, at 46% correct. Naive listeners recognized the 6-Channel EMN much more easily than the 1-Channel, reaching 54% correct. The sounds that were well recognized across all conditions generally had a distinct temporal envelope and few or no salient spectral features. Some acoustic properties seemed to predict the EMN data fairly well.
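The octave-width bandpass filtering described above is straightforward to approximate. The sketch below uses a Butterworth filter with band edges at fc/√2 and fc·√2; the filter type and order are assumptions, since the dissertation's exact filters are not given here:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def octave_band_filter(x, fc, sr, order=4):
    """Octave-width bandpass: edges at fc/sqrt(2) and fc*sqrt(2)."""
    lo, hi = fc / np.sqrt(2), fc * np.sqrt(2)
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)   # zero-phase filtering

# Six octave-spaced centre frequencies, approximating the study's
# reported 212-6788 Hz range
centres = 212.0 * 2.0 ** np.arange(6)   # 212, 424, ..., 6784 Hz
```

Zero-phase filtering (`sosfiltfilt`) preserves the temporal envelope of the sound, which matters for stimuli whose identifiability rests partly on envelope cues.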
Studying environmental sounds the Watson way
Journal of The Acoustical Society of America, 2001
After years of research on laboratory-generated complex sounds, in the early 1990s Chuck Watson and colleagues in the Hearing and Communications Laboratory (HCL) became interested in whether sounds with some meaning to the listener were processed differently by the auditory system. So began in his lab a program of environmental sounds research, in the meticulous, deliberate manner Watson was known for. The first step was developing an addition to the Test of Basic Auditory Capabilities (TBAC) which would measure individual differences in the identification of familiar environmental sounds. Next came the psychophysical basics: detection and identification in noise. Then, borrowing a page from early speech researchers, the effects of low-, high-, and bandpass filtering on environmental sounds were investigated, as well as those of processing environmental sounds using vocoder methods. Work has continued outside the HCL on developing a standardized canon of environmental sounds for generalized testing, with the aim of creating diagnostic tests for environmental sounds similar to the SPIN and the modified rhyme and reverberation test (MRRT).
On feature selection in environmental sound recognition
Given a broad set of content-based audio features, we employ principal component analysis for the composition of an optimal feature set for environmental sounds. We select features based on quantitative data analysis (factor analysis) and conduct retrieval experiments to evaluate the quality of the feature combinations. Retrieval results show that statistical data analysis gives useful hints for feature selection. The experiments show the importance of feature selection in environmental sound recognition.
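A minimal numpy sketch of PCA along these lines, projecting a feature matrix onto its principal components; the synthetic data and component count are illustrative only, not the paper's feature set:

```python
import numpy as np

def pca(features, n_components):
    """PCA via SVD of the mean-centred feature matrix
    (rows = sounds, columns = audio features)."""
    centred = features - features.mean(axis=0)
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)        # variance ratio per component
    projected = centred @ Vt[:n_components].T
    # The loadings |Vt| indicate which original features drive each
    # component, which is the hint used for feature selection
    return projected, explained, Vt[:n_components]

# 100 sounds described by 12 correlated features built from 3 latent factors
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
X = base @ rng.normal(size=(3, 12)) + 0.01 * rng.normal(size=(100, 12))
proj, ratio, loadings = pca(X, n_components=3)
print(ratio[:3].sum())   # close to 1.0: three components capture almost all variance
```

Features with consistently small loadings across the retained components contribute little variance and are candidates for removal, which is the kind of quantitative hint the abstract refers to.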
Environmental acoustic transformation and feature extraction for machine hearing
IOP Conference Series: Materials Science and Engineering, 2019
This paper explores transforming environmental sound waveforms and feature sets into a parametric representation to be used in analysis, recognition, and identification for auditory analysis in machine hearing systems. Research in sound recognition has generally concentrated on the music and speech domains, while work on non-speech environmental sound recognition remains limited. We analyzed and evaluated current feature algorithms and methods for acoustic recognition of environmental sounds, namely Mel Filterbank Energies (FBEs) and Gammatone spectral coefficients (GSTC), and used a Convolutional Neural Network (CNN) to classify the sound signals. The results show that GSTC performs better as a standalone feature than FBEs, but FBEs tend to perform better when combined with other features. This suggests that combining feature sets is promising for obtaining higher accuracy than a single feature in environmental sound classification, which is helpful in the development of machine hearing systems.
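Of the two feature types compared, the Mel Filterbank Energies (FBEs) are easy to sketch from first principles. The triangular-filter construction below is the textbook variant, not necessarily the authors' exact implementation, and the gammatone features (GSTC) are omitted:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Triangular filters with mel-spaced centres over an FFT power spectrum."""
    fmax = fmax or sr / 2.0
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(fmin), mel(fmax), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def fbe(frame, fb):
    """Log mel filterbank energies of one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame, n=(fb.shape[1] - 1) * 2)) ** 2
    return np.log(fb @ spectrum + 1e-10)
```

Stacking `fbe` outputs over successive frames yields the two-dimensional feature map that a CNN classifier would consume.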