Feature Extraction of Surround Sound Recordings for Acoustic Scene Classification

Feature Extraction of Binaural Recordings for Acoustic Scene Classification

2018 Federated Conference on Computer Science and Information Systems (FedCSIS), 2018

Binaural technology is becoming increasingly popular in multimedia systems. This paper identifies a set of features of binaural recordings suitable for the automatic classification of four basic spatial audio scenes representing the most typical patterns of audio-content distribution around a listener. Moreover, it compares five artificial-intelligence-based methods applied to the classification of binaural recordings. The results show that both spatial and spectro-temporal features are essential for accurate classification of binaurally rendered acoustic scenes, with the spectro-temporal features having a stronger influence on the classification results than the spatial metrics. According to the obtained results, the method based on the support vector machine, exploiting the features identified in the study, yields a classification accuracy approaching 84%.

Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

Applied Sciences, 2019

The aim of the study was to develop a method for automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits a satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
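
Binaural cues of the kind mentioned above can be made concrete with a simple example: the interaural level difference (ILD) compares the energy of the left and right channels within a frame. The sketch below is illustrative only, not the paper's implementation; the function name and parameters are assumptions.

```python
import math

def frame_ild(left, right, eps=1e-12):
    """Interaural level difference (dB) for one frame: 10*log10 of the
    ratio of left-channel to right-channel energy. `eps` guards against
    log of zero on silent frames."""
    e_left = sum(s * s for s in left) + eps
    e_right = sum(s * s for s in right) + eps
    return 10.0 * math.log10(e_left / e_right)

# A source panned toward the left ear yields a positive ILD.
left = [0.8 * math.sin(0.1 * n) for n in range(512)]
right = [0.2 * math.sin(0.1 * n) for n in range(512)]
ild = frame_ild(left, right)  # amplitude ratio 4 → 10*log10(16) ≈ 12 dB
```

A sequence of such frame-wise values, pooled over a recording, is one way a spatial feature can sit alongside spectro-temporal features such as MFCCs in a classifier's input vector.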

A COMPARISON OF TECHNIQUES FOR AUDIO ENVIRONMENT CLASSIFICATION

Excessive background noise is one of the most common complaints from hearing aid users. Background-noise classification systems can be used in hearing aids to adjust the response based on the noise environment. This paper examines and compares several classification techniques, namely the k-nearest neighbours (k-NN) classifier, the non-windowed artificial neural network (ANN), and the hidden Markov model (HMM), against an artificial neural network using windowed input (WANN). The results indicate that the WANN gives an accuracy of up to 97.9%, the highest of the tested classifiers. The memory and computational requirements of the windowed ANN are also small compared to those of the HMM and k-NN. Overall, the WANN gives excellent accuracy and reliability and is considered a good choice for background-noise classification in hearing aids.
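
The "windowed input" of a WANN refers to feeding the network a stack of consecutive feature frames rather than a single frame, which gives it short-term temporal context. A minimal sketch of that stacking step, assuming generic per-frame feature vectors (names are illustrative, not from the paper):

```python
def stack_windows(frames, width):
    """Concatenate `width` consecutive feature frames into one input
    vector per position, giving a classifier temporal context."""
    if width < 1 or width > len(frames):
        raise ValueError("window width out of range")
    return [
        [v for frame in frames[i:i + width] for v in frame]
        for i in range(len(frames) - width + 1)
    ]

frames = [[1.0], [2.0], [3.0], [4.0]]  # four 1-dimensional feature frames
windows = stack_windows(frames, 3)     # two overlapping 3-frame inputs
# → [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
```

Each stacked window then becomes one input vector for the network, which is why a windowed ANN can capture temporal patterns without the explicit state model of an HMM.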

Audio feature extraction and analysis for scene classification

Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing

Understanding the scene content of a video sequence is very important for content-based indexing and retrieval in multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the non-speech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence.
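
Low-level audio features of this kind are typically computed frame by frame over short clips. As a hedged illustration (not the paper's exact definition), the sketch below computes one classic example, the per-frame RMS volume:

```python
import math

def frame_volume(samples, frame_len):
    """RMS volume of each non-overlapping frame of `frame_len` samples:
    the square root of the mean squared sample value per frame."""
    n_frames = len(samples) // frame_len
    return [
        math.sqrt(
            sum(s * s for s in samples[n * frame_len:(n + 1) * frame_len])
            / frame_len
        )
        for n in range(n_frames)
    ]

# A loud frame followed by a silent one.
signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
vols = frame_volume(signal, 4)  # → [0.5, 0.0]
```

Comparing such per-frame feature vectors between adjacent clips, as the abstract describes, is what allows scene breaks to show up as abrupt changes in the feature trajectory.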

Classification of Audio Data using Support Vector Machine

2011

Audio mining extracts patterns and features from audio signals in order to obtain data-mining results. Various audio features, such as the Mel-frequency cepstral coefficients (MFCC), linear predictive coefficients (LPC), compactness, spectral flux (SF), band periodicity (BP), and zero-crossing rate (ZCR), are used to classify audio data into various classes. Classification algorithms such as Naive Bayes, FT, J48, ID3, and LibSVM are used to assign audio data to the defined classes. The results of these classification algorithms are compared using performance parameters such as the true positive (TP) rate and false positive (FP) rate.
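
The performance parameters mentioned above have standard definitions that are easy to state in code. A minimal sketch (function and variable names are illustrative) computing the per-class TP and FP rates from true and predicted labels:

```python
def tp_fp_rates(y_true, y_pred, cls):
    """Per-class rates: TP rate = TP / (TP + FN) (recall),
    FP rate = FP / (FP + TN), for the class `cls`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p != cls)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

y_true = ["music", "music", "speech", "speech"]
y_pred = ["music", "speech", "speech", "speech"]
tpr, fpr = tp_fp_rates(y_true, y_pred, "music")  # → (0.5, 0.0)
```

Reporting both rates per class, rather than accuracy alone, is what lets different classifiers be compared even when the class distribution is skewed.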

Content-based audio classification and segmentation by using support vector machines

Multimedia Systems, 2003

Content-based audio classification and segmentation is a basis for further audio/video analysis. In this paper, we present our work on audio segmentation and classification employing support vector machines (SVMs). Five audio classes are considered: silence, music, background sound, pure speech, and non-pure speech, the last of which includes speech over music and speech over noise. A sound stream is segmented by classifying each sub-segment into one of these five classes. We have evaluated the performance of the SVM on the classification of different audio type-pairs with testing units of different lengths, and compared the performance of the SVM, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) classifiers. We also evaluated the effectiveness of some newly proposed features. Experiments on a database composed of about 4 hours of audio data show that the proposed classifier is very effective for audio classification and segmentation. The results also show that the accuracy of the SVM-based method is much better than that of the methods based on KNN and GMM.
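
Segmenting a stream by classifying each sub-segment, as described above, reduces to merging runs of identical labels into segments. A hedged sketch of that merge step (the classifier output is assumed; names are illustrative):

```python
def labels_to_segments(labels):
    """Merge consecutive identical sub-segment labels into
    (start_index, end_index_exclusive, label) segments."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    return segments

# Per-sub-segment labels produced by some classifier (e.g. an SVM).
labels = ["silence", "speech", "speech", "music", "music", "music"]
segs = labels_to_segments(labels)
# → [(0, 1, 'silence'), (1, 3, 'speech'), (3, 6, 'music')]
```

The segment boundaries fall exactly where the predicted class changes, which is why classification accuracy on short testing units directly determines segmentation quality.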

Acoustic Scene Classification: A Competition Review

2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), 2018

In this paper we study the problem of acoustic scene classification, i.e., the categorization of audio sequences into mutually exclusive classes based on their spectral content. We describe the methods and results discovered during a competition organized in the context of a graduate machine learning course, with both students and external participants. We identify the most suitable methods and study the impact of each by performing an ablation study of the mixture of approaches. We also compare the results with a neural network baseline and show the improvement over it. Finally, we discuss the impact of using a competition as part of a university course, and justify its importance in the curriculum based on student feedback.

Classification of audio scenes with novel features in a fused system framework

Digital Signal Processing, 2018

The rapidly increasing requirements from context-aware gadgets, like smartphones and intelligent wearable devices, along with applications such as audio archiving, have given a fillip to research in the field of Acoustic Scene Classification (ASC). The Detection and Classification of Acoustic Scenes and Events (DCASE) challenges have seen systems addressing the problem of ASC from different directions. Some of them could achieve better results than the Mel Frequency Cepstral Coefficients-Gaussian Mixture Model (MFCC-GMM) baseline system. However, a collective decision from all participating systems was found to surpass the accuracy obtained by each system. The simultaneous use of various approaches can better exploit the discriminating information in audio collected from different environments covering the audible frequency range in varying degrees. In this work, we show that the frame-level statistics of some well-known spectral features, when fed individually to a Support Vector Machine (SVM) classifier, are able to outperform the baseline system of the DCASE challenges. Furthermore, we analyzed different methods of combining these features, and also of combining information from the two channels when the data is in binaural format. The proposed approach resulted in around 17% and 9% relative improvement in accuracy with respect to the baseline system on the development and evaluation datasets, respectively, of the DCASE 2016 ASC task.
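
Frame-level statistics of the kind used above are typically summary statistics (e.g. mean and standard deviation) of a feature computed per frame, pooled over the whole clip into one fixed-length vector. A hedged sketch of that pooling step (illustrative, not the paper's exact pipeline):

```python
import math

def clip_statistics(frame_features):
    """Pool a per-frame feature sequence into a fixed-length
    clip-level vector: [mean, standard deviation]."""
    n = len(frame_features)
    mean = sum(frame_features) / n
    var = sum((x - mean) ** 2 for x in frame_features) / n
    return [mean, math.sqrt(var)]

# Per-frame values of some spectral feature (e.g. spectral centroid in Hz).
centroids = [1000.0, 1200.0, 800.0, 1000.0]
stats = clip_statistics(centroids)  # → [1000.0, ≈141.42]
```

Because the pooled vector has a fixed length regardless of clip duration, it can be fed directly to a clip-level classifier such as an SVM.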

Automatic Sound Classification Inspired by Auditory Scene Analysis

2000

A sound classification system for the automatic recognition of the acoustic environment in a hearing instrument is discussed. The system distinguishes the four sound classes 'clean speech', 'speech in noise', 'noise', and 'music' and is based on auditory features and hidden Markov models. The employed features describe level fluctuations, the spectral form, and harmonicity. Sounds from a large

Feature Analysis for Audio Classification

Lecture Notes in Computer Science, 2014

In this work we analyze and implement several audio features. We focus our analysis on the zero-crossing rate (ZCR) feature and propose a modification that makes it more robust when signals are near zero. All features are used to discriminate among the following audio classes: music, speech, and environmental sound. An SVM classifier, which has proven efficient for audio classification, is used as the classification tool. By means of a selection heuristic, we draw conclusions about how the features may be combined for fast classification.
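
One common way to make the ZCR robust for near-zero signals, in the spirit of the modification described above (the paper's exact formulation may differ), is to ignore samples whose magnitude stays below a small threshold, so that low-level noise hovering around zero does not inflate the crossing count:

```python
def robust_zcr(frame, threshold=0.01):
    """Zero-crossing rate counting only sign changes between samples
    whose magnitude exceeds `threshold`, normalized by frame length."""
    significant = [s for s in frame if abs(s) > threshold]
    if len(significant) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(significant, significant[1:]) if a * b < 0
    )
    return crossings / (len(frame) - 1)

noisy_silence = [1e-4, -1e-4, 2e-4, -3e-4]  # tiny noise around zero
tone = [0.5, -0.5, 0.5, -0.5]               # genuine alternation
robust_zcr(noisy_silence)  # → 0.0, not the spuriously high plain ZCR
robust_zcr(tone)           # → 1.0
```

The plain ZCR would report the same maximal rate for both frames; the thresholded variant separates true oscillation from measurement noise.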