Identification of Recorded Audio Location Using Acoustic Environment Classification
Related papers
Audio Recording Location Identification Using Acoustic Environment Signature
IEEE Transactions on Information Forensics and Security, 2013
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of the room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique based on spectral subtraction to estimate the amount of reverberation, and nonlinear filtering based on particle filtering to estimate the background noise. The effectiveness of the proposed method is tested using a data set consisting of speech recordings of two human speakers (one male and one female) made in eight acoustic environments using four commercial-grade microphones. Performance of the proposed method is evaluated for various experimental settings such as microphone-independent, semi- and full-blind AEI, and robustness to MP3 compression. Performance of the proposed framework is also evaluated using Temporal Derivative-based Spectrum and Mel-Cepstrum (TDSM)-based features. Experimental results show that the proposed method improves AEI performance compared with the direct method (i.e., the feature vector is extracted from the audio recording directly). Experimental results also show that the proposed scheme is robust to MP3 compression attacks.
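The abstract does not give the estimator in detail, but the core spectral-subtraction idea can be sketched as follows: estimate a noise magnitude spectrum from the quietest frames and subtract it from every frame. This is a minimal illustration, not the authors' method; the function name, quantile threshold, and over-subtraction factor are all illustrative choices.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(x, fs, frame_len=512, overcut=2.0, floor=0.05):
    """Minimal spectral subtraction: estimate the noise magnitude
    spectrum from the quietest frames and subtract it everywhere."""
    f, t, X = stft(x, fs, nperseg=frame_len)
    mag, phase = np.abs(X), np.angle(X)
    # Use the 10% lowest-energy frames as the noise estimate.
    energy = mag.sum(axis=0)
    quiet = mag[:, energy <= np.quantile(energy, 0.1)]
    noise = quiet.mean(axis=1, keepdims=True)
    # Over-subtract, then clamp to a spectral floor to limit musical noise.
    clean = np.maximum(mag - overcut * noise, floor * mag)
    _, y = istft(clean * np.exp(1j * phase), fs, nperseg=frame_len)
    return y
```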
The task of labelling an audio sample as recorded in an indoor or outdoor condition is called Acoustic Scene Classification (ASC). ASC uses acoustic information to infer the context of the recorded environment. Since ASC has so far been applied only to indoor environments in real-world settings, a new set of strategies and classification techniques must be considered for outdoor environments. In this paper, we present a comparative study of different machine learning classifiers with the Mel-Frequency Cepstral Coefficients (MFCC) feature. We use the DCASE Challenge 2016 dataset to show the properties of machine learning classifiers. Several classifiers can address the ASC task; in this paper, we compare the properties of K-nearest neighbours (KNN), Support Vector Machine (SVM), Decision Tree (ID3), and Linear Discriminant Analysis using the MFCC feature. Choosing the best classification methodology and feature extraction is essential for the ASC task. In this comparative study, we extract the MFCC feature from acoustic scene audio and then apply the extracted feature to the different classifiers to assess the advantages of each classifier for the MFCC feature. This paper also proposes an MFCC-moment feature for the ASC task that incorporates the statistical moment information of the MFCC feature.
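As a hedged sketch of the pipeline this abstract describes, the following code extracts MFCCs with librosa, summarizes them with statistical moments (one plausible reading of the "MFCC-moment" feature; the exact moments used are an assumption), and sets up the four compared classifiers in scikit-learn. Dataset loading is left to the reader.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def mfcc_moment_features(path, n_mfcc=20):
    """Summarize a clip's MFCC matrix by statistical moments
    (here: per-coefficient mean, standard deviation, third moment)."""
    y, sr = librosa.load(path, sr=None)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(1), m.std(1),
                           ((m - m.mean(1, keepdims=True)) ** 3).mean(1)])

# X: (n_clips, n_features) feature matrix, y: scene labels from the dataset.
classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Tree": DecisionTreeClassifier(),
    "LDA": LinearDiscriminantAnalysis(),
}
# for name, clf in classifiers.items():
#     print(name, cross_val_score(clf, X, y, cv=5).mean())
```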
Name That Room: Room identification using acoustic features in a recording
20th ACM International Conference on Multimedia (MM '12), 2012
This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13,440 reverberant audio samples. With no common content between the training and testing data, accuracies of 61% for musical signals and 85% for speech signals were achieved. This approach could be applied in a variety of scenarios where knowledge about the acoustical environment is desired, such as location estimation, music recommendation, or emergency response systems.
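The abstract does not specify which acoustical properties drive the identification, but reverberation time is a canonical room signature. As one illustrative (and assumed) feature, the sketch below estimates RT60 from a room impulse response via Schroeder backward integration; it is not the paper's feature set.

```python
import numpy as np

def rt60_from_impulse_response(h, fs, decay_db=(-5.0, -25.0)):
    """Estimate RT60 by Schroeder backward integration: fit the
    energy-decay curve between -5 dB and -25 dB, then extrapolate
    the fitted slope to a 60 dB decay."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]              # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(h)) / fs
    lo, hi = decay_db
    mask = (edc_db <= lo) & (edc_db >= hi)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope
```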
Feature Extraction of Surround Sound Recordings for Acoustic Scene Classification
Artificial Intelligence and Soft Computing, 2018
Binaural technology is becoming increasingly popular in multimedia systems. This paper identifies a set of features of binaural recordings suitable for the automatic classification of the four basic spatial audio scenes representing the most typical patterns of audio content distribution around a listener. Moreover, it compares five artificial-intelligence-based methods applied to the classification of binaural recordings. The results show that both the spatial and the spectro-temporal features are essential to accurate classification of binaurally rendered acoustic scenes. The spectro-temporal features appear to have a stronger influence on the classification results than the spatial metrics. According to the obtained results, the method based on the support vector machine, exploiting the features identified in the study, yields a classification accuracy approaching 84%.
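For readers unfamiliar with spatial metrics of binaural audio, the sketch below computes two simple examples from a stereo pair: interaural level difference (ILD) and the peak of the normalized interaural cross-correlation (IACC). These are generic spatial cues, not necessarily the exact features identified in the paper.

```python
import numpy as np

def spatial_features(left, right, eps=1e-12):
    """Two simple spatial cues for a binaural recording:
    ILD in dB, and the peak normalized cross-correlation (IACC)."""
    ild = 10 * np.log10((left ** 2).sum() / ((right ** 2).sum() + eps))
    l = (left - left.mean()) / (left.std() + eps)
    r = (right - right.mean()) / (right.std() + eps)
    xcorr = np.correlate(l, r, mode="full") / len(l)
    return ild, xcorr.max()
```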
Identifying the Sound Event Recognition Using Machine Learning and Artificial Intelligence
Design Engineering, 2021
The analysis of sound information is extremely useful in a variety of applications such as multimedia information retrieval, audio surveillance, audio tagging, and forensic investigations. The analysis of an audio clip is performed in order to detect sound events. Applications of this technology include security systems, smart vehicle navigation, and noise pollution monitoring. Sound Event Recognition (SER) is the focus of this research proposal. Compared with long-duration audio scenes, sound events have a short duration of about 100 to 500 milliseconds. In this paper, a machine-learning model that can be incorporated into an automated data collection process is trained and tested. In this experiment, a Convolutional Neural Network (CNN), Support Vector Machine (SVM), Hidden Markov Model (HMM), and Random Forest were compared.
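As a minimal sketch of the CNN branch of such a comparison (the architecture, input shape, and class count below are assumptions, not taken from the paper), a tiny PyTorch network over log-mel patches covering a 100-500 ms event might look like this:

```python
import torch
import torch.nn as nn

class SmallSERNet(nn.Module):
    """Tiny CNN over log-mel patches covering a short sound event.
    Input shape: (batch, 1, n_mels, n_frames), e.g. (B, 1, 64, 32)."""
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        return self.head(self.features(x))

# Smoke test on random input:
# logits = SmallSERNet(n_classes=10)(torch.randn(8, 1, 64, 32))
```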
Acoustic environment identification using unsupervised learning
Security Informatics, 2014
An acoustic environment leaves its characteristic signature in the audio recording captured in it. The acoustic environment signature can be modeled using acoustic reverberations and background noise. Acoustic reverberation depends on the geometry and composition of the recording location. The proposed scheme uses similarity in the estimated acoustic signature for acoustic environment identification (AEI). We describe a parametric model to realize acoustic reverberation, and a statistical framework based on maximum likelihood estimation is used to estimate the model parameters. Density-based clustering is used for automatic AEI from the estimated acoustic parameters. Performance of the proposed framework is evaluated on two data sets consisting of hand-clapping and speech recordings made in a diverse set of acoustic environments using three microphones. The impact of microphone type variation and frequency on clustering accuracy and efficiency, and on the performance of the proposed method, is investigated. Performance of the proposed method is also compared with the existing state-of-the-art (SoA) for AEI.
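A hedged illustration of the clustering stage: given reverberation parameters estimated per recording, density-based clustering such as DBSCAN groups recordings made in the same environment without knowing the number of environments in advance. The parameter values, feature choice, and eps setting below are illustrative, not from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Each row: reverberation parameters estimated from one recording,
# e.g. (decay rate, direct-to-reverberant ratio); values are made up.
# In practice, standardize the features before clustering.
params = np.array([
    [0.35, 4.1], [0.37, 3.9], [0.36, 4.0],   # environment A
    [0.90, 1.2], [0.88, 1.1], [0.92, 1.3],   # environment B
])
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(params)
print(labels)  # [0 0 0 1 1 1]: recordings cluster by environment
```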
Audio classification utilizing a rule-based approach and the support vector machine classifier
2013
The evaluation of two classification architectures utilizing the rule-based approach and the one-against-one support vector machine (OAO-SVM) is presented in this paper. The classification of the audio stream is carried out in two steps. First, rule-based speech/non-speech and music/environmental-sound discrimination is conducted. A set of features with high efficiency in separating speech and music signals is implemented in order to find the best discriminator. Subsequently, speech segments are classified into pure speech, speech with music, and speech with environmental sound using the OAO-SVM multi-class classification scheme. Experimental results show that the proposed classification architecture decreases the classification error in comparison with an OAO-SVM using MFCC features only.
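The following sketch shows the two-stage shape of such a system: a crude rule on zero-crossing rate and short-time energy gates speech from non-speech, and scikit-learn's SVC (which trains one-against-one internally for multi-class problems) refines the speech classes. The thresholds and features are illustrative, not the paper's tuned discriminators.

```python
import numpy as np
from sklearn.svm import SVC

def zcr(frame):
    """Zero-crossing rate, a cheap speech/music/noise discriminator."""
    return np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0

# Stage 2 model, trained elsewhere on labelled speech segments; classes
# stand for pure speech / speech with music / speech with env. sound.
svm_speech = SVC(kernel="rbf", decision_function_shape="ovo")

def classify_segment(frame, feature_vector):
    # Stage 1: rule-based gate (thresholds are illustrative).
    if zcr(frame) > 0.25 or np.mean(frame ** 2) < 1e-4:
        return "non-speech"
    # Stage 2: OAO-SVM refines the speech class.
    return svm_speech.predict([feature_vector])[0]
```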
A Comparison of Techniques for Audio Environment Classification
Excessive background noise is one of the most common complaints from hearing aid users. Background noise classification systems can be used in hearing aids to adjust the response based on the noise environment. This paper examines several classification techniques, namely the k-nearest neighbours (K-NN) classifier, the non-windowed artificial neural network (ANN), and the hidden Markov model (HMM), and compares them with an artificial neural network using windowed input (WANN). Results indicate that the WANN gives an accuracy of up to 97.9%, the highest of the tested classifiers. The memory and computational requirements of the windowed ANN are also small compared with those of the HMM and K-NN. Overall, the WANN gives excellent accuracy and reliability and is considered a good choice for background noise classification in hearing aids.
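A minimal sketch of the "windowed input" idea, assuming per-frame noise features are already computed: stack a few consecutive frame vectors into one input so a small feed-forward network sees short-term temporal context. The window width and network size are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def window_frames(frame_features, width=5):
    """Stack `width` consecutive frame-feature vectors into one input,
    giving the network short-term temporal context ('windowed input')."""
    n, d = frame_features.shape
    return np.stack([frame_features[i:i + width].ravel()
                     for i in range(n - width + 1)])

# frame_features: (n_frames, n_features) per-frame noise descriptors;
# labels: one noise-environment label per frame.
wann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
# wann.fit(window_frames(frame_features), labels[4:])  # align to window ends
```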
Capturing the Acoustic Scene Characteristics for Audio Scene Detection
Scene detection on user-generated content (UGC) aims to classify an audio recording as belonging to a specific scene, such as a busy street, office, or supermarket, rather than as a sound, such as car noise, a computer keyboard, or a cash machine. The difficulty of scene content analysis on UGC lies in the lack of structure and the acoustic variability of the audio. The i-vector system is state-of-the-art in speaker verification and scene detection, and outperforms conventional Gaussian Mixture Model (GMM)-based approaches. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper reports our results in the IEEE-AASP Scene Classification Challenge using a hand-tuned i-vector system with MFCC features. Compared with the MFCC+GMM baseline system, our approach increased the classification accuracy by 26.4% relative, to 65.8%. We discuss our approach and highlight the parameters in our system that significantly improved our classification accuracy.
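The MFCC+GMM baseline referred to above is straightforward to sketch: fit one Gaussian mixture per scene on pooled MFCC frames and classify a clip by the model with the highest average log-likelihood. The component count and covariance type below are assumptions, not the challenge baseline's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm_baseline(mfccs_by_scene, n_components=16):
    """Fit one GMM per scene class on pooled MFCC frames.
    mfccs_by_scene: {scene: list of (n_frames, n_mfcc) arrays}."""
    return {scene: GaussianMixture(n_components, covariance_type="diag")
                   .fit(np.vstack(frames))
            for scene, frames in mfccs_by_scene.items()}

def predict_scene(gmms, mfcc_frames):
    """Pick the scene whose GMM scores the clip's frames highest."""
    return max(gmms, key=lambda s: gmms[s].score(mfcc_frames))
```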
Acoustic Environment Identification and Its Applications to Audio Forensics
IEEE Transactions on Information Forensics and Security, 2013
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of a room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique to model and estimate the amount of reverberation and the background noise variance in an audio recording. An energy-based voice activity detection method is proposed for automatic decaying-tail selection from an audio recording. The effectiveness of the proposed method is tested using a data set consisting of speech recordings. The performance of the proposed method is also evaluated for both speaker-dependent and speaker-independent scenarios.
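A hedged sketch of energy-based decaying-tail selection (the frame length, thresholds, and activity gate are illustrative, not the paper's detector): flag frames where short-time energy drops sharply right after an active region, since those are where reverberant decay dominates the signal.

```python
import numpy as np

def find_decay_tails(x, fs, frame_ms=20, drop_db=15.0):
    """Return indices of frames that likely start a reverberant decay
    tail: energy falls sharply immediately after an active frame."""
    n = int(fs * frame_ms / 1000)
    frames = x[:len(x) // n * n].reshape(-1, n)
    e_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    active = e_db > e_db.max() - 30.0            # crude activity gate
    tails = []
    for i in range(1, len(e_db)):
        if active[i - 1] and e_db[i] < e_db[i - 1] - drop_db / 2:
            tails.append(i)                       # start of a decay tail
    return tails
```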