Audio Recording Location Identification Using Acoustic Environment Signature

Acoustic Environment Identification and Its Applications to Audio Forensics

IEEE Transactions on Information Forensics and Security, 2013

An audio recording is subject to a number of possible distortions and artifacts; consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and composition of a room and causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique to model and estimate the amount of reverberation and the background noise variance in an audio recording. An energy-based voice activity detection method is proposed for automatic decaying-tail selection from an audio recording. The effectiveness of the proposed method is tested on a data set of speech recordings, and its performance is evaluated for both speaker-dependent and speaker-independent scenarios.
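The abstract's "energy-based voice activity detection for decaying-tail selection" can be sketched roughly as follows. This is an illustration, not the authors' algorithm: the frame length, hop size, and threshold ratio are assumed values, and a tail is simply taken to be the first frame whose short-time energy drops below a fraction of the peak.

```python
import numpy as np

def frame_energies(x, frame_len=512, hop=256):
    """Short-time energy per frame of a 1-D signal."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.sum(x[i*hop:i*hop+frame_len]**2) for i in range(n_frames)])

def decay_tail_frames(x, frame_len=512, hop=256, ratio=0.1):
    """Flag frames where the energy first drops below `ratio` times the peak,
    i.e. candidate reverberant decay tails just after a speech offset."""
    e = frame_energies(x, frame_len, hop)
    active = e >= ratio * e.max()
    # a tail starts where an active frame is followed by an inactive one
    tails = np.where(~active[1:] & active[:-1])[0] + 1
    return e, tails

# toy example: a burst of "speech" followed by an exponential decay
np.random.seed(0)
fs = 8000
t = np.arange(fs) / fs
burst = np.concatenate([np.random.randn(fs // 2),
                        np.random.randn(fs // 2) * np.exp(-6 * t[: fs // 2])])
energies, tails = decay_tail_frames(burst)
print(len(tails) >= 1)  # at least one decay tail is found
```

In a real recording the decay tails located this way would be handed to the reverberation and noise estimators; here the threshold is global, whereas a practical detector would adapt it per utterance.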

Identification of Recorded Audio Location Using Acoustic Environment Classification

International Journal of Advance Research and Innovative Ideas in Education, 2015

An audio recording contains many possible artifacts and distortions. The reverberation depends on the volume and surface properties of the room and causes smearing of the recorded signal, while the background noise depends on the secondary audio source activities present in the evidentiary recording. For audio to be admissible as evidence in court, its authenticity must be verified. A blind deconvolution method based on FIR filtering and the overlap-add method is used to estimate the reverberation time, and particle filtering is used to estimate the background noise. Feature extraction is performed using the MFCC approach; the 128-dimensional feature vector is the concatenation of features derived from the acoustic reverberation, the background noise, and higher-order statistics. An SVM classifier is used to classify the environments. The performance of the system is evaluated on a dataset of audio recordings: the SVM classifier gives the best results on the trained dataset and moderate results on the untrained dataset.
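The MFCC feature-extraction stage mentioned above can be sketched for a single frame. This is a simplified, numpy-only illustration (not the paper's implementation): the filterbank size, cepstral count, and Hamming window are assumed defaults, and the full 128-dimensional vector combining reverberation and noise features is beyond this sketch.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular mel-spaced filters over the rfft bins."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_frame(frame, fs, n_filters=26, n_ceps=13):
    """MFCCs of one frame: power spectrum -> mel energies -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    log_e = np.log(mel_filterbank(n_filters, n_fft, fs) @ spec + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_e

fs = 16000
frame = np.sin(2 * np.pi * 440 * np.arange(512) / fs)  # one 512-sample frame
c = mfcc_frame(frame, fs)
print(c.shape)  # (13,)
```

In the paper's pipeline, per-frame vectors like this would be aggregated over the recording and concatenated with the reverberation and noise statistics before being passed to the SVM.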

Audio forensics from acoustic reverberation

2010

An audio recording is subject to a number of possible distortions and artifacts. For example, the persistence of sound, due to multiple reflections from various surfaces in a room, causes temporal and spectral smearing of the recorded sound. This distortion is referred to as audio reverberation time. We describe a technique to model and estimate the amount of reverberation in an audio recording. Because reverberation depends on the shape and composition of a room, differences in the estimated reverberation can be used in a forensic and ballistic setting.
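Reverberation time estimation is often illustrated with the classic Schroeder backward-integration method; note this requires a measured impulse response, whereas the paper estimates reverberation blindly from the recording itself. The sketch below fits the -5 dB to -25 dB portion of the energy decay curve and extrapolates to -60 dB (a T20-style estimate); the fit range is an assumed convention.

```python
import numpy as np

def rt60_schroeder(ir, fs):
    """Estimate RT60 from an impulse response: Schroeder backward
    integration, then a line fit on the -5..-25 dB decay, scaled to -60 dB."""
    edc = np.cumsum(ir[::-1] ** 2)[::-1]          # energy decay curve
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5) & (edc_db >= -25)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

# synthetic exponentially decaying impulse response with a known RT60
fs, rt60_true = 16000, 0.4
t = np.arange(int(fs * rt60_true * 2)) / fs
ir = np.exp(-6.91 * t / rt60_true)  # 60 dB energy decay over rt60_true seconds
print(round(rt60_schroeder(ir, fs), 2))  # 0.4
```

On this ideal exponential decay the estimator recovers the true RT60; on real recordings the difficulty lies precisely in estimating the decay without access to the impulse response, which is what the paper's blind technique addresses.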

Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

2011

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, which is a common forensic scenario. In this paper, we perform several experiments focusing on the problems of environment recognition from audio, particularly for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. We propose a full set of MPEG-7 audio features combined with mel frequency cepstral coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound even in the presence of a high amount of foreground human speech.

Digital audio forensics using background noise

2010

This paper presents a new audio forensics method based on the background noise in audio signals. Traditional speech enhancement algorithms improve the quality of speech signals; however, the noise they estimate contains traces of the speech signal, known as the leakage signal. Although this speech leakage signal has a low SNR, it can easily be perceived by listening to the estimated noise signal, so such estimates cannot be used for audio forensics applications. For reliable audio authentication, a better noise estimation method is desirable. To achieve this goal, a two-step framework is proposed to estimate the background noise with minimal speech leakage. A correlation-based similarity measure is then applied to determine the integrity of the speech signal. The proposed method has been evaluated on different speech signals recorded in various environments. The results show that it performs better than existing speech enhancement algorithms, with a significant improvement in SNR.
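The paper does not spell out its correlation-based similarity measure, but a generic version, zero-mean normalized correlation between two background-noise estimates, can be sketched as follows; the threshold values in the example are assumptions for illustration only.

```python
import numpy as np

def normalized_correlation(a, b):
    """Zero-mean normalized cross-correlation between two noise estimates;
    values near 1 suggest the segments share the same background noise,
    values near 0 suggest unrelated noise (a possible splice)."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
noise = rng.normal(size=4000)
same = noise + 0.05 * rng.normal(size=4000)   # same noise, small estimation error
different = rng.normal(size=4000)             # unrelated background noise

print(normalized_correlation(noise, same) > 0.9)            # True
print(abs(normalized_correlation(noise, different)) < 0.2)  # True
```

In an integrity check, segments of the estimated noise would be compared pairwise this way; a sudden drop in similarity between adjacent segments would flag a candidate edit point.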

Digital audio forensics

Proceedings of the 9th workshop on Multimedia & security - MM&Sec '07, 2007

In this paper, a first approach to digital media forensics is presented to determine the microphones used and the environments of recorded digital audio samples by using known audio steganalysis features. Our first evaluation is based on a limited exemplary test set of 10 different audio reference signals recorded as mono audio data by four microphones in 10 different rooms with 44.1 kHz sampling rate and 16 bit quantisation. Note that, of course, a generalisation of the results cannot be achieved. Motivated by the syntactic and semantic analysis of information, and in particular by known audio steganalysis approaches, a first set of specific features is selected for classification, to evaluate whether this feature set can support correct classifications. The idea was mainly driven by the existing steganalysis features and the question of their applicability within a first, limited test set. In the tests presented in this paper, an inter-device analysis with different device characteristics is performed, while intra-device evaluations (identical microphone models of the same manufacturer) are not considered. For classification, the data mining tool WEKA is applied, with K-means as a clustering technique and Naive Bayes as a classification technique, with the goal of evaluating their classification accuracy on known audio steganalysis features. Our results show that, for our test set, the chosen classification techniques, and the selected steganalysis features, microphones can be classified better than environments. These first tests show promising results but are, of course, based on a limited test and training set as well as a specific test set generation strategy. Therefore, additional and enhanced features with different test set generation strategies are necessary to generalise the findings.

Digital Audio Forensics: Microphone and Environment Classification Using Deep Learning

IEEE Access

The recording device, along with the acoustic environment, plays a major role in digital audio forensics. In this paper we propose an acoustic source identification system that identifies both the recording device and the environment in which the recording was made. A hybrid Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) is used in this study to automatically extract environment and microphone features from the speech sound. In the experiments, we investigated the effect of using the voiced and unvoiced segments of speech on the accuracy of environment and microphone classification. We also studied the effect of background noise on microphone classification in three different environments: very quiet, quiet, and noisy. The proposed system utilizes a subset of the KSU-DB corpus containing 3 environments, 4 classes of recording devices, 136 speakers (68 males and 68 females), and 3600 recordings of words, sentences, and continuous speech. This research combines the advantages of both CNN and RNN (in particular bidirectional LSTM) models, a combination called CRNN. The speech signals were represented as spectrograms and fed to the CRNN model as 2D images. The proposed method achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively, using unvoiced speech segments.
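The "spectrogram fed to the model as a 2D image" front end can be sketched in a few lines; the FFT size, hop, and Hann window here are assumed values, and the CRNN itself is omitted, only the image-like input representation is shown.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Log-magnitude STFT spectrogram: a recording becomes a 2-D
    (frequency x time) array, usable as image-like input to a CNN/CRNN."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i*hop:i*hop+n_fft] * win for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mag + 1e-10).T  # shape: (n_fft//2 + 1, n_frames)

fs = 8000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 s of a 1 kHz tone
S = spectrogram(x)
print(S.shape)  # (129, 61)
```

Each recording thus becomes a fixed-height 2-D array; the CNN layers of the CRNN would scan it for local time-frequency patterns while the bidirectional LSTM models the temporal sequence of frames.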

Name That Room: Room identification using acoustic features in a recording

20th ACM international conference on Multimedia - MM '12, 2012

This paper presents a system for identifying the room in an audio or video recording through the analysis of acoustical properties. The room identification system was tested using a corpus of 13440 reverberant audio samples. With no common content between the training and testing data, an accuracy of 61% for musical signals and 85% for speech signals was achieved. This approach could be applied in a variety of scenarios where knowledge about the acoustical environment is desired, such as location estimation, music recommendation, or emergency response systems.

Audio analysis for surveillance applications

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005., 2005

We propose a time series analysis based approach for the systematic choice of audio classes for the detection of crimes in elevators. Since all the different sounds in a surveillance environment cannot be anticipated, a surveillance system for event detection cannot completely rely on a supervised audio classification framework. In this paper, we propose a hybrid solution that consists of two parts: one that performs unsupervised audio analysis and another that performs analysis using an audio classification framework obtained from off-line analysis and training. The proposed system is capable of detecting new kinds of suspicious audio events that occur as outliers against a background of usual activity. It adaptively learns a Gaussian Mixture Model (GMM) to model the background sounds and updates the model incrementally as new audio data arrives. New types of suspicious events can be detected as deviations from this usual background model. The results on elevator audio data are promising.
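The incremental background model with outlier detection can be illustrated with a deliberately simplified stand-in: a single diagonal Gaussian per feature dimension with an exponentially weighted update, rather than the paper's full GMM. The learning rate and the 3-sigma threshold are assumed values.

```python
import numpy as np

class BackgroundModel:
    """Running single-Gaussian background model (simplified stand-in for an
    incrementally updated GMM). Frames far from the model are flagged as
    suspicious outliers; only inlier frames update the model."""
    def __init__(self, dim, alpha=0.05, k=3.0):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.alpha = alpha   # learning rate for incremental updates
        self.k = k           # outlier threshold in standard deviations

    def observe(self, frame):
        dist = np.sqrt(np.mean((frame - self.mean) ** 2 / self.var))
        if dist > self.k:
            return True      # outlier: do not adapt the background to it
        # exponentially weighted update of mean and variance
        self.mean = (1 - self.alpha) * self.mean + self.alpha * frame
        self.var = (1 - self.alpha) * self.var + self.alpha * (frame - self.mean) ** 2
        return False

rng = np.random.default_rng(1)
model = BackgroundModel(dim=8)
for f in rng.normal(0.0, 1.0, size=(200, 8)):
    model.observe(f)                      # adapt to the usual activity
event = rng.normal(10.0, 1.0, size=8)     # abrupt, loud event
print(model.observe(event))               # True: flagged as suspicious
```

A full GMM version would keep several weighted Gaussians per dimension and match each frame against the most likely component, but the adapt-only-on-inliers logic is the same.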

A Practical Forensic Method for Enhancing Speech Signals Drowned in Loud Music

Recording audio or video is nowadays easier than ever; almost every phone can do this task with high quality. This has serious implications in forensics: almost every dialogue or event can be recorded and used as evidence in trials. The problem is that editing multimedia content has also become a very accessible operation: advances in editing software make results possible that are very convincing to the untrained audience, and forged recordings could be used in trials. The need for multimedia forensics is pressing. There are two main directions in this field: probe authentication and noise reduction. This paper presents the research activities conducted to extract a speech signal masked by loud music. The developed system is based on an adaptive system identification configuration. Various scenarios are studied, showing the advantages and disadvantages of the adaptive algorithms that were tested. The influence of the acoustic environment over the performances of the proposed system...
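The "adaptive system identification configuration" for music masking can be sketched with a normalized LMS filter, assuming, as such systems typically do, that the clean music (the reference) is available alongside the recorded mixture. The filter learns the acoustic path from the reference to the mixture and subtracts the filtered music, leaving the speech in the residual. The filter order, step size, and synthetic "room response" below are all assumed values, and NLMS is one of several adaptive algorithms such a study might compare.

```python
import numpy as np

def nlms_cancel(reference, mixture, order=32, mu=0.5):
    """Normalized LMS system identification: learn the path from the clean
    music `reference` to the recorded `mixture`, subtract the filtered
    reference, and return the residual (the speech estimate)."""
    w = np.zeros(order)
    residual = np.zeros(len(mixture))
    for n in range(order, len(mixture)):
        x = reference[n - order + 1:n + 1][::-1]  # most recent samples first
        e = mixture[n] - w @ x                    # residual after cancellation
        w += mu * e * x / (x @ x + 1e-8)          # normalized weight update
        residual[n] = e
    return residual

rng = np.random.default_rng(2)
n = 20000
music = rng.normal(size=n)                             # known reference signal
speech = 0.3 * np.sin(2 * np.pi * 0.01 * np.arange(n))  # masked "speech"
path = np.array([0.8, -0.3, 0.2])                      # unknown acoustic path
mixture = np.convolve(music, path, mode="full")[:n] + speech
out = nlms_cancel(music, mixture)

# after convergence the residual is far closer to the speech than the mixture
err_before = np.mean((mixture[n//2:] - speech[n//2:]) ** 2)
err_after = np.mean((out[n//2:] - speech[n//2:]) ** 2)
print(err_after < 0.1 * err_before)
```

The abstract's point about the acoustic environment maps directly onto `path` here: longer, time-varying room responses require longer filters and faster tracking, which is where the compared adaptive algorithms trade off convergence speed against residual error.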