Digital audio forensics
Related papers
Security, Forensics, Steganography, and Watermarking of Multimedia Contents X, 2008
The main goals of this paper are to show the impact of the basic assumptions about cover channel characteristics, as well as the impact of different training/testing set generation strategies, on the statistical detectability of selected audio hiding approaches known from steganography and watermarking. We have selected five steganography algorithms and four watermarking algorithms as examples. The channel characteristics for two audio cover channels (an application-specific scenario of VoIP steganography and universal audio steganography) are formalised, and their impact on decisions in the steganalysis process, especially on the strategies applied for training/testing set generation, is shown. Depending on the assumptions about the cover channel characteristics, either cover-dependent or cover-independent training and testing can be performed, using either correlated or non-correlated training and test sets. In comparison to previous work, additional frequency domain features are introduced for steganalysis, and the performance (in terms of classification accuracy) of Bayesian classifiers and multinomial logistic regression models is compared with the results of SVM classification. We show that the newly implemented frequency domain features increase the classification accuracy achieved in SVM classification. Furthermore, it is shown on the example of VoIP steganalysis that channel-specific evaluation performs better than tests without focus on a specific channel (i.e. universal steganalysis). A comparison of test results for cover-dependent and cover-independent training and testing shows that the latter performs better for all nine algorithms evaluated here with the SVM-based classifier used.
Digital Audio Forensics: Microphone and Environment Classification Using Deep Learning
IEEE Access
The recording device, along with the acoustic environment, plays a major role in digital audio forensics. In this paper we propose an acoustic source identification system that identifies both the recording device and the environment in which the recording was made. A hybrid Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) is used in this study to automatically extract environment and microphone features from the speech sound. In the experiments, we investigated the effect of using the voiced and unvoiced segments of speech on the accuracy of the environment and microphone classification. We also studied the effect of background noise on microphone classification in 3 different environments, i.e., very quiet, quiet, and noisy. The proposed system utilizes a subset of the KSU-DB corpus containing 3 environments, 4 classes of recording devices, 136 speakers (68 males and 68 females), and 3600 recordings of words, sentences, and continuous speech. This research combines the advantages of both CNN and RNN (in particular bidirectional LSTM) models in a single model called CRNN. The speech signals were represented as spectrograms and fed to the CRNN model as 2D images. The proposed method achieved accuracies of 98% and 98.57% for environment and microphone classification, respectively, using unvoiced speech segments.
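The spectrogram representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Hann window, a naive DFT, and hypothetical frame and hop sizes (`frame_len=64`, `hop=32`); none of these settings come from the paper, and a real system would use an FFT library rather than this slow reference loop.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive DFT of Hann-windowed frames.

    Returns a list of frames; each frame is a list of
    frame_len // 2 + 1 magnitudes (the non-negative frequency bins).
    frame_len and hop are illustrative values, not the paper's settings.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Synthetic input: a 1000 Hz tone sampled at 8000 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spec = spectrogram(tone)
# With 64-sample frames, 1000 Hz falls on bin 1000 / 8000 * 64 = 8.
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
```

The resulting list of lists is the kind of 2D time-frequency array that could be treated as an image by a convolutional model.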
Acoustic Environment Identification and Its Applications to Audio Forensics
IEEE Transactions on Information Forensics and Security, 2013
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of a room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique to model and estimate the amount of reverberation and background noise variance in an audio recording. An energy-based voice activity detection method is proposed for automatic decaying-tail-selection from an audio recording. Effectiveness of the proposed method is tested using a data set consisting of speech recordings. The performance of the proposed method is also evaluated for both speaker-dependent and speaker-independent scenarios.
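As a rough illustration of energy-based voice activity detection in general, the sketch below flags frames whose mean energy exceeds a fraction of the loudest frame. The frame length, the `ratio` threshold, and the function names are assumptions made for this example; the paper's own method for selecting decaying tails is not described in the abstract and is not reproduced here.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def energy_vad(samples, frame_len=160, ratio=0.1):
    """Mark a frame as active when its energy exceeds ratio * max energy.

    ratio is a hypothetical threshold chosen for this demo.
    """
    energies = frame_energies(samples, frame_len)
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]

# Synthetic signal: silence, a 440 Hz tone burst, then silence again.
rate = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
flags = energy_vad(silence + tone + silence)
```

The transition from active to inactive frames marks where a decaying tail would begin in a real recording, which is where reverberation estimates are typically taken.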
Digital multimedia audio forensics: past, present and future
Multimedia Tools and Applications, 2017
Digital audio forensics is used for a variety of applications, ranging from authenticating audio files to linking an audio recording to the acquisition device (e.g., microphone) and to the acoustic environment in which the recording was made, as well as identifying traces of coding or transcoding. This survey paper provides an overview of the current state-of-the-art (SOA) in digital audio forensics and highlights some open research problems and future challenges in this active area of research. The paper categorizes audio file analysis into container-based and content-based analysis in order to assess the authenticity of the file. Existing SOA in audio forensics is discussed based on both container and content-based analysis. The importance of this research topic has encouraged many researchers to contribute in this area; yet there is further scope for researchers and readers to expand the body of knowledge. The ultimate goal of this paper is to introduce all information on audio forensics and encourage researchers to solve the unanswered questions. Our survey paper would contribute to this critical research area, which has addressed many serious cases in the past, and help solve many more cases in the future by using advanced techniques with more accurate results.
An Automatic Digital Audio Authentication/Forensics System
IEEE Access, 2017
With the continuous rise in ingenious forgery, a wide range of digital audio authentication applications are emerging as a preventive and detective control in real-world circumstances, such as forged evidence, breach of copyright protection, and unauthorized data access. To investigate and verify, this paper presents a novel automatic authentication system that differentiates between forged and original audio. The design philosophy of the proposed system is primarily based on three psychoacoustic principles of hearing, which are implemented to simulate the human sound perception system. Moreover, the proposed system is able to classify between the audio of different environments recorded with the same microphone. To perform audio authentication and environment classification, the computed features based on the psychoacoustic principles of hearing are fed to a Gaussian mixture model to make automatic decisions. It is worth mentioning that the proposed system authenticates an unknown speaker irrespective of the audio content, i.e., independent of narrator and text. To evaluate the performance of the proposed system, audio recordings in multiple environments are forged in such a way that a human cannot recognize them. Subjective evaluation by three human evaluators is performed to verify the quality of the generated forged audio. The proposed system provides a classification accuracy of 99.2% ± 2.6. Furthermore, the accuracy obtained for the other scenarios, such as text-dependent and text-independent audio authentication, is 100%.
Index terms: digital audio authentication, audio forensics, forgery, machine learning algorithm, human psychoacoustic principles.
Energy and Entropy Based Features for WAV Audio Steganalysis
J. Inf. Hiding Multim. Signal Process., 2017
Digital steganalysis techniques attempt to detect hidden information in digital media. The rising interest in steganalysis is attributed to the growing number of steganography algorithms and the threats they represent. This article presents a combined maximum entropy and energy approach for audio steganalysis. First, the audio signal is divided into four energy-based regions: noise, low, medium and high; then entropy is computed from each region. Finally, a support vector machine is applied to the collected features to discover hidden data in audio signals. The active speech level algorithm is used to capture energy fluctuation in audio streams. The paper shows that features extracted from separate energy-based regions of the signal significantly improve the detection accuracy of hidden messages. Our work includes comparisons with current state-of-the-art audio steganalysis techniques. The experimental results show that our method achieves up to 96.7% correct for an embedding...
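The two-step feature extraction described here (split the signal by energy, then compute entropy per region) can be sketched as follows. This is only an illustration under stated assumptions: frames are assigned to regions using three fixed energy cut points passed in as `thresholds`, and entropy is the Shannon entropy of the sample-value histogram. The abstract refers to an active speech level algorithm for tracking energy, which this sketch does not implement.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def region_entropies(samples, frame_len, thresholds):
    """Group frames into four energy regions, then compute entropy per region.

    thresholds: three ascending energy cut points (hypothetical values)
    separating the noise, low, medium and high regions.
    """
    names = ["noise", "low", "medium", "high"]
    regions = {name: [] for name in names}
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Count how many cut points the frame energy reaches (0 to 3).
        idx = sum(energy >= t for t in thresholds)
        regions[names[idx]].extend(frame)
    # Empty regions get entropy 0.0 by convention in this sketch.
    return {name: shannon_entropy(vals) if vals else 0.0
            for name, vals in regions.items()}

# Four integer PCM-like frames with clearly different energy levels.
pcm = [0, 0, 0, 0, 1, -1, 1, -1, 10, -10, 10, 10, 100, -100, 50, -50]
features = region_entropies(pcm, frame_len=4, thresholds=(0.5, 50, 1000))
```

The four entropy values form a small feature vector of the kind that could be passed to a classifier such as an SVM.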
AUDIO STEGANALYSIS OF LSB AUDIO USING MOMENTS AND MULTIPLE 145-160
Steganography is the art and science of communicating in a way which hides the existence of the communication. Important information is first hidden in host data, such as a digital image, text, video or audio, and then transmitted secretly to the receiver. Steganalysis, the art of detecting the presence of steganography, is another important topic in information hiding. This paper presents an effective steganalysis method that uses statistical moments as well as invariant moments of the audio signal to detect the presence of hidden messages. Multiple regression analysis is carried out to detect the presence of hidden messages, as well as to estimate the relative length of the embedded messages. The design of an audio steganalyzer depends upon the choice of audio features and the design of a two-class classifier. Experimental results demonstrate the effectiveness and accuracy of the proposed technique.
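To make the idea of moment-based features concrete, the sketch below computes the first four statistical moments of a sample sequence. Which moments the paper actually uses is not stated in the abstract, and the invariant moments and the regression step are not covered; the function name and the returned keys are assumptions for this example.

```python
import math

def moment_features(samples):
    """Mean, variance, skewness and kurtosis of a sample sequence.

    Uses population (divide-by-n) moments and assumes the signal
    is not constant, since a zero variance would divide by zero.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c ** 2 for c in centered) / n
    std = math.sqrt(var)
    skew = sum(c ** 3 for c in centered) / n / std ** 3
    kurt = sum(c ** 4 for c in centered) / n / var ** 2
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}

features = moment_features([1, 2, 3, 4, 5])
```

In a steganalysis setting, such features would be computed for both cover and stego signals so that a regression or classification model can learn how embedding shifts them.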
Steganalysis of LSB Matching in WAV Audio
To expose the existence of hidden messages, this paper presents a steganalysis method for LSB (least significant bit) matching steganography in WAV audio based on multi-order Markov features. Noise sequences are extracted using two methods, local correlation and 5/3 wavelet de-noising, and the correlation of each noise sequence is then quantified as multi-order Markov feature vectors. Based on the extracted feature vectors, a support vector machine (SVM) detector is trained to identify whether a WAV audio file carries hidden data. Experimental results show that detection accuracy can exceed 80% even when the embedding rate is only 10%, which is far better than other methods.
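For context, the embedding scheme that this detector targets can be sketched in a few lines. In LSB matching, a sample whose least significant bit already equals the message bit is left alone; otherwise the sample is randomly incremented or decremented by one. The sketch assumes signed 16-bit PCM samples and sequential embedding; the function names are hypothetical, and the paper's Markov feature extraction is not shown.

```python
import random

def lsb_match_embed(samples, bits, rng):
    """Embed bits into signed 16-bit PCM samples using LSB matching."""
    stego = list(samples)
    for i, bit in enumerate(bits):
        if (stego[i] & 1) != bit:
            step = rng.choice((-1, 1))
            # Stay inside the signed 16-bit range at the extremes.
            if stego[i] == 32767:
                step = -1
            elif stego[i] == -32768:
                step = 1
            stego[i] += step
    return stego

def lsb_extract(samples, n_bits):
    """Read the message back from the least significant bits."""
    return [s & 1 for s in samples[:n_bits]]

rng = random.Random(0)  # fixed seed so the demo is repeatable
cover = [100, -3, 32767, -32768, 7, 8]
message = [1, 0, 0, 1, 1, 0]
stego = lsb_match_embed(cover, message, rng)
recovered = lsb_extract(stego, len(message))
```

Because each change is at most one quantization step in a random direction, the embedding behaves like low-level additive noise, which is why detectors of this kind analyze noise residuals rather than the signal itself.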
Audio Recording Location Identification Using Acoustic Environment Signature
IEEE Transactions on Information Forensics and Security, 2000
An audio recording is subject to a number of possible distortions and artifacts. Consider, for example, artifacts due to acoustic reverberation and background noise. The acoustic reverberation depends on the shape and the composition of the room, and it causes temporal and spectral smearing of the recorded sound. The background noise, on the other hand, depends on the secondary audio source activities present in the evidentiary recording. Extraction of acoustic cues from an audio recording is an important but challenging task. Temporal changes in the estimated reverberation and background noise can be used for dynamic acoustic environment identification (AEI), audio forensics, and ballistic settings. We describe a statistical technique based on spectral subtraction to estimate the amount of reverberation and nonlinear filtering based on particle filtering to estimate the background noise. The effectiveness of the proposed method is tested using a data set consisting of speech recordings of two human speakers (one male and one female) made in eight acoustic environments using four commercial-grade microphones. Performance of the proposed method is evaluated for various experimental settings such as microphone-independent, semi- and full-blind AEI, and robustness to MP3 compression. Performance of the proposed framework is also evaluated using Temporal Derivative-based Spectrum and Mel-Cepstrum (TDSM)-based features. Experimental results show that the proposed method improves AEI performance compared with the direct method (i.e., the feature vector is extracted from the audio recording directly). In addition, experimental results show that the proposed scheme is robust to MP3 compression attack.
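Spectral subtraction in its basic textbook form can be sketched as below: an average magnitude spectrum is estimated from reference frames and subtracted bin by bin from an observed frame, with a floor to avoid negative magnitudes. The `floor` value, the function names, and the toy spectra are assumptions for illustration; how the paper adapts spectral subtraction to estimate reverberation is not detailed in the abstract.

```python
def mean_spectrum(frames):
    """Average magnitude per frequency bin over a list of frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def spectral_subtract(mag, noise_mag, floor=0.05):
    """Subtract an estimated magnitude spectrum bin by bin.

    Each result is kept at or above floor * original magnitude
    (a hypothetical spectral floor) so it never goes negative.
    """
    return [max(m - n, floor * m) for m, n in zip(mag, noise_mag)]

# Toy magnitude spectra with three frequency bins each.
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise_est = mean_spectrum(noise_frames)
cleaned = spectral_subtract([10.0, 2.0, 4.0], noise_est)
```

In the middle bin the observed magnitude equals the estimate, so the floor takes over instead of leaving a zero.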