Detection and Classification of Acoustic Scenes and Events
Related papers
Detection and classification of acoustic scenes and events: An IEEE AASP challenge
2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013
This paper describes a newly launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of the sources of interest, possible interference between them, the presence of complex background noise, and room effects such as reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying these tasks. Apart from the challenge description, this paper provides an overview of the systems submitted to the challenge as well as a detailed evaluation of the results they achieved.
CLEAR evaluation of acoustic event detection and classification systems
2007
In this paper, we present the results of the Acoustic Event Detection (AED) and Classification (AEC) evaluations carried out in February 2006 by the three participant partners from the CHIL project. The primary evaluation task was AED on the testing portions of the isolated-sound databases and seminar recordings produced in CHIL. Additionally, a secondary AEC evaluation task was designed using only the isolated-sound databases. The set of meeting-room acoustic event classes and the metrics were agreed upon by the three partners, and ELDA was in charge of the scoring task. In this paper, the various systems for the tasks of AED and AEC and their results are presented.
Acoustic Scene Classification: A Competition Review
2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), 2018
In this paper we study the problem of acoustic scene classification, i.e., the categorization of audio sequences into mutually exclusive classes based on their spectral content. We describe the methods and results produced during a competition organized in the context of a graduate machine learning course, with submissions from both students and external participants. We identify the most suitable methods and study the impact of each by performing an ablation study of the mixture of approaches. We also compare the results with a neural network baseline and show the improvement over it. Finally, we discuss the impact of using a competition as part of a university course, and justify its importance in the curriculum based on student feedback.
An Overview of Audio Event Detection Methods from Feature Extraction to Classification
Applied Artificial Intelligence, 2017
Audio streams, such as news broadcasts, meeting-room recordings, and video soundtracks, comprise sound from an extensive variety of sources. The detection of audio events, including speech, coughing, gunshots, etc., is the goal of intelligent audio event detection (AED). With substantial attention geared towards AED for various types of applications, such as security, speech recognition, speaker recognition, home care, and health monitoring, researchers are now strongly motivated to perform extensive research on AED. Deploying AED is a more complicated task than merely highlighting audio events in terms of feature extraction and classification, since the features yielding high detection accuracy must be selected. To date, a wide range of detection systems based on intelligent techniques have been used to create machine learning-based audio event detection schemes. Nevertheless, previous studies do not provide a state-of-the-art review of the proficiency and significance of such methods for solving audio event detection problems. The major contribution of this work is to review and categorize existing AED schemes into preprocessing, feature extraction, and classification methods. The importance of the algorithms and methodologies, as well as their strengths and limitations, are also analyzed in this study. The review is extended by critically comparing audio detection methods and algorithms according to accuracy and false-alarm rates on different types of datasets.
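As a rough illustration of the pipeline this survey categorizes (preprocessing, then feature extraction, then classification), here is a minimal sketch in Python. It assumes librosa and scikit-learn are installed; the file names, labels, and MFCC-plus-random-forest choices are illustrative assumptions, not a method from the paper.

```python
# Minimal AED pipeline sketch: preprocessing -> feature extraction -> classification.
# librosa and scikit-learn are assumed; wav file names and labels are hypothetical.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(path, sr=16000, n_mfcc=20):
    y, _ = librosa.load(path, sr=sr, mono=True)              # preprocessing: mono, resample
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # clip-level summary

train_paths = ["cough_01.wav", "gunshot_01.wav", "speech_01.wav"]  # hypothetical files
train_labels = ["cough", "gunshot", "speech"]

X = np.stack([clip_features(p) for p in train_paths])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, train_labels)
print(clf.predict([clip_features("unknown_event.wav")]))     # hypothetical test clip
```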
Classification of audio scenes with novel features in a fused system framework
Digital Signal Processing, 2018
The rapidly increasing requirements of context-aware gadgets, like smartphones and intelligent wearable devices, along with applications such as audio archiving, have given a fillip to research in the field of Acoustic Scene Classification (ASC). The Detection and Classification of Acoustic Scenes and Events (DCASE) challenges have seen systems addressing the problem of ASC from different directions. Some of them could achieve better results than the Mel Frequency Cepstral Coefficients-Gaussian Mixture Model (MFCC-GMM) baseline system. However, a collective decision from all participating systems was found to surpass the accuracy obtained by each individual system. The simultaneous use of various approaches can better exploit the discriminating information in audio collected from different environments covering the audible frequency range to varying degrees. In this work, we show that the frame-level statistics of some well-known spectral features, when fed individually to a Support Vector Machine (SVM) classifier, are able to outperform the baseline system of the DCASE challenges. Furthermore, we analyze different methods of combining these features, and also of combining information from the two channels when the data is in binaural format. The proposed approach resulted in around 17% and 9% relative improvement in accuracy with respect to the baseline system on the development and evaluation datasets, respectively, of the DCASE 2016 ASC task.
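A minimal sketch of the idea described above, assuming librosa and scikit-learn: frame-level statistics (mean and standard deviation over frames) of several well-known spectral features are concatenated and fed to an SVM, and for binaural recordings the two channels' decision scores are fused by averaging. The file names and the random training matrix are hypothetical placeholders, not the paper's data.

```python
# Frame-level statistics of spectral features -> SVM, with simple binaural late fusion.
import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def channel_features(y, sr):
    feats = np.vstack([
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20),
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.zero_crossing_rate(y),
    ])                                                       # (n_features, n_frames)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])

def binaural_scores(clf, path, sr=44100):
    y, _ = librosa.load(path, sr=sr, mono=False)             # (2, n_samples) if binaural
    chans = y if y.ndim == 2 else y[None, :]
    scores = [clf.decision_function([channel_features(c, sr)]) for c in chans]
    return np.mean(scores, axis=0)                           # average the channel scores

X_train = np.random.randn(40, 46)                            # hypothetical feature vectors
y_train = np.repeat(["bus", "park"], 20)                     # hypothetical scene labels
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)
print(binaural_scores(svm, "scene_clip.wav"))                # hypothetical binaural clip
```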
Event identification by acoustic signature recognition
1995
Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices, and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
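In the spirit of the wavelet analysis described here, a small sketch assuming the PyWavelets package: the per-sub-band energies of a discrete wavelet decomposition form a compact signature that a downstream classifier can match against known event types. The db4 wavelet and six-level depth are illustrative assumptions, not the paper's configuration.

```python
# Wavelet-based acoustic signature sketch (PyWavelets assumed).
import numpy as np
import pywt

def wavelet_signature(signal, wavelet="db4", level=6):
    coeffs = pywt.wavedec(signal, wavelet, level=level)      # [cA_n, cD_n, ..., cD_1]
    energies = np.array([np.sum(c ** 2) for c in coeffs])    # energy per sub-band
    return energies / energies.sum()                         # normalized signature

# Two toy "events": a low-frequency rumble vs. a broadband click.
t = np.linspace(0, 1, 8000)
rumble = np.sin(2 * np.pi * 40 * t)
click = np.zeros_like(t)
click[4000] = 1.0
print(wavelet_signature(rumble).round(3))                    # energy sits in coarse bands
print(wavelet_signature(click).round(3))                     # energy spread across bands
```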
A Robust Framework for Acoustic Scene Classification
Interspeech 2019, 2019
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter, and constant-Q transform. At the back-end, we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers as a post-trained feature adaptation model and classifier. We also explore the use of a data augmentation technique for these features that effectively generates a variety of intermediate data, reinforcing model learning abilities, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and highly competitive with state-of-the-art systems.
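A minimal sketch of the two-stage back-end described above, in PyTorch (assumed): a CNN maps a log-Mel time-frequency image to an embedding, and a DNN head adapts the embedding and classifies the scene. The layer sizes and the 64x128 input patch are illustrative assumptions; the 15 output classes match the DCASE2016 ASC task.

```python
# Two-stage back-end sketch (PyTorch assumed): CNN feature extractor -> DNN classifier.
import torch
import torch.nn as nn

class CNNExtractor(nn.Module):           # stage 1: pre-trained feature extractor
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, x):                # x: (batch, 1, mel_bins, frames)
        return self.proj(self.conv(x).flatten(1))

class DNNClassifier(nn.Module):          # stage 2: feature adaptation + classification
    def __init__(self, embed_dim=128, n_classes=15):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, n_classes)
        )

    def forward(self, z):
        return self.mlp(z)

x = torch.randn(4, 1, 64, 128)           # batch of log-Mel patches (dummy data)
logits = DNNClassifier()(CNNExtractor()(x))
print(logits.shape)                      # torch.Size([4, 15])
```

The "intermediate data" augmentation the abstract alludes to can be read as mixup-style interpolation between training examples, e.g. x_mix = lam * x_a + (1 - lam) * x_b with matching label interpolation; the exact scheme used in the paper may differ.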
Acoustic Event Recognition for the Application of Surveillance
International Journal of Engineering Research and Technology (IJERT), 2018
https://www.ijert.org/acoustic-event-recognition-for-theapplication-of-surveillance https://www.ijert.org/research/acoustic-event-recognition-for-theapplication-of-surveillance-IJERTCONV6IS15109.pdf For a few decades, many systems have been designed and proposed for automatically detecting situations on the road, such as accidents, to enable quick intervention by emergency teams. However, in some situations visual data is not efficient or sufficiently reliable, so using audio event detectors and microphones alongside the existing systems improves the overall reliability and effectiveness of surveillance systems. Acoustic Event Recognition (AER) deals with the detection, classification, and recognition of sounds in unstructured environments, which may contain overlapping sound events and non-stationary background noise. Many sounds, along with noise, contribute to the context of the surrounding environment, so such noise must not simply be discarded: sounds that are commonly treated as noise in Automatic Speech Recognition (ASR) are also useful for many surveillance applications.
Confidence Based Acoustic Event Detection
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018
Acoustic event detection, the determination of the acoustic event type and the localisation of the event, has been widely applied in many real-world applications. Many works adopt a multi-label classification technique to perform polyphonic acoustic event detection, with a global threshold used to detect the active acoustic events. However, the manually labeled boundaries are error-prone and cannot always be accurate, especially when the frame length is too short to be accurately labeled by human annotators. To deal with this, in this paper a confidence is assigned to each frame and acoustic event detection is performed using a multi-variable regression approach. Experimental results on the TUT Sound Events 2017 database of polyphonic events demonstrate the superior performance of the proposed approach compared to the multi-label classification based AED method.
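To make the contrast concrete, a NumPy-only sketch of the labeling side: hard frame labels are error-prone near annotated event boundaries, so they can be softened into per-frame confidence targets and learned by regression rather than by thresholded multi-label classification. The box-smoothing ramp below is an illustrative assumption, not the paper's exact confidence definition.

```python
# Softening hard frame labels into per-frame confidence targets (NumPy only).
import numpy as np

hard = np.zeros(60)
hard[20:40] = 1.0                        # hard frame labels for one event class

def soften(labels, ramp=5):
    # Box smoothing: confidence ramps through annotated boundaries instead of
    # jumping 0 -> 1, reflecting annotator uncertainty there (ramp width assumed).
    kernel = np.ones(2 * ramp + 1) / (2 * ramp + 1)
    return np.convolve(labels, kernel, mode="same")

soft = soften(hard)
print(soft[18:23].round(2))              # gradual confidence through the event onset
# Training then regresses predictions onto `soft` (e.g. with an MSE loss), and events
# are detected from the regressed confidences rather than a single global threshold.
```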
Novel Approaches to Speech Detection in the Processing of Continuous Audio Streams
Robust Speech Recognition and Understanding, 2007
… a generic phoneme recognizer. We also propose the fusion of different selected representations in order to improve the speech-detection results. Section 3 describes the two speech/non-speech (SNS) segmentation approaches used in our evaluations, one of which was specially designed for the proposed feature representation. In the evaluation section we present results from a wide range of experiments on a broadcast news (BN) audio database using different speech-processing applications. We try to assess the performance of the proposed representation by comparison with existing approaches on two different tasks. In the first task, the performance of different representations of the audio signal is assessed directly by comparing the evaluation results of speech and non-speech detection on BN audio data. The second group of experiments tries to determine the impact of SNS segmentation on the subsequent processing of the audio data. We then measure the impact of different SNS-segmentation systems when they are applied as a pre-processing step of an evaluated speaker-diarisation system that is used as a speaker-tracking tool for BN audio data.
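As a point of reference for the SNS task discussed here, a NumPy-only baseline sketch: frame-wise speech/non-speech decisions from short-time energy and zero-crossing rate. This is a generic illustration, not the chapter's phoneme-recognizer-based representation; the frame sizes and thresholds are assumptions.

```python
# Baseline SNS segmentation sketch: short-time energy + zero-crossing rate (NumPy only).
import numpy as np

def sns_segments(y, sr, frame_ms=25, hop_ms=10, energy_thr=0.02, zcr_thr=0.25):
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    labels = []
    for start in range(0, len(y) - frame, hop):
        w = y[start:start + frame]
        energy = np.mean(w ** 2)                             # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(w)))) / 2       # crossings per sample
        # Speech-like frames: enough energy and a moderate zero-crossing rate.
        labels.append(energy > energy_thr and zcr < zcr_thr)
    return np.array(labels)

sr = 16000
t = np.arange(sr) / sr
tone = 0.3 * np.sin(2 * np.pi * 200 * t)                     # stand-in for voiced speech
noise = 0.05 * np.random.default_rng(0).standard_normal(sr)  # stand-in for non-speech
print(sns_segments(np.concatenate([tone, noise]), sr).astype(int))
```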