Identifying Sound Event Recognition Using Machine Learning and Artificial Intelligence

Convolutional Neural Network based Audio Event Classification

KSII Transactions on Internet and Information Systems, 2018

This paper proposes an audio event classification method based on convolutional neural networks (CNNs), which excel at distinguishing complex shapes in images. The proposed system uses features of the audio signal as an input image to a CNN. Mel-scale filter bank features are extracted from each frame and concatenated over 40 consecutive frames; the concatenated frames are then treated as an input image. The output layer of the CNN generates probabilities for each audio event (e.g., dog bark, siren, forest). The event probabilities for all images in an audio segment are accumulated, and the audio event with the highest accumulated probability is taken as the classification result. The proposed method classified thirty audio events with an accuracy of 81.5% on the UrbanSound8K, BBC Sound FX, DCASE2016, and FREESOUND datasets.
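The frame-stacking and probability-accumulation steps described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are made up, the mel features and class probabilities are random placeholders, and the window length of 40 frames is the only detail taken from the abstract.

```python
import numpy as np

def frames_to_images(feats, win=40):
    """Stack `win` consecutive frames of mel filter-bank features into
    image-like CNN inputs. feats: (num_frames, n_mels).
    Returns an array of shape (num_images, win, n_mels)."""
    n = feats.shape[0] - win + 1
    return np.stack([feats[i:i + win] for i in range(n)])

def classify_segment(image_probs):
    """Accumulate per-image class probabilities over a segment and
    return the index of the event with the highest accumulated score."""
    return int(np.argmax(image_probs.sum(axis=0)))

# Toy example: 100 frames of 64 mel bands, 3 hypothetical event classes.
feats = np.random.rand(100, 64)
images = frames_to_images(feats)                      # shape (61, 40, 64)
probs = np.random.dirichlet(np.ones(3), size=len(images))
label = classify_segment(probs)                       # 0, 1, or 2
```

In a real system, `probs` would come from the CNN's softmax output for each stacked image.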

An Overview of Audio Event Detection Methods from Feature Extraction to Classification

Applied Artificial Intelligence, 2017

Audio streams, such as news broadcasts, meeting rooms, and special videos, comprise sound from an extensive variety of sources. The detection of audio events including speech, coughing, gunshots, etc. motivates intelligent audio event detection (AED). With substantial attention geared toward AED for various types of applications, such as security, speech recognition, speaker recognition, home care, and health monitoring, scientists are now more motivated to perform extensive research on AED. The deployment of AED is actually a more complicated task when going beyond exclusively highlighting audio events in terms of feature extraction and classification in order to select the best features with high detection accuracy. To date, a wide range of detection systems based on intelligent techniques have been utilized to create machine learning-based audio event detection schemes. Nevertheless, previous studies do not encompass any state-of-the-art reviews of the proficiency and significance of such methods for resolving audio event detection matters. The major contribution of this work entails reviewing and categorizing existing AED schemes into preprocessing, feature extraction, and classification methods. The importance of the algorithms and methodologies, along with their proficiency and restrictions, is additionally analyzed in this study. This research is expanded by critically comparing audio detection methods and algorithms according to accuracy and false alarms using different types of datasets.

Continuous robust sound event classification using time-frequency features and deep learning

PLOS ONE, 2017

The automatic detection and recognition of sound events by computers is a requirement for a number of emerging sensing and human computer interaction technologies. Recent advances in this field have been achieved by machine learning classifiers working in conjunction with time-frequency feature representations. This combination has achieved excellent accuracy for classification of discrete sounds. The ability to recognise sounds under real-world noisy conditions, called robust sound event classification, is an especially challenging task that has attracted recent research attention. Another aspect of real-world conditions is the classification of continuous, occluded or overlapping sounds, rather than classification of short isolated sound recordings. This paper addresses the classification of noise-corrupted, occluded, overlapped, continuous sound recordings. It first proposes a standard evaluation task for such sounds based upon a common existing method for evaluating isolated sound classification. It then benchmarks several high-performing isolated sound classifiers, adapted to operate on continuous sound data by incorporating an energy-based event detection front end. Results are reported for each tested system using the new task, providing the first analysis of their performance for continuous sound event detection. In addition, it proposes and evaluates a novel Bayesian-inspired front end for the segmentation and detection of continuous sound recordings prior to classification.
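An energy-based event detection front end of the kind mentioned above can be sketched in a few lines: compute per-frame log energy, mark frames above a threshold relative to the loudest frame, and merge runs of active frames into segments. This is a generic illustration under assumed parameters (frame length, hop size, threshold), not the paper's specific front end.

```python
import numpy as np

def energy_segments(signal, frame_len=1024, hop=512, thresh_db=-30.0):
    """Energy-based detection front end: mark frames whose log energy is
    within `thresh_db` dB of the loudest frame, then merge consecutive
    active frames into (start_sample, end_sample) segments."""
    n = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i*hop:i*hop + frame_len] ** 2)
                       for i in range(n)])
    log_e = 10 * np.log10(energy + 1e-12)
    active = log_e > (log_e.max() + thresh_db)

    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    if start is not None:
        segments.append((start * hop, (n - 1) * hop + frame_len))
    return segments

# Toy example: silence with one loud burst in the middle.
sig = np.zeros(8192)
sig[3000:5000] = 1.0
segs = energy_segments(sig)   # one segment covering the burst
```

The detected segments would then be passed to an isolated-sound classifier, which is how the benchmarked systems are extended to continuous audio.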

Acoustic events detection with Support Vector Machines

2010

This paper deals with the detection and classification of acoustic events that could indicate a potentially dangerous situation. The main objective of the paper is to determine the classification accuracy for gunshots recorded in a noisy environment. The detection and classification method is based on Support Vector Machines. The training and testing data are gunshots recorded in an open space and subsequently degraded with background street noise.

Deep Neural Networks for Sound Event Detection

2019

The objective of this thesis is to develop novel classification and feature learning techniques for the task of sound event detection (SED) in real-world environments. Throughout their lives, humans experience a consistent learning process on how to assign meanings to sounds. Thanks to this, most humans can easily recognize the sound of thunder, a dog bark, a door bell, bird song, etc. In this work, we aim to develop systems that can automatically detect the sound events commonly present in our daily lives. Such systems can be utilized in, e.g., context-aware devices, acoustic surveillance, bio-acoustical and healthcare monitoring, and smart homes and cities. In this thesis, we propose to apply the modern machine learning methods called deep learning to SED. The relationship between the commonly used time-frequency representations for SED (such as the mel spectrogram and magnitude spectrogram) and the target sound event labels is highly complex. Deep learning methods such as deep neural...

Sound event detection using deep neural networks

TELKOMNIKA Telecommunication Computing Electronics and Control, 2020

We applied various architectures of deep neural networks to sound event detection and compared their performance using two different datasets. Feed-forward neural network (FNN), convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN) architectures were implemented using hyper-parameters optimized for each architecture and dataset. The results show that the performance of deep neural networks varied significantly depending on the learning rate, which can be optimized by conducting a series of experiments on the validation data over predetermined ranges. Among the implemented architectures, the CRNN performed best under all testing conditions, followed by the CNN. Although the RNN was effective at tracking time-correlation information in audio signals, it exhibited inferior performance compared to the CNN and the CRNN. Accordingly, more optimization strategies need to be developed for applying RNNs to sound event detection.
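The learning-rate sweep over validation data described above can be sketched generically. To keep the example self-contained, a tiny gradient-descent logistic regression stands in for the FNN/CNN/RNN/CRNN architectures; the grid of candidate rates, the data, and all names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def train_logreg(X, y, lr, epochs=200):
    """Tiny gradient-descent logistic regression; a stand-in for any of
    the compared architectures when sweeping the learning rate."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def val_loss(w, X, y):
    """Cross-entropy loss on held-out validation data."""
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
Xtr, ytr, Xva, yva = X[:150], y[:150], X[150:], y[150:]

# Sweep a predetermined range of learning rates; keep the one with the
# lowest validation loss, as the paper's tuning procedure describes.
grid = [1e-3, 1e-2, 1e-1, 1.0]
best_lr = min(grid, key=lambda lr: val_loss(train_logreg(Xtr, ytr, lr),
                                            Xva, yva))
```

With a real SED model, the same loop would simply wrap the framework's training call for each candidate rate.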

Acoustic Event Recognition for the Application of Surveillance

International Journal of Engineering Research and Technology (IJERT), 2018

https://www.ijert.org/acoustic-event-recognition-for-theapplication-of-surveillance
https://www.ijert.org/research/acoustic-event-recognition-for-theapplication-of-surveillance-IJERTCONV6IS15109.pdf

Over the past few decades, many systems have been designed and proposed for automatically detecting road situations such as accidents, so as to ensure quick intervention by emergency teams. However, in some situations visual data is not sufficiently efficient or reliable, so using audio event detectors and microphones alongside existing systems improves overall reliability and yields better results in surveillance systems. Acoustic Event Recognition (AER) deals with the detection, classification, and recognition of sound events in unstructured environments, which may contain overlapping sound events and non-stationary background noises. Many sounds, along with noise, contribute to the context of the surrounding environment, so such noise should not simply be discarded: background noises of this kind are commonly useful in Automatic Speech Recognition (ASR) and are also useful for many surveillance applications.

Sound-event partitioning and feature normalization for robust sound-event detection

2014

The ubiquity of smartphones has opened up the possibility of mobile acoustic surveillance. However, the continuous operation of surveillance systems calls for efficient algorithms that conserve battery power. This paper proposes a power-efficient sound-event detector that exploits the redundancy in sound frames. This is achieved by a sound-event partitioning (SEP) scheme in which the acoustic vectors within a sound event are partitioned into a number of chunks, and the means and standard deviations of the acoustic features in the chunks are concatenated for classification by a support vector machine (SVM). Regularized PCA-whitening and L2 normalization are applied to the acoustic vectors to make them more amenable to the SVM. Experimental results based on 1000 sound events show that the proposed scheme is effective even when there are severe mismatches between the training and test conditions.
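The SEP feature construction described above can be sketched as follows. This is a simplified illustration: the PCA-whitening step is omitted, only the chunk statistics and L2 normalization are shown, and the chunk count, feature dimension, and function name are assumptions rather than values from the paper.

```python
import numpy as np

def sep_features(vectors, n_chunks=4):
    """Sound-event partitioning: split an event's acoustic vectors into
    chunks, concatenate the per-chunk means and standard deviations,
    and L2-normalize the result for an SVM classifier.
    vectors: (num_frames, dim) array of per-frame acoustic features."""
    chunks = np.array_split(vectors, n_chunks)
    stats = np.concatenate([np.r_[c.mean(axis=0), c.std(axis=0)]
                            for c in chunks])
    return stats / (np.linalg.norm(stats) + 1e-12)

# Toy example: one sound event of 50 frames of 13-dim features
# (e.g., MFCC-like vectors).
event = np.random.rand(50, 13)
feat = sep_features(event)   # fixed-length vector: 4 chunks * 2 stats * 13 dims
```

The resulting fixed-length vector, one per sound event, is what would be fed to the SVM; the power saving comes from summarizing many redundant frames into a single vector.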

Event identification by acoustic signature recognition

1995

Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large aircraft (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices, and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
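One simple wavelet-based signature of the general kind described above is the vector of per-octave detail-band energies from a Haar discrete wavelet transform. The sketch below is illustrative only: the paper's actual wavelet family, octave count, and feature design are not specified in the abstract, so everything here is an assumption.

```python
import numpy as np

def haar_band_energies(signal, levels=6):
    """One possible wavelet signature: per-octave detail-band energies
    from a Haar DWT. Each level splits the signal into a low-pass
    (pairwise average) and high-pass (pairwise difference) half and
    records the energy of the high-pass band.
    Requires len(signal) divisible by 2**levels."""
    x = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass half
        detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass half
        energies.append(float(np.sum(detail ** 2)))
        x = approx
    return energies

# Toy example: a 1024-sample tone yields a 6-band energy signature.
sig = np.sin(np.linspace(0.0, 20 * np.pi, 1024))
bands = haar_band_energies(sig)
```

A classifier could then compare such band-energy vectors across event classes, with transient events such as gunshots concentrating energy in the finer (earlier) bands.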

Acoustic Event Classification Using Convolutional Neural Networks

2017

The classification of human-made acoustic events is important for the monitoring and recognition of human activities or critical behavior. In our experiments on acoustic event classification for the utilization in the sector of health care, we defined different acoustic events which represent critical events for elderly or people with disabilities in ambient assisted living environments or patients in hospitals. This contribution presents our work for acoustic event classification using deep learning techniques. We implemented and trained various convolutional neural networks for the extraction of deep feature vectors making use of current best practices in neural network design to establish a baseline for acoustic event classification. We convert chunks of audio signals into magnitude spectrograms and treat acoustic events as images. Our data set contains 20 different acoustic events which were collected in two different recording sessions combining human and environmental sounds. ...