CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

Deep Convolutional Neural Networks for Environmental Sound Classification

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

We propose a model to classify environmental sounds such as people, vehicle, siren, horn, and engine sounds. We apply data augmentation techniques so that the best features can be extracted from the given audio to identify its sound class. Our deep convolutional neural network architecture uses stacked convolutional and pooling layers to extract high-level feature representations from spectrogram-like features of the input.

A new pyramidal concatenated CNN approach for environmental sound classification

Applied Acoustics, 2020

Recently, there has been growing interest in Environmental Sound Classification (ESC), an important topic within non-speech audio classification. This study proposes a novel approach based on deep Convolutional Neural Networks (CNNs). The approach comprises several stages: pre-processing, deep learning based feature extraction, feature concatenation, feature reduction, and classification. In the first stage, the input sound signals are denoised and converted into sound images using the Short-Time Fourier Transform (STFT). Pre-trained CNN models (VGG16, VGG19, and DenseNet201) are then used for deep feature extraction. Feature extraction is performed in a pyramidal fashion, which makes the feature vector quite large. For both dimension reduction and selection of the most effective features, a feature selection mechanism is applied after the feature concatenation stage. In the last stage, a Support Vector Machine (SVM) classifier is used. The proposed method is evaluated on the ESC-10, ESC-50, and UrbanSound8K datasets, where it achieves accuracies of 94.8%, 81.4%, and 78.14%, respectively. The results are also compared with those of state-of-the-art methods.
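
The first stage described above, turning a waveform into a "sound image" with the STFT, can be sketched as follows. This is a minimal illustration using NumPy, not the authors' pipeline: the Hann window, the frame length of 1024, the hop of 512, the dB scaling, and the function name `stft_magnitude_db` are all assumptions, and the denoising step is omitted.

```python
import numpy as np

def stft_magnitude_db(signal, frame_len=1024, hop=512):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        # Zero-pad short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    # Log compression; the small constant avoids log(0).
    return 20.0 * np.log10(magnitude + 1e-10)

# Usage: a synthetic 1-second, 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = stft_magnitude_db(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 1024  # frequency of the strongest bin
```

The resulting 2D array can be saved or resized as an image before being passed to a pre-trained CNN; how the paper maps the spectrogram to the CNN input size is not stated in the abstract.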

Environmental sound classification using a regularized deep convolutional neural network with data augmentation

The adoption of environmental sound classification (ESC) has increased rapidly in recent years because of its broad range of everyday applications. ESC, also known as Sound Event Recognition (SER), involves recognizing the context of an audio stream associated with various environmental sounds. Several common factors make the ESC problem complex: non-uniform distance between the acoustic source and the microphone, differences in the framework, the presence of numerous sound sources in a recording, and overlapping sound events. This study employs deep convolutional neural networks (DCNNs) with regularization and data augmentation, together with basic audio features that have proven effective on ESC tasks. The performance of a DCNN with max-pooling (Model-1) and without max-pooling (Model-2) is examined. Three audio feature extraction techniques, Mel spectrogram (Mel), Mel Frequency Cepstral Coefficients (MFCC), and Log-Mel, are considered for the ESC-10, ESC-50, and UrbanSound8K (US8K) datasets. Furthermore, to reduce the risk of overfitting caused by the limited amount of data, the study applies offline data augmentation in combination with L2 regularization. The evaluation shows that the best accuracy is attained by the DCNN without max-pooling (Model-2) using Log-Mel features on the augmented datasets. For ESC-10, ESC-50, and US8K, the highest accuracies achieved are 94.94%, 89.28%, and 95.37%, respectively. The experimental results show that the proposed approach can achieve the best performance on environmental sound classification problems.
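
To make the Log-Mel feature mentioned above concrete, here is a small NumPy sketch of a triangular mel filterbank applied to a power spectrogram. It assumes the widely used formula mel = 2595 log10(1 + f / 700); the abstract does not say which mel variant, filter count, or FFT size the authors used, so `n_mels=40`, `n_fft=1024`, and the helper names are illustrative choices only.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 points equally spaced in mel give each filter's
    # left edge, center, and right edge.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel(power_spec, bank):
    """power_spec: (n_fft // 2 + 1, frames) -> (n_mels, frames) in dB."""
    return 10.0 * np.log10(bank @ power_spec + 1e-10)

# Usage with a random stand-in for |STFT|^2 (513 bins, 30 frames).
bank = mel_filterbank(n_mels=40, n_fft=1024, sr=16000)
rng = np.random.default_rng(0)
features = log_mel(rng.random((513, 30)), bank)
```

Taking a discrete cosine transform of these log-mel energies would give MFCCs, which is why the two features are often compared side by side.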

IJERT-Improved Deep CNN with Reduced Parameters for Automatic Identification of Environmental Sounds

International Journal of Engineering Research and Technology (IJERT), 2019

https://www.ijert.org/improved-deep-cnn-with-reduced-parameters-for-automatic-identification-of-environmental-sounds
https://www.ijert.org/research/improved-deep-cnn-with-reduced-parameters-for-automatic-identification-of-environmental-sounds-IJERTCONV7IS13001.pdf

Deep learning techniques such as Convolutional Neural Networks (CNNs) are steadily gaining momentum in environmental sound classification. Despite their excellent performance, CNNs are computationally intensive and therefore demanding in hardware and memory. Recent deep learning research focuses on reducing the number of parameters without degrading performance. In this paper, we put forward a novel CNN architecture with reduced parameters for automatic environmental sound classification. The proposed architecture reduces the parameter count by 24.16% and the number of multiply-accumulate (MAC) operations by 20.17%, which lowers computational complexity during hardware deployment. The impact of parameter reduction on model accuracy is analyzed by evaluating the proposed model on a publicly available database. The results indicate that the proposed architecture outperforms state-of-the-art approaches for automatic identification of environmental sounds.
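
As a rough illustration of how parameter and MAC reductions like those above are computed, the sketch below counts both quantities for a single standard 2D convolution layer. The layer shapes are hypothetical and are not taken from the paper; the helper names `conv2d_cost` and `percent_reduction` are likewise made up for this example.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, bias=True):
    """Return (parameters, MACs) for one standard 2D convolution layer."""
    weights = kernel * kernel * c_in * c_out
    params = weights + (c_out if bias else 0)
    # One multiply-accumulate per weight per output position.
    macs = weights * out_h * out_w
    return params, macs

def percent_reduction(baseline, reduced):
    return 100.0 * (baseline - reduced) / baseline

# Hypothetical layers: same input and output map size, fewer output channels.
base_params, base_macs = conv2d_cost(64, 128, 3, 32, 32)  # (73856, 75497472)
slim_params, slim_macs = conv2d_cost(64, 96, 3, 32, 32)   # (55392, 56623104)

print(percent_reduction(base_params, slim_params))  # 25.0
print(percent_reduction(base_macs, slim_macs))      # 25.0
```

In a full network the two percentages generally differ, as they do in the paper, because layers with large output maps dominate the MAC count while layers with many channels dominate the parameter count.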

Urban Sound Classification using Neural Networks

International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2020

We are surrounded by sounds and hear many types every day, from music to assorted noises. Urban life is full of such sounds, so it is useful to analyze them and extract information that can be put to practical use. The human mind continuously processes these sounds to decipher information about the environment, and a machine learning model can do the same. Convolutional neural networks have been very successful at classifying images, which raises the question of how well they work on sounds. In this paper, we evaluate different deep learning models to see which are suitable for sound classification. We use the UrbanSound8K dataset, which contains 8732 sound excerpts of urban sounds from 10 classes.

Automatic Environmental Sound Recognition (AESR) Using Convolutional Neural Network

International Journal of Modern Education and Computer Science, 2020

Automatic Environmental Sound Recognition (AESR) is an essential topic in modern pattern recognition research. A short audio file of a sound event can be converted into a spectrogram image and fed to a Convolutional Neural Network (CNN) for processing. Features generated from that image are used to classify environmental sound events such as sea waves, fire crackling, dog barking, lightning, rain, and many more. We used the log-mel spectrogram auditory feature to train our six-layer stacked CNN model. We evaluated the model on three publicly available datasets and achieved an accuracy of 92.9% on UrbanSound8K, 91.7% on ESC-10, and 65.8% on ESC-50. These results show a remarkable improvement in environmental sound recognition using only a stacked CNN compared to multiple previous works, and also show the effectiveness of the log-mel spectrogram feature compared to Mel Frequency Cepstral Coefficients (MFCC), wavelet transformation, and the raw waveform. We also experimented with the recently published Rectified Adam (RAdam) optimizer. Our study includes a comparative analysis of the Adam (Adaptive Moment Estimation) and RAdam optimizers for training the model to correctly classify environmental sounds with an image recognition architecture.
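
For readers unfamiliar with how RAdam differs from Adam, the sketch below computes the variance rectification factor that RAdam multiplies into the adaptive step, following the formula in the original RAdam paper as best understood here. It is not code from the study above: the function name is hypothetical, `beta2=0.999` is the common default, and the threshold of 4 is the paper's value (some library implementations use 5).

```python
import math

def radam_rectification(step, beta2=0.999):
    """Return RAdam's rectification factor r_t for a 1-based step count,
    or None while the adaptive learning rate is still switched off."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Variance estimate is not yet tractable: RAdam falls back to a
        # momentum-only update for this step.
        return None
    num = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    den = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(num / den)

# Usage: the factor is absent at first, then grows toward 1.
for step in (1, 10, 1000, 100000):
    print(step, radam_rectification(step))
```

The factor acts as a built-in warmup: early adaptive steps are damped and late steps approach plain Adam, which is the behavior a comparison between the two optimizers would be probing.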

End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network

Expert Systems with Applications

In this paper, we present an end-to-end approach for environmental sound classification based on a 1D Convolutional Neural Network (CNN) that learns a representation directly from the audio signal. Several convolutional layers are used to capture the signal's fine time structure and learn diverse filters that are relevant to the classification task. The proposed approach can handle audio signals of any length because it splits the signal into overlapping frames using a sliding window. Different architectures with several input sizes are evaluated, including initialization of the first convolutional layer with a Gammatone filterbank that models the human auditory filter response in the cochlea. The approach was assessed on the UrbanSound8K dataset, where it achieves a mean accuracy of 89%. The proposed approach therefore outperforms most state-of-the-art approaches that use handcrafted features or 2D representations as input. Furthermore, it has a small number of parameters compared to other architectures in the literature, which reduces the amount of data required for training.
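
The sliding-window framing that lets a fixed-input network handle clips of any length can be sketched in plain Python as below. The frame length, hop size, and the choice to zero-pad the final frame are assumptions for illustration; the abstract does not specify how the tail of the signal is handled.

```python
def split_into_frames(signal, frame_len, hop):
    """Split a 1D sequence into overlapping frames of equal length.

    The final frame is zero-padded so that no samples are dropped
    (an assumption; the paper may treat the tail differently).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    samples = list(signal)
    frames = []
    start = 0
    while True:
        frame = samples[start:start + frame_len]
        frame += [0.0] * (frame_len - len(frame))  # pad a short last frame
        frames.append(frame)
        if start + frame_len >= len(samples):
            break
        start += hop
    return frames

# Usage: 10 samples, frames of 4 with 50% overlap -> 4 frames.
frames = split_into_frames(range(10), frame_len=4, hop=2)
```

Each frame would be classified separately by the 1D CNN, and the per-frame outputs then combined into a clip-level decision; the combination rule is not described in the abstract.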

Deep Learning-based Environmental Sound Classification Using Feature Fusion and Data Enhancement

Computers, Materials & Continua

Environmental sound classification (ESC) involves distinguishing an audio stream associated with numerous environmental sounds. Common factors such as framework differences, overlapping sound events, and the presence of multiple sound sources during recording make the ESC task considerably more complex. This research proposes a deep learning model to improve the recognition rate of environmental sounds and reduce model training time under limited computational resources. The performance of transformer and convolutional neural network (CNN) models is investigated. Seven audio features (chromagram, Mel-spectrogram, tonnetz, Mel-Frequency Cepstral Coefficients (MFCCs), delta MFCCs, delta-delta MFCCs, and spectral contrast) are extracted from the UrbanSound8K, ESC-50, and ESC-10 databases. Moreover, three data enhancement methods, namely white noise, pitch tuning, and time stretch, are employed to reduce the risk of overfitting caused by the limited number of audio clips. The experiments demonstrate that the best performance was achieved by the proposed transformer model using all seven audio features on the enhanced database. For UrbanSound8K, ESC-50, and ESC-10, the highest attained accuracies are 0.98, 0.94, and 0.97, respectively. The experimental results reveal that the proposed technique can achieve the best performance for ESC problems.
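
Two of the data enhancement ideas named above can be approximated with a few lines of NumPy. This is a simplified sketch, not the authors' code: the SNR-based noise scaling and the function names are assumptions, and `resample_speed` is a plain linear-interpolation resampler that changes duration and pitch together, whereas a true time stretch or pitch shift (usually done with a phase vocoder) changes one while preserving the other.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Add Gaussian white noise scaled to a target signal-to-noise ratio."""
    signal = np.asarray(signal, dtype=np.float64)
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + noise * np.sqrt(target_noise_power / noise_power)

def resample_speed(signal, rate):
    """Change duration by linear interpolation (rate > 1 shortens the clip).
    Unlike a phase-vocoder time stretch, this also shifts the pitch."""
    signal = np.asarray(signal, dtype=np.float64)
    n_out = max(1, int(round(len(signal) / rate)))
    positions = np.linspace(0.0, len(signal) - 1, n_out)
    return np.interp(positions, np.arange(len(signal)), signal)

# Usage on a synthetic 1-second tone sampled at 16 kHz.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = add_white_noise(clean, snr_db=20.0, rng=rng)
faster = resample_speed(clean, rate=2.0)
```

Augmented copies like these are typically added to the training set alongside the originals so that the model sees more variation per class.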

The Application and Improvement of Deep Neural Networks in Environmental Sound Recognition

Applied Sciences, 2020

Neural networks have achieved strong results in sound recognition, and many kinds of acoustic features have been tried as training input. However, it remains unclear whether a neural network can efficiently extract features from raw audio input. This study improves on raw-signal-input networks from prior research by using deeper network architectures, which allows the raw signals to be analyzed more effectively. We also discuss several network settings; with a spectrogram-like conversion, our network reaches an accuracy of 73.55% on the open audio dataset “Dataset for Environmental Sound Classification 50” (ESC50). This study also proposes a network architecture that can combine different kinds of network feeds with different features. With the help of global pooling, a flexible fusion method was integrated into the network. Our experiment successfully combined two different networks with differen...