Hybrid Computerized Method for Environmental Sound Classification (original) (raw)
Classification of environmental sounds plays a key role in security, investigation, robotics since the study of the sounds present in a specific environment can allow to get significant insights. Lack of standardized methods for an automatic and effective environmental sound classification (ESC) creates a need to be urgently satisfied. As a response to this limitation, in this paper, a hybrid model for automatic and accurate classification of environmental sounds is proposed. Optimum allocation sampling (OAS) is used to elicit the informative samples from each class. The representative samples obtained by OAS are turned into the spectrogram containing their time-frequency-amplitude representation by using a short-time Fourier transform (STFT). The spectrogram is then given as an input to pre-trained AlexNet and Visual Geometry Group (VGG)-16 networks. Multiple deep features are extracted using the pre-trained networks and classified by using multiple classification techniques namely decision tree (fine, medium, coarse kernel), k-nearest neighbor (fine, medium, cosine, cubic, coarse and weighted kernel), support vector machine, linear discriminant analysis, bagged tree and softmax classifiers. The ESC-10, a ten-class environmental sound dataset, is used for the evaluation of the methodology. An accuracy of 90.1%, 95.8%, 94.7%, 87.9%, 95.6%, and 92.4% is obtained with a decision tree, k-neared neighbor, support vector machine, linear discriminant analysis, bagged tree and softmax classifier respectively. The proposed method proved to be robust, effective, and promising in comparison with other existing state-of-the-art techniques, using the same dataset. INDEX TERMS Environmental sound classification, optimal allocation sampling, spectrogram, convolutional neural network, classification techniques.