Development of a Reference Platform for Generic Audio Classification

Content-Based Audio Classification using Segmentation, MFCC Feature Extraction and Neural Network Approach

The huge volume of audio data available on public networks such as the Internet requires efficient indexing and annotation mechanisms before it can be accessed. The non-stationary nature of audio signals and the discontinuities in them have made segmentation and classification of audio difficult. Because optimal features are also hard to extract and select, automatic music classification and annotation remain challenging tasks. Audio classification and retrieval systems are used in application areas such as speaker recognition, gender classification, and music genre classification. One of the major challenges in developing audio retrieval systems is identifying appropriate content-based features to represent audio signals. Hence, we propose a solution that segments audio signals, extracts their features, and classifies them.
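The abstract names MFCC feature extraction but gives no parameters. As a minimal sketch of one common MFCC recipe (Hamming window, power spectrum, triangular mel filter bank, logarithm, DCT-II), the Python below computes coefficients for a single frame. The HTK-style mel formula, the 26 filters, the 13 coefficients, the 1e-10 log floor and all function names are assumptions for illustration, not the paper's settings; the naive DFT keeps the example dependency-free but is far slower than an FFT.

```python
import cmath
import math


def hz_to_mel(f):
    # HTK-style mel formula (one of several variants in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    # Naive O(N^2) DFT; returns |X[k]|^2 / N for bins k = 0..N/2 of a real frame.
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with edges equally spaced on the mel scale, 0 Hz to Nyquist.
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            filt[k] = (k - left) / (centre - left)      # rising edge
        for k in range(centre, right):
            filt[k] = (right - k) / (right - centre)    # falling edge
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate, n_filters=26, n_ceps=13):
    n = len(frame)
    # Hamming window tapers the frame edges to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = power_spectrum(windowed)
    # Energy in each mel band, then log compression (small floor avoids log(0)).
    log_energies = [math.log(sum(w * p for w, p in zip(filt, spectrum)) + 1e-10)
                    for filt in mel_filterbank(n_filters, n, sample_rate)]
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_ceps)]
```

For example, `mfcc(frame, 8000)` on a 256-sample frame returns 13 coefficients; stacking these per frame gives the feature vectors that a neural network (or any other classifier) would consume.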

Content-Based Audio Classification and Retrieval: A Novel Approach

The amount of audio data on public networks such as the Internet grows by a huge volume daily, so these media must be efficiently indexed and annotated before they can be accessed. Because audio signals are non-stationary and contain discontinuities, segmenting and classifying them is a challenging task. Automatic music classification and annotation is also challenging because optimal audio features are difficult to extract and select. Today, content-based audio retrieval systems are used in various application domains and scenarios, such as music retrieval, speech recognition, and acoustic surveillance. A major challenge in developing an audio retrieval system is identifying appropriate content-based features to represent the audio signals under consideration. This paper gives an overview of various techniques used for audio classification and retrieval and also proposes a novel approach for classifying and retrieving audio signals.
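A standard way to handle the non-stationarity mentioned above, though the abstract does not say which method the paper uses, is short-time analysis: cut the signal into short overlapping frames within which it is treated as roughly stationary, then compute features per frame. A minimal sketch, assuming a hypothetical `frame_signal` helper that drops incomplete trailing samples:

```python
def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames for short-time analysis.

    Trailing samples that do not fill a whole frame are dropped (an assumed policy;
    padding the last frame is an equally common choice).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]
```

With 16 kHz audio, 25 ms frames and a 10 ms hop correspond to `frame_signal(samples, 400, 160)`, which yields 98 frames for one second of audio. These durations are common choices, not values taken from the paper.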

Review on Design and Implementation of Audio Signal Classification System to classify the Media in Speech/Music

Over the last few years, considerable effort has been made to develop methods for extracting information from audiovisual media so that they can be stored in and retrieved from databases automatically. Audio classification is a fundamental step in coping with the rapid growth in audio data volume. Automatic audio classification is very useful in content-based audio retrieval and online audio distribution. Classification accuracy depends on the effectiveness of both the features and the classification scheme. In this work, both time-domain and frequency-domain features are extracted from the input signal: the time-domain feature is the root mean square (RMS), and the frequency-domain feature is spectral flux. Feature extraction is followed by classification. The selection of the important features is explained, and the classifiers used for classification are compared.
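The two features named above can be sketched per frame as follows. Spectral flux has several published variants (L1 or L2 distance, with or without normalisation or half-wave rectification); this sketch uses the sum of squared differences between consecutive magnitude spectra, which is an assumption rather than the paper's stated definition. The naive DFT and all function names are illustrative.

```python
import cmath
import math


def rms(frame):
    # Root mean square amplitude: a time-domain loudness measure.
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    # Naive DFT magnitudes for bins 0..N/2 (adequate for short illustrative frames).
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectral_flux(prev_frame, frame):
    # Squared L2 distance between the magnitude spectra of consecutive frames.
    prev_mag = magnitude_spectrum(prev_frame)
    mag = magnitude_spectrum(frame)
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))


def frame_features(frames):
    # One (RMS, flux) pair per frame after the first, since flux needs a previous frame.
    return [(rms(cur), spectral_flux(prev, cur)) for prev, cur in zip(frames, frames[1:])]
```

Per-frame values from `frame_features` (or statistics of them over a segment) would then form the input to the classifiers being compared.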

Classification of general audio data for content-based retrieval

Pattern Recognition Letters, 2001

In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy compared to temporal and spectral features. To minimize the classification errors near the boundaries of audio segments of different types in general audio data, a segmentation-pooling scheme is also proposed in this work. This scheme yields classification results that are consistent with human perception. Our classification system provides over 90% accuracy at a processing speed dozens of times faster than the playing rate.
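The abstract reports that LPC features performed well but does not say how they were computed. One standard route is the autocorrelation method solved with the Levinson-Durbin recursion, sketched below in predictor form, x[n] ≈ a1·x[n-1] + ... + ap·x[n-p]. The sign convention and names are assumptions; some toolkits instead return the coefficients of the inverse filter 1 - Σ a_k z^-k. The recursion divides by the prediction error, so an all-zero frame (r[0] = 0) must be handled separately.

```python
def autocorrelation(frame, max_lag):
    # r[lag] = sum_t x[t] * x[t + lag] for lag = 0..max_lag (unnormalised).
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (coeffs, err): coeffs[k-1] weights x[n-k] in the linear predictor of x[n].

    Requires r[0] > 0; err is the final prediction-error power.
    """
    a = [0.0] * (order + 1)   # a[0] unused; a[j] is the weight of x[n-j]
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err
```

As a check, the autocorrelation sequence [1, 0.5, 0.25] of a first-order process with coefficient 0.5 gives order-2 coefficients [0.5, 0.0] and a prediction error of 0.75.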

Content-Based Classification, Search, and Retrieval of Audio

2013

Multimedia applications would benefit from the ability to classify and search for audio based on its characteristics. The audio analysis, search, and classification engine described here reduces sounds to perceptual and acoustical features. This lets users search or retrieve sounds by any one feature or a combination of them, by specifying previously learned classes based on these features, or by selecting or entering reference sounds and asking the engine to retrieve similar or dissimilar sounds.
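Query by example can be sketched as ranking stored sounds by their distance from a reference sound's feature vector: nearest first for similar sounds, farthest first for dissimilar ones. The abstract does not give the engine's distance measure; the weighted Euclidean distance, the dictionary library format and the names below are assumptions. Setting a weight to zero restricts the search to a subset of features, mirroring search by one feature or a combination of them.

```python
import math


def query_by_example(query, library, k=3, dissimilar=False, weights=None):
    """Rank library sounds by weighted Euclidean distance from a query feature vector.

    library: dict mapping a sound name to its feature vector (hypothetical format).
    weights: optional per-feature weights; a zero weight ignores that feature.
    dissimilar: if True, return the farthest sounds instead of the nearest.
    """
    w = weights if weights is not None else [1.0] * len(query)

    def distance(vec):
        return math.sqrt(sum(wi * (q - v) ** 2 for wi, q, v in zip(w, query, vec)))

    ranked = sorted(library, key=lambda name: distance(library[name]), reverse=dissimilar)
    return ranked[:k]
```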

Audio Analysis and Classification: A Review

International Journal of Research in Advent Technology, 2019

Emotion plays a vital role in communication, as emotions and gestures account for 80% of communication. Emotion recognition and classification are now used in many areas to understand human feelings, such as robotics, health care, the military, home automation, hands-free computing, mobile telephony, video games, call-center systems, and marketing. Speech emotion recognition (SER) can improve interaction between machines and humans. Various algorithms, and combinations of algorithms, are available to recognize and classify audio according to its emotion. In this paper, we investigate significant works, their techniques, the impact of their approaches, and the scope for correcting their results.

Methods of automatic audio content classification

2007

This study presents an overview of digital signal processing and pattern recognition methods that are frequently applied to the automatic recognition, classification and description of audio content. Strategies for combining these methods are also discussed. Several published practical applications from different areas are cited to illustrate the use of the basic methods and the combined recognition strategies. A brief overview of human auditory perception is also given, with emphasis on the aspects that are important for audio recognition.

A General Audio Classifier based on human perception motivated model

The audio channel conveys rich clues for content-based multimedia indexing. Interesting audio analysis tasks include, besides the widely known speech recognition and speaker identification problems, speech/music segmentation, speaker gender detection, and special-effect recognition such as gunshots or car chases. All these problems can be considered audio classification problems, which need to generate a label from low-level audio signal analysis. While most audio analysis techniques in the literature are problem-specific, we propose in this paper a general framework for audio classification. The proposed technique uses a perceptually motivated model of how humans perceive audio classes, in the sense that it makes judicious use of certain psychophysical results, and relies on a neural network for classification.
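The abstract says only that the framework relies on a neural network for classification. As a generic illustration, not the paper's architecture, the sketch below runs the forward pass of a small fully connected network (ReLU hidden layers, softmax output) on a feature vector and returns the most probable label. Training is omitted, and the layer format and names are hypothetical; in practice the weights would be learned from labelled audio.

```python
import math


def dense(x, weights, bias):
    # weights[i][j] is the weight from input j to output unit i.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, layers, labels):
    """Forward pass of a small fully connected network.

    layers: list of (weights, bias) pairs; ReLU after hidden layers, softmax at the output.
    Returns (most probable label, class probabilities).
    """
    h = features
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b)
        if i < len(layers) - 1:
            h = relu(h)
    probs = softmax(h)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs
```

With two layers whose weights are 2x2 identity matrices and whose biases are zero, for instance, `classify([2.0, 0.5], layers, ["music", "speech"])` returns `"music"`, because the first output unit has the larger activation.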

Audio Classification Based on MFCC and GMM under Noise for Embedded System

Digital audio applications are now part of our everyday lives. These applications segment the audio stream into categories of audio and respond accordingly to each category. For example, when an IP Network Camera (IPNC) system detects screaming or the sound of a window breaking, it turns its motor towards the source of the abnormal sound. A wide variety of features have been extracted from audio signals in either the temporal or the frequency domain. Of these, Mel-Frequency Cepstral Coefficients (MFCC), which are frequency-transformed and logarithmically scaled, appear to be universally recognized as the most generally effective features for analyzing the human voice. The most common classification methods used for this kind of audio class recognition include Gaussian Mixture Models (GMM), k-Nearest Neighbor (k-NN), Neural Networks (NN), Support Vector Machines (SVM), and Hidden Markov Models (HMM). The choice of classification method has been shown to be largely insignificant. In this paper, we use Gaussian Mixture Models (GMM) to classify the audio signal.
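The GMM classification step can be sketched as follows: each audio class has its own mixture of diagonal-covariance Gaussians over feature vectors (for example MFCC frames), and a segment is assigned to the class with the highest total log-likelihood. Treating frames as independent, the diagonal covariances, the model format and the names are assumptions; fitting the mixtures (normally with the EM algorithm) is not shown, and the toy models in any usage are hand-set placeholders.

```python
import math


def log_gaussian_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance (per-dimension variances).
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """gmm: list of (weight, mean, var) components; log-sum-exp keeps this stable."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def classify_frames(frames, models):
    """Return the class whose GMM gives the highest total log-likelihood, plus all scores.

    models: dict mapping a class name to its GMM. Frames are treated as independent.
    """
    scores = {name: sum(gmm_log_likelihood(f, gmm) for f in frames)
              for name, gmm in models.items()}
    return max(scores, key=scores.get), scores
```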

Applying neural network on the content-based audio classification

Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint

Many audio and multimedia applications would benefit if they could interpret the content of audio rather than relying on descriptions or keywords. These applications include multimedia databases and file systems, digital libraries, automatic segmentation or indexing of video (e.g., news or sports storage), and surveillance. This paper describes a novel content-based audio classification approach based on a neural network and a genetic algorithm. Experiments show that this approach achieves good classification performance.
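The abstract does not say how the genetic algorithm and the neural network are combined. One common pairing, assumed here, uses the genetic algorithm to select a subset of features and scores each candidate subset with a caller-supplied fitness function, such as the validation accuracy of a network trained on those features. The sketch uses truncation selection, one-point crossover, bit-flip mutation and elitism; these operators, the default parameters and the names are illustrative choices, not the paper's.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Evolve binary feature masks (1 = keep the feature) that maximise fitness(mask).

    fitness is supplied by the caller; it stands in for, e.g., the (hypothetical)
    validation accuracy of a neural network trained on the selected features.
    Returns (best_mask, best_fitness, history of best fitness per generation).
    """
    if pop_size < 2 or n_features < 1:
        raise ValueError("need pop_size >= 2 and n_features >= 1")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[:max(2, pop_size // 2)]   # truncation selection
        children = [ranked[0][:]]                  # elitism: best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]              # one-point crossover
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            children.append([1 - g if rng.random() < mutation_rate else g for g in child])
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best), history
```

Because the best mask is always carried over, the recorded best fitness never decreases from one generation to the next.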