A General Audio Classifier based on human perception motivated model

Applying neural network on the content-based audio classification

Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint

Many audio and multimedia applications would benefit if they could interpret the content of audio rather than relying on descriptions or keywords. These applications include multimedia databases and file systems, digital libraries, automatic segmentation or indexing of video (e.g., news or sports storage), and surveillance. This paper describes a novel content-based audio classification approach based on a neural network and a genetic algorithm. Experiments show that this approach achieves good classification performance.

Automatic Discrimination in Audio Documents

In this paper, we present a content-based classification approach for audio indexing. The classification is based on low-level audio features such as temporal sound, energy, fundamental frequency, zero crossing rate, and autocorrelation curve, and on transformations such as the Short Time Fourier Transform (STFT). Classifying audio into classes such as music, noise, silence, and speech is an efficient indexing method, because it restricts each search to the suitable classes.
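Two of the low-level features named above, short-time energy and zero crossing rate, can be sketched in a few lines. This is only an illustration under assumptions, not the authors' implementation: the function names, the frame length of 400 samples, the hop of 200 samples, and the 100 Hz test tone are all hypothetical choices.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Hypothetical test input: one second of a 100 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

frames = frame_signal(tone, frame_len=400, hop=200)
features = [(short_time_energy(f), zero_crossing_rate(f)) for f in frames]
```

Each frame yields one (energy, zero crossing rate) pair. A classifier of the kind described in the abstract would take such per-frame values, usually alongside spectral features, as its input.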

Methods of automatic audio content classification

2007

This study presents an overview of different methods of digital signal processing and pattern recognition that are frequently applicable to automatic recognition, classification and description of audio content. Moreover, strategies for combining these methods are discussed. Some of the published practical applications from different areas are cited to illustrate the use of the basic methods and the combined recognition strategies. A brief overview of human auditory perception is also given, with emphasis on the aspects that are important for audio recognition.

Automatic Classification of Audio Data

Systems, Man and …, 2005

In this paper, a novel content-based musical genre classification approach that uses a combination of classifiers is proposed. First, musical surface features and beat-related features are extracted from different segments of digital music in MP3 format. Three 15-dimensional feature vectors are extracted from three different parts of a music clip, and three different classifiers are trained with these feature vectors. In classification mode, the outputs provided by the individual classifiers are combined using a majority vote rule. Experimental results show that the proposed approach, which combines the outputs of the classifiers, achieves a higher correct musical genre classification rate than single feature vectors and single classifiers.
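The majority vote rule described above can be illustrated with a minimal sketch. The three stand-in classifiers, their thresholds, the genre labels, and the tie-breaking rule (the first label seen wins) are assumptions made for the example; the abstract does not specify them.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; on a tie, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]

def classify_clip(segment_features, classifiers):
    """Apply one classifier per segment and combine the outputs by majority vote."""
    votes = [clf(feat) for clf, feat in zip(classifiers, segment_features)]
    return majority_vote(votes), votes

# Hypothetical stand-ins for three trained classifiers, one per clip segment.
# Each maps a 15-dimensional feature vector to a genre label.
classifiers = [
    lambda v: "rock" if sum(v) > 5.0 else "jazz",
    lambda v: "rock" if max(v) > 0.8 else "jazz",
    lambda v: "rock" if v[0] > 0.5 else "jazz",
]

# Hypothetical feature vectors for the three segments of one clip.
segments = [[0.4] * 15, [0.3] * 15, [0.6] * 15]
genre, votes = classify_clip(segments, classifiers)
```

With three voters and two labels a tie cannot occur; with more labels, the tie-breaking rule becomes a design choice that should be made explicit.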

An Overview on Perceptually Motivated Audio Indexing and Classification

Proceedings of the IEEE, 2000

An audio indexing system aims at describing audio content by identifying, labeling or categorizing different acoustic events. Since the resulting audio classification and indexing is meant for direct human consumption, it is highly desirable that it produces perceptually relevant results. This can be obtained by integrating specific knowledge of the human auditory system in the design process to varying extents. In this paper, we highlight some of the important concepts used in audio classification and indexing that are perceptually motivated or that exploit some principles of perception. In particular, we discuss several different strategies to integrate human perception including 1) the use of generic audition models, 2) the use of perceptually relevant features for the analysis stage that are perceptually justified either as a component of a hearing model or as being correlated with a perceptual dimension of sound similarity, and 3) the involvement of the user in the audio indexing or classification task. In the paper, we also illustrate some of the recent trends in semantic audio retrieval that approximate higher level perceptual processing and cognitive aspects of human audio recognition capabilities including affect-based audio retrieval.

Audio feature extraction and analysis for scene classification

Proceedings of First Signal Processing Society Workshop on Multimedia Signal Processing

Understanding the scene content of a video sequence is very important for content-based indexing and retrieval of multimedia databases. Research in this area in the past several years has focused on the use of speech recognition and image analysis techniques. As a complementary effort to the prior work, we have focused on using the associated audio information (mainly the nonspeech portion) for video scene analysis. As an example, we consider the problem of discriminating five types of TV programs, namely commercials, basketball games, football games, news reports, and weather forecasts. A set of low-level audio features is proposed for characterizing the semantic content of short audio clips. The linear separability of different classes under the proposed feature space is examined using a clustering analysis. The effective features are identified by evaluating the intracluster and intercluster scattering matrices of the feature space. Using these features, a neural net classifier was successful in separating the above five types of TV programs. By evaluating the changes between the feature vectors of adjacent clips, we can also identify scene breaks in an audio sequence quite accurately. These results demonstrate the capability of the proposed audio features for characterizing the semantic content of an audio sequence. where s_n(i) is the i-th sample in the n-th audio frame and N is the frame length.
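The idea of detecting scene breaks from changes between the feature vectors of adjacent clips can be sketched as follows. The abstract does not state which distance measure or decision rule the authors used, so the Euclidean distance, the fixed threshold, and the toy feature values here are assumptions for illustration only.

```python
import math

def scene_breaks(clip_features, threshold):
    """Return each index i where clip i differs from clip i-1 by more than threshold."""
    breaks = []
    for i in range(1, len(clip_features)):
        # Euclidean distance between consecutive feature vectors (assumed measure).
        dist = math.dist(clip_features[i - 1], clip_features[i])
        if dist > threshold:
            breaks.append(i)
    return breaks

# Hypothetical two-dimensional feature vectors for four consecutive clips.
clips = [(0.10, 0.20), (0.12, 0.21), (0.90, 0.80), (0.88, 0.79)]
detected = scene_breaks(clips, threshold=0.5)
```

Here only the jump between the second and third clips exceeds the threshold, so a single break is reported at index 2. In practice the threshold would have to be tuned on labeled data.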

CONTENT BASED AUDIO CLASSIFICATION USING ARTIFICIAL NEURAL NETWORK TECHNIQUES

IAEME PUBLICATION, 2018

Audio signals, which include speech, music and environmental sounds, are important types of media. The problem of separating audio signals into these different audio types is thus becoming increasingly significant. A human listener can easily distinguish between different audio types by just listening to a short segment of an audio signal. However, solving this problem using computers has proven to be very difficult. Nevertheless, many systems with modest accuracy could still be implemented. The experimental results demonstrate the effectiveness of our classification system. The complete system is developed using ANN techniques with an autonomic computing system.

Classification of general audio data for content-based retrieval

Pattern Recognition Letters, 2001

In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequency cepstral coefficients (MFCC) and linear prediction coefficients (LPC) provide better classification accuracy compared to temporal and spectral features. To minimize the classification errors near the boundaries of audio segments of different type in general audio data, a segmentation-pooling scheme is also proposed in this work. This scheme yields classification results that are consistent with human perception. Our classification system provides over 90% accuracy at a processing speed dozens of times faster than the playing rate.

A generic audio classification and segmentation approach for multimedia indexing and retrieval

2006

We focus on generic and automatic audio classification and segmentation for audio-based multimedia indexing and retrieval applications. In particular, we present a fuzzy approach toward a hierarchic audio classification and global segmentation framework based on automatic audio analysis, providing robust, bi-modal, efficient and parameter-invariant classification over global audio segments. The input audio is split into segments, which are classified as speech, music, fuzzy or silent.