Developing Audio Processing Agents for Multi-Agent MPEG-7 Enabled Environment

The Community of Multimedia Agents project

Proceedings. IEEE International Conference on Multimedia and Expo, 2002

Challenges in multimedia analysis are calling for the sharing of research efforts, while in practice collaboration is hindered by technical and proprietary issues. The Community of Multimedia Agents project (COMMA) attempts to solve this problem by creating an open environment ...

From Multimedia to the Semantic Web using MPEG7 and Computational Intelligence

2004

We present an architecture that provides semantic Web annotations of sound clips described by MPEG-7 audio descriptions. The great flexibility of the MPEG-7 standard makes it especially difficult to compare descriptions coming from heterogeneous sources. To cope with this, the architecture first obtains "normalized" versions of the audio descriptions using different adaptation techniques. Once in a "normalized" format, descriptions can then be projected into uniform and semantically relevant vector spaces, ready to be processed by a variety of well-known computational intelligence techniques. Once higher-level semantic results are available, they can be exported as interoperable (RDF) annotations about the resource that was originally fed into the system. As a novel aspect, through the use and interchange of MPEG-7 descriptions, the framework allows building applications (e.g. classifiers) that can provide annotations on distributed audio resource sets.
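The flow described in this abstract (normalized description, projection into a fixed vector space, classification, RDF export) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the chunk-mean projection, the nearest-centroid classifier, the centroid values, and the http://example.org/audio#hasClass predicate are all assumptions chosen for illustration, and the input is assumed to be a descriptor series that has already been extracted and normalized from an MPEG-7 description.

```python
from statistics import mean, pstdev


def project(series, bins=4):
    """Map a variable-length descriptor series to a fixed-length vector.

    Hypothetical projection: the mean of each of `bins` contiguous chunks,
    followed by the population standard deviation of the whole series.
    """
    n = len(series)
    chunks = [series[i * n // bins:(i + 1) * n // bins] for i in range(bins)]
    return [mean(c) if c else 0.0 for c in chunks] + [pstdev(series)]


def nearest_label(vector, centroids):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(vector, centroids[label]))


def to_ntriple(resource_uri, label):
    """Serialize one annotation as an N-Triples line (the predicate is made up)."""
    return f'<{resource_uri}> <http://example.org/audio#hasClass> "{label}" .'


# Illustrative centroids; a real system would learn these from training data.
centroids = {
    "quiet": [0.1, 0.1, 0.1, 0.1, 0.0],
    "loud": [0.9, 0.9, 0.9, 0.9, 0.0],
}

# Two series of different lengths end up in the same 5-dimensional space.
short_clip = [0.1] * 8
long_clip = [0.85] * 10

label = nearest_label(project(short_clip), centroids)
print(to_ntriple("http://example.org/clips/1", label))
```

Because every clip is reduced to a vector of the same length, descriptions from heterogeneous sources become directly comparable, which is the point of the normalization and projection steps.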

Cooperating experts for soundtrack analysis of MPEG movies

Information Fusion, 2002

In this paper, we propose a multi-expert classification system (MES) for the audio classification of MPEG movies. The system has been designed according to a hybrid architecture which is made of three cascaded stages and constitutes an ensemble of different classifiers, each one implemented by means of a multi-expert architecture. Classification of the audio tracks exploits four pure classes (music, speech, silence and noise) plus three hybrid classes associated with complex patterns resulting from the overlap of different components (e.g., speech overlapped with music or noise). The soundtracks of 30 movies selected from various genres have been used for building a wide database of samples and for the subsequent assessment of system performance. A significant number of experimental results obtained by the proposed MES, by other classification systems using a single classifier, and by another MES using a parallel fusion scheme are reported in the paper together with comments and comparative analyses. In addition, the paper demonstrates how knowledge arising from an analysis of intermediate classification results can be applied to guide the definition of the MES architecture. The results achieved by using our system are extremely encouraging when compared with those obtained by the other MES.
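The idea of cascaded stages, each built from several experts, can be sketched as follows. This is a toy illustration under stated assumptions, not the system from the paper: the abstract does not say how each stage combines its experts or what each stage decides, so majority voting, the two stages shown (the paper uses three), the "energy" and "zcr" features, and all thresholds are hypothetical.

```python
from collections import Counter


def threshold_expert(feature, limit, below, above=None):
    """Build a toy expert: vote `below` if the feature is under `limit`, else `above`.

    A vote of None means "no decision at this stage".
    """
    return lambda sample: below if sample[feature] < limit else above


def majority_vote(experts, sample):
    """Combine the experts of one stage by majority vote.

    Ties resolve to the vote that was cast first.
    """
    votes = Counter(expert(sample) for expert in experts)
    return votes.most_common(1)[0][0]


def cascade(stages, sample):
    """Run the stages in order; the first stage that reaches a decision wins."""
    for experts in stages:
        decision = majority_vote(experts, sample)
        if decision is not None:
            return decision
    return "unclassified"


# Hypothetical stages: stage 1 detects silence, stage 2 separates music from speech.
stages = [
    [threshold_expert("energy", t, "silence") for t in (0.01, 0.02, 0.03)],
    [threshold_expert("zcr", t, "music", "speech") for t in (0.08, 0.10, 0.12)],
]

print(cascade(stages, {"energy": 0.015, "zcr": 0.50}))
print(cascade(stages, {"energy": 0.500, "zcr": 0.09}))
```

In a cascade, a sample that an early stage settles never reaches the later experts. A parallel fusion scheme, the alternative the abstract compares against, would instead consult all experts at once and merge their votes in a single step.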