A visual-based late-fusion framework for video genre classification
Related papers
An in-depth evaluation of multimodal video genre categorization
2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI), 2013
In this paper we propose an in-depth evaluation of the performance of video descriptors for multimodal video genre categorization. We discuss the perspective of designing appropriate late fusion techniques that would make it possible to attain very high categorization accuracy, close to that achieved with user-based text information. Evaluation is carried out in the context of the 2012 Video Genre Tagging Task of the MediaEval Benchmarking Initiative for Multimedia Evaluation, using a data set of up to 15,000 videos (3,200 hours of footage) and 26 video genre categories specific to web media. Results show that the proposed approach significantly improves genre categorization performance, outperforming other existing approaches. The main contribution of this paper is in the experimental part: several valuable findings are reported that motivate further research on video genre classification.
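The late-fusion idea discussed above can be sketched in a few lines: each modality-specific classifier produces a score per genre, and the fused decision is the genre with the highest weighted average score. The genre names and weights below are illustrative, not taken from the paper.

```python
# Minimal sketch of score-level late fusion: combine per-genre scores
# from several modality classifiers with a weighted average, then pick
# the top-scoring genre. Genres and weights are hypothetical examples.

def late_fusion(modality_scores, weights):
    """modality_scores: list of dicts {genre: score}, one per modality."""
    fused = {}
    for scores, w in zip(modality_scores, weights):
        for genre, s in scores.items():
            fused[genre] = fused.get(genre, 0.0) + w * s
    return max(fused, key=fused.get), fused

visual = {"news": 0.6, "sports": 0.3, "music": 0.1}
audio = {"news": 0.2, "sports": 0.7, "music": 0.1}
pred, fused = late_fusion([visual, audio], weights=[0.5, 0.5])
print(pred)  # "sports"
```

In practice the weights would be tuned on a validation set, which is where such schemes typically gain over any single modality.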
Video genre categorization and representation using audio-visual information
2012
We propose an audiovisual approach to video genre classification using content descriptors that exploit audio, color, temporal, and contour information. Audio information is extracted at block level, which has the advantage of capturing local temporal information. At the temporal structure level, we consider action content in relation to human perception. Color perception is quantified using statistics of color distribution, elementary hues, color properties, and relationships between colors. Further, we compute statistics of contour geometry and relationships. The main contribution of our work lies in harnessing the descriptive power of the combination of these descriptors in genre classification. Validation was carried out on over 91 hours of video footage encompassing 7 common video genres, yielding average precision and recall ratios of 87%−100% and 77%−100%, respectively, and an overall average correct classification of up to 97%. Also, experimental comparison as part of the MediaEval 2011 benchmarking campaign demonstrated the superiority of the proposed audiovisual descriptors over other existing approaches. Finally, we discuss a 3D video browsing platform that displays movies using feature-based coordinates and thus regroups them according to genre.
Content-Based Video Description for Automatic Video Genre Categorization
Lecture Notes in Computer Science, 2012
In this paper, we propose an audio-visual approach to video genre categorization. Audio information is extracted at block level, which has the advantage of capturing local temporal information. At the temporal structural level, we assess action content with respect to human perception. Further, color perception is quantified with statistics of color distribution, elementary hues, color properties, and relationships between colors. The last category of descriptors determines statistics of contour geometry. An extensive evaluation of this multi-modal approach based on more than 91 hours of video footage is presented. We obtain average precision and recall ratios within [87% − 100%] and [77% − 100%], respectively, while average correct classification is up to 97%. Additionally, movies displayed according to feature-based coordinates in a virtual 3D browsing environment tend to regroup with respect to genre, which has potential application in real content-based browsing systems.
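The block-level audio extraction both abstracts mention can be illustrated with a minimal sketch: the signal is split into short, overlapping blocks and a statistic is computed per block, so local temporal structure survives until a later aggregation step. The block and hop sizes here are illustrative, not the paper's actual parameters.

```python
# Hedged sketch of block-level audio description: compute a per-block
# statistic (RMS energy here) over short overlapping windows, keeping
# local temporal information. Block/hop sizes are made-up examples.
import math

def block_rms(signal, block_size=4, hop=2):
    """Return RMS energy of each overlapping block of the signal."""
    blocks = []
    for start in range(0, len(signal) - block_size + 1, hop):
        block = signal[start:start + block_size]
        blocks.append(math.sqrt(sum(x * x for x in block) / block_size))
    return blocks

sig = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
print(block_rms(sig))
```

A genre classifier would then summarize these per-block values (mean, variance, and so on) rather than collapsing the whole clip into a single global statistic.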
Effectively leveraging Multi-modal Features for Movie Genre Classification
2022
Movie genre classification has been widely studied in recent years due to its various applications in video editing, summarization, and recommendation. Prior work has typically addressed this task by predicting genres based solely on the visual content. As a result, predictions from these methods often perform poorly for genres such as documentary or musical, since non-visual modalities like audio or language play an important role in correctly classifying these genres. In addition, the analysis of long videos at frame level is always associated with high computational cost and makes the prediction less efficient. To address these two issues, we propose a Multi-Modal approach leveraging shot information, MMShot, to classify video genres in an efficient and effective way. We evaluate our method on MovieNet and Condensed Movies for genre classification, achieving 17%∼21% improvement on mean Average Precision (mAP) over the state-of-the-art. Extensive experiments are conducted to demonstrate the ability of MMShot for long video analysis and uncover the correlations between genres and multiple movie elements. We also demonstrate our approach's ability to generalize by evaluating the scene boundary detection task, achieving 1.1% improvement on Average Precision (AP) over the state-of-the-art.
A multimodal approach for multi-label movie genre classification
Multimedia Tools and Applications, 2020
Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. The number of movie consumers interested in taking advantage of automatic movie genre classification is growing rapidly thanks to the popularization of media streaming service providers. In this paper, we addressed the multi-label classification of movie genres in a multimodal way. For this purpose, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters taken from 152,622 movie titles from The Movie Database (TMDb). The dataset was carefully curated and organized, and it was also made available as a contribution of this work. Each movie of the dataset was labeled according to a set of eighteen genre labels. We extracted features from these data using different kinds of descriptors, namely Mel Frequency Cepstral Coefficients (MFCCs), Statistical Spectrum Descriptor (SSD), Local Binary Pattern (LBP) with spectrograms, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN). The descriptors were evaluated using different classifiers, such as BinaryRelevance and ML-kNN. We have also investigated the performance of the combination of different classifiers/features using a late fusion strategy, which obtained encouraging results. Based on the F-Score metric, our best result, 0.628, was obtained by the fusion of a classifier created using LSTM on the synopses, and a classifier created using CNN on movie trailer frames. When considering the AUC-PR metric, the best result, 0.673, was also achieved by combining those representations, but in addition, a classifier based on LSTM created from the subtitles was used. These results corroborate the existence of complementarity among classifiers based on different sources of information in this field of application.
As far as we know, this is the most comprehensive study developed in terms of the diversity of multimedia sources of information to perform movie genre classification.
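For the multi-label setting described above, late fusion works per label rather than per clip: probabilities from two classifiers (say, one on synopses and one on trailer frames) are averaged for each genre, and every genre whose fused probability clears a threshold is predicted. The labels, scores, and 0.5 threshold below are illustrative, not the paper's.

```python
# Illustrative multi-label late fusion: average per-label probabilities
# from two hypothetical classifiers, then predict all labels above a
# threshold. Labels and probabilities are made-up examples.

def fuse_multilabel(probs_a, probs_b, threshold=0.5):
    fused = {lbl: (probs_a[lbl] + probs_b[lbl]) / 2 for lbl in probs_a}
    return sorted(lbl for lbl, p in fused.items() if p >= threshold)

synopsis_clf = {"drama": 0.8, "comedy": 0.3, "horror": 0.1}
trailer_clf = {"drama": 0.6, "comedy": 0.7, "horror": 0.2}
print(fuse_multilabel(synopsis_clf, trailer_clf))  # ['comedy', 'drama']
```

The complementarity the authors report shows up exactly here: a label missed by one modality (comedy from the synopsis alone) can still be recovered once the other modality's evidence is averaged in.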
Automatic genre identification for content-based video categorization
Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, 2000
This paper presents a set of computational features originating from our study of editing effects, motion, and color used in videos, for the task of automatic video categorization. These features, besides representing human understanding of typical attributes of different video genres, are also inspired by the techniques and rules used by many directors to endow specific characteristics to a genre program, which lead to a certain emotional impact on viewers. We propose new features whilst also employing traditionally used ones for classification. This research goes beyond the existing work with a systematic analysis of trends exhibited by each of our features in genres such as cartoons, commercials, music, news, and sports, and it enables an understanding of the similarities, dissimilarities, and also likely confusion between genres. Classification results from our experiments on several hours of video establish the usefulness of this feature set. We also explore the issue of video clip duration required to achieve reliable genre identification and demonstrate its impact on classification accuracy.
Content-based Automatic Video Genre Identification
International Journal of Advanced Computer Science and Applications, 2019
Video content is evolving enormously with the heavy usage of the internet and social media websites. Proper searching and indexing of such video content is a major challenge. Existing video search relies heavily on the information provided by the user, such as the video caption, description, and subsequent comments on the video. In such cases, if users provide insufficient or incorrect information about the video genre, the video may not be indexed correctly and may be ignored during search and retrieval. This paper proposes a mechanism to understand the contents of a video and categorize it as Music Video, Talk Show, Movie/Drama, Animation, or Sports. For video classification, the proposed system uses audio features such as signal energy, zero-crossing rate, and spectral flux, and visual features such as shot boundaries, scene count, and actor motion. The system is tested on popular Hollywood, Bollywood, and YouTube videos, achieving an accuracy of 96%.
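The three audio features named above have compact textbook definitions, sketched here on a plain list of samples. The frame below is an illustrative toy signal; a real system would use an FFT library rather than the naive DFT written out for self-containment.

```python
# Sketch of short-time energy, zero-crossing rate, and spectral flux.
# The naive DFT is for self-containment only; real code would use FFTs.
import cmath

def energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs that change sign."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def dft_magnitudes(frame):
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame))) for k in range(n // 2)]

def spectral_flux(frame_prev, frame_cur):
    """Squared change in spectral magnitude between consecutive frames."""
    a, b = dft_magnitudes(frame_prev), dft_magnitudes(frame_cur)
    return sum((u - v) ** 2 for u, v in zip(b, a))

frame = [0.5, -0.5, 0.5, -0.5]
print(energy(frame), zero_crossing_rate(frame))
```

High zero-crossing rate tends to separate noisy or unvoiced audio from music, and spikes in spectral flux mark onsets, which is why such features help distinguish, for example, music videos from talk shows.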
Automatic Video Genre Classification
Video classification has been an active research area for many years. Video classification algorithms can be broadly divided into two types. The first type is a category-specific video classifier, which classifies videos from a particular category, such as sports, into subcategories such as tennis or baseball. The second type is a generic video classifier, which classifies videos into generic categories such as sports, commercials, news, and animation. This work aims at generic video classification and exploits motion information and a cross-correlation measure for classification.
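A cross-correlation measure of the kind mentioned above can be sketched as a normalized correlation between two feature sequences, for instance per-frame motion-activity signals of two clips; values near 1 indicate similar temporal patterns. The sequences below are illustrative, not data from the paper.

```python
# Minimal sketch of a normalized cross-correlation measure between two
# feature sequences (e.g. per-frame motion activity of two clips).
# The example sequences are hypothetical.
import math

def normalized_cross_correlation(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

clip_a = [1.0, 2.0, 3.0, 4.0]
clip_b = [2.0, 4.0, 6.0, 8.0]
print(normalized_cross_correlation(clip_a, clip_b))  # 1.0
```

Comparing an unseen clip's motion signal against genre templates this way gives a simple similarity score that is invariant to overall activity level.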
International Journal of Multimedia Information Retrieval, 2013
This paper presents a genre-specific modeling strategy capable of improving content-based video classification and the speed of data retrieval operations. With the ever-increasing growth of video data, it is important to classify video shots into groups based on their content. For that reason, it is of primary concern to design systems that can automatically classify videos into different genres based on their content. We consider the genre recognition task as a classification problem. We use support vector machines to perform the classification task and propose an improved video classification method. The experimental results show that genre-specific modeling of features can significantly improve performance. Results have been compared with two contemporary works on video classification to demonstrate the superiority of our proposed framework.
2008 IEEE International Symposium on Consumer Electronics, 2008
We present a new approach for classifying MPEG-2 video sequences as 'cartoon', 'commercial', 'music', 'news', or 'sport' by analyzing specific, high-level audio-visual features of consecutive frames in real time. This is part of the well-known video genre classification problem, where popular TV-broadcast genres are studied. Such applications have also been discussed in the context of MPEG-7 [1]. In our method the extracted features are logically combined using a set of classifiers to produce a reliable recognition. The results demonstrate a high identification rate based on a large representative collection of 100 video sequences (20 sequences per genre) gathered from free digital TV broadcasting in Europe.