Real-time detection of sport in MPEG-2 sequences using high-level AV-descriptors and SVM

An automatic system for real-time video-genres detection using high-level-descriptors and a set of classifiers

2008 IEEE International Symposium on Consumer Electronics, 2008

We present a new approach for classifying MPEG-2 video sequences as 'cartoon', 'commercial', 'music', 'news' or 'sport' by analyzing specific, high-level audio-visual features of consecutive frames in real-time. This is part of the well-known video-genre-classification problem, where popular TV-broadcast genres are studied. Such applications have also been discussed in the context of MPEG-7 [1]. In our method the extracted features are logically combined by a set of classifiers to produce a reliable recognition. The results demonstrate a high identification rate on a large, representative collection of 100 video sequences (20 per genre) gathered from free digital TV broadcasting in Europe.

A Generic Approach for Systematic Analysis of Sports Videos

ACM Transactions on Intelligent Systems and Technology, 2012

Various innovative and original works have been proposed in the field of sports video analysis. However, individual works have focused on sophisticated methodologies for particular sport types, and the field has lacked a scalable and holistic framework. This paper addresses that issue and presents a systematic and generic approach, evaluated on a relatively large-scale collection of sports videos. The system targets the event-detection scenario for an input video through an orderly sequential process. Initially, domain-knowledge-independent local descriptors are extracted homogeneously from the input video sequence. A video representation is then created by adopting a bag-of-visual-words (BoW) model. The video's genre is first identified by applying k-nearest neighbor (k-NN) classifiers to this representation, with various dissimilarity measures assessed and evaluated analytically. Subsequently, an unsupervised probabilistic latent semantic analysis (PLSA) based approach is applied to the same histogram-based representation to characterize each frame of the video sequence as one of four view groups, namely close-up view, mid-view, long-view and outer-field view. Finally, a hidden conditional random field (HCRF) structured prediction model is utilized for interesting-event detection. In the experiments, the k-NN classifier using the KL-divergence measure demonstrates the best genre-categorization accuracy at 82.16%. Supervised SVM and unsupervised PLSA achieve average classification accuracies of 82.86% and 68.13%, respectively. The HCRF model achieves 92.31% accuracy using the unsupervised PLSA-based label input, which is comparable with the supervised SVM-based input at 93.08%. In general, such a systematic approach can be widely applied to processing massive video collections generically.
This article extends the previous work by the authors appearing under the title "Automatic sports genre categorization and view-type classification over large-scale dataset," [Li et al. 2009].
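The genre-identification step described above (k-NN over BoW histograms compared with a KL-divergence measure) can be sketched in a few lines. The histograms, genre labels, and smoothing constant below are invented for illustration and are not the paper's data:

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """Smoothed KL divergence D(p || q) between two normalized histograms."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def knn_genre(query, labelled, k=3):
    """Classify a BoW histogram by majority vote over its k nearest
    neighbours under KL divergence (smaller divergence = more similar)."""
    ranked = sorted(labelled, key=lambda lv: kl_divergence(query, lv[1]))
    votes = Counter(genre for genre, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy, already-normalized BoW histograms for three hypothetical genres.
train = [
    ("soccer", [0.6, 0.3, 0.1]),
    ("soccer", [0.5, 0.4, 0.1]),
    ("tennis", [0.1, 0.2, 0.7]),
    ("tennis", [0.2, 0.1, 0.7]),
    ("rugby",  [0.3, 0.6, 0.1]),
]
print(knn_genre([0.55, 0.35, 0.10], train, k=3))  # → soccer
```

Note that KL divergence is asymmetric; the paper evaluates several dissimilarity measures, and this sketch fixes one direction for simplicity.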

Real-time event classification in field sport videos

Signal Processing: Image Communication, 2015

The paper presents a novel approach to real-time event detection in sports broadcasts. We show how the same underlying audiovisual feature-extraction algorithm, based on new global image descriptors, is robust across a range of different sports, alleviating the need to tailor it to a particular sport. In addition, we propose and evaluate three different classifiers for detecting events from these features: a feed-forward neural network, an Elman neural network and a decision tree. Each is investigated and evaluated in terms of its usefulness for real-time event classification. We also provide a ground-truth dataset, together with an annotation technique for performance evaluation of each classifier, useful to others interested in this problem.
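Of the three classifiers, the decision tree is the simplest to illustrate. The toy tree below operates on two hypothetical per-frame features (audio energy and motion magnitude, both normalized to [0, 1]); the thresholds and labels are invented and are not the paper's learned model:

```python
def classify_event(audio_energy, motion):
    """Toy decision tree for field-sport event detection.
    Thresholds are illustrative, not learned from data."""
    if audio_energy > 0.7:      # crowd/commentator excitement
        if motion > 0.5:        # fast camera or player movement
            return "event"
        return "replay"         # loud but visually static
    return "no-event"

print(classify_event(0.9, 0.8))  # → event
print(classify_event(0.2, 0.9))  # → no-event
```

A learned tree would induce such thresholds automatically from annotated training data, but evaluation at test time is exactly this kind of cheap cascade of comparisons, which is what makes the approach attractive for real-time use.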

New Real-Time Approaches for Video-Genre-Classification Using High-Level Descriptors and a Set of Classifiers

2008 IEEE International Conference on Semantic Computing, 2008

In this paper we review recent publications related to video-genre classification and present our improved approaches for classifying video sequences in real-time as 'cartoon', 'commercial', 'music', 'news' or 'sport' by analyzing the content with high-level audio-visual descriptors and classification methods. Such applications have also been discussed in the context of MPEG-7. The results demonstrate identification rates of more than 90% on a large, representative collection of 100 videos gathered from free digital TV and the Internet.

Towards Universal and Statistical-Driven Heuristics for Automatic Classification of Sports Video Events

2006 12th International Multi-Media Modelling Conference, 2006

Researchers worldwide have been actively seeking the most robust and powerful solutions to detect and classify key events (or highlights) in various sports domains. Most approaches have employed manual heuristics that model the typical pattern of audio-visual features within particular sport events. To avoid manual observation and knowledge, machine learning can be used as an alternative approach. To bridge the gaps between these two alternatives, an attempt is made to

Content-Based Video Classification Using Support Vector Machines

Lecture Notes in Computer Science, 2004

In this paper, we investigate the problem of classifying video into predefined genres. The approach is based on spatial and temporal descriptors derived from short video sequences (20 seconds). Using support vector machines (SVMs), we propose an optimized multiclass classification method. Five popular TV broadcast genres, namely cartoon, commercials, cricket, football and tennis, are studied. We tested our scheme on more than 2 hours of video data and achieved an accuracy of 92.5%.
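Multiclass SVM schemes are typically built from binary classifiers via one-vs-one pairwise voting. The sketch below shows that voting combination; to keep it self-contained, each "binary classifier" is replaced by a nearest-centroid rule standing in for a trained binary SVM, and all feature vectors and genre names are invented:

```python
from collections import Counter
from itertools import combinations

def centroid(vectors):
    """Mean feature vector of a class (stand-in for a trained binary SVM)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearer(x, a, b):
    """Binary decision between labelled centroids a = (label, centre)."""
    dist = lambda u, v: sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return a[0] if dist(x, a[1]) < dist(x, b[1]) else b[0]

def one_vs_one_predict(x, class_data):
    """Multiclass prediction by pairwise voting, as in standard
    one-vs-one SVM schemes: every class pair casts one vote."""
    cents = {c: centroid(v) for c, v in class_data.items()}
    votes = Counter(
        nearer(x, (a, cents[a]), (b, cents[b]))
        for a, b in combinations(cents, 2)
    )
    return votes.most_common(1)[0][0]

data = {
    "cartoon":  [[0.9, 0.1], [0.8, 0.2]],
    "cricket":  [[0.1, 0.9], [0.2, 0.8]],
    "football": [[0.5, 0.5], [0.6, 0.4]],
}
print(one_vs_one_predict([0.85, 0.15], data))  # → cartoon
```

With K genres this scheme trains K(K-1)/2 binary classifiers (10 for the paper's five genres); substituting real SVM decision functions for `nearer` recovers the standard construction.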

Automatic Identification of Sports Video Highlights using Viewer Interest Features

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Classification of viewer interest using facial expression and heart rate facilitates automatic identification of interest-evoking video segments. Sports video is suitable for testing the effectiveness of such a system, as it has structured segments with distinguishable highlight events. Previous work has not investigated how viewer-interest characteristics differ from one sport to another, which is crucial for choosing an appropriate classification methodology; it is therefore still unclear whether a universal classification model can be used for analyzing viewer interest across all types of sports. This paper addresses this gap by demonstrating a significant difference (p < 0.05) in the distributions of viewer-interest data for soccer compared to tennis. Based on this finding, the paper proposes adopting Gaussian mixture models (GMMs) to integrate sports-specific and sports-independent approaches for identifying video segments of potential interest to individual viewers. The approaches achieve 52% to 64% accuracy, with the sports-specific approach performing better.
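The GMM-based labelling can be illustrated with a simple maximum-likelihood rule: score a viewer signal under an "interest" mixture and a "neutral" mixture and keep the better-scoring label. All mixture parameters below are invented, and a single scalar (heart-rate change) stands in for the paper's multi-feature input:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_pdf(x, components):
    """Density of a Gaussian mixture; components are (weight, mean, std)."""
    return sum(w * gauss_pdf(x, mu, s) for w, mu, s in components)

def classify_segment(heart_rate_delta, interest_model, neutral_model):
    """Label a segment 'interesting' if the interest GMM explains the
    viewer signal better than the neutral GMM (maximum-likelihood rule)."""
    return ("interesting"
            if gmm_pdf(heart_rate_delta, interest_model) >
               gmm_pdf(heart_rate_delta, neutral_model)
            else "neutral")

# Illustrative mixtures over heart-rate change (bpm); parameters invented.
interest = [(0.6, 8.0, 3.0), (0.4, 15.0, 4.0)]
neutral  = [(1.0, 0.0, 2.5)]
print(classify_segment(10.0, interest, neutral))  # → interesting
```

The sports-specific vs. sports-independent distinction then comes down to whether the mixture parameters are fitted per sport or pooled across all sports.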

Semantic indexing of sports program sequences by audio-visual analysis

Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), 2003

Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to widespread usage of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs which uses both audio and visual information for content characterization. The video signal is processed first by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine a list of video segments where a semantic event of interest is likely to be found, based on the maximum-likelihood criterion. The audio information is then used to refine the results of the video classification procedure by ranking the candidate segments in the list so that segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
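The Markov-chain step can be sketched as follows: quantize the low-level descriptors into discrete states, then score a candidate segment under an event model and a background model and keep the higher log-likelihood. The two-state models and all transition probabilities below are invented for illustration:

```python
import math

def log_likelihood(states, trans, init):
    """Log-likelihood of a quantized descriptor sequence under a
    first-order Markov chain (trans[i][j] = P(next=j | current=i))."""
    ll = math.log(init[states[0]])
    for a, b in zip(states, states[1:]):
        ll += math.log(trans[a][b])
    return ll

# Two illustrative 2-state models over quantized motion descriptors
# (state 0 = low motion, state 1 = high motion); numbers are invented.
goal_model = {"init": [0.2, 0.8],
              "trans": [[0.3, 0.7], [0.2, 0.8]]}   # favours sustained high motion
idle_model = {"init": [0.8, 0.2],
              "trans": [[0.9, 0.1], [0.5, 0.5]]}   # favours low motion

segment = [1, 1, 1, 0, 1, 1]
scores = {name: log_likelihood(segment, m["trans"], m["init"])
          for name, m in [("goal", goal_model), ("idle", idle_model)]}
print(max(scores, key=scores.get))  # → goal
```

In the paper the likelihoods are used to rank candidate segments rather than to make a hard decision, and the audio stream then reorders that ranked list.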

A Statistical-driven Approach for Automatic Classification of Events in AFL Video Highlights

Proceedings of the Twenty-Eighth Australasian Computer Science Conference, 2005

Due to the repetitive and lengthy nature of sports video, automatic content-based summarization is essential to extract a more compact and interesting representation. State-of-the-art approaches have confirmed that high-level semantics in sports video can be detected based on the occurrences of specific audio and visual features (also known as cinematic features). However, most of them still rely heavily on manual investigation to construct the algorithms for highlight detection. Thus, the primary aim of this paper is to demonstrate how the statistics of cinematic features within play-break sequences can be used to construct highlight classification rules less subjectively. To verify the effectiveness of our algorithms, we present experimental results using six AFL (Australian Football League) matches from different broadcasters. At this stage, we have successfully classified each play-break sequence as one of: goal, behind, mark, tackle, or non-highlight. These events are chosen since they are commonly used in broadcast AFL highlights. The proposed algorithms have also been tested successfully on soccer video.

Time interval based modelling and classification of events in soccer video

2003

Multimodal indexing of events in video documents poses problems with respect to representation, inclusion of contextual information, and synchronization of the heterogeneous information sources involved. In this paper we propose to model events in multimodal video by means of time interval relations to tackle the aforementioned problems. Our approach exploits the powerful properties of statistical classifiers such as the Maximum Entropy and Support Vector Machine classifiers. To demonstrate the viability of our approach for event classification in multimodal video, an evaluation was performed on the domain of soccer broadcasts. We found that the amount of video a user has to watch in order to see almost all highlights can be reduced considerably, and that a Support Vector Machine performs better than a Maximum Entropy classifier.
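The time-interval modelling rests on classifying the temporal relation between detected feature intervals (in the spirit of Allen's interval algebra); these relations then become inputs to the statistical classifiers. A minimal sketch with a reduced relation set and invented example intervals:

```python
def interval_relation(a, b):
    """Classify the temporal relation between two closed intervals
    a = (start, end) and b, using a reduced set of Allen's relations."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if s1 == e2:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 >= s2 and e1 <= e2:
        return "during"
    if s2 >= s1 and e2 <= e1:
        return "contains"
    return "overlaps"

# Hypothetical detections (seconds): an excited-speech interval inside
# the camera-shot interval of a candidate goal event.
print(interval_relation((12.0, 18.0), (10.0, 20.0)))  # → during
print(interval_relation((0.0, 5.0), (5.0, 9.0)))      # → meets
```

Encoding each pair of modality-specific intervals as one of these symbolic relations yields discrete features that a Maximum Entropy or SVM classifier can consume directly, which is what makes the synchronization of heterogeneous sources tractable.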