InsightVideo: Towards hierarchical video content organization for efficient browsing, summarization and retrieval

A New Hybrid Approach to Video Organization for Content-Based Indexing

Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1998

Video organization is a key step in the content-based indexing of video archives. The objective of video organization is to capture the semantic structure of a video in a form which is meaningful to the user. We present a hybrid approach to video organization which automatically processes video, creating a video table of contents (VTOC), while providing easy-to-use interfaces for verification, correction and augmentation of the automatically extracted video structure. Algorithms are developed to solve the subproblems of shot detection, shot grouping and VTOC generation without making very restrictive assumptions about the structure or content of the video. We use a nonstationary time series model of difference metrics for shot boundary detection, color and edge similarities for shot grouping and observations about the structure of a wide class of videos for the generation of the VTOC. The use of automatic processing in conjunction with input from the user allows us to produce meaningful video organization efficiently.
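The shot grouping step mentioned above can be illustrated with a minimal sketch. It assumes each shot is already summarized by a normalized color histogram and uses histogram intersection as the similarity measure; the edge similarity that the abstract also mentions is left out. The function names, the greedy assignment rule, and the 0.7 threshold are hypothetical choices for illustration, not the authors' algorithm.

```python
from typing import List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two normalized histograms (each sums to 1)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def group_shots(histograms: List[Sequence[float]],
                threshold: float = 0.7) -> List[List[int]]:
    """Greedily assign each shot to the most similar existing group.

    A shot is compared with the most recent shot of every group. If no
    group reaches the (assumed) similarity threshold, a new group is started.
    """
    groups: List[List[int]] = []
    for idx, hist in enumerate(histograms):
        best_group, best_sim = None, threshold
        for group in groups:
            sim = histogram_intersection(hist, histograms[group[-1]])
            if sim >= best_sim:
                best_group, best_sim = group, sim
        if best_group is None:
            groups.append([idx])
        else:
            best_group.append(idx)
    return groups


# Toy 3-bin histograms: shots 0 and 2 look alike, as do shots 1 and 3,
# which mimics two camera views that alternate in time.
shots = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
]
print(group_shots(shots))  # [[0, 2], [1, 3]]
```

Comparing against the most recent shot of each group, rather than only the immediately preceding shot, lets alternating views of the same setting fall into the same group.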

ClassView : Hierarchical Video Shot Classification, Indexing, and Accessing

IEEE Transactions on Multimedia, 2004

Recent advances in digital video compression and networks have made video more accessible than ever. However, existing content-based video retrieval systems still suffer from the following problems: 1) a semantics-sensitive video classification problem, caused by the semantic gap between low-level visual features and high-level semantic visual concepts; 2) an integrated video access problem, caused by the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we propose a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database.

A Survey on Visual Content-Based Video Indexing and Retrieval

IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2000

Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation; extraction of features, including static key frame features, object features and motion features; video data mining; video annotation; video retrieval, including query interfaces, similarity measures and relevance feedback; and video browsing. Finally, we analyze future research directions.

A motion-based scene tree for browsing and retrieval of compressed videos

Information Systems, 2006

This paper describes a fully automatic content-based approach for browsing and retrieval of MPEG-2 compressed video. The first step of the approach is the detection of shot boundaries based on motion vectors available from the compressed video stream. The next step involves the construction of a scene tree from the shots obtained earlier.

Hierarchical Browsing of Video Key Frames

Lecture Notes in Computer Science, 2007

We propose an innovative, general-purpose method for the selection and hierarchical representation of key frames of a video sequence for video summarization. It is able to create a hierarchical storyboard that the user may easily browse. The method is composed of three steps. The first removes meaningless key frames, using supervised classification performed by a neural network on the basis of pictorial features and a visual attention model algorithm. The second step groups the key frames into clusters, using both low-level and high-level features, to allow a multilevel summary. The third step identifies the default summary level that is shown to the users: starting from this set of key frames, the users can then browse the video content at different levels of detail.
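One way to picture the second step, clustering key frames into a multilevel summary, is the sketch below. The abstract does not name a clustering algorithm, so this assumes a simple agglomerative scheme that repeatedly merges the two temporally adjacent clusters with the closest centroids and records the partition after every merge. The feature vectors, the adjacency constraint, and all function names are hypothetical.

```python
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_levels(features: List[Sequence[float]]) -> List[List[List[int]]]:
    """Return one partition of key-frame indices per summary level.

    Level 0 keeps every key frame in its own cluster; each later level
    merges the two neighbouring clusters whose centroids are closest.
    """
    clusters = [[i] for i in range(len(features))]
    levels = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        cents = [centroid([features[i] for i in c]) for c in clusters]
        j = min(range(len(clusters) - 1),
                key=lambda t: sq_dist(cents[t], cents[t + 1]))
        clusters[j:j + 2] = [clusters[j] + clusters[j + 1]]
        levels.append([c[:] for c in clusters])
    return levels


# Toy 1-D features for five key frames in temporal order.
features = [[0.0], [0.1], [1.0], [1.2], [5.0]]
levels = build_levels(features)
for depth, partition in enumerate(levels):
    print(depth, partition)
```

A browsing interface could then pick one level as the default view (the third step in the abstract) and let the user move up or down the list of partitions for coarser or finer detail.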

Hierarchical video content

2003

Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide summaries in a more tractable format. Many existing approaches are limited to exploring important low-level feature related units for summarization. Unfortunately, the semantics, content and structure of the video do not correspond to low-level features directly, even with closed-captions, scene detection, and audio signal processing. The drawbacks of existing methods are the following: (1) instead of unfolding semantics and structures within the video, low-level units usually address only the details, and (2) any important unit selection strategy based on low-level features cannot be applied to general videos. Providing users with an overview of the video content at various levels of summarization is essential for more efficient database retrieval and browsing. In this paper, we present a hierarchical video content description and summarization strategy supported by a novel joint semantic and visual similarity strategy. To describe the video content efficiently and accurately, a video content description ontology is adopted. Various video processing techniques are then utilized to construct a semi-automatic video annotation framework. By integrating acquired content description data, a hierarchical video content structure is constructed with group merging and clustering. Finally, a four-layer video summary with different granularities is assembled to assist users in unfolding the video content in a progressive way. Experiments on real-world videos have validated the effectiveness of the proposed approach.

MultiView: Multilevel video content representation and retrieval

Journal of Electronic Imaging, 2001

In this article, several practical algorithms are proposed to support content-based video analysis, modeling, representation, summarization, indexing, and access. First, a multilevel video database model is given. One advantage of this model is that it provides a reasonable approach to bridging the gap between low-level representative features and high-level semantic concepts from a human point of view. Second, several model-based video analysis techniques are proposed. In order to detect the video shots, we present a novel technique, which can adapt the threshold for scene cut detection to the activities of variant videos or even different video shots. A seeded region aggregation and temporal tracking technique is proposed for generating the semantic video objects. The semantic video scenes can then be generated from these extracted video access units (e.g., shots and objects) according to some domain knowledge. Third, in order to categorize video contents into a set of semantic clusters, an integrated video classification technique is developed to support more efficient multilevel video representation, summarization, indexing, and access techniques.
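The idea of adapting the cut-detection threshold to local activity can be sketched as follows. This is not the MultiView technique itself, whose details the abstract does not give; it assumes a common sliding-window rule in which a frame difference counts as a cut when it exceeds the mean of the preceding window by k standard deviations. The window size, k, the floor value, and the function name are all assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_cuts(diffs: Sequence[float], window: int = 5,
                k: float = 3.0, floor: float = 0.05) -> List[int]:
    """Return indices of frame differences that exceed a local threshold.

    diffs[i] is some difference metric between frames i and i + 1.
    The threshold for position i is computed from the preceding `window`
    differences, so high-activity passages raise it automatically.
    `floor` keeps the threshold from collapsing to near zero in static scenes.
    """
    cuts = []
    for i in range(window, len(diffs)):
        recent = diffs[i - window:i]
        threshold = max(mean(recent) + k * pstdev(recent), floor)
        if diffs[i] > threshold:
            cuts.append(i)
    return cuts


# Toy difference sequence: low activity with a single abrupt change.
diffs = [0.02, 0.03, 0.02, 0.03, 0.02, 0.90,
         0.03, 0.02, 0.03, 0.02, 0.03, 0.02]
print(detect_cuts(diffs))  # [5]
```

One limitation of this simple rule is that a detected cut inflates the threshold for the next `window` positions, so a second cut shortly after the first could be missed; excluding flagged values from the window is one possible refinement.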