Creating Rich Metadata in the TV Broadcast Archives Environment: The PrestoSpace Project
Related papers
Automated Metadata Extraction for Semantic Access to Spoken Word Archives
2011
Although oral culture has been part of our history for thousands of years, only fairly recently have we been able to record and preserve that part of our heritage. Over the past century millions of hours of audiovisual data have been collected. Typically, audiovisual (A/V) archival institutes are the keepers of these collections, a significant part of which contains spoken word materials, such as interviews, speeches and radio broadcasts.
2013
Abstract. The EUscreen project represents the European television archives and acts as a domain aggregator for Europeana, Europe’s digital library. The main motivation for it is to provide unified access to a representative collection of television programs, secondary sources and articles, and in this way to allow students, scholars and the general public to study the history of television in its wider context. The main goals of EUscreen are to (i) develop a state-of-the-art workflow for content ingestion, (ii) define content selection and IPR management methodology, and (iii) provide a front-end that accommodates the requirements of several user groups.
Constructing and application of multimedia TV-news archives
Expert Systems with Applications, 2008
This paper presents integrated information mining techniques for broadcast TV news. It draws on techniques from the fields of acoustic, image, and video analysis to identify news story titles, anchorpersons, and scenes. The goal is to construct a compact yet meaningful abstraction of broadcast TV news, allowing users to browse through large amounts of data in a non-linear fashion with flexibility and efficiency. By adding acoustic analysis, a news program can be partitioned into news and commercial clips, with 90% accuracy on a data set of 400 hours of TV news recorded off the air from July 2005 to August 2006. By applying speaker identification and/or image detection techniques, each news story can be segmented with a better accuracy of 95.92%. On-screen captions or subtitles are recognized by OCR techniques to produce the text title of each news story. The extracted title words can be used to link to and navigate related news content on the WWW. In combination with facial and scene analysis and recognition techniques, the OCR results can support multimodal queries for specific news stories. Experimental results on system reliability, performance evaluation and comparison are presented and discussed.
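The news/commercial partitioning step described above can be sketched as a thresholding pass over a per-second audio feature followed by run merging. This is an illustrative assumption, not the paper's actual classifier: the `music_ratio` feature, the threshold, and the minimum segment length are all hypothetical stand-ins for whatever acoustic features the authors used.

```python
def segment_program(music_ratio, threshold=0.6, min_len=5):
    """Label each second as 'news' or 'commercial' from a per-second
    music-likeness score, then merge runs shorter than min_len seconds."""
    labels = ["commercial" if r > threshold else "news" for r in music_ratio]
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # close a segment when the label changes or the stream ends
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, labels[start]))
            start = i
    # absorb runs shorter than min_len into the preceding segment
    merged = []
    for seg in segments:
        if merged and seg[1] - seg[0] < min_len:
            prev = merged.pop()
            merged.append((prev[0], seg[1], prev[2]))
        else:
            merged.append(seg)
    return merged
```

A one-second commercial "blip" inside a news run is absorbed by the merge step, which is the kind of smoothing any real segmenter needs on top of frame-level decisions.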
Multimedia Tools and Applications, 2014
This paper presents a novel multimedia information system, called SAPTE, for supporting the discourse analysis and information retrieval of television programs from their corresponding video recordings. Unlike most common systems, SAPTE uses both content-independent and content-dependent metadata, which are determined by the application of discourse analysis techniques as well as image and audio analysis methods. The proposed system was developed in partnership with the free-to-air Brazilian TV channel Rede Minas in an attempt to provide TV researchers with computational tools to assist their studies of this media universe. The system is based on the Matterhorn framework for managing video libraries, combining: (1) discourse analysis techniques for describing and indexing the videos, considering aspects such as the definition of the subject of analysis, the nature of the speaker, and the corpus of data resulting from the discourse; (2) a state-of-the-art decoder for large vocabulary continuous speech recognition, called Julius; (3) image and frequency domain techniques to compute visual signatures for the video recordings, containing color, shape and texture information; and (4) hashing and k-d tree methods for data indexing. The capabilities of SAPTE were successfully validated, as demonstrated by our experimental results, indicating that SAPTE is a promising computational tool for TV researchers.
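The k-d tree indexing mentioned in item (4) can be illustrated with a minimal pure-Python k-d tree over fixed-length visual signature vectors. This is a generic sketch of the data structure, not SAPTE's actual implementation; real systems would typically use an optimized library rather than hand-rolled recursion.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree over equal-length feature vectors,
    splitting on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return the stored vector with the smallest Euclidean distance to query."""
    if node is None:
        return best
    if best is None or math.dist(node["point"], query) < math.dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # only descend the far side if the splitting plane is closer than the best hit
    if abs(diff) < math.dist(best, query):
        best = nearest(far, query, best)
    return best
```

For a video retrieval workload, each point would be a signature vector (e.g. a color histogram) and `nearest` returns the closest archived recording to a query signature.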
Relevance of ASR for the automatic generation of keywords suggestions for TV programs
Access to multimedia documents in an audiovisual archive depends largely on the quantity and quality of the metadata attached to those documents, in particular descriptions of their content. However, manual annotation of collections is burdensome for archive staff. Many archives are therefore moving toward (semi-)automatic annotation methods for creating and/or improving metadata. The NWO-funded CATCH-CHOICE project investigated the extraction of keywords from textual resources related to TV programs destined for archiving (peritexts), in collaboration with the Netherlands audiovisual archive, Sound and Vision. This paper examines whether the Automatic Speech Recognition transcripts developed in the CATCH-CHoral project are suitable for automatic keyword generation: keywords extracted from these resources are evaluated against manual annotations and against keywords generated from peritexts describing the television programs.
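Keyword extraction from transcripts or peritexts, as evaluated above, is often approached with TF-IDF-style ranking: words frequent in one document but rare across the collection make good keyword candidates. The following is a generic sketch of that idea, not the project's actual extraction method, which is not specified here.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_n=5):
    """Rank the words of one transcript by TF-IDF against a background corpus
    (a list of token lists) and return the top_n candidates."""
    n_docs = len(corpus)
    df = Counter()                     # document frequency of each word
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(doc_tokens)
    scores = {
        w: (c / len(doc_tokens)) * math.log((1 + n_docs) / (1 + df[w]))
        for w, c in tf.items()
    }
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_n]]
```

Function words like "the" appear in every document, so their IDF term goes to zero and content words rise to the top; evaluating such output against manual annotations is exactly the comparison the paper describes.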
This paper examines the issue of metadata lifecycle management, highlighting the need for conscious metadata creation to become part of the digital media production workflow in order to facilitate effective digital curation. Based on a study of public television workflow conducted in 2006, it reports on ways metadata might be captured from existing resources within the television production process, and ingested into a preservation repository. The authors identify areas in the workflow where small changes may yield significant improvements for preservation and curation of assets. This report also proposes methods by which producers and custodians of cultural content can work together to ensure the longevity of digital information, and outlines further studies that might turn these preliminary studies into standard practice. It encourages digital curators in all disciplines to begin looking back up the content production process for sources of rich metadata, rather than relying primarily on that created at the archival end.
PB Core — the Public Broadcasting Metadata Initiative: Progress Report
2003
PB Core is the result of the public broadcasting metadata initiative (PBMI). It is an effort of the public radio and television broadcasters to develop a schema for the description of their assets. PBMI is under the auspices of the Corporation for Public Broadcasting. The paper discusses the user-centered development of the schema, the elements of the PB Core, the application profile, and the feedback and evaluation process of the schema.
Automated Content Metadata Extraction Services Based on MPEG Standards
2012
This paper is concerned with the generation, acquisition, standardized representation and transport of video metadata. The use of MPEG standards in the design and development of interoperable media architectures and web services is discussed. A high-level discussion of several algorithms for metadata extraction is presented. Some architectural and algorithmic issues encountered when designing services for real-time processing of video streams, as opposed to traditional offline media processing, are addressed. A prototype real-time video analysis system that generates MPEG-7 Audiovisual Description Profile metadata from MPEG-2 transport-stream-encapsulated video is presented. Such a capability can enable a range of new services, such as content-based personalization of live broadcasts, given that MPEG-7-based data models fit well with specifications for advanced television services such as TV-Anytime and the Alliance for Telecommunications Industry Solutions (ATIS) IPTV Interoperability Forum.
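Any service that analyzes MPEG-2 transport-stream-encapsulated video starts by parsing the fixed 188-byte TS packets defined in ISO/IEC 13818-1: each packet opens with sync byte 0x47 and carries a 13-bit PID identifying its elementary stream. A minimal sketch of that demultiplexing step (the paper's own pipeline is of course far richer):

```python
TS_PACKET_SIZE = 188  # fixed packet size per ISO/IEC 13818-1

def iter_pids(stream):
    """Yield the 13-bit PID of each 188-byte MPEG-2 transport stream packet.

    The PID spans the low 5 bits of byte 1 and all of byte 2; a real-time
    demultiplexer filters packets by PID before decoding.
    """
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[off:off + TS_PACKET_SIZE]
        if pkt[0] != 0x47:  # sync byte check
            raise ValueError(f"lost sync at offset {off}")
        yield ((pkt[1] & 0x1F) << 8) | pkt[2]
```

A real-time analysis service would feed the packets of the selected video PID to the decoder while metadata extraction runs downstream; PID 0x1FFF marks null packets, which are discarded.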