Multimedia Scenario Extraction and Content Indexing for E-Learning

MULTIMODAL INTERACTIONS FOR MULTIMEDIA CONTENT ANALYSIS

ICTACS 2006 - Proceedings of the First International Conference on Theories and Applications of Computer Science 2006, 2007

In this paper, we present a model for multimodal content analysis. We distinguish between media and modality, which helps us define and characterize three inter-modal relations. We then apply this model to the analysis of recorded courses for e-learning. The useful relations between modalities are explained in detail for this application. We also describe two other applications: telemonitoring and meeting minutes. Finally, we compare the use of multimodality in these applications with existing inter-modal relations.

Multimedia analysis techniques for e-learning

International Journal of Learning Technology, 2012

Multimedia analysis techniques can enable e-learning systems and applications to understand multimedia content automatically. Therefore, such techniques can provide various novel services to both e-learning video providers and learners. This paper aims at providing lecturers and learners with a state-of-the-art overview of multimedia analysis techniques used mainly in internet video for e-learning purposes. Finally, the paper presents some of the open issues for further research in the area of video analysis for e-learning.

Multimodal approaches to video analysis of digital learning environments

Contemporary digital learning environments, such as tangibles, mobile and sensor-based technologies, are inherently multimodal, both in terms of representation modalities and interaction modes. Drawing on social semiotics, multimodality offers a valuable approach for analysing video data, as it systematically attends to the interpretation of a wide range of communicational forms (e.g. gaze, posture, action, speech) used for making meaning. This paper explores a multimodal approach to video data in the context of ubiquitous technologies for learning.

TOWARDS MULTIMODAL CONTENT REPRESENTATION Discussion paper

2008

Multimodal interfaces, combining the use of speech, graphics, gestures, and facial expressions in input and output, promise to provide new possibilities to deal with information in more effective and efficient ways, supporting for instance: the understanding of possibly imprecise, partial or ambiguous multimodal input; the generation of coordinated, cohesive, and coherent multimodal presentations; the management of multimodal interaction (e.g., task completion, adapting the interface, error prevention) by representing and exploiting models of the user, the domain, the task, the interactive context, and the media (e.g. text, audio, video).

Exploiting Visual Cues in Non-Scripted Lecture Videos for Multi-modal Action Recognition

2012

The use of non-scripted lecture videos as part of learning material is becoming an everyday activity in most higher education institutions due to the growing interest in flexible and blended education. Generally these videos are delivered as part of Learning Objects (LO) through various Learning Management Systems (LMS). Currently, creating these video learning objects (VLO) is a cumbersome process because it requires thorough analysis of the lecture content for meta-data extraction and the extraction of structural information for indexing and retrieval purposes. Current e-learning systems and libraries (such as libSCORM) lack the functionality for exploiting semantic content for automatic segmentation. Without the additional meta-data and structural information, lecture videos do not provide the level of interactivity required for flexible education. As a result, they fail to hold students' attention for long, and their effective use remains a challenge.

Instructional Video Content Analysis Using Audio Information

IEEE Transactions on Audio, Speech and Language Processing, 2000

Automatic media content analysis and understanding for efficient topic searching and browsing are current challenges in the management of e-learning content repositories. This paper presents our current work on analyzing and structuralizing instructional videos using pure audio information. Specifically, an audio classification scheme is first developed to partition the soundtrack of an instructional video into homogeneous audio segments where each segment has a unique sound type such as speech or music. We then apply a statistical approach to extract discussion scenes in the video by modeling the instructor with a Gaussian mixture model (GMM) and updating it on the fly. Finally, we categorize obtained discussion scenes into either two-speaker or multispeaker discussions using an adaptive mode-based clustering approach. Experiments carried out on four training videos and five IBM MicroMBA class videos have yielded encouraging results. It is our belief that by detecting and identifying various types of discussions, we are able to better understand and annotate the learning media content and subsequently facilitate its content access, browsing, and retrieval.
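As a rough illustration of the first step only (partitioning a soundtrack into homogeneous segments), the sketch below merges consecutive frames that share a sound-type label into segments. It assumes per-frame labels have already been produced by some classifier; the function name `merge_frames`, the one-second frame length, and the sample labels are hypothetical and not taken from the paper, and the GMM-based speaker modeling is not covered here.

```python
from itertools import groupby


def merge_frames(labels, frame_seconds=1.0):
    """Collapse per-frame sound-type labels into (label, start, end) segments.

    `labels` is a sequence of per-frame labels (assumed to come from an
    upstream audio classifier); `frame_seconds` is the assumed frame length.
    """
    segments = []
    index = 0  # index of the first frame in the current run
    for label, run in groupby(labels):
        length = len(list(run))
        start = index * frame_seconds
        end = (index + length) * frame_seconds
        segments.append((label, start, end))
        index += length
    return segments


# Hypothetical frame labels for a six-second clip.
frame_labels = ["speech", "speech", "music", "music", "music", "speech"]
segments = merge_frames(frame_labels)
for label, start, end in segments:
    print(f"{label}: {start:.1f}s to {end:.1f}s")
```

Each resulting segment carries a single sound type, which is the property the abstract requires of a homogeneous segment; later stages could then restrict speaker modeling to the speech segments.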

Creating Topic-Specific Automatic Multimodal Presentation Mining the World Wide Web Information

The paper describes the integration of web intelligence and character-based software agent manipulation with the notion of autonomous information services. The system, 'Auto-Presentation', builds a presentation automatically by parsing, summarizing and correlating information collected from Internet-based knowledge sources after receiving the presentation topic from the user. With the help of a group of character-based software agents, the system automatically presents the retrieved information about the topic verbally, accompanied by slides and by different gestures and affects associated with the presenter (e.g. the character agents). Section 1 gives a brief literature review and explains the basic idea of the system. Section 2 describes the architecture and explains the different components of 'Auto-Presentation'. Section 3 describes the necessary algorithms. Section 4 presents some test results and evaluations. Section 5 concludes the paper and outlines future work.

A Pipeline for Extracting Multi-Modal Markers for Meaning in Lectures

2018

This article introduces initial concepts for a context sensitive computing pipeline to detect multimodal markers for meaning from video and audio data, to notify the audience of markers of importance and then to classify sequences of a recorded video into segments by content and importance in order to summarise the content as video and audio and in other modalities. In this paper, we first consider the linguistic background, then show the input data for the pipeline. Finally, we outline the concepts which are to be implemented in each step of this pipeline and discuss how the evaluation for this pipeline can be achieved.