Research in multimedia systems at DFKI

Review of: Intelligent Multimedia Interfaces, Mark T. Maybury (editor) (MITRE Corporation) Menlo Park, CA: AAAI Press and Cambridge, MA: The MIT Press, 1993, vi + 405 pp. Paperbound, ISBN 0-262-63150-4

tems" Section 2: Intelligent Multimedia Interfaces J. D. Burger and R. J. Marshall, "The application of natural language models to intelligent multimedia" O. Stock and the ALFRESCO Project Team, "ALFRESCO: Enjoying the 501 combination of natural language processing and hypermedia for information exploration" S. Abu-Hakima, M. Halasz, and S. Phan, "An approach to hypermedia in diagnostic systems" D. B. Koons, C. J. Sparrell, and K. R. Thorisson, "Integrating simultaneous input from speech, gaze, and hand gestures" Section 3: Architectural and Theoretical Issues Y. Arens, E. H. Hovy, and M. Vossers, "n the knowledge underlying multimedia presentations" M. Cornell, B. P. WooIf, and D. Suthers, "Using 'live information' in a multimedia framework" J. Krause, "A multilayered empirical approach to multimodality: Towards mixed solutions of natural language and graphical interfaces" A. Bonarini, "Modeling...

Intelligent Multimedia Interfaces Mark T. Maybury (editor)

Intelligent Multimedia Interfaces is a follow-up to the AAAI Workshop on Intelligent Multimedia Interfaces held in Anaheim, California, in August 1991. Of interest to computational linguists is the fact that natural language, discourse, and planning research have a much greater presence here than in the previous installment of this line of workshops. A comparable collection of papers, Intelligent User Interfaces (Sullivan and Tyler 1991), emerged from a 1988 AAAI/CHI workshop named Architectures for Intelligent Interfaces: Elements and Prototypes. By my informal, ad hoc count, about half of the papers in the earlier collection had something to do with the AI side of computational linguistics, whereas all but perhaps two of the papers in Maybury's volume do. The work is divided into three sections. Here is the lineup:

Book Reviews: Intelligent Multimedia Interfaces

1994

Intelligent Multimedia Interfaces is a follow-up to the AAAI Workshop on Intelligent Multimedia Interfaces held in Anaheim, California, in August 1991. Of interest to computational linguists is the fact that natural language, discourse, and planning research have a much greater presence here than in the previous installment of this line of workshops. A comparable collection of papers, Intelligent User Interfaces (Sullivan and Tyler 1991), emerged from a 1988 AAAI/CHI workshop named Architectures for Intelligent Interfaces: Elements and Prototypes. By my informal, ad hoc count, about half of the papers in the earlier collection had something to do with the AI side of computational linguistics, whereas all but perhaps two of the papers in Maybury's volume do. The work is divided into three sections. Here is the lineup:

Developing intelligent multimedia applications

… in Language and …, 2002

Intelligent MultiMedia (IntelliMedia) focuses on the computer processing and understanding of signal and symbol input from at least speech, text and visual images in terms of semantic representations. We have developed a general suite of tools in the form of a software and hardware platform called CHAMELEON that can be tailored to conducting IntelliMedia in various application domains. CHAMELEON has an open distributed processing architecture and currently includes ten agent modules: blackboard, dialogue manager, domain model, gesture recogniser, laser system, microphone array, speech recogniser, speech synthesiser, natural language processor, and a distributed Topsy learner. Most of the modules are programmed in C and C++ and are glued together using the DACS communications system. In effect, the blackboard, dialogue manager and DACS form the kernel of CHAMELEON. Modules can communicate with each other and with the blackboard, which keeps a record of interactions over time, via semantic representations in frames. Inputs to CHAMELEON can include synchronised spoken dialogue and images, and outputs include synchronised laser pointing and spoken dialogue. An initial prototype application of CHAMELEON is an IntelliMedia WorkBench where a user will be able to ask for information about things (e.g. 2D/3D models, pictures, objects, gadgets, people, or whatever) on a physical table. The current domain is a Campus Information System for 2D building plans which provides information about tenants, rooms and routes and can answer questions like "Whose office is this?" and "Show me the route from Paul Mc Kevitt's office to Paul Dalsgaard's office" in real time. CHAMELEON and the IntelliMedia WorkBench are ideal for testing integrated signal and symbol processing of language and vision for the future of SuperinformationhighwayS.

1.1 Background
Although there has been much success in developing theories, models and systems in the areas of Natural Language Processing (NLP) and Vision Processing (VP) (Partridge 1991, Rich and Knight 1991), there has been little progress in integrating these two subareas of Artificial Intelligence (AI). In the beginning, although the general aim of the field was to build integrated language and vision systems, few were built, and these two subfields quickly arose. It is not clear why there has not already been much activity in integrating NLP and VP. Is it because of the long-time reductionist trend in science up until the recent emphasis on chaos theory, non-linear systems, and emergent behaviour? Or is it because the people who have tended to work on NLP tend to be in other departments, or of a different ilk, from those who have worked on VP? Dennett (1991, pp. 57-58) says, "Surely a major source ..."

The research at LIA is directed towards three areas: systems for computer vision, computer vision for autonomous robots, and medical and industrial application of image analysis. Research within all three areas is sponsored by national and international (EU ESPRIT) research programmes. The main emphasis has been the development of methods for continual interpretation of dynamically changing scenes. Example applications include surveillance of indoor and outdoor scenes, vision-guided navigation, and interpretation of human and machine manipulation. Research projects concern extraction of features for description of actions in an environment (i.e. the movement of people, fish, and blood cells) and utilising these descriptions for recognition, monitoring and control of actuators such as mobile robots (safe movements in a dynamically changing environment). This includes recognising and tracking dynamically changing objects, such as hands and human bodies, which has applications in IntelliMedia systems. So far the research has referred to sensory processing using single modalities, but it ...

1.4.2 Candidate applications
In general, applications within IntelliMedia may conceptually be divided into a number of broad categories such as intelligent assistant applications, teaching, information browsers, database access, command and control, surveillance, and transaction services (banking). Examples of applications which may result within a short-term perspective are enhanced reality (e.g. library guide), a newspaper reader for blind/near-blind people, intelligent manuals, a dedicated personal communicator (DPC), diagnosis systems (e.g. medical data processing) and mixed reality (e.g. surgery support systems). Our next step was to choose an application for CHAMELEON. A number of candidate applications were selected and discussed during the course of a number of meetings. These are listed below: ...

A trace of frame messages passed to the dialogue manager in the Campus Information System domain:

Received: nlp(intention(instruction(pointing)),location(person(tb),type(office)),time(889524794)) which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(249,623))), speech_synthesizer(utterance("This is Toms office"))))
Calling laser: laser(point(coordinates(249,623)))
Calling speech_synthesizer: speech_synthesizer(utterance("This is Toms office"))
Received: nlp(intention(instruction(pointing)),location(person(tbm),type(office)),time(889524818)) which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(278,623))), speech_synthesizer(utterance("This is Thomass office"))))
Calling laser: laser(point(coordinates(278,623)))
Calling speech_synthesizer: speech_synthesizer(utterance("This is Thomass office"))
Received: nlp(intention(query(where)),location(place(a2_221)),time(889524831)) which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(132,500))), speech_synthesizer(utterance("computer room is here"))))
Calling laser: laser(point(coordinates(132,500)))
Calling speech_synthesizer: speech_synthesizer(utterance("computer room is here"))
Received: nlp(intention(query(who)),location(this($Deixis),type(office)),time(889524864)) which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(658,546))), speech_synthesizer(utterance("This is not an office, this is instrument repair"))))
Calling laser: laser(point(coordinates(658,546)))
Calling speech_synthesizer: speech_synthesizer(utterance("This is not an office, this is instrument repair"))
Received: nlp(intention(query(who)),location(this($Deixis),type(office)),time(889524885)) which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(223,568))), speech_synthesizer(utterance("This is Pauls office"))))
Calling laser: laser(point(coordinates(223,568)))
Calling speech_synthesizer: speech_synthesizer(utterance("This is Pauls office"))
Received: nlp(intention(instruction(show_route)),source(location(person(lbl),type(office))),destination(location(person(hg),type(office))),time(889524919)) which is passed on to dialog_manager
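The trace above shows input modules posting semantic frames that the blackboard passes to the dialogue manager, which in turn drives the laser pointer and the speech synthesiser. The following is a minimal sketch of that routing pattern in Python rather than the C/C++ of the actual modules; all class names, the frame layout, and the office coordinates are invented for illustration and are not the CHAMELEON code.

```python
# Minimal sketch (not the CHAMELEON implementation) of the frame-routing
# pattern visible in the trace above: input modules post semantic frames to a
# blackboard, the blackboard records them and forwards them to a dialogue
# manager, and the dialogue manager emits output frames that drive the laser
# pointer and the speech synthesiser.
import time


class DialogueManager:
    """Maps an input intention frame to laser-pointing and speech output."""

    def __init__(self, offices):
        self.offices = offices  # person id -> (x, y) coordinates on the 2D plan

    def handle(self, frame):
        person = frame["location"]["person"]
        x, y = self.offices[person]
        return {"calls": [("laser", f"point(coordinates({x},{y}))"),
                          ("speech_synthesizer",
                           f'utterance("This is {person}\'s office")')]}


class Blackboard:
    """Keeps a record of all frames over time and routes inputs to outputs."""

    def __init__(self, dialogue_manager):
        self.history = []
        self.dialogue_manager = dialogue_manager

    def post(self, frame):
        self.history.append(frame)
        print("Received:", frame, "which is passed on to dialog_manager")
        output = self.dialogue_manager.handle(frame)
        self.history.append(output)
        for module, args in output["calls"]:
            print(f"Calling {module}: {module}({args})")


# Example: the NLP module has interpreted "Whose office is this?" plus a
# pointing gesture resolved to the person 'tb', and posts the resulting frame.
bb = Blackboard(DialogueManager({"tb": (249, 623), "tbm": (278, 623)}))
bb.post({"intention": ("instruction", "pointing"),
         "location": {"person": "tb", "type": "office"},
         "time": int(time.time())})
```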

Multimodal interfaces for multimedia information agents

1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997

When humans communicate they take advantage of a rich spectrum of cues. Some are verbal and acoustic. Some are non-verbal and non-acoustic. Signal processing technology has devoted much attention to the recognition of speech as a single human communication signal. Most other complementary communication cues, however, remain unexplored and unused in human-computer interaction. In this paper we show that the addition of non-acoustic or non-verbal cues can significantly enhance the robustness, flexibility, naturalness and performance of human-computer interaction. We demonstrate computer agents that use speech, gesture, handwriting, pointing, and spelling jointly for more robust, natural and flexible human-computer interaction in the various tasks of an information worker: information creation, access, manipulation and dissemination.
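As a toy illustration of why joint use of cues helps, the sketch below (not the authors' system; the hypotheses, scores and object names are invented assumptions) shows a pointing gesture breaking a near-tie between two speech hypotheses.

```python
# Toy late-fusion example: combine an ambiguous speech N-best list with a
# pointing gesture so the joint interpretation is more robust than either cue
# alone. All values here are made up for illustration.

speech_nbest = [("open the file", 0.48),   # recogniser N-best hypotheses
                ("open the mail", 0.46)]
pointed_object = "file_icon"               # referent resolved by a gesture tracker


def joint_score(utterance, speech_score, pointed):
    # Boost hypotheses that are consistent with the object being pointed at.
    bonus = 0.3 if pointed.split("_")[0] in utterance else 0.0
    return speech_score + bonus


best_utterance, _ = max(speech_nbest,
                        key=lambda h: joint_score(h[0], h[1], pointed_object))
print(best_utterance)   # "open the file": the gesture disambiguates the near-tie
```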

Intelligent Multimedia Presentation Systems

1996

This document presents a proposal of a reference model for intelligent multimedia presentation systems (IMMPSs). Many papers in the literature consider these systems, and some large-size prototypes have been developed (MMI2, WIP, MIPS, and COMET). However, no generic model has been proposed so far. Each project began from scratch, relying only on the past experience of its members. This has meant a lack of a uniform logical approach to the problem. Consequently, there is no agreement on the terminology to be ...

A framework for the intelligent multimodal presentation of information

Signal Processing, 2006

Intelligent multimodal presentation of information aims at using several communication modalities to produce the most relevant user outputs. This problem involves different concepts related to information structure, interaction components (modes, modalities, devices) and context. In this paper we study the influence of interaction context on system outputs. More precisely, we propose a conceptual model for the intelligent multimodal presentation of information. This model, called WWHT, is based on four concepts: "What", "Which", "How" and "Then". These concepts describe the life cycle of a multimodal presentation from its "birth" to its "death", including its evolution. On the basis of this model, we present the ELOQUENCE software platform for the specification, simulation and execution of output multimodal systems. Finally, we describe two applications of this framework: the first concerns the simulation of an incoming call on an intelligent mobile phone and the second is related to a task of marking out a target on the ground in a fighter plane cockpit. The model, called WWHT (What-Which-How-Then), describes the design process of a multimodal presentation. More precisely, it introduces the different output multimodality components and presents the different steps of the presentation's life cycle. On the basis of this model, the ELOQUENCE software platform, allowing the specification, simulation and execution of multimodal system outputs, has been developed. Two examples illustrate the application of the model and of the platform: the first concerns the mobile telephony field (simulation of an incoming call) and the second the military avionics field (marking out a target on the ground). The first example illustrates the introduced concepts throughout the paper and the second sums them up. Finally, the paper concludes with a discussion of the relations between inputs and outputs in an intelligent multimodal system.
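A minimal sketch of the four What/Which/How/Then stages for the incoming-call scenario mentioned above follows; the rules, thresholds and data structures are invented for illustration and are not the ELOQUENCE implementation.

```python
# Sketch of the WWHT life cycle for a toy "incoming call" scenario; all rules
# and thresholds below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Context:
    ambient_noise: float      # 0.0 (quiet) .. 1.0 (loud)
    phone_in_pocket: bool


def what(event):              # WHAT: the information to be presented
    return {"type": "incoming_call", "caller": event["caller"]}


def which(info, ctx):         # WHICH: choose modalities for this context
    alert = "vibration" if ctx.ambient_noise > 0.7 or ctx.phone_in_pocket else "ringtone"
    return ["screen", alert]


def how(info, modalities):    # HOW: instantiate the concrete presentation
    return [(m, f"call from {info['caller']}") for m in modalities]


def then(presentation, ctx):  # THEN: adapt the running presentation as context evolves
    if ctx.ambient_noise <= 0.7 and not ctx.phone_in_pocket:
        return [("ringtone" if m == "vibration" else m, c) for m, c in presentation]
    return presentation


info = what({"caller": "Alice"})
noisy = Context(ambient_noise=0.9, phone_in_pocket=False)
p = how(info, which(info, noisy))
print(p)                              # screen + vibration in a loud room
print(then(p, Context(0.2, False)))   # vibration replaced by ringtone once it is quiet
```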

Intelligent Multimedia Presentation Systems: Research and Principles

The development of so-called intelligent multimedia presentation systems (IMMPSs) is currently very actively addressed by research groups worldwide. A common goal of the research community is to develop mechanisms for the automated generation of multimedia presentations. Up to now, some large-sized prototypes of IMMPSs have been built, and the user interfaces of some applications already include automated components for certain generation tasks such as text or graphics generation. Unfortunately, there is no common agreement on a generic architecture for IMMPSs with clear functional definitions of the involved subcomponents. Moreover, even the terminology used in the descriptions of existing IMMPSs varies considerably across research teams. With the proposal of a reference model for IMMPSs, this paper tries to fill a major methodological gap and thus may provide a sound basis for ongoing and future developments of IMMPSs. In essence, the proposed reference model consists of several layers referring to the particular subtasks which occur in multimedia presentation generation. Following the paradigm of knowledge-based computing, we introduce a minimum set of explicitly encoded knowledge and assign it to logically distinct knowledge sources. These sources allow knowledge to be shared among components of different layers in a client-server fashion. In order to demonstrate the use of the reference model, we provide a comparison of two IMMPSs by redescribing them in terms of the model.
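The layered, client-server organisation described in this abstract can be sketched roughly as follows; the layer and knowledge-source names below are generic placeholders, not the paper's terminology.

```python
# Rough sketch of layers that each handle one subtask of presentation
# generation and query shared knowledge sources (servers) instead of keeping
# private copies. All names and rules are illustrative assumptions.

class KnowledgeSource:
    """A server that any layer can query (e.g. a user model or context model)."""

    def __init__(self, facts):
        self.facts = facts

    def query(self, key):
        return self.facts.get(key)


class ContentLayer:
    """Decides what to present, consulting the user model."""

    def __init__(self, experts):
        self.experts = experts

    def process(self, goal):
        detail = "full" if self.experts["user"].query("expertise") == "novice" else "brief"
        return {"goal": goal, "detail": detail}


class DesignLayer:
    """Chooses media for the selected content, consulting the context model."""

    def __init__(self, experts):
        self.experts = experts

    def process(self, content):
        media = "text" if self.experts["context"].query("screen") == "small" else "graphics"
        return {**content, "media": media}


experts = {"user": KnowledgeSource({"expertise": "novice"}),
           "context": KnowledgeSource({"screen": "small"})}

result = "explain the coffee machine"
for layer in (ContentLayer(experts), DesignLayer(experts)):
    result = layer.process(result)
print(result)   # {'goal': 'explain the coffee machine', 'detail': 'full', 'media': 'text'}
```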

Editorial for intelligent interactive multimedia systems and services

2012

Users worldwide are spending more and more of their time on smart device platforms providing a ubiquitous multimedia experience. Smart phones, tablets, e-readers, web-enabled television sets and other device platforms are all participating in this revolution. From the point of view of human-system interaction, however, such platforms pose distinct research problems. A consumer engaged in playing fantasy football has different interaction needs from one researching an address on a cell phone.