MURML: A Multimodal Utterance Representation Markup Language for Conversational Agents

Multimodal expressive embodied conversational agents

2005

In this paper we present our work toward the creation of a multimodal expressive Embodied Conversational Agent (ECA). Our agent, called Greta, exhibits nonverbal behaviors synchronized with speech. We use the taxonomy of communicative functions developed by Isabella Poggi [22] to specify the behavior of the agent. Based on this taxonomy, a representation language, the Affective Presentation Markup Language (APML), has been defined to drive the animation of the agent [4].
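For illustration, an APML fragment tags the communicative functions of an utterance rather than concrete face or gesture movements; the element names below (performative, rheme, emphasis) follow published APML descriptions, while the attribute values and the text itself are illustrative assumptions:

```xml
<apml>
  <!-- Communicative functions only; Greta's engine decides which
       facial expressions, gaze, and head movements realize them.
       Attribute values here are illustrative assumptions. -->
  <performative type="greet">
    Good morning, <emphasis level="strong">Angela</emphasis>.
  </performative>
  <rheme affect="joy">
    I have <emphasis>wonderful news</emphasis> for you.
  </rheme>
</apml>
```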

Embodied conversational characters: representation formats for multimodal communicative behaviours

This contribution deals with the requirements on representation languages employed in planning and displaying the communicative multimodal behaviour of Embodied Conversational Agents (ECAs). We focus on the role of behaviour representation frameworks as part of the processing chain from intent planning to the planning and generation of multimodal communicative behaviours. On the one hand, the field is fragmented, with almost everybody working on ECAs developing their own tailor-made representations, which is reflected, among other things, in the extensive reference list. On the other hand, there are general aspects that need to be modelled in order to generate multimodal behaviour. Throughout the chapter we take different perspectives on existing representation languages and outline the foundations of a common framework.

Towards a common framework for multimodal generation: The behavior markup language

Intelligent Virtual …, 2006

This paper describes an international effort to unify a multimodal behavior generation framework for Embodied Conversational Agents (ECAs). We propose a three-stage model, called SAIBA, whose stages represent intent planning, behavior planning, and behavior realization. A Function Markup Language (FML), describing intent without referring to physical behavior, mediates between the first two stages, and a Behavior Markup Language (BML), describing the desired physical realization, mediates between the last two stages. In this paper we focus on BML.
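As a concrete sketch, a minimal BML block (shown here in the style of the later BML 1.0 specification; the ids and lexeme value are illustrative) synchronizes a gesture stroke to a word in the speech channel:

```xml
<bml id="bml1" xmlns="http://www.bml-initiative.org/bml/bml-1.0">
  <!-- Speech with a named sync point placed before "big". -->
  <speech id="s1">
    <text>The fish was <sync id="tm1"/> this big!</text>
  </speech>
  <!-- The gesture's stroke phase is aligned to that sync point;
       the lexeme name assumes a matching entry in the realizer's
       behavior lexicon. -->
  <gesture id="g1" lexeme="INDICATE_SIZE" stroke="s1:tm1"/>
</bml>
```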

Specification and realisation of multimodal output in dialogue systems

2002

We present a high-level formalism for specifying verbal and nonverbal output from a multimodal dialogue system. The output specification is XML-based and provides information about the communicative functions of the output without detailing the realisation of these functions. The specification can be used to control an animated character that uses speech and gestures. We give examples from an implementation in a multimodal spoken dialogue system, and describe how facial gestures are implemented in a 3D-animated talking agent within this system.
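A hypothetical fragment in that spirit (the element names are illustrative assumptions, not the system's actual DTD) marks up what the output should communicate while leaving the realisation open:

```xml
<output>
  <!-- Function-level markup: emphasis and feedback are communicative
       functions; whether they surface as pitch accents, eyebrow
       raises, or head nods is decided by the realiser. -->
  <utterance>
    Flight <emphasis>SK403</emphasis> departs at <emphasis>9:45</emphasis>.
  </utterance>
  <feedback type="confirmation"/>
</output>
```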

Synthesizing multimodal utterances for conversational agents

Computer Animation and Virtual Worlds, 2004

Conversational agents are expected to combine speech with non-verbal modalities into intelligible multimodal utterances. In this paper, we focus on the generation of gesture and speech from XML-based descriptions of their overt form. An incremental production model is presented that combines the synthesis of synchronized gestural, verbal, and facial behaviors with mechanisms for linking them into fluent utterances with natural co-articulation and transition effects. In particular, an efficient kinematic approach for animating hand gestures from shape specifications is presented, which provides fine adaptation to the temporal constraints imposed by cross-modal synchrony.
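For illustration, a MURML-style specification couples the utterance text, via a time marker, to a feature-based description of the gesture; the structure below follows published MURML examples, with the hand-shape and location slot values as assumptions:

```xml
<utterance>
  <specification>
    And this <time id="t1"/> is how big it was.
  </specification>
  <!-- The gesture is affiliated with the marked word; hand shape and
       location are given as symbolic features from which the kinematic
       layer computes a trajectory. Slot values are assumptions. -->
  <behaviorspec>
    <gesture>
      <affiliate onset="t1"/>
      <constraints>
        <parallel>
          <static slot="HandShape" value="BSflat"/>
          <static slot="HandLocation" value="LocShoulder LocCenterRight"/>
        </parallel>
      </constraints>
    </gesture>
  </behaviorspec>
</utterance>
```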

Evaluation of Multimodal Behaviour of Embodied Agents: Cooperation between Speech and Gestures

2000

The individuality of Embodied Conversational Agents (ECAs) may depend both on the look of the agent and on the way it combines modalities such as speech and gesture. In this chapter, we describe a study in which male and female users listened to three short technical presentations made by ECAs. Three multimodal strategies for combining arm gestures with speech were compared: redundancy, complementarity, and speech-specialization. These strategies were randomly attributed to different-looking 2D ECAs in order to test the effects of multimodal strategy and of the ECA's appearance independently. The variables we examined were subjective impressions and recall performance. The multimodal strategy proved to influence subjective ratings of explanation quality, in particular for male users. Appearance, on the other hand, affected likeability, but also recall performance. These results stress the importance of both multimodal strategy and appearance for ensuring the pleasantness and effectiveness of presentation ECAs.

Realizing Multimodal Behavior: Closing the Gap between Behavior Planning and Embodied Agent Presentation

Lecture Notes in Computer Science, 2010

Generating coordinated multimodal behavior for an embodied agent (speech, gesture, facial expression, ...) is challenging. It requires a high degree of animation control, in particular when reactive behaviors are required. We suggest distinguishing realization planning, where gesture and speech are processed symbolically using the Behavior Markup Language (BML), from presentation, which is controlled by a lower-level animation language (EMBRScript). Reactive behaviors can bypass planning and directly control presentation. In this paper, we show how to define a behavior lexicon, how this lexicon relates to BML, and how to resolve timing using formal constraint solvers. We conclude by demonstrating how to integrate reactive emotional behaviors.
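As a rough sketch of the planning-level input, a BML block states only relative synchrony between channels; the constraint solver then assigns the absolute animation times that the presentation layer needs. The ids and lexeme names below are illustrative:

```xml
<bml id="b1">
  <speech id="s1">
    <text>That is <sync id="w1"/> wonderful!</text>
  </speech>
  <!-- Two cross-modal constraints for the solver: the gesture stroke
       must coincide with "wonderful", and the smile must peak with
       the stroke. Lexeme names assume matching lexicon entries. -->
  <gesture id="g1" lexeme="BEAT" stroke="s1:w1"/>
  <faceLexeme id="f1" lexeme="SMILE" attackPeak="g1:stroke"/>
</bml>
```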

Generation of multimodal dialogue for net environments

2002

In this paper, an architecture and a special-purpose markup language for simulated affective face-to-face communication are presented. In systems based on this architecture, users will be able to watch embodied conversational agents interact with each other in virtual locations on the internet. The markup language, the Rich Representation Language (RRL), has been designed to provide an integrated representation of speech, gesture, posture and facial animation. The net environments are also inhabited by agents which are fully system-defined, and each avatar, after creation, has an autonomous existence in the net environment.
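A hypothetical fragment in the spirit of such an integrated representation (the element names are illustrative assumptions, not the published RRL schema) bundles all channels of one dialogue act:

```xml
<dialogueAct speaker="agent1" addressee="agent2" emotion="joy">
  <!-- One integrated structure per act: speech, gesture, posture,
       and facial animation side by side. Element names are purely
       illustrative. -->
  <speech>Congratulations on your new job!</speech>
  <gesture type="open_palms"/>
  <posture type="lean_forward"/>
  <facial expression="smile" intensity="0.8"/>
</dialogueAct>
```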
