Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents

Open-source software for developing anthropomorphic spoken dialog agent

2002

An architecture for a highly interactive, human-like spoken-dialog agent is discussed in this paper. To easily integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface and communicates with the others through a broker (communication manager). The agent system under development is supported by the IPA, and it will be publicly available as a software toolkit this year.
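The broker-centred design described in this abstract can be illustrated with a minimal sketch. The module names, the text-command protocol, and the Broker class below are hypothetical illustrations, not the actual Galatea interface.

```python
# Minimal sketch of a broker connecting loosely coupled agent modules.
# Module names and the text-command protocol are illustrative assumptions,
# not the actual Galatea API.

class Broker:
    """Routes simple text commands between registered modules."""

    def __init__(self):
        self.modules = {}

    def register(self, name, handler):
        self.modules[name] = handler

    def send(self, target, command):
        # Deliver a command string to the named module and return its reply.
        return self.modules[target](command)


def speech_recognizer(command):
    if command == "RECOGNIZE":
        return "user said: hello"  # stand-in for a real ASR result
    return "ERROR: unknown command"


def speech_synthesizer(command):
    if command.startswith("SPEAK "):
        return f"synthesizing '{command[6:]}'"
    return "ERROR: unknown command"


def face_synthesizer(command):
    if command.startswith("EXPRESS "):
        return f"showing expression '{command[8:]}'"
    return "ERROR: unknown command"


if __name__ == "__main__":
    broker = Broker()
    broker.register("asr", speech_recognizer)
    broker.register("tts", speech_synthesizer)
    broker.register("face", face_synthesizer)

    # A dialog controller would issue commands like these through the broker.
    print(broker.send("asr", "RECOGNIZE"))
    print(broker.send("tts", "SPEAK Good morning"))
    print(broker.send("face", "EXPRESS smile"))
```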

Open-source Software for Developing Anthropomorphic Spoken Dialog Agents

2003

An architecture for a highly interactive, human-like spoken-dialog agent is discussed in this paper. To easily integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial-image synthesizer, and a dialog controller, each module is modeled as a virtual machine that has a simple common interface and communicates with the others through a broker (communication manager). The agent system under development is supported by the IPA, and it will be publicly available as a software toolkit this year.

Development of a Toolkit for Spoken Dialog Systems with an Anthropomorphic Agent: Galatea

Proceedings of APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association 2009 Annual Summit and Conference, 2009

The Interactive Speech Technology Consortium (ISTC) has been developing a toolkit called Galatea that comprises four fundamental modules for speech recognition, speech synthesis, face synthesis, and dialog control, and that can be used to realize an interface for spoken dialog systems with an anthropomorphic agent. This paper describes the development of the Galatea toolkit and the functions of each module; in addition, it discusses the standardization of the description of multi-modal interactions.

A Natural Conversational Virtual Human with Multimodal Dialog System

Making a virtual human character realistic and credible in a real-time automated dialog animation system is necessary. This kind of animation carries important elements for many applications such as games, virtual agents, and movie animations. It is also considered important for applications that require interaction between humans and computers. For this purpose, however, the machine must have sufficient intelligence to recognize and synthesize human voices. As one of the most vital interaction methods between human and machine, speech has recently received significant attention, especially in avatar research. One of the challenges is to create precise lip movements for the avatar and synchronize them with recorded audio. This paper introduces the concept of multimodal dialog systems for virtual characters and focuses on the output part of such systems. More specifically, its focus is on behavior planning and on developing the data control languages (DCL).
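As a rough illustration of the output-side behavior planning mentioned above, the sketch below maps a dialog act to lip-synchronization and gesture directives. The dialog-act labels, directive fields, timing values, and the plan_behavior function are hypothetical and are not taken from the paper's data control languages.

```python
# Hedged sketch: turning a dialog act into a simple behavior plan for an avatar.
# The act labels, timing values, and directive structure are illustrative only.

from dataclasses import dataclass
from typing import List


@dataclass
class Directive:
    channel: str      # e.g. "lips", "head", "eyes"
    action: str       # e.g. "viseme sequence", "nod"
    start_ms: int
    duration_ms: int


def plan_behavior(dialog_act: str, text: str) -> List[Directive]:
    """Produce a (very simplified) multimodal plan for one utterance."""
    plan = [Directive("lips", f"visemes for '{text}'", 0, 40 * len(text))]
    if dialog_act == "greeting":
        plan.append(Directive("head", "nod", 0, 400))
        plan.append(Directive("eyes", "look at user", 0, 800))
    elif dialog_act == "question":
        plan.append(Directive("head", "tilt", 200, 600))
    return plan


if __name__ == "__main__":
    for d in plan_behavior("greeting", "Hello, how can I help you?"):
        print(d)
```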

Life-Like Characters. Tools, Affective Functions, and Applications

2003

Galatea is a software toolkit for developing a human-like spoken dialog agent. To easily integrate modules with different characteristics, including a speech recognizer, a speech synthesizer, a facial animation synthesizer, and a dialog controller, each module is modeled as a virtual machine with a simple common interface and connected to the others through a broker (communication manager). Galatea employs model-based speech and facial animation synthesizers whose model parameters can easily be adapted to an existing person if his/her training data is given. The software toolkit, which runs on both UNIX/Linux and Windows operating systems, will be publicly available in the middle of 2003 [1, 2].

The OLGA project: An animated talking agent in a dialogue system

1997

The objective of the Olga project is to develop an interactive 3D animated talking agent. The final target could be the future digital TV set, where the Olga agent would guide naive users through various new services on the networks. The current application is consumer information about microwave ovens. Olga involves the development of a system with components from many different fields: dialogue management, speech recognition, multimodal speech synthesis, graphics, animation, facilities for direct manipulation, and database handling. To integrate all knowledge sources, Olga is implemented with separate modules communicating with a central dialogue interaction manager. In this paper we mainly describe the talking animated agent and the dialogue manager. There is also a short description of the preliminary speech recogniser used in the project.

Expressive speech for a virtual talking head

This paper presents our work on building Eface, an expressive facial speech synthesis system that can be used on a social or service robot. Eface aims at enabling a robot to deliver information clearly, with empathetic speech and an expressive virtual face. The system is built on two open-source software packages: the Festival speech synthesis system, which gives robots the capability to speak with different voices and emotions, and Xface, a 3D talking head, which enables the robot to display various human facial expressions. This paper addresses how to express different speech emotions with Festival and how to integrate the synthesized speech with Xface. We have also implemented Eface on a physical robot and tested it in several service scenarios.
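A minimal sketch of the kind of Festival-based synthesis step described above is shown below, assuming Festival's standard text2wave script is installed. The voice names, the emotion-to-voice mapping, and the speak_with_emotion helper are illustrative assumptions, not the Eface API, and driving Xface itself is omitted.

```python
# Hedged sketch: synthesizing speech with Festival for a talking-head front end.
# Assumes the standard Festival `text2wave` script is on PATH; the voice names
# and the emotion-to-voice mapping below are illustrative assumptions.

import os
import subprocess
import tempfile

# Hypothetical mapping from an emotion label to a Festival voice.
EMOTION_VOICES = {
    "neutral": "voice_kal_diphone",
    "happy": "voice_cmu_us_slt_arctic_hts",
}


def speak_with_emotion(text: str, emotion: str = "neutral") -> str:
    """Synthesize `text` to a WAV file and return its path."""
    voice = EMOTION_VOICES.get(emotion, EMOTION_VOICES["neutral"])

    fd, txt_path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    wav_fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(wav_fd)

    # text2wave reads a text file and writes a waveform; -eval selects the voice.
    subprocess.run(
        ["text2wave", txt_path, "-o", wav_path, "-eval", f"({voice})"],
        check=True,
    )
    return wav_path


if __name__ == "__main__":
    path = speak_with_emotion("Your appointment is at three o'clock.", "happy")
    print("synthesized:", path)  # a talking head would now lip-sync this file
```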

Prototype of animated agent for application

2002

Executive summary: This document describes the prototype of the animated agent for application 1. In particular, it describes the different phases involved in the computation of the final animation of the agents. The document discusses the method we are using to resolve conflicts that arise when combining several facial expressions. We also present our lip and coarticulation model.
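As a rough illustration of the kind of conflict resolution mentioned above, the sketch below blends two facial expressions defined over shared parameters, letting the higher-priority expression win on contested channels. The parameter names and the priority scheme are hypothetical, not the report's actual method.

```python
# Hedged sketch: resolving conflicts when two facial expressions compete for
# the same facial parameters. Parameter names and the winner-takes-channel
# rule are illustrative assumptions only.

def combine_expressions(expr_a, expr_b, priority_a, priority_b):
    """Merge two {parameter: value} expressions; on conflict, the expression
    with the higher priority keeps its value for that parameter."""
    combined = dict(expr_a)
    for param, value in expr_b.items():
        if param not in combined or priority_b > priority_a:
            combined[param] = value
    return combined


if __name__ == "__main__":
    smile = {"lip_corner_raise": 0.8, "eye_openness": 0.6}
    surprise = {"eyebrow_raise": 0.9, "eye_openness": 1.0}

    # Surprise outranks the smile here, so it wins the contested eye_openness.
    face = combine_expressions(smile, surprise, priority_a=1, priority_b=2)
    print(face)
```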

Speech transcription for Embodied Conversational Agent animation

The Journal of the Acoustical Society of America, 2008

This article investigates speech transcription within a framework of Embodied Conversational Agent (ECA) animation by voice. The idea is to detect certain pronounced expressions or keywords in order to automatically animate the face and body of an avatar. Extensibility, speed, and precision are the main constraints of this interactive application. After defining the set of words relevant to the application, a fast large-vocabulary speech recognition system was developed and the keyword detection was evaluated. To speed up the recognition system without decreasing its efficiency, the acoustic models were shortened by an original process: the number of shared central states of context-dependent models, which are considered stationary, is decreased, while the shared states located at the borders of the models remain unchanged. All the models are then retrained. The system is evaluated on an hour of the ESTER database (a French broadcast news corpus). The experiments show that reducing the number of central states of triphones is advantageous: the length of the models is reduced by 20% with no loss of accuracy.
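The keyword-driven animation idea above can be sketched as follows; the keyword-to-animation table and the animate_from_transcript function are hypothetical illustrations, not the system described in the article.

```python
# Hedged sketch: spotting application-relevant keywords in a speech transcript
# and mapping them to avatar animations. The keyword table and animation names
# are illustrative assumptions.

import re

# Hypothetical mapping from detected keywords to face/body animations.
KEYWORD_ANIMATIONS = {
    "hello": "wave_hand",
    "sorry": "sad_face",
    "great": "smile",
    "goodbye": "nod_and_wave",
}


def animate_from_transcript(transcript: str):
    """Return the animations triggered by keywords, in order of appearance."""
    triggered = []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        if word in KEYWORD_ANIMATIONS:
            triggered.append((word, KEYWORD_ANIMATIONS[word]))
    return triggered


if __name__ == "__main__":
    text = "Hello! Sorry for the wait, the weather is great today. Goodbye."
    for keyword, animation in animate_from_transcript(text):
        print(f"{keyword!r} -> play {animation}")
```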