Toward a Needs-Based Architecture for ‘Intelligent’ Communicative Agents: Speaking with Intention

Spoken language interaction with robots: Recommendations for future research

Computer Speech & Language, 2022

aspects of language, improving robustness, creating new methods for rapid adaptation, better integrating speech and language with other communication modalities, giving speech and language components access to rich representations of the robot's current knowledge and state, making all components operate in real time, and improving research infrastructure and resources. Research and development that prioritizes these topics will, we believe, provide a solid foundation for the creation of speech-capable robots that are easy and effective for humans to work with.

Voice in Human–Agent Interaction

ACM Computing Surveys, 2021

Social robots, conversational agents, voice assistants, and other embodied AI are increasingly a feature of everyday life. What connects these various types of intelligent agents is their ability to interact with people through voice. Voice is becoming an essential modality of embodiment, communication, and interaction between computer-based agents and end-users. This survey presents a meta-synthesis on agent voice in the design and experience of agents from a human-centered perspective: voice-based human–agent interaction (vHAI). Findings emphasize the social role of voice in HAI and delineate a relationship between agent voice and body that corresponds to human models of social psychology and cognition. Additionally, changes over time in perceptions of and reactions to agent voice reveal a generational shift coinciding with the commercial proliferation of mobile voice assistants. The main contributions of this work are a vHAI classification framework for voice across vari...

Can you understand me? Speaking robots and accented speech

CALL in a climate of change: adapting to turbulent global conditions – short papers from EUROCALL 2017, 2017

The results of our previous research on the pedagogical use of Speaking Robots (SRs) revealed positive effects on motivating students to practice their oral skills in a stress-free environment. However, our findings indicated that the SR was sometimes unable to understand students' foreign-accented speech. In this paper, we report the results of a study that investigated the ability of an SR to recognize and process non-native English speech with different levels of accentedness. The analysis is based on how the SR handled the participants' speech in terms of accuracy, the number and types of communication breakdowns observed, and how the participants behaved to resolve the interaction problems they experienced with the SR. Based on the study's surveys, interviews, and observations of users' interactions with the device, the results emphasize SRs' potential to recognize different types of accented L2 speech and their use as pedagogical tools.

Utilizing Prior Knowledge to Improve Automatic Speech Recognition in Human-Robot Interactive Scenarios

Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction

The effectiveness of human-robot interaction depends not only on a robot's ability to understand the intent and content of a human utterance but also on the automatic speech recognition (ASR) system. Modern ASR can provide highly accurate (grammatically and syntactically) transcription. Yet general-purpose ASR often misses the semantics of an utterance by predicting incorrect words, a consequence of open-vocabulary modeling. ASR inaccuracy can have significant repercussions, as it can lead the robot to take a completely different action in the real world. Can prior knowledge help in such a scenario? In this work, we explore how prior knowledge can be utilized in ASR decoding. Through experiments, we demonstrate that our system can significantly improve ASR transcription of robotic task instructions. CCS CONCEPTS • Computing methodologies → Speech recognition; Knowledge representation and reasoning.
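
The abstract does not say how prior knowledge enters the decoder, so the following is only a minimal sketch of one common, simpler alternative: rescoring an ASR n-best list with a task-vocabulary prior. The function name `rescore_nbest`, the `weight` parameter, and the coverage-based bonus are hypothetical illustrations, not the authors' method.

```python
def rescore_nbest(nbest, domain_vocab, weight=2.0):
    """Re-rank ASR hypotheses using a task-vocabulary prior (hypothetical sketch).

    nbest: list of (transcript, asr_log_prob) pairs from a general-purpose ASR.
    domain_vocab: set of words the robot's task knowledge makes likely
        (e.g. known object names and actions); an assumed input.
    weight: hypothetical trade-off between the ASR score and the prior.
    """
    def score(item):
        text, asr_log_prob = item
        words = text.lower().split()
        # Fraction of words the task knowledge recognizes as in-domain.
        coverage = sum(w in domain_vocab for w in words) / max(len(words), 1)
        return asr_log_prob + weight * coverage

    # Highest combined score first.
    return sorted(nbest, key=score, reverse=True)


nbest = [("pick up the read cup", -1.0), ("pick up the red cup", -1.3)]
vocab = {"pick", "up", "the", "red", "cup", "blue", "place"}
best_text, _ = rescore_nbest(nbest, vocab)[0]
```

Here the acoustically preferred "read cup" loses to "red cup" once the prior is added, which illustrates the kind of word-level error the abstract describes.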

How We Talk with Robots: Eliciting Minimally-Constrained Speech to Build Natural Language Interfaces and Capabilities

Proceedings of the Human Factors and Ergonomics Society Annual Meeting

Industry, military, and academia are showing increasing interest in collaborative human-robot teaming in a variety of task contexts. Designing effective user interfaces for human-robot interaction is an ongoing challenge, and a variety of single- and multiple-modality interfaces have been explored. Our work aims to develop a bi-directional natural language interface for remote human-robot collaboration in physically situated tasks. When combined with a visual interface and audio cueing, we intend for the natural language interface to provide a naturalistic user experience that requires little training. Building the language portion of this interface requires first understanding how potential users would speak to the robot. In this paper, we describe our elicitation of minimally-constrained robot-directed language, observations about the users' language behavior, and future directions for constructing an automated robotic system that can accommodate these language needs.

Confidence in uncertainty: Error cost and commitment in early speech hypotheses

PLOS ONE, 2018

Interactions with artificial agents often lack immediacy because agents respond more slowly than their users expect. Automatic speech recognisers introduce this delay by analysing a user's utterance only after it has been completed. Early, uncertain hypotheses from incremental speech recognisers can enable artificial agents to respond in a more timely manner. However, these hypotheses may change significantly with each update, so an already initiated action may turn into an error and incur error cost. We investigated whether humans would use uncertain hypotheses for planning ahead and/or initiating their response. We designed a Ghost-in-the-Machine study in a bar scenario. A human participant controlled a bartending robot and perceived the scene only through its recognisers. The results showed that participants used uncertain hypotheses for selecting the best matching action. This is comparable to computing the utility of dialogue moves. Participants evaluated the available evidence and the error cost of their actions prior to initiating them. If the error cost was low, the participants initiated their response with only suggestive evidence. Otherwise, they waited for additional, more confident hypotheses if they still had time to do so. If there was time pressure but little evidence, participants grounded their understanding with echo questions. These findings contribute to a psychologically plausible policy for human-robot interaction that enables artificial agents to respond in a more timely and socially appropriate way under uncertainty.
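
The study reports human behaviour rather than an algorithm, but the pattern it describes (act on weak evidence when errors are cheap, otherwise wait, and ask an echo question under time pressure) can be sketched as a simple expected-utility rule. Everything below is a hypothetical illustration: the names `choose_move`, `benefit`, `error_cost`, and `min_time`, and the assumption that a recogniser confidence can be treated as a probability, are not from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Move(Enum):
    ACT = "act"              # initiate the matching action now
    WAIT = "wait"            # wait for a more confident hypothesis
    ECHO = "echo_question"   # ground understanding with an echo question


@dataclass
class Hypothesis:
    action: str
    confidence: float  # assumed: recogniser score usable as a probability in [0, 1]


def choose_move(best: Hypothesis, benefit: float, error_cost: float,
                time_left: float, min_time: float = 1.0) -> Move:
    """Hypothetical policy for acting on an early, uncertain hypothesis.

    Expected utility of acting now = p * benefit - (1 - p) * error_cost.
    """
    p = best.confidence
    expected_utility = p * benefit - (1 - p) * error_cost
    if expected_utility > 0:
        # Cheap errors: even suggestive evidence justifies acting.
        return Move.ACT
    if time_left > min_time:
        # Costly errors but time remains: wait for a later update.
        return Move.WAIT
    # Time pressure and too little evidence: ask instead of guessing.
    return Move.ECHO


order = Hypothesis(action="serve_beer", confidence=0.4)
move = choose_move(order, benefit=1.0, error_cost=0.2, time_left=3.0)
```

With a low error cost the weak hypothesis already has positive expected utility, so the agent acts; raising `error_cost` to 2.0 makes it wait while `time_left` exceeds `min_time` and fall back to an echo question once it does not.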

Text-to-Speech in Human-Robot Communication

Analele Universităţii "Dunărea de Jos" Galaţi: Fascicula III, Electrotehnică, Electronică, Automatică, Informatică, 2019

Human-robot communication is an important field of robotics. In order to facilitate the communication and interactions between humans and robots, in this work we propose a Text-to-Speech system, which allows robots to speak naturally and as close as possible to the human voice. The robot's decision system must eliminate the ambiguities and archive the data obtained in the human-robot communication. The proposed solution involves a voice synthesizer, which generates voice messages starting from text messages, adding environmental sounds for a more natural impression.
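
The abstract does not describe how environmental sounds are combined with the synthesized voice. A minimal sketch, assuming both signals are mono float samples in [-1, 1] at the same sample rate, is plain additive mixing with the ambience looped under the speech; `mix_with_ambience` and `ambience_gain` are hypothetical names, not part of the proposed system.

```python
def mix_with_ambience(speech, ambience, ambience_gain=0.2):
    """Overlay an environmental sound bed under synthesized speech (hypothetical sketch).

    speech, ambience: sequences of mono float samples in [-1, 1] at the same
        sample rate (an assumption; no resampling is done here).
    ambience_gain: hypothetical level of the background relative to the voice.
    """
    if not ambience:
        return list(speech)
    # Loop the ambience so it spans the whole utterance.
    mixed = [s + ambience_gain * ambience[i % len(ambience)]
             for i, s in enumerate(speech)]
    # Rescale only if the sum would clip.
    peak = max((abs(x) for x in mixed), default=0.0)
    return [x / peak for x in mixed] if peak > 1.0 else mixed


voice = [0.5, -0.5, 0.0]   # stand-in for TTS output samples
rain = [1.0]               # stand-in for a recorded ambience loop
out = mix_with_ambience(voice, rain)
```

Keeping the ambience at a low gain and normalizing only on overflow preserves the intelligibility of the speech while adding the background that the authors use for a more natural impression.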

Exploring miscommunication and collaborative behaviour in human-robot interaction

This paper presents the first step in designing a speech-enabled robot that is capable of natural management of miscommunication. It describes the methods and results of two Wizard-of-Oz (WOz) studies, in which dyads of naïve participants interacted in a collaborative task. The first WOz study explored human miscommunication management. The second study investigated how shared visual space and monitoring shape the processes of feedback and communication in task-oriented interactions. The results provide insights for the development of human-inspired and robust natural language interfaces in robots.