Evaluation of Word Representations in Grounding Natural Language Instructions through Computational Human-Robot Interaction

A Probabilistic Framework for Comparing Syntactic and Semantic Grounding of Synonyms through Cross-Situational Learning

Conference: ICRA-2018 Workshop on "Representing a Complex World: Perception, Inference, and Learning for Joint Semantic, Geometric, and Physical Understanding", Brisbane, Australia, 2018

Natural human-robot interaction requires robots to link words to objects and actions through grounding. Although grounding has been investigated in previous studies, none of them has considered the grounding of synonyms. In this paper, we address this gap by introducing a Bayesian learning model for grounding synonymous object and action names using cross-situational learning. Three different word representations are employed with the probabilistic model and evaluated according to their grounding performance. Words are grounded through geometric characteristics of objects and kinematic features of the robot joints during action execution. An interaction experiment between a human tutor and an HSR robot is used to evaluate the proposed model. The results show that representing words by syntactic and/or semantic information achieves worse grounding results than representing them by unique numbers.
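As a rough illustration of the cross-situational learning idea mentioned in this abstract (not the authors' actual Bayesian model), the sketch below accumulates word/percept co-occurrences across situations and reads off a smoothed posterior per word; synonyms end up peaking on the same perceptual category. All class names, features, and words are hypothetical.

```python
# Minimal cross-situational grounding sketch (illustrative, not the paper's model).
# Assumption: each teaching episode pairs an utterance (list of words) with the set
# of perceptual categories the robot observed (object shape, action signature, ...).
from collections import defaultdict

class CrossSituationalLearner:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                                       # Dirichlet-style smoothing
        self.counts = defaultdict(lambda: defaultdict(float))    # word -> category -> count
        self.categories = set()

    def observe(self, words, categories):
        """Accumulate word/category co-occurrences from one situation."""
        self.categories.update(categories)
        for w in words:
            for c in categories:
                self.counts[w][c] += 1.0

    def posterior(self, word):
        """Smoothed P(category | word); ambiguity shrinks as situations accumulate."""
        scores = {c: self.counts[word][c] + self.alpha for c in self.categories}
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}

learner = CrossSituationalLearner()
learner.observe(["push", "the", "ball"], {"obj:sphere", "act:push"})
learner.observe(["shove", "the", "box"], {"obj:cube", "act:push"})   # "shove" as a synonym of "push"
learner.observe(["shove", "the", "ball"], {"obj:sphere", "act:push"})
post = learner.posterior("shove")
print(max(post, key=post.get))    # -> "act:push"
```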

Acquiring Vocabulary through Human Robot Interaction: A Learning Architecture for Grounding Words with Multiple Meanings

2010 AAAI Fall Symposium Series, 2010

This paper presents a robust methodology for grounding vocabulary in robots. A social language grounding experiment is designed in which a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. Any system for grounding vocabulary has to incorporate the properties of gradual evolution and lifelong learning. The learning model of the robot is adopted from ongoing work on developing systems that conform to these properties. Significant modifications have been introduced to the adopted model, especially to handle words with multiple meanings. A novel classification strategy has been developed to improve the performance of each classifier for each learned category. A set of six new nearest-neighbor-based classifiers has also been integrated into the agent architecture. A series of experiments was conducted to test the performance of the new model on vocabulary acquisition. The robot was shown to be robust at acquiring vocabulary and to have the potential to learn a far greater number of words (with either single or multiple meanings).
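A minimal sketch of the nearest-neighbour flavour of word grounding described above (not the paper's six specific classifier variants): each word stores the feature vectors of the instances it was taught with, so a word with multiple meanings simply keeps several distinct instance clusters. Features and words are made up for illustration.

```python
# Per-word nearest-neighbour grounding sketch (illustrative only).
import numpy as np

class WordNN:
    def __init__(self):
        self.instances = {}          # word -> list of stored feature vectors

    def teach(self, word, features):
        self.instances.setdefault(word, []).append(np.asarray(features, dtype=float))

    def name(self, features):
        """Return the word whose closest stored instance matches the query features."""
        q = np.asarray(features, dtype=float)
        best_word, best_d = None, np.inf
        for word, vecs in self.instances.items():
            d = min(np.linalg.norm(q - v) for v in vecs)
            if d < best_d:
                best_word, best_d = word, d
        return best_word

nn = WordNN()
nn.teach("cup", [0.9, 0.1])          # hypothetical (roundness, elongation) features
nn.teach("cup", [0.2, 0.8])          # a second, visually different referent of "cup"
nn.teach("ball", [1.0, 0.0])
print(nn.name([0.25, 0.75]))         # -> "cup"
```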

Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations

Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP), 2019

Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot's world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world.
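To make the idea of exploiting implicit spatio-relational information concrete, here is a hedged sketch (not the paper's method): a referring expression alone yields a flat prior over candidate objects, and a spatial relation implicit in the instruction is used as soft evidence in a Bayesian update against the robot's world representation. Object names, positions, and the relation model are hypothetical.

```python
# Rescoring candidate groundings with an implicit spatial relation (illustrative sketch).
import numpy as np

objects = {                       # robot's world representation: object -> 2-D position
    "box_1": np.array([0.2, 0.5]),
    "box_2": np.array([0.8, 0.5]),
    "table": np.array([0.5, 0.5]),
}

prior = {"box_1": 0.5, "box_2": 0.5}   # language grounding alone cannot tell the boxes apart

def p_left_of(target, landmark, sharpness=10.0):
    """Soft evidence that `target` lies to the left of `landmark` (smaller x)."""
    dx = objects[landmark][0] - objects[target][0]
    return 1.0 / (1.0 + np.exp(-sharpness * dx))

# Instruction: "pick up the box left of the table" -> Bayesian update with the relation
posterior = {o: prior[o] * p_left_of(o, "table") for o in prior}
z = sum(posterior.values())
posterior = {o: p / z for o, p in posterior.items()}
print(posterior)                  # box_1 now dominates
```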

Robot learning of lexical semantics from sensorimotor interaction and the unrestricted speech of human tutors

2010

This paper describes an HRI case study which demonstrates how a humanoid robot can use simple heuristics to acquire and use vocabulary while being shown a series of shapes by a human, and how the human's interaction style changes as the robot learns and expresses its learning through speech. The case study is based on findings on how adults use child-directed speech when socially interacting with infants. The results indicate that humans are generally willing to engage with a robot in a similar manner to their engagement with a human infant, using similar styles of interaction that vary as the shared understanding between them becomes more apparent. The case study also demonstrates that a rudimentary form of shared intentional reference can sufficiently bias the learning procedure. As a result, the robot associates human-taught lexical items for a series of presented shapes with its own sensorimotor experience, and is able to utter these words, acquired from the particular tutor, appropriately in an interactive, embodied context exhibiting apparent reference and discrimination.

Natural Language Acquisition and Grounding for Embodied Robotic Systems

Proceedings of the AAAI Conference on Artificial Intelligence

We present a cognitively plausible novel framework capable of learning both the grounding in visual semantics and the grammar of natural language commands given to a robot in a table-top environment. The input to the system consists of video clips of a manually controlled robot arm, paired with natural language commands describing the action. No prior knowledge is assumed about the meaning of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). The learning process automatically clusters the continuous perceptual spaces into concepts corresponding to linguistic input. A novel relational graph representation is used to build connections between language and vision. As well as grounding language in perception, the system also induces a set of probabilistic grammar rules. The knowledge learned is used to parse new commands involving previously un...
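As a toy illustration of clustering a continuous perceptual space into concepts and attaching words to the resulting clusters (the general idea named in the abstract, not the paper's specific pipeline), the snippet below clusters RGB colour features with k-means and labels each cluster with the majority word it co-occurred with. All data are made up.

```python
# Clustering a perceptual space into concepts and labelling clusters by co-occurrence (sketch).
import numpy as np
from sklearn.cluster import KMeans

# (colour feature, word uttered with it) pairs from hypothetical demonstrations
samples = [
    ([0.95, 0.05, 0.05], "red"), ([0.90, 0.10, 0.08], "red"),
    ([0.05, 0.10, 0.95], "blue"), ([0.08, 0.05, 0.90], "blue"),
]
X = np.array([f for f, _ in samples])
words = [w for _, w in samples]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# The majority word per cluster becomes that cluster's linguistic label
labels = {}
for cluster_id in range(2):
    ws = [w for w, c in zip(words, kmeans.labels_) if c == cluster_id]
    labels[cluster_id] = max(set(ws), key=ws.count)

new_percept = np.array([[0.92, 0.08, 0.06]])
print(labels[int(kmeans.predict(new_percept)[0])])   # -> "red"
```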

Teaching semantics and skills for human-robot collaboration

Paladyn, Journal of Behavioral Robotics, 2019

Recent advances in robotics allow for collaboration between humans and machines in performing tasks at home or in industrial settings without endangering the user. While humans can easily adapt to each other and work in teams, this is not as trivial for robots. In their case, interaction skills typically come at the cost of extensive programming and teaching. Besides, understanding the semantics of a task is necessary to work efficiently and react to changes in the task execution process. As a result, in order to achieve seamless collaboration, appropriate reasoning, learning skills and interaction capabilities are needed. For us humans, a cornerstone of our communication is language, which we use to teach, coordinate and communicate. In this paper we thus propose a system that allows (i) teaching new action semantics based on the already available knowledge and (ii) using natural language communication to resolve ambiguities that could arise while giving commands to the robot. Rea...

Grounded situation models for robots: Bridging language, perception, and action

AAAI-05 Workshop on Modular Construction of …, 2005

"Our long-term objective is to develop robots that engage in natural language-mediated cooperative tasks with humans. To support this goal, we are developing an amodal representation called a grounded situation model (GSM), as well as a modular architecture in which the GSM resides in a centrally located module. We present an implemented system that allows of a range of conversational and assistive behavior by a manipulator robot. The robot updates beliefs about its physical environment and body, based on a mixture of linguistic, visual and proprioceptive evidence. It can answer basic questions about the present or past and also perform actions through verbal interaction. Most importantly, a novel contribution of our approach is the robot’s ability for seamless integration of both language and sensor-derived information about the situation: For example, the system can acquire parts of situations either by seeing them or by “imagining” them through descriptions given by the user: “There is a red ball at the left”. These situations can later be used to create mental imagery, thus enabling bidirectional translation between perception and language. This work constitutes a step towards robots that use situated natural language grounded in perception and action."

A probabilistic approach to learning a visually grounded language model through human-robot interaction

2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

Language is among the most fascinating and complex cognitive activities, developing rapidly from the earliest months of an infant's life. The aim of the present work is to provide a humanoid robot with the cognitive, perceptual and motor skills fundamental for the acquisition of a rudimentary form of language. We present a novel probabilistic model, inspired by findings in the cognitive sciences, able to associate spoken words with their perceptually grounded meanings. The main focus is on acquiring the meaning of various perceptual categories (e.g. red, blue, circle, above, etc.) rather than specific world entities (e.g. an apple, a toy, etc.). Our probabilistic model is based on a variant of the multi-instance learning technique and enables a robotic platform to learn grounded meanings of adjective/noun terms. The system can be used to understand and generate appropriate natural language descriptions of real objects in a scene, and it has been successfully tested on the NAO humanoid robotic platform.
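A hedged sketch of the multi-instance flavour of grounding mentioned above (not the authors' model): each utterance of a word comes with a "bag" of candidate percepts (all objects visible at that moment), and the word's grounded prototype is the candidate that best explains every bag, scoring each bag by its best-matching instance. Feature values and words are illustrative.

```python
# Multi-instance-style word grounding sketch (illustrative only).
import numpy as np

def bag_score(prototype, bag, sigma=0.2):
    """A bag supports a prototype if at least one instance in it is close to it."""
    sims = [np.exp(-np.linalg.norm(prototype - x) ** 2 / (2 * sigma ** 2)) for x in bag]
    return max(sims)

def ground_word(bags):
    """Pick, among all observed instances, the one with the highest average bag support."""
    candidates = [x for bag in bags for x in bag]
    return max(candidates, key=lambda c: np.mean([bag_score(c, b) for b in bags]))

# Two scenes in which "red" was uttered; each bag holds the colour features of all visible objects
bags_for_red = [
    [np.array([0.95, 0.05, 0.05]), np.array([0.10, 0.90, 0.10])],   # red ball + green cube
    [np.array([0.90, 0.10, 0.05]), np.array([0.10, 0.10, 0.90])],   # red cup  + blue toy
]
print(ground_word(bags_for_red))    # -> a reddish feature vector
```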

Conversational robots: Building blocks for grounding word meaning

Proceedings of the HLT-NAACL 2003 …, 2003

How can we build robots that engage in fluid spoken conversations with people, moving beyond canned responses to words and towards actual understanding? As a step towards addressing this question, we introduce a robotic architecture that provides a basis for grounding word meanings. The architecture provides perceptual, procedural, and affordance representations for grounding words. A perceptually coupled on-line simulator enables sensorimotor representations that can shift points of view. Taken together, we show that this architecture offers a rich set of data structures and procedures that lay the foundations for grounding the meaning of certain classes of words.