Robot Language Learning, Generation, and Comprehension
Related papers
Robot Program Construction via Grounded Natural Language Semantics & Simulation
2018
Robots acting in semi-structured, human environments need to understand the effects of their actions and the instructions given by a human user. Simulation has been considered a promising reasoning technique to help tackle both problems. In this paper, we present a system that constructs an executable robot program from a linguistic semantic specification produced by parsing a natural language sentence; in effect, our system grounds the semantic specification into the produced robot plan. The plan can then be run in a simulated environment, which allows one to infer more about the plan than was present in the initial semantic specification. Our system can model how actions are modified by subclauses, which we demonstrate with a transport action. Simulation runs allow the discovery of better parameters, either locally for a subtask or globally for the entire task; simulation reveals that these two parameterizations may differ.
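To make the parameter-search idea above concrete, here is a minimal sketch (not the paper's implementation): a hypothetical simulate_transport function stands in for the simulator, and random search is run twice, once scoring only the grasp subtask and once scoring the whole transport task, to show how the two can prefer different parameters.

```python
# Illustrative sketch only: a toy stand-in for simulation-based parameter search,
# not the paper's actual system. The simulator and cost terms are hypothetical.
import random

def simulate_transport(grasp_height, speed):
    """Hypothetical simulator: returns (grasp_cost, carry_cost) for one run."""
    grasp_cost = abs(grasp_height - 0.12) + 0.1 * speed             # grasping prefers low speed
    carry_cost = abs(speed - 0.8) + 0.2 * abs(grasp_height - 0.2)   # carrying prefers a higher grasp
    return grasp_cost, carry_cost

def random_search(score, trials=2000, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        h, v = rng.uniform(0.0, 0.3), rng.uniform(0.1, 1.5)
        g, c = simulate_transport(h, v)
        s = score(g, c)
        if best is None or s < best[0]:
            best = (s, h, v)
    return best

# Locally optimal parameters for the grasp subtask vs. optimal for the whole task:
best_grasp = random_search(lambda g, c: g)
best_total = random_search(lambda g, c: g + c)
print("best for grasp subtask:", best_grasp)
print("best for whole task:   ", best_total)
```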
Driving Under the Influence (of Language)
IEEE Transactions on Neural Networks and Learning Systems, 2017
We present a unified framework which supports grounding natural-language semantics in robotic driving. This framework supports acquisition (learning grounded meanings of nouns and prepositions from human sentential annotation of robotic driving paths), generation (using such acquired meanings to generate sentential description of new robotic driving paths), and comprehension (using such acquired meanings to support automated driving to accomplish navigational goals specified in natural language). We evaluate the performance of these three tasks by having independent human judges rate the semantic fidelity of the sentences associated with paths. Overall, machine performance is 74.9%, while the performance of human annotators is 83.8%.
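A hedged sketch of the shared-scoring idea behind generation and comprehension: the same sentence-path score can drive both tasks, comprehension by picking the path that best fits a sentence and generation by picking the sentence that best fits a path. The word models, paths, and landmark below are invented toy stand-ins, not the paper's learned meanings.

```python
# Illustrative sketch only (not the paper's model): one sentence-path scoring
# function reused for both generation and comprehension, under the assumption
# that word meanings are simple geometric predicates over 2-D waypoints.
import math

def bearing(p, q):
    return math.atan2(q[1] - p[1], q[0] - p[0])

# Hypothetical "learned" meanings: each word scores a path relative to a landmark.
WORD_MODELS = {
    "towards": lambda path, lm: -abs(bearing(path[0], path[-1]) - bearing(path[0], lm)),
    "away":    lambda path, lm: -math.pi + abs(bearing(path[0], path[-1]) - bearing(path[0], lm)),
}

def score(sentence, path, landmark):
    """Higher is better; sums the fit of each known word to the path."""
    return sum(WORD_MODELS[w](path, landmark) for w in sentence.split() if w in WORD_MODELS)

landmark = (5.0, 0.0)
paths = {"A": [(0, 0), (4, 0)], "B": [(0, 0), (-4, 0)]}

# Comprehension: pick the path that best fits the instruction.
print(max(paths, key=lambda k: score("drive towards the cone", paths[k], landmark)))
# Generation: pick the sentence that best fits a given path.
candidates = ["drive towards the cone", "drive away from the cone"]
print(max(candidates, key=lambda s: score(s, paths["B"], landmark)))
```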
Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2019
In order to interact with people in a natural way, a robot must be able to link words to objects and actions. Although previous studies in the literature have investigated grounding, they did not consider the grounding of unknown synonyms. In this paper, we introduce a probabilistic model for grounding unknown synonymous object and action names using cross-situational learning. The proposed Bayesian learning model uses four different word representations to determine synonymous words; these words are then grounded through geometric characteristics of objects and kinematic features of the robot joints during action execution. The proposed model is evaluated through an interaction experiment between a human tutor and an HSR robot. The results show that semantic and syntactic information both enable grounding of unknown synonyms and that the combination of the two achieves the best grounding.
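A minimal sketch of cross-situational word grounding, assuming a much simpler counting model than the paper's Bayesian one: words accumulate co-occurrence statistics with perceived object and action categories across situations, and two words whose grounding distributions overlap strongly are treated as candidate synonyms. All category names below are illustrative.

```python
# Illustrative sketch only, not the paper's Bayesian model: cross-situational
# learning as simple word-percept co-occurrence counting, with synonyms detected
# by the similarity of the resulting grounding distributions.
from collections import Counter, defaultdict

cooc = defaultdict(Counter)   # word -> Counter over perceived categories

def observe(utterance, percepts):
    """One situation: an utterance paired with the categories the robot perceives."""
    for w in utterance.split():
        cooc[w].update(percepts)

def grounding(word):
    """P(category | word), estimated from co-occurrence counts."""
    total = sum(cooc[word].values()) or 1
    return {c: n / total for c, n in cooc[word].items()}

def similarity(w1, w2):
    """Overlap of grounding distributions; high overlap suggests synonymy."""
    g1, g2 = grounding(w1), grounding(w2)
    return sum(min(g1.get(c, 0.0), g2.get(c, 0.0)) for c in set(g1) | set(g2))

observe("grab the cup", ["cylindrical", "graspable"])
observe("take the cup", ["cylindrical", "graspable"])
observe("push the box", ["cuboid", "pushable"])

print(similarity("grab", "take"))  # high -> candidate synonyms
print(similarity("grab", "push"))  # low
```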
Natural Language Acquisition and Grounding for Embodied Robotic Systems
Proceedings of the AAAI Conference on Artificial Intelligence
We present a cognitively plausible novel framework capable of learning the grounding in visual semantics and the grammar of natural language commands given to a robot in a tabletop environment. The input to the system consists of video clips of a manually controlled robot arm, paired with natural language commands describing the action. No prior knowledge is assumed about the meaning of words or the structure of the language, except that there are different classes of words (corresponding to observable actions, spatial relations, and objects and their observable properties). The learning process automatically clusters the continuous perceptual spaces into concepts corresponding to linguistic input. A novel relational graph representation is used to build connections between language and vision. In addition to grounding language in perception, the system also induces a set of probabilistic grammar rules. The knowledge learned is used to parse new commands involving previously un...
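A toy sketch of the cluster-then-link idea, not the paper's framework: continuous perceptual features (here, made-up colour vectors) are clustered into concepts, and each induced concept is linked to the word it most often co-occurs with in the paired commands.

```python
# Illustrative sketch only, not the paper's system: cluster a continuous
# perceptual feature space (toy RGB vectors) into concepts, then link each
# concept to the word that most often accompanies it.
from collections import Counter, defaultdict
import numpy as np
from sklearn.cluster import KMeans

# Each training pair: a colour feature observed in the scene + the command word used.
features = np.array([[0.9, 0.1, 0.1], [0.8, 0.2, 0.1],   # reddish
                     [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]])  # bluish
words = ["red", "red", "blue", "blue"]

concepts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Count which word accompanies which induced concept.
link = defaultdict(Counter)
for c, w in zip(concepts, words):
    link[c][w] += 1

grounding = {c: counts.most_common(1)[0][0] for c, counts in link.items()}
print(grounding)  # e.g. {0: 'red', 1: 'blue'} -- cluster ids are arbitrary
```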
Natural semantics for a mobile robot
1999
Functionalism is the view that a system (or system component) grasps the meaning of its inputs to the extent that it produces the right outputs. If a system retrieves all and only relevant documents in response to a query, we say it understands the query. If a robot avoids bumping into walls, we say it understands its sensors and its environment. If a chess program beats the world champion, we say it understands the game. One kind of functionalism, conventional functionalism, is currently very popular and productive in artificial intelligence and the other cognitive sciences, but it requires humans to specify the meanings of assertions. A second kind of functionalism, natural semantics, requires computers to learn these meanings. This paper discusses the limitations of conventional functionalism and describes some robotics work from our laboratory on natural semantics.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018
We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. The proposed model uses attention mechanisms to connect information from user instructions with a topological representation of the environment. To evaluate this model, we collected a new dataset for the translation problem containing 11,051 pairs of user instructions and navigation plans. Our results show that the proposed model outperforms baseline approaches on the new dataset. Overall, our work suggests that a topological map of the environment can serve as a relevant knowledge base for translating natural language instructions into a sequence of navigation behaviors.
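A minimal sketch of the attention idea, assuming a generic scaled dot-product formulation rather than the paper's exact architecture: an instruction encoding attends over embeddings of topological map nodes, and the resulting context vector is what a decoder would condition on when predicting the next navigation behavior. All dimensions and embeddings below are made up.

```python
# Illustrative sketch only (not the paper's architecture): dot-product attention
# connecting an instruction representation to topological map node embeddings.
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: softmax(K q / sqrt(d)) applied to V."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, weights @ values

rng = np.random.default_rng(0)
d = 16
instruction_state = rng.normal(size=d)          # encoder summary of the instruction
map_node_embeddings = rng.normal(size=(5, d))   # one embedding per topological node

weights, context = attention(instruction_state, map_node_embeddings, map_node_embeddings)
print(weights.round(3))   # how much each map node contributes
print(context.shape)      # (16,) context vector fed to the behaviour decoder
```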
Natural Language For Human Robot Interaction
2015
Natural Language Understanding (NLU) was one of the main original goals of artificial intelligence and cognitive science. This has proven to be extremely challenging and was nearly abandoned for decades. We describe an implemented system that supports full NLU for tasks of moderate complexity. The natural language interface is based on Embodied Construction Grammar and simulation semantics. The system described here supports human dialog with an agent controlling a simulated robot, but is flexible with respect to both input language and output task.
Grounding Spatial Relations for Outdoor Robot Navigation
We propose a language-driven navigation approach for commanding mobile robots in outdoor environments. We consider unknown environments that contain previously unseen objects. The proposed approach aims at making interactions in human-robot teams natural. Robots receive commands in natural language from human teammates, such as "Navigate around the building to the car left of the fire hydrant and near the tree". A robot first needs to classify its surrounding objects into categories, using images obtained from its sensors. The result of this classification is a map of the environment, where each object is given a list of semantic labels, such as "tree" and "car", with varying degrees of confidence. Then, the robot needs to ground the nouns in the command. Grounding, the main focus of this paper, is mapping each noun in the command into a physical object in the environment. We use a probabilistic model for interpreting the spatial relations, such ...
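A hedged sketch of noun grounding over such a semantic map: each candidate object's label confidence is combined with a simple probabilistic score for the stated spatial relation. The objects, positions, and the sigmoid-shaped "left of" model below are illustrative assumptions, not the paper's model.

```python
# Illustrative sketch only: ground the noun "car" in "the car left of the fire
# hydrant" by combining label confidence with a soft spatial-relation score.
import math

# Semantic map: detected objects with label confidences and 2-D positions.
objects = [
    {"id": 1, "labels": {"car": 0.8, "truck": 0.2}, "pos": (2.0, 1.0)},
    {"id": 2, "labels": {"car": 0.6, "rock": 0.4},  "pos": (9.0, 1.0)},
    {"id": 3, "labels": {"fire hydrant": 0.9},      "pos": (5.0, 1.0)},
]

def p_left_of(target, landmark, sigma=1.0):
    """Soft 'left of': favour targets with smaller x than the landmark (toy frame)."""
    dx = landmark["pos"][0] - target["pos"][0]
    return 1.0 / (1.0 + math.exp(-dx / sigma))

landmark = max(objects, key=lambda o: o["labels"].get("fire hydrant", 0.0))
scores = {
    o["id"]: o["labels"].get("car", 0.0) * p_left_of(o, landmark)
    for o in objects if o is not landmark
}
print(scores)                       # grounding scores per candidate object
print(max(scores, key=scores.get))  # object 1 is the most likely referent
```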
Natural Language Grounding and Grammar Induction for Robotic Manipulation Commands
2017
We present a cognitively plausible system capable of acquiring knowledge in language and vision from pairs of short video clips and linguistic descriptions. The aim of this work is to teach a robot manipulator how to execute natural language commands by demonstration. This is achieved by, first, learning a set of visual 'concepts' that abstract the visual feature spaces into concepts that have human-level meaning; second, learning the mapping/grounding between words and the extracted visual concepts; and third, inducing grammar rules via a semantic representation known as Robot Control Language (RCL). We evaluate our approach against state-of-the-art supervised and unsupervised grounding and grammar induction systems, and show that a robot can learn to execute never-seen-before commands from pairs of unlabelled linguistic and visual inputs.
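A hedged sketch of the final step, turning a grounded command into a nested command structure: the word-to-concept groundings and the RCL-like schema below are invented for illustration and are not the actual Robot Control Language specification used by the authors.

```python
# Illustrative sketch only: packaging a grounded command into an RCL-style
# nested event structure. The schema and groundings here are hypothetical.
GROUNDINGS = {          # word -> induced visual concept (hypothetical grounding output)
    "pick": "action_grasp",
    "red":  "colour_cluster_0",
    "cube": "shape_cluster_2",
}

def to_rcl(tokens):
    """Build a nested (event (action ...) (entity ...)) structure from grounded tokens."""
    action = next(GROUNDINGS[t] for t in tokens if GROUNDINGS.get(t, "").startswith("action"))
    attributes = [GROUNDINGS[t] for t in tokens
                  if t in GROUNDINGS and not GROUNDINGS[t].startswith("action")]
    return ("event", ("action", action), ("entity", tuple(attributes)))

print(to_rcl("pick up the red cube".split()))
# ('event', ('action', 'action_grasp'), ('entity', ('colour_cluster_0', 'shape_cluster_2')))
```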