David Filliat | ENSTA ParisTech
Papers by David Filliat
The human perception of the external world appears as a natural, immediate and effortless task. It is achieved through a number of "low-level" sensory-motor processes that provide a high-level representation adapted to complex reasoning and decision. Compared to these representations, mobile robots usually provide only low-level obstacle maps that lack such high-level information. We present a mobile robot whose goal is to autonomously explore an unknown indoor environment and to build a semantic map containing high-level information similar to that extracted by humans, which users can rapidly and easily interpret to assess the situation. This robot was developed under the Panoramic and Active Camera for Object Mapping (PACOM) project, whose goal is to participate in a French exploration and mapping contest called CAROTTE. We will detail in particular how we integrated visual object recognition, room detection, semantic mapping, and exploration. We demonstrate the performance of our system in an indoor environment.
2015 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), 2015
Learning word meanings during natural interaction with a human faces noise and ambiguity that can be resolved by analysing regularities across different situations. We propose a model of this cross-situational learning capacity and apply it to learning nouns and adjectives from noisy and ambiguous speech and continuous visual input. This model uses two different strategies: statistical filtering to remove noise in the speech part, and the Non-Negative Matrix Factorization algorithm to discover word meanings in the visual domain. We present experiments on learning object names and color names showing the performance of the model in real interactions with humans, dealing in particular with strong noise in the speech recognition.
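As an illustration of the factorization step named in this abstract, here is a minimal sketch of Non-Negative Matrix Factorization with the standard multiplicative updates for the squared-error objective. It is not the authors' implementation: the toy matrix `V`, the rank `k`, and all function names are assumptions made for this example. Each row of `V` stands for one hypothetical situation and each column for a word or visual feature.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frob_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Made-up toy data: columns 0-1 are word indicators, columns 2-3 visual features.
V = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 1, 1, 1],
     [1, 0, 1, 0]]
W, H = nmf(V, k=2)
```

Each row of `H` can be read as one learned component that ties a word column to the visual columns it co-occurs with, and each row of `W` gives the activation of those components in one situation.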
Cognitive Systems Research
AnimatLab - LIP6, 8 rue du Capitaine Scott, 75015 Paris, France; DGA/Centre Technique d'Arcueil, 16 bis Av. Prieur de la Côte d'Or
In this paper we introduce MCA-NMF, a computational model of the acquisition of multimodal concepts by an agent grounded in its environment. More precisely, our model finds patterns in multimodal sensor input that characterize associations across modalities. We propose this computational model as an answer to the question of how some class of concepts can be learnt. The model is also a way of defining such a class of plausibly learnable concepts. We detail why the multimodal nature of perception is essential to lower the ambiguity of learnt concepts as well as to communicate about them. We then present a set of experiments that demonstrate the learning of such concepts from real non-symbolic data consisting of speech sounds, images, and motion acquisitions. Finally, we consider structure in perceptual signals and demonstrate that a detailed knowledge of this structure, named compositional understanding, can emerge from, instead of being a prerequisite of, global understanding. An open-sou...
Autonomous Robots, 2015
Service robots, working in evolving human environments, need the ability to continuously learn to recognize new objects. Ideally, they should act as humans do, by observing their environment and interacting with objects, without specific supervision. Taking inspiration from infant development, we propose a developmental approach that enables a robot to progressively learn object appearances in a social environment: first, only through observation, then through active object manipulation. We focus on incremental, continuous, and unsupervised learning that does not require prior knowledge about the environment or the robot. In the first phase, we analyse the visual space and detect proto-objects as units of attention that are learned and recognized as possible physical entities. The appearance of each entity is represented as a multi-view model based on complementary visual features. In the second phase, entities are classified into three categories: parts of the body of the robot, parts of a human partner, and manipulable objects. The categorization approach is based on mutual information between the visual and proprioceptive data, and on the motion behaviour of entities. The ability to categorize entities is then used during interactive object exploration to improve the previously acquired object models. The proposed system is implemented and evaluated with an iCub and a Meka robot learning 20 objects. The system is able to recognize objects with 88.5% success and create coherent representation models that are further improved by interactive learning.
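To make the mutual-information criterion mentioned in this abstract concrete, here is a minimal sketch that estimates mutual information between two discrete signals from their empirical joint frequencies. The binary "moving / not moving" encoding and the variable names are assumptions for illustration only; the abstract does not specify how the visual and proprioceptive data are discretized.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length
    sequences of discrete symbols."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical signals: 1 = motion detected at that time step, 0 = no motion.
arm_joints_moving = [1, 0, 1, 0, 1, 0, 1, 0]   # proprioception
entity_a_moving   = [1, 0, 1, 0, 1, 0, 1, 0]   # moves exactly with the arm
entity_b_moving   = [0, 0, 1, 1, 0, 0, 1, 1]   # unrelated to the arm

mi_a = mutual_information(arm_joints_moving, entity_a_moving)
mi_b = mutual_information(arm_joints_moving, entity_b_moving)
```

Under this toy encoding, an entity whose visual motion shares high mutual information with the robot's own joint motion (entity A here) is a plausible candidate for a robot body part, while a low value (entity B) suggests an independent entity.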
[Figure: proto-object segmentation (depth data, motion detection, KLT tracking and clustering, depth contours) and the hierarchical object representation: SURF and HSV features {f}, mid-features m_i = (f_i1, f_i2) and m_j = (f_j1, f_j2), views V_k = {m_i; m_j}, objects O_n = {v_k}.]
Proceedings - IEEE International Conference on Robotics and Automation, 2008
In robotic applications of visual simultaneous localization and mapping, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape information. Our approach extends the bag of visual words method used in image recognition to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in an indoor image sequence taken with a handheld camera.
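The Bayesian filtering idea in this abstract can be sketched as a discrete Bayes filter over loop-closure hypotheses. This is a simplified illustration, not the paper's model: the uniform-leak transition model, the `stay` parameter, and the likelihood values are all assumptions. In a bag-of-visual-words setting, the likelihood for each past place would typically come from the visual words it shares with the current image.

```python
def bayes_filter_step(prior, likelihood, stay=0.8):
    """One predict/update step of a discrete Bayes filter.

    prior[i] is the previous belief that the current image closes a loop
    with past place i; the last entry is the "no loop closure" hypothesis.
    likelihood[i] is a non-negative score of the current image under
    hypothesis i.
    """
    n = len(prior)
    # Prediction: most of the belief stays where it was, the rest leaks
    # uniformly to all hypotheses (a deliberately simple, assumed model).
    leak = (1.0 - stay) / n
    mass = sum(prior)
    predicted = [stay * p + leak * mass for p in prior]
    # Update: weight each hypothesis by its likelihood, then renormalise.
    unnorm = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(unnorm)
    if total == 0.0:
        return predicted  # uninformative observation, keep the prediction
    return [u / total for u in unnorm]

# Three past places plus "no loop closure"; start certain that no loop is closed.
belief = [0.0, 0.0, 0.0, 1.0]
# Hypothetical likelihoods: successive images keep resembling place 1.
for _ in range(5):
    belief = bayes_filter_step(belief, [1.0, 4.0, 1.0, 1.0])
best = max(range(len(belief)), key=lambda i: belief[i])
```

Filtering over several frames, rather than thresholding a single-image similarity, is what lets the belief build up only when the evidence is consistent, which matters under perceptual aliasing.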
For a socially intelligent robot, different levels of situation assessment are required, ranging from basic processing of sensor input to high-level analysis of semantics and intention. However, the attempt to combine them all prompts new research challenges and the need for a coherent framework and architecture. This paper presents the situation assessment aspect of Romeo2, a unique project aiming to bring multi-modal and multi-layered perception to a single system and targeting a unified theoretical and functional framework for a robot companion for everyday life. It also discusses some of the innovation potential that the combination of these various perception abilities adds to the robot's socio-cognitive capabilities.
This paper addresses the problem of active object learning by a humanoid child-like robot, using a developmental approach. We propose a cognitive architecture where the visual representation of the objects is built incrementally through active exploration. We present the design guidelines of the cognitive architecture, its main functionalities, and we outline the cognitive process of the robot by showing how it learns to recognize objects in a human-robot interaction scenario inspired by social parenting. The robot actively explores the objects through manipulation, driven by a combination of social guidance and intrinsic motivation. Besides the robotics and engineering achievements, our experiments replicate some observations about the coupling of vision and manipulation in infants, particularly how they focus on the most informative objects. We discuss the further benefits of our architecture, particularly how it can be improved and used to ground concepts.
Each driver reacts differently to the same traffic conditions; however, most Advanced Driving Assistant Systems (ADAS) assume that all drivers are the same. This paper proposes a method to learn and to model the velocity profile that the driver follows as the vehicle decelerates towards a stop intersection. Gaussian Processes (GP), a machine learning method for non-linear regression, are used to model the velocity profiles. Using data recorded in real traffic conditions, it is shown that GP are well adapted to such an application. The model generates a normally distributed speed, given a position on the road. By comparison with generic velocity profiles, the benefits of using individual driver patterns for ADAS are presented.
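As a rough sketch of how a Gaussian Process yields "a normally distributed speed, given a position on the road", the following code computes the GP posterior mean and variance at a query position. Everything specific here is an assumption for illustration: the squared-exponential kernel, its hyperparameters, the constant mean, and the six made-up (distance, speed) samples are not taken from the paper.

```python
import math

def rbf(a, b, length=10.0, var=25.0):
    """Squared-exponential kernel (assumed hyperparameters)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_predict(xs, ys, x_star, noise=1.0):
    """Posterior mean and variance at x_star for a GP whose prior mean is
    the constant average of ys."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var

# Made-up samples: distance to the stop line (m) and observed speed (m/s).
distance = [50.0, 40.0, 30.0, 20.0, 10.0, 0.0]
speed = [14.0, 13.0, 11.0, 8.0, 4.0, 0.0]
mean_25, var_25 = gp_predict(distance, speed, 25.0)
```

The pair `(mean_25, var_25)` parameterizes the normal distribution over speed at 25 m from the stop line. The variance grows away from the recorded positions, which is one way such a model can signal that it has little evidence about a driver's behaviour there.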
Within a vehicle driving space, different entities such as vehicles and vulnerable road users are in constant interaction, which governs their behaviour. Whilst smart sensors provide information about the state of the perceived objects, considering the spatio-temporal relationships between them with respect to the subject vehicle remains a challenge. This paper proposes to fill this gap by using contextual information to infer how perceived entities are expected to behave, and thus what the consequences of these behaviours are for the subject vehicle. For this purpose, an ontology is formulated about the vehicle, perceived entities and context (map information) to provide a conceptual description of all road entities with their interactions. It allows for inferences of knowledge about the situation of the subject vehicle with respect to the environment in which it is navigating. The framework is applied to the navigation of a vehicle as it approaches road intersections, to demonstrate its applicability. Results from the real-time implementation on a vehicle operating under controlled conditions are included. They show that the proposed ontology allows for a coherent understanding of the interactions between the perceived entities and contextual data. Further, it can be used to improve the situation awareness of an ADAS (Advanced Driving Assistance System), by determining which entities are the most relevant for the subject vehicle navigation.
Objectives: We present a cognitive developmental approach for a humanoid robot exploring its close environment in an interactive scenario, taking inspiration from the way infants learn about objects. The proposed approach makes it possible to detect physical entities in the visual space, to create multi-view appearance models of these entities, and to categorize them into robot parts, human parts and manipulated objects without supervision and without prior knowledge about their appearances. All information about the entities' appearances and behaviour is incrementally acquired while the robot and its human partner interact with objects.
The recent availability of inexpensive RGB-D cameras, such as the Microsoft Kinect, has raised interest in the robotics community for point cloud segmentation. We are interested in the semantic segmentation task, in which the goal is to find some relevant classes for navigation: wall, ground, objects, etc. Several effective solutions have been proposed, mainly based on the recursive decomposition of the point cloud into planes. We compare such a solution to a non-associative MRF method inspired by some recent work in computer vision. The MRF yields interesting results that are, however, not as good as those of a carefully tuned geometric method. Nevertheless, MRF still has some advantages and we suggest some improvements.