Jean-Philippe Mercier - Academia.edu

Papers by Jean-Philippe Mercier

Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images

arXiv (Cornell University), Jun 18, 2018

Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degree-of-freedom ground truth can be prohibitively expensive. This work therefore proposes an architecture and a training process to address this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic images and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of the objects present in each image (no time-consuming annotations such as bounding boxes, segmentation masks or object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search for the 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluate our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.
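
The abstract mentions multiple domain classifiers trained adversarially to close the synthetic-to-real gap. As a rough illustration of that general idea, and not the paper's actual architecture, the PyTorch sketch below attaches a single domain classifier to backbone features through a gradient-reversal layer; the multi-classifier setup, the weak detector, and the heatmap-guided pose search are omitted.

```python
# Illustrative sketch (assumed, not the authors' code): adversarial domain
# adaptation via a gradient-reversal layer, so that backbone features from
# synthetic and real images become hard to tell apart.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lamb * grad_output, None

class DomainClassifier(nn.Module):
    def __init__(self, feat_dim=256, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),  # 2 domains: synthetic vs. real
        )

    def forward(self, features):
        reversed_feats = GradientReversal.apply(features, self.lamb)
        return self.net(reversed_feats)

# Hypothetical usage: features come from a shared backbone; minimizing this
# cross-entropy loss pushes the backbone (via the reversed gradient) toward
# domain-invariant features while the classifier tries to separate the domains.
features = torch.randn(8, 256)             # placeholder backbone output
domain_labels = torch.randint(0, 2, (8,))  # 0 = synthetic, 1 = real
logits = DomainClassifier()(features)
loss = nn.CrossEntropyLoss()(logits, domain_labels)
loss.backward()
```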

Learning to Match Templates for Unseen Instance Detection

Detecting objects in images is a quintessential problem in computer vision. Much of the focus in the literature has been on the problem of identifying the bounding box of a particular type of object in an image. Yet, in many contexts such as robotics and augmented reality, it is more important to find a specific object instance---a unique toy or a custom industrial part for example---rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring a fast turnaround which affords little-to-no training time. In this context, we propose a method for detecting objects that are unknown at training time. Our approach frames the problem as one of learned template matching, where a network is trained to match the template of an object in an image. The template is obtained by rendering a textured 3D model of the object. At test time, we provide a novel 3D object, and the network is able to successfully detect it, even under s...

Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images

2019 International Conference on Robotics and Automation (ICRA), 2019

Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degree-of-freedom ground truth can be prohibitively expensive. This work therefore proposes an architecture and a training process to address this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic images and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of the objects present in each image (no time-consuming annotations such as bounding boxes, segmentation masks or object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search for the 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluate our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.

Multisensor placement in 3D environments via visibility estimation and derivative-free optimization

2015 IEEE International Conference on Robotics and Automation (ICRA), 2015

This paper proposes a complete system for robotic sensor placement in initially unknown, arbitrary three-dimensional environments. The system uses a novel approach for computing the quality of acquisition of a mobile sensor group in such environments. The quality of acquisition is based on a geometric model of a camera, which allows accurate sensor models and simple occlusion computation. The proposed system combines this new metric with a global derivative-free optimization algorithm to simultaneously find the number of sensors and their configuration needed to adequately sense the environment. The presented framework compares favourably with current techniques that work in two-dimensional environments. Furthermore, simulation and experimental results demonstrate the ability of the system to cope with full three-dimensional environments, a domain still unexplored by previous methods.
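
A minimal sketch of the kind of derivative-free search the abstract describes, assuming a simple range-based visibility proxy in place of the paper's occlusion-aware camera model, and with the sensor count fixed (the paper also optimizes it). All names and values below are illustrative.

```python
# Toy version of the optimization loop: place N sensors in 3D so as to maximize
# a coverage score, using a derivative-free optimizer from SciPy.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
targets = rng.uniform(0.0, 10.0, size=(50, 3))   # hypothetical points to observe
SENSOR_RANGE = 6.0                                # crude stand-in for a real sensor model
N_SENSORS = 3

def coverage_cost(flat_positions):
    """Negative fraction of targets within range of at least one sensor."""
    sensors = flat_positions.reshape(N_SENSORS, 3)
    dists = np.linalg.norm(targets[:, None, :] - sensors[None, :, :], axis=2)
    covered = (dists.min(axis=1) < SENSOR_RANGE).mean()
    return -covered  # minimizing the negative coverage maximizes coverage

x0 = rng.uniform(0.0, 10.0, size=N_SENSORS * 3)   # initial sensor positions
result = minimize(coverage_cost, x0, method="Nelder-Mead",
                  options={"maxiter": 2000, "xatol": 1e-3, "fatol": 1e-4})
print("coverage:", -result.fun)
print("sensor positions:", result.x.reshape(N_SENSORS, 3))
```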

Drone vs. Bird Detection: Deep Learning Algorithms and Results from a Grand Challenge

Sensors, 2021

Adopting effective techniques to automatically detect and identify small drones is a very compelling need for a number of different stakeholders in both the public and private sectors. This work presents three different original approaches that competed in a grand challenge on the “Drone vs. Bird” detection problem. The goal is to detect one or more drones appearing at some time point in video sequences where birds and other distractor objects may also be present, together with motion in the background or foreground. Algorithms should raise an alarm and provide a position estimate only when a drone is present, while not issuing alarms on birds, nor being confused by the rest of the scene. In particular, three original approaches based on different deep learning strategies are proposed and compared on a real-world dataset provided by a consortium of universities and research centers, under the 2020 edition of the Drone vs. Bird Detection Challenge. Results show that there is a range in d...
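
To make the alarm requirement above concrete, here is a hypothetical post-processing sketch (not taken from any of the three competing approaches): an alarm, with a position estimate, is only raised when a confident drone detection persists over several recent frames, which helps suppress spurious responses on birds or background motion. The detector itself is omitted.

```python
# Hypothetical temporal-consistency filter around a per-frame drone detector.
from collections import deque
from typing import Optional, Tuple

class DroneAlarm:
    def __init__(self, conf_thresh: float = 0.6, min_hits: int = 3, window: int = 5):
        self.conf_thresh = conf_thresh
        self.min_hits = min_hits
        self.history = deque(maxlen=window)  # recent per-frame detections

    def update(self, detection: Optional[Tuple[float, float, float]]):
        """detection = (x, y, confidence) of the best drone box in this frame, or None."""
        hit = detection is not None and detection[2] >= self.conf_thresh
        self.history.append(detection if hit else None)
        hits = [d for d in self.history if d is not None]
        if len(hits) >= self.min_hits:
            x, y, _ = hits[-1]
            return True, (x, y)      # raise the alarm with the latest position estimate
        return False, None

# Example: a confident detection must persist for three frames before the alarm fires.
alarm = DroneAlarm()
for det in [None, (120.0, 80.0, 0.9), (122.0, 79.0, 0.8), (121.0, 81.0, 0.85)]:
    print(alarm.update(det))
```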

Template-based Unseen Instance Detection

arXiv: Computer Vision and Pattern Recognition, 2019

Much of the focus in the object detection literature has been on the problem of identifying the bounding box of a particular class of object in an image. Yet, in contexts such as robotics and augmented reality, it is often necessary to find a specific object instance---a unique toy or a custom industrial part for example---rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring a fast turnaround which affords little-to-no training time. In this context, we propose a generic approach to detect unseen instances based on templates rendered from a textured 3D model. To this effect, we introduce a network architecture which employs tunable filters, and we leverage learned feature embeddings to correlate object templates and query images. At test time, our approach is able to successfully detect a previously unknown (not seen in training) object, even under significant occlusion. For instance, our method offers an imp...

Deep Template-based Object Instance Detection

2021 IEEE Winter Conference on Applications of Computer Vision (WACV)

Much of the focus in the object detection literature has been on the problem of identifying the bounding box of a particular class of object in an image. Yet, in contexts such as robotics and augmented reality, it is often necessary to find a specific object instance (a unique toy or a custom industrial part, for example) rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring a fast turnaround which affords little-to-no training time. What is more, gathering a dataset and training a model for every new object instance to be detected can be an expensive and time-consuming process. In this context, we propose a generic 2D object instance detection approach that uses example viewpoints of the target object at test time to retrieve its 2D location in RGB images, without requiring any additional training (i.e. fine-tuning) step. To this end, we present an end-to-end architecture that extracts global and local information about the object from its viewpoints. The global information is used to tune early filters in the backbone, while local viewpoints are correlated with the input image. Our method offers an improvement of almost 30 mAP over previous template-matching methods on the challenging Occluded Linemod [3] dataset (overall mAP of 50.7). Our experiments also show that our single generic model (not trained on any of the test objects) yields detection results that are on par with approaches that are trained specifically on the target objects.
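
As a simplified illustration of the correlation step described above (not the full WACV architecture, and with an untrained backbone standing in for the learned embedding), the sketch below embeds a rendered template and a query image with the same network, cross-correlates the template features over the image feature map, and takes the strongest response as the candidate 2D location. The global-information branch that tunes early backbone filters is omitted.

```python
# Assumed, simplified template-to-image feature correlation for localization.
import torch
import torch.nn.functional as F
import torchvision

backbone = torchvision.models.resnet18(weights=None)          # random weights; the paper learns this
backbone = torch.nn.Sequential(*list(backbone.children())[:-2])  # keep the spatial feature map
backbone.eval()

@torch.no_grad()
def locate(template_rgb: torch.Tensor, query_rgb: torch.Tensor):
    """template_rgb: (1, 3, 128, 128) rendered viewpoint; query_rgb: (1, 3, 480, 640) scene image."""
    t_feat = F.normalize(backbone(template_rgb), dim=1)   # (1, C, ht, wt)
    q_feat = F.normalize(backbone(query_rgb), dim=1)      # (1, C, Hq, Wq)
    # Use the template feature map as a correlation kernel over the query features.
    score_map = F.conv2d(q_feat, t_feat)                  # (1, 1, Hq-ht+1, Wq-wt+1)
    flat_idx = score_map.flatten().argmax()
    h = flat_idx // score_map.shape[-1]
    w = flat_idx % score_map.shape[-1]
    return h.item(), w.item(), score_map.max().item()     # peak location (feature-map coords) and score

print(locate(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 480, 640)))
```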

Deep Object Ranking for Template Matching

Pick-and-place is an important task in robotic manipulation. In industry, template-matching approaches are often used to provide the level of precision required to locate an object to be picked. However, if a robotic workstation is to handle numerous objects, brute-force template matching becomes expensive and is subject to notoriously hard-to-tune thresholds. In this paper, we explore the use of deep learning methods to speed up traditional methods such as template matching. In particular, we employed a Single Shot Detector (SSD) and a Residual Network (ResNet) for object detection and classification. Classification scores allow the re-ranking of objects so that template matching is performed in order of likelihood. Tests on a dataset containing 10 industrial objects demonstrated the validity of our approach, achieving an average ranking of 1.37 for the object of interest. Moreover, we tested our approach on the standard Pose dataset, which contains 15 objects, and got an average ...
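
A short sketch of the re-ranking idea described above; the function and object names are illustrative, not from the paper's code. Classification scores decide the order in which the expensive template-matching step is attempted, so the correct object is usually tried first and low-likelihood candidates can be skipped entirely.

```python
# Assumed re-ranking wrapper around an expensive template-matching routine.
from typing import Callable, Dict, List, Optional

def rank_then_match(class_scores: Dict[str, float],
                    match_template: Callable[[str], Optional[dict]],
                    score_floor: float = 0.01) -> Optional[dict]:
    """Try template matching on candidate objects in decreasing classifier confidence."""
    candidates: List[str] = sorted(class_scores, key=class_scores.get, reverse=True)
    for obj_id in candidates:
        if class_scores[obj_id] < score_floor:
            break                      # remaining candidates are too unlikely to bother with
        pose = match_template(obj_id)  # expensive step, e.g. exhaustive template search
        if pose is not None:
            return {"object": obj_id, "pose": pose}
    return None

# Hypothetical usage: the classifier strongly prefers "bracket", so it is matched first.
scores = {"bracket": 0.91, "gear": 0.06, "bolt": 0.03}
result = rank_then_match(scores, lambda obj: {"ok": True} if obj == "bracket" else None)
print(result)
```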
