DiPCAN: Distilling Privileged Information for Crowd-Aware Navigation
Robotics: Science and Systems (RSS), New York City, NY, USA, 27 June - 1 July, 2022
Nominated for RSS 2022 Best Paper Award
Mobile robots need to navigate in crowded environments to provide services to humans. Traditional approaches to crowd-aware navigation decouple pedestrian motion prediction from robot motion planning, leading to undesired robot behaviours. Recent deep learning-based methods integrate crowd forecasting into the planner but assume precise tracking of the agents in the scene, which requires expensive LiDAR sensors and tracking algorithms that are complex and brittle. In this work we use a two-step approach: we first learn a robot navigation policy based on privileged information about exact pedestrian locations, which is available in simulation. A second learning step then distills the knowledge acquired by the first network into an adaptation network that uses only narrow field-of-view image data from the robot's sensor. Although the navigation policy is trained in simulation without any expert supervision, such as trajectories computed by a planner, it achieves state-of-the-art performance on a broad range of dense-crowd simulations and in real-world experiments.
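To make the two-step idea concrete, below is a minimal PyTorch-style sketch of privileged-information distillation: a teacher policy that sees exact pedestrian positions in simulation, and a student that only sees a narrow field-of-view depth image and is trained to imitate the teacher's actions. All module names, input shapes and the simple imitation loss are illustrative assumptions and do not reproduce the paper's actual networks or training procedure.

    # Minimal sketch of privileged-information distillation for navigation.
    # All names, shapes and the imitation loss are assumptions for illustration.
    import torch
    import torch.nn as nn

    class PrivilegedPolicy(nn.Module):
        """Step 1: policy trained in simulation with exact pedestrian positions."""
        def __init__(self, n_pedestrians: int = 20, action_dim: int = 2):
            super().__init__()
            # Input: (x, y) of each pedestrian relative to the robot, plus the goal (x, y).
            self.net = nn.Sequential(
                nn.Linear(2 * n_pedestrians + 2, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, action_dim),  # e.g. linear and angular velocity
            )

        def forward(self, ped_positions: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([ped_positions.flatten(1), goal], dim=1))

    class DepthAdaptationPolicy(nn.Module):
        """Step 2: adaptation network that only sees a narrow field-of-view depth image."""
        def __init__(self, action_dim: int = 2):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.head = nn.Sequential(
                nn.Linear(32 + 2, 64), nn.ReLU(), nn.Linear(64, action_dim)
            )

        def forward(self, depth_image: torch.Tensor, goal: torch.Tensor) -> torch.Tensor:
            return self.head(torch.cat([self.encoder(depth_image), goal], dim=1))

    def distillation_step(teacher, student, optimizer, batch):
        """One distillation update: the student imitates the frozen teacher's actions."""
        ped_positions, depth_image, goal = batch
        with torch.no_grad():
            target_action = teacher(ped_positions, goal)  # privileged teacher output
        pred_action = student(depth_image, goal)
        loss = nn.functional.mse_loss(pred_action, target_action)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In this sketch the teacher is trained first (for example with reinforcement learning in simulation) and then frozen, while only the student's parameters are passed to the optimizer during distillation.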
Example Results
A LoCoBot running the DiPCAN-D agent navigates through environments with an average of 20 pedestrians to reach the goal, indicated by the red cylinder. The depth images captured by the robot's camera and used for navigation are shown in the top-left corner.
@INPROCEEDINGS{Monaci-RSS-22,
  AUTHOR    = {Gianluca Monaci AND Michel Aractingi AND Tomi Silander},
  TITLE     = {{DiPCAN: Distilling Privileged Information for Crowd-Aware Navigation}},
  BOOKTITLE = {Proceedings of Robotics: Science and Systems},
  YEAR      = {2022},
  ADDRESS   = {New York City, NY, USA},
  MONTH     = {June},
  DOI       = {10.15607/RSS.2022.XVIII.045}
}
VISION
The research we conduct on expressive visual representations is applicable to visual search, object detection, image classification and the automatic extraction of 3D human poses and shapes that can be used for human behavior understanding and prediction, human-robot interaction or even avatar animation. We also extract 3D information from images that can be used for intelligent robot navigation, augmented reality and the 3D reconstruction of objects, buildings or even entire cities.
Our work covers the spectrum from unsupervised to supervised approaches, and from very deep architectures to very compact ones. We’re excited about the promise of big data to bring big performance gains to our algorithms but also passionate about the challenge of working in data-scarce and low-power scenarios.
Furthermore, we believe that a modern computer vision system needs to be able to continuously adapt to its environment and to improve itself via lifelong learning. Our driving goal is to use our research to deliver embodied intelligence to our users through robotics, autonomous driving, phone cameras and any other visual means, reaching people wherever they may be.