TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach
Weikun Peng1, Jun Lv2, Yuwei Zeng1, Haonan Chen3, Siheng Zhao3, Jichen Sun2, Cewu Lu2, Lin Shao1,✝
1National University of Singapore, 2Shanghai Jiao Tong University, 3Nanjing University
✝Corresponding Author
CoRL 2024 (Oral)
Abstract
The tie-knotting task is highly challenging due to the tie's high deformability and the long horizon of the manipulation. This work presents TieBot, a Real-to-Sim-to-Real learning-from-visual-demonstration system that enables robots to learn to knot a tie. We introduce a Hierarchical Feature Matching approach to estimate a sequence of the tie's meshes from the demonstration video. Using these estimated meshes as subgoals, we first learn a teacher policy with privileged information, and then learn a student policy with point cloud observations by imitating the teacher policy. Finally, our pipeline learns a residual policy when the learned policy is applied to real-world execution, mitigating the Sim2Real gap. We demonstrate the effectiveness of TieBot in simulation and in the real world. In the real-world experiment, a dual-arm robot successfully knots a tie, achieving a 50% success rate over 10 trials.
Video Summary
Pipeline Overview
Given a mesh model of the tie, our pipeline first applies local feature matching and keypoint detection to estimate the tie's mesh in each frame of the human demonstration. We then train a teacher policy with RL to select grasping points for the robot, and distill it into a student policy that imitates the teacher. Finally, we learn a residual policy when deploying to the real robot; a minimal sketch of how such a residual correction composes with the student policy is shown below.
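The sketch below illustrates the residual-policy idea at deployment time: the student policy produces a nominal action from the point cloud, and a small learned correction is added before execution. This is a hedged illustration only; the class name, the clipping bound, and the point-cloud interface are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

class ResidualController:
    """Hypothetical sketch: compose the learned student policy with a
    residual correction at deployment time (names and shapes are assumed)."""

    def __init__(self, student_policy, residual_policy, scale: float = 0.05):
        self.student = student_policy    # maps point cloud -> nominal action
        self.residual = residual_policy  # maps point cloud -> small correction
        self.scale = scale               # assumed bound on the residual's magnitude

    def act(self, point_cloud: np.ndarray) -> np.ndarray:
        base = self.student(point_cloud)
        delta = np.clip(self.residual(point_cloud), -self.scale, self.scale)
        return base + delta              # corrected action executed on the real robot
```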
Human Demonstration
Human demonstration of the first tie-knotting task.
Human demonstration of the second tie-knotting task.
Human demonstration of the towel-folding task.
Local Feature Matching
We use LoFTR to establish feature matches between consecutive frames of the demonstration. Here are some examples of the matching results; a minimal code sketch of this step follows the examples below.
First tie-knotting task
Second tie-knotting task
Towel-folding task
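As a concrete illustration of this step, the sketch below matches two consecutive demonstration frames with the LoFTR model shipped in the kornia library. The helper name, the choice of pretrained weights, and the confidence threshold are assumptions; the paper's own matching code may differ.

```python
import cv2
import torch
import kornia.feature as KF

def match_frames(path0: str, path1: str):
    """Match two consecutive demo frames with LoFTR (hypothetical helper)."""

    def load_gray(path: str) -> torch.Tensor:
        # LoFTR expects grayscale float tensors in [0, 1] with shape (B, 1, H, W).
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        return torch.from_numpy(img).float()[None, None] / 255.0

    # Pretrained weights are an assumption; "indoor" may suit a tabletop scene better.
    matcher = KF.LoFTR(pretrained="outdoor").eval()
    with torch.no_grad():
        out = matcher({"image0": load_gray(path0), "image1": load_gray(path1)})

    # Keep only confident correspondences; the 0.8 threshold is an assumption.
    keep = out["confidence"] > 0.8
    return out["keypoints0"][keep], out["keypoints1"][keep]
```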
Keypoints Detection
In our Real2Sim pipeline, we use the estimation results to train a keypoint detection model. Here we illustrate several detection results: the blue arrow is the predicted z-axis and the green arrow is the predicted x-axis. A sketch of assembling these two predicted axes into an orientation follows the examples below.
First tie-knotting task
Second tie-knotting task
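The detector predicts two direction vectors per keypoint. One standard way to turn them into a full orientation is Gram-Schmidt orthonormalization, shown in the sketch below. This construction is an assumption for illustration; the paper may resolve the frame differently.

```python
import numpy as np

def frame_from_axes(z_pred: np.ndarray, x_pred: np.ndarray) -> np.ndarray:
    """Build a 3x3 rotation matrix from predicted z- and x-axes (hypothetical helper).

    Gram-Schmidt: keep z, project z out of x, and take y = z x x so the
    columns [x, y, z] form a right-handed orthonormal frame.
    """
    z = z_pred / np.linalg.norm(z_pred)
    x = x_pred - np.dot(x_pred, z) * z   # remove the component of x along z
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)                   # completes the right-handed frame
    return np.stack([x, y, z], axis=1)   # columns are the frame axes
```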
Real2Sim Results
Here we show several meshes estimated with our Real2Sim pipeline. The first and third columns show point clouds extracted from the human demonstration video, and the second and fourth columns show the corresponding estimated meshes.
First tie-knotting task
Second tie-knotting task
Towel-folding task
Real-World Experiment
Tie-Knotting Task
Towel-Folding Task
Acknowledgement
The authors would like to thank Zihao Xu from the National University of Singapore for setting up the tie-knotting experiment in the real setting, Zhixuan Xu and Haoyu Zhou from the National University of Singapore for their support with computation resources, and Hongjie Fang from Shanghai Jiao Tong University and Flexiv Robotics for help with the towel-folding experiment in the real setting.
BibTeX
@inproceedings{peng2024tiebot,
title={TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach},
author={Weikun Peng and Jun Lv and Yuwei Zeng and Haonan Chen and Siheng Zhao and Jichen Sun and Cewu Lu and Lin Shao},
booktitle={8th Annual Conference on Robot Learning},
year={2024},
url={https://openreview.net/forum?id=Si2krRESZb}
}