Shadow: Leveraging Segmentation Masks for Zero-Shot Cross-Embodiment Policy Transfer

¹Stanford University, ²UC Berkeley

CoRL 2024


Data collection in robotics is spread across diverse hardware, and this variation will only grow as new hardware is developed. Making effective use of this growing body of data requires methods that can learn from diverse robot embodiments. We consider the setting of training a policy on expert trajectories from a single robot arm (the source) and evaluating it on a different robot arm for which no data was collected (the target). We present a data editing scheme, termed Shadow, in which the robot is replaced, during both training and evaluation, with a composite segmentation mask of the source and target robots. As a result, the input data distributions at train and test time closely match, enabling robust policy transfer to the new, unseen robot while being far more data efficient than approaches that require co-training on large amounts of data from diverse embodiments. We demonstrate that an approach as simple as Shadow is effective both in simulation, across a range of tasks and robots, and on real robot hardware, where Shadow achieves on average more than a 2x improvement in success rate over the strongest baseline.
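To make the image edit concrete, here is a minimal sketch of compositing two robot masks onto an observation. It assumes that per-pixel boolean masks for both robots are already available (for example, one from segmenting the robot actually in the scene and one from rendering the other robot at a matching pose), and that the composite is painted with a single fill color. The function name `shadow_edit` and the `fill_value` parameter are hypothetical; this is an illustration of the idea described above, not the authors' implementation.

```python
import numpy as np


def shadow_edit(image: np.ndarray, source_mask: np.ndarray,
                target_mask: np.ndarray, fill_value: int = 0) -> np.ndarray:
    """Hypothetical sketch: paint the union of two robot masks onto an image.

    image:       H x W x C array (e.g. uint8 RGB observation).
    source_mask: H x W boolean mask of the source robot.
    target_mask: H x W boolean mask of the target robot.
    fill_value:  assumed single color used for the composite "shadow".
    """
    if source_mask.shape != image.shape[:2] or target_mask.shape != image.shape[:2]:
        raise ValueError("masks must match the image height and width")

    # The composite mask covers every pixel occupied by either robot, so the
    # same silhouette appears whether the source or the target robot is
    # physically present in the scene.
    composite = np.logical_or(source_mask, target_mask)

    # Copy so the caller's observation is left untouched, then overwrite all
    # channels of the masked pixels with the fill value.
    edited = image.copy()
    edited[composite] = fill_value
    return edited


if __name__ == "__main__":
    img = np.full((4, 4, 3), 255, dtype=np.uint8)
    src = np.zeros((4, 4), dtype=bool)
    src[0, 0] = True
    tgt = np.zeros((4, 4), dtype=bool)
    tgt[1, 1] = True
    print(shadow_edit(img, src, tgt)[:, :, 0])
```

Under this assumption, the same function is applied at training time (source robot real, target mask added) and at evaluation time (target robot real, source mask added), which is what makes the two input distributions line up.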

Real-world experiments

Different Robot

These are evaluation roll-outs on the source (Panda robot + Robotiq gripper) and the target (UR5e robot + Robotiq gripper). Compared to the strongest baseline (Mirage), Shadow increases the success rate on the target robot by 30, 40, 61, and 38 percentage points on the Mug, Blocks, Cups, and Hexagon tasks, respectively. Videos are at 1x speed unless otherwise specified.

Different Gripper

These are evaluation roll-outs on the source (Panda robot + Robotiq gripper) and the target (Panda robot + Franka gripper). Compared to the strongest baseline (Mirage), Shadow increases the success rate on the target robot by 72, 21, 18, and 37 percentage points on the Mug, Blocks, Cups, and Hexagon tasks, respectively. Videos are at 1x speed unless otherwise specified.
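As a quick summary of the two real-world settings, the snippet below averages the per-task additive gains over Mirage listed for each setting. This is our own arithmetic on those listed numbers, not a figure reported by the authors; the names `GAINS` and `mean_gain` are illustrative. Note that an average additive gain in percentage points is a different quantity from the multiplicative "2x" improvement quoted in the abstract, which cannot be recovered from these differences alone.

```python
# Per-task additive gains of Shadow over Mirage (percentage points of
# success rate on the target robot), copied from the two settings above.
GAINS = {
    "different_robot": {"Mug": 30, "Blocks": 40, "Cups": 61, "Hexagon": 38},
    "different_gripper": {"Mug": 72, "Blocks": 21, "Cups": 18, "Hexagon": 37},
}


def mean_gain(per_task: dict) -> float:
    """Unweighted mean of the per-task gains, in percentage points."""
    return sum(per_task.values()) / len(per_task)


if __name__ == "__main__":
    for setting, per_task in GAINS.items():
        print(f"{setting}: {mean_gain(per_task):.2f} points")
```

Running it gives 42.25 points for the different-robot setting and 37.00 points for the different-gripper setting.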

Simulation experiments

For each task, we show evaluation roll-outs on the source (Panda robot + Robotiq gripper) and on each target robot, fitted with either the Robotiq gripper or the Franka gripper. For each target robot, we show the raw image input and the Shadow-edited image (i.e., the overlay of the source and target segmentation masks).

Models for all tasks except "Mug Cleanup" were trained on 84x84 images; models for the "Mug Cleanup" task were trained on 240x240 images.

BibTeX

@inproceedings{lepert2024shadow,
  title     = {Shadow: Leveraging Segmentation Masks for Zero-Shot Cross-Embodiment Policy Transfer},
  author    = {Marion Lepert and Ria Doshi and Jeannette Bohg},
  booktitle = {Conference on Robot Learning (CoRL)},
  address   = {Munich, Germany},
  year      = {2024},
}