Tengyu Liu

“Computer science is no more about computers than astronomy is about telescopes.” – Edsger Wybe Dijkstra

I am currently a senior research scientist at the General Vision Lab of the Beijing Institute for General Artificial Intelligence (BIGAI). I obtained my PhD degree in computer science from UCLA in 2021 under the supervision of Prof. Song-Chun Zhu. Before that, I received my master’s degree in computer science from UCLA and my bachelor’s degree in computer science from UIUC.

My research interest lies at the intersection of 3D computer vision, computer graphics, and robotics. My long-term goal is to create intelligent agents that can interact with virtual or physical environments just as humans do. My recent work includes generalizable dexterous grasping and manipulation, and whole-body control of humanoid and quadruped robots.

news

Sep 6, 2024	I am thrilled to share that I have been promoted to Senior Researcher at BIGAI! I am profoundly grateful for the unwavering support and invaluable contributions of my talented students and esteemed colleagues. Your hard work, dedication, and innovative spirit have been instrumental. I will continue my research in generalizable embodied intelligence and strive for more exciting breakthroughs in the field. Thank you everyone for your hard work and dedication! Here’s to the exciting journey ahead! 🚀
Jun 30, 2024	I’m excited to share that our paper on learning agent-agnostic representations for robotic manipulation has been accepted by IROS 2024 as an Oral Pitch! We will also present our RA-L work on grasping multiple objects with a single dexterous hand as an Oral Presentation. Looking forward to meeting everyone in Abu Dhabi!
Feb 27, 2024	I am excited to share that 3 out of my 3 submissions in human motion generation have been accepted by CVPR 2024! Congratulations to the incredible authors! See you all in Seattle!
Feb 24, 2024	Our new paper Grasp Multiple Objects with One Hand has been accepted by RA-L and will be presented at IROS 2024! Congratulations to the first author Yuyang!
Apr 23, 2023	I am thrilled to share that our ICRA submission DexGraspNet has been selected as an Outstanding Paper Finalist (Manipulation)! Congratulations to my co-authors and collaborators!

selected publications

RA-L

Grasp Multiple Objects with One Hand
RA-L 2024 [Oral Presentation]
The human hand’s complex kinematics allow for simultaneous grasping and manipulation of multiple objects, essential for tasks like object transfer and in-hand manipulation. Despite its importance, robotic multi-object grasping remains underexplored and presents challenges in kinematics, dynamics, and object configurations. This paper introduces MultiGrasp, a two-stage method for multi-object grasping on a tabletop with a multi-finger dexterous hand. It involves (i) generating pre-grasp proposals and (ii) executing the grasp and lifting the objects. Experimental results primarily focus on dual-object grasping and report a 44.13% success rate, showcasing adaptability to unseen object configurations and imprecise grasps. The framework also demonstrates the capability to grasp more than two objects, albeit at a reduced inference speed.
IROS

Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
IROS 2024 [Oral Pitch]
Enhancing the ability of robotic systems to autonomously acquire novel manipulation skills is vital for applications ranging from assembly lines to service robots. Existing methods (e.g., VIP, R3M) rely on learning a generalized representation for manipulation tasks but overlook (i) the domain gap between distinct embodiments and (ii) the sparseness of successful task trajectories within the embodiment-specific action space, leading to misaligned and ambiguous task representations with inferior learning efficiency. Our work addresses the above challenges by introducing Ag2Manip (Agent-Agnostic representations for Manipulation) for learning novel manipulation skills. Our approach encompasses two principal innovations: (i) a novel agent-agnostic visual representation trained on human manipulation videos with embodiments masked to ensure generalizability, and (ii) an agent-agnostic action representation that abstracts the robot’s kinematic chain into an agent proxy with a universally applicable action space to focus on the core interaction between the end-effector and the object. Through our experiments, Ag2Manip demonstrates remarkable improvements across a diverse array of manipulation tasks without necessitating domain-specific demonstrations, substantiating a significant 325% improvement in average success rate across 24 tasks from FrankaKitchen, ManiSkill, and PartManip. Further ablation studies underscore the critical role of both representations in achieving such improvements.
CVPR

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
CVPR 2024
Traditional approaches in physics-based motion generation, centered around imitation learning and reward shaping, often struggle to adapt to new scenarios. To tackle this limitation, we propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions. Our approach begins by developing a set of atomic actions with a low-level controller trained via imitation learning. Upon receiving an open-vocabulary textual instruction, AnySkill employs a high-level policy that selects and integrates these atomic actions to maximize the CLIP similarity between the agent’s rendered images and the text. An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering. We demonstrate AnySkill’s capability to generate realistic and natural motion sequences in response to unseen instructions of varying lengths, making it the first method capable of open-vocabulary physical skill learning for interactive humanoid agents.
CVPR

Scaling Up Dynamic Human-Scene Interaction Modeling
CVPR 2024 [Highlight]
Progress in human-scene interaction modeling is held back by the scarcity of both high-quality data and advanced motion synthesis methods. Previous efforts have not produced datasets that address the dual challenges of scalability and data quality. In this work, we overcome these challenges by introducing TRUMANS (TRacking hUMan ActioNs in Scenes), a large-scale MoCap dataset created by efficiently and precisely replicating the synthetic scenes in the physical environment. TRUMANS, the most extensive motion-captured human-scene interaction dataset thus far, comprises over 15 hours of diverse human behaviors, including concurrent interactions with dynamic and articulated objects, across 100 indoor scene configurations. It provides accurate pose sequences of both humans and objects, ensuring a high level of contact plausibility during the interaction. To further enhance adaptivity, we propose a data augmentation approach that automatically adapts collision-free and interaction-precise human motions. Leveraging the benefits of TRUMANS, we propose a novel approach that employs a diffusion-based autoregressive mechanism for the real-time generation of human-scene interaction sequences with arbitrary length. The efficacy of TRUMANS and our motion synthesis method is validated through extensive experimental results, surpassing all existing baselines in terms of quality and diversity. Notably, our method demonstrates superb zero-shot generalizability on existing 3D scene datasets (e.g., PROX, Replica, ScanNet, ScanNet++), capable of generating even more realistic motions than the ground-truth annotations on PROX. Our human study further indicates that our generated motions are almost indistinguishable from the original motion-captured sequences, highlighting their superior quality. Our dataset and model will be released for research purposes.
ICRA

GenDexGrasp: Generalizable Dexterous Grasping
ICRA 2023
Generating dexterous grasps has been a long-standing and challenging robotic task. Despite recent progress, existing methods primarily suffer from two issues. First, most prior work focuses on a specific type of robot hand and cannot generalize to unseen ones. Second, prior work often fails to rapidly generate diverse grasps with a high success rate. To jointly tackle these challenges with a unified solution, we propose GenDexGrasp, a novel hand-agnostic grasping algorithm for generalizable grasping. GenDexGrasp is trained on our proposed large-scale multi-hand grasping dataset MultiDex synthesized with force closure optimization. By leveraging the contact map as a hand-agnostic intermediate representation, GenDexGrasp efficiently generates diverse and plausible grasping poses with a high success rate and can transfer among diverse multi-fingered robotic hands. Compared with previous methods, GenDexGrasp achieves a three-way trade-off among success rate, inference speed, and diversity.
ICRA

DexGraspNet: A Large-Scale Robotic Dexterous Grasp Dataset for General Objects Based on Simulation
ICRA 2023 [Outstanding Paper Candidate]
Object grasping using dexterous hands is a crucial yet challenging task for robotic dexterous manipulation. Compared with the field of object grasping with parallel grippers, dexterous grasping remains under-explored, partially owing to the lack of a large-scale dataset. In this work, we present a large-scale simulated dataset, DexGraspNet, for robotic dexterous grasping, along with a highly efficient method for synthesizing diverse dexterous grasps. Leveraging a highly accelerated differentiable force closure estimator, we, for the first time, are able to synthesize stable and diverse grasps efficiently and robustly. We choose ShadowHand, a dexterous gripper commonly seen in robotics, and generate 1.32 million grasps for 5355 objects, covering more than 133 object categories and containing more than 200 diverse grasps for each object instance, with all grasps having been validated by the physics simulator. Compared to the previous dataset generated by GraspIt!, our dataset has not only more objects and grasps, but also higher diversity and quality. Through cross-dataset experiments, we show that training several algorithms of dexterous grasp synthesis on our dataset significantly outperforms training on the previous one, demonstrating the large scale and diversity of DexGraspNet. We will release the data and tools upon acceptance.
RA-L

Synthesizing Diverse and Physically Stable Grasps With Arbitrary Hand Structures Using Differentiable Force Closure Estimator
RA-L 2021
Existing grasp synthesis methods are either analytical or data-driven. The former are often limited to a specific application scope. The latter depend heavily on demonstrations and thus suffer from generalization issues; e.g., models trained with human grasp data would be difficult to transfer to 3-finger grippers. To tackle these deficiencies, we formulate a fast and differentiable force closure estimation method, capable of producing diverse and physically stable grasps with arbitrary hand structures, without any training data. Although force closure has commonly served as a measure of grasp quality, it has not been widely adopted as an optimization objective for grasp synthesis primarily due to its high computational complexity; in comparison, the proposed differentiable method can test a force closure within milliseconds. In experiments, we validate the proposed method’s efficacy in 6 different settings.

services

2024 Reviewer for ICLR, ICRA, CVPR, ICML, T-PAMI

2023 Reviewer for NeurIPS, SIGGRAPH Asia, IROS, T-RO