Publication

LARA: Latent Action Representation Alignment for Vision-Language-Action Models
International Conference on Machine Learning (ICML) 2026
(* indicates equal contribution. ✉ indicates corresponding author. † indicates project lead.)

3D-RFT: Reinforcement Fine-Tuning for Video-based 3D Scene Understanding
International Conference on Machine Learning (ICML) 2026
(* indicates equal contribution. ✉ indicates corresponding author.)

OmniXtreme: Breaking the Generality Barrier in High-Dynamic Humanoid Control
Robotics: Science and Systems (RSS) 2026
(* indicates equal contribution. ✉ indicates corresponding author.)

Lifting Unlabeled Internet-scale Data for 3D Scene Understanding
Yixin Chen, Yaowei Zhang, Huangyue Yu, Junchao He, Yan Wang, Jiangyong Huang, Hongyu Shen, Junfeng Ni, Shaofei Wang, Baoxiong Jia, Song-Chun Zhu, Siyuan Huang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2026

Learning Human-Humanoid Coordination for Collaborative Object Carrying
International Conference on Robotics and Automation (ICRA) 2026
(* indicates equal contribution. ✉ indicates corresponding author.)

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes
International Conference on Learning Representations (ICLR) 2026
(✉ indicates corresponding author.)

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Advances in Neural Information Processing Systems (NeurIPS) 2025 (RoboGen@IROS 2025 Best Paper Award)
(* indicates equal contribution. ✉ indicates corresponding author. † indicates project lead.)

Learning Unified Force and Position Control for Legged Loco-Manipulation
Conference on Robot Learning (CoRL) 2025 (Best Paper Award)
(* indicates equal contribution. ✉ indicates corresponding author. † indicates project lead.)

GWM: Toward Scalable Gaussian World Models for Robotic Manipulation
International Conference on Computer Vision (ICCV) 2025
(* indicates equal contribution. ✉ indicates corresponding author. † indicates project lead.)

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, Siyuan Huang, Qing Li.
International Conference on Computer Vision (ICCV) 2025
OpenSUN3D @ ECCV 2024

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu*, Baoxiong Jia*,†, Yixin Chen*, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(* indicates equal contribution. † indicates project lead.)
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(* indicates equal contribution. † indicates project lead.)

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(* indicates equal contribution. † indicates project lead.)

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(* indicates equal contribution.)

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng*, Feishi Wang*, Songlin Wei*, Yuyang Li*, Bangjun Wang*, Boshi An*, Charlie Tianyue Cheng*, Haozhe Lou, Peihao Li, Yen-Jen Wang, Yutong Liang, Dylan Goetting, Chaoyi Xu, Haozhe Chen, Yuxi Qian, Yiran Geng, Jiageng Mao, Weikang Wan, Mingtong Zhang, Jiangran Lyu, Siheng Zhao, Jiazhao Zhang, Jialiang Zhang, Chengyang Zhao, Haoran Lu, Yufei Ding, Ran Gong, Yuran Wang, Yuxuan Kuang, Ruihai Wu, Baoxiong Jia, Carlo Sferrazza, Hao Dong, Siyuan Huang✉, Yue Wang✉, Jitendra Malik✉, Pieter Abbeel✉.
Robotics: Science and Systems (RSS) 2025 (RoboGen@IROS 2025 Best Open-source Award)
(* indicates equal contribution. ✉ indicates corresponding author.)

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
International Conference on Learning Representations (ICLR) 2025
(* indicates equal contribution. † indicates project lead.)

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
International Conference on Robotics and Automation (ICRA) 2025
(* indicates equal contribution. ✉ indicates corresponding author. † indicates project lead.)

PhysPart: Physically Plausible Part Completion for Interactable Objects
International Conference on Robotics and Automation (ICRA) 2025
(* indicates equal contribution.)

MSR3D: Multi-modal Situated Reasoning in 3D Scenes
Advances in Neural Information Processing Systems (NeurIPS) 2024
(* indicates equal contribution. ✉ indicates corresponding author.)

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
European Conference on Computer Vision (ECCV) 2024
OpenSUN3D @ ECCV 2024 (* indicates equal contribution. † indicates project lead.)

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
European Conference on Computer Vision (ECCV) 2024
Wild3D @ ECCV 2024 (* indicates equal contribution. † indicates project lead.)

Unifying 3D Vision-Language Understanding via Promptable Queries
European Conference on Computer Vision (ECCV) 2024
OpenSUN3D @ ECCV 2024

An Embodied Generalist Agent in 3D World
International Conference on Machine Learning (ICML) 2024
GenAI4DM & AGI @ ICLR 2024 (* indicates equal contribution.)

Human-level Few-shot Concept Induction through Minimax Entropy Learning
Science Advances (SciAdv) 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
AI3DG @ CVPR 2024 (* indicates equal contribution. † indicates project lead.)

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
HuMoGen @ CVPR 2024

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Advances in Neural Information Processing Systems (NeurIPS) 2023
(* indicates equal contribution.)

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events
International Conference on Computer Vision (ICCV) 2023 (Oral)

ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes
Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia✉, Siyuan Huang✉.
International Conference on Computer Vision (ICCV) 2023
LangRob @ CoRL 2022 (* indicates equal contribution. ✉ indicates corresponding author.)

Learning a Causal Transition Model for Object Cutting
International Conference on Intelligent Robots and Systems (IROS) 2023
(* indicates equal contribution.)

Diffusion-based Generation, Optimization, and Planning in 3D Scenes
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023
(* indicates equal contribution.)

Improving Unsupervised Object-centric Learning with Query Optimization
International Conference on Learning Representations (ICLR) 2023
(* indicates equal contribution. † indicates project lead.)

EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Advances in Neural Information Processing Systems (NeurIPS) 2022

Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes
European Conference on Computer Vision (ECCV) 2022
(* indicates equal contribution.)

Latent Diffusion Energy-Based Model for Interpretable Text Modeling
International Conference on Machine Learning (ICML) 2022

ACRE: Abstract Causal REasoning Beyond Covariation
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
(* indicates equal contribution.)

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
European Conference on Computer Vision (ECCV) 2020

A Generalized Earley Parser for Human Activity Parsing and Prediction
Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

Learning Perceptual Inference by Contrasting
Advances in Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)
(* indicates equal contribution.)

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019
(* indicates equal contribution.)

Learning Human-Object Interactions by Graph Parsing Neural Networks
European Conference on Computer Vision (ECCV) 2018
(* indicates equal contribution.)

Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction
International Conference on Machine Learning (ICML) 2018

Mining User Reviews for Mobile App Comparison
ACM International Joint Conference on Pervasive and Ubiquitous Computing (IMWUT) 2017