Heng Yu

Heng (Henry) Yu

I am a CS PhD student at Stanford University, advised by Prof. Ehsan Adeli in the STAI Lab. My research focuses on video generation and understanding, world modeling, and Embodied AI. I am also interested in their real-world applications, such as healthcare.

Before Stanford, I received my master's degree from the CMU Robotics Institute, where I worked on 3D vision with Prof. Laszlo Jeni. I also collaborated closely with Prof. Berkin Bilgic at Harvard Medical School on MRI reconstruction, and with Prof. Cheng Jin at Shanghai Jiao Tong University on medical vision. I obtained my bachelor's degree from Tsinghua University, majoring in Automation with a second major in Economics and Management.

My long-term goal is to build AI systems that are both technically strong and practically useful, especially in embodied perception, visual generation, and healthcare settings. I enjoy working with motivated people across academia and industry. If you'd like to collaborate, feel free to reach out.

yuheng[at]stanford.edu

CV / Google Scholar / GitHub / LinkedIn

Research Interests

I am primarily interested in video generation and understanding, world modeling, and Embodied AI. More broadly, I want to build visual intelligence systems that can model dynamic environments, understand how the world evolves over time, and support decision-making and interaction in open-world settings.

Selected News

Nov 2025: SocialGen was accepted to 3DV 2026.
Sep 2024: The 4Real paper was accepted to NeurIPS 2024.
Feb 2024: The CoGS paper was accepted to CVPR 2024.
Feb 2023: The DyLiN paper was accepted to CVPR 2023.
Feb 2023: The SubZero abstract was accepted to ISMRM 2023 as a power pitch.
Dec 2022: The CoNFies paper was nominated as a best paper candidate at FG 2023.

Service

Reviewer for CVPR, ICCV, ECCV, NeurIPS, SIGGRAPH, MICCAI, ISBI, Computer Graphics Forum, and ISMRM.

Selected Publications

* indicates co-first author. Please see my Google Scholar for the full publication list.

	SocialGen: Modeling Multi-Human Social Interaction with Language Models. Heng Yu, Juze Zhang, Changan Chen, Tiange Xiang, Yusu Fang, Juan Carlos Niebles, Ehsan Adeli. 3DV 2026. paper / project page. SocialGen is the first unified motion-language model for multi-human interactions, enabling state-of-the-art social motion modeling with a new representation, benchmark, and dataset.
	4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models. Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, László A. Jeni, Sergey Tulyakov, Hsin-Ying Lee. NeurIPS 2024. paper / project page. We propose 4Real, the first photorealistic text-to-4D scene generation pipeline.
	CoGS: Controllable Gaussian Splatting. Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni. CVPR 2024. paper / project page / code. CoGS enables controllable Gaussian Splatting for dynamic scenes with direct scene manipulation and real-time control.
	DyLiN: Making Light Field Networks Dynamic. Heng Yu, Joel Julin, Zoltan Adam Milacski, Koichiro Niinuma, László A. Jeni. CVPR 2023. paper / project page / code / CMU RI News. DyLiN extends light field networks to dynamic, non-rigid scenes with strong visual fidelity and efficiency.
	CoNFies: Controllable Neural Face Avatars. Heng Yu, Koichiro Niinuma, László A. Jeni. FG 2023 - Best Paper Award Finalist. paper / project page / code. CoNFies is a fully automatic controllable neural representation for face self-portraits.
	SubZero: Subspace Zero-Shot MRI Reconstruction. Heng Yu, Yamin Arefeen, Berkin Bilgic. ISMRM 2023 - Power Pitch. paper / code. SubZero improves subspace-based zero-shot self-supervised MRI reconstruction with a parallel architecture and attention mechanism.
	eRAKI: Fast Robust Artificial Neural Networks for K-space Interpolation with Coil Combination and Joint Reconstruction. Heng Yu, Zijing Dong, Yamin Arefeen, Congyu Liao, Kawin Setsompop, Berkin Bilgic. ISMRM 2021 - Oral Presentation. paper / code. eRAKI accelerates RAKI by directly learning a coil-combined target for robust and efficient MRI reconstruction.
	Predicting Treatment Response from Longitudinal Images using Multi-task Deep Learning. Cheng Jin, Heng Yu*, Jia Ke, Peirong Ding, Yongju Yi, Xiaofeng Jiang, Xin Duan, Jinghua Tang, Daniel T. Chang, Xiaojian Wu, Feng Gao, Ruijiang Li. Nature Communications 2021. paper / code. A multi-task deep learning framework for tumor segmentation and treatment response prediction from longitudinal medical images.