Feng Gao
Feng Gao received his Ph.D. from UCLA in 2022, co-advised by Ying Nian Wu and Mark Handcock. From 2017 to 2021, he was advised by Song-Chun Zhu.
He is currently a Research Scientist at ByteDance. Specifically, he is
- Working on Multimodal Foundation Models
- 🔬 Actively researching
  - Multimodal Understanding & Generation
  - Reasoning
  - Embodied AI
At ByteDance, he works on LLM post-training, building Multimodal LLM agents.
Before that, he was a researcher at Amazon, and he
- 🐶 Built Rufus [News1], [News2], Amazon’s LLM-powered Shopping Assistant.
- 🚀 Launched multimodal Rufus (Rufus-MM).
- Full-stack M-LLM development: data, pre-training, post-training, evaluation.
Feel free to contact me: fenggao [dot] pub [at] gmail [dot] com.
selected publications
- Tech Report

Vidi2.5: Large Multimodal Models for Video Understanding and Creation
ByteDance Technical Report, 2026
- CVPR

M-LLM Based Video Frame Selection for Efficient Video Understanding
Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, and 3 more authors
CVPR, 2025
- NeurIPS

Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication
Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, and Chenfanfu Jiang
NeurIPS, 2024
- ECCV

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, and Feng Gao
ECCV, 2024
- NeurIPS

Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang, and Feng Gao
NeurIPS OWA, 2024
- NeurIPS

Learning non-Markovian Decision-Making from State-only Sequences
Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, and Sirui Xie
NeurIPS, 2023
- CVPR

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin, Feng Gao, Govind Thattai, Michael Johnston, and Kai-Wei Chang
CVPR, 2023 
TPA-Net: Generate A Dataset for Text to Physics-based Animation
Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, and Chenfanfu Jiang
arXiv preprint arXiv:2211.13887, 2022
- CVPR

Transform-Retrieve-Generate: Natural Language-centric Outside-Knowledge Visual Question Answering
Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, and Prem Natarajan
CVPR, 2022
- Science Robotics

A Tale of Two Explanations: Enhancing Human Trust by Explaining Robot Behavior
Mark Edmonds*, Feng Gao*, Hangxin Liu*, Xu Xie*, Siyuan Qi, Brandon Rothrock, Yixin Zhu, Ying Nian Wu, and 2 more authors
Science Robotics, 2019
(* co-first author)
- NeurIPS

Learning Perceptual Inference by Contrasting
Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, and Song-Chun Zhu
NeurIPS, 2019
- CVPR

RAVEN: A Dataset for Relational and Analogical Visual Reasoning
Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, and Song-Chun Zhu
CVPR, 2019
(* co-first author)
- ICRA

Unsupervised Learning of Hierarchical Models for Hand-object Interactions
Xu Xie, Hangxin Liu, Mark Edmonds, Feng Gao, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
ICRA, 2018
- IROS

A Glove-based System for Studying Hand-object Manipulation via Joint Pose and Force Sensing
Hangxin Liu, Xu Xie, Matt Millar, Mark Edmonds, Feng Gao, Yixin Zhu, Veronica J Santos, Brandon Rothrock, and 1 more author
IROS, 2017
- IROS

Feeling the Force: Integrating Force and Pose for Fluent Discovery through Imitation Learning to Open Medicine Bottles
Mark Edmonds*, Feng Gao*, Xu Xie, Hangxin Liu, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
IROS, 2017
(* co-first author)