Feng Gao

Feng Gao received his Ph.D. from UCLA in 2022, co-advised by Ying Nian Wu and Mark Handcock. From 2017 to 2021, he was advised by Song-Chun Zhu.

He is currently a Research Scientist at ByteDance. Specifically, he is

Working on Multimodal Foundation Models
🔬 Actively researching
- Multimodal Understanding & Generation
- Reasoning
- Embodied AI

At ByteDance, he works on LLM post-training, building Multimodal LLM agents.

Before that, he was a researcher at Amazon, where he

🐶 Built Rufus [News1], [News2], Amazon’s LLM-powered Shopping Assistant.
🚀 Launched multimodal Rufus (Rufus-MM).
- Full-stack M-LLM development: data, pre-training, post-training, evaluation.

Feel free to contact me: fenggao [dot] pub [at] gmail [dot] com.

selected publications

Vidi2.5: Large Multimodal Models for Video Understanding and Creation
ByteDance Technical Report, 2026

M-LLM Based Video Frame Selection for Efficient Video Understanding
Kai Hu, Feng Gao, Xiaohan Nie, Peng Zhou, Son Tran, Tal Neiman, Lingyun Wang, Mubarak Shah, and 3 more authors
CVPR, 2025

Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication
Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, and Chenfanfu Jiang
NeurIPS, 2024

Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, and Feng Gao
ECCV, 2024

Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty
Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang, and Feng Gao
NeurIPS OWA, 2024

Learning non-Markovian Decision-Making from State-only Sequences
Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, and Sirui Xie
NeurIPS, 2023

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Da Yin, Feng Gao, Govind Thattai, Michael Johnston, and Kai-Wei Chang
CVPR, 2023

TPA-Net: Generate A Dataset for Text to Physics-based Animation
Yuxing Qiu, Feng Gao, Minchen Li, Govind Thattai, Yin Yang, and Chenfanfu Jiang
arXiv preprint arXiv:2211.13887, 2022

Transform-Retrieve-Generate: Natural Language-centric Outside-Knowledge Visual Question Answering
Feng Gao, Qing Ping, Govind Thattai, Aishwarya Reganti, Ying Nian Wu, and Prem Natarajan
CVPR, 2022

A Tale of Two Explanations: Enhancing Human Trust by Explaining Robot Behavior
Mark Edmonds*, Feng Gao*, Hangxin Liu*, Xu Xie*, Siyuan Qi, Brandon Rothrock, Yixin Zhu, Ying Nian Wu, and 2 more authors
Science Robotics, 2019
(* co-first author)

Learning Perceptual Inference by Contrasting
Chi Zhang, Baoxiong Jia, Feng Gao, Yixin Zhu, Hongjing Lu, and Song-Chun Zhu
NeurIPS, 2019

RAVEN: A Dataset for Relational and Analogical Visual Reasoning
Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, and Song-Chun Zhu
CVPR, 2019
(* co-first author)

Unsupervised Learning of Hierarchical Models for Hand-object Interactions
Xu Xie, Hangxin Liu, Mark Edmonds, Feng Gao, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
ICRA, 2018

A Glove-based System for Studying Hand-object Manipulation via Joint Pose and Force Sensing
Hangxin Liu, Xu Xie, Matt Millar, Mark Edmonds, Feng Gao, Yixin Zhu, Veronica J Santos, Brandon Rothrock, and 1 more author
IROS, 2017

Feeling the Force: Integrating Force and Pose for Fluent Discovery through Imitation Learning to Open Medicine Bottles
Mark Edmonds*, Feng Gao*, Xu Xie, Hangxin Liu, Siyuan Qi, Yixin Zhu, Brandon Rothrock, and Song-Chun Zhu
IROS, 2017
(* co-first author)