Yuanhan Zhang

Yuanhan (John) Zhang
Hi! I'm Yuanhan Zhang (here is the standard Chinese pronunciation of my first name: Yuanhan), a third-year PhD student at MMLab@NTU, supervised by Prof. Ziwei Liu. My research interests lie in computer vision and deep learning. In particular, I focus on adapting foundation models, from vision to multi-modal, for real-world exploration. This involves benchmarking model performance and adapting models through parameter-efficient tuning, in-context learning, and instruction tuning.
Email (yuanhan002@e.ntu.edu.sg) / Google Scholar / Twitter / Github
LLaVA-Video: Video Instruction Tuning With Synthetic Data Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li arXiv Preprint, 2024 PDF / Dataset, Model and Code A fully open-sourced video LMM with competitive performance, including code, model, and data.
Otter: A multi-modal model with in-context instruction tuning Bo Li*, Yuanhan Zhang*, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu arXiv Preprint, 2023 PDF / Dataset and Code A vision-language model with in-context instruction tuning.
LLaVA-OneVision: Easy Visual Task Transfer Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li TMLR, 2025 PDF / Dataset and Code A family of LMMs developed by consolidating insights into data, models, and visual representations.
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Feng Li*, Renrui Zhang*, Hao Zhang*, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma ICLR, 2025 (Spotlight) PDF / Dataset and Code Tackling multi-image, video, and 3D inputs in large multimodal models.
MMBench: Is Your Multi-modal Model an All-around Player? Yuan Liu*, Haodong Duan*, Yuanhan Zhang*, Bo Li*, Songyang Zhang*, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin ECCV, 2024 (Oral) PDF / Dataset and Code Benchmarking 20 abilities of vision-language models.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Jingkang Yang, Yuhan Dong, Shuai Liu, Bo Li, Ziyue Wang, Chencheng Jiang, Haoran Tan, Jiamu Kang, Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu ECCV, 2024 PDF / Dataset and Code An embodied vision-language model trained with RLEF that excels at embodied visual planning and programming.
FunQA: Towards Surprising Video Comprehension Binzhu Xie, Sicheng Zhang, Zitang Zhou, Bo Li, Yuanhan Zhang, Jack Hessel, Jingkang Yang, Ziwei Liu ECCV, 2024 PDF / Dataset and Code A benchmark of humorous, creative, and magic videos for challenging video-comprehension tasks.
Knowledge Augmented Instruction Tuning for Zero-shot Animal Species Recognition Zalan Fabian, Zhongqi Miao, Chunyuan Li, Yuanhan Zhang, Ziwei Liu, Andrés Hernández, Andrés Montes-Rojas, Rafael Escucha, Laura Siabatto, Andrés Link, Pablo Arbeláez, Rahul Dodhia, Juan Lavista Ferres Instruction Tuning and Instruction Following Workshop @ NeurIPS, 2023 PDF A knowledge-augmented vision-language model for AI-driven wildlife conservation.
What Makes Good Examples for Visual In-Context Learning? Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu NeurIPS, 2023 PDF / Code Retrieving prompts for visual in-context learning.
Learning without Forgetting for Vision-Language Models Da-Wei Zhou, Yuanhan Zhang, Yan Wang, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan, Ziwei Liu TPAMI PDF / Code Learning without forgetting for vision-language models.
Neural Prompt Search Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu TPAMI PDF / Project Page / Code Searching prompt modules for parameter-efficient transfer learning.
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images? Yuan Yao, Yuanhan Zhang, Zhenfei Yin, Jiebo Luo, Wanli Ouyang, Xiaoshui Huang ICME, 2023 PDF / Code 3D point cloud pre-training with knowledge distillation from 2D images.
Benchmarking Omni-Vision Representation through the Lens of Visual Realms Yuanhan Zhang, Zhenfei Yin, Jing Shao, Ziwei Liu ECCV, 2022 PDF / Project Page / Leaderboard / Challenge: ImageNet1k-Pretrain Track / Challenge: Open-Pretrain Track / Dataset and Code A new benchmark for evaluating vision foundation models; a new supervised contrastive learning framework.
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy Yuanhan Zhang, Qinghong Sun, Yichun Zhou, Zexin He, Zhenfei Yin, Kun Wang, Lu Sheng, Yu Qiao, Jing Shao, Ziwei Liu IJCV, 2025 PDF / Project Page / Demo / Code 4 times larger than ImageNet and 2 times larger than Objects365; built with active learning.
CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations Yuanhan Zhang, Zhenfei Yin, Yidong Li, Guojun Yin, Junjie Yan, Jing Shao, Ziwei Liu ECCV, 2020 PDF / Dataset / Demo / Code A large-scale face anti-spoofing dataset with rich annotations.

Last updated in Jan. 2025.

Homepage credits: Jon Barron.