Yuhui Zhang - Stanford CS PhD Student
Hi! I am a PhD student in Computer Science at Stanford University, advised by Serena Yeung-Levy. Previously, I received my Bachelor's degree in Computer Science from Tsinghua University.
Research
My research focuses on multi-modal machine learning (e.g., vision and language) and its applications to biomedicine and the broader sciences. My recent works explore:
- Understanding the representation space of multi-modal contrastive learning (ICLR'24, ICLR'23, NeurIPS'22)
- Analyzing behaviors of multi-modal foundation models (NeurIPS'24, EMNLP'24, ICLR'23, ACL'23, NeurIPS'23, TMLR'23)
- Solving novel tasks by compounding vision-language foundation models (CVPR'25, CVPR'24, ECCV'24, NEJM AI'24)
- Advancing biomedical research through foundation models (ICML'25, ML4H'22, JAMIA'21, npj Digital Medicine'19, npj Digital Medicine'18)
News
- 05/2025: Our latest advance in virtual cell modeling (a.k.a. a world model for cells), CellFlux (formerly CellFlow), has been accepted to ICML 2025.
- 04/2025: Three papers presented at ICLR 2025: VLM Interpretability, VidDiff, Inverse Scaling.
- 03/2025: Three papers accepted to CVPR 2025: AutoConverter, MicroVQA, BIOMEDICA. We are organizing the DataWorld Workshop at ICML 2025.
- 02/2025: Introducing CellFlow, a flow-matching-based method for cellular morphology prediction. We are organizing the XAI4CV and MMFM-BIOMED Workshops at CVPR 2025 and the XLLM Workshop at ACL 2025.
- 01/2025: Introducing AutoConverter, an agentic framework that converts open-ended VQA questions into multiple-choice format. VidDiff accepted to ICLR 2025; VLM Interpretability accepted to the ICLR 2025 Blog Track.
- 12/2024: Two papers presented at NeurIPS 2024 and two at ML4H 2024.
- 11/2024: Our work is selected as an oral presentation (198/6105) at EMNLP 2024! Also selected as a NeurIPS 2024 Top Reviewer and an EMNLP 2024 Outstanding Reviewer.
- 10/2024: VLMClassifier is accepted to NeurIPS 2024; Micro-Bench is accepted to NeurIPS 2024 Datasets and Benchmarks Track.
- 09/2024: Our work analyzing pre-trained language models for image generation is accepted to EMNLP 2024 main conference.
- 07/2024: VideoAgent is accepted to ECCV 2024; AI scientific feedback is published in NEJM AI.
- 06/2024: Our new work investigates why visually-grounded language models are bad at the basic image classification task.
- 05/2024: Selected as a Citadel GQS Fellowship finalist and gave a talk in Chicago.
- 04/2024: VisDiff is selected as an oral presentation (90/11532) at CVPR 2024!
- 03/2024: Introducing VideoAgent, which leverages a large language model as an agent for long-form video understanding.
- 02/2024: VisDiff accepted to CVPR 2024.
- 01/2024: ICLR 2024: C3 explains the geometry of the multi-modal contrastive representation space and introduces a three-step method to bridge the modality gap.
- 12/2023: Introducing VisDiff, an algorithm that automatically describes differences between two image sets; joint work with Berkeley AI Research!
- 11/2023: Honored to be selected as a NeurIPS 2023 Top Reviewer.
- 10/2023: Large language models generate scientific feedback, answer moral and causal questions, and show inverse scaling on 11 tasks.
- 05/2023: Larger language models are not necessarily better on all tasks. Check out our work in ACL 2023 Findings!
- 01/2023: Can you diagnose and rectify a vision model using language inputs? Check our work in ICLR 2023!
- 11/2022: We won a third prize in the first round of the Inverse Scaling Prize! Also check out HELM, which holistically evaluates language models.
- 10/2022: Honored to receive a NeurIPS 2022 Scholar Award. Thank you NeurIPS organizers!
- 10/2022: Two more works will be presented at ML4H and NeurIPS 2022!
- 09/2022: Our work studying the modality gap accepted to NeurIPS 2022!
- 07/2020: Stanza now supports biomedical and clinical text processing!
- 03/2020: Announcing Stanza: A Python NLP Library for Many Human Languages!
- 05/2019: Selected as the best oral presentation at the 36th Tsinghua CS Forum for Graduate Students!
- 04/2019: How to infer thousands of diagnoses from EHRs? Check our paper in npj (Nature) Digital Medicine!
- 12/2018: Awarded the SenseTime Scholarship (USD 3,000). Thank you, SenseTime Inc.!
- 10/2018: Awarded the highly selective National Scholarship!
- 06/2018: Received the Tsinghua Research Fellowship with USD 7,500 in funding!
Awards
- 2025: CVPR Doctoral Consortium
- 2024: EMNLP Outstanding Reviewer
- 2024: Citadel GQS Fellowship Finalist
- 2023: Stanford Data Science Scholar Finalist
- 2022: NeurIPS Scholar Award
- 2019: Best Oral Presentation Award (presented VetTag at the 36th Tsinghua CS Graduate Forum)
- 2019: SenseTime Scholarship (30 in China)
- 2018: National Scholarship (0.2% in China)
- 2018: Qualcomm Scholarship (100 in China)
- 2018: Tsinghua Research Fellowship (50 at Tsinghua)
- 2016-18: Tsinghua Academic / Comprehensive Excellence Scholarship
- 2015: Freshman Scholarship (300 at Tsinghua)
- 2014: Chinese Chemistry Olympiad (CChO) 1st Prize (50 in China)
Services
Program Committee
Area Chair: NeurIPS 2025, ACL 2025, EMNLP 2025
Reviewer: NeurIPS 2022-2024, ICML 2023-2025, ICLR 2024-2025, CVPR 2025, ICCV 2025, ACL 2020-2024, EMNLP 2020-2024, NAACL 2021-2024, COLM 2024, TPAMI 2023, Scientific Reports 2023
Miscellaneous
I enjoy reading books. Some of my favorites: To Live (Yu Hua), Walden (Henry David Thoreau), and Principles of Economics (N. Gregory Mankiw). I enjoy hiking, jogging, and swimming. I am a fan of classical music, and I was fortunate to learn the basics of playing the guitar, piano, and pipa at Tsinghua University.