Yuexiang Zhai's Home Page
Yuexiang Zhai (feel free to call me Simon)
Bio
I am a member of technical staff at AMI. Previously, I was a member of technical staff at xAI. Before xAI, I was a research scientist at Google DeepMind, where I was fortunate to work with Quoc Le and Dale Schuurmans. I finished my PhD at Berkeley EECS (advisors: Yi Ma, Sergey Levine). I hold an MS degree from Columbia University and a BS degree in Math & Applied Math [it is only one major; I do not know why Zhejiang University (ZJU) created a major named "Math & Applied Math". Perhaps ZJU is trying to suggest that some math majors are not applicable?] from Zhejiang University [not from the Chu Kochen Honors College, because I did pretty poorly during undergrad].
Industry Experience
- AMI — World model pretraining.
- xAI — Post-training, computer use agent.
- Google DeepMind — Post-training, computer use agent.
Contacts
- Email: simonzhai20 at {gmail dot com} or simonzhai at {berkeley dot edu}.
- Twitter / X
- GitHub
Research
- Past experience: My past experience spans different topics in machine learning, reinforcement learning, and large models.
- Interest: I am interested in anything that I don't understand yet, such as (1) how to make multimodal models better and (2) how to enable models to learn from interactions.
Fun Projects
- 🃏 Poker Arena — A browser-based Texas Hold'em arena where LLM models (Claude, GPT, Gemini, DeepSeek, Grok, …) play poker against each other — or against you. [website] [code] [highlights]
- 🔥 FIRE Calculator (alpha) — Calculate your Financial Independence number for any city in the world. Life is more than working and dying — when and where will you be free? [website]
Publications
Please refer to my Google Scholar profile for my full publication list (Google Scholar is a much better organizer than me). Some papers selected by topic are listed below. I shamelessly borrowed the style from here.
Foundation Models
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini Team
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu*, Yuexiang Zhai*, Jihan Yang, Shengbang Tong, Saining Xie, Dale Schuurmans, Quoc V Le, Sergey Levine, Yi Ma
International Conference on Machine Learning (ICML), 2025.
paper (openreview) project code
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Yuexiang Zhai, Hao Bai*, Zipeng Lin*, Jiayi Pan*, Shengbang Tong*, Yifei Zhou*, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine
Advances in Neural Information Processing Systems (NeurIPS), 2024.
paper (arXiv) project code MarkTechPost
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Oral).
Investigating the Catastrophic Forgetting in Multimodal Large Language Model Fine-Tuning
Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma
Conference on Parsimony and Learning (CPAL), 2024.
Reinforcement Learning
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto*, Yuexiang Zhai*, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine
Advances in Neural Information Processing Systems (NeurIPS), 2023.
Understanding the Complexity Gains of Single-Task RL with a Curriculum
Qiyang Li*, Yuexiang Zhai*, Yi Ma, Sergey Levine
International Conference on Machine Learning (ICML), 2023.
Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning
Yuexiang Zhai, Christina Baek, Zhengyuan Zhou, Jiantao Jiao, Yi Ma
Journal of Artificial Intelligence Research (JAIR), 2022.
Machine Learning
Complete Dictionary Learning via L4-Norm Maximization over the Orthogonal Group
Yuexiang Zhai, Zitong Yang, Zhenyu Liao, John Wright, Yi Ma
Journal of Machine Learning Research (JMLR), 2020.
Signal Processing with Adaptive Sparse Structured Representations 2019 (SPARS), Best student paper finalist.
Understanding L4-based Dictionary Learning: Interpretation, Stability, and Robustness
Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, Yi Ma
International Conference on Learning Representations (ICLR), 2020.
Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning
Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, Zhihui Zhu
International Conference on Learning Representations (ICLR), 2020 (Oral).