Yaodong Yang · 杨耀东
Yaodong Yang 杨耀东 · Boya Young Scholar
Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) and Assistant Dean at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on experience learning and alignment for AI and embodied agents, with the aim of advancing the trustworthy deployment and real-world alignment of large models. It spans reinforcement learning, AI alignment, and embodied intelligence.
He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Matter (Cell Press), Artificial Intelligence Journal, and IEEE TPAMI, and has more than 17,000 Google Scholar citations. Since 2022, he has been ranked first at Peking University in AI & ML by CSRankings.
Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, UKRI 2026 Best Paper Award in AI, ICCV 2023 Best Paper Finalist, CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue Sky Idea Award.
He was named to MIT Technology Review's "AI 100 Young Innovators" and to the 2025 Forbes China Technology & Innovation Leaders list, and received the WAIC 2022 "Yunfan Star Award" and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, People's Daily, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.
He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.
Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Sc. from the University of Science and Technology of China, M.Sc. from Imperial College London, and Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.
CSRankings · #1 PKU AI+ML | Best Paper Award · five times | Elsevier · World Top 2% Scientist
200+ Publications · Nature MI · Matter · JMLR · TPAMI
17k+ Citations · Google Scholar · h-index 60
#1 PKU AI+ML Rank since 2022 · CSRankings · AIRankings
5+ Best-Paper-Level Awards · ACL · UKRI · CoRL · ICCV · AAMAS
News
Headlines · recent updates
★ Headline · Jun 2026
Our paper RoboSafe wins the Outstanding Paper Award at the ICLR 2026 Workshop on Efficient Spatial Reasoning.
"RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic" proposes a neuro-symbolic framework that compiles natural-language safety rules into executable logic to monitor and constrain embodied agents at run-time.
★ Headline · Apr 2026
AI breaks human records on the kissing number problem
PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.
★ Headline · Apr 2026
PsiBot releases WAM, a world-action model ranked #1 globally on MolmoSpace
Joint work with the PKU–PsiBot Joint Laboratory: a generalist world-action model for embodied agents that outperforms the prior SOTA on spatial reasoning benchmarks.
★ Headline · Jul 2025
Our paper wins the ACL 2025 Best Paper Award — "Language Models Resist Alignment"
The paper shows that post-aligned language models tend to revert to their pre-training distributions — a theoretical "elasticity" result with implications for RLHF and safety.
★ Headline · Apr 2025
I delivered a 3-hour tutorial at ICML 2025 (virtual) on Alignment Methods for LLMs.
A comprehensive ICML tutorial covering RLHF, DPO, safe alignment, preference learning and super-alignment.
★ Headline · Dec 2024
We published a paper in Matter (Cell Press) on using LLMs to automate the synthesis of carbon nanotubes.
A cross-disciplinary work applying LLMs to steer autonomous experimental synthesis of carbon nanotubes, featured in Cell Press's flagship materials journal Matter.
★ Headline · Sep 2024
We published a paper in Nature Machine Intelligence on large-scale networked multi-agent RL and its applications to pandemic control, smart grids, and traffic control.
The first multi-agent RL paper led by a Chinese team in a Nature sister journal: a scalable method for controlling 1,000+ networked agents, with real-world deployments.
Research
Five directions · methods, benchmarks, and representative works
01 / RL for Alignment
LLM Alignment & RLHF
Centered on RLHF, preference learning, safe alignment, red-teaming, and interpretability, with the goal of keeping LLMs helpful, harmless, and honest as their capabilities grow.
Featured
- Language Models Resist Alignment — ACL 2025 ★ Best Paper
- Safe RLHF — ICLR 2024 Spotlight
- BeaverTails — NeurIPS 2023 Spotlight
- Aligner — NeurIPS 2024 Oral
- OmniSafe — JMLR 2024
02 / RL for Embodied AI
Embodied Reinforcement Learning
Advancing bimanual dexterous manipulation, vision-language-action models, and sim-to-real transfer with reinforcement learning, toward generalist robots that reach human-level dexterity in the physical world.
Featured
- UniDexGrasp++ — ICCV 2023 ★ Best Paper Finalist
- SMARTS — CoRL 2020 ★ Best System Paper
- Bi-DexHands — IEEE TPAMI 2024
- DexGraspVLA — AAAI 2026 Oral
- Safe VLA — NeurIPS 2025 Spotlight
03 / Multi-agent RL
Multi-Agent RL
Studying the game-theoretic foundations and scalable algorithms of cooperative and competitive multi-agent reinforcement learning — from policy gradients and Nash equilibria to population-based training at scale.
Featured
- Mean Field MARL — ICML 2018 Long Oral
- Diverse Auto-Curriculum — AAMAS 2021 ★ Best Blue-Sky Paper
- Heterogeneous-Agent RL — JMLR 2024
- ASP: Universal Neural Solver — IEEE TPAMI 2024
- Complexity of Markov Perfect Equilibrium — NSR 2023
04 / Agentic RL
Agentic RL & Social Simulation
Studying policy learning and alignment for LLM-based agents, covering negotiation, consensus, macroeconomic modelling, and world models that unify physical and social dynamics.
Featured
- JARVIS-1 — IEEE TPAMI 2024
- CivRealm — ICLR 2024 Spotlight
- ProAgent — AAAI 2024 Oral
- ProgressGym — NeurIPS 2024 Spotlight
- Social World Model-Augmented Mechanism Design — NeurIPS 2025
05 / RL for Science
RL for Science
Applying reinforcement learning and LLMs to scientific problems in mathematics, medicine, physics and materials, with results published in Nature and Cell sister journals.
Featured
- Efficient and Scalable RL for Large-Scale Network Control — Nature MI 2024 ★ UKRI Best Paper
- Transforming Carbon Nanotube Synthesis — Matter / Cell Press 2024
- LLMs in Medicine: A Scoping Review — iScience / Cell Press 2024
- Finding Kissing Numbers with Game-theoretic RL — arXiv 2025
- PHYBench — NeurIPS 2025
Press
National coverage · CCTV · Xinhua News · People's Daily · NSFC · MIT Technology Review
Awards
Best papers · talent programs · academic honors · competitions
I. Best-Paper Awards · 5 awards
2026
UKRI Best Research Paper in AI
Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence
2025
ACL 2025 Best Paper Award
Language Models Resist Alignment: Evidence From Data Compression
2023
ICCV 2023 Best Paper Finalist
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
2021
AAMAS 2021 Blue-Sky Idea Award
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems
2020
CoRL 2020 Best System Paper Award
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
II. Talent Programs · 3 programs
2024
National Young Talent
NSFC Excellent Young Scientist
2022
High-Level Overseas Talent
Ministry of Human Resources — 30 nationwide
2023
CAST Youth Talent Support Program
CAAI — 6 selected nationally
III. Academic Honors · 5 honors
2025
Elsevier / Stanford World Top 2% Scientists
Global Top 2% career-impact ranking
2025
MIT Tech Review — AI 100 Young Innovators
MIT Technology Review · "AI 100 Young Innovators"
2026
Forbes China — Innovation Leader
Forbes China · Innovation & Tech Leaders
2022
ACM SIGAI China Rising Star Award
ACM SIGAI China · 3 awardees nationwide
2022
WAIC Yunfan Award — Rising Star
WAIC · 10 awardees nationwide
IV. Competitions & Industry · 4 awards
2025
Wu Wenjun AI S&T Award · 2nd Prize
Wu Wenjun AI S&T Award · 2nd Prize — Knowledge-Enhanced Trustworthy Multimodal Interaction
2025
CMSA Meteorological Tech Invention Award · 1st Prize
CMSA · 1st Prize for Technological Invention — BeiDou + AI for Extreme-Wind Emergency Navigation
2022
NeurIPS 2022 MyoChallenge · Winner
Physiological dexterous manipulation · 1st of 340 teams
2025
Digital China Innovation Contest · AI Track 1st Prize
Digital China Innovation Contest · AI Track · National 1st Prize
Mentorship
Highest PKU student honors · Apple & Tencent fellowships · NSFC grants
2024 Highest Student Honor · PKU
PKU May-4th Medal
Yiran Geng 耿逸然 (2024) · Boyuan Chen 陈博远 (2026)
PKU's highest honor for students, awarded once every two years.
2024 University-Wide · PKU
PKU Annual Figures
Jiaming Ji 吉嘉铭 (2025) · Boyuan Chen 陈博远 (2025)
Only ten students university-wide are named PKU Annual Figures each year.
2024 · 2026 Highest Graduate Honor · PKU
PKU President's Scholarship
Jiaming Ji 吉嘉铭 (2024) · Chengdong Ma 马成栋 (2026)
PKU's highest scholarship for students.
2025 Industry Fellowship · Apple
Apple Scholars in AI / ML
Jiaming Ji 吉嘉铭
Apple PhD Fellowship — 12 globally; only 2 from mainland China.
2024 NSFC · PhD Student Grant
NSFC Young Student Basic Research Program (PhD)
Jiaming Ji 吉嘉铭
Sole PhD awardee in PKU's AI direction — NSFC Young Student Basic Research Program (PhD).
2024 NSFC · Undergraduate Grant
NSFC Young Student Basic Research Program (UG)
Tianyi Qiu 邱天异
Sole undergraduate awardee in PKU's AI direction.
Teaching Awards
2026
PKU Teaching Achievement Award · 2nd Prize
For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).
2025
Digital China Innovation Contest · AI Track 1st Prize
2025 Digital China Innovation Contest, AI Track, National First Prize.
2025
ICBC Teaching Award · PKU
ICBC Teaching Award · PKU · 2025
2022–
Class Advisor · Yuanpei AGI Experimental Class
Yuanpei College · Class Advisor & Curriculum Committee · AGI Experimental Class (2022 cohort)
2023 – 2025
Outstanding Undergraduate Research Supervisor · PKU
Awarded three years in a row (2023, 2024, 2025) by Peking University.
Undergraduate Theses Supervised · Yuanpei College · School of Information Science & Technology · 2022–2026 · 23 theses
- 2026 · Minghao Liu · Information & Computing Science · Exploring Ramsey Number Constructions via Artificial Intelligence Methods
- 2026 · Tianyi Qiu · Computer Science · ★ PKU Top-10 Undergraduate Thesis · Convergence and Locality of Reasoning in Language Models: A Probability-Weighted Graph Analysis
- 2026 · Jianan Lyu · Intelligence Science & Tech · Dataset Construction and VLA Training for Multi-task Generalizable Dexterous Hand Manipulation
- 2026 · Minqi Wang · Intelligence Science & Tech · Large-scale Optimization Pipeline for Human-to-Dexterous-Hand Transfer Based on Semantic Correspondence and Trajectory Optimization
- 2026 · Lingyun Xu · Intelligence Science & Tech · ResMerge: Residual Policy Learning and Merging for Continual Adaptation of Pre-trained Robot Policies
- 2026 · Muyao Li · Yuanpei AGI Class · Improving Long-Horizon Decision-Making with Foundation Agentic Models
- 2026 · Kaile Wang · Yuanpei AGI Class · Reducing Deceptive Alignment through Self-Regulation
- 2026 · Boyuan Chen · Yuanpei AGI Class · The Shadow of Intelligence: Benchmarking the Scaling Laws of Catastrophic Risks in LLMs
- 2026 · Xuchuan Huang · Data Science · A Hierarchical Vision-Language-Action Framework for Long-Horizon Robotic Manipulation
- 2026 · Changye Li · Yuanpei AGI Class · Scaling Test-time Inference for Visual Grounding
- 2026 · Siqi Yang · Yuanpei AGI Class · LatentRec: Internalizing Faithful Latent Reasoning for LLM-Based Recommendation
- 2025 · Chiyuan Wang · Yuanpei AGI Class · A Scalable Multi-Agent Macroeconomic Simulation Framework in JAX
- 2025 · Qiufan Pang · Information & Computing Science · Improving Safety of Text-to-Image Generation via Interleaved Text-Image Chain-of-Thought Datasets
- 2025 · Haiyue Sun · Intelligence Science & Tech · Solving Bridge AI with Large Language Models
- 2025 · Shenghang Sun · Information & Computing Science · PREMIUM: Personalizing LLMs with Individual Preference Feedback
- 2025 · Qizhi Chen · Information & Computing Science · Exploring Thread-level Multi-task Abstraction in Large Language Models
- 2025 · Ziran Yang · Yuanpei AGI Class · Modeling and Guiding Policy Diversity in LLM-Based Agents
- 2025 · Yangyi Ye · Information & Computing Science · An Improved ComboOpt Zero Algorithm for Solving the Max-Cut Problem
- 2024 · Kai Cheng · Computer Science · Perception-Based Object Manipulation Learning in Cluttered Environments
- 2023 · Lehang Zhang · Computer Science · Part-level Interactive Scene Reconstruction for Robotic Task and Motion Planning
- 2023 · Weitao Wang · Computer Science · Implementation and Application of Multi-task Learning: A UniMASK Perspective
- 2023 · Yutong Yin · Yuanpei AGI Class · Hardware-Accelerated Computation of Nash Equilibrium
- 2022 · Zhuoyuan He · Computer Science · GPU-Accelerated Efficient Approximation of Nash Equilibrium
Publications
Representative works · browse by topic below
ALN
Language Models Resist Alignment: Evidence From Data Compression *
Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang#
ACL 2025 ★ Best Paper
Alignment Theory · Alignment · LLM
AI4
Efficient and scalable reinforcement learning for large-scale network control *
Nature Machine Intelligence ★ UKRI Best Paper in AI & Robotics
Network Control · Reinforcement Learning
EMB
UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning
Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang
ICCV 2023 ★ Best Paper Finalist
UniDexGrasp · Dexterous Manipulation · Grasping
MRL
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems *
Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor
AAMAS 2021 ★ Best Blue-Sky Paper
Auto-Curriculum · Multi-Agent RL · Diversity
EMB
SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving
Ming Zhou*, Jun Luo*, Julian Villella*, Yaodong Yang*, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang (* equal contribution)
CoRL 2020 ★ Best System Paper
SMARTS · Autonomous Driving · Multi-Agent RL
ALN
Safe multi-agent reinforcement learning for multi-robot control *
Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois C. Knoll, Yaodong Yang#
Artificial Intelligence Journal (AIJ)
Multi-Agent RL · Robotics · Reinforcement Learning · Safe RL
ALN
Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games *
Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Red-teaming · Multi-Agent RL · LLM · Nash Equilibrium · Game Theory · Diversity · Self-Play
MRL
ASP: Learn a Universal Neural Solver *
Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Combinatorial Optimization · PSRO · Auto-Curriculum
EMB
Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation *
Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang#
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Bimanual · Dexterous Manipulation · Robotics
MRL
Heterogeneous-Agent Reinforcement Learning *
Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
HARL · Reinforcement Learning · Cooperative MARL · Multi-Agent RL · Nash Equilibrium
ALN
OmniSafe: An infrastructure for accelerating safe reinforcement learning research *
Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
OmniSafe · Safe RL · Reinforcement Learning
MRL
MARLlib: A Multi-agent Reinforcement Learning Library *
Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
MARLlib · Multi-Agent RL · Reinforcement Learning
MRL
TorchOpt: An Efficient Library for Differentiable Optimization *
Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang#
Journal of Machine Learning Research (JMLR)
Differentiable Optimization
AI4
Transforming the synthesis of carbon nanotubes with machine learning models and automation *
Yue Li, Shurui Wang, Zhou Lv, Zhaoji Wang, Yunbiao Zhao, Ying Xie, Yang Xu, Liu Qian, Yaodong Yang#, Ziqiang Zhao#, Jin Zhang#
Matter (Cell Press)
Carbon Nanotubes · Materials Synthesis
MRL
On the complexity of computing Markov perfect equilibrium in general-sum stochastic games *
Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang#
National Science Review
Nash Equilibrium · Stochastic Games · Theory · Multi-Agent RL
ALN
Safe VLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning *
Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang#
NeurIPS 2025 Spotlight
Safe VLA · VLA · Safe RL · Safety · Alignment
ALN
Aligner: Efficient Alignment by Learning to Correct *
Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang#
NeurIPS 2024 Oral
Aligner · Alignment · LLM · RLHF
MRL
Mean Field Multi-Agent Reinforcement Learning
Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang
ICML 2018 Long Oral
Mean Field RL · Multi-Agent RL · Reinforcement Learning · Nash Equilibrium · Q-Learning · Actor-Critic
Service
Area Chair · Associate Editor · Program Chair
Area Chair
- NeurIPS · CCF-A
- ICML · CCF-A
- ICLR · CCF-A
- AAAI · CCF-A
- IJCAI · CCF-A
- AAMAS (Senior AC) · CCF-B
- IROS / CoRL · CCF-C
Associate Editor
- Neural Networks (Elsevier) · CCF-B
- Transactions on Machine Learning Research (TMLR)
- Scientific Reports (Nature Portfolio)
Program / Publicity Chair
- World Artificial Intelligence Conference Academic (WAICA) 2026 · Shanghai · Publicity Chair
- Distributed AI Conference (DAI) 2024 · Singapore · Program Chair
Experience
USTC · Imperial · UCL · AIG · KCL · PKU
2022 – Now
Assistant Professor (Boya Young Scholar)
Peking University · Institute for AI 北京大学人工智能研究院
Chief Scientist, PKU–PsiBot Joint Laboratory · PI, PAIR-Lab
2021 – 2022
Assistant Professor
King's College London · Department of Informatics 伦敦国王大学
2019 – 2021
Principal Researcher
Huawei U.K. · London Research Centre 华为英国研究院
2020 Best Technology Breakthrough Award (sole awardee)
2015 – 2019
Senior Science Manager
American International Group (AIG) · Science Dept. 美国国际集团
2016 – 2021
Ph.D. · Computer Science
University College London (UCL) 伦敦大学学院
2013 – 2014
M.Sc. · Quantitative Biology
Imperial College London 伦敦帝国理工学院
2009 – 2013
B.Eng. · Electronic Engineering & Information Science
University of Science & Technology of China (USTC) 中国科学技术大学
§ Join the Lab
Come work on the hardest problems in safe and trustworthy AGI.
PhD · 2027 admissions cycle
Three research directions
LLM Post-Training · Alignment
RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.
Embodied Intelligence · Dexterous Manipulation · Robot Foundation Models
Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.
World Models · Physics Foundation Models · Sim-to-Real Alignment
Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.
PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.