Yaodong Yang · 杨耀东

Yaodong Yang 杨耀东 · Boya Young Scholar

Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) and Assistant Dean at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on experience learning and alignment of AI and embodied agents, aiming to advance the trustworthy deployment and real-world alignment of large models. It spans reinforcement learning, AI alignment, and embodied intelligence.

He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Matter (Cell Press), Artificial Intelligence Journal, and IEEE TPAMI, with more than 17,000 Google Scholar citations. Since 2022, he has been ranked as the top scholar in AI & ML at Peking University according to CSRankings.

Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, UKRI 2026 Best Paper Award in AI, ICCV 2023 Best Paper Finalist, CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue Sky Idea Award.

He was named to the MIT Technology Review "AI 100 Young Innovators" list and the 2025 Forbes China Technology & Innovation Innovative Leader list, and received the WAIC 2022 "Yunfan Star Award" and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, People's Daily, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.

He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.

Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Sc. from the University of Science and Technology of China, M.Sc. from Imperial College London, and Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.

CSRankings · #1 PKU AI+ML | Best Paper Award · Five times | Elsevier · World Top 2% Scientist

200+

Publications

Nature MI · Matter · JMLR · TPAMI

17k+

Citations

Google Scholar · h-index 60

#1

PKU AI+ML Rank since 2022

CSRankings · AIRankings

5+

Best-Paper-Level Awards

ACL · UKRI · CoRL · ICCV · AAMAS

News

Headlines · recent updates

★ Headline · Jun 2026

Our paper RoboSafe wins the Outstanding Paper Award at the ICLR 2026 Workshop on Efficient Spatial Reasoning.

"RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic" proposes a neuro-symbolic framework that compiles natural-language safety rules into executable logic to monitor and constrain embodied agents at run-time.

★ Headline · Apr 2026

AI breaks human records on the Kissing Number Problem

PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.

★ Headline · Apr 2026

PsiBot releases WAM — world-action model ranking #1 globally on MolmoSpace

Joint work with PKU–PsiBot Lab. A generalist world-action model for embodied agents, outperforming prior SOTA on spatial reasoning benchmarks.

★ Headline · Jul 2025

Our paper wins the ACL 2025 Best Paper Award — "Language Models Resist Alignment"

The paper shows that post-aligned language models tend to revert to their pre-training distributions — a theoretical "elasticity" result with implications for RLHF and safety.

★ Headline · Apr 2025

I delivered a 3-hour tutorial at ICML 2025 (virtual) on Alignment Methods for LLMs.

A comprehensive ICML tutorial covering RLHF, DPO, safe alignment, preference learning and super-alignment.

★ Headline · Dec 2024

We published a paper in Matter (Cell Press) on applying LLMs to automate the synthesis of carbon nanotubes.

A cross-disciplinary work applying LLMs to steer autonomous experimental synthesis of carbon nanotubes, featured in Cell Press's flagship materials journal Matter.

★ Headline · Sep 2024

We published a paper in Nature Machine Intelligence on large-scale multi-agent networked RL and its applications to pandemics, smart grids, and traffic control.

The first multi-agent RL paper led by a Chinese team in a Nature sister journal. It presents a scalable method for controlling 1000+ networked agents, with real-world deployments.

Research

Five directions · methods, benchmarks, and representative works

01 / RL for Alignment

LLM Alignment & RLHF

Centered on RLHF, preference learning, safe alignment, red-teaming, and interpretability, with the goal of keeping LLMs helpful, harmless, and honest as capabilities grow.

02 / RL for Embodied AI

Embodied Reinforcement Learning

Driving bimanual dexterous manipulation, vision-language-action models, and sim-to-real transfer with reinforcement learning — building generalist robots that reach human-level dexterity in the physical world.

03 / Multi-agent RL

Multi-Agent RL

Studying the game-theoretic foundations and scalable algorithms of cooperative and competitive multi-agent reinforcement learning — from policy gradients and Nash equilibria to population-based training at scale.

04 / Agentic RL

Agentic RL & Social Simulation

Studying policy learning and alignment for LLM-based agents, covering negotiation, consensus, macroeconomic modelling, and world models that unify physical and social dynamics.

05 / RL for Science

RL for Science

Applying reinforcement learning and LLMs to scientific problems in mathematics, medicine, physics and materials, with results published in Nature and Cell sister journals.

Press

National coverage · CCTV · Xinhua · NSFC · MIT Tech Review

CCTV · Xinhua News · People's Daily · MIT Tech Review

Awards

Best papers · talent programs · academic honors · competitions

I. Best-Paper Awards · 5 awards

2026

UKRI Best Research Paper in AI

Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence

2025

ACL 2025 Best Paper Award

Language Models Resist Alignment: Evidence From Data Compression

2023

ICCV 2023 Best Paper Finalist

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

2021

AAMAS 2021 Blue-Sky Idea Award

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

2020

CoRL 2020 Best System Paper Award

SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

II. Talent Programs · 3 programs

2024

National Young Talent

NSFC Excellent Young Scientist

2022

High-Level Overseas Talent

Ministry of Human Resources — 30 nationwide

2023

CAST Youth Talent Support Program

CAAI — 6 selected nationally

III. Academic Honors · 5 honors

2025

Elsevier / Stanford World Top 2% Scientists

Global Top 2% career-impact ranking

2025

MIT Tech Review — AI 100 Young Innovators

MIT Technology Review · "AI 100 Young Innovators"

2026

Forbes China — Innovation Leader

Forbes China · Innovation & Tech Leaders

2022

ACM SIGAI China Rising Star Award

ACM SIGAI China · 3 awardees nationwide

2022

WAIC Yunfan Award — Rising Star

WAIC · 10 awardees nationwide

IV. Competitions & Industry · 4 awards

2025

Wu Wenjun AI S&T Award · 2nd Prize

Wu Wenjun AI S&T Award · 2nd Prize — Knowledge-Enhanced Trustworthy Multimodal Interaction

2025

CMSA Meteorological Tech Invention Award · 1st Prize

CMSA · 1st Prize for Technological Invention — BeiDou + AI for Extreme-Wind Emergency Navigation

2022

NeurIPS 2022 MyoChallenge · Winner

Physiological dexterity manipulation · 1st of 340 teams

2025

Digital China Innovation Contest · AI Track 1st Prize

Digital China Innovation Contest · AI Track · National 1st Prize

Mentorship

Highest PKU student honors · Apple & Tencent fellowships · NSFC grants

2024 Highest Student Honor · PKU

PKU May-4th Medal

Yiran Geng 耿逸然 (2024) · Boyuan Chen 陈博远 (2026)

PKU's highest honor for students (awarded once every two years).

2024 University-Wide · PKU

PKU Annual Figures

Jiaming Ji 吉嘉铭 (2025) · Boyuan Chen 陈博远 (2025)

Only ten students university-wide are named PKU Annual Figures each year.

2024 · 2026 Highest Graduate Honor · PKU

PKU President's Scholarship

Jiaming Ji 吉嘉铭 (2024) · Chengdong Ma 马成栋 (2026)

PKU's highest scholarship for students.

2025 Industry Fellowship · Apple

Apple Scholars in AI / ML

Jiaming Ji 吉嘉铭

Apple PhD Fellowship — 12 globally; only 2 from mainland China.

2024 NSFC · PhD Student Grant

NSFC Young Student Basic Research (PhD)

Jiaming Ji 吉嘉铭

Sole PhD awardee in PKU's AI direction — NSFC Young Student Basic Research Program (PhD).

2024 NSFC · Undergraduate Grant

NSFC Young Student Basic Research (UG)

Tianyi Qiu 邱天异

Sole undergraduate awardee in PKU's AI direction.

Teaching Awards

2026

PKU Teaching Achievement Award · 2nd Prize

For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).

2025

Digital China Innovation Contest · AI Track 1st Prize

2025 Digital China Innovation Contest, AI Track, National First Prize.

2025

ICBC Teaching Award · PKU

ICBC Teaching Award · PKU · 2025

2022–

Class Advisor · Yuanpei AGI Experimental Class

Yuanpei College · Class Advisor & Curriculum Committee · AGI Experimental Class (2022 cohort)

2023 – 2025

Outstanding Undergraduate Research Supervisor · PKU

Awarded three years in a row (2023, 2024, 2025) by Peking University.

Undergraduate Theses Supervised · Yuanpei College · School of Information Science & Technology · 2022 → 2026 · 23 theses

Publications

Representative works · browse by topic below

ALN

Language Models Resist Alignment: Evidence From Data Compression *

Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang#

ACL 2025 ★ Best Paper

Alignment Theory · Alignment · LLM

AI4

Efficient and scalable reinforcement learning for large-scale network control *

Nature Machine Intelligence ★ UKRI Best Paper in AI & Robotics

Network Control · Reinforcement Learning

EMB

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

ICCV 2023 ★ Best Paper Finalist

UniDexGrasp · Dexterous Manipulation · Grasping

MRL

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems *

Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor

AAMAS 2021 ★ Best Blue-Sky Paper

Auto-Curriculum · Multi-Agent RL · Diversity

EMB

SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

Ming Zhou*, Jun Luo*, Julian Villella*, Yaodong Yang*, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang (* equal contribution)

CoRL 2020 ★ Best System Paper

SMARTS · Autonomous Driving · Multi-Agent RL

ALN

Safe multi-agent reinforcement learning for multi-robot control *

Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois C. Knoll, Yaodong Yang#

Artificial Intelligence Journal (AIJ)

Multi-Agent RL · Robotics · Reinforcement Learning · Safe RL

ALN

Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games *

Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Red-teaming · Multi-Agent RL · LLM · Nash Equilibrium · Game Theory · Diversity · Self-Play

MRL

ASP: Learn a Universal Neural Solver *

Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Combinatorial Optimization · PSRO · Auto-Curriculum

EMB

Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation *

Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Bimanual · Dexterous Manipulation · Robotics

MRL

Heterogeneous-Agent Reinforcement Learning *

Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

HARL · Reinforcement Learning · Cooperative MARL · Multi-Agent RL · Nash Equilibrium

ALN

OmniSafe: An infrastructure for accelerating safe reinforcement learning research *

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

OmniSafe · Safe RL · Reinforcement Learning

MRL

MARLlib: A Multi-agent Reinforcement Learning Library *

Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

MARLlib · Multi-Agent RL · Reinforcement Learning

MRL

TorchOpt: An Efficient Library for Differentiable Optimization *

Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

Differentiable Optimization

AI4

Transforming the synthesis of carbon nanotubes with machine learning models and automation *

Yue Li, Shurui Wang, Zhou Lv, Zhaoji Wang, Yunbiao Zhao, Ying Xie, Yang Xu, Liu Qian, Yaodong Yang#, Ziqiang Zhao#, Jin Zhang#

Matter (Cell Press)

Carbon Nanotubes · Materials Synthesis

MRL

On the complexity of computing Markov perfect equilibrium in general-sum stochastic games *

Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang#

National Science Review

Nash Equilibrium · Stochastic Games · Theory · Multi-Agent RL

ALN

Safe VLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning *

Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang#

NeurIPS 2025 Spotlight

Safe VLA · VLA · Safe RL · Safety · Alignment

ALN

Aligner: Efficient Alignment by Learning to Correct *

Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang#

NeurIPS 2024 Oral

Aligner · Alignment · LLM · RLHF

MRL

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

ICML 2018 Long Oral

Mean Field RL · Multi-Agent RL · Reinforcement Learning · Nash Equilibrium · Q-Learning · Actor-Critic

Service

Area Chair · Associate Editor · Program Chair

Area Chair

Associate Editor

Program / Publicity Chair

Experience

USTC · Imperial · UCL · AIG · KCL · PKU

2022 – Now

Assistant Professor (Boya Young Scholar)

Peking University · Institute for AI 北京大学人工智能研究院

Chief Scientist, PKU–PsiBot Joint Laboratory · PI, PAIR-Lab

2021 – 2022

Assistant Professor

King's College London · Department of Informatics 伦敦国王大学

2019 – 2021

Principal Researcher

Huawei U.K. · London Research Centre 华为英国研究院

2020 Best Technology Breakthrough Award (sole awardee)

2015 – 2019

Senior Science Manager

American International Group (AIG) · Science Dept. 美国国际集团

2016 – 2021

Ph.D. · Computer Science

University College London (UCL) 伦敦大学学院

2013 – 2014

M.Sc. · Quantitative Biology

Imperial College London 伦敦帝国理工学院

2009 – 2013

B.Eng. · Electronic Engineering & Information Science

University of Science & Technology of China (USTC) 中国科学技术大学

§ Join the Lab

Come work on the hardest problems in safe and trustworthy AGI.

PhD · 2027 admissions cycle

Three research directions

LLM Post-Training · Alignment

RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.

Embodied Intelligence · Dexterous Manipulation · Robot Foundation Models

Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.

World Models · Physics Foundation Models · Sim-to-Real Alignment

Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.

PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.