Yaodong Yang · 杨耀东

Yaodong Yang 杨耀东 · Boya Young Scholar

Dr. Yaodong Yang is an Assistant Professor (Boya Young Scholar) and Assistant Dean at the Institute for Artificial Intelligence, Peking University, and Chief Scientist of the PKU–PsiBot Joint Laboratory. His research focuses on learning from experience and on the alignment of AI and embodied agents, with the aim of advancing the trustworthy deployment and real-world alignment of large models. It spans reinforcement learning, AI alignment, and embodied intelligence.

He has published over 200 papers in leading journals and conferences, including Nature Machine Intelligence, Matter (Cell Press), Artificial Intelligence Journal, and IEEE TPAMI, with more than 17,000 Google Scholar citations. Since 2022, he has been ranked as the top scholar in AI & ML at Peking University according to CSRankings.

Dr. Yang has received numerous honors, including the ACL 2025 Best Paper Award, UKRI 2026 Best Paper Award in AI, ICCV 2023 Best Paper Finalist, CoRL 2020 Best System Paper Award, and the AAMAS 2021 Blue Sky Idea Award.

He was named to the MIT Technology Review "AI 100 Young Innovators" list and the 2025 Forbes China Technology & Innovation Innovative Leader list, and received the WAIC 2022 "Yunfan Star Award" and the ACM SIGAI China Rising Star Award. His work has been featured by CCTV, People's Daily, Xinhua News, the National Natural Science Foundation of China (NSFC), and MIT Technology Review.

He serves as an Area Chair for major conferences including ICML, ICLR, NeurIPS, AAAI, IJCAI, AAMAS, and IROS, and as an Associate Editor for Scientific Reports, Transactions on Machine Learning Research, and Neural Networks.

Previously, Dr. Yang was an Assistant Professor at King's College London, a Principal Researcher at Huawei Research U.K., and a Senior Manager at AIG. He received his B.Sc. from the University of Science and Technology of China, M.Sc. from Imperial College London, and Ph.D. from University College London, where he was the university's sole nominee for the ACM SIGAI Doctoral Dissertation Award.

CSRankings · #1 PKU AI+ML ｜ Best Paper Award · Five times ｜ Elsevier · World Top 2% Scientist

200+ Publications · Nature MI · Matter · JMLR · TPAMI

17k+ Citations · Google Scholar · h-index 60

#1 PKU AI+ML Rank since 2022 · CSRankings · AIRankings

5 Best-Paper-Level Awards · ACL · UKRI · CoRL · ICCV · AAMAS

News

Headlines · recent updates

★ Headline · Jun 2026

Our paper RoboSafe wins the Outstanding Paper Award at the ICLR 2026 Workshop on Efficient Spatial Reasoning.

"RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic" proposes a neuro-symbolic framework that compiles natural-language safety rules into executable logic to monitor and constrain embodied agents at run-time.

★ Headline · Apr 2026

AI breaks human records on the Kissing Number Problem

PKU mathematicians used AI and reinforcement learning to explore the kissing number problem, achieving breakthroughs in higher dimensions.

★ Headline · Apr 2026

PsiBot releases WAM — world-action model ranking #1 globally on MolmoSpace

Joint work with PKU–PsiBot Lab. A generalist world-action model for embodied agents, outperforming prior SOTA on spatial reasoning benchmarks.

★ Headline · Jul 2025

Our paper wins the ACL 2025 Best Paper Award — "Language Models Resist Alignment"

The paper shows that post-aligned language models tend to revert to their pre-training distributions — a theoretical "elasticity" result with implications for RLHF and safety.

★ Headline · Apr 2025

I delivered a 3-hour tutorial at ICML 2025 (virtual) on Alignment Methods for LLMs.

A comprehensive ICML tutorial covering RLHF, DPO, safe alignment, preference learning and super-alignment.

★ Headline · Dec 2024

We published a paper in Matter (Cell Press) on applying LLMs to the automated synthesis of carbon nanotubes.

A cross-disciplinary work applying LLMs to steer autonomous experimental synthesis of carbon nanotubes, featured in Cell Press's flagship materials journal Matter.

★ Headline · Sep 2024

We published a paper in Nature Machine Intelligence on large-scale multi-agent networked RL and its applications to pandemics, smart grids, and traffic control.

The first multi-agent RL paper led by a Chinese team in a Nature sister journal. It presents a scalable method for controlling 1000+ networked agents, with real-world deployments.

Research

Five directions · methods, benchmarks, and representative works

01 / RL for Alignment

LLM Alignment & RLHF

Centered on RLHF, preference learning, safe alignment, red-teaming, and interpretability, with the goal of keeping LLMs helpful, harmless, and honest as capabilities grow.

Featured

Language Models Resist Alignment — ACL 2025 ★ Best Paper
Safe RLHF — ICLR 2024 Spotlight
BeaverTails — NeurIPS 2023 Spotlight
Aligner — NeurIPS 2024 Oral
OmniSafe — JMLR 2024

02 / RL for Embodied AI

Embodied Reinforcement Learning

Driving bimanual dexterous manipulation, vision-language-action models, and sim-to-real transfer with reinforcement learning — building generalist robots that reach human-level dexterity in the physical world.

Featured

UniDexGrasp++ — ICCV 2023 ★ Best Paper Finalist
SMARTS — CoRL 2020 ★ Best System Paper
Bi-DexHands — IEEE TPAMI 2024
DexGraspVLA — AAAI 2026 Oral
Safe VLA — NeurIPS 2025 Spotlight

03 / Multi-agent RL

Multi-Agent RL

Studying the game-theoretic foundations and scalable algorithms of cooperative and competitive multi-agent reinforcement learning — from policy gradients and Nash equilibria to population-based training at scale.

Featured

Mean Field MARL — ICML 2018 Long Oral
Diverse Auto-Curriculum — AAMAS 2021 ★ Best Blue-Sky Paper
Heterogeneous-Agent RL — JMLR 2024
ASP: Universal Neural Solver — IEEE TPAMI 2024
Complexity of Markov Perfect Equilibrium — NSR 2023

04 / Agentic RL

Agentic RL & Social Simulation

Studying policy learning and alignment for LLM-based agents, covering negotiation, consensus, macroeconomic modelling, and world models that unify physical and social dynamics.

Featured

JARVIS-1 — IEEE TPAMI 2024
CivRealm — ICLR 2024 Spotlight
ProAgent — AAAI 2024 Oral
ProgressGym — NeurIPS 2024 Spotlight
Social World Model-Augmented Mechanism Design — NeurIPS 2025

05 / RL for Science

RL for Science

Applying reinforcement learning and LLMs to scientific problems in mathematics, medicine, physics and materials, with results published in Nature and Cell sister journals.

Featured

Efficient and Scalable RL for Large-Scale Network Control — Nature MI 2024 ★ UKRI Best Paper
Transforming Carbon Nanotube Synthesis — Matter / Cell Press 2024
LLMs in Medicine: A Scoping Review — iScience / Cell Press 2024
Finding Kissing Numbers with Game-theoretic RL — arXiv 2025
PHYBench — NeurIPS 2025

Press

National coverage · CCTV · Xinhua News · People's Daily · NSFC · MIT Tech Review

Awards

Best papers · talent programs · academic honors · competitions

I. Best-Paper Awards · 5 awards

2026

UKRI Best Research Paper in AI

Efficient and Scalable Reinforcement Learning for Large-Scale Network Control · Nature Machine Intelligence

2025

ACL 2025 Best Paper Award

Language Models Resist Alignment: Evidence From Data Compression

2023

ICCV 2023 Best Paper Finalist

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

2021

AAMAS 2021 Blue-Sky Idea Award

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

2020

CoRL 2020 Best System Paper Award

SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

II. Talent Programs · 3 programs

2024

National Young Talent

NSFC Excellent Young Scientist

2022

High-Level Overseas Talent

Ministry of Human Resources — 30 nationwide

2023

CAST Youth Talent Support Program

CAAI — 6 selected nationally

III. Academic Honors · 5 honors

2025

Elsevier / Stanford World Top 2% Scientists

Global Top 2% career-impact ranking

2025

MIT Tech Review — AI 100 Young Innovators

MIT Technology Review · "AI 100 Young Innovators"

2026

Forbes China — Innovation Leader

Forbes China · Innovation & Tech Leaders

2022

ACM SIGAI China Rising Star Award

ACM SIGAI China · 3 awardees nationwide

2022

WAIC Yunfan Award — Rising Star

WAIC · 10 awardees nationwide

IV. Competitions & Industry · 4 awards

2025

Wu Wenjun AI S&T Award · 2nd Prize

Project: Knowledge-Enhanced Trustworthy Multimodal Interaction

2025

CMSA Meteorological Tech Invention Award · 1st Prize

CMSA · 1st Prize for Technological Invention · BeiDou + AI for Extreme-Wind Emergency Navigation

2022

NeurIPS 2022 MyoChallenge · Winner

Physiological dexterity manipulation · 1 / 340 teams

2025

Digital China Innovation Contest · AI Track 1st Prize

Digital China Innovation Contest · AI Track · National 1st Prize

Mentorship

Highest PKU student honors · Apple & Tencent fellowships · NSFC grants

2024 Highest Student Honor · PKU

PKU May-4th Medal

Yiran Geng 耿逸然 (2024) · Boyuan Chen 陈博远 (2026)

PKU's highest honor for students (once every two years).

2024 University-Wide · PKU

PKU Annual Figures

Jiaming Ji 吉嘉铭 (2025) · Boyuan Chen 陈博远 (2025)

Only ten students university-wide are named PKU Annual Figures each year.

2024 · 2026 Highest Graduate Honor · PKU

PKU President's Scholarship

Jiaming Ji 吉嘉铭 (2024) · Chengdong Ma 马成栋 (2026)

PKU's highest scholarship for students.

2025 Industry Fellowship · Apple

Apple Scholars in AI / ML

Jiaming Ji 吉嘉铭

Apple PhD Fellowship — 12 globally; only 2 from mainland China.

2024 NSFC · PhD Student Grant

NSFC Young Student Basic Research (PhD)

Jiaming Ji 吉嘉铭

Sole PhD awardee in PKU's AI direction — NSFC Young Student Basic Research Program (PhD).

2024 NSFC · Undergraduate Grant

NSFC Young Student Basic Research (UG)

Tianyi Qiu 邱天异

Sole undergraduate awardee in PKU's AI direction.

Teaching Awards

2026

PKU Teaching Achievement Award · 2nd Prize

For the course "Foundations and Alignment of Large Language Models" (《大语言模型基础与对齐》).

2025

Digital China Innovation Contest · AI Track 1st Prize

2025 Digital China Innovation Contest · AI Track · National 1st Prize.

2025

ICBC Teaching Award · PKU

2022–

Class Advisor · Yuanpei AGI Experimental Class

Yuanpei College · Class Advisor & Curriculum Committee · AGI Experimental Class (2022 cohort)

2023 – 2025

Outstanding Undergraduate Research Supervisor · PKU

Awarded three years in a row (2023, 2024, 2025) by Peking University.

Undergraduate Theses Supervised · Yuanpei College · School of Information Science & Technology · 2022 → 2026 · 23 theses

Year | Student | Major | Thesis
2026 | Minghao Liu | Information & Computing Science | Exploring Ramsey Number Constructions via Artificial Intelligence Methods
2026 | Tianyi Qiu | Computer Science | Convergence and Locality of Reasoning in Language Models: A Probability-Weighted Graph Analysis (★ PKU Top-10 Undergraduate Thesis)
2026 | Jianan Lyu | Intelligence Science & Tech | Dataset Construction and VLA Training for Multi-task Generalizable Dexterous Hand Manipulation
2026 | Minqi Wang | Intelligence Science & Tech | Large-scale Optimization Pipeline for Human-to-Dexterous-Hand Transfer Based on Semantic Correspondence and Trajectory Optimization
2026 | Lingyun Xu | Intelligence Science & Tech | ResMerge: Residual Policy Learning and Merging for Continual Adaptation of Pre-trained Robot Policies
2026 | Muyao Li | Yuanpei · AGI Class | Improving Long-Horizon Decision-Making with Foundation Agentic Models
2026 | Kaile Wang | Yuanpei · AGI Class | Reducing Deceptive Alignment through Self-Regulation
2026 | Boyuan Chen | Yuanpei · AGI Class | The Shadow of Intelligence: Benchmarking the Scaling Laws of Catastrophic Risks in LLMs
2026 | Xuchuan Huang | Data Science | A Hierarchical Vision-Language-Action Framework for Long-Horizon Robotic Manipulation
2026 | Changye Li | Yuanpei · AGI Class | Scaling Test-time Inference for Visual Grounding
2026 | Siqi Yang | Yuanpei · AGI Class | LatentRec: Internalizing Faithful Latent Reasoning for LLM-Based Recommendation
2025 | Chiyuan Wang | Yuanpei · AGI Class | A Scalable Multi-Agent Macroeconomic Simulation Framework in JAX
2025 | Qiufan Pang | Information & Computing Science | Improving Safety of Text-to-Image Generation via Interleaved Text-Image Chain-of-Thought Datasets
2025 | Haiyue Sun | Intelligence Science & Tech | Solving Bridge AI with Large Language Models
2025 | Shenghang Sun | Information & Computing Science | PREMIUM: Personalizing LLMs with Individual Preference Feedback
2025 | Qizhi Chen | Information & Computing Science | Exploring Thread-level Multi-task Abstraction in Large Language Models
2025 | Ziran Yang | Yuanpei · AGI Class | Modeling and Guiding Policy Diversity in LLM-Based Agents
2025 | Yangyi Ye | Information & Computing Science | An Improved ComboOpt Zero Algorithm for Solving the Max-Cut Problem
2024 | Kai Cheng | Computer Science | Perception-Based Object Manipulation Learning in Cluttered Environments
2023 | Lehang Zhang | Computer Science | Part-level Interactive Scene Reconstruction for Robotic Task and Motion Planning
2023 | Weitao Wang | Computer Science | Implementation and Application of Multi-task Learning: A UniMASK Perspective
2023 | Yutong Yin | Yuanpei · AGI Class | Hardware-Accelerated Computation of Nash Equilibrium
2022 | Zhuoyuan He | Computer Science | GPU-Accelerated Efficient Approximation of Nash Equilibrium

Publications

Representative works · browse by topic below

ALN

Language Models Resist Alignment: Evidence From Data Compression *

Jiaming Ji, Kaile Wang, Tianyi Alex Qiu, Boyuan Chen, Jiayi Zhou, Changye Li, Hantao Lou, Josef Dai, Yunhuai Liu, Yaodong Yang#

ACL 2025 ★ Best Paper

Alignment Theory · Alignment · LLM

AI4

Efficient and scalable reinforcement learning for large-scale network control *

Nature Machine Intelligence ★ UKRI Best Paper in AI & Robotics

Network Control · Reinforcement Learning

EMB

UniDexGrasp++: Improving Dexterous Grasping Policy Learning via Geometry-aware Curriculum and Iterative Generalist-Specialist Learning

Weikang Wan, Haoran Geng, Yun Liu, Zikang Shan, Yaodong Yang, Li Yi, He Wang

ICCV 2023 ★ Best Paper Finalist

UniDexGrasp · Dexterous Manipulation · Grasping

MRL

Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems *

Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor

AAMAS 2021 ★ Best Blue-Sky Paper

Auto-Curriculum · Multi-Agent RL · Diversity

EMB

SMARTS: An Open-Source Scalable Multi-Agent RL Training School for Autonomous Driving

Ming Zhou*, Jun Luo*, Julian Villella*, Yaodong Yang*, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang (* equal contribution)

CoRL 2020 ★ Best System Paper

SMARTS · Autonomous Driving · Multi-Agent RL

ALN

Safe multi-agent reinforcement learning for multi-robot control *

Shangding Gu, Jakub Grudzien Kuba, Yuanpei Chen, Yali Du, Long Yang, Alois C. Knoll, Yaodong Yang#

Artificial Intelligence Journal (AIJ)

Multi-Agent RL · Robotics · Reinforcement Learning · Safe RL

ALN

Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games *

Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Red-teaming · Multi-Agent RL · LLM · Nash Equilibrium · Game Theory · Diversity · Self-Play

MRL

ASP: Learn a Universal Neural Solver *

Chenguang Wang, Zhouliang Yu, Stephen McAleer, Tianshu Yu, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Combinatorial Optimization · PSRO · Auto-Curriculum

EMB

Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation *

Yuanpei Chen, Yiran Geng, Fangwei Zhong, Jiaming Ji, Jiechuang Jiang, Zongqing Lu, Hao Dong, Yaodong Yang#

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Bimanual · Dexterous Manipulation · Robotics

MRL

Heterogeneous-Agent Reinforcement Learning *

Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

HARL · Reinforcement Learning · Cooperative MARL · Multi-Agent RL · Nash Equilibrium

ALN

OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research *

Jiaming Ji, Jiayi Zhou, Borong Zhang, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

OmniSafe · Safe RL · Reinforcement Learning

MRL

MARLlib: A Multi-agent Reinforcement Learning Library *

Siyi Hu, Yifan Zhong, Minquan Gao, Weixun Wang, Hao Dong, Xiaodan Liang, Zhihui Li, Xiaojun Chang, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

MARLlib · Multi-Agent RL · Reinforcement Learning

MRL

TorchOpt: An Efficient Library for Differentiable Optimization *

Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang#

Journal of Machine Learning Research (JMLR)

Differentiable Optimization

AI4

Transforming the synthesis of carbon nanotubes with machine learning models and automation *

Yue Li, Shurui Wang, Zhou Lv, Zhaoji Wang, Yunbiao Zhao, Ying Xie, Yang Xu, Liu Qian, Yaodong Yang#, Ziqiang Zhao#, Jin Zhang#

Matter (Cell Press)

Carbon Nanotubes · Materials Synthesis

MRL

On the complexity of computing Markov perfect equilibrium in general-sum stochastic games *

Xiaotie Deng, Ningyuan Li, David Mguni, Jun Wang, Yaodong Yang#

National Science Review

Nash Equilibrium · Stochastic Games · Theory · Multi-Agent RL

ALN

Safe VLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning *

Borong Zhang, Yuhao Zhang, Jiaming Ji, Yingshan Lei, Josef Dai, Yuanpei Chen, Yaodong Yang#

NeurIPS 2025 Spotlight

Safe VLA · VLA · Safe RL · Safety · Alignment

ALN

Aligner: Efficient Alignment by Learning to Correct *

Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang#

NeurIPS 2024 Oral

Aligner · Alignment · LLM · RLHF

MRL

Mean Field Multi-Agent Reinforcement Learning

Yaodong Yang, Rui Luo, Minne Li, Ming Zhou, Weinan Zhang, Jun Wang

ICML 2018 Long Oral

Mean Field RL · Multi-Agent RL · Reinforcement Learning · Nash Equilibrium · Q-Learning · Actor-Critic

Service

Area Chair · Associate Editor · Program Chair

Area Chair

NeurIPS · CCF-A
ICML · CCF-A
ICLR · CCF-A
AAAI · CCF-A
IJCAI · CCF-A
AAMAS · Senior AC · CCF-B
IROS / CoRL · CCF-C

Associate Editor

Neural Networks (Elsevier) · CCF-B
Transactions on Machine Learning Research (TMLR)
Scientific Reports (Nature Portfolio)

Program / Publicity Chair

World Artificial Intelligence Conference Academic (WAICA) 2026 · Shanghai · Publicity Chair
Distributed AI Conference (DAI) 2024 · Singapore · Program Chair

Experience

USTC · Imperial · UCL · AIG · KCL · PKU

2022 – Now

Assistant Professor (Boya Young Scholar)

Peking University · Institute for AI 北京大学人工智能研究院

Chief Scientist, PKU–PsiBot Joint Laboratory · PI, PAIR-Lab

2021 – 2022

Assistant Professor

King's College London · Department of Informatics 伦敦国王大学

2019 – 2021

Principal Researcher

Huawei U.K. · London Research Centre 华为英国研究院

2020 Best Technology Breakthrough Award (sole awardee)

2015 – 2019

Senior Science Manager

American International Group (AIG) · Science Dept. 美国国际集团

2016 – 2021

Ph.D. · Computer Science

University College London (UCL) 伦敦大学学院

2013 – 2014

M.Sc. · Quantitative Biology

Imperial College London 伦敦帝国理工学院

2009 – 2013

B.Eng. · Electronic Engineering & Information Science

University of Science & Technology of China (USTC) 中国科学技术大学

§ Join the Lab

Come work on the hardest problems in safe and trustworthy AGI.

PhD admissions · 2027 cycle

Three research directions

LLM Post-Training · Alignment

RLHF / DPO / Safe-RLHF · reward modeling · interpretability · multi-modal & multilingual safety. Connecting alignment theory to practice at scale.

Embodied Intelligence · Dexterous Manipulation · Robot Foundation Models

Sim-to-real policy learning for high-DoF dexterous manipulation; embodied foundation models that act in the physical world. Joint work with PsiBot.

World Models · Physics Foundation Models · Sim-to-Real Alignment

Build world models that capture both physical and social dynamics; align simulators with the real world for downstream policy training. Joint work with Neo Matrix.

PAIR-Lab also welcomes master's students, visiting scholars, undergraduate research interns, and postdocs. If you are fascinated by reinforcement learning, LLM alignment, multi-agent systems, or embodied intelligence — and want to build safe and trustworthy AGI that ships — please read the starter materials above and reach out.
