Michi Yasunaga

I am a research scientist at OpenAI.

I received my PhD in Computer Science from Stanford, advised by Percy Liang, Jure Leskovec and Chris Manning. Previously, I worked as a researcher at Google DeepMind and Meta.

I am interested in building LLMs and agents that assist humans in diverse tasks. My work spans:

Post-training: RL, reward models, and evaluation (HELM, HEIM, ALMA, MMRB).
Reasoning (AnalogicalReasoner).
Retrieval and tool use for LLMs (LinkBERT, QA-GNN, DRAGON, REPLUG, HippoRAG).
Multimodality: vision-language models (RA-CM3, Med-Flamingo, Transfusion).

Publications (Google Scholar)

2025

Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models
Michihiro Yasunaga, Luke Zettlemoyer, Marjan Ghazvininejad.
arXiv 2025. [paper] [dataset]
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
Zhaofeng Wu, Michihiro Yasunaga, Andrew Cohen, Yoon Kim, Asli Celikyilmaz, Marjan Ghazvininejad.
arXiv 2025. [paper]

2024

Large Language Models as Analogical Reasoners
Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou.
ICLR 2024. [paper]
ALMA: Alignment with Minimal Annotation
Michihiro Yasunaga, Leonid Shamis, Chunting Zhou, Andrew Cohen, Jason Weston, Luke Zettlemoyer, Marjan Ghazvininejad.
arXiv 2024. [paper]
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou*, Lili Yu*, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy.
arXiv 2024. [paper]
REPLUG: Retrieval-Augmented Black-Box Language Models
Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih.
NAACL 2024. [paper]
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su.
NeurIPS 2024. [paper]
AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval
Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou.
NeurIPS 2024. [paper]
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases
Shirley Wu*, Shiyu Zhao*, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec.
NeurIPS 2024. [paper]
Image2Struct: Benchmarking Structure Extraction for Vision-Language Models
Josselin Somerville Roberts*, Tony Lee*, Chi Heem Wong*, Michihiro Yasunaga, Yifan Mai, Percy Liang.
NeurIPS 2024. [paper]
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee*, Haoqin Tu*, Chi Heem Wong*, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang.
NeurIPS 2024. [paper]
CAT: Content-Adaptive Image Tokenization
Junhong Shen, Kushal Tirumala, Michihiro Yasunaga, Ishan Misra, Luke Zettlemoyer, Lili Yu, Chunting Zhou.
arXiv 2024. [paper]
Language Models are Graph Learners
Zhe Xu, Kaveh Hassani, Si Zhang, Hanqing Zeng, Michihiro Yasunaga, Limei Wang, Dongqi Fu, Ning Yao, Bo Long, Hanghang Tong.
arXiv 2024. [paper]

2023

Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih.
ICML 2023. [paper] [blog] [slides]
HEIM: Holistic Evaluation of Text-To-Image Models
Tony Lee*, Michihiro Yasunaga*, Chenlin Meng*, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang.
NeurIPS 2023. [paper] [website]
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
Yuhui Zhang*, Michihiro Yasunaga*, Zhengping Zhou*, Jeff Z. HaoChen*, James Zou, Percy Liang, Serena Yeung.
ACL Findings 2023. [paper] [code]
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang.
EMNLP 2023. [paper]
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, ... (50 authors). Michihiro Yasunaga: Lead author of Knowledge section.
TMLR 2023. [paper] [project page]
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
VQA-GNN: Reasoning with Multimodal Knowledge for Visual Question Answering
Yanan Wang, Michihiro Yasunaga, Hongyu Ren, Shinya Wada and Jure Leskovec.
ICCV 2023. [paper]
Med-EASi: Finely Annotated Dataset and Models for Controllable Simplification of Medical Texts
Chandrayee Basu, Rosni Vasu, Michihiro Yasunaga, Qian Yang.
AAAI 2023. [paper]
Zero-shot causal learning
Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec.
NeurIPS 2023. [paper]
Med-Flamingo: a Multimodal Medical Few-shot Learner
Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Cyril Zakka, Yash Dalmia, Eduardo Pontes Reis, Pranav Rajpurkar, Jure Leskovec.
ML4H 2023. [paper] [code] [model]

2022

DRAGON: Deep Bidirectional Language-Knowledge Graph Pretraining
Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D. Manning, Percy Liang* and Jure Leskovec*
NeurIPS 2022.
AAAI 2023 Deep Learning on Graphs Workshop (Best Paper Award).
[paper] [model & code & data] [codalab] [slides] [blog]
LinkBERT: Pretraining Language Models with Document Links
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models
Tianbao Xie*, Chen Wu*, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, ..., Rui Zhang, Noah A. Smith, Luke Zettlemoyer and Tao Yu.
EMNLP 2022. [paper] [project page] [code & data]
GreaseLM: Graph Reasoning Enhanced Language Models for Question Answering
Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning and Jure Leskovec.
ICLR 2022. [paper] [slides] [code]
Extending the WILDS Benchmark for Unsupervised Adaptation
Shiori Sagawa*, Pang Wei Koh*, Tony Lee*, Irena Gao*, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn and Percy Liang.
ICLR 2022. [paper] [project page] [code]

2021

LM-Critic: Language Models for Unsupervised Grammatical Error Correction
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, ..., Percy Liang (116 authors). Michihiro Yasunaga: Lead author of Healthcare & Biomedicine section.
arXiv 2021. [paper] [project page]
Break-It-Fix-It: Unsupervised Learning for Program Repair
WILDS: A benchmark of in-the-wild distribution shifts
Pang Wei Koh*, Shiori Sagawa*, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang.
ICML 2021.
[paper] [project page] [code] [Stanford AI blog]
LEGO: Latent Execution-Guided Reasoning for Multi-Hop Question Answering on Knowledge Graphs
Hongyu Ren, Hanjun Dai, Bo Dai, Xinyun Chen, Michihiro Yasunaga, Haitian Sun, Dale Schuurmans, Jure Leskovec and Denny Zhou.
ICML 2021. [paper] [code]

2020

DrRepair: Graph-based, Self-Supervised Program Repair from Diagnostic Feedback

2019

A Neural Topic-Attention Model for Medical Term Abbreviation Disambiguation
Irene Li, Michihiro Yasunaga, Muhammed Yavuz Nuzumlalı, Cesar Caraballo, Shiwani Mahajan, Harlan Krumholz and Dragomir Radev.
NeurIPS 2019, Machine Learning for Health Workshop. [paper] [bib] [code]
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
SParC: Cross-Domain Semantic Parsing in Context
TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander Fabbri, Irene Li, Dan Friedman and Dragomir Radev.
AAAI 2019.
[paper] [bib] [dataset]
Overview and Results of CL-SciSumm Shared Task 2019
Muthu Kumar Chandrasekaran, Michihiro Yasunaga, Dragomir Radev, Dayne Freitag and Min-Yen Kan.
SIGIR 2019, BIRNDL Workshop.
[paper] [bib] [project page]

2018

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li and Dragomir Radev.
EMNLP 2018.
[paper] [bib] [code]
Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang and Dragomir Radev.
EMNLP 2018.
[paper] [bib] [blog] [dataset & leaderboard]
Neural Coreference Resolution with Deep Biaffine Attention by Joint Mention Detection and Mention Clustering
Rui Zhang, Cicero Nogueira dos Santos, Michihiro Yasunaga, Bing Xiang and Dragomir Radev.
ACL 2018.
[paper] [bib]
Robust Multilingual Part-of-Speech Tagging via Adversarial Training

2017

Graph-based Neural Multi-Document Summarization
Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan and Dragomir Radev.
CoNLL 2017.
[paper] [bib]