| Tags | Title | Authors | Venue | Links |
| --- | --- | --- | --- | --- |
| Agent, Risk | Searching for Privacy Risks in LLM Agents via Simulation | Yanzhe Zhang, Diyi Yang | ICLR, 2026 | [code] |
| Evaluation | AutoMetrics: Approximate Human Judgements with Automatically Generated Evaluators | Michael J. Ryan, Yanzhe Zhang, Amol Salunkhe, Yi Chu, Di Xu, Diyi Yang | ICLR, 2026 | [code] |
| Agent | Real-Time Reasoning Agents in Evolving Environments | Yule Wen*, Yixin Ye*, Yanzhe Zhang, Diyi Yang, Hao Zhu | ICLR, 2026 | [code] |
| Agent, Evaluation | Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents | Bowen Wang, Xinyuan Wang, Jiaqi Deng, Tianbao Xie, Ryan Li, Yanzhe Zhang, Junli Wang, Dunjie Lu, Zicheng Gong, Gavin Li, Toh Jing Hua, Wei-Lin Chiang, Ion Stoica, Diyi Yang, Yu Su, Yi Zhang, Zhiguo Wang, Victor Zhong, Tao Yu | ICLR, 2026 | [code] |
| Agent | Generative Interfaces for Language Models | Jiaqi Chen*, Yanzhe Zhang*, Yutong Zhang, Yijia Shao, Diyi Yang | ACL (Findings), 2026 | [code] [website] |
| Agent, Training | SWE-smith: Scaling Data for Software Engineering Agents | John Yang, Kilian Lieret, Carlos E. Jimenez, Alexander Wettig, Kabir Khandpur, Yanzhe Zhang, Binyuan Hui, Ofir Press, Ludwig Schmidt, Diyi Yang | NeurIPS Datasets & Benchmarks, 2025 | [website] |
| Agent, Risk | Attacking Vision-Language Computer Agents via Pop-ups | Yanzhe Zhang, Tao Yu, Diyi Yang | ACL, 2025 | [code] |
| Training | Distilling an End-to-End Voice Assistant from Speech Recognition Data | Will Held, Yanzhe Zhang, Ella Li, Weiyan Shi, Michael Ryan, Diyi Yang | ACL, 2025 | [website] [training code] [eval code] |
| Agent, Evaluation | Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping | Ryan Li, Yanzhe Zhang, Diyi Yang | NAACL, 2025 | [website] [code] |
| Agent, Evaluation | Design2Code: How Far Are We From Automating Front-End Engineering? | Chenglei Si*, Yanzhe Zhang*, Ryan Li, Zhengyuan Yang, Ruibo Liu, Diyi Yang | NAACL, 2025 | [website] [code] [data] |
| Agent | Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization | Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, Diyi Yang | COLM, 2024 | [code] |
| Risk | Auditing Gender Presentation Differences in Text-to-Image Models | Yanzhe Zhang, Lu Jiang, Greg Turk, Diyi Yang | EAAMO, 2024 | [website] [code] [data] |
| Training | TRINS: Towards Multimodal Language Models that Can Read | Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun | CVPR, 2024 | |
| Training | Enhanced Visual Instruction Tuning for Text-rich Image Understanding | Yanzhe Zhang, Ruiyi Zhang, Jiuxiang Gu, Yufan Zhou, Nedim Lipka, Diyi Yang, Tong Sun | NeurIPS Workshop on Instruction Tuning and Instruction Following, 2023 | [website] [code] [data] |
| Training, Risk | Robustness of Demonstration-based Learning Under Limited Data Scenario | Hongxin Zhang, Yanzhe Zhang, Ruiyi Zhang, Diyi Yang | EMNLP, 2022 | [code] |
| Training | Continual Sequence Generation with Adaptive Compositional Modules | Yanzhe Zhang, Xuezhi Wang, Diyi Yang | ACL, 2022 | [code] |
| Training | Continual Learning for Text Classification with Information Disentanglement Based Regularization | Yufan Huang*, Yanzhe Zhang*, Jiaao Chen, Xuezhi Wang, Diyi Yang | NAACL, 2021 | [code] |