Jixuan Chen

Hi there! I am a fourth-year undergraduate student at Nanjing University, majoring in Software Engineering. Currently, I am working as a Research Assistant at the XLANG Lab (part of the HKU NLP Group) with Prof. Tao Yu. Before that, I was honoured to work with Prof. Shujian Huang at Nanjing University. My research interests lie in machine learning (ML) and natural language processing (NLP). These days, I am working on multimodal LLM agents for computer use and on code generation. The question I am thinking about is: how can we design smarter GUI agents with reinforcement learning (RL) and large language models (LLMs)? I am looking for a Ph.D. position starting in Fall 2025. Please feel free to reach out!
Email / Semantic Scholar / Google Scholar / Twitter / GitHub

News

Selected Publications

* denotes equal contribution.

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei*, Jixuan Chen*, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, Victor Zhong, Ruoxi Sun, Qian Liu, Sida Wang, Tao Yu
Website / Paper / Data / Code
ICLR'25, Oral Presentation
Spider 2.0 is an evaluation framework comprising 627 real-world text-to-SQL workflow problems derived from enterprise-level database use cases.

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
Website / Paper / Slides / Data Viewer / Code
NeurIPS'24 D&B Track
OSWorld🖥️: A unified, real computer environment for multimodal agents, evaluating open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, and macOS.

COMMA: A Communicative Multimodal Multi-Agent Benchmark
Tim Ossowski*, Jixuan Chen*, Danyal Maqbool, Zefan Cai, Tyler Bradshaw, Junjie Hu
Website / Paper / Code
Preprint'24 (Under Review)
COMMA is a novel benchmark designed to evaluate the collaborative performance of multimodal multi-agent systems through language communication.

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongsheng Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu
Website / Paper / Data Viewer / Code
NeurIPS'24 D&B Track, Spotlight Presentation
Spider2-V is a multimodal agent benchmark spanning the entire data science and engineering workflow.

Education

Nanjing University, 2021.09 - 2025.07 (Expected)
B.E. in Software Engineering
GPA: 91.60 / 100.0 (4.58 / 5.00); Overall Academic Ranking: 1/259

The Hong Kong University of Science and Technology, 2024.01 - 2024.05
Exchange Student with a full scholarship

Academic Experience

XLANG Lab @ HKU, Research Intern, 2023.08 - present
Conducted research on topics including executable language grounding, tool use, code generation, and multimodal LLMs. Advisor: Prof. Tao Yu

Services

Honors & Awards

Miscellanea