GitHub - xlang-ai/computer-agent-arena: [ICLR 2026] Computer Agent Arena: Toward Human-Centric Evaluation and Analysis of Computer-Use Agents (original) (raw)

Toward Human-Centric Evaluation and Analysis of Computer-Use Agents

🌐 Website | 📑 Paper (ICLR 2026) | 🏆 Leaderboard | 📝 Blog | 🤝 Contributing

ICLR 2026 License Python Node Contributions


Introduction

Computer Agent Arena is an open, crowdsourced evaluation platform for benchmarking computer-use agents (CUAs) on real-world tasks. Users interact with two AI agents side-by-side on live desktop environments (Ubuntu / Windows) and vote for the better one — producing human preference data at scale that powers a continuously updated ELO leaderboard.

This repository releases the full platform stack: backend server, frontend UI, agent hub, and deployment infrastructure.


Platform Overview

Layer Technology
Frontend React 18, TypeScript, Ant Design, Tailwind CSS, Socket.IO
Backend Python, Flask, Flask-SocketIO
Database PostgreSQL, Redis
Infrastructure AWS EC2 (multi-region), S3
Auth Google OAuth 2.0, JWT, Anonymous access, Prolific

Supported Agents

Model Organization
GPT-4.1, GPT-5 OpenAI
Computer-Use-Preview OpenAI
Claude 3.7 / 4 Sonnet, Claude Sonnet 4.5 Anthropic
Gemini 2.5 Pro Google
Qwen2.5-VL-72B Alibaba
UI-TARS-1.5 ByteDance
OpenCUA XLang Lab
CoAct

🚀 Quick Start

Prerequisites

1. Clone

git clone https://github.com/xlang-ai/computer-agent-arena.git cd computer-agent-arena

2. Backend

pip install -r backend/requirements.txt

Create a .env file with your database, Redis, AWS, API keys, and auth credentials. Configure config.yaml:

deployment: local # or 'aws'

Start the server:

python -m backend.main # listens on :8181

3. Frontend

cd frontend npm install npm start # dev server on :3000


Adding a New Agent

  1. Create backend/agents/hub/YourAgent/ and implement a class extending BaseAgent.
  2. Register the model and method name in config.yaml under AVAILABLE_AGENT_OPTIONS.
  3. Test with python backend/agents/test/test_agents.py.

See existing implementations in backend/agents/hub/ (Anthropic, OpenAICUA, UI_TARS, OpenCUA, coact) for reference.


Repository Structure

computer-agent-arena/
├── backend/
│   ├── main.py              # Entry point
│   ├── agents/              # Agent hub + base classes
│   │   └── hub/             # Per-model implementations
│   ├── api/                 # WebSocket / REST handlers
│   ├── desktop_env/         # VM abstraction (AWS, VMware, ...)
│   └── utils/               # DB, S3, socket utilities
├── frontend/                # React 18 + TypeScript UI
│   └── src/
└── config.yaml              # Deployment configuration

License

MIT License — see LICENSE for details.