GitHub - ai-agents-2030/SPA-Bench: SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

🌿🪑 SPA-Bench: A Comprehensive Benchmark for Smartphone Agent Evaluation

Website • Paper

👋 Welcome to the SPA-Bench repository, a benchmark for evaluating smartphone agents. This project offers a structured approach to assessing the efficiency, robustness, and accuracy of smartphone agents across a variety of scenarios and conditions.

📢 News

⏩ Quick Start

🛠️ Installation

git clone --recurse-submodules https://github.com/ai-agents-2030/SPA-Bench.git
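
If the repository was already cloned without `--recurse-submodules`, the submodule directories will be empty. A minimal sketch of how to populate them afterwards, using standard Git commands; the helper name `init_submodules` is hypothetical and not part of SPA-Bench:

```shell
# Hypothetical helper (not part of SPA-Bench): populate submodules in an
# existing checkout that was cloned without --recurse-submodules.
# Usage: init_submodules [path-to-checkout]   (defaults to the current directory)
init_submodules() {
    git -C "${1:-.}" submodule update --init --recursive
}
```

For example, `init_submodules SPA-Bench` followed by `git -C SPA-Bench submodule status` should list each submodule with its checked-out commit.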

📜 Documentation

💡 About SPA-Bench

SPA-Bench provides a thorough evaluation framework for smartphone agents, covering key metrics and test scenarios that reflect real-world usage patterns and challenges. This benchmark supplies essential tools and datasets to support consistent evaluation of agent performance across a wide range of tasks and applications.

Overview

💬 Core Features

📋 Diverse and Realistic Task Design

🤖 Plug-and-Play Agent Framework

✅ Automatic and Scalable Evaluation Pipeline

🚀 Coming Soon

🙌 Citation

@inproceedings{chen2025spabench,
  title={SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation},
  author={Jingxuan Chen and Derek Yuen and Bin Xie and Yuhao Yang and Gongwei Chen and Zhihao Wu and Li Yixing and Xurui Zhou and Weiwen Liu and Shuai Wang and Kaiwen Zhou and Rui Shao and Liqiang Nie and Yasheng Wang and Jianye HAO and Jun Wang and Kun Shao},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
}