SWE-bench Lite (original) (raw)

Overview

SWE-bench Lite provides a smaller, carefully selected subset of 300 tasks from the full benchmark, designed to:

The 300 tasks were selected to preserve the distribution and difficulty spectrum of the original benchmark while focusing on more self-contained, functional bug fixes.

While the full SWE-bench test split comprises 2,294 issue-commit pairs across 12 Python repositories, SWE-bench Lite covers 11 of the original 12 repositories with a similar diversity and distribution. We also provide 23 development instances that can be useful for active development on the SWE-bench task.

We recommend future systems evaluating on SWE-bench to report numbers on SWE-bench Lite in lieu of the full SWE-bench set when compute efficiency is a concern.


Selection Criteria

SWE-bench Lite instances were selected using the following criteria:

The source code for how SWE-bench Lite was created is available in SWE-bench/swebench/collect/make_lite.


Repository Distribution

SWE-bench Lite distribution across repositories. Compare to the full SWE-bench in Figure 3 of the SWE-bench paper.

SWE-bench Lite repository distribution


Baseline Performance

SWE-bench Lite performance for our baselines. Compare to the full SWE-bench baseline performance in Table 5 of the SWE-bench paper.

SWE-bench Lite baseline performance comparison


Resources

SWE-bench Lite datasets:


Citation

If you use SWE-bench in your research, please cite our paper:

@inproceedings{ jimenez2024swebench, title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?}, author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan}, booktitle={The Twelfth International Conference on Learning Representations}, year={2024}, url={https://openreview.net/forum?id=VTF8yNQM66} }

Jimenez, C. E., Wettig, A., Yao, S., Yang, J., Pei, K., Jain, S., Press, O., & Narasimhan, K. (2024). SWE-bench: Can Language Models Resolve Real-world Github Issues? arXiv preprint arXiv:2310.06770.

Jimenez, Carlos E., et al. "SWE-bench: Can Language Models Resolve Real-world Github Issues?" arXiv preprint arXiv:2310.06770 (2023).