SWE-bench Lite

Overview

SWE-bench Lite provides a smaller, carefully selected subset of 300 tasks from the full benchmark, designed to:

Reduce evaluation costs while maintaining benchmark quality
Enable faster iteration cycles for model development
Provide a more accessible entry point for research groups

The 300 tasks were selected to preserve the distribution and difficulty spectrum of the original benchmark while focusing on more self-contained, functional bug fixes.

While the full SWE-bench test split comprises 2,294 issue-commit pairs across 12 Python repositories, SWE-bench Lite covers 11 of the original 12 repositories with a similar diversity and distribution. We also provide 23 development instances that can be useful for active development on the SWE-bench task.

We recommend that future systems evaluated on SWE-bench report results on SWE-bench Lite instead of the full SWE-bench test set when compute efficiency is a concern.

Selection Criteria

SWE-bench Lite instances were selected using the following criteria:

Removed instances with images, external hyperlinks, references to specific commit SHAs and references to other pull requests or issues
Removed instances with fewer than 40 words in the problem statement
Removed instances that edit more than 1 file
Removed instances where the gold patch has more than 3 edit hunks
Removed instances that create or remove files
Removed instances that contain tests with error message checks
Finally, sampled 300 test instances and 23 development instances from the remaining candidates

The code used to create SWE-bench Lite is available in SWE-bench/swebench/collect/make_lite.
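As a rough illustration of the criteria listed above, the filters can be sketched in a few lines of Python. This is not the make_lite script itself: the regular expressions are crude, hypothetical heuristics, the helper names (`keep_instance`, `select_lite`) are invented for this example, and each instance is assumed to be a dict with `problem_statement`, `patch` (the gold patch as a unified diff), and `test_patch` fields.

```python
import random
import re

# Hypothetical heuristics; the real make_lite checks may differ.
URL_OR_IMAGE = re.compile(r"https?://|!\[[^\]]*\]\(|<img\b")
COMMIT_SHA = re.compile(r"\b[0-9a-f]{7,40}\b")  # crude: also matches long hex-like words
ISSUE_REF = re.compile(r"#\d+")
ERROR_MESSAGE_CHECK = re.compile(
    r"pytest\.raises\([^)]*match=|assertRaisesRegex|assertRaisesMessage"
)


def count_files(patch: str) -> int:
    """Count files touched by a git-style unified diff."""
    return sum(line.startswith("diff --git ") for line in patch.splitlines())


def count_hunks(patch: str) -> int:
    """Count edit hunks (lines starting with '@@ ') in a unified diff."""
    return sum(line.startswith("@@ ") for line in patch.splitlines())


def creates_or_removes_files(patch: str) -> bool:
    """True if the diff adds or deletes a whole file."""
    return any(
        line.startswith(("new file mode", "deleted file mode"))
        or line in ("--- /dev/null", "+++ /dev/null")
        for line in patch.splitlines()
    )


def keep_instance(instance: dict) -> bool:
    """Apply the selection criteria in the order they are listed."""
    statement = instance["problem_statement"]
    patch = instance["patch"]
    if any(p.search(statement) for p in (URL_OR_IMAGE, COMMIT_SHA, ISSUE_REF)):
        return False
    if len(statement.split()) < 40:
        return False
    if count_files(patch) > 1:
        return False
    if count_hunks(patch) > 3:
        return False
    if creates_or_removes_files(patch):
        return False
    if ERROR_MESSAGE_CHECK.search(instance["test_patch"]):
        return False
    return True


def select_lite(instances: list, n: int = 300, seed: int = 0) -> list:
    """Filter, then draw a reproducible random sample of at most n instances."""
    candidates = [inst for inst in instances if keep_instance(inst)]
    return random.Random(seed).sample(candidates, min(n, len(candidates)))


# Made-up instance that passes every filter.
example = {
    "problem_statement": " ".join(["word"] * 45),
    "patch": "\n".join([
        "diff --git a/pkg/mod.py b/pkg/mod.py",
        "--- a/pkg/mod.py",
        "+++ b/pkg/mod.py",
        "@@ -1,3 +1,3 @@",
        "-    return a - b",
        "+    return a + b",
    ]),
    "test_patch": "assert add(1, 2) == 3",
}
```

Calling `keep_instance(example)` returns `True`; shortening the problem statement below 40 words, or concatenating a second file's diff onto the patch, makes it return `False`. The fixed seed in `select_lite` is a design choice for this sketch so that repeated runs draw the same sample.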

Repository Distribution

SWE-bench Lite distribution across repositories. Compare to the full SWE-bench in Figure 3 of the SWE-bench paper.

[Figure: SWE-bench Lite repository distribution]
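A per-repository breakdown like the one in this figure can be recomputed from the instances themselves. The snippet below is a minimal sketch that assumes each instance is a dict with a `repo` field; the helper name `repo_distribution` and the toy records are made up for illustration and do not reflect the actual SWE-bench Lite counts.

```python
from collections import Counter


def repo_distribution(instances):
    """Map each repository to (instance count, share of total), most common first."""
    counts = Counter(inst["repo"] for inst in instances)
    total = sum(counts.values())
    return {repo: (n, n / total) for repo, n in counts.most_common()}


# Toy records, not real benchmark data.
toy = [
    {"repo": "django/django"},
    {"repo": "django/django"},
    {"repo": "sympy/sympy"},
    {"repo": "django/django"},
]
dist = repo_distribution(toy)
for repo, (n, share) in dist.items():
    print(f"{repo}: {n} instances ({share:.0%})")
```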

Baseline Performance

SWE-bench Lite performance for our baselines. Compare to the full SWE-bench baseline performance in Table 5 of the SWE-bench paper.

[Figure: SWE-bench Lite baseline performance comparison]

Resources

SWE-bench Lite datasets:

Citation

If you use SWE-bench in your research, please cite our paper:

@inproceedings{jimenez2024swebench,
  title={{SWE}-bench: Can Language Models Resolve Real-world Github Issues?},
  author={Carlos E Jimenez and John Yang and Alexander Wettig and Shunyu Yao and Kexin Pei and Ofir Press and Karthik R Narasimhan},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=VTF8yNQM66}
}

Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., & Narasimhan, K. (2024). SWE-bench: Can Language Models Resolve Real-world Github Issues? arXiv preprint arXiv:2310.06770.

Jimenez, Carlos E., et al. "SWE-bench: Can Language Models Resolve Real-world Github Issues?" arXiv preprint arXiv:2310.06770 (2023).