SWE-bench Lite is a challenging benchmark that evaluates AI systems on real GitHub issues from popular open-source Python projects. Each task requires understanding a complex codebase, diagnosing a bug, implementing a fix, and passing the project's test suite, mirroring the daily work of professional software engineers.
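To make the pass/fail criterion concrete, here is a minimal sketch of how a test-based score of this kind could be computed. It assumes a task counts as resolved only when the tests targeting the issue now pass and the previously passing tests still pass. The names `TaskResult`, `is_resolved`, and `resolved_rate`, and the sample data, are hypothetical illustrations, not the benchmark's actual harness or any system's real results.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes for one task after a candidate fix is applied (hypothetical structure)."""
    task_id: str
    fail_to_pass: dict[str, bool]  # tests the fix is expected to turn green
    pass_to_pass: dict[str, bool]  # tests that passed before and must not regress


def is_resolved(result: TaskResult) -> bool:
    # Assumption: a task needs at least one target test, every target test
    # must pass, and no previously passing test may break.
    return (
        bool(result.fail_to_pass)
        and all(result.fail_to_pass.values())
        and all(result.pass_to_pass.values())
    )


def resolved_rate(results: list[TaskResult]) -> float:
    # Fraction of tasks resolved; an empty run scores 0.0 rather than dividing by zero.
    if not results:
        return 0.0
    return sum(is_resolved(r) for r in results) / len(results)


# Made-up sample data, for illustration only.
sample = [
    TaskResult("demo-1", {"test_fix": True}, {"test_existing": True}),
    TaskResult("demo-2", {"test_fix": True}, {"test_existing": False}),  # regression
    TaskResult("demo-3", {"test_fix": False}, {"test_existing": True}),  # bug not fixed
    TaskResult("demo-4", {"test_fix": True}, {}),
]
print(resolved_rate(sample))  # 0.5
```

Under this assumed rule, a fix that repairs the reported bug but breaks an existing test does not count, which is why a score of this kind reflects working changes rather than plausible-looking patches.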
Highest score among all evaluated systems on SWE-bench Lite
Accessible to individuals and enterprises worldwide
Driving broader adoption of AI-assisted programming across society