Aider Polyglot

Aider Polyglot is a multi-language coding benchmark from the open-source Aider project. It evaluates an AI coding agent's ability to make hidden tests pass on 225 of the more difficult Exercism exercises, spread across six languages: C++, Go, Java, JavaScript, Python, and Rust. Each task gives the model a problem statement and stub files. Success means editing the stubs so that the hidden test file passes. If the first attempt fails, Aider's harness shows the model the test failure output and allows one more attempt. Results are usually reported as the pass rate after that second attempt. Because tasks span multiple languages, the benchmark measures cross-language edit accuracy and instruction-following in a way that Python-only benchmarks such as SWE-Bench cannot. It is one of the standard reference benchmarks cited in coding-agent evaluations, alongside SWE-Bench Verified and Terminal-Bench.

Example

An agent receives an Exercism task in Rust along with stub files, while the test file stays hidden. It must edit the stubs so that the code compiles cleanly and passes every assertion when the tests run. The harness repeats this across hundreds of tasks in all supported languages, then reports the pass rate per language and overall. A model that excels in Python but stumbles on Rust borrow-checker errors shows that asymmetry directly in the per-language breakdown.
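The per-language and overall reporting step can be illustrated with a small aggregation sketch. This is not Aider's actual harness code. The `TaskResult` record, its fields, and the `pass_rates` function are hypothetical names chosen for illustration. The sketch assumes each task has already been run and reduced to a single pass/fail flag.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative, not Aider's schema.
    task_id: str
    language: str
    passed: bool  # True if the hidden tests passed after the model's edits


def pass_rates(results: list[TaskResult]) -> tuple[dict[str, float], float]:
    """Return (pass rate per language, overall pass rate), both as percentages."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.language] += 1
        passes[r.language] += int(r.passed)
    # Sort languages so the breakdown prints in a stable order.
    per_language = {lang: 100.0 * passes[lang] / totals[lang] for lang in sorted(totals)}
    # Overall rate is computed over all tasks, not averaged across languages.
    overall = 100.0 * sum(passes.values()) / len(results) if results else 0.0
    return per_language, overall


if __name__ == "__main__":
    # Made-up sample results, for illustration only.
    sample = [
        TaskResult("py-001", "python", True),
        TaskResult("py-002", "python", True),
        TaskResult("rs-001", "rust", False),
        TaskResult("rs-002", "rust", True),
    ]
    per_language, overall = pass_rates(sample)
    for lang, rate in per_language.items():
        print(f"{lang:>10}: {rate:5.1f}%")
    print(f"{'overall':>10}: {overall:5.1f}%")
```

The overall figure is pooled over every task, so languages with more exercises weigh more heavily. A macro-average across languages would be a different, equally defensible summary.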



