Swirlsmith
#10

Aider

🔓 Open

Terminal-native pair programming tool with git integration, repo-map understanding, and multi-file editing.

💰 Open-source, terminal-native · cli, api

68.4 Overall Score
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
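One way to picture the scoring described above is the minimal sketch below. The page does not publish its weights, modifier values, or penalty formula, so everything here is an assumption made for illustration: the equal `WEIGHTS`, the multiplicative `modifier`, and the standard-deviation `spread_penalty` are hypothetical stand-ins, not the leaderboard's actual method.

```python
from statistics import pstdev

# Hypothetical equal weights; the page does not state the real ones.
WEIGHTS = {"coding": 0.25, "reasoning": 0.25, "tool_use": 0.25, "autonomy": 0.25}


def dimension_score(benchmark: float, modifier: float) -> float:
    """Scale a public benchmark score by a harness-specific modifier (assumed
    multiplicative here) and clamp the result to the 0..100 range."""
    return max(0.0, min(100.0, benchmark * modifier))


def overall_score(dimensions: dict[str, float], spread_penalty: float = 0.5) -> float:
    """Weighted composite minus a penalty proportional to the spread across
    dimensions, so a lopsided profile scores lower than a balanced one with
    the same weighted mean. The penalty form is an assumption."""
    values = [dimensions[name] for name in WEIGHTS]
    weighted = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    spread = pstdev(values)  # population standard deviation of the four scores
    return round(max(0.0, weighted - spread_penalty * spread), 1)


if __name__ == "__main__":
    balanced = {"coding": 70, "reasoning": 70, "tool_use": 70, "autonomy": 70}
    lopsided = {"coding": 100, "reasoning": 60, "tool_use": 60, "autonomy": 60}
    print(overall_score(balanced))
    print(overall_score(lopsided))
```

Both example profiles have the same weighted mean, but the lopsided one loses points to the spread term, which is one plausible reading of "penalizes narrow optimization".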

Model              Overall
Claude Opus 4.6    68.4
Gemini 3.1 Pro     58.5
Kimi K2.5          58.0
Claude Sonnet 4.6  52.4
GPT-5.4            52.0
MiniMax-M2.7       50.8
DeepSeek R1        45.7
Qwen 3.5           43.3
Gemini 3 Pro       40.3
MiMo-V2-Flash      40.0