Swirlsmith
Back to leaderboard
#3

Devin

๐Ÿ”“ Open

Fully autonomous AI software engineer with sandboxed environment, browser access, and end-to-end task completion.

๐Ÿ’ฐ Free tier + $20/mo pay-as-you-go ยท web, api

74.7Overall Score
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.

ModelOverall
Claude Opus 4.674.7
Gemini 3.1 Pro66.7
Kimi K2.564.1
GPT-5.459.6
Claude Sonnet 4.658.6
MiniMax-M2.756.4
DeepSeek R150.2
Qwen 3.549.1
Gemini 3 Pro47.5
MiMo-V2-Flash44.5