Swirlsmith
#6

Devin

Fully autonomous AI software engineer with sandboxed environment, browser access, and end-to-end task completion.

82.9 (+0.5%, 24h)
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
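As a rough illustration of how such a composite could work, the sketch below multiplies each dimension's benchmark score by a harness modifier, takes a weighted sum, and subtracts a penalty proportional to the spread across dimensions. The page does not publish its weights, modifiers, or penalty formula, so `WEIGHTS`, `SPREAD_PENALTY`, `adjusted`, and `overall_score` are all hypothetical names and values chosen for the example, and the standard-deviation penalty is only one possible way to penalize narrow optimization.

```python
from statistics import pstdev

# Hypothetical equal weights; the real weights are not published on the page.
WEIGHTS = {"coding": 0.25, "reasoning": 0.25, "tool_use": 0.25, "autonomy": 0.25}

# Hypothetical penalty factor applied to the spread across dimensions.
SPREAD_PENALTY = 0.5


def adjusted(benchmark: float, modifier: float) -> float:
    """Apply a harness-specific modifier to a public benchmark score."""
    return benchmark * modifier


def overall_score(dimensions: dict[str, float]) -> float:
    """Weighted composite minus a penalty for uneven dimension scores."""
    if set(dimensions) != set(WEIGHTS):
        raise ValueError("expected exactly the four scored dimensions")
    weighted = sum(WEIGHTS[name] * score for name, score in dimensions.items())
    # Population standard deviation is zero when all dimensions are equal,
    # so a balanced profile is not penalized.
    spread = pstdev(list(dimensions.values()))
    return weighted - SPREAD_PENALTY * spread


balanced = {"coding": 80.0, "reasoning": 80.0, "tool_use": 80.0, "autonomy": 80.0}
narrow = {"coding": 100.0, "reasoning": 100.0, "tool_use": 60.0, "autonomy": 60.0}

print(overall_score(balanced))
print(overall_score(narrow))
```

Both profiles have the same weighted average, but under these assumed parameters the uneven one scores lower, which is the behavior the description attributes to the composite.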

Model                Overall
Devin Core v2        84.2
Claude Sonnet 4.6    82.1