Swirlsmith
#14

Copilot Workspace

🔒 Closed

GitHub-integrated AI coding environment with Codespaces, PR-native workflow, and repository-wide context.

💰 Included with GitHub subscription · web, vscode

60.7 Overall Score
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
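The page does not publish the actual weights, modifiers, or penalty, so the following is only a minimal sketch of how a composite of this shape could be computed. Every name and number in it is an assumption made for illustration: the `adjusted` and `composite` functions, the equal weights, and the choice to penalize narrow optimization by subtracting a fraction of the spread across dimensions.

```python
from statistics import pstdev

# The four dimensions named in the scoring description.
DIMENSIONS = ("coding", "reasoning", "tool_use", "autonomy")


def adjusted(base: float, modifier: float) -> float:
    """Apply a hypothetical harness-specific modifier to a benchmark score.

    The result is clamped to the 0-100 range. The multiplicative form
    is an assumption; the source does not say how modifiers are applied.
    """
    return max(0.0, min(100.0, base * modifier))


def composite(scores: dict, weights: dict, spread_penalty: float = 0.5) -> float:
    """Weighted mean of the dimension scores minus a spread penalty.

    The penalty (population standard deviation times `spread_penalty`)
    is one assumed way to penalize narrow optimization: a profile that
    is strong in two dimensions and weak in the other two scores lower
    than a balanced profile with the same weighted mean.
    """
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_mean = sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight
    spread = pstdev([scores[d] for d in DIMENSIONS])
    return max(0.0, weighted_mean - spread_penalty * spread)


# Illustrative inputs only; these are not values from the leaderboard.
equal_weights = {d: 1.0 for d in DIMENSIONS}
balanced = {"coding": 60.0, "reasoning": 60.0, "tool_use": 60.0, "autonomy": 60.0}
narrow = {"coding": 90.0, "reasoning": 90.0, "tool_use": 30.0, "autonomy": 30.0}

balanced_score = composite(balanced, equal_weights)
narrow_score = composite(narrow, equal_weights)
```

Both example profiles have the same weighted mean, but under these assumptions the narrow one receives the lower composite because its scores are spread further apart.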

Model                Overall
Claude Opus 4.6      60.7
Gemini 3.1 Pro       51.5
Claude Sonnet 4.6    44.4
GPT-5.4              44.0
Gemini 3 Pro         34.9
Gemini 3 Flash       31.6
GPT-5.2              30.6
Grok 3               30.1
Claude Sonnet 4.5    28.8
GPT-5 (high)         28.8