Swirlsmith
#15

Codex CLI

🔒 Closed

OpenAI's open-source terminal agent. Lightweight, sandboxed execution, multi-model support via OpenAI-compatible APIs.

💰 Included with OpenAI subscription · cli, api

51.1 Overall Score
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
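
The page does not publish its weights, modifiers, or penalty formula, so the following is only a rough sketch of how a composite of this kind could be computed. Everything in it is an assumption made for illustration: the function names `adjusted_scores` and `overall_score`, the multiplicative form of the modifiers, the 0 to 100 scale, and the use of the standard deviation across dimensions as the penalty for narrow optimization.

```python
from statistics import pstdev

# The four dimensions named in the scoring description.
DIMENSIONS = ("coding", "reasoning", "tool_use", "autonomy")


def adjusted_scores(baseline, modifiers):
    """Apply a harness modifier to each benchmark baseline.

    Assumption: modifiers are multiplicative and results are clamped
    to a 0..100 scale. The real adjustment is not documented.
    """
    return {
        d: max(0.0, min(100.0, baseline[d] * modifiers[d]))
        for d in DIMENSIONS
    }


def overall_score(scores, weights, spread_penalty=0.5):
    """Weighted mean of the dimensions, minus a penalty for uneven profiles.

    Assumption: "penalizes narrow optimization" is modeled here as
    subtracting a fraction of the population standard deviation across
    dimensions. The actual formula may differ.
    """
    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_mean = sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight
    spread = pstdev([scores[d] for d in DIMENSIONS])
    return max(0.0, weighted_mean - spread_penalty * spread)
```

With equal weights, a profile of 60 in every dimension and a profile of 90 in Coding with 50 elsewhere share the same weighted mean, but the second scores lower in this sketch because its spread is larger. That is one plausible reading of a composite that rewards balanced performance.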

Model               Overall
GPT-5.4             51.1
GPT-5.2             35.6
GPT-5 (high)        33.1
GPT-oss 120B        31.2
GPT-5 mini          30.5
GPT-5 (medium)      28.3
GPT-5.1             25.3
o3                  24.7
GPT-5.1 Thinking    21.7
GPT-5.1 (high)      20.8