#13

Cursor

🔑 Semi-Open

AI-first code editor with inline completions, multi-file editing, and codebase-aware context via embeddings.

💰 $20/mo Pro, $40/mo Enterprise · VS Code, web, mobile

62.0 Overall Score

Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
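
The page does not publish the weights, the modifiers, or the penalty, so the sketch below is only an illustration of the general shape described above and will not reproduce the Overall column. It assumes equal weights, a multiplicative harness modifier, and a penalty proportional to the spread across dimensions; `WEIGHTS`, `SPREAD_PENALTY`, `apply_modifier`, and `composite` are hypothetical names, not part of the scoring system.

```python
from statistics import pstdev

# Assumption: equal weights. The source names the four dimensions but not their weights.
WEIGHTS = {"coding": 0.25, "reasoning": 0.25, "tool_use": 0.25, "autonomy": 0.25}

# Assumption: how strongly an uneven profile is penalized (hypothetical value).
SPREAD_PENALTY = 0.5


def apply_modifier(benchmark_score: float, modifier: float) -> float:
    """Scale a public benchmark score by a harness modifier, clamped to 0-100."""
    return max(0.0, min(100.0, benchmark_score * modifier))


def composite(scores: dict[str, float]) -> float:
    """Weighted mean of the dimension scores minus a penalty for their spread."""
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    spread = pstdev([scores[name] for name in WEIGHTS])
    return max(0.0, weighted - SPREAD_PENALTY * spread)


# Two made-up profiles with the same weighted mean of 60.
balanced = {"coding": 60.0, "reasoning": 60.0, "tool_use": 60.0, "autonomy": 60.0}
narrow = {"coding": 100.0, "reasoning": 60.0, "tool_use": 40.0, "autonomy": 40.0}

print(composite(balanced))
print(composite(narrow))
```

Under these assumptions the narrow profile scores lower than the balanced one even though both have the same weighted mean, which is one way a composite can discourage optimizing a single dimension.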

Model	Overall	Coding	Reasoning	Tool Use	Autonomy
Claude Opus 4.6	62.0	75.99	76.14	39.88	51.27
Kimi K2.5	52.8	76.04	64.95	23.09	42.56
Gemini 3.1 Pro	52.6	81.24	73.07	34.36	42.12
Claude Sonnet 4.6	45.8	70.15	47.09	37.18	49.18
GPT-5.4	45.2	48.42	56.28	49.36	45.63
MiniMax-M2.7	44.5	63.79	43.30	37.67	44.32
DeepSeek R1	41.7	73.24	43.29	25.19	30.91
Qwen 3.5	40.5	53.17	43.69	17.16	31.77
Gemini 3 Pro	35.8	44.77	47.95	27.00	32.54
MiMo-V2-Flash	35.8	65.62	40.06	21.19	26.34