Swirlsmith
#5

Hermes Agent

🔓 Open

Multi-modal agentic harness with persistent memory, MCP tool orchestration, sub-agent delegation, and autonomous task execution.

💰 Open-source, self-hostable · CLI, API, web

72.6 Overall Score
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
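The page does not publish the actual weights or the form of the penalty, so the following is only a minimal sketch of how a weighted composite that penalizes narrow optimization could be computed. The function name `overall_score`, the equal default weights, the `spread_penalty` factor, and the choice of standard deviation as the penalty term are all assumptions made for illustration, and the dimension scores in the usage lines are invented, not taken from the leaderboard.

```python
from statistics import pstdev

# The four dimensions named in the scoring description.
DIMENSIONS = ("coding", "reasoning", "tool_use", "autonomy")


def overall_score(scores, weights=None, spread_penalty=0.5):
    """Hypothetical composite: weighted mean minus a penalty for uneven scores.

    scores         -- dict mapping each dimension to a 0-100 score
    weights        -- dict of per-dimension weights (assumed equal if omitted)
    spread_penalty -- assumed factor applied to the standard deviation
    """
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}

    total_weight = sum(weights[d] for d in DIMENSIONS)
    weighted_mean = sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight

    # A harness that is strong in one dimension and weak in others has a
    # large spread, so it loses more points than a balanced one.
    spread = pstdev([scores[d] for d in DIMENSIONS])

    return max(0.0, weighted_mean - spread_penalty * spread)


# Illustrative inputs only: both profiles have the same plain average.
balanced = {"coding": 70, "reasoning": 70, "tool_use": 70, "autonomy": 70}
lopsided = {"coding": 100, "reasoning": 100, "tool_use": 40, "autonomy": 40}

print(overall_score(balanced))
print(overall_score(lopsided))
```

Under these assumptions the two profiles share the same unpenalized mean, but the lopsided one ends up with a lower composite, which is the behavior the description attributes to the overall score.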

Model               Overall
Claude Opus 4.6     72.6
Gemini 3.1 Pro      64.6
Kimi K2.5           62.1
GPT-5.4             58.1
Claude Sonnet 4.6   57.3
MiniMax-M2.7        55.0
DeepSeek R1         47.9
Qwen 3.5            47.4
Gemini 3 Pro        45.9
MiMo-V2-Flash       43.0