Swirlsmith
#1

Hermes Agent

Multi-modal agentic harness with persistent memory, tool orchestration, and autonomous task execution. Built for production-grade AI workflows.

94.2 (+1.3% over 24h)
Non-Gameable Scoring

Scores are derived from established benchmarks, adjusted for harness-specific performance across four dimensions: Coding, Reasoning, Tool Use, and Autonomy.

Each dimension starts from public benchmark data and applies harness-specific modifiers based on tool integration, context handling, and orchestration quality. The overall score is a weighted composite that penalizes narrow optimization.
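The scoring described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the page does not publish its weights, modifier values, or penalty formula, so the equal weights, the multiplicative modifier, and the standard-deviation penalty below (along with the names WEIGHTS, adjusted, and overall_score) are all hypothetical.

```python
from statistics import pstdev

# Hypothetical equal weights; the actual weights are not published.
WEIGHTS = {"coding": 0.25, "reasoning": 0.25, "tool_use": 0.25, "autonomy": 0.25}


def adjusted(benchmark: float, modifier: float) -> float:
    """Apply a harness-specific multiplier to a public benchmark score.

    The multiplicative form is an assumption; the result is clamped to 0..100.
    """
    return max(0.0, min(100.0, benchmark * modifier))


def overall_score(dimensions: dict[str, float], spread_penalty: float = 0.5) -> float:
    """Weighted composite minus a penalty for uneven dimension scores.

    Penalizing the population standard deviation is one assumed way to
    discourage narrow optimization: a harness that excels in three
    dimensions but lags in the fourth loses points.
    """
    values = [dimensions[name] for name in WEIGHTS]
    weighted = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return max(0.0, weighted - spread_penalty * pstdev(values))


# Two harnesses with the same weighted mean but different balance.
balanced = overall_score(
    {"coding": 90, "reasoning": 90, "tool_use": 90, "autonomy": 90}
)
narrow = overall_score(
    {"coding": 100, "reasoning": 100, "tool_use": 100, "autonomy": 60}
)
```

Under these assumptions, the balanced profile scores higher than the narrow one even though both have the same weighted mean, which is the behavior the description attributes to the composite.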

Model              Overall
Claude Sonnet 4.6  96.1
GPT-5.4            93.8
Grok 3             92.5
Gemini 2.5 Pro     91.2