#agent-evaluation

3 skills are using this tag

code-reviewer

hidai25 / eval-view

Catch AI agent regressions before you ship. YAML test cases, golden baselines, execution tracing, cost tracking, CI integration. LangGraph, CrewAI, Anthropic, OpenAI.

agentagent-benchmarkagent-evaluation+16

agent-eval-harness

plaited / agent-eval-harness

Evaluate AI agents with Unix-style pipeline commands. Schema-driven adapters for any CLI agent, trajectory capture, pass@k metrics, and multi-run comparison.

agent-comparisonagent-evaluationai-agents+11

headless-adapters