
Agent Skills Eval

TypeScript SDK and CLI for evaluating agentskills.io-style AI agent skills with LLM judges, YAML suites, JSONL logs, and HTML reports.

  • Updated May 7, 2026
  • AI Agents
  • LLM Evals

About

agent-skills-eval is a TypeScript SDK and CLI for running repeatable evaluations against agentskills.io-style AI agent skills. It gives each skill a YAML-defined test suite, sends each task through an OpenAI-compatible judge, and records the results as structured JSONL plus a readable HTML report.
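As a sketch of that flow, the pipeline can be pictured as: suite cases in, one judge verdict per case, one JSONL line per verdict. The types and the judge function below are illustrative assumptions, not the published agent-skills-eval interface; a real judge would call an OpenAI-compatible endpoint, while this one compares strings so the sketch runs offline.

```typescript
// Hedged sketch of the suite -> judge -> JSONL flow described above.
// SuiteCase, JudgeVerdict, and runSuite are hypothetical names, not the real SDK API.

type SuiteCase = { id: string; input: string; expect: string };

type JudgeVerdict = { caseId: string; pass: boolean; score: number; rationale: string };

// Stand-in for the OpenAI-compatible judge call: a real judge would send the
// skill's output to an LLM and parse its verdict; here we just compare strings.
async function judge(c: SuiteCase): Promise<JudgeVerdict> {
  const pass = c.expect === c.input.toUpperCase();
  return {
    caseId: c.id,
    pass,
    score: pass ? 1 : 0,
    rationale: pass ? "exact match" : "mismatch",
  };
}

// One JSONL record per case, ready to append to a run log.
async function runSuite(cases: SuiteCase[]): Promise<string[]> {
  const lines: string[] = [];
  for (const c of cases) {
    lines.push(JSON.stringify(await judge(c)));
  }
  return lines;
}
```

Keeping each verdict as a self-contained JSON line is what makes runs easy to diff across models and baselines: two JSONL logs can be compared record by record without re-running anything.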

The project is useful once skill prompts, instructions, or tool workflows become product surface area. Instead of relying on a few manual spot checks, it lets me compare skill behavior across prompt changes, models, and baselines with enough structure to catch regressions.

The current npm package includes the CLI, config helpers, reporter utilities, and provider adapters, with docs published through GitHub Pages.

Details

  • Install: npx agent-skills-eval
  • Stack: TypeScript, Commander, js-yaml, OpenAI-compatible providers
  • License: MIT
  • Year: 2026

© 2026 Rishabh Mehan · All rights reserved · Built with Next.js and a little stubbornness.