agent-skills-eval is a TypeScript SDK and CLI for running repeatable evaluations against agentskills.io-style AI agent skills. Each skill gets a YAML-defined test suite; the runner sends each task through an OpenAI-compatible judge model and records the results as structured JSONL plus a readable HTML report.
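To make the JSONL output concrete, here is a minimal TypeScript sketch of what a per-case result record and a report-style summary might look like. The field names, the `EvalRecord` shape, and the helper functions are illustrative assumptions, not the package's actual schema or API.

```typescript
// Hypothetical record shape; the real agent-skills-eval format may differ.
interface EvalRecord {
  skill: string;   // skill under test
  caseId: string;  // test case from the YAML suite
  model: string;   // judge model used
  pass: boolean;   // judge verdict
  score: number;   // graded score from the judge, 0..1
}

// Serialize records as JSONL: one JSON object per line.
function toJsonl(records: EvalRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Aggregate a per-skill pass rate, the kind of summary a report might show.
function passRate(records: EvalRecord[]): Map<string, number> {
  const totals = new Map<string, { pass: number; total: number }>();
  for (const r of records) {
    const t = totals.get(r.skill) ?? { pass: 0, total: 0 };
    t.total += 1;
    if (r.pass) t.pass += 1;
    totals.set(r.skill, t);
  }
  const rates = new Map<string, number>();
  for (const [skill, t] of totals) rates.set(skill, t.pass / t.total);
  return rates;
}

const records: EvalRecord[] = [
  { skill: "summarize", caseId: "c1", model: "gpt-4o-mini", pass: true, score: 0.9 },
  { skill: "summarize", caseId: "c2", model: "gpt-4o-mini", pass: false, score: 0.4 },
];
console.log(toJsonl(records).split("\n").length); // 2
console.log(passRate(records).get("summarize"));  // 0.5
```

JSONL keeps every run append-only and diff-friendly, which is what makes cross-run comparison cheap.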
The project is useful once skill prompts, instructions, or tool workflows become product surface area. Instead of relying on a few manual spot checks, it lets me compare skill behavior across changes, models, and baselines with enough structure to catch regressions.
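The regression check described above can be sketched as a comparison of per-skill pass rates between a baseline run and a candidate run. The function name, the `Rates` alias, and the tolerance value are hypothetical, a sketch of the idea rather than the package's real implementation.

```typescript
// Per-skill pass rates keyed by skill name, e.g. from two separate runs.
type Rates = Map<string, number>;

// Hypothetical regression check: flag skills whose pass rate dropped by
// more than the tolerance between the baseline run and the candidate run.
function findRegressions(
  baseline: Rates,
  candidate: Rates,
  tolerance = 0.05,
): string[] {
  const regressed: string[] = [];
  for (const [skill, base] of baseline) {
    const cand = candidate.get(skill);
    if (cand !== undefined && base - cand > tolerance) regressed.push(skill);
  }
  return regressed;
}

const baseline: Rates = new Map([["summarize", 0.9], ["extract", 0.8]]);
const candidate: Rates = new Map([["summarize", 0.6], ["extract", 0.82]]);
console.log(findRegressions(baseline, candidate)); // ["summarize"]
```

A fixed tolerance keeps small judge-model noise from flagging every run; the right value depends on suite size and judge variance.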
The current npm package includes the CLI, config helpers, reporter utilities, and provider adapters, with docs published through GitHub Pages.
