agent-skills-eval is a TypeScript SDK and CLI for running repeatable evaluations against agentskills.io-style AI agent skills. Each skill gets a YAML-defined test suite; the runner sends each task through an OpenAI-compatible judge model and records the results as structured JSONL plus a readable HTML report.
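To make the JSONL output concrete, here is a minimal TypeScript sketch of what a per-case result record and a report-style summary might look like. The field names, the `EvalRecord` shape, and the helper functions are illustrative assumptions, not the package's actual schema or API.

```typescript
// Hypothetical record shape; the real agent-skills-eval format may differ.
interface EvalRecord {
  skill: string;   // skill under test
  caseId: string;  // test case from the YAML suite
  model: string;   // judge model used
  pass: boolean;   // judge verdict
  score: number;   // graded score from the judge, 0..1
}

// Serialize records as JSONL: one JSON object per line.
function toJsonl(records: EvalRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}

// Aggregate a per-skill pass rate, the kind of summary a report might show.
function passRate(records: EvalRecord[]): Map<string, number> {
  const totals = new Map<string, { pass: number; total: number }>();
  for (const r of records) {
    const t = totals.get(r.skill) ?? { pass: 0, total: 0 };
    t.total += 1;
    if (r.pass) t.pass += 1;
    totals.set(r.skill, t);
  }
  const rates = new Map<string, number>();
  for (const [skill, t] of totals) rates.set(skill, t.pass / t.total);
  return rates;
}

const records: EvalRecord[] = [
  { skill: "summarize", caseId: "c1", model: "gpt-4o-mini", pass: true, score: 0.9 },
  { skill: "summarize", caseId: "c2", model: "gpt-4o-mini", pass: false, score: 0.4 },
];
console.log(toJsonl(records).split("\n").length); // 2
console.log(passRate(records).get("summarize"));  // 0.5
```

JSONL keeps every run append-only and diff-friendly, which is what makes cross-run comparison cheap.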
The project is useful once skill prompts, instructions, or tool workflows become product surface area. Instead of relying on a few manual spot checks, it lets me compare skill behavior across changes, models, and baselines with enough structure to catch regressions.
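The regression check described above can be sketched as a comparison of per-skill pass rates between a baseline run and a candidate run. The function name, the `Rates` alias, and the tolerance value are hypothetical, a sketch of the idea rather than the package's real implementation.

```typescript
// Per-skill pass rates keyed by skill name, e.g. from two separate runs.
type Rates = Map<string, number>;

// Hypothetical regression check: flag skills whose pass rate dropped by
// more than the tolerance between the baseline run and the candidate run.
function findRegressions(
  baseline: Rates,
  candidate: Rates,
  tolerance = 0.05,
): string[] {
  const regressed: string[] = [];
  for (const [skill, base] of baseline) {
    const cand = candidate.get(skill);
    if (cand !== undefined && base - cand > tolerance) regressed.push(skill);
  }
  return regressed;
}

const baseline: Rates = new Map([["summarize", 0.9], ["extract", 0.8]]);
const candidate: Rates = new Map([["summarize", 0.6], ["extract", 0.82]]);
console.log(findRegressions(baseline, candidate)); // ["summarize"]
```

A fixed tolerance keeps small judge-model noise from flagging every run; the right value depends on suite size and judge variance.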
The current npm package includes the CLI, config helpers, reporter utilities, and provider adapters, with docs published through GitHub Pages.
