ay@marops:~$ ./run evals --all
Unit EV-1 / Eval Harness

Prompt Eval Harness

Every version of the production prompt, tested against assertions on each run. A prompt edit ships only when the numbers hold. Click a version to see its per-check pass rates and what failed.
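A minimal sketch of the ship gate described above, assuming "the numbers hold" means that no check's pass rate drops relative to the previous version. The names `CheckResult`, `pass_rates`, `may_ship`, and the `tolerance` parameter are hypothetical and are not taken from the harness itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    case_id: str   # which gold-labeled case was run
    check: str     # hypothetical check name, e.g. "assert", "match", "judge"
    passed: bool


def pass_rates(results):
    """Return {check_name: fraction of that check's results that passed}."""
    totals, passes = {}, {}
    for r in results:
        totals[r.check] = totals.get(r.check, 0) + 1
        passes[r.check] = passes.get(r.check, 0) + int(r.passed)
    return {name: passes[name] / totals[name] for name in totals}


def may_ship(baseline, candidate, tolerance=0.0):
    """Assumed gate: every baseline check must score at least
    (baseline rate - tolerance) in the candidate run."""
    base, cand = pass_rates(baseline), pass_rates(candidate)
    return all(cand.get(name, 0.0) >= rate - tolerance
               for name, rate in base.items())
```

A check that is missing from the candidate run counts as a 0% pass rate here, so dropping a check cannot make a version look better. Whether the real gate compares per check or on an overall rate is not stated on this page.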

last run ok
Prompt: icp-classifier
Cases: 12 gold-labeled
Checks: assert · match · judge
Latest: v6, 97%
Generated: 2026-06-10

UNDER TEST ▸ The icp-classifier prompt that the lead-scoring pipeline runs in production, tested across versions against the same real YC Spring '26 companies.

FIG.2 — prompt eval / rev.A — The icp-classifier prompt tested across versions against the same real YC Spring '26 companies scored on Unit MO-1. Checks and inputs are real and runnable; pass rates are this run's output. The runner does not retry, so these are first-shot numbers.
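To illustrate what "first-shot numbers" means in practice, here is a rough sketch of a runner that calls the classifier exactly once per case and tallies a pass rate per check. Everything in it is an assumption for illustration: the function name `run_first_shot`, the case shape (`input` and `gold` keys), and the reading of `assert` as a structural check and `match` as a comparison against the gold label. A `judge` check would call a grading model and is left out.

```python
def run_first_shot(classify, cases, checks):
    """Call `classify` once per case (no retries) and return
    {check_name: pass rate} over all cases."""
    tallies = {name: [0, 0] for name in checks}  # name -> [passed, total]
    for case in cases:
        try:
            output = classify(case["input"])  # exactly one attempt
        except Exception:
            output = None  # a failed call fails every check for this case
        for name, check in checks.items():
            ok = output is not None and bool(check(output, case))
            tallies[name][0] += int(ok)
            tallies[name][1] += 1
    return {name: passed / total
            for name, (passed, total) in tallies.items() if total}


# Hypothetical checks: a structural assertion and a gold-label match.
example_checks = {
    "assert": lambda out, case: isinstance(out, dict) and "label" in out,
    "match": lambda out, case: out.get("label") == case["gold"],
}
```

Because a failed call is recorded as a failure rather than retried, a flaky response lowers the pass rate instead of being hidden by a second attempt.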
