Prompt
Eval Harness

Every version of the production prompt, tested against assertions on each run. A prompt edit ships only when the numbers hold. Click a version to see its per-check pass rates and what failed.

last run ok

Prompt

icp-classifier

Cases

12 gold-labeled

Checks

assert · match · judge

Latest

v6 — 97%

Generated

2026-06-10

UNDER TEST ▸ The icp-classifier prompt that the lead-scoring pipeline runs in production, tested across versions against the same real YC Spring '26 companies.

FIG.2 · prompt eval / rev.A. The icp-classifier prompt tested across versions against the same real YC Spring '26 companies scored on Unit MO-1. Checks and inputs are real and runnable. Pass rates are this run's output, and the runner does not retry, so these are first-shot numbers.

view raw json ↗

PromptEval Harness

Prompt
Eval Harness