I'm a marketing and revenue operations professional who builds with AI. I put LLMs inside GTM pipelines to do work teams usually do by hand. The two builds below run on real data: a pipeline that scores a company list against an ICP, and an eval harness that tests the prompts behind it. You can open the prompts and the raw run data yourself. Available now.
PROOF ▸ Two systems that actually run on real data. Every score, prompt, and raw output is viewable.
Pulls a company list and scores every row against an ICP, then hands back a ranked outbound queue. Each score carries a one-line reason taken from the company's own description. Runs on the real YC Spring '26 batch.
GTM Engineer View build →
EV-1
Treats prompts like code. Every prompt has a test suite, and a change ships only when its pass rate holds against the last version. Checks run three ways: plain assertions, exact-match on a labeled set, and a model judging the fuzzy stuff.
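As a rough sketch of that ship rule (not the actual EV-1 code), the gate can be written as a pass-rate comparison over one shared suite. Every name here (`exact_match`, `contains`, `judged_by`, `may_ship`) is hypothetical, the two prompts are stub functions, and the judge is a plain Python callable standing in for a model call.

```python
from typing import Callable, List, Tuple

Check = Callable[[str], bool]
Suite = List[Tuple[str, Check]]


def exact_match(expected: str) -> Check:
    """Exact-match check against a label (case and outer whitespace ignored)."""
    return lambda output: output.strip().lower() == expected.strip().lower()


def contains(fragment: str) -> Check:
    """Plain assertion: the output must contain a fragment."""
    return lambda output: fragment in output


def judged_by(judge: Callable[[str], bool]) -> Check:
    """Fuzzy check; `judge` is a stand-in for a model-graded verdict."""
    return lambda output: judge(output)


def pass_rate(run_prompt: Callable[[str], str], suite: Suite) -> float:
    """Fraction of suite cases whose check accepts the prompt's output."""
    passed = sum(1 for text, check in suite if check(run_prompt(text)))
    return passed / len(suite)


def may_ship(candidate, baseline, suite: Suite) -> bool:
    """Ship only if the candidate's pass rate is at least the baseline's."""
    return pass_rate(candidate, suite) >= pass_rate(baseline, suite)


# Stub prompts (assumptions): the candidate labels everything "fit".
baseline = lambda text: "fit" if "B2B" in text else "no fit"
candidate = lambda text: "fit"

suite: Suite = [
    ("B2B payroll API", exact_match("fit")),
    ("Consumer puzzle game", exact_match("no fit")),
    ("B2B data tooling", contains("fit")),
    ("B2B security platform", judged_by(lambda out: len(out) < 20)),
]

baseline_rate = pass_rate(baseline, suite)
candidate_rate = pass_rate(candidate, suite)
ship = may_ship(candidate, baseline, suite)
```

Here the candidate misses the consumer case, so its pass rate falls below the baseline's and the change is held back.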
AI Enablement View build →
Double Bill: a streaming-rotation calculator. Drop your Letterboxd watchlist and it works out the one, two, and three services that actually cover it. Exact set-cover math, posters and all. Try it →
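One way to read "exact set-cover math" is an exhaustive search: for each bundle size, try every combination of services and keep the one that covers the most watchlist titles. The sketch below assumes that reading and is not taken from Double Bill itself. The function name, service names, and titles are all made up, and ties go to the first combination in alphabetical order.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def best_bundles(
    watchlist: Iterable[str],
    catalogs: Dict[str, Set[str]],
    max_size: int = 3,
) -> Dict[int, Tuple[Tuple[str, ...], Set[str]]]:
    """For each bundle size k, return (services, covered titles) with maximum coverage."""
    wanted = set(watchlist)
    results = {}
    for k in range(1, min(max_size, len(catalogs)) + 1):
        best = None
        # Exhaustive search over every k-service combination, so the result is exact.
        for combo in combinations(sorted(catalogs), k):
            covered = wanted & set().union(*(catalogs[s] for s in combo))
            if best is None or len(covered) > len(best[1]):
                best = (combo, covered)
        results[k] = best
    return results


# Hypothetical catalog data, for illustration only.
catalogs = {
    "ServiceA": {"t1", "t2"},
    "ServiceB": {"t2", "t3", "t4"},
    "ServiceC": {"t5"},
    "ServiceD": {"t1", "t5"},
}
watchlist = ["t1", "t2", "t3", "t4", "t5"]

bundles = best_bundles(watchlist, catalogs)
```

With this toy data the best single service is `ServiceB` (three of five titles), and `ServiceB` plus `ServiceD` covers the whole list. Brute force stays cheap because the number of streaming services is small; a much larger catalog would call for a solver or a heuristic.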
Old Flames — scrobble archaeology. Type your Last.fm username and it digs up the songs you had on repeat but haven't played in years, filed by era. The rule sits on the page; change it and everything recounts. Try it →
Send me a company list and I'll run it through the same pipeline and send back the scored ICP queue, real output on your own accounts. Hiring for MarOps / RevOps / GTM engineering? Same address gets you my resume.
annieyepes@gmail.com →