QuoteAI — Performance & Cost Backlog
Status: 🟢 Active (rolling) Authors: Dan Hannah & Clay Created: 2026-04-22 Parent: QuoteAI Project Design Doc
Purpose
Lightweight backlog of AI performance + cost tactical wins. Too small individually for epic docs; tracked here with sizing and status.
Target metric: per-quote cost. Currently ~$0.20-0.25. Aim: sub-$0.05 without quality regression.
Active Items
| # | Item | Est. effort | Impact | Status |
|---|---|---|---|---|
| 1 | Prompt caching — cache_control: ephemeral on system prompt in app/lib/cc/generate.ts | 1 day | ~60% cost cut per run ($0.13-0.17 saved) | 🔄 Next |
| 2 | Template boilerplate server-side — terms, capability pitch, exclusions emitted as <<TERMS>> placeholders, stitched in adapter | 1 day | ~4% cost, 15-25s wall time, removes drift | ⏳ Pending |
| 3 | Parallel tool calls — system prompt update: "call all per-item searches in one turn" | Half day | 20-40s wall time on multi-line quotes | ⏳ Pending |
| 4 | Pre-fetch at form submit — customer lookup + past-quote search kick off immediately | Half day | Hides latency behind user intent | ⏳ Pending |
| 5 | Smaller model (Haiku) for boilerplate stitch-in (if #2 doesn't cover) | 1 day | 3-5× faster + cheaper on those sections | ⏳ Pending |
Measurements
Each item should land with a before/after measurement:
- Cost per quote (input + output tokens × rates, adjusted for cache reads/writes)
- Wall time (form submit → draft rendered)
- Cache hit rate (for #1)
Log results inline in the item row once shipped.
Shipped
(empty)
Related Files
app/lib/cc/generate.ts— Agent SDK query options (touch for #1)app/lib/cc/adapter.ts— markdown → RenderMeta (touch for #2)app/components/streaming/StreamView.tsx— display cadence knobs (UX-adjacent, not cost)
Intentionally lightweight. Graduate items to their own docs if they grow beyond "tactical tweak."