QuoteAI — Performance & Cost Backlog

Status: 🟢 Active (rolling)
Authors: Dan Hannah & Clay
Created: 2026-04-22
Parent: QuoteAI Project Design Doc

Purpose

A lightweight backlog of tactical wins for AI performance and cost. Each item is too small to warrant its own epic doc, so they are tracked here with sizing and status.

Target metric: per-quote cost. Currently ~$0.20-0.25. Aim: sub-$0.05 without quality regression.

#	Item	Est. effort	Impact	Status
1	Prompt caching — `cache_control: ephemeral` on system prompt in `app/lib/cc/generate.ts`	1 day	~60% cost cut per run ($0.13-0.17 saved)	🔄 Next
2	Template boilerplate server-side: terms, capability pitch, and exclusions emitted as placeholders (e.g. `<<TERMS>>`) and stitched in the adapter	1 day	~4% lower cost, 15-25s less wall time, removes boilerplate drift	⏳ Pending
3	Parallel tool calls (system prompt update: "call all per-item searches in one turn")	Half day	20-40s less wall time on multi-line quotes	⏳ Pending
4	Pre-fetch at form submit — customer lookup + past-quote search kick off immediately	Half day	Hides latency behind user intent	⏳ Pending
5	Smaller model (Haiku) for boilerplate sections (if #2 doesn't cover them)	1 day	3-5× faster and cheaper on those sections	⏳ Pending
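For item #1, a minimal sketch of what marking the system prompt as cacheable looks like. It assumes `generate.ts` can pass a raw Messages API `system` array through to the model call; the Agent SDK query options may expose caching differently, so treat `buildCachedSystem` as a hypothetical helper, not the SDK's API. `TextBlock` is a local stand-in for the SDK's text block type so the snippet compiles on its own.

```typescript
// Hypothetical helper for backlog item #1 (prompt caching).
// Assumption: the call site accepts a Messages API-style `system` array.

// Local mirror of a Messages API text block, so this compiles without the SDK.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Wraps the static system prompt in a single text block carrying an
// ephemeral cache breakpoint. Everything up to and including this block
// becomes the cached prefix; per-quote input should live in `messages`
// so it does not change the prefix between runs.
function buildCachedSystem(systemPrompt: string): TextBlock[] {
  return [
    {
      type: "text",
      text: systemPrompt,
      cache_control: { type: "ephemeral" },
    },
  ];
}
```

With `@anthropic-ai/sdk`, the returned array would be passed as the `system` option to `client.messages.create`. The cached prefix must be identical from run to run, so anything per-quote (customer name, dates, line items) belongs in `messages`, not the system prompt. A prompt shorter than the model's minimum cacheable length is simply not cached, which would show up as a 0% hit rate in the #1 measurement.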

Each item should land with a before/after measurement:

Cost per quote (input + output tokens × rates, adjusted for cache reads/writes)
Wall time (form submit → draft rendered)
Cache hit rate (for #1)
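A sketch of the cost and cache-hit-rate measurements, assuming per-call usage objects shaped like the Anthropic Messages API response (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). A quote run can make several calls in the tool loop, so both functions sum over all of them. Rates are parameters rather than hard-coded: at the time of writing, Anthropic prices 5-minute cache writes at 1.25× and cache reads at 0.1× the base input rate, but check current pricing for the model in use. `costPerQuote`, `cacheHitRate`, and the `Rates` shape are hypothetical names.

```typescript
// Hypothetical metric helpers for the before/after measurements.
// Assumption: Usage mirrors the Anthropic Messages API usage fields.
interface Usage {
  input_tokens: number;                // uncached input tokens only
  output_tokens: number;
  cache_creation_input_tokens: number; // tokens written to the cache on this call
  cache_read_input_tokens: number;     // tokens served from the cache on this call
}

interface Rates {
  inputPerMTok: number;         // base input price, USD per million tokens
  outputPerMTok: number;        // output price, USD per million tokens
  cacheWriteMultiplier: number; // applied to the base input price for cache writes
  cacheReadMultiplier: number;  // applied to the base input price for cache reads
}

// Total USD for one quote run, summed over every API call it made.
function costPerQuote(calls: Usage[], r: Rates): number {
  let usd = 0;
  for (const u of calls) {
    usd +=
      (u.input_tokens * r.inputPerMTok +
        u.cache_creation_input_tokens * r.inputPerMTok * r.cacheWriteMultiplier +
        u.cache_read_input_tokens * r.inputPerMTok * r.cacheReadMultiplier +
        u.output_tokens * r.outputPerMTok) /
      1_000_000;
  }
  return usd;
}

// Fraction of all input tokens in the run that were served from the cache.
function cacheHitRate(calls: Usage[]): number {
  let read = 0;
  let total = 0;
  for (const u of calls) {
    read += u.cache_read_input_tokens;
    total += u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  }
  return total === 0 ? 0 : read / total;
}
```

Because `input_tokens` counts only uncached input, the three input buckets add up to the full prompt without double-counting.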

Log results inline in the item row once shipped.

Relevant files

app/lib/cc/generate.ts — Agent SDK query options (touch for #1)
app/lib/cc/adapter.ts — markdown → RenderMeta (touch for #2)
app/components/streaming/StreamView.tsx — display cadence knobs (UX-adjacent, not cost)
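For item #2, a minimal sketch of the stitch step, assuming the model emits placeholders in the `<<NAME>>` form and that substitution runs in `adapter.ts` before the markdown → RenderMeta conversion. `stitchBoilerplate` and any placeholder name other than `TERMS` are hypothetical.

```typescript
// Hypothetical stitch step for backlog item #2.
// Assumption: placeholders look like <<NAME>> with NAME in A-Z or underscore.
function stitchBoilerplate(
  markdown: string,
  blocks: Record<string, string>,
): { text: string; unknown: string[] } {
  const unknown: string[] = [];
  const text = markdown.replace(/<<([A-Z_]+)>>/g, (match: string, name: string) => {
    // Known placeholder: substitute the canonical server-side text.
    if (Object.prototype.hasOwnProperty.call(blocks, name)) return blocks[name];
    // Unknown placeholder: leave it visible and record it for logging.
    unknown.push(name);
    return match;
  });
  return { text, unknown };
}
```

A function replacer is used deliberately: its return value is inserted literally, so boilerplate containing `$` (prices, or sequences like `$&`) is not reinterpreted the way a replacement string would be. Unknown placeholders are kept and returned so they can be logged instead of silently vanishing from the draft.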

Intentionally lightweight. Graduate items to their own docs if they grow beyond "tactical tweak."