
QuoteAI — Performance & Cost Backlog

Status: 🟢 Active (rolling)
Authors: Dan Hannah & Clay
Created: 2026-04-22
Parent: QuoteAI Project Design Doc


Purpose

Lightweight backlog of tactical AI performance and cost wins. Each is too small for its own epic doc, so they are tracked here with sizing and status.

Target metric: per-quote cost. Currently ~$0.20-0.25. Aim: sub-$0.05 without quality regression.

Active Items

# | Item | Est. effort | Impact | Status
1 | Prompt caching (cache_control: ephemeral on the system prompt in app/lib/cc/generate.ts) | 1 day | ~60% cost cut per run ($0.13-0.17 saved) | 🔄 Next
2 | Template boilerplate server-side: terms, capability pitch, and exclusions emitted as placeholders such as <<TERMS>> and stitched in by the adapter | 1 day | ~4% cost cut, 15-25s less wall time, removes boilerplate drift | ⏳ Pending
3 | Parallel tool calls: system prompt update ("call all per-item searches in one turn") | Half day | 20-40s less wall time on multi-line quotes | ⏳ Pending
4 | Pre-fetch at form submit: customer lookup and past-quote search kick off immediately | Half day | Hides lookup latency behind user intent | ⏳ Pending
5 | Smaller model (Haiku) for boilerplate stitch-in, if #2 doesn't cover it | 1 day | 3-5× faster and cheaper on those sections | ⏳ Pending
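Item #1 is the only row with a concrete API hook. Below is a minimal sketch of where the cache breakpoint goes. It is written against the raw Messages API request shape rather than the Agent SDK query options in app/lib/cc/generate.ts, and how those options map onto this shape is an assumption to verify. The local types, `buildQuoteRequest`, the `max_tokens` value, and the placeholder strings are all hypothetical.

```typescript
// Hypothetical local types covering only the request fields used here; they
// mirror the Messages API JSON shape but are not the SDK's own type definitions.
interface CacheControl {
  type: "ephemeral";
}

interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface UserMessage {
  role: "user";
  content: string;
}

interface QuoteRequest {
  model: string;
  max_tokens: number;
  system: SystemTextBlock[];
  messages: UserMessage[];
}

// Marks the static system prompt as a cache breakpoint. The cached prefix is
// everything up to and including this block, so per-quote details go in the
// user message where they don't change the prefix.
function buildQuoteRequest(
  model: string,
  staticSystemPrompt: string,
  perQuoteInput: string,
): QuoteRequest {
  return {
    model,
    max_tokens: 4096, // hypothetical output budget
    system: [
      {
        type: "text",
        text: staticSystemPrompt,
        cache_control: { type: "ephemeral" },
      },
    ],
    messages: [{ role: "user", content: perQuoteInput }],
  };
}

// Usage: passing the identical system prompt string on every call keeps the
// prefix byte-for-byte the same, which is what lets later calls hit the cache.
const exampleRequest = buildQuoteRequest(
  "MODEL_ID", // placeholder, not a real model name
  "<static system prompt text>",
  "<per-quote form input>",
);
console.log(JSON.stringify(exampleRequest.system[0].cache_control));
```

The design constraint is that anything interpolated into the system prompt per quote (customer name, date) changes the prefix and misses the cache. The ephemeral cache also has a short default lifetime (about five minutes, refreshed on each hit). That suits an agentic run, where every tool-use turn resends the same system prompt. Prompts below the model's minimum cacheable length are not cached at all.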
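Item #4 amounts to two steps: start the lookups when the form is submitted, and await them only when generation needs the context. The sketch below assumes both lookups are plain async functions. `startPrefetch`, `buildContext`, the `Customer`/`PastQuote` shapes, and the context format are hypothetical, since the backlog doesn't say how generate.ts receives this data.

```typescript
// Hypothetical shapes for the two lookups named in item #4.
interface Customer {
  id: string;
  name: string;
}

interface PastQuote {
  id: string;
  summary: string;
}

interface Prefetched {
  customer: Promise<Customer | null>;
  pastQuotes: Promise<PastQuote[]>;
}

// Call at form submit: both lookups start now, concurrently, and the pending
// promises are handed to the generation step instead of being awaited here.
function startPrefetch(
  customerName: string,
  lookupCustomer: (name: string) => Promise<Customer | null>,
  searchPastQuotes: (name: string) => Promise<PastQuote[]>,
): Prefetched {
  const customer = lookupCustomer(customerName);
  const pastQuotes = searchPastQuotes(customerName);
  // No-op handlers stop an early failure from being reported as an unhandled
  // rejection while the promise waits; awaiting the original later still throws.
  void customer.catch(() => undefined);
  void pastQuotes.catch(() => undefined);
  return { customer, pastQuotes };
}

// Called when generation needs context. If the lookups have already settled,
// this resolves without any further waiting on the network.
async function buildContext(prefetched: Prefetched): Promise<string> {
  const [customer, pastQuotes] = await Promise.all([
    prefetched.customer,
    prefetched.pastQuotes,
  ]);
  const header = customer ? `Customer: ${customer.name}` : "Customer: (not found)";
  const history = pastQuotes.map((q) => `- ${q.summary}`).join("\n");
  return `${header}\nPast quotes:\n${history}`;
}
```

Returning promises rather than results is the point of the design. Whatever runs between submit and `buildContext` (for example, the model's first turn) overlaps with the lookups instead of waiting behind them.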

Measurements

Each item should land with a before/after measurement:

  • Cost per quote (input + output tokens × rates, adjusted for cache reads/writes)
  • Wall time (form submit → draft rendered)
  • Cache hit rate (for #1)

Log results inline in the item row once shipped.
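The cost and cache-hit measurements above can be computed from the token counts the API reports. This is a minimal sketch. It assumes the field names of the Anthropic Messages API `usage` object, where `input_tokens` excludes cached tokens. It also assumes the published cache pricing at the time of writing: cache writes at 1.25× the base input rate and cache reads at 0.1×. `Rates`, `costPerQuote`, and `cacheHitRate` are hypothetical helpers, and the per-model prices must come from the current price sheet.

```typescript
// Token counts for one quote, summed across every model call in the run.
// Field names follow the Messages API `usage` object.
interface Usage {
  input_tokens: number; // uncached input tokens
  output_tokens: number;
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number; // tokens served from the cache
}

// USD per million tokens for the model in use.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Assumed multipliers on the base input rate (5-minute ephemeral cache);
// verify against current pricing before relying on the numbers.
const CACHE_WRITE_MULTIPLIER = 1.25;
const CACHE_READ_MULTIPLIER = 0.1;

// Cost in USD: input + output tokens × rates, adjusted for cache reads/writes.
function costPerQuote(u: Usage, r: Rates): number {
  const inputRate = r.inputPerMTok / 1_000_000;
  const outputRate = r.outputPerMTok / 1_000_000;
  return (
    u.input_tokens * inputRate +
    u.cache_creation_input_tokens * inputRate * CACHE_WRITE_MULTIPLIER +
    u.cache_read_input_tokens * inputRate * CACHE_READ_MULTIPLIER +
    u.output_tokens * outputRate
  );
}

// Share of all input tokens that were served from the cache (0 if no input).
function cacheHitRate(u: Usage): number {
  const totalInput =
    u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return totalInput === 0 ? 0 : u.cache_read_input_tokens / totalInput;
}
```

As a worked example with illustrative rates (not a measurement): at $3/$15 per million input/output tokens, a run with 1,000 uncached input, 10,000 cache-read, and 2,000 output tokens costs $0.036, with a cache hit rate of about 91%.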

Shipped

(empty)

Relevant files

  • app/lib/cc/generate.ts: Agent SDK query options (touched by #1)
  • app/lib/cc/adapter.ts: markdown → RenderMeta (touched by #2)
  • app/components/streaming/StreamView.tsx: display cadence knobs (UX-adjacent, not cost)
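Item #2 lands in app/lib/cc/adapter.ts. Below is a minimal sketch of the stitch step. It assumes the model emits bare `<<NAME>>` tokens and that the canonical text lives in a server-side map, applied to the markdown before the RenderMeta conversion. Only `<<TERMS>>` comes from the backlog. The other placeholder names, the map contents, `stitchBoilerplate`, and `unknownPlaceholders` are hypothetical.

```typescript
// Hypothetical server-side boilerplate, keyed by placeholder name. Only TERMS
// comes from the backlog; the other keys and all values are placeholders.
const BOILERPLATE: Record<string, string> = {
  TERMS: "[canonical terms text]",
  CAPABILITIES: "[canonical capability pitch]",
  EXCLUSIONS: "[canonical exclusions text]",
};

// Matches tokens like <<TERMS>>; capture group 1 is the placeholder name.
const PLACEHOLDER = /<<([A-Z0-9_]+)>>/g;

// Replaces each known placeholder in the model's markdown with its canonical
// text. Unknown placeholders are left in place so they stay visible in review.
function stitchBoilerplate(
  markdown: string,
  blocks: Record<string, string> = BOILERPLATE,
): string {
  // A replacer function inserts the text literally; a replacement *string*
  // would treat sequences like "$&" or "$1" in the terms text specially.
  return markdown.replace(PLACEHOLDER, (match: string, name: string) => {
    const text = Object.prototype.hasOwnProperty.call(blocks, name)
      ? blocks[name]
      : undefined;
    return text ?? match;
  });
}

// Lists placeholder names the model emitted that have no server-side text,
// so they can be logged instead of reaching the customer as raw <<NAME>>.
function unknownPlaceholders(
  markdown: string,
  blocks: Record<string, string> = BOILERPLATE,
): string[] {
  const re = new RegExp(PLACEHOLDER.source, "g");
  const names: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(markdown)) !== null) {
    const name = m[1];
    if (!Object.prototype.hasOwnProperty.call(blocks, name) && names.indexOf(name) === -1) {
      names.push(name);
    }
  }
  return names;
}
```

The replacer function matters because terms text that quotes prices can contain `$`. Passed as a plain replacement string, that text could be silently mangled by `String.prototype.replace`.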

Intentionally lightweight. Graduate items to their own docs if they grow beyond "tactical tweak."
