QuoteAI — Claude Chat Spike Retro

Branch: spike/claude-chat
Built: 2026-05-04 evening
Status: 🟢 Working end-to-end. Demo-readiness call below.

What we built

A right-side slide-out chat panel mounted in the root layout. Authentic Claude (no rebrand) with access to Brehob's quote corpus via the existing @quoteai/quotes + @quoteai/equipment MCP servers. Five stories, five commits, ~7 hrs total wall time.

Story	Output	Result
1	Scaffolding + popover shell	Toggle button (bottom-right), slide-out panel (420px wide), Escape closes
2	SSE chat endpoint + Agent SDK wrapper	`POST /api/chat` streaming events; Claude persona + hard guardrails in system prompt; 6 MCP tools available
3	Streaming UI integration	Multi-turn message state, tool indicators, thinking dots, error bubbles
4	Citation rendering	`[CITE:type:id]` token parsing, inline orange pills, hover popover
5	Smoke test	This doc

No new MCP tools, no DB changes, no schema changes. Per spike scope.

What worked

The corpus chat is genuinely useful

Test question: "Show me a few past quotes for centrifugal compressors. Be brief and cite specific quotes."

Response (excerpt):

"Here are some notable centrifugal compressor quotes from the history:

BASF Corporation — 250 HP FS-Elliott Polaris 300 Series [quote:137218…c22] — Full package with motor, Wye-Delta starter, Regulus R200 PLC controls. (Nov 2007)

Guardian Industries Corp — Two Polaris Series options quoted side-by-side [quote:d7a0a1…edf]: 250 HP P300-250 (1005 scfm) and 300 HP P300-300 (1200 scfm). (Mar 2011)

Marathon Oil — 800 HP centrifugal with steam turbine drive [line:275f33…f97] — one of the larger single-unit quotes in the corpus. (May 2007)

Slate Auto — Six 1000 HP Lone Star Turbo Chorus 90 centrifugals [line:88cb3e…b98] for a 20,000 CFM system at a new Warsaw, IN facility — the biggest centrifugal quote I'm seeing. (Mar 2026)"

Five citations, real customer names, real equipment specs, real dates spanning 19 years of corpus history. This is the platform thesis embodied — Claude with access to 22 years of institutional knowledge produces answers no general LLM could give.

Hard guardrails work

Test question: "What's our typical markup on Kaeser compressors?"

Response:

"I don't surface internal financial details like markup or margin — that's the salesperson's call and lives in the pricing card, not here. What I can do: if it's helpful context, I can pull past Kaeser quotes so you can see what we've sold and at what list/quoted equipment values."

Clean refusal + helpful redirect. Exactly what the system prompt specified. No hedging, no leak.

Authentic Claude branding lands

The chat introduces itself as Claude, references the corpus naturally ("the corpus doesn't have quotes explicitly tagged 'food-grade,' but the pattern is clear..."), and feels familiar to anyone who's used claude.ai. Andy / leadership don't need to learn a new chatbot persona — it's Claude with their data.

Citations as inline pills

[CITE:quote:UUID] tokens parse out of the streaming text in real time, render as orange pills with the type + truncated ID. Hover shows a popover with the full citation. Visually demonstrates "this answer is grounded in real corpus data" without requiring the user to click anything.
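
The token parsing described above can be sketched roughly as follows. This is a minimal illustration, not the code in ChatPanel.tsx: the `Segment` type, `parseCitations`, `pillLabel`, and the exact token grammar are assumptions, and the truncation shape (first six characters, ellipsis, last three) is inferred from the pills in the excerpt earlier in this doc. The one streaming-specific detail is holding back a half-arrived token such as `[CITE:quo` so it never flashes as raw text.

```typescript
// Hypothetical types; the names used in ChatPanel.tsx are not shown in this retro.
type Segment =
  | { kind: "text"; text: string }
  | { kind: "cite"; sourceType: string; id: string };

interface ParseResult {
  segments: Segment[];
  pending: string; // trailing partial token, held back until more text arrives
}

const TOKEN_PREFIX = "[CITE:";

function parseCitations(buffer: string): ParseResult {
  // Assumed grammar: lowercase type, then an id containing no whitespace or "]".
  const re = /\[CITE:([a-z_]+):([^\]\s]+)\]/g;
  const segments: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(buffer)) !== null) {
    if (m.index > last) segments.push({ kind: "text", text: buffer.slice(last, m.index) });
    segments.push({ kind: "cite", sourceType: m[1], id: m[2] });
    last = m.index + m[0].length;
  }

  // Anything after the last complete token may end in a partial token.
  let rest = buffer.slice(last);
  let pending = "";
  const open = rest.lastIndexOf("[");
  if (open !== -1) {
    const tail = rest.slice(open);
    const couldBeToken =
      tail.length < TOKEN_PREFIX.length
        ? TOKEN_PREFIX.startsWith(tail)
        : tail.startsWith(TOKEN_PREFIX);
    if (couldBeToken && !tail.includes("]")) {
      pending = tail;
      rest = rest.slice(0, open);
    }
  }
  if (rest.length > 0) segments.push({ kind: "text", text: rest });
  return { segments, pending };
}

// Pill label: type plus a shortened id (truncation shape is an assumption).
function pillLabel(sourceType: string, id: string): string {
  const shortId = id.length > 9 ? `${id.slice(0, 6)}…${id.slice(-3)}` : id;
  return `${sourceType}:${shortId}`;
}
```

A caller would either re-run `parseCitations` over the full accumulated text on every chunk, or render `segments` and prepend `pending` to the next chunk before parsing again.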

Streaming + tool indicators

Tool calls show as small dotted rows above the streaming bubble (● search_past_quotes (quotes) ● search_line_items (quotes)) — gives visible "Claude is doing something" feedback during the silent retrieval phase. Streaming text accumulates with a blinking caret. Three-dot thinking animation when there's no text yet.
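
Because the endpoint is a `POST`, the browser's `EventSource` cannot be used and the client has to split the response body into SSE frames itself. The sketch below shows one way to do that, assuming standard `event:` / `data:` framing with `\n` line endings. The function name `drainSse` and the event names in the usage note are hypothetical; the spike's actual event vocabulary is not recorded in this doc.

```typescript
interface SseEvent {
  event: string; // "message" when the frame has no event: line, per the SSE format
  data: string;
}

// Split a text buffer into complete SSE frames; return the unfinished tail as `rest`.
function drainSse(buffer: string): { events: SseEvent[]; rest: string } {
  const events: SseEvent[] = [];
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // last piece is incomplete until a blank line arrives
  for (const frame of frames) {
    let event = "message";
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice(6).trim();
      } else if (line.startsWith("data:")) {
        // The format strips a single leading space after the colon.
        data.push(line.slice(5).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

In the client, each decoded chunk from the response reader would be appended to a buffer, passed through `drainSse`, and the returned `rest` carried into the next read. An event such as a hypothetical `tool` type would drive the dotted indicator row, and a hypothetical `text` type would append to the streaming bubble.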

Same Sonnet model + cache wiring

Reuses the same Agent SDK config as generate.ts: Sonnet 4.6, prompt caching with SYSTEM_PROMPT_DYNAMIC_BOUNDARY, extended thinking disabled (per E3-D4, same UX-killer concern). First message (no retrieval): ~$0.06, ~11s. Multi-tool message: ~$0.10-0.20, ~15-25s.

What's missing (gaps for a proper feature epic)

Listed roughly in priority order:

High-priority (blocks demo polish)

Source preview in citation popover. The popover currently shows just the type, the ID, and a "spike: source preview not yet wired" disclaimer. The proper feature should fetch the source content (quote body excerpt, line-item description, product specs) and render it inline in the popover. That needs either a new endpoint or a way to consume the existing get_quote / get_product tool outputs. Either route can be wired up without violating the "no new MCP tools" constraint; it could even reuse the chat API by sending a structured request.
Markdown rendering in messages. The model emits markdown (**bold**, lists, etc.) which currently renders as raw text. Should pipe through a markdown renderer (the existing draft view uses one — reuse it). Substantial UX uplift; not hard.
Better auto-scroll. The current naive scrollTop = scrollHeight yanks the user back to the bottom even if they've scrolled up to read earlier content. Should follow the pattern already used in StreamView.tsx (auto-scroll only when the user is near the bottom).
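
The "only when near the bottom" idea can be sketched as below. This is a generic version of the technique, not a copy of StreamView.tsx (whose code is not reproduced here); `ScrollBox`, `isNearBottom`, `scrollIfPinned`, and the 48px threshold are all illustrative assumptions.

```typescript
// Structural subset of HTMLElement, so the helpers run without a DOM.
interface ScrollBox {
  scrollTop: number;
  scrollHeight: number;
  clientHeight: number;
}

// True when the viewport's bottom edge is within thresholdPx of the content's end.
function isNearBottom(el: ScrollBox, thresholdPx = 48): boolean {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= thresholdPx;
}

// Call with the value measured BEFORE new content was appended.
function scrollIfPinned(el: ScrollBox, wasNearBottom: boolean): void {
  if (wasNearBottom) el.scrollTop = el.scrollHeight;
}
```

The measurement has to happen before the new text is rendered: once the content grows, `scrollHeight` has already increased and a user who was pinned to the bottom would look scrolled-up. In React that usually means reading `isNearBottom` in the chunk handler (or a scroll listener) and applying `scrollIfPinned` in an effect after the state update.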

Medium-priority (matters at usage scale)

Conversation persistence. State lives in client React state — disappears on page reload, doesn't persist across sessions. Should persist conversations to Postgres (new table: chat_conversations + chat_messages) so the salesperson can come back to a thread.
Multi-turn context-window management. Currently sends the entire message history on every turn. Fine for short chats; will hit token limits at scale. Need a pruning / summarization strategy.
Cost monitoring. Cost accrues per query; with retrieval, each turn is $0.10-$0.20+. A power user could burn $5-10/day. Need usage tracking, and possibly model selection by question complexity (Haiku for "what year was X quoted?" type questions, Sonnet for analytical questions).
Mobile responsiveness. Panel is fixed 420px — will overflow on mobile. Should switch to full-screen modal on narrow viewports.
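
For the context-window item above, the simplest pruning strategy is a sliding window: keep the most recent messages that fit a token budget. The sketch below is one possible starting point, not a decided design. `ChatMessage`, `pruneHistory`, and the four-characters-per-token estimate are assumptions; a real implementation would use actual token counts and might summarize dropped turns instead of discarding them.

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rough heuristic only (about 4 characters per token); not a real tokenizer.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the newest messages that fit within budgetTokens, preserving order.
function pruneHistory(messages: ChatMessage[], budgetTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    // Always keep the latest message, even if it alone exceeds the budget.
    if (kept.length > 0 && used + cost > budgetTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  // Drop a leading assistant message so the window opens on a user turn.
  while (kept.length > 1 && kept[0].role === "assistant") kept.shift();
  return kept;
}
```

One trade-off to keep in mind: a window that slides on every turn changes the start of the message history each time, which works against prompt caching of that history. Pruning in larger steps (for example, only once the budget is exceeded by some margin) would keep the prefix stable for longer.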

Low-priority (nice-to-have)

Stop generation button. AbortController is already wired into the client; just needs a UI button to call it.
Conversation export. "Send this thread as an email" / "save to a draft" — natural extensions for the salesperson workflow.
Quick actions. Pre-canned prompts on the empty state ("What's been quoted to ABC Co?", "Show me last quarter's centrifugal quotes").
Tool result inspection. Power users might want to see what the corpus actually returned for each tool call — like a debug pane.
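
For the stop-generation item, the remaining work is mostly exposing the existing controller to a button and treating the resulting abort as a normal outcome rather than an error bubble. A minimal sketch, assuming the client calls `fetch` with a signal; `createStopHandle`, `isAbortError`, and `sendChat` are hypothetical names, and the request body shape is an assumption.

```typescript
interface StopHandle {
  signal: AbortSignal;
  stop: () => void;
}

// One handle per in-flight request; the Stop button's onClick calls stop().
function createStopHandle(): StopHandle {
  const controller = new AbortController();
  return { signal: controller.signal, stop: () => controller.abort() };
}

// An aborted fetch rejects with an error named "AbortError".
function isAbortError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { name?: unknown }).name === "AbortError";
}

// Hypothetical request helper; the body shape is illustrative only.
async function sendChat(messages: unknown[], handle: StopHandle): Promise<Response | null> {
  try {
    return await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages }),
      signal: handle.signal,
    });
  } catch (err) {
    if (isAbortError(err)) return null; // user pressed Stop; keep the partial text, show no error
    throw err;
  }
}
```

Keeping whatever text has streamed so far when the user stops, instead of clearing the bubble, is probably the least surprising behavior for the salesperson.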

Validation gaps (didn't test in spike)

Sizing guardrail ("What size compressor does this customer need?") — not tested; high confidence it works based on the markup-question result, but should verify before demo.
Commission guardrail ("What's John's commission on this?") — not tested; same.
Customer-specific question ("What did we sell to Slate Trucks?") — not tested. The retrieval found Slate Auto in centrifugal quotes; real customer-grain retrieval might need quote-document grain (currently we have line-item-grain embeddings).
Hallucination edge case. What happens when retrieval returns NOTHING? System prompt says to acknowledge it explicitly; should verify by asking about a customer not in the corpus.
Long conversations. Tested 2-3 turns; 10+ turn conversations might surface context-window issues.

Demo-readiness call: 🟢 GO for Andy meeting

Recommendation: show it tomorrow. Reasons:

The corpus chat is the platform thesis embodied. Andy is co-author on the leadership deck — seeing this work makes Slide 16 ("The Platform Extends") concrete. He'll get the thesis viscerally instead of abstractly.
Authentic Claude branding lands. It's Claude with Brehob's data — the simplest, most defensible framing of what we're building. No "is this just GPT?" question to deflect.
Guardrails work. The pricing-refusal we tested is the highest-risk demo moment — and it lands cleanly. Sizing/commission likely work the same way (same system prompt mechanism); worth a quick test before showing.
It's clearly a spike. The "spike: source preview not yet wired" text in the citation popover signals "this is exploratory" — Andy won't expect production polish.

Risks to flag during demo:

Hot-reload during dev wipes conversation state — won't matter on a stable build, just a dev-environment quirk
Citations are visible but not yet click-through to source content — frame this as "next iteration"
Long conversations untested — keep demo to 2-3 turns max

One quick action before showing Andy: test the sizing + commission guardrails ("What size compressor does this customer need?" and "What's John's commission on this quote?") so we know they refuse cleanly. ~2 minutes of testing.

Demo-readiness call for Indianapolis (May 12): 🟡 conditional

For the leadership pitch a week out, this needs:

Markdown rendering in messages (visual polish)
At least basic source preview in citation popover (proves the citations are real, not just decoration)
Better auto-scroll (current naive version yanks the user during reads)
Validated coverage on 5-8 leadership-likely questions (not just the 2-3 we tested)
Probably conversation persistence so leadership can poke at the Vercel preview after the meeting

Effort to get there: ~1 day if we don't gold-plate. Half-day if we skip persistence and conversation polish, ship just the visual + citation upgrades.

What this validates strategically

This spike validates two things load-bearing for the leadership pitch:

The corpus IS the moat. Generic LLMs can't answer "Show me Brehob's centrifugal compressor quotes from Q1" — Claude + 22 years of indexed quote history can. Per the competitive landscape doc, this is exactly the differentiation vs Canals/Distro/Endeavor — they parse RFQs; we ingest the corpus and serve it conversationally.
The platform extension thesis is real, not hand-waving. Slide 16 ("The Platform Extends — Year 2 and Beyond") promises chat as the first extension after quoting. That's no longer aspirational — it's working code on a branch, driven by the same MCP servers + same Sonnet model that powers quote drafting. Same data layer, new use case, no new integrations required — that's the slide language, and the spike makes it true.

Files added in the spike

app/components/chat/ChatPanel.tsx — full UI (popover, messages, input, streaming, citations)
app/lib/cc/chat.ts — Agent SDK wrapper for chat mode (multi-turn, no preamble trim)
app/app/api/chat/route.ts — POST endpoint with SSE streaming
app/prompts/chat-system.md — Claude persona + hard guardrails + citation format spec
One-line edit to app/app/layout.tsx to mount the panel

Total: 5 commits, ~970 lines net. Branch ready for Dan's review and either merge-as-spike-foundation or archive-for-learnings.
