E2: MCP Servers — Epic Design Doc
Status: 🔄 In Refinement (Step 0) · Authors: Dan Hannah & Clay · Created: 2026-04-18 · Parent: QuoteAI Project Design Doc
Overview
Goals & Non-Goals
Goals:
- Build two MCP servers: `@quoteai/equipment` and `@quoteai/quotes`
- Expose tools that let Claude Code (demo) or a future Anthropic-API-backed app (Full MVP) retrieve the exact context needed to assemble a draft quote
- Wire the servers into the Claude Code config so the demo experience works end-to-end
- Ship a `templates/brehob-quote.md` reference artifact extracted from the 4M Industries doc, so CC has the target template
Non-Goals:
- No pricing tools (pricing is the salesperson's job, per John)
- No write tools (no creating/updating quotes via MCP — CC or the app does that directly)
- No inventory/CRM tools (post-MVP)
- No auth on MCP servers (stdio transport only for Demo)
- No SSE/hosted transport (post-MVP)
Problem Statement
The MCP servers are the bridge between the ingested data (E1) and the generation layer (Claude Code in Demo, app in Full MVP). Without them, CC has no way to query the description library and will hallucinate equipment language instead of using John's proven phrasing.
Every MCP tool either makes the demo better (better retrieval = better draft) or adds friction. Ship the minimum set that serves the user flow in the main design doc; nothing else.
What Is This Epic?
Two independent MCP servers (stdio transport, TypeScript, @modelcontextprotocol/sdk) exposing the tools needed for quote assembly:
- `@quoteai/equipment` — product catalog search + lookup
- `@quoteai/quotes` — past-quote semantic search + description retrieval
Both servers share a Postgres connection to the pgvector DB populated by E1. CC is configured to load both during the demo.
Context
Dependents
- Claude Code demo experience — CC calls these tools every time it assembles a draft
- E3 (UI) — the UI triggers a flow that ends with CC assembling a draft via these tools
- Future Full MVP app — the in-app Anthropic API call uses the same tools
Dependencies
- E0 (Foundation) — DB connection layer, env conventions
- E1 (Ingestion + Vector DB) — hard dependency; MCP tools are useless without data
Current State
No MCP servers exist. The design doc sketches the tool surface; E2 builds it. Anvil's MCP server design (projects/anvil/epics/mcp-tools.md) is a useful reference pattern, but tools are QuoteAI-specific (not reused).
Affected Systems
| System / Layer | How It's Affected |
|---|---|
| `mcp-servers/equipment/` | Fully built — TypeScript package with MCP SDK |
| `mcp-servers/quotes/` | Fully built — TypeScript package with MCP SDK |
| Claude Code config | Updated to load both servers during demo |
| Postgres | Read-only queries from both servers |
| `templates/brehob-quote.md` | New reference artifact — the 4M template codified |
Design
Tool Surface
@quoteai/equipment
| Tool | Params | Returns | Purpose |
|---|---|---|---|
| `search_equipment` | query: string, cfm_min?: number, cfm_max?: number, psi_min?: number, psi_max?: number, hp_min?: number, hp_max?: number, top_k?: number | Array of { product, score, snippet } | Semantic + structured-filter search over product catalog. All numeric filters are min/max pairs — for exact-HP lookup CC passes hp_min=100, hp_max=100. |
| `get_product` | model: string | Full product row (all spec fields) | Exact model lookup |
| `get_specs` | models: string[] | Array of product rows, one per model | Side-by-side comparison for multi-option quotes |
@quoteai/quotes
| Tool | Params | Returns | Purpose |
|---|---|---|---|
| `search_past_quotes` | query: string, top_k?: number | Array of { quote, score, summary } | Find quotes similar to an overall project description |
| `get_quote` | quote_id: string | Full past_quotes row + line items | Full context for a specific past quote |
| `search_line_items` | query: string, top_k?: number | Array of { line_item, score, source_quote } | The description library query — finds proven language at line-item granularity. Vector-only for MVP (see note). |
Note on line-item filters. quote_line_items has no structured spec columns today — just description, quantity, prices, product_id FK, embedding, markdown (see db/migrations/001_init.sql). Haiku extracts HP/CFM/PSI as part of the description text, not as structured fields, and product_id isn't populated by the loader, so neither direct nor JOIN-based filtering works without schema changes + re-ingest. MVP accepts vector-only retrieval here — the verbatim description already contains literals like "100HP" / "oilless" / "food-grade" which the embedding picks up.
Post-demo follow-up — if vector-only underperforms on real queries, the cheapest structured filter to add is category?: string (e.g., "ROTARY SCREW AIR COMPRESSOR"). The extractor already produces this per line item via LineItemSchema.category; it just isn't stored. One ALTER + backfill (or re-ingest) adds it.
get_descriptions considered and dropped — it was a thin wrapper over search_line_items with a different input shape. CC can construct the equivalent query string directly; two tools doing similar things raises the odds of CC picking the wrong one.
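For concreteness, a minimal sketch of how the critical tool might be registered. The `McpServer` / `tool()` entry points follow the TypeScript SDK's documented shape, but the `searchLineItems` helper and the result formatting are assumptions, not the shipped implementation:

```typescript
// Illustrative only; helper names and result shaping are assumptions.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { searchLineItems } from "./search.js"; // hypothetical pgvector query helper

const server = new McpServer({ name: "quoteai-quotes", version: "0.1.0" });

server.tool(
  "search_line_items",
  "Find proven Brehob quote language at line-item granularity. Vector-only for MVP.",
  {
    query: z.string().describe("Free-text description of the line item needed"),
    top_k: z.number().int().positive().optional().describe("Max results (default 5)"),
  },
  async ({ query, top_k }) => {
    const hits = await searchLineItems(query, top_k ?? 5);
    // Empty results come back as an empty array, not an error (see Edge Cases).
    return { content: [{ type: "text" as const, text: JSON.stringify(hits, null, 2) }] };
  }
);

await server.connect(new StdioServerTransport());
```

The tool's description string is worth iterating on, since it is the main lever for nudging CC toward actually calling the tool (see Risks).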
Hybrid Search Logic (Critical)
Pure vector search misses filter-style constraints ("100HP", "food-grade"). Hybrid approach applies to search_equipment only — search_line_items is vector-only for MVP (see Tool Surface note).
When any structured filter is provided — filter-then-rank:
- Generate query embedding via OpenAI
- `SELECT ... WHERE <structured filters> ORDER BY embedding <=> $query_embedding LIMIT $top_k`
- Return top_k
Filter-first ensures hard constraints like "exactly 100HP food-grade" are enforced. The vector step ranks within the eligible set, so a qualifying row can't be missed because it fell outside the N nearest neighbors by embedding distance. This is the opposite of rank-then-filter — which was the earlier proposal but only works when filters are soft signals, not hard requirements.
When no structured filters — rank-only:
- Generate query embedding via OpenAI
- `SELECT ... ORDER BY embedding <=> $query_embedding LIMIT $top_k`
- Return top_k
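A minimal sketch of the filter-then-rank construction, assuming a products table with hp/cfm/psi columns and a pgvector embedding column; the column names and the `embed()` helper are assumptions, not the shipped code:

```typescript
import { Pool } from "pg";
import { embed } from "./embeddings.js"; // hypothetical wrapper around the OpenAI embeddings API

interface EquipmentFilters {
  hp_min?: number; hp_max?: number;
  cfm_min?: number; cfm_max?: number;
  psi_min?: number; psi_max?: number;
}

// Filter-then-rank: structured predicates prune first, vector distance ranks the survivors.
export async function searchEquipment(
  pool: Pool,
  query: string,
  filters: EquipmentFilters,
  topK = 5,
) {
  const queryEmbedding = await embed(query); // number[]
  const params: unknown[] = [JSON.stringify(queryEmbedding), topK];
  const where: string[] = [];

  const addFilter = (column: string, op: string, value?: number) => {
    if (value === undefined) return;
    params.push(value);
    where.push(`${column} ${op} $${params.length}`);
  };
  addFilter("hp", ">=", filters.hp_min);
  addFilter("hp", "<=", filters.hp_max);
  addFilter("cfm", ">=", filters.cfm_min);
  addFilter("cfm", "<=", filters.cfm_max);
  addFilter("psi", ">=", filters.psi_min);
  addFilter("psi", "<=", filters.psi_max);

  const sql = `
    SELECT *, 1 - (embedding <=> $1::vector) AS score
    FROM products
    ${where.length ? `WHERE ${where.join(" AND ")}` : ""}
    ORDER BY embedding <=> $1::vector
    LIMIT $2`;
  const { rows } = await pool.query(sql, params);
  return rows;
}
```

When no filters are provided the WHERE clause is simply omitted, which is the rank-only path above.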
If hybrid underperforms (golden test fails): add BM25 full-text search via Postgres tsvector and merge scores with RRF (reciprocal rank fusion). Deferred until we see where the gaps actually are.
Template Reference
The 4M Industries template is compressor-specific (CFM/PSI/cooling). Brehob also quotes vacuums, dryers, blowers — each has a different natural spec set. Rather than one monolithic template, split into an outer skeleton plus per-category partials.
Outer skeleton — templates/brehob-quote.md:
- Header (date, proposal number, company info, attention, salutation)
- Intro paragraph ("Brehob Corporation is pleased to have the opportunity…")
- Per-line-item placeholder — CC inserts one characteristics block per line item, selected by that line's extracted category
- Installation (conditional, when present)
- Exclusions, Totals, Terms (Net 30, 30-day validity, FOB Factory)
- Capability pitch, signature, footer (five Brehob offices)
Per-category characteristics partials — templates/characteristics/<category>.md:
- `compressor.md` — Manufacturer, Series/Model, Cooling, Pressure (PSI), Capacity (CFM), Electric Motor (HP/RPM/ODP/SF), Voltage, Drive System, Dimensions, Weight
- Additional categories (vacuum, dryer, blower, etc.) ship only when a category appears in the ingested subset. The `LineItemSchema.category` field in `ingestion/extractor/schemas.ts` drives the lookup — same taxonomy selects the partial AND (post-demo) the optional category filter on `search_line_items`.
Demo scope — compressor only unless the ingested subset already contains another category that we want in the demo flow. Spot-check the DB during S7 (SELECT DISTINCT category FROM quote_line_items … once the markdown is extracted, or eyeball the line-item markdown) to decide whether to ship a second partial (Henry Ford is noted as "vacuum" in the session handoff — confirm and either add vacuum.md or leave Henry Ford out of the demo golden path).
Compressor characteristics block (first cut — extract verbatim shape from 4M reference during S7):
[CATEGORY IN CAPS]
CHARACTERISTICS
Manufacturer: [value]
Series / Model: [value]
Cooling: [Air/Water]
Pressure: [PSI]
Capacity CFM: [value]
Electric Motor: [HP, RPM, ODP, SF]
Voltage: [value]
Drive System: [value]
Dimensions: [L x W x H]
Weight: [lbs]
[Warranty block — manufacturer-specific]
Model [X] as described above Net $[PRICE] Each
Delivery: [estimated timeframe]
CC reads the outer skeleton plus the matching characteristics partial(s) when assembling a draft. This is the "house template" contract.
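Purely as an illustration of the taxonomy-driven lookup (not part of either MCP server; CC itself reads these files), the partial selection amounts to a category-to-filename mapping. The slug mapping below is hypothetical:

```typescript
// Hypothetical sketch: map a LineItemSchema.category value to its characteristics partial.
import path from "node:path";

const TEMPLATE_ROOT = "templates";

function partialForCategory(category: string): string {
  // e.g. "ROTARY SCREW AIR COMPRESSOR" -> compressor.md (mapping is illustrative)
  const slug = /COMPRESSOR/i.test(category)
    ? "compressor"
    : category.toLowerCase().replace(/\s+/g, "-");
  return path.join(TEMPLATE_ROOT, "characteristics", `${slug}.md`);
}

// partialForCategory("ROTARY SCREW AIR COMPRESSOR") === "templates/characteristics/compressor.md"
```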
Data Model Changes
None — all queries are read-only against E1's data.
API / Interface Changes
Each MCP server:
- Exports via `stdio` transport (simplest; CC spawns as child process)
- Uses `@modelcontextprotocol/sdk` with standard tool registration
- Loads `DATABASE_URL` and `OPENAI_API_KEY` from `app/.env.local` at startup — same source of truth as the ingestion CLI and Next.js app. Avoids re-specifying secrets in the CC config and avoids depending on CC's shell env.
- Ships as an npm workspace package under `mcp-servers/*` (local dev only — not published)
Env-loading pattern matches the ingestion CLI (which already solved this): use dotenv preload with DOTENV_CONFIG_PATH pointing at app/.env.local. CC config references the compiled entrypoint plus the dotenv preloader; the path to .env.local is absolute (CC's mcp.json does not interpolate workspace variables).
CC config example (replace <repo-root> with the absolute path to ~/Documents/Code/quoteai):
{
"mcpServers": {
"quoteai-equipment": {
"command": "node",
"args": ["-r", "dotenv/config", "<repo-root>/mcp-servers/equipment/dist/index.js"],
"env": { "DOTENV_CONFIG_PATH": "<repo-root>/app/.env.local" }
},
"quoteai-quotes": {
"command": "node",
"args": ["-r", "dotenv/config", "<repo-root>/mcp-servers/quotes/dist/index.js"],
"env": { "DOTENV_CONFIG_PATH": "<repo-root>/app/.env.local" }
}
}
}
S8 ships a short README snippet walking through the absolute-path substitution so first-time demo setup isn't a scavenger hunt.
Edge Cases & Gotchas
| Scenario | Expected Behavior | Why It's Tricky |
|---|---|---|
| Query with no matches | Return empty array, not error | CC should be able to distinguish "no results" from "tool broken" |
| Query exceeds top_k available rows | Return whatever exists; don't pad | Small DB (curated subset) will hit this often |
| Concurrent queries from CC | Both servers must handle via connection pool | CC may fan out queries in parallel |
| Embedding API outage | search_* tools return error with clear message | CC should degrade gracefully — fall back to keyword match if possible (post-MVP) |
| Model field mismatch in get_product | Return 404-style error, not empty result | CC needs to know the model doesn't exist vs. server issue |
| Very long description blocks (> 2k tokens) | Return as-is; let CC decide how to truncate | Some 4M-style quotes have dense installation blocks |
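To make the "error vs. empty result" distinction concrete, a hedged sketch of how get_product could signal "model not found" as an explicit tool error rather than an empty payload. The isError flag is the standard MCP tool-result field; `getProductByModel` is a hypothetical helper:

```typescript
// Sketch only; getProductByModel is an assumed exact-match lookup helper.
import { getProductByModel } from "./db.js";

export async function getProductTool({ model }: { model: string }) {
  const product = await getProductByModel(model);
  if (!product) {
    // Distinguish "model doesn't exist" from "server broke": an explicit error result,
    // not an empty payload and not a thrown exception.
    return {
      isError: true,
      content: [{ type: "text" as const, text: `No product found for model "${model}"` }],
    };
  }
  return { content: [{ type: "text" as const, text: JSON.stringify(product, null, 2) }] };
}
```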
Testing Strategy
Test Layers
| Layer | Applies? | Notes |
|---|---|---|
| Unit tests | Yes | Query builders, filter composition, error handling |
| Integration tests | Yes | Spin up a test Postgres with fixture data; call each tool; verify shape + content |
| Golden retrieval test (from E1) | Yes | Same golden test — does search_line_items return expected items for the golden query? |
| E2E with CC | Yes | Configure CC to load both servers; run a demo quote flow manually; does it assemble correctly? |
Required Fixtures
| Fixture Name | What It Tests | Priority |
|---|---|---|
| `fixtures/mcp-golden-query.test.ts` | search_line_items("100HP oilless food grade") returns Groeb/4M/Powerex in top 5 | 🔴 High |
| `fixtures/mcp-get-product.test.ts` | get_product("QMB30") returns complete spec | 🔴 High |
| `fixtures/mcp-empty-results.test.ts` | Nonsense query returns [], not error | 🟡 Medium |
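A sketch of what the golden-query fixture might look like, assuming vitest and a Postgres seeded with the curated subset; the import path, `searchLineItems` signature, and hit shape are assumptions:

```typescript
import { describe, expect, it } from "vitest";
import { searchLineItems } from "../mcp-servers/quotes/src/search.js"; // hypothetical path

describe("golden retrieval", () => {
  it("surfaces the known-good line items for the golden query", async () => {
    // Assumes DATABASE_URL points at the seeded test database.
    const hits = await searchLineItems("100HP oilless food grade", 5);

    expect(hits.length).toBeGreaterThan(0);
    expect(hits.length).toBeLessThanOrEqual(5);

    // Golden expectation from the fixture table: Groeb, 4M, and Powerex appear in the top 5.
    const sources = hits.map((h) => h.source_quote.customer ?? "");
    for (const name of [/groeb/i, /4m/i, /powerex/i]) {
      expect(sources.some((s) => name.test(s))).toBe(true);
    }
  });
});
```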
Verification Rules
- Every tool has at least one integration test against seeded DB.
- Golden retrieval test runs green before demo.
- Manual E2E walkthrough before showing anyone — open CC, fill mock form, watch it call tools, inspect output against the 4M template.
Stories
Stories are split into two phases. Phase A is the vertical slice — the minimum set that answers "can CC assemble a good draft from these tools?" If Phase A fails, Phase B is noise; we'd re-scope. If Phase A passes, Phase B fills in breadth.
Phase A — Vertical slice (demo-path minimum):
| Story | Summary | Status | PR |
|---|---|---|---|
| S3 | @quoteai/quotes scaffold — MCP SDK, stdio transport, DB client, app/.env.local loader | — | — |
| S5 | @quoteai/quotes search_line_items — the critical one, vector-only | — | — |
| S7 | templates/brehob-quote.md outer skeleton + templates/characteristics/compressor.md partial (extracted from 4M reference) | — | — |
| S8 | CC config wiring (absolute paths + DOTENV_CONFIG_PATH) + manual E2E demo walkthrough against golden scenario ("100HP oilless compressor for food-grade plant") | — | — |
Phase A explicitly skips the equipment server. If the spike shows CC needs product-spec lookups to fill template fields cleanly, we promote S0 + S2 out of Phase B before finishing Phase B's other work.
Phase B — Breadth (fills in after Phase A demonstrates the loop):
| Story | Summary | Status | PR |
|---|---|---|---|
| S0 | @quoteai/equipment scaffold — MCP SDK, stdio transport, shared DB client / env loader pattern from S3 | — | — |
| S1 | @quoteai/equipment search_equipment implementation + filter-then-rank hybrid | — | — |
| S2 | @quoteai/equipment get_product + get_specs | — | — |
| S4 | @quoteai/quotes search_past_quotes | — | — |
| S6 | @quoteai/quotes get_quote | — | — |
Known Issues / Tech Debt
| Issue | Severity | Notes |
|---|---|---|
| No rate limiting on server-side | 🟢 Low | stdio = single client (CC) — not a risk |
| No caching on query embeddings | 🟡 Medium | Same query gets re-embedded every call. Add LRU cache in Full MVP. |
| No observability (metrics, tracing) | 🟡 Medium | Add once we're iterating on retrieval quality and want to see what CC is actually asking |
| Hybrid search ranking is naive | 🟡 Medium | May need RRF or learned-to-rank. Evaluate post-demo. |
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Retrieval quality insufficient for good drafts | Medium | 🔴 High | Golden test is the gate. Iterate on query construction + hybrid filters until it passes. |
| CC doesn't use the tools well (hallucinates instead) | Medium | 🔴 High | Test the demo flow against the golden scenario manually before showing John. Adjust tool descriptions (the description field in MCP tool registration) to nudge CC toward calling them. |
| MCP SDK version incompatibility with CC | Low | Medium | Pin @modelcontextprotocol/sdk version; test locally before demo |
| Postgres connection exhaustion under concurrent CC calls | Low | Low | Use pg-pool with a small cap; stdio = one client anyway |
| Embedding-based filtering is too loose (returns unrelated items) | Medium | 🟡 Medium | Structured filters + a similarity threshold (score > 0.7) to drop weak matches |
Decisions Log
| Date | Decision | Rationale | Alternatives Considered |
|---|---|---|---|
| 2026-04-18 | Two independent servers (equipment + quotes) | Matches design doc; separation of concerns; smaller tool surfaces per server | One server with all tools (rejected: scope creep, messier tool list) |
| 2026-04-18 | stdio transport for Demo | Simplest; CC spawns as child process | SSE (rejected: premature for local demo) |
| 2026-04-18 | Hybrid search (vector + structured filter) as default | Pure vector misses hard constraints like HP range | Pure vector (rejected: tested mentally against golden scenario, would miss items) |
| 2026-04-18 | search_line_items is the primary retrieval tool | Matches E1's "line items are the atomic unit" decision | search_past_quotes as primary (rejected: too coarse for description language) |
| 2026-04-18 | 4M template extracted into templates/brehob-quote.md | CC needs an explicit target format | Inline in system prompt (rejected: hard to iterate, unversioned) |
| 2026-04-18 | No MCP write tools | Per design doc — MCP is retrieval only | Write tools for quote_log (rejected: Full MVP scope) |
| 2026-04-18 | Defer BM25/RRF hybrid to post-golden-test | YAGNI until the golden test shows the gap | Build BM25 upfront (rejected: premature optimization) |
| 2026-04-20 | get_descriptions scrapped | Thin wrapper over search_line_items with a different input shape; two similar tools raises the odds CC picks the wrong one | Keep as convenience (rejected: not pulling its weight) |
| 2026-04-20 | All structured numeric filters are min/max pairs on search_equipment | Consistency over cleverness — CC passes hp_min=100, hp_max=100 for exact matches, ranges when it has them | Mixed scalars/ranges (rejected: asymmetric surface); hp + internal tolerance (rejected: hides the control from CC) |
| 2026-04-20 | Filter-then-rank when any structured filter present | Hard constraints ("exactly 100HP food-grade") can't be lossy. Rank-then-filter risks dropping qualifying rows that fell outside the N nearest neighbors | Rank-then-filter with 3× buffer (rejected: only works for soft filters, which isn't the golden-scenario shape) |
| 2026-04-20 | search_line_items is vector-only for MVP | quote_line_items has no structured spec columns (confirmed in 001_init.sql); Haiku extracts specs into description text, not typed fields; product_id FK isn't populated by the loader so JOIN-based filter is blocked too | Extend schema + re-extract (rejected: too much E1 churn for MVP); add category filter (deferred as post-demo follow-up — cheap since LineItemSchema.category is already extracted, just not stored) |
| 2026-04-20 | Vertical-slice story ordering (Phase A = S3/S5/S7/S8, Phase B = the rest) | The biggest unknown is "can CC assemble a good draft?" — we answer that with the minimum viable loop before building breadth. If Phase A fails, everything else is wasted | Breadth-first (rejected: delays the feedback that most changes the design) |
| 2026-04-20 | Env loading via dotenv/config preload pointing at app/.env.local | Single source of truth with ingestion CLI + app; no secrets in ~/.claude/mcp.json; no dependence on CC's shell env | Inline env block in mcp.json (rejected: secret sprawl, painful to rotate); shell env (rejected: fragile across CC restarts) |
| 2026-04-20 | Per-category template partials (outer skeleton + characteristics/<category>.md) | Different equipment categories have genuinely different spec sets (compressor CFM/PSI vs vacuum inHg/ACFM vs dryer dewpoint); one monolithic template would either over-constrain or under-constrain. Reuses the LineItemSchema.category taxonomy | One monolithic template with CC adapting (rejected: CC ends up guessing field names across categories); Zod-per-category (rejected: overkill for MVP) |
| 2026-04-20 | Demo scope = compressor category only | Ingested subset is compressor-heavy; ship vacuum/dryer/blower partials only when those categories appear in the golden flow | Ship all categories upfront (rejected: premature for demo) |
| 2026-04-21 | dotenv imported inside src/index.ts, NOT via node -r dotenv/config preload (commit 6338ff0) | pnpm doesn't hoist dotenv to the repo root; -r resolves from cwd's node_modules so the preload phase fails when CC spawns the server from the project root. File-relative ESM resolution via in-module import reaches the package's own node_modules through pnpm's symlink tree | Hoist dotenv via shamefully-hoist (rejected: heavy-handed for one module); CWD manipulation in the launch args (rejected: fragile) |
| 2026-04-21 | Pool sets ivfflat.probes = 20 via Postgres startup -c option (commit 3c2ea17) | 001_init.sql built ivfflat with lists=20 for eventual thousands-of-rows scale, but MVP corpus is <100 rows per table. Default probes=1 can land the query vector in an empty list and return 0 candidates even when matching data exists (actually observed for search_line_items and search_past_quotes on semantically distant queries). -c at connection startup is race-free, unlike pool.on('connect') which lets the client be checked out before the async SET completes. Follow-up: drop or reduce this setting once ingestion grows past the lists count, otherwise probes=20 scans all lists and negates the index's pruning value | pool.on('connect') with SET (rejected: race condition, pg deprecation warning); per-query SET LOCAL in a txn (Agent B's first attempt; rejected: doesn't cover search_line_items + search_equipment); rebuild indexes with smaller lists (deferred: touches 001_init migration) |
| 2026-04-21 | Inline draft conventions formalized in outer template (commit 5b369c0) — emerged organically in Phase A CC walkthrough, promoted to house style | Three patterns CC invented without prompting turned out to be genuinely useful: (1) SALESPERSON REVIEW blockquote for retrieval gaps / scope notes needing scrubbing / mismatched analogs; (2) attribution notes above verbatim blocks with source customer + quote number + date + score; (3) end-of-draft retrieval summary table. Formalizing them in the template header makes the behavior robust across models (validated later in Sonnet run — same conventions appeared) | Let each model discover them (rejected: Sonnet might not; comparison showed model-variant deltas that templates resolve better than prompt engineering) |
| 2026-04-21 | Added lubrication enum to products + search_equipment filter (commit 7f4ae10) | Phase B walkthrough exposed that Q$ync 100 (oil-flooded) matched "100HP oilless food-grade" at the top of the catalog because the ingested text has no explicit lubrication signal. Structured filter enforces a hard constraint CC can't bypass. Values: oilless / oil-free / oil-flooded / null. Products with NULL lubrication are EXCLUDED from filtered results — "unknown" doesn't fake a match. Extractor prompt bumped to v2 to populate on future ingest; existing 3 rows backfilled via UPDATE (all confirmed oil-flooded by manual inspection of their spec sheets). | Re-rank penalty based on query terms (rejected: unreliable; "oilless" as a query keyword can't differentiate in-source vs not-in-source); ignore the gap (rejected: golden-scenario blocker) |
| 2026-04-21 | Confidence bands on search hits with fixed thresholds (commit 3c79efb) | A flat score distribution across bad retrievals is indistinguishable from a flat distribution with one good match. The band label lets CC read "all my top-K are probable_miss" as a retrieval failure signal without threshold-guessing. Thresholds: ≥0.7 likely_good, 0.4–0.7 analog, <0.4 probable_miss. Calibrated from observed scores on the current corpus. Follow-up: retune as ingestion grows and scores tighten; keep the 3-tier shape stable so downstream (template Status column, eventual UI) can depend on the vocabulary. | Top-level confidence only (rejected: per-hit is strictly more informative); dynamic thresholds from current corpus percentiles (rejected: thresholds must be stable across runs for the labels to mean the same thing) |
| 2026-04-21 | Sonnet validated for the Full MVP generation role (commit fca3b7c — see demo/phase-b-model-comparison.md) | Dual-model walkthrough post-improvements: Sonnet matched Opus on lubrication filter use, confidence consumption, verbatim preservation (incl. source typos), price discipline, template order, and retrieval gap handling with SALESPERSON REVIEW blockquotes. Deltas were cosmetic (Opening block spacing, silent omission of oil/water separator on oilless systems) and both were addressed by template polish in commit 4258713. The design doc's "Assembly (Full MVP): Anthropic Sonnet via API" is no longer an assumption. | Stick with Opus-only (rejected: Full MVP economics favor Sonnet at this quality level); defer validation to Full MVP build time (rejected: cheaper to know now, commit to the API shape earlier) |
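Two of the later decisions are easiest to see in code. A hedged sketch (helper and option names are assumptions) of the pooled ivfflat.probes startup option and the fixed confidence bands:

```typescript
import { Pool } from "pg";

// ivfflat.probes=20 passed as a Postgres startup option on every pooled connection,
// which is race-free, unlike issuing SET from a pool "connect" handler.
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 5, // small cap; stdio means a single client (CC) anyway
  options: "-c ivfflat.probes=20",
});

// Fixed confidence bands on search hits; thresholds calibrated on the current corpus.
export type ConfidenceBand = "likely_good" | "analog" | "probable_miss";

export function confidenceBand(score: number): ConfidenceBand {
  if (score >= 0.7) return "likely_good";
  if (score >= 0.4) return "analog";
  return "probable_miss";
}
```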
E2 is the bridge. Shipping means CC can call these tools, get good results, and assemble a quote that looks like the 4M template. Nothing else.
Status (as of 2026-04-21): E2 is closed. All 9 stories shipped; both phase walkthroughs passed; two follow-ups (#1 lubrication, #6 confidence) landed from Phase B feedback; Sonnet validated for Full MVP. Deferred Phase B walkthrough findings (#3 category filter, #4 markdown rendering, #5 installation scope, #7 data gap, #8 get_product utility) are captured in next.md as post-demo pickups.