
E2: MCP Servers — Epic Design Doc

Status: ✅ Closed (2026-04-21)
Authors: Dan Hannah & Clay
Created: 2026-04-18
Parent: QuoteAI Project Design Doc


Overview

Goals & Non-Goals

Goals:

  • Build two MCP servers: @quoteai/equipment and @quoteai/quotes
  • Expose tools that let Claude Code (demo) or a future Anthropic-API-backed app (Full MVP) retrieve the exact context needed to assemble a draft quote
  • Wire the servers into the Claude Code config so the demo experience works end-to-end
  • Ship a templates/brehob-quote.md reference artifact extracted from the 4M Industries doc, so CC has the target template

Non-Goals:

  • No pricing tools (pricing is the salesperson's job, per John)
  • No write tools (no creating/updating quotes via MCP — CC or the app does that directly)
  • No inventory/CRM tools (post-MVP)
  • No auth on MCP servers (stdio transport only for Demo)
  • No SSE/hosted transport (post-MVP)

Problem Statement

The MCP servers are the bridge between the ingested data (E1) and the generation layer (Claude Code in Demo, app in Full MVP). Without them, CC has no way to query the description library and will hallucinate equipment language instead of using John's proven phrasing.

Every MCP tool either makes the demo better (better retrieval = better draft) or adds friction. Ship the minimum set that serves the user flow in the main design doc; nothing else.

What Is This Epic?

Two independent MCP servers (stdio transport, TypeScript, @modelcontextprotocol/sdk) exposing the tools needed for quote assembly:

  • @quoteai/equipment — product catalog search + lookup
  • @quoteai/quotes — past-quote semantic search + description retrieval

Both servers share a Postgres connection to the pgvector DB populated by E1. CC is configured to load both during the demo.


Context

Dependents

  • Claude Code demo experience — CC calls these tools every time it assembles a draft
  • E3 (UI) — the UI triggers a flow that ends with CC assembling a draft via these tools
  • Future Full MVP app — the in-app Anthropic API call uses the same tools

Dependencies

  • E0 (Foundation) — DB connection layer, env conventions
  • E1 (Ingestion + Vector DB) — hard dependency; MCP tools are useless without data

Current State

No MCP servers exist. The design doc sketches the tool surface; E2 builds it. Anvil's MCP server design (projects/anvil/epics/mcp-tools.md) is a useful reference pattern, but the tools here are QuoteAI-specific (not reused).

Affected Systems

| System / Layer | How It's Affected |
| --- | --- |
| mcp-servers/equipment/ | Fully built — TypeScript package with MCP SDK |
| mcp-servers/quotes/ | Fully built — TypeScript package with MCP SDK |
| Claude Code config | Updated to load both servers during demo |
| Postgres | Read-only queries from both servers |
| templates/brehob-quote.md | New reference artifact — the 4M template codified |

Design

Tool Surface

@quoteai/equipment

| Tool | Params | Returns | Purpose |
| --- | --- | --- | --- |
| search_equipment | query: string, cfm_min?: number, cfm_max?: number, psi_min?: number, psi_max?: number, hp_min?: number, hp_max?: number, top_k?: number | Array of { product, score, snippet } | Semantic + structured-filter search over the product catalog. All numeric filters are min/max pairs — for an exact-HP lookup CC passes hp_min=100, hp_max=100. |
| get_product | model: string | Full product row (all spec fields) | Exact model lookup |
| get_specs | models: string[] | Array of product rows, one per model | Side-by-side comparison for multi-option quotes |

@quoteai/quotes

| Tool | Params | Returns | Purpose |
| --- | --- | --- | --- |
| search_past_quotes | query: string, top_k?: number | Array of { quote, score, summary } | Find quotes similar to an overall project description |
| get_quote | quote_id: string | Full past_quotes row + line items | Full context for a specific past quote |
| search_line_items | query: string, top_k?: number | Array of { line_item, score, source_quote } | The description-library query — finds proven language at line-item granularity. Vector-only for MVP (see note). |

Note on line-item filters. quote_line_items has no structured spec columns today — just description, quantity, prices, product_id FK, embedding, markdown (see db/migrations/001_init.sql). Haiku extracts HP/CFM/PSI as part of the description text, not as structured fields, and product_id isn't populated by the loader, so neither direct nor JOIN-based filtering works without schema changes + re-ingest. MVP accepts vector-only retrieval here — the verbatim description already contains literals like "100HP" / "oilless" / "food-grade" which the embedding picks up.

Post-demo follow-up — if vector-only underperforms on real queries, the cheapest structured filter to add is category?: string (e.g., "ROTARY SCREW AIR COMPRESSOR"). The extractor already produces this per line item via LineItemSchema.category; it just isn't stored. One ALTER + backfill (or re-ingest) adds it.

get_descriptions considered and dropped — it was a thin wrapper over search_line_items with a different input shape. CC can construct the equivalent query string directly, and two tools doing similar things raise the odds of CC picking the wrong one.

Hybrid Search Logic (Critical)

Pure vector search misses filter-style constraints ("100HP", "food-grade"). The hybrid approach applies to search_equipment only — search_line_items is vector-only for MVP (see the Tool Surface note).

When any structured filter is provided — filter-then-rank:

  1. Generate query embedding via OpenAI
  2. SELECT ... WHERE <structured filters> ORDER BY embedding <=> $query_embedding LIMIT $top_k
  3. Return top_k

Filter-first ensures hard constraints like "exactly 100HP food-grade" are enforced. The vector step ranks within the eligible set, so a qualifying row can't be missed because it fell outside the N nearest neighbors by embedding distance. This is the opposite of rank-then-filter — which was the earlier proposal but only works when filters are soft signals, not hard requirements.

When no structured filters — rank-only:

  1. Generate query embedding via OpenAI
  2. SELECT ... ORDER BY embedding <=> $query_embedding LIMIT $top_k
  3. Return top_k
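Both modes reduce to one query builder. A minimal sketch, assuming node-postgres behind a shared pool module, the openai client E1 already uses for embeddings, and illustrative products columns (hp, psi, cfm); the real names and embedding model live in E1's migration and config:

```ts
import OpenAI from "openai";
import { pool } from "./db"; // hypothetical shared pool module (sketched under Edge Cases)

const openai = new OpenAI(); // reads OPENAI_API_KEY from env

type EquipmentFilters = {
  hp_min?: number; hp_max?: number;
  psi_min?: number; psi_max?: number;
  cfm_min?: number; cfm_max?: number;
};

async function searchEquipment(query: string, f: EquipmentFilters = {}, topK = 5) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumption — E1 owns the actual model choice
    input: query,
  });
  const qvec = `[${data[0].embedding.join(",")}]`; // pgvector array literal

  // Compose WHERE clauses only for the filters CC actually passed.
  const where: string[] = [];
  const params: unknown[] = [qvec, topK];
  const add = (column: string, op: string, value?: number) => {
    if (value === undefined) return;
    params.push(value);
    where.push(`${column} ${op} $${params.length}`);
  };
  add("hp", ">=", f.hp_min);   add("hp", "<=", f.hp_max);
  add("psi", ">=", f.psi_min); add("psi", "<=", f.psi_max);
  add("cfm", ">=", f.cfm_min); add("cfm", "<=", f.cfm_max);

  // Filter-then-rank when any filter is present; rank-only otherwise.
  // The vector step only ever ranks within the eligible set.
  const sql = `
    SELECT *, 1 - (embedding <=> $1::vector) AS score
    FROM products
    ${where.length ? `WHERE ${where.join(" AND ")}` : ""}
    ORDER BY embedding <=> $1::vector
    LIMIT $2`;
  return (await pool.query(sql, params)).rows;
}
```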

If hybrid underperforms (the golden test fails): add BM25 full-text search via Postgres tsvector and merge scores with RRF (reciprocal rank fusion). Deferred until we see where the gaps actually are.
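If that fallback lands, the fusion step itself is tiny. A sketch of reciprocal rank fusion over two ranked id lists; k = 60 is the conventional damping constant (an assumption here, not a tested value):

```ts
// Merge vector-ranked and BM25-ranked id lists via reciprocal rank fusion.
// Each list contributes 1 / (k + rank + 1) per id; higher fused score wins.
function rrfMerge(vectorIds: string[], bm25Ids: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ids of [vectorIds, bm25Ids]) {
    ids.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```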

Template Reference

The 4M Industries template is compressor-specific (CFM/PSI/cooling). Brehob also quotes vacuums, dryers, blowers — each has a different natural spec set. Rather than one monolithic template, split into an outer skeleton plus per-category partials.

Outer skeleton — templates/brehob-quote.md:

  • Header (date, proposal number, company info, attention, salutation)
  • Intro paragraph ("Brehob Corporation is pleased to have the opportunity…")
  • Per-line-item placeholder — CC inserts one characteristics block per line item, selected by that line's extracted category
  • Installation (conditional, when present)
  • Exclusions, Totals, Terms (Net 30, 30-day validity, FOB Factory)
  • Capability pitch, signature, footer (five Brehob offices)

Per-category characteristics partials — templates/characteristics/<category>.md:

  • compressor.md — Manufacturer, Series/Model, Cooling, Pressure (PSI), Capacity (CFM), Electric Motor (HP/RPM/ODP/SF), Voltage, Drive System, Dimensions, Weight
  • Additional categories (vacuum, dryer, blower, etc.) ship only when a category appears in the ingested subset. The LineItemSchema.category field in ingestion/extractor/schemas.ts drives the lookup — same taxonomy selects the partial AND (post-demo) the optional category filter on search_line_items.

Demo scope — compressor only unless the ingested subset already contains another category that we want in the demo flow. Spot-check the DB during S7 (SELECT DISTINCT category FROM quote_line_items … once the markdown is extracted, or eyeball the line-item markdown) to decide whether to ship a second partial (Henry Ford is noted as "vacuum" in the session handoff — confirm and either add vacuum.md or leave Henry Ford out of the demo golden path).

Compressor characteristics block (first cut — extract verbatim shape from 4M reference during S7):

```
[CATEGORY IN CAPS]

CHARACTERISTICS
Manufacturer:    [value]
Series / Model:  [value]
Cooling:         [Air/Water]
Pressure:        [PSI]
Capacity CFM:    [value]
Electric Motor:  [HP, RPM, ODP, SF]
Voltage:         [value]
Drive System:    [value]
Dimensions:      [L x W x H]
Weight:          [lbs]

[Warranty block — manufacturer-specific]

Model [X] as described above    Net $[PRICE] Each
Delivery: [estimated timeframe]
```

CC reads the outer skeleton plus the matching characteristics partial(s) when assembling a draft. This is the "house template" contract.

Data Model Changes

None — all queries are read-only against E1's data.


API / Interface Changes

Each MCP server:

  • Exposes its tools via stdio transport (simplest; CC spawns the server as a child process)
  • Uses @modelcontextprotocol/sdk with standard tool registration (see the sketch after this list)
  • Loads DATABASE_URL and OPENAI_API_KEY from app/.env.local at startup — same source of truth as the ingestion CLI and Next.js app. Avoids re-specifying secrets in the CC config and avoids depending on CC's shell env.
  • Ships as an npm workspace package under mcp-servers/* (local dev only — not published)
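A minimal sketch of the registration pattern, assuming the SDK's McpServer + StdioServerTransport surface and zod for param schemas; searchLineItems is a hypothetical query helper, not a confirmed module:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { searchLineItems } from "./search.js"; // hypothetical shared query helper

const server = new McpServer({ name: "quoteai-quotes", version: "0.1.0" });

// The description string matters: it's what nudges CC to call the tool
// instead of hallucinating (see Risks).
server.tool(
  "search_line_items",
  "Find proven quote language at line-item granularity from past Brehob quotes",
  { query: z.string(), top_k: z.number().optional() },
  async ({ query, top_k }) => {
    const rows = await searchLineItems(query, top_k ?? 5);
    // Empty array, not an error, when nothing matches (see Edge Cases).
    return { content: [{ type: "text", text: JSON.stringify(rows) }] };
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
```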

Env loading matches the ingestion CLI (which already solved this): dotenv reads DOTENV_CONFIG_PATH, which points at app/.env.local; the path is absolute because CC's mcp.json does not interpolate workspace variables. The original design preloaded dotenv via node -r dotenv/config, but that was superseded on 2026-04-21 (commit 6338ff0): pnpm doesn't hoist dotenv to the repo root, and -r resolves from the cwd's node_modules, so the preload fails when CC spawns the server from the project root. Each server now imports dotenv inside its own src/index.ts (see the Decisions Log).
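The in-module shape, sketched under the assumption that the dotenv/config side-effect entry is what 6338ff0 uses (it still honors DOTENV_CONFIG_PATH):

```ts
// mcp-servers/quotes/src/index.ts — first import, before anything reads process.env.
// Resolves file-relative through pnpm's symlink tree, so it works regardless
// of which cwd CC spawns the server from.
import "dotenv/config"; // reads DOTENV_CONFIG_PATH when set
```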

CC config example, post-6338ff0 — no preload args, since dotenv loads in-module (replace <repo-root> with the absolute path to ~/Documents/Code/quoteai):

```json
{
  "mcpServers": {
    "quoteai-equipment": {
      "command": "node",
      "args": ["<repo-root>/mcp-servers/equipment/dist/index.js"],
      "env": { "DOTENV_CONFIG_PATH": "<repo-root>/app/.env.local" }
    },
    "quoteai-quotes": {
      "command": "node",
      "args": ["<repo-root>/mcp-servers/quotes/dist/index.js"],
      "env": { "DOTENV_CONFIG_PATH": "<repo-root>/app/.env.local" }
    }
  }
}
```

S8 ships a short README snippet walking through the absolute-path substitution so first-time demo setup isn't a scavenger hunt.


Edge Cases & Gotchas

| Scenario | Expected Behavior | Why It's Tricky |
| --- | --- | --- |
| Query with no matches | Return empty array, not error | CC should be able to distinguish "no results" from "tool broken" |
| Query exceeds top_k available rows | Return whatever exists; don't pad | Small DB (curated subset) will hit this often |
| Concurrent queries from CC | Both servers must handle via connection pool | CC may fan out queries in parallel |
| Embedding API outage | search_* tools return error with clear message | CC should degrade gracefully — fall back to keyword match if possible (post-MVP) |
| Model field mismatch in get_product | Return 404-style error, not empty result | CC needs to know the model doesn't exist vs. server issue |
| Very long description blocks (> 2k tokens) | Return as-is; let CC decide how to truncate | Some 4M-style quotes have dense installation blocks |
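The shared pool is small and boring on purpose. A sketch, assuming node-postgres; the options startup parameter carries the ivfflat.probes setting decided later (commit 3c2ea17, see the Decisions Log):

```ts
import { Pool } from "pg";

// Small cap: stdio means one client (CC), but CC can fan out parallel tool calls.
export const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 5,
  // Set at connection startup via -c, which is race-free, unlike
  // pool.on('connect') + an async SET. Revisit once row counts grow
  // past the index's lists=20 (see Decisions Log).
  options: "-c ivfflat.probes=20",
});
```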

Testing Strategy

Test Layers

| Layer | Applies? | Notes |
| --- | --- | --- |
| Unit tests | Yes | Query builders, filter composition, error handling |
| Integration tests | Yes | Spin up a test Postgres with fixture data; call each tool; verify shape + content |
| Golden retrieval test (from E1) | Yes | Same golden test — does search_line_items return the expected items for the golden query? |
| E2E with CC | Yes | Configure CC to load both servers; run a demo quote flow manually; does it assemble correctly? |

Required Fixtures

| Fixture Name | What It Tests | Priority |
| --- | --- | --- |
| fixtures/mcp-golden-query.test.ts | search_line_items("100HP oilless food grade") returns Groeb/4M/Powerex in top 5 | 🔴 High |
| fixtures/mcp-get-product.test.ts | get_product("QMB30") returns complete spec | 🔴 High |
| fixtures/mcp-empty-results.test.ts | Nonsense query returns [], not error | 🟡 Medium |
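A sketch of the golden fixture's shape, assuming vitest, a seeded test database, and the same hypothetical searchLineItems helper the tool handler wraps:

```ts
import { describe, expect, it } from "vitest";
// Hypothetical module path — wherever the shared query helper ends up.
import { searchLineItems } from "../src/search";

describe("golden retrieval", () => {
  it('surfaces Groeb / 4M / Powerex for "100HP oilless food grade"', async () => {
    const hits = await searchLineItems("100HP oilless food grade", 5);
    // Hit shape is illustrative; the tool returns { line_item, score, source_quote }.
    const haystack = JSON.stringify(hits);
    for (const expected of ["Groeb", "4M", "Powerex"]) {
      expect(haystack).toContain(expected);
    }
  });
});
```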

Verification Rules

  1. Every tool has at least one integration test against seeded DB.
  2. Golden retrieval test runs green before demo.
  3. Manual E2E walkthrough before showing anyone — open CC, fill mock form, watch it call tools, inspect output against the 4M template.

Stories

Stories are split into two phases. Phase A is the vertical slice — the minimum set that answers "can CC assemble a good draft from these tools?" If Phase A fails, Phase B is noise; we'd re-scope. If Phase A passes, Phase B fills in breadth.

Phase A — Vertical slice (demo-path minimum):

| Story | Summary | Status | PR |
| --- | --- | --- | --- |
| S3 | @quoteai/quotes scaffold — MCP SDK, stdio transport, DB client, app/.env.local loader | ✅ Shipped | |
| S5 | @quoteai/quotes search_line_items — the critical one, vector-only | ✅ Shipped | |
| S7 | templates/brehob-quote.md outer skeleton + templates/characteristics/compressor.md partial (extracted from the 4M reference) | ✅ Shipped | |
| S8 | CC config wiring (absolute paths + DOTENV_CONFIG_PATH) + manual E2E demo walkthrough against the golden scenario ("100HP oilless compressor for food-grade plant") | ✅ Shipped | |

Phase A explicitly skips the equipment server. If the spike shows CC needs product-spec lookups to fill template fields cleanly, we promote S0 + S2 out of Phase B before finishing Phase B's other work.

Phase B — Breadth (fills in after Phase A demonstrates the loop):

| Story | Summary | Status | PR |
| --- | --- | --- | --- |
| S0 | @quoteai/equipment scaffold — MCP SDK, stdio transport, shared DB client / env loader pattern from S3 | ✅ Shipped | |
| S1 | @quoteai/equipment search_equipment implementation + filter-then-rank hybrid | ✅ Shipped | |
| S2 | @quoteai/equipment get_product + get_specs | ✅ Shipped | |
| S4 | @quoteai/quotes search_past_quotes | ✅ Shipped | |
| S6 | @quoteai/quotes get_quote | ✅ Shipped | |

Known Issues / Tech Debt

| Issue | Severity | Notes |
| --- | --- | --- |
| No server-side rate limiting | 🟢 Low | stdio = single client (CC) — not a risk |
| No caching of query embeddings | 🟡 Medium | The same query gets re-embedded on every call. Add an LRU cache in Full MVP. |
| No observability (metrics, tracing) | 🟡 Medium | Add once we're iterating on retrieval quality and want to see what CC is actually asking |
| Hybrid search ranking is naive | 🟡 Medium | May need RRF or learning-to-rank. Evaluate post-demo. |

Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Retrieval quality insufficient for good drafts | Medium | 🔴 High | Golden test is the gate. Iterate on query construction + hybrid filters until it passes. |
| CC doesn't use the tools well (hallucinates instead) | Medium | 🔴 High | Test the demo flow against the golden scenario manually before showing John. Adjust tool descriptions (the description field in MCP tool registration) to nudge CC toward calling them. |
| MCP SDK version incompatibility with CC | Low | Medium | Pin the @modelcontextprotocol/sdk version; test locally before demo |
| Postgres connection exhaustion under concurrent CC calls | Low | Low | Use pg-pool with a small cap; stdio = one client anyway |
| Embedding-based filtering is too loose (returns unrelated items) | Medium | 🟡 Medium | Structured filters + a similarity threshold (score > 0.7) to drop weak matches |

Decisions Log

| Date | Decision | Rationale | Alternatives Considered |
| --- | --- | --- | --- |
| 2026-04-18 | Two independent servers (equipment + quotes) | Matches design doc; separation of concerns; smaller tool surfaces per server | One server with all tools (rejected: scope creep, messier tool list) |
| 2026-04-18 | stdio transport for Demo | Simplest; CC spawns as child process | SSE (rejected: premature for local demo) |
| 2026-04-18 | Hybrid search (vector + structured filter) as default | Pure vector misses hard constraints like HP range | Pure vector (rejected: tested mentally against golden scenario, would miss items) |
| 2026-04-18 | search_line_items is the primary retrieval tool | Matches E1's "line items are the atomic unit" decision | search_past_quotes as primary (rejected: too coarse for description language) |
| 2026-04-18 | 4M template extracted into templates/brehob-quote.md | CC needs an explicit target format | Inline in system prompt (rejected: hard to iterate, unversioned) |
| 2026-04-18 | No MCP write tools | Per design doc — MCP is retrieval only | Write tools for quote_log (rejected: Full MVP scope) |
| 2026-04-18 | Defer BM25/RRF hybrid to post-golden-test | YAGNI until the golden test shows the gap | Build BM25 upfront (rejected: premature optimization) |
| 2026-04-20 | get_descriptions scrapped | Thin wrapper over search_line_items with a different input shape; two similar tools raise the odds CC picks the wrong one | Keep as convenience (rejected: not pulling its weight) |
| 2026-04-20 | All structured numeric filters are min/max pairs on search_equipment | Consistency over cleverness — CC passes hp_min=100, hp_max=100 for exact matches, ranges when it has them | Mixed scalars/ranges (rejected: asymmetric surface); hp + internal tolerance (rejected: hides the control from CC) |
| 2026-04-20 | Filter-then-rank when any structured filter is present | Hard constraints ("exactly 100HP food-grade") can't be lossy. Rank-then-filter risks dropping qualifying rows that fell outside the N nearest neighbors | Rank-then-filter with 3× buffer (rejected: only works for soft filters, which isn't the golden-scenario shape) |
| 2026-04-20 | search_line_items is vector-only for MVP | quote_line_items has no structured spec columns (confirmed in 001_init.sql); Haiku extracts specs into description text, not typed fields; the product_id FK isn't populated by the loader, so JOIN-based filtering is blocked too | Extend schema + re-extract (rejected: too much E1 churn for MVP); add category filter (deferred as a post-demo follow-up — cheap since LineItemSchema.category is already extracted, just not stored) |
| 2026-04-20 | Vertical-slice story ordering (Phase A = S3/S5/S7/S8, Phase B = the rest) | The biggest unknown is "can CC assemble a good draft?" — we answer that with the minimum viable loop before building breadth. If Phase A fails, everything else is wasted | Breadth-first (rejected: delays the feedback that most changes the design) |
| 2026-04-20 | Env loading via dotenv/config preload pointing at app/.env.local | Single source of truth with the ingestion CLI + app; no secrets in ~/.claude/mcp.json; no dependence on CC's shell env | Inline env block in mcp.json (rejected: secret sprawl, painful to rotate); shell env (rejected: fragile across CC restarts) |
| 2026-04-20 | Per-category template partials (outer skeleton + characteristics/<category>.md) | Different equipment categories have genuinely different spec sets (compressor CFM/PSI vs vacuum inHg/ACFM vs dryer dewpoint); one monolithic template would either over-constrain or under-constrain. Reuses the LineItemSchema.category taxonomy | One monolithic template with CC adapting (rejected: CC ends up guessing field names across categories); Zod-per-category (rejected: overkill for MVP) |
| 2026-04-20 | Demo scope = compressor category only | Ingested subset is compressor-heavy; ship vacuum/dryer/blower partials only when those categories appear in the golden flow | Ship all categories upfront (rejected: premature for demo) |
| 2026-04-21 | dotenv imported inside src/index.ts, NOT via node -r dotenv/config preload (commit 6338ff0) | pnpm doesn't hoist dotenv to the repo root; -r resolves from the cwd's node_modules, so the preload phase fails when CC spawns the server from the project root. File-relative ESM resolution via an in-module import reaches the package's own node_modules through pnpm's symlink tree | Hoist dotenv via shamefully-hoist (rejected: heavy-handed for one module); cwd manipulation in the launch args (rejected: fragile) |
| 2026-04-21 | Pool sets ivfflat.probes = 20 via the Postgres startup -c option (commit 3c2ea17) | 001_init.sql built ivfflat with lists=20 for eventual thousands-of-rows scale, but the MVP corpus is <100 rows per table. The default probes=1 can land the query vector in an empty list and return 0 candidates even when matching data exists (actually observed for search_line_items and search_past_quotes on semantically distant queries). -c at connection startup is race-free, unlike pool.on('connect'), which lets the client be checked out before the async SET completes. Follow-up: drop or reduce this setting once ingestion grows past the lists count; otherwise probes=20 scans all lists and negates the index's pruning value | pool.on('connect') with SET (rejected: race condition, pg deprecation warning); per-query SET LOCAL in a txn (Agent B's first attempt; rejected: doesn't cover search_line_items + search_equipment); rebuild indexes with smaller lists (deferred: touches the 001_init migration) |
| 2026-04-21 | Inline draft conventions formalized in the outer template (commit 5b369c0) — emerged organically in the Phase A CC walkthrough, promoted to house style | Three patterns CC invented without prompting turned out to be genuinely useful: (1) a SALESPERSON REVIEW blockquote for retrieval gaps / scope notes needing scrubbing / mismatched analogs; (2) attribution notes above verbatim blocks with source customer + quote number + date + score; (3) an end-of-draft retrieval summary table. Formalizing them in the template header makes the behavior robust across models (validated later in the Sonnet run — the same conventions appeared) | Let each model discover them (rejected: Sonnet might not; the comparison showed model-variant deltas that templates resolve better than prompt engineering) |
| 2026-04-21 | Added lubrication enum to products + search_equipment filter (commit 7f4ae10) | The Phase B walkthrough exposed that the Q$ync 100 (oil-flooded) matched "100HP oilless food-grade" at the top of the catalog because the ingested text has no explicit lubrication signal. A structured filter enforces a hard constraint CC can't bypass. Values: oilless / oil-free / oil-flooded / null. Products with NULL lubrication are EXCLUDED from filtered results — "unknown" doesn't fake a match. Extractor prompt bumped to v2 to populate on future ingest; the existing 3 rows were backfilled via UPDATE (all confirmed oil-flooded by manual inspection of their spec sheets) | Re-rank penalty based on query terms (rejected: unreliable; "oilless" as a query keyword can't differentiate in-source vs. not-in-source); ignore the gap (rejected: golden-scenario blocker) |
| 2026-04-21 | Confidence bands on search hits with fixed thresholds (commit 3c79efb) | A flat score distribution across bad retrievals is indistinguishable from a flat distribution with one good match. The band label lets CC read "all my top-K are probable_miss" as a retrieval-failure signal without threshold-guessing. Thresholds: ≥0.7 likely_good, 0.4–0.7 analog, <0.4 probable_miss. Calibrated from observed scores on the current corpus. Follow-up: retune as ingestion grows and scores tighten; keep the 3-tier shape stable so downstream consumers (template Status column, eventual UI) can depend on the vocabulary | Top-level confidence only (rejected: per-hit is strictly more informative); dynamic thresholds from current-corpus percentiles (rejected: thresholds must be stable across runs for the labels to mean the same thing) |
| 2026-04-21 | Sonnet validated for the Full MVP generation role (commit fca3b7c — see demo/phase-b-model-comparison.md) | Dual-model walkthrough post-improvements: Sonnet matched Opus on lubrication-filter use, confidence consumption, verbatim preservation (incl. source typos), price discipline, template order, and retrieval-gap handling with SALESPERSON REVIEW blockquotes. The deltas were cosmetic (Opening-block spacing, silent omission of the oil/water separator on oilless systems) and both were addressed by template polish in commit 4258713. The design doc's "Assembly (Full MVP): Anthropic Sonnet via API" is no longer an assumption | Stick with Opus-only (rejected: Full MVP economics favor Sonnet at this quality level); defer validation to Full MVP build time (rejected: cheaper to know now, commits to the API shape earlier) |
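The confidence-band mapping from the 3c79efb entry is compact enough to pin down. A sketch with the thresholds as described above; the three-label vocabulary is the stable contract:

```ts
type ConfidenceBand = "likely_good" | "analog" | "probable_miss";

// Fixed thresholds, calibrated on the current corpus (see the entry above).
// Cut points may move as ingestion grows; the 3-tier shape should not.
function bandForScore(score: number): ConfidenceBand {
  if (score >= 0.7) return "likely_good";
  if (score >= 0.4) return "analog";
  return "probable_miss";
}
```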

E2 is the bridge. Shipping means CC can call these tools, get good results, and assemble a quote that looks like the 4M template. Nothing else.

Status (as of 2026-04-21): E2 is closed. All 9 stories shipped; both phase walkthroughs passed; two follow-ups (#1 lubrication, #6 confidence) landed from Phase B feedback; Sonnet validated for Full MVP. Deferred Phase B walkthrough findings (#3 category filter, #4 markdown rendering, #5 installation scope, #7 data gap, #8 get_product utility) are captured in next.md as post-demo pickups.
