Foundry

EPIC-3: MCP Server (Streamable HTTP)

Drafted 2026-05-19. Beta-sprint epic 3 of 5. Sequencing: Week 1 Days 3-4 + 7 (local-first). Runs in parallel with EPIC-1; depends on EPIC-2 schema.

Goal

Adapt mcp-servers/doc-search from stdio transport to Streamable HTTP transport (MCP spec 2025-03-26+), with OAuth 2.1 scope enforcement via the Library/Connector model. End state: Dan can install a local MCP connector into Claude Desktop and query his novel KB with proper scope enforcement.

Why this epic exists

This is the "does the MCP-as-infrastructure wedge work?" gate. If Claude Desktop can install our local MCP connector and query a real KB with proper scope, the entire product thesis is validated end-to-end before we put a dollar into AWS deployment.

Scope (in)

Transport:

  • Stdio → Streamable HTTP (MCP spec 2025-03-26+) — required by AgentCore Runtime per the research finding
  • Local dev server listens on localhost:8080/c/{connectorId}/mcp (POST for JSON-RPC; text/event-stream for streaming responses + progress notifications)
  • Production endpoint will be mcp.autri.ai/c/{connectorId}/mcp (AgentCore Runtime convention)
  • Mcp-Session-Id semantics: server generates the ID while handling the client's initialize request, returns it in the Mcp-Session-Id response header, and expects the client to echo it on every subsequent request for the life of the session
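
The session handling above could look roughly like the sketch below. It uses only Node's built-in crypto module; the in-memory Map and the resolveSession name are illustrative assumptions, not part of the SDK, and the pinned @modelcontextprotocol/sdk transport may well manage session IDs on its own. The 404 for an unknown session ID follows the Streamable HTTP spec's guidance that the client should then start a new session.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory registry; a real server might store sessions elsewhere.
const sessions = new Map<string, { connectorId: string; createdAt: number }>();

type SessionResult =
  | { ok: true; sessionId: string; isNew: boolean }
  | { ok: false; status: 404 };

// Resolve the session for one request.
// - No header: mint a new ID (caller returns it in the Mcp-Session-Id response header).
// - Known ID for the same connector: reuse it.
// - Unknown ID, or an ID minted for another connector: 404 so the client re-initializes.
function resolveSession(connectorId: string, headerValue: string | undefined): SessionResult {
  if (headerValue === undefined) {
    const sessionId = randomUUID();
    sessions.set(sessionId, { connectorId, createdAt: Date.now() });
    return { ok: true, sessionId, isNew: true };
  }
  const existing = sessions.get(headerValue);
  if (existing === undefined || existing.connectorId !== connectorId) {
    return { ok: false, status: 404 };
  }
  return { ok: true, sessionId: headerValue, isNew: false };
}
```

Binding each session to the connector it was created for is a design choice in this sketch: it keeps a session ID from being replayed against a different /c/{connectorId}/mcp path.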

HTTP framework + MCP SDK:

  • HTTP framework: Hono (lightweight, modern, edge-runtime friendly if we ever go that way)
  • MCP SDK: @modelcontextprotocol/sdk (Anthropic official) — pin specific version supporting Streamable HTTP; document version in package.json + EPIC-3 notes
  • Local HTTPS: try plain HTTP first (Day 3 verify against Claude Desktop). If Claude Desktop rejects, add mkcert + local CA (~30 min). Don't pre-build complexity.

OAuth 2.1 + PKCE scaffold:

  • Local dev: JWT shim for fast iteration — HS256 with shared secret in .env.dev
  • Dev user mapping: shim reads DEV_USER_ID=<uuid> from .env.dev (Dan sets this after EPIC-2 backfill assigns him a user_id). Shim signs JWTs with that sub claim.
  • Token issuance: pnpm dev:make-token CLI script outputs a JWT to stdout. Dan copies, pastes into Claude Desktop config.
  • Production-ready interface: swap-able auth layer that becomes Cognito JWKS verification in EPIC-4. Interface: verifyToken(token: string): Promise<TokenClaims> — dev impl uses HS256, prod impl uses Cognito JWKS.
  • Token validation middleware: JWKS cache (prod) / shared-secret verify (dev), expiry check, audience check
  • Connector resolution from URL path segment {connectorId} (defense-in-depth check: connector.user_id === token.sub AND connector.revoked_at IS NULL)
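
A minimal sketch of the dev shim behind the verifyToken interface, assuming Node's built-in crypto and no JWT library. The signDevToken, verifyHs256, and makeDevVerifier names and the "autri-mcp" audience used in the usage note are hypothetical; the real shim may use a JWT package instead. Only the verifyToken(token): Promise<TokenClaims> shape comes from the epic.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface TokenClaims { sub: string; aud: string; exp: number }

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Dev-only: sign an HS256 JWT (roughly what `pnpm dev:make-token` would print).
function signDevToken(claims: TokenClaims, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64url(JSON.stringify(claims));
  const sig = createHmac("sha256", secret).update(`${header}.${payload}`).digest("base64url");
  return `${header}.${payload}.${sig}`;
}

// Synchronous core: algorithm, signature, expiry, and audience checks.
function verifyHs256(token: string, secret: string, audience: string, nowSec: number): TokenClaims {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  const [header, payload, sig] = parts;
  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "HS256") throw new Error("unexpected alg"); // never trust the header's alg blindly
  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(sig, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    throw new Error("bad signature");
  }
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as TokenClaims;
  if (claims.exp <= nowSec) throw new Error("token expired");
  if (claims.aud !== audience) throw new Error("wrong audience");
  return claims;
}

// The swappable interface from the epic; a Cognito JWKS verifier would be
// another function of the same type.
type VerifyToken = (token: string) => Promise<TokenClaims>;

const makeDevVerifier = (secret: string, audience: string): VerifyToken =>
  async (token) => verifyHs256(token, secret, audience, Math.floor(Date.now() / 1000));
```

Usage under these assumptions: const verifyToken = makeDevVerifier(sharedSecret, "autri-mcp"), where sharedSecret is read from .env.dev. The middleware only ever sees the VerifyToken type, so the EPIC-4 swap touches the factory alone.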

Scope enforcement at every tool call:

  • Use getKbScopeForConnector(connectorId) from EPIC-2 to constrain queries
  • Tool calls return ONLY results from KBs in the connector's library
  • Defense-in-depth: scope helper handles revoked connectors AND deleted libraries (returns [])
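
One way to apply the scope at a tool call, sketched under assumptions: getScope stands in for EPIC-2's getKbScopeForConnector (its real signature is not shown in this epic), and Chunk, narrowScope, filterToScope, and scopedSearch are illustrative names. The optional kb_id can only narrow the scope, never widen it, and results are filtered a second time after retrieval.

```typescript
interface Chunk { kbId: string; text: string }

// An optional kb_id filter may only narrow the connector's scope.
function narrowScope(scope: string[], requestedKbId?: string): string[] {
  return requestedKbId === undefined ? scope : scope.filter((id) => id === requestedKbId);
}

// Second line of defense: drop any result outside the allowed KBs,
// even if the retrieval layer returned it.
function filterToScope(results: Chunk[], kbIds: string[]): Chunk[] {
  const allowed = new Set(kbIds);
  return results.filter((chunk) => allowed.has(chunk.kbId));
}

// Hypothetical wiring; getScope is assumed to return [] for revoked
// connectors and deleted libraries, as the epic describes.
async function scopedSearch(
  connectorId: string,
  requestedKbId: string | undefined,
  getScope: (connectorId: string) => Promise<string[]>,
  search: (kbIds: string[]) => Promise<Chunk[]>,
): Promise<Chunk[]> {
  const kbIds = narrowScope(await getScope(connectorId), requestedKbId);
  if (kbIds.length === 0) return []; // nothing in scope: skip the query entirely
  return filterToScope(await search(kbIds), kbIds);
}
```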

connectors.last_used_at write:

  • Updated on every successful auth (before tool dispatch). Per EPIC-2 boundary: EPIC-3 owns this write.

Audit events for tool calls (to shared mcp_audit_log table from EPIC-2):

  • Every tool call writes an event with event_type='tool_call.{tool_name}', connector_id=<connectorId>, metadata={tool, kb_scope, query, result_count, status, latency_ms}
  • Failed tool calls write event_type='tool_call.{tool_name}.error' with metadata.error_kind populated
  • ~1ms overhead per call; critical for future audit dashboard, abuse detection, cost analysis
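
The event shape above can be sketched as a small builder. Field names follow the bullets; the ToolCallOutcome type, the buildToolCallEvent name, and the "ok"/"error" status values are assumptions, and the actual insert into mcp_audit_log via @autri/db is left out.

```typescript
interface ToolCallOutcome {
  tool: string;
  connectorId: string;
  kbScope: string[];
  query: string | null;
  resultCount: number;
  latencyMs: number;
  errorKind?: string; // set only when the tool call failed
}

interface AuditEvent {
  event_type: string;
  connector_id: string;
  metadata: Record<string, unknown>;
}

// Build the row for one tool call; failures get the ".error" suffix
// and carry metadata.error_kind.
function buildToolCallEvent(outcome: ToolCallOutcome): AuditEvent {
  const failed = outcome.errorKind !== undefined;
  return {
    event_type: failed ? `tool_call.${outcome.tool}.error` : `tool_call.${outcome.tool}`,
    connector_id: outcome.connectorId,
    metadata: {
      tool: outcome.tool,
      kb_scope: outcome.kbScope,
      query: outcome.query,
      result_count: outcome.resultCount,
      status: failed ? "error" : "ok",
      latency_ms: outcome.latencyMs,
      ...(failed ? { error_kind: outcome.errorKind } : {}),
    },
  };
}
```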

Tool surface (v1):

  • search_knowledge_base(query: string, kb_id?: string) — vector + FTS hybrid search across library's KBs. Optional kb_id filter.
  • lookup_section(section_id: string) — direct lookup by section/rule ID.
  • list_knowledge_bases() — list KBs available in this connector's library.
  • get_document(doc_id: string) — fetch full doc metadata + chunks.

Response strategy: buffer + progress notifications

  • Run DB query, collect chunks, emit single tool result (buffered)
  • For slow tools (>1s estimated): emit notifications/progress mid-execution ("Searching novel KB...", "Found N chunks, formatting...")
  • Status messages appear in Claude Desktop's tool-call UI; LLM's reply still streams natively after the tool result lands
  • Implementation budget: 30 min per tool on top of the base buffer pattern
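
A sketch of the buffer + progress pattern. The notification shape (method notifications/progress with progressToken, progress, total, message) follows the MCP spec; searchWithProgress, runQuery, and emit are hypothetical names, and the function is synchronous only to keep the example short where a real tool would await its DB query. Per the spec, progress is sent only when the request supplied a progress token.

```typescript
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: string | number; progress: number; total?: number; message?: string };
}

interface ToolResult { content: { type: "text"; text: string }[] }

function searchWithProgress(
  runQuery: () => string[],                      // stand-in for the real retrieval call
  progressToken: string | number | undefined,    // from the request's _meta.progressToken
  emit: (notification: ProgressNotification) => void,
): ToolResult {
  const notify = (progress: number, message: string): void => {
    if (progressToken === undefined) return;     // client did not ask for progress
    emit({
      jsonrpc: "2.0",
      method: "notifications/progress",
      params: { progressToken, progress, total: 2, message },
    });
  };

  notify(1, "Searching novel KB...");
  const chunks = runQuery();                     // buffered: collect everything first
  notify(2, `Found ${chunks.length} chunks, formatting...`);

  // Single tool result emitted once the buffer is complete.
  return { content: [{ type: "text", text: chunks.join("\n\n") }] };
}
```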

Error response format:

  • HTTP status codes: 401 (missing/invalid/expired token), 403 (token valid but connector revoked / authz failure), 404 (connector not found), 500 (unexpected)
  • JSON-RPC error envelope per MCP spec for tool-level failures
  • Failed tool calls write audit events with error_kind for diagnostics
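
The split between HTTP status codes and the JSON-RPC envelope could be sketched as follows. ConnectorRow, authStatus, and toolError are illustrative names, and the -32603 code is the generic JSON-RPC "internal error" code, used here as a placeholder for whichever code each failure actually warrants.

```typescript
interface ConnectorRow { user_id: string; revoked_at: Date | null }

type AuthStatus = 200 | 401 | 403 | 404;

// Transport-level outcome, decided before any tool is dispatched.
// tokenSub is null when the bearer token is missing, invalid, or expired.
function authStatus(tokenSub: string | null, connector: ConnectorRow | null): AuthStatus {
  if (tokenSub === null) return 401;
  if (connector === null) return 404;
  if (connector.user_id !== tokenSub) return 403;   // token valid, wrong owner
  if (connector.revoked_at !== null) return 403;    // token valid, connector revoked
  return 200;
}

// Tool-level failure, returned in the JSON-RPC error envelope.
// error_kind mirrors the value written to the audit event.
function toolError(id: string | number, errorKind: string, message: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    error: { code: -32603, message, data: { error_kind: errorKind } },
  };
}
```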

Local end-to-end validation (Day 7):

  • Dev seed script: pnpm dev:seed-connector — creates a known connector with known credentials (computes argon2 hash, inserts row, prints connector_id + secret). Avoids hand-computing hashes.
  • Install local MCP connector config in Claude Desktop using seeded credentials + dev JWT from pnpm dev:make-token
  • Run each of the 4 tools, verify scope enforcement, verify progress notifications render, verify audit events written

Out of scope

  • AgentCore Runtime production deploy (EPIC-4)
  • Connector creation UI (EPIC-2 — this epic consumes EPIC-2's connectors)
  • Per-tool rate limiting (deferred to v1.1)
  • Audit logging dashboard UI (events written here AND in EPIC-2 — UI deferred to post-beta)
  • Token introspection caching (just use JWKS verification + DB lookup)
  • True mid-stream chunk streaming within tool results (Claude Desktop buffers tool results anyway — limited UX benefit; revisit if a real use case surfaces)
  • CORS configuration (Claude Desktop is a native app rather than a browser page, so cross-origin checks do not apply; not needed for v1)

Dependencies

  • EPIC-2: connectors, libraries, library_kbs, mcp_audit_log tables + getKbScopeForConnector helper must exist
  • Existing retrieval primitives in @autri/retrieval (vector-search, fts-search, lookup-section — already working)
  • @modelcontextprotocol/sdk version that supports Streamable HTTP transport (MCP spec 2025-03-26+) — verify + pin SDK version on Day 3
  • Hono framework — add to dependencies
  • node-argon2 (or @node-rs/argon2 per EPIC-2 risks) — already added by EPIC-2; reused here for verify
  • @autri/db for the connector lookup + audit-log write
  • Dan's user_id from the post-backfill DB — set as DEV_USER_ID in .env.dev

Deliverables

  • MCP server running locally on Streamable HTTP at localhost:8080
  • Working OAuth scaffold (dev shim) with swap-able interface for production Cognito
  • All 4 tools implemented + tested
  • Claude Desktop config snippet documented in the epic notes
  • End-to-end demo: Dan queries his novel KB from Claude Desktop with proper scope enforcement

Implementation plan

Day 3 — Transport adaptation + dev tooling

  1. Verify @modelcontextprotocol/sdk version supports Streamable HTTP (upgrade + pin if needed). Document pinned version.
  2. Scaffold Hono server:
    • POST /c/:connectorId/mcp for MCP RPC
    • Response: JSON for short results, text/event-stream for streaming + progress notifications
    • Mcp-Session-Id generation + return in response header
  3. Route handler skeleton: parse JSON-RPC request, dispatch on the JSON-RPC method (tools/list, tools/call), then route tools/call to the tool named in params.name
  4. Write pnpm dev:make-token CLI script: reads DEV_USER_ID from .env.dev, signs HS256 JWT, prints to stdout
  5. Write pnpm dev:seed-connector CLI script: creates a connector with known credentials, computes argon2 hash, inserts row, prints connector_id + plaintext secret
  6. Stub tool implementations returning placeholder data
  7. Verify with raw curl against localhost:8080: tools/list returns the tool surface
  8. Local HTTPS check: attempt Claude Desktop connection with plain HTTP. If rejected, add mkcert + local CA setup.
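
For the Day 3 skeleton, the wire-level behaviour that the curl check exercises can be sketched without any framework. This is an illustration of the JSON-RPC shapes only: in practice the pinned SDK is expected to do this dispatch, and handleRpc and the stub handlers are hypothetical. A tools/list request body would be {"jsonrpc":"2.0","id":1,"method":"tools/list"}.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: string | number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: string | number; result: unknown }
  | { jsonrpc: "2.0"; id: string | number; error: { code: number; message: string } };

type StubHandler = (args: Record<string, unknown>) => string;

// Placeholder handlers for the four v1 tools (step 6 stubs).
const stubTools = new Map<string, StubHandler>([
  ["search_knowledge_base", () => "stub: search results"],
  ["lookup_section", () => "stub: section"],
  ["list_knowledge_bases", () => "stub: KB list"],
  ["get_document", () => "stub: document"],
]);

function handleRpc(req: JsonRpcRequest): JsonRpcResponse {
  const { id } = req;
  if (req.method === "tools/list") {
    const tools = [...stubTools.keys()].map((name) => ({ name, inputSchema: { type: "object" } }));
    return { jsonrpc: "2.0", id, result: { tools } };
  }
  if (req.method === "tools/call") {
    const name = req.params?.name;
    const tool = name === undefined ? undefined : stubTools.get(name);
    if (tool === undefined) {
      return { jsonrpc: "2.0", id, error: { code: -32602, message: `Unknown tool: ${String(name)}` } };
    }
    const text = tool(req.params?.arguments ?? {});
    return { jsonrpc: "2.0", id, result: { content: [{ type: "text", text }] } };
  }
  return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${req.method}` } };
}
```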

Day 4 — OAuth + scope enforcement + real tools + audit + last_used_at

  1. OAuth middleware: extract Bearer token, verify JWT (HS256 shared-secret for dev), check expiry + audience; write connectors.last_used_at = NOW() once auth succeeds (after the connector checks in steps 3-4 pass, before tool dispatch)
  2. Token → user_id extraction from sub claim
  3. Connector resolution: SELECT user_id, library_id, revoked_at FROM connectors WHERE id = ?
  4. Defense-in-depth: assert connector.user_id === token.sub AND connector.revoked_at IS NULL → 403 if mismatch
  5. Wire getKbScopeForConnector into each tool handler
  6. Implement all 4 tools, each with buffer + progress notifications pattern:
    • search_knowledge_base → calls @autri/retrieval's vector + FTS hybrid search, scoped to library's KBs
    • lookup_section → direct section lookup, scoped check
    • list_knowledge_bases → returns KBs from library_kbs join
    • get_document → fetch doc + chunks, scope check
  7. Audit event write for every tool call (success + failure)
  8. Unit tests for scope enforcement: cross-library leak attempts, revoked connector access attempts, expired token attempts, deleted library returns empty scope
  9. Unit test: audit event written correctly for success + failure paths

Day 7 — Local end-to-end validation (wedge gate)

  1. Run pnpm dev:seed-connector to create a test connector with known credentials
  2. Run pnpm dev:make-token to mint a dev JWT for Dan's user
  3. Create Claude Desktop MCP config:
    {
      "mcpServers": {
        "autri": {
          "url": "http://localhost:8080/c/{connectorId}/mcp",
          "headers": { "Authorization": "Bearer {jwt}" }
        }
      }
    }
  4. Restart Claude Desktop, verify it sees the autri tools
  5. Run each tool with realistic queries:
    • "What does Chapter 5 of mom's novel say about the locked door?" → search_knowledge_base
    • "Show me FIA Technical Reg T-7.2" → lookup_section
    • "What KBs do I have access to?" → list_knowledge_bases
    • "Fetch document <doc_id>" → get_document
  6. Verify scope enforcement: try a tool call with a connector ID owned by a different user → expect 403 (a connector ID that does not exist should return 404)
  7. Verify progress notifications: trigger a slow tool, observe status messages in Claude Desktop UI
  8. Verify audit events: SELECT * FROM mcp_audit_log ORDER BY created_at DESC LIMIT 20 shows the recent tool calls with correct metadata
  9. Realistic multi-turn workflow: ask 3-4 follow-up questions about mom's novel, confirm Claude Desktop uses tool results to build the conversation
  10. Mark wedge gate PASSED → proceed to EPIC-4

Risks

  • @modelcontextprotocol/sdk Streamable HTTP support maturity. Verify the SDK version we use supports it cleanly; if not, may need to implement transport manually. Mitigation: fallback to writing a thin transport layer over the SDK's lower-level primitives.
  • Claude Desktop MCP client config format for Streamable HTTP may differ from stdio. Check Anthropic's MCP docs for current Claude Desktop config schema. Mitigation: test with a known-working remote MCP server first if our format is unclear.
  • OAuth shim must be replaceable with real Cognito without rewrites. Mitigation: design auth layer as a verifyToken(token: string): Promise<TokenClaims> interface; dev impl uses HS256, prod impl uses Cognito JWKS.
  • Port conflict on localhost:8080. Mitigation: make port configurable via env var; switch the default to 8181 if 8080 conflicts turn out to be common.
  • Performance of getKbScopeForConnector on every tool call. Sub-ms DB lookup with index should be fine, but worth measuring. Mitigation: in-memory cache with 60s TTL if it becomes a hotspot.
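
If the scope lookup does become a hotspot, the 60s TTL cache mentioned above might look like this sketch. TtlCache and cachedScope are hypothetical names, and load stands in for getKbScopeForConnector; the injectable clock exists only to make expiry testable.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Read-through wrapper around the (assumed async) scope helper.
async function cachedScope(
  cache: TtlCache<string[]>,
  connectorId: string,
  load: (connectorId: string) => Promise<string[]>,
): Promise<string[]> {
  const hit = cache.get(connectorId);
  if (hit !== undefined) return hit;
  const scope = await load(connectorId);
  cache.set(connectorId, scope);
  return scope;
}
```

One trade-off to weigh before adopting this: a revoked connector or deleted library could stay in scope for up to the TTL, so the cache weakens the defense-in-depth guarantee unless revocation also evicts the entry.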

Definition of done

  • MCP server running at localhost:8080 on Streamable HTTP (MCP spec 2025-03-26+) via Hono + pinned @modelcontextprotocol/sdk
  • Mcp-Session-Id generated, returned, and honored on subsequent requests
  • pnpm dev:make-token CLI script works (reads DEV_USER_ID from .env.dev, outputs JWT)
  • pnpm dev:seed-connector CLI script works (creates connector, computes argon2 hash, prints credentials)
  • OAuth scaffold validates bearer tokens (dev shim with swap-able interface for Cognito JWKS in EPIC-4)
  • Connector resolution + defense-in-depth check works (user_id match, revoked check, deleted-library handling)
  • connectors.last_used_at updated on every successful auth
  • All 4 tools implemented: search_knowledge_base, lookup_section, list_knowledge_bases, get_document
  • Scope enforcement: tool calls return only KBs in connector's library
  • Buffer + progress notifications pattern working for slow tools (status messages visible in Claude Desktop)
  • Audit events written for every tool call (success + failure) to mcp_audit_log with correct metadata
  • Error response format: 401/403/404/500 HTTP status codes for auth and transport failures; JSON-RPC error envelope per MCP spec for tool-level failures
  • Unit tests pass for scope enforcement edge cases + audit event writing
  • Claude Desktop installs the connector, lists tools, queries successfully
  • Wedge gate (beta-ready definition):
    • All 4 tools called successfully end-to-end
    • Scope enforcement test passed (cross-library leak attempt blocked)
    • One realistic multi-turn workflow completed (3-4 follow-up questions about mom's novel)
    • Progress notifications visible during slow tools
    • Audit events written + queryable
    • → "Yes, MCP-into-Claude-Desktop works." Proceed to EPIC-4.

Notes / open questions

Locked in this triage pass (2026-05-19):

  • HTTP framework: Hono
  • MCP SDK: @modelcontextprotocol/sdk (Anthropic official), pinned version supporting Streamable HTTP
  • Dev shim auth: HS256 JWT, signs sub claim with DEV_USER_ID from .env.dev
  • Dev token issuance: pnpm dev:make-token CLI
  • Dev connector seeding: pnpm dev:seed-connector CLI
  • Streaming strategy: buffer + progress notifications via notifications/progress
  • connectors.last_used_at write on every successful auth (owned by EPIC-3)
  • Tool-call audit events written to mcp_audit_log (shared schema from EPIC-2)
  • Library deletion handled implicitly by getKbScopeForConnector joining through library_kbs (defense-in-depth)
  • Local HTTPS: try plain HTTP first; mkcert fallback if Claude Desktop rejects
  • Error format: 401/403/404/500 HTTP status codes for auth and transport failures + JSON-RPC error envelope per MCP spec for tool-level failures
  • Wedge gate: beta-ready definition (all 4 tools + scope + workflow + progress notifications + audit events)
  • CORS: not needed (Claude Desktop is not a browser client)
  • Mid-stream chunk streaming: out of scope (Claude Desktop buffers anyway)

Still open (low-risk, decide during implementation):

  • Should the MCP server live in mcp-servers/doc-search (current location) or be promoted to a top-level mcp-servers/autri package? Current path is fine; rename if it becomes confusing.
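  • JWT expiry duration in dev: lean 24h (low friction for local dev). Production will be shorter (target 15 min); Cognito's default access-token lifetime is 60 minutes, so the shorter value has to be configured explicitly on the app client.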
  • Tool naming convention: search_knowledge_base (snake_case) per MCP examples. Already in scope; just noting.
