EPIC-3: MCP Server (Streamable HTTP)

Drafted 2026-05-19. Beta-sprint epic 3 of 5. Sequencing: Week 1 Days 3-4 + 7 (local-first). Runs in parallel with EPIC-1; depends on EPIC-2 schema.

Goal

Adapt mcp-servers/doc-search from stdio transport to Streamable HTTP transport (MCP spec 2025-03-26+), with OAuth 2.1 scope enforcement via the Library/Connector model. End state: Dan can install a local MCP connector into Claude Desktop and query his novel KB with proper scope enforcement.

Why this epic exists

This is the "does the MCP-as-infrastructure wedge work?" gate. If Claude Desktop can install our local MCP connector and query a real KB with proper scope, the entire product thesis is validated end-to-end before we put a dollar into AWS deployment.

Scope (in)

Transport:

Stdio → Streamable HTTP (MCP spec 2025-03-26+) — required by AgentCore Runtime per the research finding
Local dev server listens on localhost:8080/c/{connectorId}/mcp (POST for JSON-RPC; text/event-stream for streaming responses + progress notifications)
Production endpoint will be mcp.autri.ai/c/{connectorId}/mcp (AgentCore Runtime convention)
Mcp-Session-Id semantics: server generates the ID when it handles the first (initialize) request and returns it in a response header; the client must echo it on every subsequent request, and the ID stays fixed for the life of the session
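
The session-ID rule above can be sketched as a small resolver. This is a minimal illustration, not the SDK's implementation: the in-memory `sessions` set and the `resolveSession` name are hypothetical. The 400/404 responses follow the Streamable HTTP transport guidance for a missing or unknown session ID, and should be checked against the pinned spec revision.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory registry; a real server would also expire entries.
const sessions = new Set<string>();

type SessionResult =
  | { ok: true; sessionId: string; created: boolean }
  | { ok: false; status: 400 | 404 };

// Resolve the session for one incoming JSON-RPC request.
// - "initialize": mint a new ID, to be returned in the Mcp-Session-Id response header.
// - anything else: the client must echo an ID this server issued earlier.
function resolveSession(method: string, headerValue: string | undefined): SessionResult {
  if (method === "initialize") {
    const sessionId = randomUUID();
    sessions.add(sessionId);
    return { ok: true, sessionId, created: true };
  }
  if (headerValue === undefined) return { ok: false, status: 400 }; // header missing
  if (!sessions.has(headerValue)) return { ok: false, status: 404 }; // unknown or ended session
  return { ok: true, sessionId: headerValue, created: false };
}
```

If the pinned SDK's Streamable HTTP transport already manages session IDs, this logic belongs to the SDK and the sketch only documents the expected behavior.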

HTTP framework + MCP SDK:

HTTP framework: Hono (lightweight, modern, edge-runtime friendly if we ever go that way)
MCP SDK: @modelcontextprotocol/sdk (Anthropic official) — pin specific version supporting Streamable HTTP; document version in package.json + EPIC-3 notes
Local HTTPS: try plain HTTP first (Day 3 verify against Claude Desktop). If Claude Desktop rejects, add mkcert + local CA (~30 min). Don't pre-build complexity.

OAuth 2.1 + PKCE scaffold:

Local dev: JWT shim for fast iteration — HS256 with shared secret in .env.dev
Dev user mapping: shim reads DEV_USER_ID=<uuid> from .env.dev (Dan sets this after EPIC-2 backfill assigns him a user_id). Shim signs JWTs with that sub claim.
Token issuance: pnpm dev:make-token CLI script outputs a JWT to stdout. Dan copies, pastes into Claude Desktop config.
Production-ready interface: swappable auth layer that becomes Cognito JWKS verification in EPIC-4. Interface: verifyToken(token: string): Promise<TokenClaims>; the dev impl uses HS256, the prod impl uses Cognito JWKS.
Token validation middleware: JWKS cache (prod) / shared-secret verify (dev), expiry check, audience check
Connector resolution from URL path segment {connectorId} (defense-in-depth check: connector.user_id === token.sub AND connector.revoked_at IS NULL)
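
One way the dev shim could sit behind the verifyToken interface is sketched below, using only node:crypto so no JWT library is assumed. The `TokenClaims` fields, the `signDevToken` and `createDevVerifier` names, and the audience value used in the usage example are assumptions for illustration; the real project may well use a JWT library instead.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed claim set; the real TokenClaims may carry more fields.
interface TokenClaims { sub: string; aud: string; exp: number } // exp in seconds since epoch

interface TokenVerifier { verifyToken(token: string): Promise<TokenClaims> }

const b64url = (buf: Buffer): string => buf.toString("base64url");
const hmacSha256 = (data: string, secret: string): Buffer =>
  createHmac("sha256", secret).update(data).digest();

// Roughly what a dev:make-token script would do: sign claims with the shared secret.
function signDevToken(claims: TokenClaims, secret: string): string {
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const payload = b64url(Buffer.from(JSON.stringify(claims)));
  const signature = b64url(hmacSha256(`${header}.${payload}`, secret));
  return `${header}.${payload}.${signature}`;
}

// Dev implementation of the swappable interface; the prod one would verify against JWKS.
function createDevVerifier(
  secret: string,
  audience: string,
  now: () => number = () => Date.now(),
): TokenVerifier {
  return {
    async verifyToken(token: string): Promise<TokenClaims> {
      const parts = token.split(".");
      if (parts.length !== 3) throw new Error("malformed token");
      const [header, payload, signature] = parts as [string, string, string];
      const parsedHeader = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
      if (parsedHeader.alg !== "HS256") throw new Error("unexpected alg");
      const expected = hmacSha256(`${header}.${payload}`, secret);
      const actual = Buffer.from(signature, "base64url");
      // Length check first: timingSafeEqual throws on unequal lengths.
      if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
        throw new Error("bad signature");
      }
      const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as TokenClaims;
      if (claims.exp * 1000 <= now()) throw new Error("token expired");
      if (claims.aud !== audience) throw new Error("wrong audience");
      return claims;
    },
  };
}
```

Because callers only see `TokenVerifier`, the middleware would not need to change when the Cognito-backed implementation replaces this one.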

Scope enforcement at every tool call:

Use getKbScopeForConnector(connectorId) from EPIC-2 to constrain queries
Tool calls return ONLY results from KBs in the connector's library
Defense-in-depth: scope helper handles revoked connectors AND deleted libraries (returns [])
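
A minimal sketch of how a tool handler might apply the scope, assuming getKbScopeForConnector resolves to an array of KB IDs (its real signature lives in EPIC-2 and is not confirmed here). The `resolveSearchTargets` and `enforceScope` helpers and the `Chunk` shape are hypothetical.

```typescript
// Assumed result-row shape; only kbId matters for scoping.
interface Chunk { kbId: string; text: string }

// Decide which KBs a search may touch.
// An out-of-scope kb_id yields no targets rather than an error that would leak existence.
function resolveSearchTargets(scope: readonly string[], requestedKbId?: string): string[] {
  if (requestedKbId === undefined) return [...scope];
  return scope.includes(requestedKbId) ? [requestedKbId] : [];
}

// Second line of defense: drop any row from a KB outside the scope,
// even if the query layer was supposed to filter it already.
function enforceScope(chunks: readonly Chunk[], scope: readonly string[]): Chunk[] {
  const allowed = new Set(scope);
  return chunks.filter((chunk) => allowed.has(chunk.kbId));
}
```

With an empty scope (revoked connector or deleted library), both helpers return empty arrays, so the tool returns no results without needing a special case.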

connectors.last_used_at write:

Updated on every successful auth (before tool dispatch). Per EPIC-2 boundary: EPIC-3 owns this write.

Audit events for tool calls (to shared mcp_audit_log table from EPIC-2):

Every tool call writes an event with event_type='tool_call.{tool_name}', connector_id=<connectorId>, metadata={tool, kb_scope, query, result_count, status, latency_ms}
Failed tool calls write event_type='tool_call.{tool_name}.error' with metadata.error_kind populated
Estimated ~1 ms overhead per call (to be measured); critical for future audit dashboard, abuse detection, cost analysis
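
The event shape described above could be built by a single helper so success and failure paths stay consistent. This is a sketch under assumptions: the `ToolCallOutcome` input type, the `"ok"`/`"error"` status values, and the function name are illustrative, and the actual insert into mcp_audit_log is left to @autri/db.

```typescript
interface ToolCallOutcome {
  tool: string;
  connectorId: string;
  kbScope: string[];
  query: string | null;
  resultCount: number;
  latencyMs: number;
  errorKind?: string; // present only when the call failed
}

interface AuditEvent {
  event_type: string;
  connector_id: string;
  metadata: {
    tool: string;
    kb_scope: string[];
    query: string | null;
    result_count: number;
    status: "ok" | "error"; // assumed status vocabulary
    latency_ms: number;
    error_kind?: string;
  };
}

// Build the row to write; failures get the ".error" suffix and an error_kind.
function buildToolCallAuditEvent(outcome: ToolCallOutcome): AuditEvent {
  const errorKind = outcome.errorKind;
  const failed = errorKind !== undefined;
  return {
    event_type: failed ? `tool_call.${outcome.tool}.error` : `tool_call.${outcome.tool}`,
    connector_id: outcome.connectorId,
    metadata: {
      tool: outcome.tool,
      kb_scope: outcome.kbScope,
      query: outcome.query,
      result_count: outcome.resultCount,
      status: failed ? "error" : "ok",
      latency_ms: outcome.latencyMs,
      ...(errorKind !== undefined ? { error_kind: errorKind } : {}),
    },
  };
}
```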

Tool surface (v1):

search_knowledge_base(query: string, kb_id?: string) — vector + FTS hybrid search across library's KBs. Optional kb_id filter.
lookup_section(section_id: string) — direct lookup by section/rule ID.
list_knowledge_bases() — list KBs available in this connector's library.
get_document(doc_id: string) — fetch full doc metadata + chunks.

Response strategy: buffer + progress notifications

Run DB query, collect chunks, emit single tool result (buffered)
For slow tools (>1s estimated): emit notifications/progress mid-execution ("Searching novel KB...", "Found N chunks, formatting...")
Status messages appear in Claude Desktop's tool-call UI; LLM's reply still streams natively after the tool result lands
Implementation budget: 30 min per tool on top of the base buffer pattern
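
The buffer + progress pattern above might look like the following. The notification shape follows the MCP notifications/progress message (progressToken, progress, optional total and message); the `createProgressReporter` and `runBufferedSearch` names, the `send` callback, and the injected `fetchChunks` function are hypothetical stand-ins for the SDK transport and @autri/retrieval.

```typescript
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: string | number; progress: number; total?: number; message?: string };
}

interface ToolResult { content: { type: "text"; text: string }[] }

type Report = (message: string, total?: number) => void;

// Returns a reporter that emits increasing progress values.
// If the client sent no progressToken, progress is not requested and nothing is emitted.
function createProgressReporter(
  progressToken: string | number | undefined,
  send: (notification: ProgressNotification) => void,
): Report {
  let progress = 0;
  return (message, total) => {
    if (progressToken === undefined) return;
    progress += 1;
    send({
      jsonrpc: "2.0",
      method: "notifications/progress",
      params: { progressToken, progress, message, ...(total !== undefined ? { total } : {}) },
    });
  };
}

// Buffered tool body: status messages go out mid-execution, the result lands once.
async function runBufferedSearch(
  query: string,
  fetchChunks: (q: string) => Promise<string[]>,
  report: Report,
): Promise<ToolResult> {
  report("Searching novel KB...", 2);
  const chunks = await fetchChunks(query);
  report(`Found ${chunks.length} chunks, formatting...`, 2);
  return { content: [{ type: "text", text: chunks.join("\n\n") }] };
}
```

Keeping the reporter separate from the tool body is what makes the per-tool cost small: each tool only adds its own `report(...)` calls.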

Error response format:

HTTP status codes: 401 (missing/invalid/expired token), 403 (token valid but connector revoked / authz failure), 404 (connector not found), 500 (unexpected)
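JSON-RPC error envelope per MCP spec for protocol-level failures (unknown tool, invalid arguments); failures inside a tool's execution are reported in the tool result with isError: true, as the MCP tools spec describes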
Failed tool calls write audit events with error_kind for diagnostics
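
The status-code table above can be captured in one mapping so middleware and audit logging agree on error kinds. The `ErrorKind` names are hypothetical (the epic only fixes the HTTP codes), and the envelope helper uses the standard JSON-RPC 2.0 error object.

```typescript
// Hypothetical error_kind vocabulary; only the HTTP codes come from the epic.
type ErrorKind =
  | "missing_token"
  | "invalid_token"
  | "expired_token"
  | "connector_revoked"
  | "connector_user_mismatch"
  | "connector_not_found"
  | "unexpected";

function httpStatusFor(kind: ErrorKind): 401 | 403 | 404 | 500 {
  switch (kind) {
    case "missing_token":
    case "invalid_token":
    case "expired_token":
      return 401; // caller is not authenticated
    case "connector_revoked":
    case "connector_user_mismatch":
      return 403; // authenticated, but not allowed to use this connector
    case "connector_not_found":
      return 404;
    case "unexpected":
      return 500;
  }
}

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  id: string | number | null;
  error: { code: number; message: string; data?: unknown };
}

// Standard JSON-RPC 2.0 error envelope; error_kind rides along in `data` for diagnostics.
function jsonRpcError(
  id: string | number | null,
  code: number,
  message: string,
  errorKind?: ErrorKind,
): JsonRpcErrorResponse {
  return {
    jsonrpc: "2.0",
    id,
    error: { code, message, ...(errorKind !== undefined ? { data: { error_kind: errorKind } } : {}) },
  };
}
```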

Local end-to-end validation (Day 7):

Dev seed script: pnpm dev:seed-connector — creates a known connector with known credentials (computes argon2 hash, inserts row, prints connector_id + secret). Avoids hand-computing hashes.
Install local MCP connector config in Claude Desktop using seeded credentials + dev JWT from pnpm dev:make-token
Run each of the 4 tools, verify scope enforcement, verify progress notifications render, verify audit events written

Out of scope

AgentCore Runtime production deploy (EPIC-4)
Connector creation UI (EPIC-2 — this epic consumes EPIC-2's connectors)
Per-tool rate limiting (deferred to v1.1)
Audit logging dashboard UI (events written here AND in EPIC-2 — UI deferred to post-beta)
Token introspection caching (just use JWKS verification + DB lookup)
True mid-stream chunk streaming within tool results (Claude Desktop buffers tool results anyway — limited UX benefit; revisit if a real use case surfaces)
CORS configuration (Claude Desktop is not a browser client, so cross-origin rules do not apply; not needed for v1)

Dependencies

EPIC-2 — connectors, libraries, library_kbs, mcp_audit_log tables + getKbScopeForConnector helper must exist
Existing retrieval primitives in @autri/retrieval (vector-search, fts-search, lookup-section — already working)
@modelcontextprotocol/sdk version that supports Streamable HTTP transport (MCP spec 2025-03-26+) — verify + pin SDK version on Day 3
Hono framework — add to dependencies
node-argon2 (or @node-rs/argon2 per EPIC-2 risks) — already added by EPIC-2; reused here for verify
@autri/db for the connector lookup + audit-log write
Dan's user_id from the post-backfill DB — set as DEV_USER_ID in .env.dev

Deliverables

MCP server running locally on Streamable HTTP at localhost:8080
Working OAuth scaffold (dev shim) with swappable interface for production Cognito
All 4 tools implemented + tested
Claude Desktop config snippet documented in the epic notes
End-to-end demo: Dan queries his novel KB from Claude Desktop with proper scope enforcement

Implementation plan

Day 3 — Transport adaptation + dev tooling

Verify @modelcontextprotocol/sdk version supports Streamable HTTP (upgrade + pin if needed). Document pinned version.
Scaffold Hono server:
- POST /c/:connectorId/mcp for MCP RPC
- Response: JSON for short results, text/event-stream for streaming + progress notifications
- Mcp-Session-Id generation + return in response header
Route handler skeleton: parse JSON-RPC request, dispatch to tool by method name
Write pnpm dev:make-token CLI script: reads DEV_USER_ID from .env.dev, signs HS256 JWT, prints to stdout
Write pnpm dev:seed-connector CLI script: creates a connector with known credentials, computes argon2 hash, inserts row, prints connector_id + plaintext secret
Stub tool implementations returning placeholder data
Verify with raw curl against localhost:8080: tools/list returns the tool surface
Local HTTPS check: attempt Claude Desktop connection with plain HTTP. If rejected, add mkcert + local CA setup.
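
The Day 3 route handler skeleton (parse JSON-RPC, dispatch by method, stub tools) could start from something like the sketch below. It is framework-agnostic on purpose: the Hono handler would parse the body and pass it to `dispatch`. The input schemas, descriptions, and stub results are placeholders, and if the pinned SDK provides its own server and transport classes, those would replace this hand-rolled dispatch.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: string | number;
  method: string;
  params?: Record<string, unknown>;
}

type JsonRpcResponse =
  | { jsonrpc: "2.0"; id: string | number; result: unknown }
  | { jsonrpc: "2.0"; id: string | number; error: { code: number; message: string } };

interface ToolDescriptor {
  name: string;
  description: string;
  inputSchema: { type: "object"; properties: Record<string, { type: string }>; required: string[] };
}

const str = { type: "string" };

// v1 tool surface from this epic; schemas are a first guess, not final.
const toolSurface: ToolDescriptor[] = [
  { name: "search_knowledge_base", description: "Hybrid vector + FTS search across the library's KBs",
    inputSchema: { type: "object", properties: { query: str, kb_id: str }, required: ["query"] } },
  { name: "lookup_section", description: "Direct lookup by section/rule ID",
    inputSchema: { type: "object", properties: { section_id: str }, required: ["section_id"] } },
  { name: "list_knowledge_bases", description: "List KBs in this connector's library",
    inputSchema: { type: "object", properties: {}, required: [] } },
  { name: "get_document", description: "Fetch full doc metadata + chunks",
    inputSchema: { type: "object", properties: { doc_id: str }, required: ["doc_id"] } },
];

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Stub handlers returning placeholder data until Day 4 wires in real retrieval.
const stubHandlers = new Map<string, ToolHandler>();
toolSurface.forEach((tool) => {
  stubHandlers.set(tool.name, async () => ({
    content: [{ type: "text", text: `stub result from ${tool.name}` }],
  }));
});

async function dispatch(req: JsonRpcRequest): Promise<JsonRpcResponse> {
  const { id } = req;
  switch (req.method) {
    case "tools/list":
      return { jsonrpc: "2.0", id, result: { tools: toolSurface } };
    case "tools/call": {
      const name = String(req.params?.name ?? "");
      const handler = stubHandlers.get(name);
      if (!handler) {
        return { jsonrpc: "2.0", id, error: { code: -32602, message: `Unknown tool: ${name}` } };
      }
      const args = (req.params?.arguments ?? {}) as Record<string, unknown>;
      return { jsonrpc: "2.0", id, result: await handler(args) };
    }
    default:
      return { jsonrpc: "2.0", id, error: { code: -32601, message: `Method not found: ${req.method}` } };
  }
}
```

A full server also has to answer initialize and related lifecycle methods before a real client will call tools/list; those are omitted here for brevity.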

Day 4 — OAuth + scope enforcement + real tools + audit + last_used_at

OAuth middleware: extract Bearer token, verify JWT (HS256 shared-secret for dev), check expiry + audience, write connectors.last_used_at = NOW() on success
Token → user_id extraction from sub claim
Connector resolution: SELECT user_id, library_id, revoked_at FROM connectors WHERE id = ?
Defense-in-depth: assert connector.user_id === token.sub AND connector.revoked_at IS NULL → 403 if mismatch
Wire getKbScopeForConnector into each tool handler
Implement all 4 tools, each with buffer + progress notifications pattern:
- search_knowledge_base → calls @autri/retrieval's vector + FTS hybrid search, scoped to library's KBs
- lookup_section → direct section lookup, scoped check
- list_knowledge_bases → returns KBs from library_kbs join
- get_document → fetch doc + chunks, scope check
Audit event write for every tool call (success + failure)
Unit tests for scope enforcement: cross-library leak attempts, revoked connector access attempts, expired token attempts, deleted library returns empty scope
Unit test: audit event written correctly for success + failure paths
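
The Day 4 defense-in-depth check is small enough to isolate as a pure function, which also makes the scope-enforcement unit tests straightforward. The `ConnectorRow` shape mirrors the columns named in this epic; the function name, result type, and error-kind strings are hypothetical.

```typescript
// Columns this epic reads from the connectors table.
interface ConnectorRow {
  user_id: string;
  library_id: string;
  revoked_at: Date | null;
}

type AuthzResult =
  | { ok: true; libraryId: string }
  | { ok: false; status: 403 | 404; errorKind: string };

// connector === null means the lookup by ID returned no row.
function authorizeConnector(connector: ConnectorRow | null, tokenSub: string): AuthzResult {
  if (connector === null) {
    return { ok: false, status: 404, errorKind: "connector_not_found" };
  }
  if (connector.user_id !== tokenSub) {
    return { ok: false, status: 403, errorKind: "connector_user_mismatch" };
  }
  if (connector.revoked_at !== null) {
    return { ok: false, status: 403, errorKind: "connector_revoked" };
  }
  return { ok: true, libraryId: connector.library_id };
}
```

Returning 404 for a missing row but 403 for another user's connector does reveal that the ID exists; if that matters, both cases could return the same status.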

Day 7 — Local end-to-end validation (wedge gate)

Run pnpm dev:seed-connector to create a test connector with known credentials
Run pnpm dev:make-token to mint a dev JWT for Dan's user

Create Claude Desktop MCP config:

{
  "mcpServers": {
    "autri": {
      "url": "http://localhost:8080/c/{connectorId}/mcp",
      "headers": { "Authorization": "Bearer {jwt}" }
    }
  }
}

Restart Claude Desktop, verify it sees the autri tools
Run each tool with realistic queries:
- "What does Chapter 5 of mom's novel say about the locked door?" → search_knowledge_base
- "Show me FIA Technical Reg T-7.2" → lookup_section
- "What KBs do I have access to?" → list_knowledge_bases
- "Fetch document <doc_id>" → get_document
Verify scope enforcement: try a tool call with a connector ID owned by a different user → expect 403 (a nonexistent connector ID should return 404)
Verify progress notifications: trigger a slow tool, observe status messages in Claude Desktop UI
Verify audit events: SELECT * FROM mcp_audit_log ORDER BY created_at DESC LIMIT 20 shows the recent tool calls with correct metadata
Realistic multi-turn workflow: ask 3-4 follow-up questions about mom's novel, confirm Claude Desktop uses tool results to build the conversation
Mark wedge gate PASSED → proceed to EPIC-4

Risks

@modelcontextprotocol/sdk Streamable HTTP support maturity. Verify the SDK version we use supports it cleanly; if not, may need to implement transport manually. Mitigation: fallback to writing a thin transport layer over the SDK's lower-level primitives.
Claude Desktop MCP client config format for Streamable HTTP may differ from stdio. Check Anthropic's MCP docs for current Claude Desktop config schema. Mitigation: test with a known-working remote MCP server first if our format is unclear.
OAuth shim must be replaceable with real Cognito without rewrites. Mitigation: design auth layer as a verifyToken(token: string): Promise<TokenClaims> interface; dev impl uses HS256, prod impl uses Cognito JWKS.
Port conflict on localhost:8080. Mitigation: make the port configurable via env var, keeping 8080 as the default; switch the default to 8181 only if 8080 conflicts turn out to be common.
Performance of getKbScopeForConnector on every tool call. Sub-ms DB lookup with index should be fine, but worth measuring. Mitigation: in-memory cache with 60s TTL if it becomes a hotspot.
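
If the scope lookup does become a hotspot, the 60s TTL cache mentioned above could be a thin wrapper like this sketch. It assumes the scope helper resolves to an array of KB IDs; `withTtlCache` and the injectable `now` clock are hypothetical.

```typescript
// Assumed shape of the EPIC-2 helper: connector ID in, KB IDs out.
type ScopeLoader = (connectorId: string) => Promise<string[]>;

// Wrap a loader with a per-connector in-memory cache.
function withTtlCache(
  load: ScopeLoader,
  ttlMs: number = 60_000,
  now: () => number = () => Date.now(),
): ScopeLoader {
  const cache = new Map<string, { value: string[]; expiresAt: number }>();
  return async (connectorId) => {
    const hit = cache.get(connectorId);
    if (hit !== undefined && hit.expiresAt > now()) return hit.value;
    const value = await load(connectorId); // miss or expired: go to the DB
    cache.set(connectorId, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The trade-off is staleness: a library change or revocation may take up to the TTL to be reflected in the cached scope, so the per-request revoked_at check in connector resolution should stay uncached.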

Definition of done

Notes / open questions

Locked this triage pass (2026-05-19):

HTTP framework: Hono
MCP SDK: @modelcontextprotocol/sdk (Anthropic official), pinned version supporting Streamable HTTP
Dev shim auth: HS256 JWT, signs sub claim with DEV_USER_ID from .env.dev
Dev token issuance: pnpm dev:make-token CLI
Dev connector seeding: pnpm dev:seed-connector CLI
Streaming strategy: buffer + progress notifications via notifications/progress
connectors.last_used_at write on every successful auth (owned by EPIC-3)
Tool-call audit events written to mcp_audit_log (shared schema from EPIC-2)
Library deletion handled implicitly by getKbScopeForConnector joining through library_kbs (defense-in-depth)
Local HTTPS: try plain HTTP first; mkcert fallback if Claude Desktop rejects
Error format: 401/403/404/500 + JSON-RPC error envelope per MCP spec
Wedge gate: beta-ready definition (all 4 tools + scope + workflow + progress notifications + audit events)
CORS: not needed (Claude Desktop is not a browser client)
Mid-stream chunk streaming: out of scope (Claude Desktop buffers anyway)

Still open (low-risk, decide during implementation):

Should the MCP server live in mcp-servers/doc-search (current location) or be promoted to a top-level mcp-servers/autri package? Current path is fine; rename if it becomes confusing.
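JWT expiry duration in dev: lean 24h (low friction for local dev). Production will be shorter (15 min target); Cognito's default access-token lifetime is 60 minutes, so the shorter expiry must be configured explicitly on the app client.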
Tool naming convention: search_knowledge_base (snake_case) per MCP examples. Already in scope; just noting.
