Foundry

Epic: Consumable API + API-Key Auth

Roadmap item 2 of brehob-launch · the seam both QuoteAI and dev-memory consume · lands between items 1 and 4 · → North Star B3/B6 (DB-4).

Status: red/blue-team COMPLETE 2026-06-18 (session 4) — two independent red-team agents + interactive blue-team triage; resolutions folded in below. CLEARED TO BUILD. Supersedes the 6/11 S1–S5 sketch. Grounded against the real repo — the libraries+connectors RBAC spine and the @autri/retrieval operators already exist, so this epic extends them, it does not greenfield.

Objective

Autri's first external programmatic surface: retrieve from and write to knowledge bases over HTTP, authenticated by a static API key, library-scoped from day one. It is the seam the /stop dev-memory dogfood (write transcripts + recall prior decisions via the CLI) and QuoteAI consume, and the surface a future MCP server wraps without a rewrite. Guard: REST + static API keys ONLY — MCP stays cut (D56); the MCP wedge returns post-go-live on top of this same surface.

What already exists (extend, don't build)

Verified 2026-06-18 against the repo:

  • RBAC spine: libraries / library_kbs / library_access (org-scoped; a settings UI already creates libraries, adds KBs, grants access). db/migrations/006_library_connector.sql, app/lib/data/libraries.ts.
  • A credential-bound-to-a-library primitive: connectors, with issue/revoke, hashed secret, last_used_at, revoked_at, audit log (mcp_audit_log). It is Cognito OAuth client-credentials (JWT bearer), verified in mcp-servers/doc-search/src/auth.ts + server.ts. Note (red-team): connector issuance is CLI/script-only — there is NO connector settings UI to "sit beside"; the only settings UI is for libraries.
  • The retrieval operators, cleanly factored: vector_search / fts_search / lookup_section / filter_then_rank with Zod input schemas + a typed EnrichedHit output, plus the MCP-only list_knowledge_bases / get_document. retrieval/src/agent/tool-defs.ts, retrieval/src/types.ts, retrieval/src/agent/types.ts. Note (red-team): the operators take an OPTIONAL knowledgeBaseId and fan out across all in-scope KBs when omitted (tool-defs.ts per-KB loop); filter_then_rank is a per-KB factory keyed to the KB's declared attribute schema, not a static contract.
  • The scope chokepoint: getKbScopeForConnector(connectorId) → kbIds (one INNER JOIN via library_kbs), called on every live MCP tool dispatch (retrieval/src/scope.ts, server.ts). Do NOT mutate it (CA-9).
  • The write surface: presigned-upload (app/app/api/kb/[kbId]/upload-url/route.ts, creates a pending doc + fires the S3→SQS→prep pipeline; accepts pdf/docx/xlsx/xls only) and atomic document-delete (app/app/api/kb/[kbId]/documents/[docId]/route.ts + app/lib/documents/delete-document.ts). Both are Cognito-SESSION-coupled (getCurrentUser() → redirect() on no session; scope by user.organizationId) — so reuse = refactor, not call (CA-9).

Decisions (locked 2026-06-18, session 4; CA-8…CA-11 added by red/blue-team)

  • CA-1 — Static API keys, a new api_keys table beside connectors; NOT the Cognito-OAuth connector. Format autri_sk_<keyid>.<secret> (256-bit secret; delimiter . so it can't collide with the base64url key alphabet — red-team L2); store SHA-256(secret) + the non-secret keyid lookup handle + a display last4; verify by parse-keyid → fetch → hash → constant-time compare → check revoked/expired; show the full key once. Why: D56 pivoted to API-key auth precisely to escape the OAuth/token-exchange path; a static bearer is curl- and skill-simple. Hash: SHA-256 — high-entropy keys aren't brute-forceable, so a slow hash (argon2) only adds ~50–100ms/call for no benefit; SHA-256 + keyid-lookup is the per-request standard (Stripe/GitHub-style).
  • CA-2 — Access: read is library-wide, write is a per-KB allowlist on the key (default empty = read-only), re-intersected at request time. A key grants read to every KB in its bound libraries; write only to the explicit KB(s) pinned on it. The write set is recomputed every request as api_key_write_kbs ∩ (current read set) — the stored allowlist is a restriction, never a standing grant, so a KB leaving the key's library immediately drops write (red-team H1; mirrors scope.ts's recompute-every-call discipline). Why: the library is the access unit (D13), but a write-capable key carries least privilege so it can't mutate a shared KB — the /stop key writes ONLY the dev-memory KB.
  • CA-3 — Surface = read building blocks + document CRUD; NO black-box "ask a question" endpoint in v1. Expose the read operators + list_knowledge_bases + get_document + document CRUD + ingest-status. Red-team hardening: knowledgeBaseId is REQUIRED on every read endpoint (no implicit cross-KB fan-out over HTTP — bounds cost/latency, prevents an amplification DoS on the embedding bill); responses are an explicit /v1 DTO, not the raw internal EnrichedHit (so a retrieval refactor can't silently break the public contract); filter_then_rank over REST is DEFERRED to v1.1 (it's a per-KB factory needing an attribute-discovery endpoint; dev-memory recall doesn't need numeric filtering). Shapes still mirror tool-defs.ts so a future MCP server is a thin wrapper.
  • CA-4 — Lives at durable /api/v1/* routes in the existing Next.js app, calling retrieval in-process; NOT server actions. Why: server-action IDs drift per build (stale consumers 404); API routes survive deploys. Reuse the in-process @autri/retrieval behind a clean module boundary so it can split out later.
  • CA-5 — Consumption = one typed client + a thin CLI bin over it + a skill that bundles the CLI. Why: the most seamless way for AI agents (Claude Code sessions + the /stop and /start loop) to use the API without MCP — and the dev-memory recall path reads over HTTP via this CLI. QuoteAI/the hook import the same client. Cut (blue-team): "reference docs for a cold external consumer" shrinks to a 1-page README (no external consumer in the window).
  • CA-6 — Upload via the presigned-URL flow (reuse the web path). Why: respects the 6MB Lambda request cap (bytes stream S3-direct) and reuses the pipeline trigger. Scope (blue-team): the write seam is proven with an already-supported type (docx/pdf); markdown upload + transcript ingestion ride the dev-memory epic (item 3), not this one.
  • CA-7 — Shared-KB write semantics acknowledged; KB-ownership isolation deferred. A library is an access lens, not a copy: a write to a KB shared into two libraries is visible in both (intra-org, by design); cross-org is hard-isolated by D13 and cannot happen. True per-library isolation is a future model change — revisit only if real overlapping-collaborator editing appears.
  • CA-8 (red-team C1) — /api/v1/* gets its OWN CloudFront behavior that forwards the credential header. The Main Lambda's origin-request policy is a 10-header allowlist at CloudFront's cap and deliberately excludes Authorization (autri-infra/lib/web/cdn.ts) — the exact class of strip that bit us with Next-Action. Left as-is, the key never reaches the Lambda (401 for everyone in prod, green locally). A dedicated /api/v1/* behavior + origin-request policy forwards the credential (and lets us set API-appropriate CORS). This is S0 — before any endpoint code; the prod gate is a real cloud request, not a local pass.
  • CA-9 (red-team C2/H5) — "Reuse" = share an inner scope-parameterized core + a SIBLING resolver; never call the session handlers or mutate the connector resolver. Refactor the org/KB-access check out of upload-url/delete-document into a core that takes a resolved {organizationId, readKbs, writeKbs} scope (the session route passes session-derived scope; the API route passes key-derived scope), and add a new getKbScopeForApiKey resolver (reading api_key_libraries → library_kbs + the CA-2 write-intersect) leaving getKbScopeForConnector byte-identical (it's the live MCP security chokepoint). Exclude /api/v1/* from the NextAuth middleware redirect (the x-origin-secret check still guards it).
  • CA-10 (red-team H2) — Per-key rate-limit IS in v1, as a Postgres fixed-window counter (no new infra). A (api_key_id, window) row, atomic INSERT … ON CONFLICT DO UPDATE … RETURNING count, checked before the embedding call → 429 over cap. Correct under concurrency (unlike an in-memory counter); generous default + nullable per-key rate_limit_per_min override. Why: each vector_search is a billable OpenAI call on one shared key; a runaway consumer must be brakeable. Fixed-window's edge-burst is acceptable for "stop an accident." (An in-memory guard would be a no-op under concurrency — that's the thing we're NOT doing.)
  • CA-11 (blue-team) — Build a PRIMITIVE key-issuance settings UI now (rough draft, iterate later). List keys · issue (show-once-secret modal) · revoke · write-KB allowlist picker (constrained to the key's readable set). It's net-new frontend (no connector UI exists to extend), but it's needed eventually and a rough version now beats a CLI-only stopgap.
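
The CA-1 key lifecycle (generate, parse keyid, fetch, hash, constant-time compare, check revoked/expired) can be sketched roughly as below. This is a minimal illustration, not the epic's implementation: the ApiKeyRow shape, the function names, the injected lookup callback, and the 12-byte keyid length are all assumptions; only the autri_sk_<keyid>.<secret> format, the 256-bit secret, SHA-256, and the check order come from CA-1.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical row shape: only the columns the verify path needs.
interface ApiKeyRow {
  keyId: string;
  secretHash: Buffer; // SHA-256(secret), 32 bytes
  revokedAt: Date | null;
  expiresAt: Date | null;
}

const KEY_PREFIX = "autri_sk_";

// Generate a key as autri_sk_<keyid>.<secret>. The full key is returned once
// and never stored; only the hash, the keyid, and last4 would be persisted.
function generateKey(): { fullKey: string; row: ApiKeyRow; last4: string } {
  const keyId = randomBytes(12).toString("base64url"); // length is an assumption
  const secret = randomBytes(32).toString("base64url"); // 256-bit secret (CA-1)
  const secretHash = createHash("sha256").update(secret).digest();
  return {
    fullKey: `${KEY_PREFIX}${keyId}.${secret}`,
    row: { keyId, secretHash, revokedAt: null, expiresAt: null },
    last4: secret.slice(-4),
  };
}

// base64url never contains ".", so splitting on the first "." is unambiguous.
function parseKey(fullKey: string): { keyId: string; secret: string } | null {
  if (!fullKey.startsWith(KEY_PREFIX)) return null;
  const rest = fullKey.slice(KEY_PREFIX.length);
  const dot = rest.indexOf(".");
  if (dot <= 0 || dot === rest.length - 1) return null;
  return { keyId: rest.slice(0, dot), secret: rest.slice(dot + 1) };
}

// Verify in the CA-1 order: parse keyid, fetch, hash, compare, then revoked/expired.
function verifyKey(
  fullKey: string,
  lookup: (keyId: string) => ApiKeyRow | undefined,
  now: Date = new Date(),
): ApiKeyRow | null {
  const parsed = parseKey(fullKey);
  if (parsed === null) return null;
  const row = lookup(parsed.keyId);
  if (row === undefined) return null;
  const candidate = createHash("sha256").update(parsed.secret).digest();
  // timingSafeEqual throws on unequal lengths, so guard first.
  if (candidate.length !== row.secretHash.length) return null;
  if (!timingSafeEqual(candidate, row.secretHash)) return null;
  if (row.revokedAt !== null) return null;
  if (row.expiresAt !== null && row.expiresAt.getTime() <= now.getTime()) return null;
  return row;
}
```

In a real route the lookup would be an indexed query on key_id rather than a callback, and every failure branch would collapse into the same 401 so the response does not reveal which check failed.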

Data model (new)

  • api_keys: id, key_id (unique, indexed lookup handle), secret_hash (sha256), last4, name, organization_id (denormalized for the org assert), rate_limit_per_min (nullable → default), created_at, last_used_at, revoked_at, expires_at (nullable).
  • api_key_libraries — (api_key_id, library_id); the libraries the key may read. Many-to-many → "one or more libraries."
  • api_key_write_kbs — (api_key_id, knowledge_base_id); the KBs the key may write. Re-intersected with the live read set each request (CA-2).
  • api_key_usage — per-key call log (api_key_id, ts, endpoint, kb_id, embedding_cost_usd, latency_ms) for COGS attribution + the fixed-window rate counter (CA-10).
  • New numbered migration (the db-migrate Lambda hard-errors on checksum changes to applied files); additive/nullable.
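
To make the CA-2 re-intersect concrete, here is a small sketch of how the read and write sets could be derived from these tables on each request. It runs over in-memory arrays purely for illustration; the row interfaces and the resolveKeyScope name are hypothetical, and the real resolver (getKbScopeForApiKey) would presumably do this in SQL.

```typescript
// Hypothetical in-memory stand-ins for the join tables described above.
interface ApiKeyLibraryRow { apiKeyId: string; libraryId: string }
interface LibraryKbRow { libraryId: string; knowledgeBaseId: string }
interface ApiKeyWriteKbRow { apiKeyId: string; knowledgeBaseId: string }

interface ScopeTables {
  apiKeyLibraries: ApiKeyLibraryRow[];
  libraryKbs: LibraryKbRow[];
  apiKeyWriteKbs: ApiKeyWriteKbRow[];
}

interface KeyScope { readKbs: Set<string>; writeKbs: Set<string> }

// Recomputed on every request: read = all KBs in the key's libraries,
// write = stored allowlist ∩ current read set (a restriction, never a grant).
function resolveKeyScope(apiKeyId: string, tables: ScopeTables): KeyScope {
  const libraryIds = new Set<string>();
  for (const row of tables.apiKeyLibraries) {
    if (row.apiKeyId === apiKeyId) libraryIds.add(row.libraryId);
  }

  const readKbs = new Set<string>();
  for (const row of tables.libraryKbs) {
    if (libraryIds.has(row.libraryId)) readKbs.add(row.knowledgeBaseId);
  }

  const writeKbs = new Set<string>();
  for (const row of tables.apiKeyWriteKbs) {
    // An allowlisted KB that has left the key's libraries silently drops out here.
    if (row.apiKeyId === apiKeyId && readKbs.has(row.knowledgeBaseId)) {
      writeKbs.add(row.knowledgeBaseId);
    }
  }
  return { readKbs, writeKbs };
}
```

Because the intersection is computed rather than stored, removing a KB from a library needs no cleanup of api_key_write_kbs for the write grant to disappear.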

Surface (endpoints — all under /api/v1, credential via the CA-8 forwarded header)

Retrieval (read; knowledgeBaseId REQUIRED; return a /v1 DTO):

  • POST /retrieve/vector · POST /retrieve/fts · POST /retrieve/lookup. (filter_then_rank → v1.1, CA-3.)
  • GET /kbs (→ list_knowledge_bases, key-scoped) · GET /documents/{id} (→ get_document, optional includeChunks; scope-filtered → 404 if out of read scope).
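
As an illustration of the two read-side rules (knowledgeBaseId required, and an explicit /v1 DTO instead of the raw internal hit), a request parser and a DTO mapper might look like the following. Everything named here is an assumption: the real EnrichedHit fields are not listed in this document, so InternalHit is a stand-in, and the HitV1 fields, the limit bounds, and the function names are hypothetical. The production routes would more likely reuse the existing Zod schemas; plain checks are used here only to keep the sketch dependency-free.

```typescript
// Stand-in for the internal hit type; the real EnrichedHit shape is not shown here.
interface InternalHit {
  chunkId: string;
  documentId: string;
  text: string;
  score: number;
  embedding?: number[]; // example of an internal-only field that must not leak
}

// Hypothetical public DTO: only fields the /v1 contract commits to.
interface HitV1 { documentId: string; chunkId: string; text: string; score: number }

interface VectorRequestV1 { knowledgeBaseId: string; query: string; limit: number }

type ParseResult =
  | { ok: true; value: VectorRequestV1 }
  | { ok: false; status: 422; message: string };

const DEFAULT_LIMIT = 10; // assumption
const MAX_LIMIT = 50; // assumption

function invalid(message: string): ParseResult {
  return { ok: false, status: 422, message };
}

// Reject any read request that omits knowledgeBaseId: no implicit cross-KB fan-out.
function parseVectorRequest(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) return invalid("body must be a JSON object");
  const fields = body as Record<string, unknown>;
  const knowledgeBaseId = fields["knowledgeBaseId"];
  const query = fields["query"];
  const rawLimit = fields["limit"];

  if (typeof knowledgeBaseId !== "string" || knowledgeBaseId.length === 0) {
    return invalid("knowledgeBaseId is required");
  }
  if (typeof query !== "string" || query.trim().length === 0) {
    return invalid("query is required");
  }
  let limit = DEFAULT_LIMIT;
  if (rawLimit !== undefined) {
    if (typeof rawLimit !== "number" || !Number.isInteger(rawLimit) || rawLimit < 1 || rawLimit > MAX_LIMIT) {
      return invalid(`limit must be an integer between 1 and ${MAX_LIMIT}`);
    }
    limit = rawLimit;
  }
  return { ok: true, value: { knowledgeBaseId, query, limit } };
}

// Copy fields one by one so a new internal field never reaches the wire by accident.
function toHitV1(hit: InternalHit): HitV1 {
  return { documentId: hit.documentId, chunkId: hit.chunkId, text: hit.text, score: hit.score };
}
```

The field-by-field copy is the point of the DTO: spreading the internal object would re-couple the public contract to the retrieval package.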

Documents (CRUD; write-gated by the re-intersected allowlist; by-id endpoints 404 on out-of-scope, not 403):

  • GET /kbs/{kbId}/documents — list, cursor-paginated.
  • POST /kbs/{kbId}/upload-url — presign (shared core) → {documentId, uploadUrl, …}; Idempotency-Key dedupes the pending-doc creation; the consumer PUTs bytes to S3; the pipeline fires.
  • GET /documents/{id}/status — ingest status (pending → ingesting → ready/failed); new, and the dogfood's load-bearing endpoint.
  • DELETE /documents/{id} — atomic delete (shared core), write-scoped.

Errors: { error: { code, message } } with a defined code enum; 401 (bad/missing key), 403 (read-only on write), 404 (out of scope / not found), 409 (idempotency), 422 (validation), 429 (rate limit, Retry-After). CORS: server-to-server only (no Access-Control-Allow-*) — stated, not silent.
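
One way the error envelope could be typed is sketched below. The status codes and the { error: { code, message } } shape come from the line above; the code strings themselves are hypothetical, since the epic says a code enum will be defined but does not list its values.

```typescript
// Hypothetical code names; only the HTTP statuses are taken from the spec.
type ErrorCode =
  | "unauthorized"
  | "forbidden"
  | "not_found"
  | "idempotency_conflict"
  | "validation_failed"
  | "rate_limited";

const STATUS_BY_CODE: Record<ErrorCode, number> = {
  unauthorized: 401, // bad or missing key
  forbidden: 403, // read-only key on a write endpoint
  not_found: 404, // out of scope or genuinely missing
  idempotency_conflict: 409,
  validation_failed: 422,
  rate_limited: 429,
};

interface ErrorBody { error: { code: ErrorCode; message: string } }

interface ApiErrorResponse {
  status: number;
  headers: Record<string, string>;
  body: ErrorBody;
}

// Build the envelope; Retry-After is attached only to rate-limit responses.
function apiError(code: ErrorCode, message: string, retryAfterSeconds?: number): ApiErrorResponse {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (code === "rate_limited" && retryAfterSeconds !== undefined) {
    headers["Retry-After"] = String(Math.max(0, Math.ceil(retryAfterSeconds)));
  }
  return { status: STATUS_BY_CODE[code], headers, body: { error: { code, message } } };
}
```

Keeping the status in a single Record keyed by the code union means adding a code without a status fails to compile, which helps the typed client stay in step with the server.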

Auth & enforcement

  • A request helper parses the bearer key, verifies it (CA-1), resolves scope via getKbScopeForApiKey → {organizationId, readKbs, writeKbs} (CA-2/CA-9), enforces the CA-10 rate window, then runs the endpoint; last_used_at + the usage/audit row are fire-and-forget / post-response (never block or fail a good call). Org is the absolute boundary (D13).
  • Isolation negative-tests are inline DoD on S2 + each endpoint (not a trailing story): no read/write outside the key's libraries; no write outside the re-intersected allowlist; read-only key 403'd on write; cross-org denied regardless of libraries; and the dogfood's actual trust case — a key bound to library A gets 404/empty on the restricted dev-memory library's KBs in the same org (red-team M4).
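
The ordering described above (parse, verify, resolve scope, rate window, endpoint, then best-effort usage logging) can be sketched as a small wrapper. This is a simplified, synchronous illustration with injected dependencies so the order is easy to test; the real helper would be async, and the AuthDeps interface, withApiKey, kbAccessStatus, and the "Bearer <key>" header form are all assumptions rather than the epic's actual names.

```typescript
interface Scope { organizationId: string; readKbs: Set<string>; writeKbs: Set<string> }

// Hypothetical seams; each would wrap a DB call in the real helper.
interface AuthDeps {
  verifyKey(fullKey: string): { apiKeyId: string } | null;
  resolveScope(apiKeyId: string): Scope;
  withinRateLimit(apiKeyId: string): boolean;
  recordUsage(apiKeyId: string): void;
}

type AuthOutcome<T> = { status: 200; body: T } | { status: 401 | 429; body: null };

function withApiKey<T>(
  credentialHeader: string | undefined,
  deps: AuthDeps,
  handler: (scope: Scope) => T,
): AuthOutcome<T> {
  const match = /^Bearer\s+(\S+)$/.exec(credentialHeader ?? "");
  if (match === null) return { status: 401, body: null };

  const key = deps.verifyKey(match[1]);
  if (key === null) return { status: 401, body: null };

  const scope = deps.resolveScope(key.apiKeyId);

  // The rate window is checked before the endpoint runs, i.e. before any embedding call.
  if (!deps.withinRateLimit(key.apiKeyId)) return { status: 429, body: null };

  const body = handler(scope);

  // Usage logging is best-effort: a failing log sink must not fail a good call.
  try {
    deps.recordUsage(key.apiKeyId);
  } catch {
    // swallowed on purpose
  }
  return { status: 200, body };
}

// Out of read scope looks identical to "does not exist" (404);
// readable but not writable is an explicit 403.
function kbAccessStatus(scope: Scope, kbId: string, mode: "read" | "write"): 200 | 403 | 404 {
  if (!scope.readKbs.has(kbId)) return 404;
  if (mode === "write" && !scope.writeKbs.has(kbId)) return 403;
  return 200;
}
```

The isolation negative-tests map directly onto kbAccessStatus: a key bound to library A asking about a KB in the restricted library should see 404, never 403, so the response does not confirm the KB exists.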

Consumption

  • @autri/api-client — a thin typed client over the REST contract; the upload helper replays the presign's exact pinned Content-Type/Content-Length and surfaces S3 signature failures as a typed error (the iOS-pin bug class, red-team L2).
  • CLI bin (autri search|get|upload|status|delete) over the client; the upload command orchestrates presign→PUT→poll-status.
  • Skill bundling the CLI so agents call it via Bash; a 1-page README (external reference docs deferred).
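
The upload helper's "replay the pinned headers" behaviour could look something like the sketch below. It only builds and checks the PUT step, leaving the actual HTTP call to the caller. The contentType and contentLength fields on the presign response are assumptions (the spec only names documentId and uploadUrl), as are the class and function names. The 403 mapping relies on S3 answering a signature mismatch with HTTP 403.

```typescript
// Hypothetical presign response; only documentId and uploadUrl are named in the spec.
interface PresignResponse {
  documentId: string;
  uploadUrl: string;
  contentType: string; // assumed: the Content-Type pinned into the signature
  contentLength: number; // assumed: the Content-Length pinned into the signature
}

// Typed error so callers can tell a signature failure from other upload errors.
class UploadSignatureError extends Error {
  readonly documentId: string;
  readonly status: number;
  constructor(documentId: string, status: number) {
    super(`S3 rejected the upload for document ${documentId} (HTTP ${status})`);
    this.name = "UploadSignatureError";
    this.documentId = documentId;
    this.status = status;
  }
}

interface PutRequest {
  url: string;
  method: "PUT";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Replay exactly what the presign pinned; fail fast if the body cannot match.
function buildPutRequest(presign: PresignResponse, bytes: Uint8Array): PutRequest {
  if (bytes.byteLength !== presign.contentLength) {
    throw new RangeError(
      `body is ${bytes.byteLength} bytes but the presign pinned ${presign.contentLength}`,
    );
  }
  return {
    url: presign.uploadUrl,
    method: "PUT",
    headers: {
      "Content-Type": presign.contentType,
      "Content-Length": String(presign.contentLength),
    },
    body: bytes,
  };
}

// Translate the S3 response status into success or a typed failure.
function assertPutAccepted(presign: PresignResponse, status: number): void {
  if (status === 403) throw new UploadSignatureError(presign.documentId, status);
  if (status < 200 || status >= 300) throw new Error(`unexpected S3 status ${status}`);
}
```

Some HTTP clients compute Content-Length from the body and ignore or reject a manually set value, so the length check before sending is the part that matters most. The CLI's upload command would then call the status endpoint in a loop until the document reaches ready or failed.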

Work breakdown (dependency-ordered; status/write before read; tests inline)

  • S0 — CloudFront /api/v1/* behavior + credential forwarding (CA-8). The deploy gate; validated with a real cloud request.
  • S1 — Schema + key gen/verify + primitive issuance UI (load-bearing; hand-write). The migration; autri_sk generate; SHA-256+keyid store/verify (constant-time); the CA-11 settings UI.
  • S2 — Auth helper + scope resolver + shared scope-core refactor (load-bearing; hand-write). Bearer parse/verify; getKbScopeForApiKey (sibling, CA-9) with the CA-2 re-intersect; refactor upload-url/delete-document to the scope-parameterized core; middleware exclusion; CA-10 rate window. Isolation negative-tests land here.
  • S3 — Document write path (the dogfood's real need): upload(presign) / status / delete, write-gated. Built BEFORE read (red-team M1).
  • S4 — Read endpoints: vector / fts / lookup + kbs + documents/{id}, knowledgeBaseId-required, /v1 DTO; embedding-cost logged. Factor a single runRetrieval(toolName, input, scope) envelope in @autri/retrieval so chat/MCP/REST don't drift (red-team H3).
  • S5 — Typed client + CLI + skill + README.
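
For the CA-10 rate window that lands in S2, a minimal sketch of the fixed-window logic follows. The SQL string shows the kind of atomic upsert CA-10 describes; the table and column names in it (api_key_rate_windows, window_start, hits) are hypothetical, as are the default limit and the function names. The counter itself is injected as a callback so the window arithmetic can be exercised without a database; in production that callback would run the upsert against Postgres, not an in-process map.

```typescript
// Illustrative upsert; table and column names are assumptions, not the real schema.
const UPSERT_WINDOW_SQL = `
  INSERT INTO api_key_rate_windows (api_key_id, window_start, hits)
  VALUES ($1, $2, 1)
  ON CONFLICT (api_key_id, window_start)
  DO UPDATE SET hits = api_key_rate_windows.hits + 1
  RETURNING hits`;

const WINDOW_MS = 60 * 1000;
const DEFAULT_LIMIT_PER_MIN = 120; // assumption: the spec only says "generous default"

// Stand-in for running UPSERT_WINDOW_SQL and reading back the new count.
type IncrementFn = (apiKeyId: string, windowStartMs: number) => number;

interface RateDecision {
  allowed: boolean;
  count: number;
  retryAfterSeconds: number; // seconds until the current window closes
}

// Floor the timestamp to the start of its one-minute window.
function windowStartMs(nowMs: number): number {
  return Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
}

// Increment first, then compare: the atomic counter is the single source of truth.
function checkRateWindow(
  apiKeyId: string,
  rateLimitPerMin: number | null, // nullable per-key override
  nowMs: number,
  increment: IncrementFn,
): RateDecision {
  const limit = rateLimitPerMin ?? DEFAULT_LIMIT_PER_MIN;
  const start = windowStartMs(nowMs);
  const count = increment(apiKeyId, start);
  const retryAfterSeconds = Math.ceil((start + WINDOW_MS - nowMs) / 1000);
  return { allowed: count <= limit, count, retryAfterSeconds };
}
```

Incrementing before comparing means rejected calls also count toward the window, which is acceptable for a brake against runaway consumers and keeps the check to a single round trip.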

Acceptance

  • Isolation negative-tests green (incl. the restricted-library denial) · a write smoke issues a write-scoped key → uploads a supported-type doc → polls /status to ready · a read smoke retrieves through the API with parity to in-process · the rate-limit returns 429 over cap · Authorization/credential reaches the Lambda in deployed prod (CA-8), not just locally.

Out of scope (v1)

filter_then_rank over REST + its attribute-discovery endpoint (→ v1.1) · the dev-memory ingestion pipeline incl. markdown upload (its own epic — this proves the seam) · external-consumer reference docs (README only) · admin-over-API (membership management stays in the UI) · the MCP server (post-go-live wrapper) · multipart/streaming upload · KB-ownership isolation (CA-7).

References

DB-4 (consumable API, dogfood-first), D56 (MCP cut → API-key auth — this epic IS that), D13 (row-level org tenancy). Prior art: mcp-servers/doc-search/src/{server,auth,tools}.ts (verify + scope + the runTool envelope to factor), retrieval/src/scope.ts (the resolver to clone, not mutate), retrieval/src/agent/tool-defs.ts (the operator contract to mirror), app/app/api/kb/[kbId]/upload-url/route.ts + app/lib/documents/delete-document.ts (session-coupled write surface to refactor into a shared core), autri-infra/lib/web/cdn.ts (the CloudFront header allowlist — CA-8). Full session record: decisions.md 2026-06-18 (session 4).
