Epic: Consumable API + API-Key Auth
Roadmap item 2 of brehob-launch · the seam both QuoteAI and dev-memory consume · lands between items 1 and 4 · → North Star B3/B6 (DB-4).
Status: red/blue-team COMPLETE 2026-06-18 (session 4): two independent red-team agents plus interactive blue-team triage; resolutions are folded in below. CLEARED TO BUILD. Supersedes the 6/11 S1–S5 sketch. Grounded against the real repo: the `libraries` + `connectors` RBAC spine and the `@autri/retrieval` operators already exist, so this epic extends them rather than building from scratch.
Objective
Autri's first external programmatic surface: retrieve from and write to knowledge bases over HTTP, authenticated by a static API key, library-scoped from day one. It is the seam the /stop dev-memory dogfood (write transcripts + recall prior decisions via the CLI) and QuoteAI consume, and the surface a future MCP server wraps without a rewrite. Guard: REST + static API keys ONLY — MCP stays cut (D56); the MCP wedge returns post-go-live on top of this same surface.
What already exists (extend, don't build)
Verified 2026-06-18 against the repo:
- RBAC spine: `libraries` / `library_kbs` / `library_access` (org-scoped; a settings UI already creates libraries, adds KBs, and grants access). `db/migrations/006_library_connector.sql`, `app/lib/data/libraries.ts`.
- A credential-bound-to-a-library primitive: `connectors` (issue/revoke, hashed secret, `last_used_at`, `revoked_at`, audit log in `mcp_audit_log`). It is Cognito OAuth client-credentials (JWT bearer), verified in `mcp-servers/doc-search/src/auth.ts` + `server.ts`. Note (red-team): connector issuance is CLI/script-only. There is NO connector settings UI to "sit beside"; the only settings UI is for libraries.
- The retrieval operators, cleanly factored: `vector_search` / `fts_search` / `lookup_section` / `filter_then_rank` with Zod input schemas and a typed `EnrichedHit` output, plus the MCP-only `list_knowledge_bases` / `get_document`. `retrieval/src/agent/tool-defs.ts`, `retrieval/src/types.ts`, `retrieval/src/agent/types.ts`. Note (red-team): the operators take an OPTIONAL `knowledgeBaseId` and fan out across all in-scope KBs when it is omitted (the per-KB loop in `tool-defs.ts`); `filter_then_rank` is a per-KB factory keyed to the KB's declared attribute schema, not a static contract.
- The scope chokepoint: `getKbScopeForConnector(connectorId) → kbIds` (one INNER JOIN via `library_kbs`), called on every live MCP tool dispatch (`retrieval/src/scope.ts`, `server.ts`). Do NOT mutate it (CA-9).
- The write surface: presigned upload (`app/app/api/kb/[kbId]/upload-url/route.ts`; creates a pending doc and fires the S3→SQS→prep pipeline; accepts pdf/docx/xlsx/xls only) and atomic document delete (`app/app/api/kb/[kbId]/documents/[docId]/route.ts` + `app/lib/documents/delete-document.ts`). Both are Cognito-SESSION-coupled (`getCurrentUser()` → `redirect()` on no session; scoped by `user.organizationId`), so reuse means refactor, not call (CA-9).
Decisions (locked 2026-06-18, session 4; CA-8…CA-11 added by red/blue-team)
- CA-1: Static API keys in a new `api_keys` table beside `connectors`; NOT the Cognito-OAuth connector. Format `autri_sk_<keyid>.<secret>` (256-bit secret; the `.` delimiter cannot collide with the base64url key alphabet, red-team L2). Store SHA-256(secret), the non-secret `keyid` lookup handle, and a display `last4`; verify by parse keyid → fetch → hash → constant-time compare → check revoked/expired; show the full key once. Why: D56 pivoted to API-key auth precisely to escape the OAuth/token-exchange path, and a static bearer is curl- and skill-simple. Hash: SHA-256. High-entropy keys aren't brute-forceable, so a slow hash (argon2) would only add ~50–100ms per call for no benefit; SHA-256 plus keyid lookup is the per-request standard (Stripe/GitHub-style).
- CA-2: Access. Read is library-wide; write is a per-KB allowlist on the key (default empty = read-only), re-intersected at request time. A key grants read to every KB in its bound libraries, and write only to the explicit KB(s) pinned on it. The write set is recomputed on every request as `api_key_write_kbs ∩ (current read set)`: the stored allowlist is a restriction, never a standing grant, so a KB leaving the key's library immediately drops write (red-team H1; mirrors the recompute-every-call discipline of `scope.ts`). Why: the library is the access unit (D13), but a write-capable key carries least privilege so it cannot mutate the other (possibly shared) KBs in its libraries; the `/stop` key writes ONLY the dev-memory KB.
- CA-3: Surface = read building blocks + document CRUD; NO black-box "ask a question" endpoint in v1. Expose the read operators, `list_knowledge_bases`, `get_document`, document CRUD, and ingest status. Red-team hardening: `knowledgeBaseId` is REQUIRED on every retrieval endpoint (no implicit cross-KB fan-out over HTTP, which bounds cost/latency and prevents an amplification DoS on the embedding bill); responses are an explicit `/v1` DTO, not the raw internal `EnrichedHit` (so a retrieval refactor can't silently break the public contract); `filter_then_rank` over REST is DEFERRED to v1.1 (it's a per-KB factory that needs an attribute-discovery endpoint, and dev-memory recall doesn't need numeric filtering). Shapes still mirror `tool-defs.ts` so a future MCP server is a thin wrapper.
- CA-4: Lives at durable `/api/v1/*` routes in the existing Next.js app, calling retrieval in-process; NOT server actions. Why: server-action IDs drift per build (stale consumers 404), while API routes survive deploys. Reuse the in-process `@autri/retrieval` behind a clean module boundary so it can be split out later.
- CA-5: Consumption = one typed client + a thin CLI bin over it + a skill that bundles the CLI. Why: it is the most seamless way for AI agents (Claude Code sessions and the `/stop` / `/start` loop) to use the API without MCP, and the dev-memory recall path reads over HTTP via this CLI. QuoteAI and the hook import the same client. Cut (blue-team): "reference docs for a cold external consumer" shrinks to a 1-page README (no external consumer in the window).
- CA-6: Upload via the presigned-URL flow (reuse the web path). Why: it respects the 6MB Lambda request cap (bytes stream directly to S3) and reuses the pipeline trigger. Scope (blue-team): the write seam is proven with an already-supported type (docx/pdf); markdown upload and transcript ingestion ride the dev-memory epic (item 3), not this one.
- CA-7: Shared-KB write semantics acknowledged; KB-ownership isolation deferred. A library is an access lens, not a copy: a write to a KB shared into two libraries is visible in both (intra-org, by design); cross-org sharing is hard-isolated by D13 and cannot happen. True per-library isolation is a future model change; revisit it only if real overlapping-collaborator editing appears.
- CA-8 (red-team C1): `/api/v1/*` gets its OWN CloudFront behavior that forwards the credential header. The Main Lambda's origin-request policy is a 10-header allowlist already at CloudFront's cap and deliberately excludes `Authorization` (`autri-infra/lib/web/cdn.ts`), the exact class of header strip that bit us with `Next-Action`. Left as-is, the key never reaches the Lambda (401 for everyone in prod, green locally). A dedicated `/api/v1/*` behavior with its own origin-request policy forwards the credential (and lets us set API-appropriate CORS). This is S0, before any endpoint code; the prod gate is a real cloud request, not a local pass.
- CA-9 (red-team C2/H5): "Reuse" = share an inner scope-parameterized core + a SIBLING resolver; never call the session handlers or mutate the connector resolver. Refactor the org/KB-access check out of `upload-url` / `delete-document` into a core that takes a resolved `{organizationId, readKbs, writeKbs}` scope (the session route passes session-derived scope; the API route passes key-derived scope). Add a new `getKbScopeForApiKey` resolver (reading `api_key_libraries` → `library_kbs`, plus the CA-2 write-intersect) and leave `getKbScopeForConnector` byte-identical (it's the live MCP security chokepoint). Exclude `/api/v1/*` from the NextAuth middleware redirect (the `x-origin-secret` check still guards it).
- CA-10 (red-team H2): Per-key rate limiting IS in v1, as a Postgres fixed-window counter (no new infra). One `(api_key_id, window)` row per key per window, bumped by an atomic `INSERT … ON CONFLICT DO UPDATE … RETURNING count` and checked before the embedding call; over the cap returns `429`. This is correct under concurrency, unlike an in-memory counter; there is a generous default plus a nullable per-key `rate_limit_per_min` override. Why: each `vector_search` is a billable OpenAI call on one shared key, so a runaway consumer must be brakeable. Fixed-window's edge burst (up to ~2× the cap straddling a window boundary) is acceptable for "stop an accident." (An in-memory guard would be per-instance, and so effectively a no-op across concurrent Lambdas; that's the thing we're NOT doing.)
- CA-11 (blue-team): Build a PRIMITIVE key-issuance settings UI now (rough draft, iterate later): list keys · issue (show-once-secret modal) · revoke · write-KB allowlist picker (constrained to the key's readable set). It's net-new frontend (no connector UI exists to extend), but it's needed eventually, and a rough version now beats a CLI-only stopgap.
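A minimal sketch of the CA-1 mint/parse/verify path, assuming Node's built-in `node:crypto`. The helper names (`generateKey`, `parseKey`, `verifyKey`), the 9-byte key-id length, and the synchronous in-memory `lookup` (standing in for the `api_keys` fetch by `key_id`) are illustrative assumptions, not the repo's code.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// CA-1 key format: autri_sk_<keyid>.<secret>. Helper names are hypothetical.
const PREFIX = "autri_sk_";

export interface StoredKey {
  keyId: string; // non-secret lookup handle (api_keys.key_id)
  secretHash: Buffer; // SHA-256(secret) (api_keys.secret_hash)
  last4: string; // display only
  revokedAt: Date | null;
  expiresAt: Date | null;
}

function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Mint a key. base64url never contains ".", so the delimiter is unambiguous.
export function generateKey(): { fullKey: string; record: StoredKey } {
  const keyId = randomBytes(9).toString("base64url"); // 12 chars; length is an assumption
  const secret = randomBytes(32).toString("base64url"); // 256-bit secret, 43 chars
  return {
    fullKey: `${PREFIX}${keyId}.${secret}`, // shown to the user exactly once
    record: { keyId, secretHash: sha256(secret), last4: secret.slice(-4), revokedAt: null, expiresAt: null },
  };
}

export function parseKey(raw: string): { keyId: string; secret: string } | null {
  if (!raw.startsWith(PREFIX)) return null;
  const rest = raw.slice(PREFIX.length);
  const dot = rest.indexOf(".");
  if (dot <= 0 || dot === rest.length - 1) return null; // empty keyid or empty secret
  return { keyId: rest.slice(0, dot), secret: rest.slice(dot + 1) };
}

// parse keyid -> fetch -> hash -> constant-time compare -> check revoked/expired.
// `lookup` stands in for the api_keys query by key_id (synchronous here for brevity).
export function verifyKey(
  raw: string,
  lookup: (keyId: string) => StoredKey | undefined,
  now: Date = new Date(),
): StoredKey | null {
  const parsed = parseKey(raw);
  if (!parsed) return null;
  const stored = lookup(parsed.keyId);
  if (!stored) return null;
  // Both buffers are 32-byte SHA-256 digests, satisfying timingSafeEqual's equal-length rule.
  if (!timingSafeEqual(sha256(parsed.secret), stored.secretHash)) return null;
  if (stored.revokedAt !== null) return null;
  if (stored.expiresAt !== null && stored.expiresAt.getTime() <= now.getTime()) return null;
  return stored;
}
```

Because both sides of the compare are fixed-length digests, there is no length leak to guard against; an unknown key id fails at the lookup and never reaches the compare.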
Data model (new)
- `api_keys`: `id`, `key_id` (unique, indexed lookup handle), `secret_hash` (sha256), `last4`, `name`, `organization_id` (denormalized for the org assert), `rate_limit_per_min` (nullable → default), `created_at`, `last_used_at`, `revoked_at`, `expires_at` (nullable).
- `api_key_libraries`: (`api_key_id`, `library_id`); the libraries the key may read. Many-to-many, hence "one or more libraries."
- `api_key_write_kbs`: (`api_key_id`, `knowledge_base_id`); the KBs the key may write. Re-intersected with the live read set on each request (CA-2).
- `api_key_usage`: per-key call log (`api_key_id`, `ts`, `endpoint`, `kb_id`, `embedding_cost_usd`, `latency_ms`) for COGS attribution and the fixed-window rate counter (CA-10).
- A new numbered migration (the db-migrate Lambda hard-errors on checksum changes to applied files); additive/nullable.
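The CA-10 fixed window can be sketched as one atomic upsert per request. This assumes a dedicated counter table named `api_key_rate_window` keyed on `(api_key_id, window_start)` (a hypothetical name; the data model above folds the counter into `api_key_usage`), a 60-second window, and a placeholder default of 120/min, since the epic leaves the "generous default" unspecified. `CountQuery` is a hypothetical adapter over whatever Postgres client the app uses.

```typescript
// Hypothetical counter table; requires a UNIQUE (api_key_id, window_start) constraint for ON CONFLICT.
export const RATE_WINDOW_SQL = `
INSERT INTO api_key_rate_window (api_key_id, window_start, count)
VALUES ($1, $2, 1)
ON CONFLICT (api_key_id, window_start)
DO UPDATE SET count = api_key_rate_window.count + 1
RETURNING count`;

const WINDOW_MS = 60_000;
export const DEFAULT_RATE_LIMIT_PER_MIN = 120; // placeholder: the epic only says "generous"

// Adapter over the real Postgres client; must resolve to the RETURNING count as a number.
export type CountQuery = (sql: string, params: [string, Date]) => Promise<number>;

export interface RateDecision {
  allowed: boolean;
  count: number;
  limit: number;
  retryAfterSec: number; // seconds until the current window closes (for Retry-After)
}

export function windowStart(now: Date): Date {
  return new Date(Math.floor(now.getTime() / WINDOW_MS) * WINDOW_MS);
}

// Called before the embedding call. The upsert is atomic, so concurrent requests
// from any number of Lambda instances each see a distinct, increasing count.
export async function checkRateLimit(
  query: CountQuery,
  apiKeyId: string,
  perMinOverride: number | null,
  now: Date = new Date(),
): Promise<RateDecision> {
  const limit = perMinOverride ?? DEFAULT_RATE_LIMIT_PER_MIN;
  const start = windowStart(now);
  const count = await query(RATE_WINDOW_SQL, [apiKeyId, start]);
  const retryAfterSec = Math.ceil((start.getTime() + WINDOW_MS - now.getTime()) / 1000);
  return { allowed: count <= limit, count, limit, retryAfterSec };
}
```

Old window rows are never read again once their minute has passed, so they can be pruned by any periodic cleanup without affecting correctness.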
Surface (endpoints — all under /api/v1, credential via the CA-8 forwarded header)
Retrieval (read; knowledgeBaseId REQUIRED; return a /v1 DTO):
`POST /retrieve/vector` · `POST /retrieve/fts` · `POST /retrieve/lookup` (`filter_then_rank` → v1.1, CA-3).
`GET /kbs` (→ `list_knowledge_bases`, key-scoped) · `GET /documents/{id}` (→ `get_document`, optional `includeChunks`; scope-filtered, so 404 if outside the read scope).
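A sketch of the per-request guard for the `POST /retrieve/*` bodies plus an explicit `/v1` DTO mapping. The `query` field and the `InternalHit` shape are hypothetical stand-ins (the real `EnrichedHit` lives in `retrieval/src/types.ts`); what it illustrates is that `knowledgeBaseId` is mandatory, an out-of-scope KB reads as 404, and only whitelisted fields cross the public boundary.

```typescript
// Hypothetical internal shape; the real EnrichedHit lives in retrieval/src/types.ts.
export interface InternalHit {
  chunkId: string;
  documentId: string;
  knowledgeBaseId: string;
  score: number;
  text: string;
  debug?: Record<string, unknown>; // internal-only; must never reach /v1 consumers
}

// The public /v1 contract: an explicit whitelist of fields.
export interface RetrieveHitV1 {
  documentId: string;
  chunkId: string;
  score: number;
  text: string;
}

export type GuardResult =
  | { ok: true; knowledgeBaseId: string; query: string }
  | { ok: false; status: 404 | 422; code: "not_found" | "validation_error"; message: string };

function invalid(message: string): GuardResult {
  return { ok: false, status: 422, code: "validation_error", message };
}

export function guardRetrieveBody(body: unknown, readKbs: ReadonlySet<string>): GuardResult {
  if (typeof body !== "object" || body === null) return invalid("body must be a JSON object");
  const { knowledgeBaseId, query } = body as Record<string, unknown>;
  // Required: no implicit cross-KB fan-out over HTTP.
  if (typeof knowledgeBaseId !== "string" || knowledgeBaseId === "") return invalid("knowledgeBaseId is required");
  if (typeof query !== "string" || query.trim() === "") return invalid("query is required");
  // Out of read scope is indistinguishable from "does not exist".
  if (!readKbs.has(knowledgeBaseId)) {
    return { ok: false, status: 404, code: "not_found", message: "knowledge base not found" };
  }
  return { ok: true, knowledgeBaseId, query };
}

// Copy fields one by one so a new internal field never leaks by default.
export function toV1Hit(hit: InternalHit): RetrieveHitV1 {
  return { documentId: hit.documentId, chunkId: hit.chunkId, score: hit.score, text: hit.text };
}
```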
Documents (CRUD; write-gated by the re-intersected allowlist; by-id endpoints 404 on out-of-scope, not 403):
- `GET /kbs/{kbId}/documents`: list, cursor-paginated.
- `POST /kbs/{kbId}/upload-url`: presign (shared core) → `{documentId, uploadUrl, …}`; `Idempotency-Key` dedupes the pending-doc creation; the consumer PUTs the bytes to S3 and the pipeline fires.
- `GET /documents/{id}/status`: ingest status (pending → ingesting → ready/failed); new, and the dogfood's load-bearing endpoint.
- `DELETE /documents/{id}`: atomic delete (shared core), write-scoped.
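The by-id rule (404 outside the read scope, 403 for a read-only key attempting a write) reduces to a small decision function. The type and function names here are hypothetical; the status values follow the ingest states listed above.

```typescript
// Hypothetical names; the semantics (404 vs 403, ingest states) come from the epic.
export type IngestStatus = "pending" | "ingesting" | "ready" | "failed";

export interface KeyScope {
  organizationId: string;
  readKbs: ReadonlySet<string>;
  writeKbs: ReadonlySet<string>; // already re-intersected with readKbs (CA-2)
}

export interface DocumentRow {
  id: string;
  organizationId: string;
  knowledgeBaseId: string;
  status: IngestStatus;
}

export type Access = { ok: true; doc: DocumentRow } | { ok: false; status: 403 | 404 };

export function authorizeDocument(doc: DocumentRow | undefined, scope: KeyScope, mode: "read" | "write"): Access {
  // Missing, cross-org, and out-of-read-scope all collapse to 404,
  // so a key cannot probe for document ids it is not allowed to see.
  if (!doc || doc.organizationId !== scope.organizationId || !scope.readKbs.has(doc.knowledgeBaseId)) {
    return { ok: false, status: 404 };
  }
  // Visible but not writable: the caller may know it exists, so 403 is honest.
  if (mode === "write" && !scope.writeKbs.has(doc.knowledgeBaseId)) return { ok: false, status: 403 };
  return { ok: true, doc };
}

// A status poller can stop once ingest reaches a terminal state.
export function isTerminal(status: IngestStatus): boolean {
  return status === "ready" || status === "failed";
}
```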
Errors: `{ error: { code, message } }` with a defined code enum; 401 (bad/missing key), 403 (read-only key on a write), 404 (out of scope / not found), 409 (idempotency-key conflict), 422 (validation), 429 (rate limit, with `Retry-After`). CORS: server-to-server only (no `Access-Control-Allow-*` headers are sent); this is stated, not silent.
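A sketch of the error envelope as a single builder. The code strings (`not_found`, `rate_limited`, and so on) are hypothetical names for the enum the epic calls for; the statuses and the `Retry-After` rule come from the list above.

```typescript
// Hypothetical code names mapped to the statuses the epic defines.
export const ERROR_STATUS = {
  unauthorized: 401,
  forbidden: 403,
  not_found: 404,
  idempotency_conflict: 409,
  validation_error: 422,
  rate_limited: 429,
} as const;

export type ErrorCode = keyof typeof ERROR_STATUS;

export interface ApiErrorResponse {
  status: number;
  headers: Record<string, string>;
  body: { error: { code: ErrorCode; message: string } };
}

export function apiError(code: ErrorCode, message: string, retryAfterSec?: number): ApiErrorResponse {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Retry-After only accompanies 429. No Access-Control-Allow-* header is ever set.
  if (code === "rate_limited" && retryAfterSec !== undefined) headers["Retry-After"] = String(retryAfterSec);
  return { status: ERROR_STATUS[code], headers, body: { error: { code, message } } };
}
```

In a route handler the result would be returned as `new Response(JSON.stringify(body), { status, headers })`; keeping every error on one builder is what makes the "defined code enum" enforceable by the type checker.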
Auth & enforcement
- A request helper parses the bearer key, verifies it (CA-1), resolves scope via `getKbScopeForApiKey` → `{organizationId, readKbs, writeKbs}` (CA-2/CA-9), enforces the CA-10 rate window, then runs the endpoint. `last_used_at` and the usage/audit row are fire-and-forget / post-response (they never block or fail a good call). Org is the absolute boundary (D13).
- Isolation negative-tests are inline DoD on S2 and each endpoint (not a trailing story): no read/write outside the key's libraries; no write outside the re-intersected allowlist; a read-only key is 403'd on write; cross-org is denied regardless of libraries; and the dogfood's actual trust case: a key bound to library A gets 404/empty on the restricted dev-memory library's KBs in the same org (red-team M4).
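The helper's scope step and ordering can be sketched as follows, assuming hypothetical data-access methods on `ScopeSource` in place of the real `api_key_libraries` → `library_kbs` and `api_key_write_kbs` queries. The key detail is CA-2: the write set is filtered against the freshly computed read set on every call, so an allowlisted KB that has left the key's libraries silently loses write.

```typescript
// Hypothetical sketch of getKbScopeForApiKey's logic plus the helper's ordering.
export interface ApiKeyScope {
  organizationId: string;
  readKbs: Set<string>;
  writeKbs: Set<string>;
}

// Stand-ins for the real queries (api_key_libraries -> library_kbs, api_key_write_kbs).
export interface ScopeSource {
  keyLibraries(apiKeyId: string): Promise<string[]>;
  kbsInLibraries(organizationId: string, libraryIds: string[]): Promise<string[]>; // current, org-filtered
  keyWriteAllowlist(apiKeyId: string): Promise<string[]>;
}

export async function resolveApiKeyScope(src: ScopeSource, apiKeyId: string, organizationId: string): Promise<ApiKeyScope> {
  const libraries = await src.keyLibraries(apiKeyId);
  const readKbs = new Set(await src.kbsInLibraries(organizationId, libraries));
  // CA-2: the stored allowlist only restricts; write = allowlist ∩ current read set, every request.
  const allowlist = await src.keyWriteAllowlist(apiKeyId);
  const writeKbs = new Set(allowlist.filter((kb) => readKbs.has(kb)));
  return { organizationId, readKbs, writeKbs };
}

export interface PipelineSteps {
  verify(): Promise<{ apiKeyId: string; organizationId: string } | null>; // CA-1
  resolveScope(apiKeyId: string, organizationId: string): Promise<ApiKeyScope>; // CA-2/CA-9
  withinRateLimit(apiKeyId: string): Promise<boolean>; // CA-10
  recordUsage(apiKeyId: string): Promise<void>; // last_used_at + usage/audit row
}

export type KeyedResult<T> = { status: 200; body: T } | { status: 401 | 429 };

export async function handleWithKey<T>(steps: PipelineSteps, run: (scope: ApiKeyScope) => Promise<T>): Promise<KeyedResult<T>> {
  const key = await steps.verify();
  if (!key) return { status: 401 };
  const scope = await steps.resolveScope(key.apiKeyId, key.organizationId);
  if (!(await steps.withinRateLimit(key.apiKeyId))) return { status: 429 };
  const body = await run(scope);
  // Not awaited, and failures are swallowed: bookkeeping must never block or fail a good call.
  void steps.recordUsage(key.apiKeyId).catch(() => undefined);
  return { status: 200, body };
}
```

Whether un-awaited work still runs after the response is sent depends on the hosting runtime, so the fire-and-forget usage write is worth checking on the deployed Lambda rather than only locally.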
Consumption
- `@autri/api-client`: a thin typed client over the REST contract. The upload helper replays the presign's exact pinned `Content-Type` / `Content-Length` and surfaces S3 signature failures as a typed error (the iOS-pin bug class, red-team L2).
- CLI bin (`autri search|get|upload|status|delete`) over the client; the upload command orchestrates presign → PUT → poll status.
- Skill bundling the CLI so agents call it via Bash; a 1-page README (external reference docs deferred).
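A sketch of the client's upload orchestration (presign → PUT → poll). It assumes, hypothetically, that the presign response carries the pinned headers in a `headers` field, that the presign body accepts `contentType`/`contentLength`, that the key travels as `Authorization: Bearer`, and a 2-second poll interval. `HttpFn` is injected so the flow can be exercised without a network.

```typescript
// Hypothetical transport seam so the flow is testable without a network.
export interface HttpResponse {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
  text(): Promise<string>;
}
export type HttpFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: Uint8Array | string },
) => Promise<HttpResponse>;

// Typed error for S3 rejections (e.g. a signature mismatch from drifted headers).
export class S3UploadError extends Error {
  readonly status: number;
  readonly detail: string;
  constructor(status: number, detail: string) {
    super(`S3 upload rejected (${status}): ${detail}`);
    this.name = "S3UploadError";
    this.status = status;
    this.detail = detail;
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Assumed presign response; `headers` carries the pinned Content-Type/Content-Length.
interface Presign {
  documentId: string;
  uploadUrl: string;
  headers: Record<string, string>;
}

export async function uploadDocument(
  http: HttpFn,
  baseUrl: string,
  apiKey: string,
  kbId: string,
  file: { bytes: Uint8Array; contentType: string; idempotencyKey: string },
  opts: { pollMs?: number; maxPolls?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<{ documentId: string; status: string }> {
  const auth = { Authorization: `Bearer ${apiKey}` };
  const presignRes = await http(`${baseUrl}/api/v1/kbs/${encodeURIComponent(kbId)}/upload-url`, {
    method: "POST",
    headers: { ...auth, "Content-Type": "application/json", "Idempotency-Key": file.idempotencyKey },
    body: JSON.stringify({ contentType: file.contentType, contentLength: file.bytes.byteLength }),
  });
  if (!presignRes.ok) throw new Error(`presign failed: HTTP ${presignRes.status}`);
  const presign = (await presignRes.json()) as Presign;

  // Replay the pinned headers exactly; any drift invalidates the S3 signature.
  const put = await http(presign.uploadUrl, { method: "PUT", headers: presign.headers, body: file.bytes });
  if (!put.ok) throw new S3UploadError(put.status, await put.text());

  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const statusUrl = `${baseUrl}/api/v1/documents/${encodeURIComponent(presign.documentId)}/status`;
  for (let i = 0; i < (opts.maxPolls ?? 60); i++) {
    const res = await http(statusUrl, { method: "GET", headers: auth });
    if (!res.ok) throw new Error(`status failed: HTTP ${res.status}`);
    const { status } = (await res.json()) as { status: string };
    if (status === "ready" || status === "failed") return { documentId: presign.documentId, status };
    await sleep(opts.pollMs ?? 2000);
  }
  throw new Error("timed out waiting for ingest to finish");
}
```

Reusing the same `Idempotency-Key` on a retry lets the server dedupe the pending document, so a CLI that crashes between presign and PUT can simply rerun the command.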
Work breakdown (dependency-ordered; status/write before read; tests inline)
- S0: CloudFront `/api/v1/*` behavior + credential forwarding (CA-8). The deploy gate; validated with a real cloud request.
- S1: Schema + key gen/verify + primitive issuance UI (load-bearing; hand-write). The migration; `autri_sk` generation; SHA-256 + keyid store/verify (constant-time); the CA-11 settings UI.
- S2: Auth helper + scope resolver + shared scope-core refactor (load-bearing; hand-write). Bearer parse/verify; `getKbScopeForApiKey` (sibling, CA-9) with the CA-2 re-intersect; refactor `upload-url` / `delete-document` onto the scope-parameterized core; middleware exclusion; CA-10 rate window. Isolation negative-tests land here.
- S3: Document write path (the dogfood's real need): upload (presign) / status / delete, write-gated. Built BEFORE read (red-team M1).
- S4: Read endpoints: vector / fts / lookup + `kbs` + `documents/{id}`, `knowledgeBaseId` required, `/v1` DTO; embedding cost logged. Factor a single `runRetrieval(toolName, input, scope)` envelope in `@autri/retrieval` so chat/MCP/REST don't drift (red-team H3).
- S5: Typed client + CLI + skill + README.
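A sketch of the S4 `runRetrieval(toolName, input, scope)` envelope: one function that enforces scope, dispatches to the operator, and reports cost, so every caller shares a single path. The operator signature, `ScopeError`, and the `onCost` hook are hypothetical, and `filter_then_rank` is omitted because it is deferred to v1.1. This version requires `knowledgeBaseId` (the REST rule); the chat/MCP paths' optional fan-out would need its own branch.

```typescript
// Hypothetical envelope; operator signatures, ScopeError, and onCost are assumptions.
export type ToolName = "vector_search" | "fts_search" | "lookup_section"; // filter_then_rank deferred (v1.1)

export interface RetrievalScope {
  organizationId: string;
  readKbs: ReadonlySet<string>;
}
export interface Hit {
  documentId: string;
  score: number;
  text: string;
}
export interface OperatorResult {
  hits: Hit[];
  embeddingCostUsd: number;
}
export type OperatorInput = { knowledgeBaseId: string } & Record<string, unknown>;
export type Operator = (input: OperatorInput) => Promise<OperatorResult>;

export class ScopeError extends Error {
  constructor(knowledgeBaseId: string) {
    super(`knowledge base ${knowledgeBaseId} is outside the caller's scope`);
    this.name = "ScopeError";
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

export interface CostEvent {
  toolName: ToolName;
  knowledgeBaseId: string;
  embeddingCostUsd: number;
  latencyMs: number;
}

// Chat, MCP, and REST would all call the returned runRetrieval, so the scope
// check and cost accounting live in exactly one place.
export function makeRunRetrieval(operators: Record<ToolName, Operator>, onCost?: (event: CostEvent) => void) {
  return async function runRetrieval(toolName: ToolName, input: OperatorInput, scope: RetrievalScope): Promise<OperatorResult> {
    // Reject before dispatch, so an out-of-scope call never reaches the embedding API.
    if (!scope.readKbs.has(input.knowledgeBaseId)) throw new ScopeError(input.knowledgeBaseId);
    const started = Date.now();
    const result = await operators[toolName](input);
    onCost?.({
      toolName,
      knowledgeBaseId: input.knowledgeBaseId,
      embeddingCostUsd: result.embeddingCostUsd,
      latencyMs: Date.now() - started,
    });
    return result;
  };
}
```

A REST handler would map `ScopeError` to a 404; because the check sits inside the envelope, a new caller cannot forget it.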
Acceptance
- Isolation negative-tests green (incl. the restricted-library denial) · a write smoke issues a write-scoped key → uploads a supported-type doc → polls `/status` to `ready` · a read smoke retrieves through the API with parity to in-process results · the rate limit returns `429` over the cap · `Authorization`/credential reaches the Lambda in deployed prod (CA-8), not just locally.
Out of scope (v1)
filter_then_rank over REST + its attribute-discovery endpoint (→ v1.1) · the dev-memory ingestion pipeline incl. markdown upload (its own epic — this proves the seam) · external-consumer reference docs (README only) · admin-over-API (membership management stays in the UI) · the MCP server (post-go-live wrapper) · multipart/streaming upload · KB-ownership isolation (CA-7).
References
DB-4 (consumable API, dogfood-first), D56 (MCP cut → API-key auth — this epic IS that), D13 (row-level org tenancy). Prior art: mcp-servers/doc-search/src/{server,auth,tools}.ts (verify + scope + the runTool envelope to factor), retrieval/src/scope.ts (the resolver to clone, not mutate), retrieval/src/agent/tool-defs.ts (the operator contract to mirror), app/app/api/kb/[kbId]/upload-url/route.ts + app/lib/documents/delete-document.ts (session-coupled write surface to refactor into a shared core), autri-infra/lib/web/cdn.ts (the CloudFront header allowlist — CA-8). Full session record: decisions.md 2026-06-18 (session 4).