Alpha Readiness

Red-teamed + scope LOCKED 2026-06-19 (session 9). Grounded in a 3-pillar code audit of autri + autri-infra, with every load-bearing claim re-verified directly against the repo during red-team (the author is not the red-teamer). Goal: gate what must be true before onboarding external alpha users to the live app.

Goal & context

Onboard external alpha users to the live Autri app (app.autri.ai), starting with Jeb (a security-minded coworker who asked to use it), ahead of the Brehob Jul-6 kickoff. The app is effectively single-user today (owner only), so the readiness gates are precisely the things single-user dogfooding cannot exercise: cross-tenant data isolation, new-user onboarding, and per-tenant cost/abuse bounds. This work compounds with Brehob (the isolation proof is Brehob item 5b's acceptance; guardrails are item-6 COGS hardening), so it is on-path, not a detour. Ladders to B4 (one substrate, individual→F500, governance + tiered security) and B7 (dependable).

Current state (grounded + red-team-verified)

A 3-pillar audit. Load-bearing claims were re-verified directly against the code during red-team — what follows distinguishes verified facts from corrected assumptions.

Pillar 1 — Tenant isolation: strong in code, unproven live

Row-level multi-tenancy (D13) is well-built. Verified during red-team:

Every web request re-derives org from the session via a fresh DB lookup; client ids never trusted (app/lib/auth/current-user.ts). The chat path confirms the pattern directly: api/chat/route.ts:83-96 re-derives org and 404s on KB/org mismatch.
/api/cache/[...path] gates every asset behind a hard org-join chokepoint; cross-org and missing assets both return the same 404, so there is no existence-leak oracle.
Retrieval + consumable API re-check scope per request; the API-key write set is re-intersected with the live read set every call.
The old CloudFront direct-serve S3 cross-org hole was already removed; presign-after-auth replaced it.
No blocking isolation holes found in the code.

Caveat: clean by code review, but never proven by a live two-tenant adversarial test; the audit also left upload-url and the kb-attributes route inferred rather than verified. SHOULD-fix: the KB-scope check is each web route's own responsibility (not centralized like the API's withApiKey), so a future route can forget it.
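
A minimal TypeScript sketch of how a shared scope check could make a missing KB and a cross-org KB indistinguishable to the caller. All names here (KnowledgeBase, resolveScopedKb, the lookup callback) are hypothetical illustrations, not the repo's actual code; the session org id is assumed to come from the server-side session lookup described above.

```typescript
// Hypothetical shape; the real model in the repo may differ.
interface KnowledgeBase {
  id: string;
  organizationId: string;
}

type ScopeResult =
  | { ok: true; kb: KnowledgeBase }
  | { ok: false; status: 404; body: { error: "not_found" } };

// One shared value, so "missing" and "cross-org" cannot drift apart.
const NOT_FOUND: ScopeResult = {
  ok: false,
  status: 404,
  body: { error: "not_found" },
};

/**
 * Resolve a KB for the caller's org. `sessionOrgId` must be derived
 * server-side from the session, never taken from the client request.
 */
function resolveScopedKb(
  sessionOrgId: string,
  kbId: string,
  lookup: (id: string) => KnowledgeBase | undefined,
): ScopeResult {
  const kb = lookup(kbId);
  // Same response whether the KB does not exist or belongs to another org.
  if (!kb || kb.organizationId !== sessionOrgId) return NOT_FOUND;
  return { ok: true, kb };
}
```

Routing every KB-bearing route through one function like this is what would remove the "a future route can forget it" risk; that centralization is listed as deferred past alpha.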

Pillar 2 — Onboarding: email/password works; the Google path is the risk

Cognito user pool live with Google federation + email/password fallback. Self-serve signup is allowlist-gated (SSM /autri/allowed-emails) in a PostConfirmation Lambda that auto-deletes non-allowlisted users — and the gate is fail-closed (verified: a missing param/env throws → no provisioning → no access).
First confirmed login auto-provisions a personal org + Personal library + welcome notification (transactional, idempotent).
Library access grant/revoke + member-management UI exists. Dev auth provider is double-gated (build-time NODE_ENV + runtime AUTRI_DEV_AUTH); cannot reach prod.
★ Corrected by red-team (H1): the allowlist check AND provisioning live only in PostConfirmation — there is no PreSignUp trigger and no lazy app-side provisioning (greps clean). Cognito's PostConfirmation does not fire for external-IdP (Google) sign-ups, so a Google login likely skips the allowlist and never provisions → lands in the getCurrentUser→signout loop. The doc's original "fully works" claim holds only for the email/password path. Jeb will likely reach for "Sign in with Google" first.
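
A sketch of what a shared, fail-closed allowlist check might look like if it were also run from a PreSignUp trigger, which Cognito invokes for federated sign-ups with a triggerSource of PreSignUp_ExternalProvider. The comma-separated allowlist format and the function names are assumptions for illustration; fetching the SSM value is left out.

```typescript
// Minimal subset of the Cognito PreSignUp event used in this sketch.
interface PreSignUpEvent {
  triggerSource: string; // e.g. "PreSignUp_SignUp" or "PreSignUp_ExternalProvider"
  request: { userAttributes: Record<string, string | undefined> };
}

/**
 * Parse the allowlist. Assumes a comma-separated string (hypothetical
 * format). Throws when the value is missing or blank, so the gate fails closed.
 */
function parseAllowlist(raw: string | undefined): Set<string> {
  if (!raw || raw.trim() === "") {
    throw new Error("allowlist parameter missing: refusing sign-up");
  }
  const emails = raw
    .split(",")
    .map((e) => e.trim().toLowerCase())
    .filter((e) => e !== "");
  return new Set(emails);
}

/**
 * Returns the event unchanged when the email is allowlisted; throws
 * otherwise. A thrown error from a PreSignUp trigger rejects the sign-up.
 */
function assertAllowed(
  event: PreSignUpEvent,
  rawAllowlist: string | undefined,
): PreSignUpEvent {
  const allowlist = parseAllowlist(rawAllowlist);
  const email = event.request.userAttributes["email"]?.toLowerCase();
  if (!email || !allowlist.has(email)) {
    throw new Error("sign-up not permitted");
  }
  return event;
}
```

Keeping the check in one function that both triggers call would let the email/password and Google paths share the same gate; provisioning for federated users is a separate concern that this sketch does not cover.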

Pillar 3 — Cost / abuse guardrails: the real gap

Cost is computed and logged but never enforced (verified):

No per-document ingest cost ceiling, no per-user/org spend quota, no circuit breaker; computeIngestCost is observational only. The chat route computes cost in onFinish to record it (route.ts:114-133), never to enforce.
No rate limiting on the web surface at any layer — not in the app, and no WAF / CloudFront rate rule / API-GW throttle anywhere in autri-infra (grep clean). The 120-req/min limit exists only on API keys.
The 100MB upload cap (create-upload.ts:39) is the only hard bound — no per-KB doc count, no per-user corpus/storage cap, no per-document figure-vision cap.
Alpha LLM usage bills directly to the Anthropic/OpenAI API accounts (chat uses ANTHROPIC_API_KEY; not Max, not Bedrock) with zero alpha revenue — every overrun is real cash.

Locked scope (red-teamed 2026-06-19)

In for alpha-ready — all three must land before onboarding Jeb:

Cost/abuse guardrails — FULL build (Dan's call: complete guardrail before any onboarding, not staged): per-user/org daily spend cap + pre-flight ingest cost ceiling (estimate from page/unit count, gated at the post-parse/pre-LLM boundary — M1) + web per-session rate limit + a CloudWatch spend/usage alarm in the existing autri-monitoring stack (M2) + a per-document figure-vision cap. Quotas = simple global defaults (alpha is tiny — M3).
Onboarding — FIX the Google-federation path (Dan's call): provision + allowlist-check federated sign-ins (a PreSignUp trigger sharing the allowlist + a federated provisioning path, or unify provisioning into a trigger that fires for both) so Google works correctly and stays gated. Verify against the live pool first.
Isolation proof — BOTH (Dan's call): an integration harness (2nd org; assert uniform 404 across KB / document / cache asset / API-v1, plus the cross-org library-share lens — a granted user sees only the granted library's KBs) AND a thorough manual adversarial prod pass with a real 2nd Cognito account. Doubles as Brehob item-5b acceptance. Treat as able to surface must-fix work, not a rubber stamp.

Deferred past alpha: allowlist admin UI (manual SSM is fine for 1-2 users — M3); centralize the route-scope check into a shared wrapper; denormalization invariants as DB constraints (api_keys.organization_id immutability, chunks.knowledge_base_id consistency); user-facing cost dashboard; MFA; token-based invites; Entra/SAML SSO (Brehob 5b); account/data deletion + export (L1); ToS/privacy/alpha disclaimer (L2); RAG prompt-injection hardening (L3); S3-bucket versioning check (RDS 7-day retention + PITR already confirmed).

Stories (dependency-ordered)

Thresholds locked (Dan, 2026-06-19): per-user daily spend cap = $10; per-document ingest ceiling = $5 (half the daily, so one doc can't exhaust the budget; most real docs are far under); web rate limit = 30 req/min per session (generous for a human, stops a runaway loop); per-document figure-vision cap = 60 (proposed — adjust in-story). All global defaults (alpha is tiny — M3).

Wave 1 — parallel, independent builds:

S1 — Per-user daily spend cap + web rate limit (app; backend QA + CI). A shared per-user gate on the chat + upload routes: a fixed-window request limit (30/min) + a rolling daily spend rollup (sum chat_queries.cost + the user's ingest cost for the day) that blocks new chat/ingest at $10/day. Reuse the api_key fixed-window pattern; new per-user counter (migration). Returns a clean 429/quota error the UI surfaces.
S2 — Pre-flight ingest cost ceiling + figure cap (ingestion; backend QA + CI). At the post-parse/pre-LLM boundary, estimate cost from page/unit count × rate; if > $5/doc, circuit-break (UX decided in-story; lean reject-with-clear-message over silent flag). Cap figure-vision calls at 60/doc. Byte-identical below the ceiling (no-op on normal docs).
S3 — Spend/usage CloudWatch alarm (autri-infra autri-monitoring; backend QA). A daily-spend / chat-volume alarm → SNS to Dan. Cheap backstop that catches overruns in hours, independent of the hard quotas.
S4 — Fix Google-federation onboarding (autri-infra + Lambda; backend QA + live-pool test + deploy). Confirmed against the live pool us-east-1_lSk6wYeDM: PostConfirmation + PreTokenGeneration only, no PreSignUp → Google sign-ins skip allowlist + provisioning. Add a PreSignUp trigger that allowlist-checks (shared logic with post-confirm) + ensure federated users get provisioned (org/Personal library) — in PreSignUp or a first-authenticated-request ensure-user path. Cleanup: delete the stale orphaned pool us-east-1_7YgaDlZlB (created 2026-05-21, no triggers, unused). Deploy + verify with a real Google sign-in.
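
The S1 gate (fixed-window request limit plus daily spend cap) could be sketched roughly as below. This is an in-memory illustration only: the story calls for a persisted per-user counter, and the class name, method signature, and the idea of passing in an already-summed spentTodayUsd are assumptions, not the planned implementation.

```typescript
const RATE_LIMIT_PER_MIN = 30; // locked threshold: 30 req/min
const DAILY_SPEND_CAP_USD = 10; // locked threshold: $10/day
const WINDOW_MS = 60000;

interface WindowCounter {
  windowStart: number;
  count: number;
}

type GateDecision =
  | { allowed: true }
  | { allowed: false; status: 429; reason: "rate_limited" | "daily_spend_cap" };

// Hypothetical in-memory gate; a real one would persist counters in the DB.
class UserGate {
  private counters = new Map<string, WindowCounter>();

  /**
   * @param spentTodayUsd the user's chat + ingest cost so far today
   *                      (assumed to be computed by the caller)
   * @param nowMs         current time in ms, injected for testability
   */
  check(userId: string, spentTodayUsd: number, nowMs: number): GateDecision {
    if (spentTodayUsd >= DAILY_SPEND_CAP_USD) {
      return { allowed: false, status: 429, reason: "daily_spend_cap" };
    }
    // Fixed window: all requests in the same clock minute share a counter.
    const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
    const current = this.counters.get(userId);
    const counter =
      current && current.windowStart === windowStart
        ? current
        : { windowStart, count: 0 };
    if (counter.count >= RATE_LIMIT_PER_MIN) {
      this.counters.set(userId, counter);
      return { allowed: false, status: 429, reason: "rate_limited" };
    }
    counter.count += 1;
    this.counters.set(userId, counter);
    return { allowed: true };
  }
}
```

A fixed window allows a short burst across a window boundary (up to 60 requests in a few seconds), which seems acceptable for stopping a runaway loop at alpha scale.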

Wave 2 — gate, after Wave 1 merges + deploys:

S5 — Isolation proof: harness + manual adversarial pass (test; backend QA + manual). Build the integration harness (2nd org; assert uniform 404 across KB / document / cache asset / API-v1 + the cross-org library-share lens — a granted user sees only the granted library's KBs). Then a manual adversarial prod pass with a real second Cognito account (now including a Google account, since S4 makes that path work). Gate: all 404s hold + the share lens is tight. Doubles as Brehob item-5b acceptance; may surface must-fix work.
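
One way the harness's core assertion might be expressed: for each path, compare the response a second org gets for another org's real resource against the response for a nonexistent id, and flag any difference. The types and the function name are hypothetical, and the harness is assumed to have already collected the responses.

```typescript
interface ProbeResponse {
  status: number;
  body: string;
}

// One entry per path under test (KB, document, cache asset, API-v1, ...).
interface PathResult {
  name: string;
  crossOrg: ProbeResponse; // org B requesting org A's real resource
  missing: ProbeResponse; // org B requesting an id that does not exist
}

/** Returns a list of human-readable failures; empty means the gate holds. */
function findIsolationLeaks(results: PathResult[]): string[] {
  const failures: string[] = [];
  for (const r of results) {
    if (r.crossOrg.status !== 404) {
      failures.push(`${r.name}: cross-org returned ${r.crossOrg.status}, expected 404`);
    }
    if (r.missing.status !== 404) {
      failures.push(`${r.name}: missing id returned ${r.missing.status}, expected 404`);
    }
    // Both 404 but different bodies would still be an existence oracle.
    if (
      r.crossOrg.status === 404 &&
      r.missing.status === 404 &&
      r.crossOrg.body !== r.missing.body
    ) {
      failures.push(`${r.name}: 404 bodies differ between cross-org and missing`);
    }
  }
  return failures;
}
```

The library-share lens needs a separate positive assertion (a granted user can list the granted library's KBs and nothing else), which this sketch does not cover.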

Waves: S1–S4 build in parallel (Wave 1); S5's manual pass runs last (Wave 2) so it exercises the fully-deployed system end-to-end. S4 (auth/Cognito + deploy) and S5 (manual prod adversarial pass) have human-in-the-loop / live-pool steps that need an interactive session, not unattended agents.

Red-team findings & decisions (2026-06-19)

Verified-true under red-team (no action needed): chat org-scope (route.ts:83-96), /api/cache org-join + uniform 404, API-v1 scope re-intersection, allowlist fail-closed, RDS 7-day + PITR.

C1 (critical) — no cost/abuse enforcement on the web surface at any layer; alpha bills to your API accounts. → DECISION: full per-user/org guardrail build before onboarding (spend cap + pre-flight ceiling + web rate limit + spend alarm + figure cap).
H1 (high) — Google-federation onboarding unverified & likely broken/ungated (allowlist + provisioning only in PostConfirmation, which doesn't fire for external IdP; no PreSignUp; no lazy provisioning). → DECISION: fix the federation provisioning + allowlist path first.
H2 (high) — isolation clean-by-review but unproven live; some audited paths inferred. → DECISION: both proofs (harness across all paths + cross-org share, plus a manual adversarial prod pass).
M1 — a cost ceiling must gate on a pre-flight estimate (actual cost is known only post-extraction), attached at the post-parse/pre-LLM boundary; circuit-breaker UX (reject / queue-for-approval / ingest-and-flag) decided within the build story.
M2 — add a spend/usage CloudWatch alarm to the existing autri-monitoring stack (cheap backstop), folded into the guardrail build.
M3 — alpha stays tiny (Jeb / 1-2 trusted): manual SSM allowlist is fine; quotas = simple global defaults; admin UI deferred.
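
The pre-flight gate described in M1 could look roughly like the sketch below: estimate cost from counts available after parsing, clamp figure-vision calls, and decide before any LLM call is made. The per-page and per-figure rates, the ParsedDoc shape, and the function name are hypothetical; real rates would come from measured COGS.

```typescript
const INGEST_CEILING_USD = 5; // locked threshold: $5 per document
const FIGURE_VISION_CAP = 60; // proposed threshold, adjustable in-story

// Hypothetical summary of what is known at the post-parse boundary.
interface ParsedDoc {
  pageCount: number;
  figureCount: number;
}

// Hypothetical unit rates; not taken from the source.
interface Rates {
  usdPerPage: number;
  usdPerFigure: number;
}

interface PreflightResult {
  proceed: boolean;
  estimateUsd: number;
  figuresToProcess: number;
}

function preflightIngest(doc: ParsedDoc, rates: Rates): PreflightResult {
  // Cap figure-vision calls first so the estimate reflects the capped work.
  const figuresToProcess = Math.min(doc.figureCount, FIGURE_VISION_CAP);
  const estimateUsd =
    doc.pageCount * rates.usdPerPage + figuresToProcess * rates.usdPerFigure;
  // Circuit-break only when the estimate exceeds the ceiling.
  return {
    proceed: estimateUsd <= INGEST_CEILING_USD,
    estimateUsd,
    figuresToProcess,
  };
}
```

What happens when proceed is false (reject, queue for approval, or ingest and flag) is the circuit-breaker UX decision left to the build story.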

Remaining open questions (for the build stories)

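Thresholds: the per-user daily spend cap ($10), per-document ingest ceiling ($5), and web rate limit (30 req/min per session) are now locked (see Stories); only the figure-vision cap (60) is still a proposal, to be adjusted in-story. Anchor any revision to measured COGS where available.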
Circuit-breaker UX: reject at upload vs queue-for-approval vs ingest-and-flag — decide within the cost-guardrail story.
Federation fix mechanism: confirm against the live pool before building (whether PostConfirmation truly does not fire for Google in this pool; whether there is any linking config).
Isolation acceptance bar: the exact path list + the cross-org library-share assertion; whether the harness gates CI.
