Infra & Auth Plan

Drafted 2026-05-18 (session 2). Direction locked: AWS-native + Cognito + library/connector MCP model. Subject to review and refinement.

This doc decides where Autri runs in production, how users sign in, and how the MCP-as-infrastructure wedge gets implemented at the architecture level. The choice spans app host, DB host, LLM provider, auth provider, and the MCP authorization model — and each has a cost curve that scales differently with users. After this session's brainstorm, the direction is locked: AWS-native (ECS Fargate + RDS + Cognito + Bedrock), with a library/connector model for MCP scope. This doc records the reasoning, the rejected alternatives, and the 2-week sprint that gets us to beta.

Vision & Goals

Autri is a generic, inspectable knowledge-base platform whose primary product surface is MCP-into-the-AI-tools-users-already-use — Microsoft Copilot, Claude Desktop, Cursor, Continue. The bet: enterprise procurement friction collapses when the customer doesn't have to approve a new AI tool, just a new data source for the AI tool they already approved.

Strategic layers (in delivery order):

Personal tier wedge — author + prosumer users with their own knowledge bases (novels, research notes, hobby projects). Acquisition motion. Cheap to serve via MCP.
MCP-as-infrastructure — published MCP endpoints that customers add to their AI tool of choice. The web chat surface exists as a demo + KB management surface; the MCP integration is where the product lives in users' workflows.
Team tier — small-group plans (3-25 seats) with library-based access control. STEM Racing Charlotte is the first beta candidate. Team plan replaces the v1.5 "Studio" placeholder; it's in scope for the MVP launch SKU.
Enterprise tier — SSO, audit, per-tenant isolation, BYO institutional knowledge. Triggered by first real enterprise lead (Brehob, Dan's day-job org, or inbound).
Platform layer (long-term) — Autri's KB substrate as foundation for vertical AI apps. QuoteAI is the proof-of-concept (already an app on top of KB layer). Don't pre-build the platform API; let real second-app pressure surface what's needed.

Market context: Knowledge-base-as-AI-substrate is bifurcated. Closed destinations (Glean, Guru, Notion AI) lock users into their UI. DIY platforms (Pinecone + LangChain) require engineers. Vertical specifics narrow the audience. The gap: a generic, inspectable, BYO-data KB that plugs into existing AI tools. Autri targets that gap.

Brehob context (post-2026-05-18 Andy update): Brehob is offering $30-50k tranches as partnership / acceleration capital, not acquisition. Equity terms TBD. Bottom line: Autri does not require Brehob money to start, only to accelerate growth. This de-gates Autri development from the QuoteAI deal sequencing entirely (updates D2). Pursuing Brehob as an enterprise design partner (deploying Autri into their org via Copilot MCP) is a separate parallel conversation.

Constraints

Updated 2026-05-19 with locked decisions from review. See aws-infra-options.md for full stack-layer analysis.

What we're optimizing for, in priority order. These drive every decision.

2-week target to beta — by 2026-06-01, mom + STEM Racing + Dan can log in and use Autri via the web and via MCP-into-Claude-Desktop (Copilot validation deferred until M365 subscription).
5-10 beta testers max in first 90 days — known users, real workflows. Not waitlist scale. This means we don't need multi-tenancy enforcement on day one, but the architecture must allow it.
AWS-native is the chosen path — locked. Lean direction: Stack B (AgentCore Runtime + Amplify + Fargate Tasks + RDS + Cognito) pending Day 0 spike validation. Stack A (all-Fargate) is the documented fallback. Full alternatives + rejected paths in aws-infra-options.md.
MCP is the wedge — every architecture decision evaluates "does this strengthen MCP-as-infrastructure?" Web chat is the demo; MCP is where the product lives.
Migration cost from personal → team → enterprise must stay low. Plan for SSO and OBO flows from the architecture even if not implemented in v1. The library/connector model carries forward unchanged.
Single region (us-east-1) — locked for beta and likely year 1. Multi-region only when first paying customer pushes (latency or compliance reason).
Local-first development sequencing — locked. Build library/connector + MCP-over-SSE locally first, validate against Claude Desktop, THEN provision AWS. AWS becomes "boring infra deploy" once the substance is proven.
Cost floor at idle: target ~$35-50/mo on Stack B (vs ~$95/mo on Stack A). AWS Activate Founders ($1k) deferred to post-landing-page session; not blocking beta. We "rawdog the cost" for the beta sprint duration.

Locked goals (committed across sessions):

AWS-native stack: Stack B (AgentCore Runtime / Amplify / Fargate Tasks / RDS Postgres + pgvector / Cognito) — pending Day 0 spike. Stack A fallback documented.
LLM split: Anthropic API for local dev, Bedrock for production deploy. Env var swap via cli-client.ts abstraction.
Library/connector model for MCP scope (per-library connector, DB-backed authorization) — per D34/D35
Team tier in MVP launch SKU; SSO deferred to first real enterprise lead
autri.ai domain on Cloudflare DNS → AWS subdomains: app.autri.ai (Amplify) + mcp.autri.ai (AgentCore)
Per-KB connector pattern as v1 stepping stone toward per-library, then per-user connector
Single region us-east-1 for beta and likely year 1
Cost telemetry as beta DoD: produce daily $/MAU report per stack layer

Deferred until later sessions:

M365 + Copilot Business subscription (~$30-40/mo, 1 seat) — buy pre-enterprise validation OR right before any external Copilot demo. Per the M365 thread: "post beta, pre enterprise." Validates the enterprise architecture without paying for it before we need it.
Landing page sprint + AWS Activate Founders application — post-beta. Have the product first, marketing second.
Data ingestion patterns (OneDrive webhooks, GDrive change notifications, email-in, desktop sync client) — post-beta feature, deserves its own doc series. Beta uses manual re-upload.
SSO / SAML / Entra ID integration — add WorkOS or Cognito Federation when first enterprise lead asks
OAuth On-Behalf-Of flow for multi-user-per-org Copilot installs — same trigger as SSO
SOC2 posture, BAAs, dedicated infra tiers — Enterprise tier table-stakes; defer until first contract requires
Audit log dashboard — events written from day one; the UI for browsing them is later
AWS Activate Portfolio referral (independent of website readiness) — surface in next Andy conversation

Recommended Stack

Updated 2026-05-19. Full stack-layer analysis in aws-infra-options.md. This section summarizes the lean direction; the AWS-options doc covers alternatives + rejected paths.

Current lean: Stack B — AgentCore Runtime + Amplify + Fargate Tasks + RDS + Cognito. Pending Day 0 spike validation; falls back to Stack A (all-Fargate) if AgentCore has a blocker.

Layer	Choice	Idle	100 MAU	1k MAU	Notes
MCP server	Bedrock AgentCore Runtime	$0	$5-15	$50-150	Idle is free; per-session microVMs
Web app	AWS Amplify (Next.js, GitHub-connected)	$0	$5-20	$50-300	Scales-to-zero compute
Ingestion workers	Fargate Tasks (on-demand)	$0	~$5	~$30	Pay per task-second
Database	RDS Postgres 16 + pgvector (db.t4g.small)	$30	$30	$60	Multi-AZ later
LLM (dev/local)	Anthropic API direct	pay-as-you-go	~$50	$200-800	Per D12
LLM (production)	AWS Bedrock	pay-as-you-go	~$50	$200-800	Per D16; flip via env var
Auth	Cognito (Google federated)	free	free	free	50k MAU free tier
DNS / TLS	Cloudflare DNS-only → AWS endpoints + ACM cert	free	free	free
Storage	S3 (uploads, page renders, cache)	~$1	~$5	~$20
Secrets	AWS Secrets Manager + Parameter Store	~$2	~$2	~$2
Observability	CloudWatch Logs + Cost Explorer + Budgets	~$5	~$15	~$50	Datadog later if needed
Total		~$35-50/mo	~$60-90/mo	~$300-700/mo

vs Stack A (all-Fargate, the original plan): ~$95/mo idle / ~$180/mo at 100 MAU. Stack B saves $45-60/mo at idle and ~$90-120/mo at 100 MAU. Over the first 3 months of beta that is roughly $135-180 saved at idle, or ~$270-360 if usage sits at 100 MAU throughout.
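The Stack A vs Stack B comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the monthly figures quoted in this section and assumes they stay flat from month to month; the type and function names are illustrative.

```typescript
// Monthly cost figures quoted in this section (USD). Ranges are [low, high].
type CostRange = [number, number];

const stackAIdle = 95;
const stackBIdle: CostRange = [35, 50];
const stackAAt100Mau = 180;
const stackBAt100Mau: CostRange = [60, 90];

// Savings of Stack B against a single Stack A figure, as a [low, high] range.
// The smallest saving pairs with Stack B's high estimate, and vice versa.
function monthlySavings(stackA: number, stackB: CostRange): CostRange {
  return [stackA - stackB[1], stackA - stackB[0]];
}

// Cumulative savings over a number of months, assuming flat monthly figures.
function overMonths(savings: CostRange, months: number): CostRange {
  return [savings[0] * months, savings[1] * months];
}

const idleSavings = monthlySavings(stackAIdle, stackBIdle);             // [45, 60]
const savingsAt100Mau = monthlySavings(stackAAt100Mau, stackBAt100Mau); // [90, 120]
const idleOverBeta = overMonths(idleSavings, 3);                        // [135, 180]
const at100MauOverBeta = overMonths(savingsAt100Mau, 3);                // [270, 360]
```

Under the flat-usage assumption, three months of beta save between $135 and $360 depending on where usage lands between idle and 100 MAU.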

Why Stack B over Stack A:

AgentCore Runtime is purpose-built for MCP — per-session microVMs, OAuth 2.1 native, idle is free
Amplify gives push-to-deploy CI/CD Dan is familiar with, vs hand-rolling Fargate + CDK + GitHub Actions
Cheaper at every scale through ~1k MAU
Migration cost off either component is bounded (~1 week to migrate Amplify → raw CDK; AgentCore container is portable to Fargate)

Why Stack A is still documented as the fallback:

AgentCore Runtime is 2 months old at decision time — limited community knowledge
Day 0 spike validates AgentCore before locking; if blocker surfaces, Stack A is the known-good path
Fargate consistency story (one runtime everywhere) is operationally simpler if AgentCore has rough edges

Why these choices over Vercel / Fly / etc: see aws-infra-options.md § Considered Alternatives and the original Considered Alternatives section in this doc (rejected pre-AgentCore).

Direct Anthropic API → Bedrock migration path: start with Anthropic API for local dev (fastest iteration, Max-plan testing affinity per feedback_max_plan_for_testing.md); swap to Bedrock for production deploy. Same prompts, same tool definitions, env var flip via cli-client.ts abstraction. Bedrock model access approval (24-48h per model) requested during week 2 prep so it's ready when production cuts over.
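The env var flip might look something like the sketch below. The contents of cli-client.ts are not shown in this doc, so everything here is hypothetical: the variable name LLM_PROVIDER, the LlmConfig shape, and the function name are assumptions chosen for illustration. Only the two providers and the us-east-1 default come from this plan.

```typescript
// Hypothetical sketch: the real cli-client.ts abstraction is not shown here.
type LlmProvider = "anthropic" | "bedrock";

interface LlmConfig {
  provider: LlmProvider;
  region?: string; // only meaningful for Bedrock
}

// LLM_PROVIDER is an assumed variable name. Defaulting to "anthropic"
// mirrors the plan's "Anthropic API for local dev" choice.
function resolveLlmConfig(env: Record<string, string | undefined>): LlmConfig {
  const raw = (env["LLM_PROVIDER"] ?? "anthropic").toLowerCase();
  if (raw === "bedrock") {
    return { provider: "bedrock", region: env["AWS_REGION"] ?? "us-east-1" };
  }
  if (raw === "anthropic") {
    return { provider: "anthropic" };
  }
  // Fail loudly on typos rather than silently picking a provider.
  throw new Error(`Unknown LLM_PROVIDER: ${raw}`);
}
```

Callers would pass the process environment in and branch on `provider`, so prompts and tool definitions stay untouched when the variable changes.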

Open questions (to be validated in the Day 0 spike):

AgentCore Runtime transport support (Streamable HTTP vs HTTP+SSE) — research subagent in flight
Cold-start latency on first MCP session — research subagent in flight
Cognito SSO state across app.autri.ai and mcp.autri.ai
Cost telemetry visibility (CloudWatch + Cost Explorer + Budgets pattern at $50/$100/$200 thresholds)
Domain routing topology (Cloudflare DNS → multi-target AWS endpoints with TLS)

MCP Architecture

Transport locked 2026-05-19: Streamable HTTP (MCP spec 2025-03-26+). Per AgentCore Runtime requirements + research findings on the AWS-options doc.

The MCP integration is Autri's load-bearing product surface. The model needs to handle:

Identity: which Autri user is on the other end of an MCP request?
Authorization: which knowledge bases is that user allowed to query?
Enforcement: every tool call scoped to the right KB subset.

Server topology:

Endpoint: https://mcp.autri.ai/c/{connectorId} (production) or http://localhost:8080/c/{connectorId} (local-first dev)
Transport: Streamable HTTP (MCP spec 2025-03-26+). POST requests to /mcp returning JSON-RPC; streaming responses use text/event-stream content-type within Streamable HTTP semantics. Not legacy HTTP+SSE transport — required by AgentCore Runtime, supported by all current MCP clients (Claude Desktop, Cursor, Copilot Studio).
Auth: OAuth 2.1 + PKCE. Cognito acts as the OAuth authorization server. Each connector has its own OAuth client credentials.
Token validation: every request → Cognito JWKS public-key verification of JWT signature. ~1ms in-process.
Host: AgentCore Runtime per-session microVM (Stack B, lean) OR Fargate task behind ALB (Stack A, fallback). Same MCP server container code either way.

Request lifecycle:

1. Copilot / Claude Desktop / Cursor → POST https://mcp.autri.ai/c/{connectorId}/mcp
   (Streamable HTTP: tools/list, tools/call, etc. via JSON-RPC body)
2. Server: extract Bearer token from Authorization header
3. Server: verify JWT signature against Cognito JWKS (cached)
4. Server: SELECT user_id, library_id FROM connectors WHERE id = {connectorId}
5. Server: verify connector.user_id matches token.sub (defense in depth)
6. Server: load library KBs:
     SELECT knowledge_base_id FROM library_kbs WHERE library_id = {library_id}
7. Server: scope all tool calls to that KB set
8. Tool call (e.g., search_knowledge_base) → scoped retrieval query
9. Tool result returned via Streamable HTTP response (chunked text/event-stream
   for streaming results, or single application/json for short responses)
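Steps 4-7 of the lifecycle above can be sketched as a pure function over in-memory stand-ins for the connectors and library_kbs tables. This is an illustration, not the server's actual code: the type and function names are hypothetical, the token is assumed to be already signature-verified (step 3), the revoked_at check is an assumption based on the schema column, and the comparison assumes users.id is the same value as the token's sub claim.

```typescript
// In-memory stand-ins for rows of the connectors and library_kbs tables.
interface ConnectorRow {
  id: string;
  userId: string;
  libraryId: string;
  revokedAt: Date | null;
}

interface VerifiedToken {
  sub: string; // subject claim from an already-verified Cognito JWT
}

// Steps 4-7: resolve the connector, check ownership, return the KB scope.
function resolveKbScope(
  connectorId: string,
  token: VerifiedToken,
  connectors: Map<string, ConnectorRow>,
  libraryKbs: Map<string, string[]>
): string[] {
  const connector = connectors.get(connectorId);
  if (connector === undefined) throw new Error("unknown connector");
  // Assumption: a non-null revoked_at means the connector is dead.
  if (connector.revokedAt !== null) throw new Error("connector revoked");
  // Assumption: users.id equals token.sub; a real system may need a mapping.
  if (connector.userId !== token.sub) throw new Error("token does not own connector");
  // Every tool call in this request is then restricted to this KB set.
  return (libraryKbs.get(connector.libraryId) ?? []).slice();
}
```

Because the scope is looked up on every request, deleting or revoking the connector row takes effect on the next call without touching the token.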

Token scope model (option 3 from session brainstorm — locked):

Token = identity proof (Cognito-issued, contains user_id + standard claims)
DB = authorization source of truth (connector ↔ library ↔ KBs)
Pros: revoke access by deleting DB row; no token regeneration needed; sub-ms lookup with proper indexing
Cons: every request hits the DB (acceptable: the surrounding LLM calls take on the order of 100ms, so a sub-ms indexed lookup is negligible by comparison)

Tool surface (v1):

search_knowledge_base(query, kb_id?) — vector + FTS hybrid search across library's KBs. kb_id filters to a specific KB; omitting searches all.
lookup_section(section_id) — direct lookup by section/rule ID (for structured docs).
list_knowledge_bases() — list KBs in the connector's library (so the LLM knows what's available).
get_document(doc_id) — fetch full doc metadata + chunks.
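The optional kb_id on search_knowledge_base needs one guard: a caller must not be able to name a KB outside the connector's library. A minimal sketch of that check, with a hypothetical function name and the library's KB IDs passed in as a plain array:

```typescript
// Resolve which KBs a search may touch.
// libraryKbIds: the KB set already authorized for this connector.
// kbId: the optional filter from the tool call.
function resolveSearchScope(libraryKbIds: readonly string[], kbId?: string): string[] {
  if (kbId === undefined) {
    // Omitting kb_id searches every KB in the library.
    return libraryKbIds.slice();
  }
  if (libraryKbIds.indexOf(kbId) === -1) {
    // Never widen scope: an unknown or foreign KB is an error, not an empty result.
    throw new Error(`kb_id ${kbId} is not in this connector's library`);
  }
  return [kbId];
}
```

Throwing rather than returning an empty list is a design choice made here for illustration; it makes a mis-scoped request visible to the client instead of looking like "no results".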

AgentCore Runtime session semantics (Stack B specifics):

Per-session isolated microVMs, 2 vCPU / 8 GB each
Idle timeout: 15 min (adjustable via idleRuntimeSessionTimeout)
Max compute lifecycle: 8h (adjustable via maxLifetime) — at the boundary, a new microVM is provisioned with the SAME Mcp-Session-Id; in-memory state lost unless persisted via session storage or AgentCore Memory
Logical session remains valid until the AgentCore Runtime ARN is deleted
Concurrent active sessions per account: 1,000 in us-east-1 (well above beta scale)
New session rate: 100 TPM container deployment (generous for beta + early growth)
Cold start: sub-second from 10-microVM warm pool; 2-3s beyond pool

Enterprise extensions (deferred until first lead):

SSO federation — Cognito → enterprise IdP (Entra ID, Okta, Google Workspace). Library access mapped from IdP group membership.
OAuth On-Behalf-Of — when org admin installs the connector, individual users within that org get their own scope via OBO token exchange. Microsoft Graph + Azure AD app registration.
Audit events — every tool call emits a structured event (user_id, connector_id, tool, KB scope, query, result_count, timestamp). Written to a mcp_audit_log table from day one; the dashboard UI is later.
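The audit event described above could take a shape like the following. The field list comes from this section; the interface name, camelCase field names, and the builder function are assumptions made for illustration, as is the choice of an ISO 8601 string for the timestamp.

```typescript
// One row destined for the mcp_audit_log table.
interface McpAuditEvent {
  userId: string;
  connectorId: string;
  tool: string;
  kbScope: string[];
  query: string;
  resultCount: number;
  timestamp: string; // ISO 8601, UTC
}

// Build an event at the end of a tool call. The clock is injectable so the
// function stays deterministic under test.
function buildAuditEvent(
  fields: Omit<McpAuditEvent, "timestamp">,
  now: Date = new Date()
): McpAuditEvent {
  return {
    userId: fields.userId,
    connectorId: fields.connectorId,
    tool: fields.tool,
    kbScope: fields.kbScope.slice(), // copy so later mutation cannot alter the log
    query: fields.query,
    resultCount: fields.resultCount,
    timestamp: now.toISOString(),
  };
}
```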

Open implementation questions:

Client behavior at 8h compute boundary — does Claude Desktop / Copilot Studio / Cursor gracefully reconnect on the same Mcp-Session-Id when AgentCore swaps the underlying microVM? Or does it see a hard disconnect and require a fresh handshake? AWS docs describe server-side behavior; client behavior is empirical. Test in Day 0 spike with Claude Desktop.
Per-tool rate limiting: per-user vs per-connector vs per-tier. Likely per-connector with tier-based budgets.
Connector key rotation: how does a user rotate OAuth client_secret without breaking existing Claude Desktop / Copilot installations? Documented rotation flow needed.
Pre-emptive VM warmup pattern: for the first MCP request from a fresh user, sub-second cold-start is fine, but if we want zero perceived cold-start we'd implement pre-emptive warmup (initialize a session before the user's first real request — e.g., when they click "Connect to Copilot"). Defer until UX justifies.
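For the rate-limiting question above, the "per-connector with tier-based budgets" lean could be prototyped with a fixed-window counter. This is a sketch only: the per-minute budgets are placeholder numbers invented for the example, the class is hypothetical, and a fixed window is just the simplest option (a token bucket or sliding window would smooth bursts better).

```typescript
type PaidTier = "author" | "team" | "enterprise";

// Placeholder budgets (tool calls per minute); not decided anywhere in this plan.
const callsPerWindow: Record<PaidTier, number> = { author: 30, team: 60, enterprise: 120 };

interface WindowState {
  windowStart: number; // ms timestamp when the current window opened
  count: number;       // calls admitted in the current window
}

class ConnectorRateLimiter {
  private windows = new Map<string, WindowState>();
  private windowMs: number;

  constructor(windowMs: number = 60000) {
    this.windowMs = windowMs;
  }

  // Returns true if the call is admitted. Time is passed in for testability.
  allow(connectorId: string, tier: PaidTier, nowMs: number): boolean {
    const state = this.windows.get(connectorId);
    if (state === undefined || nowMs - state.windowStart >= this.windowMs) {
      // First call, or the previous window has expired: open a new one.
      this.windows.set(connectorId, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (state.count >= callsPerWindow[tier]) return false;
    state.count += 1;
    return true;
  }
}
```

State here lives in process memory, which would not survive AgentCore's per-session microVMs; a real version would need shared storage.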

Access Control: Libraries & Connectors

The library/connector model solves three problems at once:

Avoids per-KB connector setup tedium — a Team admin with 100 KBs can't be expected to generate 100 connectors.
Provides natural RBAC — a "library" IS a role ("Engineering Library" = "engineering team's allowed KBs").
Scales from personal to enterprise without architectural changes — the same primitives serve every tier.

Concepts:

Knowledge Base (KB) — storage/retrieval unit. Documents, chunks, embeddings. Unchanged from current Autri schema.
Library — named collection of KBs with an access policy. Many-to-many with KBs (a KB can be in multiple libraries — e.g., "FIA Regulations" appears in both "Engineering Library" and "Compliance Library").
Connector — MCP endpoint scoped to one library. Tied to one Autri user. Generates OAuth credentials for an external MCP client (Copilot, Claude Desktop, Cursor).
Org — tenancy boundary. Personal users are their own org of one. Team/Enterprise users belong to multi-user orgs.

Schema (additions to current Autri DB):

-- Existing: organizations, users, knowledge_bases (from 002_kb_primitive.sql)

CREATE TABLE libraries (
  id UUID PRIMARY KEY,
  org_id UUID NOT NULL REFERENCES organizations(id),
  name TEXT NOT NULL,
  description TEXT,
  owner_user_id UUID NOT NULL REFERENCES users(id),
  created_at TIMESTAMPTZ DEFAULT NOW(),
  UNIQUE (org_id, name)
);

CREATE TABLE library_kbs (
  library_id UUID NOT NULL REFERENCES libraries(id) ON DELETE CASCADE,
  knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id) ON DELETE CASCADE,
  added_at TIMESTAMPTZ DEFAULT NOW(),
  PRIMARY KEY (library_id, knowledge_base_id)
);

CREATE TABLE library_access (
  library_id UUID NOT NULL REFERENCES libraries(id) ON DELETE CASCADE,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  can_admin BOOLEAN DEFAULT FALSE,
  granted_at TIMESTAMPTZ DEFAULT NOW(),
  granted_by UUID REFERENCES users(id),
  PRIMARY KEY (library_id, user_id)
);

CREATE TABLE connectors (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  library_id UUID NOT NULL REFERENCES libraries(id) ON DELETE CASCADE,
  name TEXT NOT NULL,
  oauth_client_id TEXT NOT NULL UNIQUE,
  oauth_client_secret_hash TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  last_used_at TIMESTAMPTZ,
  revoked_at TIMESTAMPTZ
);

CREATE INDEX idx_library_access_user ON library_access(user_id);
CREATE INDEX idx_connectors_user ON connectors(user_id);
CREATE INDEX idx_library_kbs_library ON library_kbs(library_id);

Default library for personal users:

Every newly-created user gets a default Personal library. New KBs auto-added to it. UI surfaces the library concept even for personal users (per Dan's call — "no idea how someone will want to separate their KBs"). The Personal library is just the default seed; users can rename it, create more, and assign KBs across them however they want.

Team admin flow (STEM Racing pattern):

Admin creates org (or invited via existing team flow)
Admin creates KBs: "FIA Technical Regs", "Team Internal Wiki", "Race Strategy Docs"
Admin creates libraries: "Engineering" (FIA Regs + Internal Wiki), "Officers" (all three)
Admin invites 10 team members; each member added to one or more libraries via library_access rows
Each member generates their own MCP connector pointing at their assigned library
Adding a new KB to "Engineering" propagates to all engineering members' connectors automatically (no per-member work)
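The propagation in the last step falls out of resolving a connector's KBs at request time rather than copying them when the connector is created. A small in-memory sketch of the STEM Racing setup follows; the library and KB names come from the flow above, while the types and function names are hypothetical.

```typescript
interface TeamConnector {
  memberName: string;
  library: string; // the one library this connector is scoped to
}

// Library membership, as the library_kbs table would record it.
const libraryContents = new Map<string, string[]>([
  ["Engineering", ["FIA Technical Regs", "Team Internal Wiki"]],
  ["Officers", ["FIA Technical Regs", "Team Internal Wiki", "Race Strategy Docs"]],
]);

// Admin action: add a KB to a library (idempotent).
function addKbToLibrary(contents: Map<string, string[]>, library: string, kb: string): void {
  const kbs = contents.get(library) ?? [];
  if (kbs.indexOf(kb) === -1) kbs.push(kb);
  contents.set(library, kbs);
}

// Request-time lookup: a connector sees whatever its library holds right now.
function kbsForConnector(contents: Map<string, string[]>, connector: TeamConnector): string[] {
  return (contents.get(connector.library) ?? []).slice();
}
```

After a single `addKbToLibrary` call on "Engineering", every connector scoped to that library returns the new KB on its next request, and connectors scoped to other libraries are unaffected.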

Per-KB → per-library → per-user evolution:

v1 (this sprint): connectors only support a single library that contains exactly one KB. Functionally equivalent to per-KB connectors but uses the right schema for the library model — no migration when we lift the constraint.
v2 (post-beta, ~week 4+): UI exposes library creation, multi-KB library composition, library-scoped connectors. STEM Racing pattern unlocked.
v3 (per-user MCP, post-MVP): a single connector exposes ALL of a user's accessible libraries; the model chooses which library to query via tool calls. This is the per-user MCP long-term goal.
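The v1 restriction can live in application code as a single guard, which is what makes lifting it in v2 a code change rather than a migration. A minimal sketch, with a hypothetical function name:

```typescript
// v1 rule: a connector's library must contain exactly one KB.
// Returns that KB's ID, or throws if the library is empty or multi-KB.
function requireSingleKbLibrary(libraryKbIds: readonly string[]): string {
  const only = libraryKbIds[0];
  if (libraryKbIds.length !== 1 || only === undefined) {
    throw new Error(
      `v1 connectors require a library with exactly one KB, found ${libraryKbIds.length}`
    );
  }
  return only;
}
```

Calling this at connector creation keeps the schema library-shaped while the behaviour stays per-KB; removing the call is the v2 unlock.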

Tier Ladder

Placeholder pricing — refine after measured cost data from real users. Current numbers anchor against (a) competitor reference points, (b) cost-per-chunk data (D18, D27, D30), (c) the chunks-as-unit-of-cost framing.

	Free	Author	Team	Enterprise
Price	$0	$10/mo	$30/seat/mo (3-25 seats)	custom
Libraries	1	3	unlimited (org-scoped)	unlimited
KBs	1	10	shared pool, 100	unlimited
Chunks	2k	30k	200k pool	custom
Users	1	1	per-seat	unlimited
Admin role	—	—	✓	✓ + SSO
MCP	✗	✓ (1 client)	✓ (multi-client)	✓ + IP allowlist
PDF fair-use	50 pp/mo	500 pp/mo	5k pp/mo pool	metered
Audit log	—	basic	basic	full + export
Support	community	email	priority	dedicated
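If the ladder is later enforced in code, the numeric rows of the table translate directly into a limits map. The sketch below encodes only the Libraries, KBs, Chunks, and MCP rows; the names are hypothetical, `null` is used here to mean "unlimited or custom", and none of this is enforced during beta.

```typescript
type TierName = "free" | "author" | "team" | "enterprise";

interface TierLimits {
  libraries: number | null; // null = unlimited / custom
  kbs: number | null;
  chunks: number | null;
  mcp: boolean;
}

// Placeholder values copied from the table above.
const TIER_LIMITS: Record<TierName, TierLimits> = {
  free: { libraries: 1, kbs: 1, chunks: 2000, mcp: false },
  author: { libraries: 3, kbs: 10, chunks: 30000, mcp: true },
  team: { libraries: null, kbs: 100, chunks: 200000, mcp: true },
  enterprise: { libraries: null, kbs: null, chunks: null, mcp: true },
};

// True if an account on this tier may create one more KB.
function canCreateKb(tier: TierName, currentKbCount: number): boolean {
  const limit = TIER_LIMITS[tier].kbs;
  return limit === null || currentKbCount < limit;
}

// True if this tier may generate MCP connectors at all.
function canUseMcp(tier: TierName): boolean {
  return TIER_LIMITS[tier].mcp;
}
```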

Differences from prior D18 ladder:

Team replaces Pro. Previous D18 had "Pro $35/mo" as single-seat power user. Team is the right shape for STEM Racing's beta case (10 users, shared KBs). The "power individual" use case can collapse into Author + KB add-ons.
Pricing is placeholder. Refine once measured cost data from real beta users comes in. Per Dan's call: "set price based on cost."
MCP unlocked at Author. Per D19, MCP is the paid-tier wedge. Free is web-chat-only.
SSO unlocked at Enterprise. Team gets multi-user with library access control but no SSO; that's the Enterprise add.

Open questions:

Is "Author $10" still the right floor, or should we be more aggressive given the cost-per-chunk data ($0.002 prose / $0.009 PDF)? Could test $15 or $20 if Author tier turns out to be cost-sensitive.
Team minimum 3 seats — is that right or should it be 1+? Argument for 1-seat-minimum: lets the team admin self-onboard alone before inviting; argument for 3-min: avoids cannibalizing Author tier.
PDF fair-use pool at Team tier (5k pp/mo across all seats) vs per-seat. Pool is simpler; per-seat is fairer for unequal usage. Lean pool for v1.

2-Week Beta Sprint

Reshaped 2026-05-19 to local-first sequencing per resolved annotation thread. Week 0 half-day prep added for AgentCore Runtime spike. Week 1 is local validation; Week 2 is AWS deploy.

Goal: by 2026-06-01, the following works end-to-end:

Dan, mom, and STEM Racing team can sign in at app.autri.ai via Google
Mom can upload her novel (already in the system) and chat about it
STEM Racing can upload FIA Regulations PDF, see it processed in the inspector, and chat about it
Dan + at least one tester can install an Autri MCP connector into Claude Desktop and query a KB (Copilot integration validation deferred until post-M365-purchase)

Week 0 (prep) — half day

Day 0a: AgentCore Runtime POC spike. Deploy "hello world" MCP server to AgentCore, validate: transport (Streamable HTTP or HTTP+SSE), cold-start latency, session lifecycle, Cognito OAuth flow, billing visibility (CloudWatch + Cost Explorer + Budgets alerts). Lock Stack B if green, fall back to Stack A if blocked.
Day 0b: AWS account hygiene. Set up AWS Organization + dedicated autri-prod account, tag all resources project=autri env=beta, set up Budgets alerts at $50/$100/$200/mo thresholds.

Week 1 — Local-first MVP

Day 1-2 — Library/Connector schema:

Run DB migration to add libraries, library_kbs, library_access, connectors tables (against local Docker Postgres)
For each existing user: backfill a default Personal library, add their KBs to it
Validation: SQL queries return expected rows for "what KBs does this connector grant access to"
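The backfill step can be reasoned about as a pure function before it is written as SQL. The sketch below works over in-memory rows; the row shapes are simplified, and ownerUserId on a KB is an assumption, since the knowledge_bases schema is not shown in this doc. The ID generator is injected so the output is deterministic.

```typescript
interface UserRow { id: string; orgId: string; }
interface KbRow { id: string; ownerUserId: string; } // ownership column is assumed
interface LibraryRow { id: string; orgId: string; name: string; ownerUserId: string; }
interface LibraryKbRow { libraryId: string; knowledgeBaseId: string; }

// For each user: create one "Personal" library and attach all of their KBs.
function backfillPersonalLibraries(
  users: UserRow[],
  kbs: KbRow[],
  newId: () => string
): { libraries: LibraryRow[]; libraryKbs: LibraryKbRow[] } {
  const libraries: LibraryRow[] = [];
  const libraryKbs: LibraryKbRow[] = [];
  for (const user of users) {
    const library: LibraryRow = {
      id: newId(),
      orgId: user.orgId,
      name: "Personal",
      ownerUserId: user.id,
    };
    libraries.push(library);
    for (const kb of kbs) {
      if (kb.ownerUserId === user.id) {
        libraryKbs.push({ libraryId: library.id, knowledgeBaseId: kb.id });
      }
    }
  }
  return { libraries, libraryKbs };
}
```

This version is not idempotent; a real migration would need to skip users who already have a Personal library.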

Day 3-4 — MCP-over-SSE adapter:

Take existing mcp-servers/doc-search (currently stdio) and adapt it to Streamable HTTP, the transport locked in § MCP Architecture (fall back to legacy HTTP + SSE only if the Day 0 AgentCore spike forces it)
Run locally on localhost:8080
OAuth scaffold: stub the auth layer for local dev (Cognito local emulator OR a JWT shim returning a hardcoded test user)
Validation: curl request to MCP endpoint returns SSE tool-list response
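For the curl validation, it helps to know what goes over the wire. The sketch below builds the JSON-RPC 2.0 body for an MCP tools/list call and extracts JSON payloads from a text/event-stream response. The helper names are hypothetical, and the parser handles only the event and data fields of the SSE format, which is enough for a smoke test but is not a full SSE client.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// Body to POST to the MCP endpoint to list the server's tools.
function toolsListRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Pull JSON payloads out of an SSE body. Events are separated by a blank
// line; multiple data: lines in one event are joined with "\n".
function parseSseJson(body: string): unknown[] {
  const payloads: unknown[] = [];
  for (const event of body.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.indexOf("data:") === 0)
      .map((line) => line.slice(5).replace(/^ /, "")) // drop "data:" and one optional space
      .join("\n");
    if (data.length > 0) payloads.push(JSON.parse(data));
  }
  return payloads;
}
```

A passing smoke test is one parsed payload whose `id` matches the request and whose `result.tools` names the four v1 tools.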

Day 5-6 — Connector creation UI:

"Connectors" page in app: list, create, revoke
"Generate MCP Connector" flow: select library → name connector → returns endpoint URL + OAuth client credentials
Per-connector authorization at MCP server: tool calls scoped to library KBs

Day 7 — Local end-to-end validation:

Dan installs his own local MCP connector into Claude Desktop
Configures Claude Desktop with http://localhost:8080/c/<connectorId> + bearer token
Queries his novel KB from within Claude Desktop, verifies SSE response, citations, scope enforcement
This is the "does the wedge work" gate. If yes, proceed to AWS deploy.

Week 2 — AWS deploy

Day 8-10 — Infrastructure scaffolding:

CDK project for: VPC, RDS db.t4g.small, AgentCore Runtime config, Cognito user pool, ACM cert
Amplify project connected to GitHub autri repo, builds Next.js app on push
Request Bedrock model access (Sonnet, Haiku) early: approval takes 24-48h
Migration scripts: local Postgres → RDS (pg_dump/restore + cutover plan)

Day 11 — Deploy:

Push Next.js app → Amplify deploys to app.autri.ai
Deploy MCP server container → AgentCore Runtime, exposed at mcp.autri.ai
Migrate dev data to RDS
Cognito Google federation wired into Amplify

Day 12 — Auth + DNS:

Cloudflare DNS → AWS endpoints (CNAME app.autri.ai → Amplify, mcp.autri.ai → AgentCore)
TLS certs via ACM for both subdomains
End-to-end auth test: log in at app.autri.ai → generate connector → install in Claude Desktop → query → success

Day 13 — Cost telemetry + observability:

CloudWatch dashboards: per-stack-layer cost, per-MAU cost, MCP session counts
AWS Budgets alerts confirmed working at $50/$100/$200 thresholds
Cost Anomaly Detection daily email enabled
CloudWatch Logs aggregated for app + MCP server

Day 14 — Beta onboarding:

Dan onboards himself end-to-end with the deployed stack
Mom onboarded (web chat only)
One STEM Racing engineer onboarded (web chat + Claude Desktop MCP)
Triage feedback, document follow-ups

Definition of done for v1 beta

AWS-native deploy operational at autri.ai (subdomains: app.autri.ai, mcp.autri.ai)
Cognito + Google federated login works
Web chat works end-to-end for at least the novel + FIA KBs
MCP-over-SSE endpoint operational on AgentCore Runtime (or Fargate if Day 0 fallback triggered)
At least one MCP connector functioning in Claude Desktop at scale of "10 queries without breaking"
3+ external beta users have logged in and successfully used the product
Cost telemetry deliverable: end-of-sprint report with daily $/MAU numbers per stack layer (for validating D18 / Tier Ladder pricing post-beta)
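The cost telemetry deliverable reduces to a small aggregation once daily cost rows exist per layer. A minimal sketch, assuming the rows have already been pulled from Cost Explorer by resource tag into the hypothetical shape below; how MAU is counted is left to the caller.

```typescript
interface LayerCostRow {
  date: string;   // e.g. "2026-06-01"
  layer: string;  // e.g. "database", "mcp-server"
  costUsd: number;
}

// Cost per monthly active user for one day, broken down by stack layer.
function costPerMauByLayer(
  rows: LayerCostRow[],
  date: string,
  mau: number
): Record<string, number> {
  if (mau <= 0) throw new Error("mau must be positive");
  // Sum all rows for the requested day, grouped by layer.
  const totals: Record<string, number> = {};
  for (const row of rows) {
    if (row.date !== date) continue;
    totals[row.layer] = (totals[row.layer] ?? 0) + row.costUsd;
  }
  // Divide each layer's total by the active-user count.
  const perMau: Record<string, number> = {};
  for (const layer of Object.keys(totals)) {
    perMau[layer] = (totals[layer] ?? 0) / mau;
  }
  return perMau;
}
```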

Explicitly NOT in scope for v1 beta

Multi-tenancy enforcement (RLS — defer to v1.1, current state is "any UUID grants access")
Microsoft Copilot integration as the primary MCP target (Claude Desktop is the v1 demo; Copilot validates post-M365-subscription)
Bedrock LLM provider for live traffic (Anthropic API for v1 prod traffic; flip to Bedrock in v1.1 via env var swap)
SSO / SAML / Enterprise auth
Stripe / paid tier enforcement (everyone gets all features during beta)
Team invite flow (single-user-per-org for beta; team comes after first MVP user feedback)
Audit log dashboard (events written, UI deferred)
Mobile-responsive polish
Landing page (deferred to post-beta session, before AWS Activate Founders application)

Risks

AgentCore Runtime is 2 months old. Day 0 spike is the de-risking step. If a blocker surfaces (transport mismatch, cold-start excessive, OAuth flow broken), Stack A (all-Fargate) is the documented fallback — no architecture re-design needed, just deploy MCP server as Fargate task instead of AgentCore.
SSE / Streamable HTTP transport compatibility. Different MCP clients support different transports. Research subagent in flight to nail down AgentCore's transport support.
Cognito hosted UI is ugly. Acceptable for v1; replace with custom UI in v1.1 if it bothers Dan.
RDS db.t4g.small may be undersized for the ingestion workload (concurrent embeddings, vector queries). Watch metrics; scale up if needed.
Local Postgres → RDS migration cutover. Tested in the deploy step; usually clean but has a manual cutover moment requiring brief downtime.
Bedrock model access approval timing. Request early in Week 2 — 24-48h lead time. Not strictly needed for beta (Anthropic API direct works for v1 traffic) but worth pre-positioning for v1.1 Bedrock cutover.

Stickiness & Migration Paths

Decisions ordered by how hard they are to reverse.

Sticky (one-way doors):

Cognito as auth provider — once users have accounts, migrating to a different provider means each user re-authenticates. Cognito → WorkOS migration is possible but invasive; only swap if Cognito's enterprise SSO ergonomics force the issue.
autri.ai brand commitment — domain, marketing material, customer mind-share. Reversible at high cost.

Sticky-ish (reversible with planning):

DB schema — Postgres → Postgres migrations are dump/restore but require downtime. Schema changes need careful migration planning. RDS → Aurora is in-place; RDS → external Postgres requires data movement.
MCP authorization model — once external customers have connectors with specific OAuth credentials, changing the auth model means coordinating updates with each customer's MCP client. Library/connector model designed to absorb future changes (e.g., per-user MCP in v3) without breaking existing connectors.

Swappable (low migration cost):

LLM provider — Anthropic API ↔ Bedrock swap is hours of work via cli-client.ts abstraction. Prompts and tool defs unchanged.
Container host — ECS Fargate → EKS → external Kubernetes is mostly task-definition translation. Docker image is the portable unit.
CDN / DNS — Cloudflare → CloudFront, or DNS-only → proxied, can change without app code changes.

Migration triggers (when do we move?):

Personal → Team: first user requests multi-user invite (we expect this from STEM Racing within the first month of beta).
Team → Enterprise: first inbound or referred enterprise lead asks for SSO. Spin up the SSO + SAML integration as a real engineering effort.
Anthropic API → Bedrock: first month where direct API spend exceeds Bedrock's flat-fee threshold OR first enterprise customer asks about data residency.
AWS-native single-region → multi-region: first paying customer in a region that needs latency or compliance attention. Not before.

Considered Alternatives

Brief rationale for stacks not chosen. Detail captured in prior session notes — preserved here only as institutional memory.

Scenario A: Lean Vercel (Vercel + Neon + Clerk + Anthropic API) — fastest beta deploy possible (~3 days), cheapest at idle (~$50/mo). Rejected because Vercel's request model fights long-running SSE for MCP, Clerk's SSO is paywalled behind $400/mo, and Neon adds a non-AWS dependency for marginal value.

Scenario C: Cross-cloud hybrid (Vercel + Neon + Cognito + Bedrock) — splits the difference between speed and AWS-readiness. Rejected because cross-cloud egress costs add up, two clouds means two on-call stories, and the migration path was as complex as just going AWS-native from the start.

Scenario D: Solo prosumer (Fly + Neon Free + NextAuth + Anthropic API) — cheapest possible production stack (~$15/mo idle). Rejected because NextAuth + Google has no enterprise path (no SSO, no audit, no SCIM), and the migration cost when an enterprise lead arrives is significantly larger than the cost difference today.

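AWS Amplify — considered as a fast-AWS variant. Originally rejected (pre-AgentCore) because Amplify looked like an awkward middle ground (neither as easy as Vercel nor as flexible as full AWS), and Dan would have to learn it anyway. That rejection has since been reversed: Amplify is the web-app host in Stack B (see § Recommended Stack).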

Cloudflare Workers + D1 + R2 — surfaced in skeleton but never seriously evaluated. D1 is too young for pgvector + production scale.

Recommendation

Updated 2026-05-19 with locked decisions from review threads.

Locked stack (lean, pending Day 0 spike): Stack B from aws-infra-options.md — Bedrock AgentCore Runtime (MCP server) + AWS Amplify (Next.js web app) + Fargate Tasks (ingestion workers) + RDS Postgres 16 + pgvector + Cognito (Google federated). autri.ai domain on Cloudflare DNS pointing at AWS subdomains.

Documented fallback: Stack A (all-Fargate) if Day 0 spike reveals an AgentCore blocker.

Locked auth model: Cognito with Google federated identity for v1. WorkOS or Cognito-Federated-SAML deferred to first enterprise lead. Per D34.

Locked MCP model: Library/Connector pattern with DB-backed authorization. Per-KB connector in v1 (single-KB library per connector); per-library connector in v2; per-user MCP in v3. Per D34/D35.

Locked tier ladder: Free / Author $10 / Team $30/seat / Enterprise. Pricing as placeholder; refine after measured cost data from beta users (cost-telemetry is now a beta DoD deliverable).

Locked LLM split: Anthropic API direct for local dev (fast iteration, Max-plan testing affinity); Bedrock for production deploy (AWS-native auth, no egress, enterprise procurement story). Env var swap via cli-client.ts abstraction.

Locked deployment region: us-east-1 only for beta. Multi-region only when first paying customer pushes.

Locked sprint sequencing: Local-first week 1 (build library/connector + MCP-over-SSE locally, validate against Claude Desktop), then AWS deploy week 2. Week 0 half-day prep covers the AgentCore Runtime POC spike.

Sprint: 2 weeks + Week 0 half-day prep. See § 2-Week Beta Sprint above.

Open decisions deferred to subsequent sessions:

Bedrock production migration (week 3-4 after beta) — env var flip + model-access approval
Multi-tenancy enforcement / RLS (week 3-4, drives D13 closure)
Team invite flow + library admin UI (week 4+, before STEM Racing onboarding goes beyond 1 tester)
Audit log dashboard (week 6+)
Stripe + tier enforcement (post-MVP, when first paying customer signals readiness)
M365 + Copilot Business subscription (~$30-40/mo, 1 seat) — buy pre-enterprise validation OR before first external Copilot demo
Landing page sprint + AWS Activate Founders application (post-beta session)
Data ingestion patterns (OneDrive webhooks, GDrive, email-in, etc.) — post-beta, own doc series
AWS Activate Portfolio referral (independent of website readiness) — surface in next Andy conversation

Next session's first action: Day 0 AgentCore Runtime POC spike. Lock Stack B or fall back to Stack A based on findings.
