Web Stack — W3 DIY (CloudFront + S3 + Lambda)

Sub-system design doc for autri's web app layer (app.autri.ai). Locked direction per D43: hand-rolled CloudFront + S3 + Lambda Function URL topology, scale-to-zero, Lambda VPC-attached to reach private RDS. Replaces the Amplify Hosting choice from Stack B / D39.

Drafted 2026-05-26 after the Amplify SSR ↔ VPC architectural gap surfaced during EPIC-4 Day 11. Implementation pending the next session's design pass through this doc.

Cost Shape

Build & Deploy

The runtime topology above is what's running. Getting code into it is a separate concern with its own seams.

Architecture

Per-Component Breakdown

Overview

Risks & Constraints

Risk	Likelihood	Impact	Mitigation
Lambda cold-start latency on SSR routes (~500-800ms first byte)	High at beta scale (low traffic = frequent cold starts)	Medium (user-facing latency)	Accept for beta; revisit provisioned concurrency at v1.1 if user feedback warrants
Cold-start hit on every deploy (warm containers invalidated)	Certain	Low-Medium (dev workflow tax)	Document the cadence; revisit provisioned concurrency at 1-2 instances if dev iteration suffers
RDS connection pool exhaustion at warm-Lambda concurrency	Low at beta scale; rises sharply with traffic	High (500s on all SSR routes when hit)	Pool size 2 per Lambda; add RDS Proxy at v1.1 if connection exhaustion observed
Lambda bundle exceeds zip deployment limits (50 MB zipped direct upload, 250 MB unzipped)	Medium pre-measurement; lower with `/api/chat` split	High (deploy fails)	Split-Lambda strategy (Q1' below) keeps Main Lambda lean; pivot to container image (10GB ceiling) if zip cap hit; measure on first build
`/api/chat` 60s CloudFront origin response timeout truncates long streams	Certain for chat sessions >60s	Medium (chat UX degrades for very long responses)	Set CloudFront origin response timeout to 60s max for beta; build resumable streams (option (a) buffer-and-replay, ~1-2 days) in v1.1 if usage shows truncation
OAuth callback rejected during `app-w3.autri.ai` parallel validation	Certain without explicit handling	High (auth flow untestable on parallel domain)	Add `https://app-w3.autri.ai/api/auth/callback` to Cognito allowed callback URLs during parallel validation; remove after `app` CNAME swap completes
Server-action IAM blast radius for connector creation (D44)	Medium (depends on dep CVE landscape)	Critical (compromised SSR Lambda → backdoor Cognito clients)	Separate `connector-management` Lambda invoked from SSR Lambda via `lambda:Invoke`; IAM scoped to specific user pool ARN; only `cognito-idp:CreateUserPoolClient` + `UpdateUserPoolClient` + `DeleteUserPoolClient`
Cross-domain Cognito SSO ambiguity	Certain without explicit handling	High (auth may silently work web-side and fail MCP-side, or vice versa)	Two Cognito resource servers — `app.autri.ai` (web session audience) and `mcp.autri.ai` (MCP audience). JWT `aud` claim disambiguates; each Lambda validates audience explicitly
`db/client.ts` async init incompatible with drizzle's sync pool pattern	Medium (drizzle types assume sync pool)	High (forces refactor across call sites)	Validate top-level-await pattern early; lazy `Promise<Pool>` is the backup. Env-var branching preserves `pnpm dev` (DB_SECRET_ARN absent → `DATABASE_URL` fallback)
ENI exhaustion in private subnets	Low at beta scale (would need hundreds of concurrent Lambdas)	Medium (Lambdas fail to provision)	Subnet sizing /24 × 2 AZs ≈ 502 IPs, shared with migrations Lambda + future Fargate workers. Add CloudWatch alarm on ENI count > 50% subnet capacity; measure current ENI count during W3 implementation
CloudFront cache-key trap (auth pages served from cache to wrong user)	Low if configured carefully; catastrophic if misconfigured	Critical (data leak between users)	Use AWS-managed `AllViewerExceptHostHeader` + `CachingDisabled` policies for default behavior (Lambda origin); `CORS-S3Origin` + `CachingOptimized` for static behaviors. Verified in CDK code review
Cloudflare proxy-mode flip kills streaming	Low (currently DNS-only / gray cloud)	High (chat UX breaks)	Document that Cloudflare must stay DNS-only; only CloudFront does CDN duty
DNS swap to new CloudFront is one-way at propagation cycle	Medium	Medium (longer rollback window if W3 has issues)	Validate W3 end-to-end on parallel `app-w3.autri.ai` first; swap `app` CNAME last
Secrets Manager fetch on every cold start adds ~50-200ms	Certain	Low	Accept — VPC interface endpoint for Secrets Manager already provisioned, no internet trip
Lambda's 6 MB Function URL response cap	Low for autri use case (SSR HTML <100KB)	Critical when hit	Verify no SSR route returns >6 MB; flag in code review
Per-Lambda RDS connection: Lambda's single-request-per-container nature wastes pool	Certain	Low	Pool max=2 per Lambda; pool barely earns its keep but is harmless
WebSocket support absent on Function URL	Certain	Low for beta (notifications use bell-icon polling)	Polling stays the pattern for beta; APIGW WebSocket API as a separate origin if real-time UX becomes a requirement
Standalone output + hoisted pnpm interaction	Resolved (Pattern #2 lessons from Amplify session apply)	N/A	Re-enable `output: 'standalone'` in `next.config.mjs`; build via `pnpm install --frozen-lockfile && pnpm --filter @autri/app build && pnpm deploy --filter @autri/app --prod /tmp/standalone-build`
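
The subnet arithmetic in the ENI exhaustion row can be checked with a few lines. This is a minimal sketch: the helper names are hypothetical, and the only outside fact it relies on is that AWS reserves five addresses in every subnet.

```typescript
// AWS reserves 5 addresses in every subnet (network, VPC router, DNS,
// one held for future use, broadcast).
const AWS_RESERVED_IPS_PER_SUBNET = 5;

// Usable addresses in a single subnet of the given prefix length.
function usableIps(prefixLength: number): number {
  if (!Number.isInteger(prefixLength) || prefixLength < 16 || prefixLength > 28) {
    throw new RangeError("AWS subnets must be between /16 and /28");
  }
  return Math.pow(2, 32 - prefixLength) - AWS_RESERVED_IPS_PER_SUBNET;
}

// ENI count at which the CloudWatch alarm should fire, as a fraction of capacity.
function eniAlarmThreshold(prefixLength: number, subnetCount: number, fraction: number): number {
  return Math.floor(usableIps(prefixLength) * subnetCount * fraction);
}

const totalUsableIps = usableIps(24) * 2;             // two /24 private subnets
const eniAlarmAt = eniAlarmThreshold(24, 2, 0.5);     // alarm at 50% of capacity
```

With two /24 subnets this gives 502 usable addresses and an alarm threshold of 251 ENIs.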

Current Status

Capability	Status
Topology design (this doc)	In Progress
CDK constructs (`WebStack` or rename existing `amplify.ts`)	Planned
Lambda handler wrapping Next.js standalone server	Planned
`db/client.ts` async init refactor	Planned
Build pipeline (manual `pnpm deploy:web` script, GHA later per D42 pattern)	Planned
CloudFront distribution + S3 bucket + OAC	Planned
RDS security group ingress from new Lambda SG	Planned
DNS swap from old Amplify CloudFront to new W3 CloudFront	Planned
Amplify CDK construct teardown (after W3 verified end-to-end)	Planned
Performance baseline (cold start measurements, SSR latency)	Planned

The Story

The web layer's architecture has evolved through three locked directions in a month:

Stack A (D33, 2026-05-18). All-Fargate: web app, MCP server, and ingestion workers all on Fargate behind an ALB. ~$95/mo idle floor. Locked as the "AWS-native, container-everywhere" direction.

Stack B (D39, 2026-05-21). AgentCore Runtime + Amplify + Fargate Tasks. Surfaced through the EPIC-1 AgentCore spike: a purpose-built AWS service for MCP server hosting (scale-to-zero, OAuth-native, session-aware microVMs) materially cheaper than Fargate for the MCP layer. Amplify Hosting picked for the web layer because it's CloudFront + Lambda + S3 managed — push-to-deploy from GitHub, opinionated build pipeline. ~$35-50/mo idle floor. Web layer SSR runs in Amplify's managed Lambda compute, not in our VPC.

Stack B-prime / W3 DIY (D43, 2026-05-26 — current). Amplify works through TLS + DNS + build, but its SSR Lambda compute cannot be placed in a VPC. Confirmed via aws-amplify/amplify-hosting#3362 — feature request open since March 2023, reopened February 2025 by AWS staff to track as still-pending. CDK CfnApp has no vpcConfig prop. Our RDS lives in a private subnet by design (per the secure-by-default NetworkAndData stack), so the Amplify SSR Lambda has no network path. Empirical: app.autri.ai/kb returned HTTP 500 with ECONNREFUSED 127.0.0.1:5432. Stack B's web layer pivoted to W3 — hand-rolled CloudFront + S3 + Lambda Function URL, where we control Lambda VPC config directly. AgentCore Runtime (for MCP) unchanged.

Why DIY over a framework (SST / OpenNext). Both DIY and framework variants of W3 produce the same runtime topology with the same cost shape. The DIY choice trades ~10 extra hours of CDK wiring for owning the CDK code directly with no framework abstractions to debug through. Worth being explicit: if framework abstractions had a clear win (faster iteration, better default behavior), the choice would flip. They don't, for our scale.

What Is This Sub-system?

The web layer hosts everything served from app.autri.ai: the user-facing Next.js application, including the visual extraction inspector, the KB management UI, the chat surface, the connector-creation flow, and the auth callback. It is the runtime home of all server-rendered pages and all /api/* route handlers in the autri Next.js app.

It is distinct from the MCP layer (mcp.autri.ai, runs on AgentCore Runtime per D39/D40) and the ingestion layer (Fargate Tasks, on-demand). All three layers share the same VPC, RDS instance, Cognito user pool, and Secrets Manager.

The sub-system's external interface is HTTPS at app.autri.ai. Its internal interface is the set of CDK stack outputs (Lambda function name, S3 bucket name, CloudFront distribution ID) that the deploy script consumes.

The Big Idea

W3 is one move: separate cheap-static delivery from on-demand compute, with a smart edge in front routing between them.

Next.js is two things wearing one trench coat:

A bunch of pre-rendered files (HTML shells, JS bundles, CSS, fonts, images in _next/static/) that never change between deploys
A server that runs route handlers + Server Components, hits the DB, streams chat tokens

Serving both from the same compute pays Lambda cold-start tax for assets that could've been served from a CDN cache in 5ms. Serving everything from S3 means no server code. W3 splits them.

Architecture Diagram

                      app.autri.ai (HTTPS)
                             │
                      [Cloudflare DNS] (DNS-only, gray cloud)
                             │
                      [CloudFront edge]
       ┌──────────────┬───────────────┬────────────────┐
       │              │               │                │
/_next/static/   /api/cache/*    /api/chat       everything else
/static/                                          (SSR + other /api/*)
       │              │               │                │
  [Static S3]    [Cache S3]     [Chat Lambda]    [Main Lambda]
                  (page renders)  Function URL    Function URL
                                       │                │
                                       └────────┬───────┘
                                                ▼
                                       [Both VPC-attached
                                        private subnets]
                              ┌─────────────────┼─────────────────┐
                              ↓                 ↓                 ↓
                          [RDS Postgres]  [Secrets Manager]   [Connector-Mgmt
                          port 5432       via VPC endpoint    Lambda]
                          via SG↔SG                           (invoked from
                                                              Main via
                                                              lambda:Invoke)
                                                                    │
                                                                    ▼
                                                              [Cognito user pool]
                                                              (CreateUserPoolClient,
                                                               UpdateUserPoolClient,
                                                               DeleteUserPoolClient)

Two Lambdas serving dynamic traffic (per Q1'):

Main Lambda — SSR pages, /api/auth/*, /api/upload/*, all other /api/* routes. Lean bundle (no Anthropic SDK).
Chat Lambda — /api/chat only. AI SDK + Anthropic SDK weight isolated here.

Connector-Management Lambda (per C2) — small, infrequent, tightly-scoped IAM. Invoked from Main Lambda's connector-creation server action via lambda:Invoke. Owns all cognito-idp Admin calls.

System Boundary

Owned by this sub-system:

CloudFront distribution for app.autri.ai (and temporary app-w3.autri.ai during parallel validation)
S3 bucket for static assets (separate from uploads bucket)
CloudFront behavior routing to existing S3 cache bucket (page renders from NetworkAndData); the bucket itself stays under NetworkAndData
Main Lambda function holding the Next.js standalone server (SSR pages + non-chat API routes)
Chat Lambda function holding /api/chat route handler (AI SDK + Anthropic SDK isolated)
Connector-Management Lambda with narrowly-scoped Cognito Admin IAM
Lambda execution roles + IAM policies for all three Lambdas
Security groups for the SSR + Chat Lambdas (shared SG OK; same RDS/Secrets access pattern)
ACM cert references (app.autri.ai cert reused from auth-and-compute; temporary cert issued for app-w3.autri.ai)
The Lambda → RDS connection path (security group rule on RDS allows ingress from Lambda SG)
The Lambda → Connector-Mgmt invocation path (lambda:Invoke IAM)
CloudFront Response Headers Policy (security headers)
CloudFront Origin Request + Cache Policies (pinned to AWS-managed combos)
CloudWatch alarms pinned in Cross-Cutting Concerns (Lambda errors, CloudFront 5xx, ENI count, RDS connections)
The build artifact upload script and CloudFront cache invalidation logic
Lambda alias-promote rollback machinery (current + prev aliases per Q10)

Not owned by this sub-system but interfaced with:

VPC + subnets (NetworkAndData stack)
RDS Postgres (NetworkAndData stack) — we only add the SG ingress rule
S3 cache bucket (NetworkAndData stack) — we only add the CloudFront behavior + OAC
Cognito user pool (AuthAndCompute stack) — Main + Chat Lambdas validate JWTs; Connector-Mgmt Lambda issues admin calls. Two resource servers per C3: app.autri.ai (web session audience) + mcp.autri.ai (MCP audience)
Secrets Manager secret holding DB credentials (NetworkAndData stack) — both Lambdas read it on cold start
Secrets Manager secret holding ANTHROPIC_API_KEY (AuthAndCompute stack) — Chat Lambda reads it on cold start
VPC interface endpoints for AWS services (already provisioned in NetworkAndData)
Cloudflare DNS records — manual updates outside CDK
The autri application code itself (lives in the autri repo; this sub-system specifies how it's deployed but not what it does)
Ingestion sub-system — user uploads land in uploads S3 bucket; ingestion reads from there (handoff details in ingestion sub-system doc)

CloudFront — the front door

The CDN. Globally distributed cache, terminates TLS at hundreds of edge locations, decides which origin serves each request. Its job beyond caching: route by path to the right origin.

You've been watching CloudFront work this whole time — Amplify Hosting is literally CloudFront + an opinionated build pipeline. We're taking direct ownership of the distribution.

Origin Request + Cache Policies (per Q12):

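Default behavior (→ Main Lambda): AllViewerExceptHostHeader + CachingDisabled. Forwards all viewer headers, cookies, and query strings except Host (a Function URL expects its own domain in Host and rejects the viewer's app.autri.ai value); Lambda responses never cache (no cache-key trap risk).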
/api/chat behavior (→ Chat Lambda): same as default, plus origin response timeout = 60s max (CloudFront cap; longer chats truncate per H4 Failure Mode).
/_next/static/* + /static/* + favicon behaviors (→ Static S3): CORS-S3Origin + CachingOptimized. Long TTLs, no auth header forwarding.
/api/cache/* behavior (→ Cache S3 bucket): CORS-S3Origin + CachingOptimized. Reads page-render PNGs directly from S3 via Origin Access Control; no Lambda invocation.
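
The behavior table above can be read as a first-match routing function. The sketch below models it in plain TypeScript so the path-to-origin mapping can be reasoned about (or unit-tested) before it is expressed in CDK. The type and function names are hypothetical, the explicit /favicon.ico pattern is an assumption about how the "favicon" behavior is spelled, and the matcher only handles a trailing `*`, which is narrower than CloudFront's real wildcard support.

```typescript
type OriginName = "static-s3" | "cache-s3" | "chat-lambda" | "main-lambda";

interface BehaviorRule {
  pathPattern: string;
  target: OriginName;
  originRequestPolicy: string;
  cachePolicy: string;
}

// Specific behaviors, evaluated in order; the first match wins.
const behaviorRules: BehaviorRule[] = [
  { pathPattern: "/_next/static/*", target: "static-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/static/*", target: "static-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/favicon.ico", target: "static-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/api/cache/*", target: "cache-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/api/chat", target: "chat-lambda", originRequestPolicy: "AllViewerExceptHostHeader", cachePolicy: "CachingDisabled" },
];

// Everything else (SSR pages and the remaining /api/* routes) goes to Main Lambda.
const defaultBehavior: BehaviorRule = {
  pathPattern: "*",
  target: "main-lambda",
  originRequestPolicy: "AllViewerExceptHostHeader",
  cachePolicy: "CachingDisabled",
};

// Simplified matcher: exact match, or prefix match when the pattern ends in "*".
function matchesPattern(pattern: string, path: string): boolean {
  return pattern.endsWith("*") ? path.startsWith(pattern.slice(0, -1)) : path === pattern;
}

function resolveBehavior(path: string): BehaviorRule {
  return behaviorRules.find((rule) => matchesPattern(rule.pathPattern, path)) ?? defaultBehavior;
}
```

For example, `resolveBehavior("/kb/abc-123/chat")` falls through to the default behavior, while `resolveBehavior("/api/chat")` selects the Chat Lambda.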

Security headers via Response Headers Policy (per Q13):

HSTS: max-age=63072000; includeSubDomains; preload
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
Content-Security-Policy (minimal v1): script-src 'self' 'unsafe-inline'; object-src 'none'; base-uri 'self'; rest at 'self' or 'data:'. Tighten in v1.1 once Next's inline-script needs are mapped.
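
As a rough sketch, the headers above can be kept as one data structure that the CDK Response Headers Policy is later generated from. The names below are hypothetical, and the `default-src` and `img-src` entries are an assumed reading of "rest at 'self' or 'data:'", not a confirmed policy.

```typescript
// CSP directives in emission order. The first four follow the list above;
// img-src is an ASSUMPTION standing in for "rest at 'self' or 'data:'".
const cspDirectives: Array<[string, string[]]> = [
  ["default-src", ["'self'"]],
  ["script-src", ["'self'", "'unsafe-inline'"]],
  ["object-src", ["'none'"]],
  ["base-uri", ["'self'"]],
  ["img-src", ["'self'", "data:"]],
];

// Serialize directives into a single Content-Security-Policy header value.
function serializeCsp(directives: Array<[string, string[]]>): string {
  return directives.map(([directive, sources]) => `${directive} ${sources.join(" ")}`).join("; ");
}

const securityHeaders: Record<string, string> = {
  "Strict-Transport-Security": "max-age=63072000; includeSubDomains; preload",
  "X-Frame-Options": "DENY",
  "X-Content-Type-Options": "nosniff",
  "Referrer-Policy": "strict-origin-when-cross-origin",
  "Content-Security-Policy": serializeCsp(cspDirectives),
};
```

Keeping the CSP as structured data makes the v1.1 tightening a one-line change to the `script-src` entry.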

What it provides:

TLS termination using ACM cert for app.autri.ai
Static asset caching (immutable hashed files → 1 year TTL; never re-hit origin)
DDoS absorption (absorbs SYN floods before they hit Lambda)
Compression (gzip/brotli)
HTTP/2, HTTP/3, IPv6
Origin Access Control (OAC) so both S3 buckets stay private

What it doesn't provide:

Authentication (Lambdas handle Cognito JWT validation)
Business logic (purely routing)

Idle cost: ~$0. Per-request + per-GB-out only. Zero traffic = zero cost.

S3 — the static origin

Two S3 origins served via CloudFront. Both buckets private; CloudFront accesses via Origin Access Control (OAC). Browsers only hit CloudFront, never S3 directly.

Static bucket (new, owned by this sub-system) — pre-built static files:

_next/static/chunks/*.js — Next's compiled bundles (hashed filenames, immutable)
_next/static/css/*.css
_next/static/media/* — fonts, images bundled by Next
static/* — files from app/public/
favicon.ico, robots.txt

Cache bucket (existing, owned by NetworkAndData) — page renders + ingestion artifacts:

PDF → PNG per-page renders (consumed by /api/cache/[...path] route layout for the inspector overlay)
Other ingestion-time cache files

Per Q11, the /api/cache/* CloudFront behavior routes directly to this bucket — no Lambda invocation. The cache key matches the URL path layout (e.g., /api/cache/<doc-id>/page-NN.png → S3 key <doc-id>/page-NN.png in the cache bucket).
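
One detail worth making explicit: CloudFront forwards the full request path to an S3 origin, so the mapping from /api/cache/<doc-id>/page-NN.png to the key <doc-id>/page-NN.png needs a rewrite step somewhere (for example a viewer-request function), unless objects are stored under an api/cache/ prefix. The source does not say which approach is used. The sketch below shows only the mapping logic as a pure function; the function name and the traversal checks are illustrative assumptions.

```typescript
const CACHE_ROUTE_PREFIX = "/api/cache/";

// Map a viewer URL path to the S3 object key in the cache bucket.
// Returns null when the path is outside the cache route or looks malformed.
function cacheUrlPathToS3Key(urlPath: string): string | null {
  if (!urlPath.startsWith(CACHE_ROUTE_PREFIX)) {
    return null;
  }
  const key = urlPath.slice(CACHE_ROUTE_PREFIX.length);
  // Reject empty keys, empty segments, and dot segments rather than passing them to S3.
  const segments = key.split("/");
  const malformed = segments.some((segment) => segment === "" || segment === "." || segment === "..");
  return malformed ? null : key;
}
```

For instance, `cacheUrlPathToS3Key("/api/cache/doc-42/page-05.png")` yields the key `doc-42/page-05.png`.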

NOT in either bucket:

HTML (server-rendered per request from Main Lambda)
User uploads (the existing uploads bucket in NetworkAndData owns those)

Idle cost: ~$0.01-0.05/mo for ~10MB static + variable cache content. Egress to CloudFront is intra-AWS, free.

Lambda + Function URL — the dynamic origin

Two Lambda functions per Q1', each with its own Function URL:

Main Lambda — Next.js SSR + non-chat API routes (/api/auth/*, /api/upload/*, server actions). Holds drizzle + pg + Next runtime + Cognito SDK. Lean bundle without AI SDK / Anthropic SDK weight.

Chat Lambda — /api/chat only. Holds AI SDK + Anthropic SDK + retrieval-tool wiring. Separate cold-start profile, separate connection pool.

Both Lambdas:

VPC-attached to private subnets (share the same SG OK)
Same db/client.ts async init pattern (top-level await + env-var branching per Q4)
Same secrets read on cold start: DB credentials from Secrets Manager
Chat Lambda additionally reads ANTHROPIC_API_KEY secret on cold start

A Function URL is a public HTTPS endpoint AWS attaches to a Lambda. CloudFront uses each as an origin (default behavior → Main; /api/chat behavior → Chat).

Function URL chosen over API Gateway HTTP API because:

Function URLs support response streaming (since 2023; GA in all regions Apr 2026 per AWS What's New). APIGW HTTP API does not — it buffers the entire response. Streaming matters for /api/chat.
Function URLs are free; APIGW charges per million requests.
Function URLs have 6 MB response cap vs APIGW's 10 MB. SSR HTML is <100KB; not a constraint.
Reconsidered post-split (2026-05-26): even with chat isolated to its own Lambda, APIGW for Main doesn't pay back. Throttling / WAF / custom auth — all redundant with what we're already doing or attachable to CloudFront. Two Function URLs keeps the mental model coherent.

Lambda lifecycle (critical to internalize):

A Lambda is NOT a long-running server. AWS spins up a container ("execution environment") on demand.
Cold start: first request to a fresh environment pays the boot tax — container boot + ENI attach + Node parse + JS module init + our top-level await (Secrets Manager fetch + pool init). Total: ~500-800ms for Main; slightly more (~600-1000ms) for Chat due to larger bundle.
Warm invocations: reuse the same container. Module-level state (pg pool, etc.) preserved. ~30-80ms typical.
Idle eviction: after ~5-15 min of no traffic, AWS kills the container. Next request is cold again.
Concurrency: 1 request per container at a time. 2 simultaneous requests → 2 containers.
Cold-start on every deploy: version updates invalidate all warm containers. Each deploy → next ~5-10 requests pay cold-start.

Idle cost: $0. The truest pay-only-when-used.

Packaging strategy (per Q8, locked 2026-05-26 post-measurement): container image. Bundle measurement against the current autri codebase showed ~38MB unzipped, excluding the /api/cache directory (which disappears once that route is removed per Q11). The measurement also surfaced that Next's outputFileTracing misses workspace-hoisted pnpm deps (AI SDK, Anthropic SDK, drizzle-orm). Resolving that requires a pnpm deploy --prod packaging step, which produces container-image-friendly output anyway. The cold-start cost of a container image (~150ms extra) is small relative to the ~500-800ms cold-start floor we already accepted. The zip-cap fight isn't worth having.

Build command (per Q9):

pnpm install --frozen-lockfile
pnpm --filter @autri/app build
pnpm deploy --filter @autri/app --prod /tmp/standalone-build

Matches the MCP server pattern. pnpm deploy --prod produces a self-contained dir with hoisted deps + Next standalone output; the container Dockerfile COPYs this into the image.

The wrapper handler: Next 14 standalone exports a server.js that runs an HTTP server. To run inside Lambda, we add a ~30-line wrapper that imports Next's request handler and translates Function URL events into Node's IncomingMessage + ServerResponse. Pattern used by SST, OpenNext, etc.
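
The event-translation half of that wrapper can be sketched without any Lambda or Next imports. The event fields below follow the Function URL payload format (version 2.0), in which cookies arrive as a separate array rather than in the headers map. `NodeStyleRequest` is a hypothetical intermediate shape for illustration; the real wrapper would go on to build Node's IncomingMessage and ServerResponse from it.

```typescript
// Subset of the Function URL event (payload format 2.0) that the wrapper needs.
interface FunctionUrlEvent {
  rawPath: string;
  rawQueryString: string;
  headers: Record<string, string>;
  cookies?: string[];
  body?: string;
  isBase64Encoded: boolean;
  requestContext: { http: { method: string } };
}

// Hypothetical intermediate shape, close to what a Node HTTP request exposes.
interface NodeStyleRequest {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string | undefined;
  bodyEncoding: "utf8" | "base64";
}

function toNodeStyleRequest(ev: FunctionUrlEvent): NodeStyleRequest {
  // Node lower-cases header names; do the same so lookups behave consistently.
  const lowered: Record<string, string> = {};
  for (const key of Object.keys(ev.headers)) {
    const value = ev.headers[key];
    if (value !== undefined) {
      lowered[key.toLowerCase()] = value;
    }
  }
  // Payload format 2.0 splits cookies out of the headers; rejoin them.
  if (ev.cookies && ev.cookies.length > 0) {
    lowered["cookie"] = ev.cookies.join("; ");
  }
  const url = ev.rawQueryString ? `${ev.rawPath}?${ev.rawQueryString}` : ev.rawPath;
  return {
    method: ev.requestContext.http.method,
    url,
    headers: lowered,
    body: ev.body,
    bodyEncoding: ev.isBase64Encoded ? "base64" : "utf8",
  };
}
```

Rejoining the cookies matters here because session validation in Server Components reads the cookie header during render.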

Container deploy ergonomics: ECR push + Lambda image update is ~30-60s slower than zip on first deploy of a version, but container layer caching keeps subsequent updates fast (~10-15s). Use Lambda's "update function code from ECR image URI" path; AWS handles the rest.

Connector-Management Lambda (the privilege-isolation Lambda)

A separate, small, infrequently-invoked Lambda that owns all Cognito Admin API calls. Lives in its own deployment artifact with a tightly-scoped IAM execution role.

Why a separate Lambda (per C2): The D44 connector-creation flow requires cognito-idp:CreateUserPoolClient permission. Granting that to the SSR Lambda would mean any RCE in the SSR Lambda (via dep CVE, prompt injection in a Server Component, etc.) could backdoor Cognito clients — a critical privilege-elevation surface. Isolating these privileges in a small, infrequently-invoked Lambda with a tightly-scoped IAM execution role minimizes the blast radius.

IAM scope:

Resource: specific Cognito user pool ARN (not *)
Actions: only cognito-idp:CreateUserPoolClient + UpdateUserPoolClient + DeleteUserPoolClient
No other Cognito Admin verbs (no AdminCreateUser, AdminDeleteUser, ListUsers, etc. — those stay in the post-confirmation Lambda's narrower scope)
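
Expressed as an IAM policy document, that scope might look like the sketch below. The builder function and its ARN check are hypothetical helpers; the three actions and the single-resource constraint come from the list above.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Build the Connector-Mgmt execution policy for exactly one user pool.
// Throws on anything that is not a concrete user pool ARN (including "*").
function connectorMgmtPolicy(userPoolArn: string): PolicyDocument {
  const userPoolArnShape = /^arn:aws:cognito-idp:[a-z0-9-]+:\d{12}:userpool\/[\w-]+$/;
  if (!userPoolArnShape.test(userPoolArn)) {
    throw new Error("expected a specific Cognito user pool ARN, got: " + userPoolArn);
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "cognito-idp:CreateUserPoolClient",
          "cognito-idp:UpdateUserPoolClient",
          "cognito-idp:DeleteUserPoolClient",
        ],
        Resource: [userPoolArn],
      },
    ],
  };
}
```

Rejecting wildcard resources in the builder itself means a careless refactor fails at synth time instead of silently widening the role.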

Invocation path:

User clicks "Create Connector" or "Rotate Secret" in Main Lambda's UI
Main Lambda's server action validates the session + checks user owns the library
Main Lambda calls lambda:Invoke on Connector-Mgmt Lambda with {action: "create"|"rotate"|"delete", userId, libraryId, ...}
Connector-Mgmt Lambda makes the Cognito Admin API call(s), returns result
Main Lambda inserts/updates the connectors DB row + renders the UI response
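
Because the lambda:Invoke boundary is the audit point, it helps if the Connector-Mgmt Lambda validates the payload shape before touching Cognito. The sketch below is one way to do that; the type and function names are hypothetical, and only the `action`, `userId`, and `libraryId` fields named above are checked.

```typescript
const CONNECTOR_ACTIONS = ["create", "rotate", "delete"] as const;
type ConnectorAction = (typeof CONNECTOR_ACTIONS)[number];

interface ConnectorMgmtRequest {
  action: ConnectorAction;
  userId: string;
  libraryId: string;
}

function isConnectorAction(value: unknown): value is ConnectorAction {
  return typeof value === "string" && (CONNECTOR_ACTIONS as readonly string[]).indexOf(value) !== -1;
}

// Validate an untrusted invoke payload; throws with a specific message on failure.
function parseConnectorRequest(raw: unknown): ConnectorMgmtRequest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("payload must be an object");
  }
  const candidate = raw as Record<string, unknown>;
  const action = candidate["action"];
  const userId = candidate["userId"];
  const libraryId = candidate["libraryId"];
  if (!isConnectorAction(action)) {
    throw new Error("action must be one of: " + CONNECTOR_ACTIONS.join(", "));
  }
  if (typeof userId !== "string" || userId.length === 0) {
    throw new Error("userId must be a non-empty string");
  }
  if (typeof libraryId !== "string" || libraryId.length === 0) {
    throw new Error("libraryId must be a non-empty string");
  }
  return { action, userId, libraryId };
}
```

The ownership check (does this user own the library?) still belongs in Main Lambda's server action, as step 2 describes; this validation only guards the payload shape.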

Why not direct Cognito Admin calls from Main Lambda: Main Lambda's IAM is broad (RDS, S3, Secrets Manager, lambda:Invoke). Adding Cognito Admin verbs would compound the attack surface. The lambda:Invoke boundary creates a clear audit point — every Cognito client creation is an explicit cross-Lambda call.

Implementation footprint: <100 lines of code. No VPC config needed: this Lambda never touches RDS, and outside the VPC it reaches the Cognito API over its public endpoint (no VPC endpoint or NAT involved). 128MB memory, default timeout. Cold-start cost is acceptable because invocations are user-driven (not on every request).

Not in scope: OAuth code-exchange flow itself (that runs in Main Lambda as a server action — Cognito's token endpoint is public, no Admin IAM needed). Only the Cognito app client creation + rotation + deletion requires Admin IAM.

VPC + private subnets — where the Lambda lives

The NetworkAndData stack already provisions VPC + public + private subnets + NAT Gateway + VPC interface endpoints for Secrets Manager, STS, SSM, and ECR.

W3 attaches the Main and Chat Lambdas to the private subnets. This gives each Lambda:

A private IP via an ENI in our subnet
Network reachability to RDS (port 5432 via SG-to-SG)
Network reachability to the internet via NAT (for Anthropic API calls from /api/chat)
Network reachability to AWS services via the VPC interface endpoints (Secrets Manager fetch on cold start uses these — no internet trip)

This is the exact thing Amplify can't do. Amplify's SSR Lambda runs in AWS's own VPC; you cannot attach it to yours. The W3 Lambda is in our VPC, with explicit SG and IAM control.

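Cold-start cost of VPC-attached Lambda: budget ~100-300ms extra as a conservative figure. Since AWS moved Lambda to the "Hyperplane ENI" architecture in 2019, the network interface is created when the function's VPC config is set and is shared across execution environments, rather than attached on every cold start, so the real penalty is usually at the low end of that range. Measure during implementation.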

RDS Postgres — the database

Unchanged from current state. Lives in private subnets. Has a security group. W3 adds one rule: "allow ingress on port 5432 from the security group shared by the Main and Chat Lambdas." That's the network handshake.

Connection management nuance:

Pool size per Lambda: lean is max=2. Lambdas serve one request at a time per container, so a deep pool barely earns its keep. max=1 would work too but max=2 gives a buffer for the rare case of a long-running request and an incidental health check.
Total concurrent connections = Lambda concurrent containers × 2. At beta scale (~5 users × low traffic), this is in single digits. RDS t4g.small caps around 100 connections. Headroom is generous.
If usage grows past ~30 concurrent Lambdas, add RDS Proxy as a connection multiplexer. v1.1 concern.
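
The connection arithmetic above is simple enough to encode directly, which makes the RDS Proxy threshold easy to revisit when the numbers change. This is a sketch with a hypothetical function name; the inputs used in the example (pool max of 2, roughly 100 connections on the instance) are the figures quoted above, not measured values.

```typescript
interface ConnectionBudget {
  total: number;        // connections held if every container fills its pool
  headroom: number;     // connections left before the instance cap
  utilization: number;  // total / cap, as a fraction
}

// Worst-case connection usage: every warm container holds a full pool.
function connectionBudget(concurrentContainers: number, poolMax: number, rdsMaxConnections: number): ConnectionBudget {
  if (concurrentContainers < 0 || poolMax < 1 || rdsMaxConnections < 1) {
    throw new RangeError("inputs must be non-negative containers and positive pool/cap sizes");
  }
  const total = concurrentContainers * poolMax;
  return {
    total,
    headroom: rdsMaxConnections - total,
    utilization: total / rdsMaxConnections,
  };
}

const betaBudget = connectionBudget(5, 2, 100);     // beta scale: a handful of warm containers
const proxyTrigger = connectionBudget(30, 2, 100);  // the ~30-container threshold above
```

At the ~30-container threshold the worst case is 60 of roughly 100 connections, which is why that is the point at which RDS Proxy enters the picture rather than the point of failure.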

Secrets Manager — credential vault

RDS publishes credentials via Secrets Manager: {username, password, host, port, dbname}. Rotates automatically if rotation is enabled.

Currently app/lib/db/client.ts reads process.env.DATABASE_URL synchronously. W3 changes this: the Lambda fetches the secret on cold start, builds the connection URL, then initializes the pool. This is the async init pattern (see Decisions Log D43.4).
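
A minimal sketch of the two pure pieces of that async init: choosing the config source from the environment, and turning the secret's JSON into a connection URL. The function and type names are hypothetical, and the Secrets Manager fetch and pool construction are deliberately left out. The env var names (`DB_SECRET_ARN`, `DATABASE_URL`) and the secret fields are the ones this doc already names.

```typescript
interface DbSecret {
  username: string;
  password: string;
  host: string;
  port: number;
  dbname: string;
}

type DbConfigSource =
  | { kind: "secret"; secretArn: string }   // deployed Lambda: fetch from Secrets Manager
  | { kind: "url"; databaseUrl: string };   // local `pnpm dev`: use DATABASE_URL as-is

// Env-var branching: DB_SECRET_ARN wins when present, DATABASE_URL is the fallback.
function chooseDbConfigSource(env: Record<string, string | undefined>): DbConfigSource {
  const secretArn = env["DB_SECRET_ARN"];
  if (secretArn) {
    return { kind: "secret", secretArn };
  }
  const databaseUrl = env["DATABASE_URL"];
  if (databaseUrl) {
    return { kind: "url", databaseUrl };
  }
  throw new Error("neither DB_SECRET_ARN nor DATABASE_URL is set");
}

// Build a Postgres URL from the secret. Credentials are percent-encoded so
// generated passwords containing '@', '/', or ':' do not corrupt the URL.
function buildDatabaseUrl(secret: DbSecret): string {
  const user = encodeURIComponent(secret.username);
  const pass = encodeURIComponent(secret.password);
  const db = encodeURIComponent(secret.dbname);
  return `postgres://${user}:${pass}@${secret.host}:${secret.port}/${db}`;
}
```

Keeping these two functions pure means the branching can be unit-tested without AWS access, leaving only the fetch-and-connect step inside the top-level await.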

How the Lambda gets permission to read: IAM. The execution role has a policy secretsmanager:GetSecretValue scoped to the specific secret ARN. No long-lived credentials anywhere in code or env vars.

How the Lambda reaches Secrets Manager from inside the VPC: through the existing VPC interface endpoint for Secrets Manager. No NAT traversal needed. ~30-50ms latency on cold start.

ACM cert + Cloudflare DNS

ACM cert for app.autri.ai exists today, validated via DNS CNAMEs in Cloudflare. The cert is tied to the domain, not the CloudFront distribution — so the W3 CloudFront attaches the same cert. No new validation needed.

Cloudflare DNS is configured DNS-only (gray cloud). It resolves app.autri.ai to CloudFront's CNAME and nothing more: in DNS-only mode, traffic bypasses Cloudflare's proxy, so its WAF and proxy-level DDoS protection do not apply (CloudFront covers DDoS absorption). The W3 swap: repoint the app CNAME from the current Amplify CloudFront (d2bkdemcj0sjyg.cloudfront.net) to the new W3 CloudFront distribution's domain.

Must stay DNS-only: if Cloudflare is flipped to proxy mode (orange cloud), Cloudflare becomes another CDN layer in front of CloudFront. Cloudflare's proxy buffers responses — kills /api/chat streaming. The whole topology assumes only CloudFront does CDN duty.

Cognito (unchanged, but load-bearing for the request lifecycle)

The Cognito user pool + Google federated identity remain in the AuthAndCompute stack. The W3 Lambdas validate Cognito JWTs from session cookies (Server Components read cookies during render; auth middleware on /api/* routes does the same).

Cross-domain SSO model (per C3 — new): Cognito has two resource servers:

app.autri.ai — audience for web session tokens (issued by the user's Google federation flow, used by Main + Chat Lambdas)
mcp.autri.ai — audience for MCP tokens (issued by per-connector Cognito clients per D44, validated by the MCP server)

Each Lambda validates the JWT aud claim against its expected resource server identifier. Web Lambdas reject tokens with aud=mcp.autri.ai; MCP server rejects tokens with aud=app.autri.ai. Closes the cross-domain SSO carryforward from prior sessions.
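
The audience check itself is small. The sketch below assumes the token's signature, issuer, and expiry have already been verified and that the token carries an `aud` claim as described above; the type and function names are hypothetical. Per RFC 7519, `aud` may be a single string or an array of strings, so both are handled.

```typescript
// Decoded claims of an already signature-verified JWT.
interface TokenClaims {
  aud?: string | string[];
  [claim: string]: unknown;
}

// True only when the expected resource server identifier appears in `aud`.
// A missing or malformed `aud` is treated as a rejection, never as a pass.
function hasExpectedAudience(claims: TokenClaims, expected: string): boolean {
  const aud = claims.aud;
  if (typeof aud === "string") {
    return aud === expected;
  }
  if (Array.isArray(aud)) {
    return aud.indexOf(expected) !== -1;
  }
  return false;
}

const WEB_AUDIENCE = "app.autri.ai";
const MCP_AUDIENCE = "mcp.autri.ai";
```

A web Lambda would call `hasExpectedAudience(claims, WEB_AUDIENCE)` and return 401 on false; the MCP server would do the same with `MCP_AUDIENCE`.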

OAuth callback URLs (per C1): Production callback is https://app.autri.ai/api/auth/callback. During parallel app-w3.autri.ai validation, also add https://app-w3.autri.ai/api/auth/callback to the Cognito user pool client's allowed callback URLs. Remove after app CNAME swap completes.

Server-side OAuth secret rotation (per Q16): Connector secrets shown once on creation. "Rotate secret" button calls Cognito update-user-pool-client --generate-secret via the Connector-Management Lambda; new secret shown once; user updates Claude Desktop. Old secret invalidated immediately; tokens issued under it remain valid until expiry (~1hr).

Request Lifecycle

Concrete trace for a logged-in user opening app.autri.ai/kb/abc-123/chat and sending a chat message.

1. Browser → Cloudflare DNS DNS lookup app.autri.ai → Cloudflare returns CNAME → CloudFront edge IP.

2. Browser → CloudFront (TLS handshake) CloudFront presents ACM cert; browser validates. HTTP/2 connection established.

3. CloudFront → Main Lambda (the SSR page) Path /kb/abc-123/chat doesn't match any specific behavior; falls to default → Main Lambda Function URL. CloudFront's Origin Request Policy (AllViewerExceptHostHeader) forwards cookies + headers + query string. Cache Policy (CachingDisabled) ensures the response is never cached.

4. Main Lambda cold start (first invocation only)

~100ms: container boots, ENI attaches
~200ms: Node starts, parses our bundle
~150ms: top-level await fires — Secrets Manager fetch via VPC endpoint + pg pool init
Total cold start: ~500-800ms before first byte (the ~450ms itemized above, plus JS module init and the first render)

5. Main Lambda runs the request

Next dispatches to app/kb/[kbId]/chat/page.tsx
Server Component reads session cookie, validates Cognito JWT (audience = app.autri.ai), gets user_id
DB query: get KB by id, check user has access via library_access row
Renders React tree to HTML stream
Lambda flushes HTML chunks back to CloudFront

6. CloudFront → Browser HTML streams back; browser parses, sees <script src="/_next/static/chunks/..."> tags and fires off requests.

7. Static asset requests (parallel) Path /_next/static/chunks/abc.js matches the static behavior. CloudFront either serves from edge cache (~5ms) or fetches from S3 (50-200ms first time per edge location). After first request per edge, cached for a year.

8. Inspector renders a page image The chat page has inline <img src="/api/cache/<doc-id>/page-5.png"> references. Browser GET → CloudFront matches /api/cache/* behavior → CloudFront fetches from S3 cache bucket directly (no Lambda invocation). Sub-10ms latency from edge cache after first request.

9. User sends a chat message Browser POSTs /api/chat. CloudFront matches the /api/chat behavior → forwards to Chat Lambda Function URL (separate Lambda from step 5). Chat Lambda calls the AI SDK's streamText against the Anthropic API and returns a streaming response. Function URL streams back to CloudFront → browser. AI SDK on the client parses chunks, updates UI in real time. CloudFront origin response timeout = 60s cap (per Q12 + H4 Failure Mode): chat responses exceeding 60s of wall-clock streaming get truncated; resumable streams are v1.1.

10. User creates a connector (server action path) User clicks "Create Connector" in /settings/connectors. Browser POSTs the form to Main Lambda's server action. Main Lambda:

Validates JWT (audience = app.autri.ai)
Calls lambda:Invoke on the Connector-Management Lambda with {action: "create", userId, libraryId, name}
Connector-Mgmt Lambda calls Cognito CreateUserPoolClient (auth_code grant, app.autri.ai/api/auth/callback redirect URI), returns {clientId, clientSecret}
Main Lambda inserts connectors row + initiates auth_code exchange (user already authenticated; instant) → captures access_token
Main Lambda renders the 3-paste-field UI: server URL + bearer token + client_id/client_secret (per D44 + D41)

Performance summary:

Cold-start path: ~500-800ms before first byte (Main); ~600-1000ms (Chat)
Warm path: ~30-80ms before first byte
Static assets + cache renders after first edge-cache hit: ~5-20ms
Streaming chat: continuous, no buffering anywhere in the path, capped at 60s wall-clock

Key Interfaces

Interface	Type	Consumers
`app.autri.ai` HTTPS endpoint	External HTTP	end users (browsers), Cognito auth callbacks
`app-w3.autri.ai` HTTPS endpoint (temporary, parallel validation only)	External HTTP	smoke testers; tear down post-swap
CloudFront distribution ID (main)	CDK stack output	deploy script (for cache invalidation)
CloudFront distribution ID (parallel `app-w3`, temporary)	CDK stack output	deploy script during validation only
Static S3 bucket name	CDK stack output	deploy script (for static asset sync)
Main Lambda function name + `current`/`prev` alias names	CDK stack output	deploy script (alias-promote rollback per Q10)
Chat Lambda function name + aliases	CDK stack output	deploy script
Connector-Mgmt Lambda function ARN	CDK stack output	Main Lambda runtime env (`CONNECTOR_MGMT_LAMBDA_ARN` for `lambda:Invoke`)
Lambda execution role ARNs (Main, Chat, Connector-Mgmt)	CDK stack output	secrets + RDS access + cross-Lambda invoke (IAM policy attachments)
Lambda security group ID (shared by Main + Chat)	Internal	RDS security group (ingress rule)
Connector-Mgmt Lambda IAM scope	Internal contract	scoped to Cognito user pool ARN; verbs: `cognito-idp:CreateUserPoolClient` + `UpdateUserPoolClient` + `DeleteUserPoolClient` only
ACM cert ARN (main)	Stack input	CloudFront viewer cert config
ACM cert ARN (temporary `app-w3`)	Stack input (temporary)	parallel CloudFront viewer cert
DB Secrets Manager secret ARN	Stack input	Lambda IAM policy + Lambda runtime env (`DB_SECRET_ARN`)
`ANTHROPIC_API_KEY` Secrets Manager secret ARN	Stack input	Chat Lambda IAM policy + runtime env
Cognito user pool ID + client ID	Stack input	Lambda code (JWT validation, OAuth callback)
Cognito resource server IDs (`app.autri.ai`, `mcp.autri.ai`)	Stack input	each Lambda's audience-claim validation (per C3)
VPC + private subnet IDs	Stack input	Main + Chat + Connector-Mgmt Lambda VPC config
`DB_SECRET_ARN` env var	Lambda runtime env	`db/client.ts` async init
`DATABASE_URL` env var	Lambda runtime env (absent in prod)	`db/client.ts` fallback for local `pnpm dev`
`AUTRI_APP_URL` env var	Lambda runtime env	citation links via D38 unified agent surface
`CONNECTOR_MGMT_LAMBDA_ARN` env var	Main Lambda runtime env	server action invokes Connector-Mgmt Lambda
Cognito-related env vars	Lambda runtime env	session validation, OAuth callback URL, audience claim expectations
CloudWatch alarms (Main/Chat error rates, CloudFront 5xx/4xx, ENI count, RDS connections)	Internal contract	monitoring stack consumes alarm ARNs; alarms emit to AWS Budgets SNS topic for beta

Build artifacts

Build produces two Lambda packaging artifacts + one static bundle, all from a single pnpm build in the app/ workspace (with output: 'standalone' re-enabled in next.config.mjs).

Build command sequence (per Q9):

pnpm install --frozen-lockfile
pnpm --filter @autri/app build
pnpm deploy --filter @autri/app --prod /tmp/standalone-build

Matches the MCP server pattern. pnpm deploy --prod resolves workspace deps (@autri/retrieval, etc.) into a real node_modules tree, then Next standalone packs only the imports actually traced from app code.

Artifacts:

Static bundle — app/.next/static/ (chunks, css, media) + app/public/. These are immutable per build; hashed filenames mean S3 can hold all versions forever without conflict.

Main Lambda artifact — server-side app/.next/standalone/ filtered to exclude Anthropic SDK + AI SDK imports (or accept the full bundle if measurement shows it fits zip cap). Holds drizzle, pg, Cognito SDK, Next runtime. Includes the ~30-line Function URL handler wrapper.

Chat Lambda artifact — separate package containing only /api/chat/route.ts + its deps (AI SDK + Anthropic SDK + retrieval-tool wiring). Built via a second build step that copies the relevant route + traces deps. Sized small (~5-10MB unzipped).

Packaging (per Q8): zip mode on first deploy; measure both Lambda artifacts; pivot to container image if Main Lambda exceeds 40MB unzipped. Container image escape hatch documented; cold-start cost (+~150ms) accepted as the trade for headroom.

Bundle size measurement is the first task of W3 implementation — drives the zip-vs-container call before CDK is finalized.

CDK provisioning vs deploy script split

Two axes that run on different cadences:

CDK (autri-infra repo) provisions the shape: Lambda function with VPC config + IAM role + env vars, S3 bucket with OAC + lifecycle policies, CloudFront distribution with behaviors, security group rules, all the IAM. Runs only when infra changes.

Deploy script (lives in autri repo) does the update: build, sync static to S3 with appropriate cache headers, zip server bundle, upload as new Lambda version, point alias at new version, invalidate CloudFront cache for HTML paths. Runs on every code change.

Same split as the MCP server pipeline already in place. Same conceptual cleanness: infra shape lives where infra lives; deploy artifacts live where code lives.

Build pipeline phasing

Per D42's pattern ("manual first, automate later"):

Phase 1 (first deploy, ~1 hour to set up): local pnpm deploy:web script in autri. Reads CDK stack outputs via aws cloudformation describe-stacks. Builds, syncs to S3, updates Lambda, invalidates CloudFront. Triggered manually.

Phase 2 (post-W3-verification, ~1 day to set up): GitHub Actions workflow on push to autri main. OIDC role assumed by GHA runner. Same deploy script, just triggered by CI. Audit trail via PR + workflow logs.

Manual-first is acceptable for the 1-dev cadence and matches D42's deployment pattern. Promotion to GHA triggered when frequency of deploys justifies CI cost.

Cache invalidation strategy

CloudFront caches based on path. Static assets at /_next/static/* have hashed filenames — never need invalidation. New build → new hashes → new paths.

HTML responses from Lambda are NOT cached by CloudFront: the default behavior honors the Cache-Control: no-cache header that Lambda returns, AND we explicitly set MinTTL=0 on it. No invalidation needed for HTML.

Only invalidation case: if /static/* (from app/public/) is updated. Those filenames are not hashed by Next. Deploy script issues a CloudFront CreateInvalidation for /static/* after the S3 sync. ~10 second propagation cost.
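
The invalidation rule above reduces to one check over the keys a deploy changed. A minimal sketch, assuming S3 keys mirror URL paths without the leading slash (`_next/static/...` for hashed assets, `static/...` for files from app/public/); the function name and key layout are hypothetical.

```typescript
// Decide which CloudFront invalidation paths a deploy needs, given the S3 keys
// it uploaded or changed. The key layout is an assumption for illustration.
function invalidationPaths(changedKeys: string[]): string[] {
  // Hashed assets under _next/static/ get new paths on every build, so they
  // never need invalidation. Only unhashed files under static/ do.
  const touchesUnhashed = changedKeys.some((key) => key.startsWith("static/"));
  return touchesUnhashed ? ["/static/*"] : [];
}

// A build that only produced new hashed chunks needs no invalidation.
console.log(invalidationPaths(["_next/static/chunks/app-3f9a1c.js"]));
// A build that also changed a public file invalidates the unhashed prefix once.
console.log(invalidationPaths(["_next/static/chunks/app-3f9a1c.js", "static/logo.png"]));
```

Returning a single wildcard path rather than one path per file keeps the invalidation request small regardless of how many public files changed.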

Rollback Strategy

Per Q10, alias-promote pattern.

Deploy flow:

Build artifacts; static bundle synced to S3 (immutable, no rollback risk)
New Lambda version uploaded for Main + Chat
Capture the version the current alias points to → store it as the prev alias (overwrites old prev)
Update current alias to new version
CloudFront cache invalidation for /static/* (if needed)

Rollback flow (single command):

aws lambda update-alias \
  --function-name <name> \
  --name current \
  --function-version $(aws lambda get-alias --function-name <name> --name prev --query 'FunctionVersion' --output text)

Static assets in S3 are immutable and additive — rollback only touches Lambda. New code referencing newly-uploaded static paths will fail gracefully (404 on the static asset), but the old code at the rolled-back version references the older static paths which still exist in S3.

Edge cases (also covered in the Failure Modes section):

First deploy: no prev alias exists. Deploy script must check and refuse rollback when prev is unset
Manual aws CLI intervention between deploys: current and prev get out of sync. Deploy script logs both alias versions pre + post; rollback script can read logs for recovery
Cross-Lambda rollback (Main + Chat): roll back together (pnpm rollback:web) to avoid version drift between SSR and chat surfaces
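
The alias-promote deploy and rollback flows, including the first-deploy and cross-Lambda edge cases, can be modeled as pure state transitions. This is a sketch of the bookkeeping only, with hypothetical names (`AliasState`, `promote`, `rollback`, `rollbackAll`); the real script would apply the same transitions through the Lambda alias API.

```typescript
// Which Lambda version each alias points at. prev is unset until the second deploy.
interface AliasState { current: string; prev?: string }

// Deploy: prev takes over the old current, then current moves to the new version.
function promote(state: AliasState | undefined, newVersion: string): AliasState {
  if (!state) return { current: newVersion }; // first deploy, nothing to preserve
  return { current: newVersion, prev: state.current };
}

// Rollback: point current back at prev, refusing when prev is unset.
function rollback(state: AliasState): AliasState {
  if (state.prev === undefined) {
    throw new Error("no prev alias; refusing to roll back");
  }
  return { ...state, current: state.prev };
}

// Cross-Lambda rollback: validate every function first so Main and Chat
// either both roll back or neither does.
function rollbackAll(states: Record<string, AliasState>): Record<string, AliasState> {
  for (const [name, state] of Object.entries(states)) {
    if (state.prev === undefined) {
      throw new Error(`${name} has no prev alias; refusing to roll back any function`);
    }
  }
  const result: Record<string, AliasState> = {};
  for (const [name, state] of Object.entries(states)) {
    result[name] = rollback(state);
  }
  return result;
}
```

Validating all functions before changing any is what avoids version drift between the SSR and chat surfaces when only one of them can be rolled back.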

What rollback does NOT handle:

Schema migrations applied during the deploy (D42's CDK custom-resource Lambda is the path for those, and migrations should always be backwards-compatible per D42)
CDK infra changes — those are a separate cdk deploy operation, with their own rollback story
DNS — app CNAME unchanged through deploys

Idle (zero traffic)

CloudFront: ~$0 (only per-request + per-GB-out)
S3: ~$0.01/mo (10MB stored, intra-AWS egress free)
Lambda: $0 (no invocations)
Lambda idle ENI: $0 (no charge for unused ENIs)
ACM cert: $0
Existing floor (RDS + NAT + Secrets Manager + Cognito): ~$90/mo (already there, unchanged)

W3 add-on idle cost: essentially $0/mo.

Compare to alternatives:

Fargate web layer (Stack A): +$50/mo always-on
App Runner: +$25/mo always-on
SST/OpenNext (W3 via framework): same $0 idle as DIY

Beta load

5 users × 100 requests/day × 30 days = 15,000 requests/mo. Mostly SSR + a few /api/chat streaming sessions.

Lambda: 15k × ~200ms × 1024MB memory = ~3,000 GB-seconds = $0.05
CloudFront: 15k requests + ~5 GB egress = $0.50
S3: $0.01

Total W3 add-on at beta load: ~$0.50-1/mo. Effectively zero above the existing floor.

Hypothetical 1k MAU

50k requests/day × 30 = 1.5M requests/mo, mixed SSR + static.

Lambda: 1.5M × ~200ms × 1024MB memory = ~300,000 GB-seconds = $5
CloudFront: ~$30 (mostly egress)
S3: $0.01

Total: ~$35/mo above existing floor. Cost only matters at scale (>1M req/mo), and even then it's lean. This is the entire point of W3.
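
The Lambda line items in both load scenarios follow from the same formula: requests × average duration × memory, priced per GB-second. A small sketch of that arithmetic, assuming the on-demand x86 rate of $0.0000166667 per GB-second and ignoring the per-request fee; the rate and the function names are assumptions to check against current pricing.

```typescript
// Assumed on-demand x86 rate in USD per GB-second; verify for your region.
const PRICE_PER_GB_SECOND = 0.0000166667;

// GB-seconds consumed: requests x seconds per request x memory in GB.
function lambdaGbSeconds(requests: number, avgDurationMs: number, memoryMb: number): number {
  return requests * (avgDurationMs / 1000) * (memoryMb / 1024);
}

// Compute-only cost; the per-request fee is left out of this sketch.
function lambdaComputeCostUsd(requests: number, avgDurationMs: number, memoryMb: number): number {
  return lambdaGbSeconds(requests, avgDurationMs, memoryMb) * PRICE_PER_GB_SECOND;
}

// Beta load: 15,000 requests/mo at ~200ms and 1024MB, about 3,000 GB-seconds.
const betaCost = lambdaComputeCostUsd(15000, 200, 1024);
// Hypothetical 1k MAU: 1.5M requests/mo, about 300,000 GB-seconds.
const thousandMauCost = lambdaComputeCostUsd(1500000, 200, 1024);

console.log(betaCost.toFixed(2), thousandMauCost.toFixed(2));
```

The 1k MAU figure treats every request as a Lambda invocation, so it is an upper bound: requests served from S3 or the edge cache never reach Lambda.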

Failure Modes

Operational characteristics worth knowing. Where W3 fails subtly:

Cold-start latency spikes. First request after Lambda eviction: 500-800ms. Quiet hour = every hour starts slow. Mitigations: provisioned concurrency ($), keep-warm pings (hacky), or accept for beta (recommended). At scale, k-warm provisioned concurrency targeted at peak hours is the standard pattern.

Cold-start hit on every deploy. Every Lambda version update invalidates all warm containers. Dev cadence implication: each deploy → next ~5-10 requests pay cold-start. Beta-user-visible only if a deploy lands during active session. Mitigations same as the general cold-start case; v1.1 candidate for provisioned concurrency target=1-2.

/api/chat truncation at 60s. CloudFront caps origin response duration at 60s max (default 30s; configurable up to 60). Streaming chat responses that exceed 60s of wall-clock time get silently truncated mid-stream. For beta — most chat turns are <60s; long ones are the exception. v1.1 fix: resumable streams via AI SDK's useChat reconnect + server-side buffer-and-replay (option (a), ~1-2 days lift). For Lambda invocations exceeding the 60s cap, the underlying Anthropic call may still complete on the server side; the failure is purely client-visible truncation.

RDS connection exhaustion. Each warm Lambda holds connections. t4g.small RDS caps at ~100 connections. Lean: pool max=2 per Lambda. ~50 concurrent warm Lambdas across both Main + Chat Lambdas = exhaustion. At beta scale, not a concern. At v1.1: RDS Proxy or capped Lambda concurrency.
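
The exhaustion point is simple division, and the same arithmetic gives the warm-instance count at which the 70% RDS connection alarm would fire. A sketch using the figures above (~100 connections, pool max of 2); the function names are hypothetical.

```typescript
// How many warm Lambda instances the database can serve before its
// connection limit is reached, given a fixed pool size per instance.
function maxWarmInstances(maxConnections: number, poolSizePerInstance: number): number {
  return Math.floor(maxConnections / poolSizePerInstance);
}

// Warm-instance count at which an alarm set at a fraction of max_connections fires.
function instancesAtAlarm(
  maxConnections: number,
  poolSizePerInstance: number,
  alarmFraction: number,
): number {
  return Math.floor((maxConnections * alarmFraction) / poolSizePerInstance);
}

// ~100 connections with pool max = 2: exhaustion at 50 warm instances
// across Main + Chat combined.
console.log(maxWarmInstances(100, 2));
// With the alarm at 70% of max_connections, it fires at 35 warm instances.
console.log(instancesAtAlarm(100, 2, 0.7));
```

The gap between the two numbers is the reaction window: the alarm leaves roughly 15 warm instances of headroom before SSR routes start failing.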

ENI exhaustion in the VPC. Each warm Lambda holds an ENI in the subnet. Default VPC limits: 5000 ENIs / VPC. Subnet IPs run out before ENI quotas (a /24 subnet = 251 usable IPs). Subnet sizing matters. CloudWatch alarm on ENI count > 50% subnet capacity provides early warning.

The CloudFront cache-key trap. If we forward cookies as cache key, every authenticated request becomes uncacheable (intended). If we don't forward cookies, we might cache an authenticated page and serve to another user — data leak. Pinned behavior: AWS-managed AllViewerExceptHostHeader Origin Request Policy + CachingDisabled Cache Policy for the Lambda origins; CORS-S3Origin + CachingOptimized for static origins. Verified in CDK code review.
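
The pinned policy pairing can be written down as a routing table and checked mechanically. The sketch below is a simplified model, not CDK code: the origin labels, the `matchBehavior` helper, and the choice to give the cache bucket the same policies as the static bucket are assumptions, and matching handles only the `*` wildcard in list order.

```typescript
type Origin = "static-s3" | "cache-s3" | "chat-lambda" | "main-lambda";

interface Behavior {
  pathPattern: string;
  origin: Origin;
  originRequestPolicy: "AllViewerExceptHostHeader" | "CORS-S3Origin";
  cachePolicy: "CachingDisabled" | "CachingOptimized";
}

// Ordered behaviors; the first matching pattern wins in this model.
const behaviors: Behavior[] = [
  { pathPattern: "/_next/static/*", origin: "static-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/static/*", origin: "static-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  // Assumption: the cache bucket reuses the static bucket's policies.
  { pathPattern: "/api/cache/*", origin: "cache-s3", originRequestPolicy: "CORS-S3Origin", cachePolicy: "CachingOptimized" },
  { pathPattern: "/api/chat", origin: "chat-lambda", originRequestPolicy: "AllViewerExceptHostHeader", cachePolicy: "CachingDisabled" },
];

// Everything else falls through to Main Lambda, never cached.
const defaultBehavior: Behavior = {
  pathPattern: "*",
  origin: "main-lambda",
  originRequestPolicy: "AllViewerExceptHostHeader",
  cachePolicy: "CachingDisabled",
};

// Convert a path pattern to a RegExp, treating * as "any characters".
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function matchBehavior(path: string): Behavior {
  return behaviors.find((b) => patternToRegExp(b.pathPattern).test(path)) ?? defaultBehavior;
}

// The invariant that prevents the cache-key trap: no Lambda origin is ever cached.
function lambdaOriginsNeverCached(all: Behavior[]): boolean {
  return all
    .filter((b) => b.origin === "chat-lambda" || b.origin === "main-lambda")
    .every((b) => b.cachePolicy === "CachingDisabled");
}
```

A check like `lambdaOriginsNeverCached([...behaviors, defaultBehavior])` is the kind of assertion a CDK unit test could carry, so the review step does not depend on someone reading the synthesized template by eye.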

Cloudflare proxy mode flip. Currently DNS-only (gray cloud). If anyone flips to proxy mode (orange cloud), Cloudflare becomes another CDN layer. Cloudflare proxy buffers responses — kills /api/chat streaming. Keep DNS-only and document the constraint.

Streaming-killing intermediaries. APIGW HTTP API buffers (would kill streaming). Cloudflare proxy mode buffers (would kill streaming). Function URL + CloudFront in DNS-only-CDN mode preserves streaming end-to-end. The whole topology shape exists to keep the streaming path clean.

WebSocket absence. Function URL doesn't support WebSockets. Bell-icon notifications stay on polling for beta. If real-time UX (live notifications, multi-user collaboration, etc.) becomes a v1.1 requirement, add an APIGW WebSocket API as a separate origin behind CloudFront.

Lambda payload size cap (6 MB on Function URL). SSR HTML for our pages is <100KB; not a concern in normal operation. Could become one if a future feature dumps a large JSON response — flag in code review.

db/client.ts async init incompatibility with drizzle's singleton. Drizzle's pool singleton pattern assumes synchronous pool creation. Top-level await in the module breaks the type contract subtly: the exported db becomes Promise<NodePgDatabase> instead of NodePgDatabase. Two paths: (a) wrap call sites with await getDb(), touching dozens of import sites; (b) export a sync proxy that lazy-awaits on first call. Decision: validate option (a) early in implementation, fall back to the proxy pattern in (b) if call-site changes are unmanageable. Local dev preserved via env-var branching: DB_SECRET_ARN present → fetch from Secrets Manager; absent → fall back to DATABASE_URL env var.
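
The env-var branching and the `await getDb()` shape of option (a) can be sketched without touching drizzle or the AWS SDK. Everything here is illustrative: `resolveDbConfigSource` and `lazyAsync` are hypothetical helpers, and the real initializer would fetch the secret and build the pool where the sketch takes an arbitrary async function.

```typescript
// Where the connection settings come from, decided purely by environment.
type DbConfigSource =
  | { kind: "secrets-manager"; secretArn: string }
  | { kind: "database-url"; url: string };

function resolveDbConfigSource(env: Record<string, string | undefined>): DbConfigSource {
  // Lambda runtime: DB_SECRET_ARN is set, so credentials come from Secrets Manager.
  if (env.DB_SECRET_ARN) return { kind: "secrets-manager", secretArn: env.DB_SECRET_ARN };
  // Local dev: fall back to DATABASE_URL.
  if (env.DATABASE_URL) return { kind: "database-url", url: env.DATABASE_URL };
  throw new Error("neither DB_SECRET_ARN nor DATABASE_URL is set");
}

// Memoize an async initializer so concurrent callers share one in-flight
// initialization, and a failed attempt can be retried on the next call.
function lazyAsync<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = init().catch((err) => {
        cached = undefined; // do not pin a failed init for the container's lifetime
        throw err;
      });
    }
    return cached;
  };
}
```

With this shape, a module would export something like `const getDb = lazyAsync(createDb)` and call sites would write `const db = await getDb()`. Clearing the cache on failure matters on Lambda, where a warm container would otherwise keep replaying one transient Secrets Manager error.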

DNS swap is a one-way commitment for ~minutes. Once app CNAME points at the new CloudFront, rolling back means a propagation cycle (~5-15 min via Cloudflare). Mitigation: validate W3 end-to-end on parallel app-w3.autri.ai first; swap app CNAME only after smoke test passes. The parallel domain needs its own temporary ACM cert + a temporary Cognito callback URL entry (both removed post-swap).

Amplify teardown timing. Keep the existing Amplify app alive through W3 implementation (zero idle cost since Amplify scales to zero). Tear down CfnApp + CfnBranch + CfnDomain from CDK once W3 is verified end-to-end, not before. Don't delete the Amplify ACM cert validation CNAME prematurely either — the cert is shared.

Deploy rollback failure modes. Alias-promote pattern: if the prev alias is missing (first deploy), rollback isn't possible, so the deploy script must check and refuse to roll back when no prev alias exists. If current and prev get out of sync (manual aws CLI intervention), rollback could regress further than intended. Mitigation: deploy script logs both alias versions before + after; rollback script reads logs as recovery aid.

Why W3 Over Alternatives

Landscape positioning for red-team's "should we have chosen X instead?" angle:

Approach	Idle cost	Setup complexity	Real-world usage	Why we didn't pick
Vercel	free → $$ at scale	trivial (push to deploy)	most Next.js teams	Hides infra; doesn't VPC-into our private RDS; lock-in to Vercel's pricing curve
Heroku / Render / Railway	$7-25/mo minimum	trivial	indie devs, prototypes	Always-on minimums; VPC story is complex; we already have an AWS commitment
ECS Fargate	~$50/mo always-on	medium	teams that want long-running containers	$50/mo always-on with no upside vs W3 for our shape; Stack A baseline
App Runner	~$25/mo always-on	low	AWS-curious devs who want containers	Managed Fargate; still no scale-to-zero; less control
Amplify Hosting	~$0 idle	low	teams wanting AWS push-to-deploy	The original Stack B choice. Can't VPC. (D43 root cause.)
SST / OpenNext (W3 via framework)	~$0	medium (3-5h scaffold)	AWS-natives wanting serverless Next	Same topology, framework abstractions to debug through; we chose ownership
W3 DIY (us)	~$0	higher one-time (~10-15h)	AWS power users, cost-sensitive teams	Locked per D43
EKS / K8s	varies	very high	enterprise w/ infra teams	Massive overkill for our scale; ops burden does not pay back

Strategic fit for autri's business shape: beta SaaS + unknown traffic + capital-disciplined runway → scale-to-zero is exactly the right cost shape. The $0 idle floor means we can let it sit dormant for weeks between users without burning budget, then absorb a bursty spike without provisioning anything. That's a much better match than Fargate's $50/mo always-on (assumes constant traffic we don't have yet).

Red-team angle: SST/OpenNext. The most credible alternative challenge is "we chose DIY for ownership, but framework abstractions would've saved 10 hours and given us the same runtime shape." Counter: framework versions of W3 are themselves opinionated about Lambda packaging, edge functions, and ISR. If we hit framework limits in implementation, we'd be debugging through the framework AND the AWS layer. DIY keeps the AWS layer as our only abstraction boundary. Worth re-evaluating at v1.1 if the maintenance burden of hand-rolled CDK + deploy script grows.

Related Epics

Epic	Doc	Status	Summary
EPIC-4: AWS Production Deploy	link	In Progress (Day 11+)	Originally implemented Stack B with Amplify; pivoted to W3 per D43. The W3 implementation work is folded into EPIC-4's amended scope.
EPIC-5: Beta Launch + Cost-Data Deliverable	link	Planned	Depends on W3 being live so beta users can reach the web app.

Cross-Cutting Concerns

Concern	How This Sub-system Is Affected
Authentication (Cognito, D34)	Main Lambda validates Cognito JWTs from session cookies on every request. JWT `aud` claim must equal `app.autri.ai` resource server identifier (cross-domain SSO contract — see Decisions Log). OAuth callback handler at `/api/auth/callback` runs in Main Lambda. Per-connector OAuth flow (D44) initiated from Main Lambda, executed by Connector-Management Lambda via `lambda:Invoke`.
MCP wedge (D19, D38)	Inspector links from MCP citations resolve to `app.autri.ai/docs/[id]#chunk-<chunkId>`. Main Lambda's hash-anchor handler scrolls + highlights cited chunks. Inspector-as-citation-surface depends on W3 being live.
Library/connector model (D34, D35)	Connector creation UI (per D44, gated on W3 being live) is a server action in Main Lambda. Action invokes Connector-Management Lambda (which creates the per-connector Cognito app client + initiates auth_code exchange); Main Lambda then inserts `connectors` row + displays 3-paste-field UI. Connector secret rotation ("Rotate client_secret" button per Decisions Log) follows the same invocation path.
Multi-tenancy isolation (D13)	Server-side scope enforcement in Server Components and `/api/*` handlers — every DB query scoped by JWT's `user_id` + library access checks. RLS at DB layer pending. Main Lambda is the chokepoint for enforcement at the request layer.
Streaming chat (D12 split + AI SDK)	`/api/chat` runs in Chat Lambda (split from Main per Q1'). Uses AI SDK's `streamText` + Anthropic provider. CloudFront origin response timeout pinned to 60s max for this behavior; resumable streams deferred to v1.1.
`/api/cache/[...path]` page renders	Cache PNGs live in S3 cache bucket (provisioned in `NetworkAndData` for page renders). CloudFront behavior `/api/cache/*` routes directly to S3 — no Lambda invocation. Sub-10ms latency from edge cache after first request per edge. Cache S3 bucket needs Origin Access Control like the static bucket.
User uploads → ingestion handoff	Main Lambda accepts uploads, writes to existing S3 `uploads` bucket (in `NetworkAndData`). Ingestion sub-system reads from there (S3 event trigger or direct Fargate task launch — detailed in ingestion sub-system doc, not in W3 scope).
DB migrations (D42)	Unchanged. Migrations still run via CDK custom-resource Lambda in `NetworkAndData`. Main + Chat Lambdas are consumers of the schema, not managers.
Local dev compatibility (`pnpm dev`)	`db/client.ts` env-var branches: `DB_SECRET_ARN` present (Lambda runtime) → fetch from Secrets Manager; absent (local dev) → use `DATABASE_URL` from `.env.local`. Same code path; no LocalStack required.
CSRF protection	Server actions use Next 14's built-in CSRF defense (per-build random action IDs as tokens) + `SameSite=Lax` session cookies. No explicit double-submit cookie or CSRF middleware needed for beta. Privileged actions (connector creation, secret rotation) inherit the same defense. Re-audit in pre-paid-customer security pass.
Observability — Lambda + CloudFront metrics	Per-Lambda CloudWatch Log Groups (30-day retention per monitoring stack). Per-route structured JSON logs (request ID, user_id when known, route, duration, status). Cost Explorer tag: `cost-bucket=web`.
Observability — CloudWatch alarms	Pinned set: (1) Main Lambda error rate > 5% over 5 min, (2) Chat Lambda error rate > 5% over 5 min, (3) Lambda concurrent executions > 50% of account quota, (4) CloudFront 5xx rate > 1% over 5 min, (5) CloudFront 4xx rate > 5% (catches broken static asset references), (6) ENI count > 50% subnet capacity, (7) RDS active-connections > 70% of `max_connections`. Alarms emit to existing AWS Budgets SNS topic for now; consider per-severity SNS topics post-beta.
Cost & budget alarms	Lambda + CloudFront + S3 are line items in the existing AWS Budgets ($50/$100/$200 thresholds). $0 idle cost means W3 doesn't move the needle until traffic shows up — but a runaway error loop (Lambda failing → retrying → invoking) could spike costs; concurrency cap is the defense.

Decisions Log

Date	Decision	Rationale	Alternatives Considered
2026-05-26	D43 — W3 DIY for web layer; supersedes Amplify	Amplify SSR cannot VPC; private RDS unreachable from Amplify Lambda; W3 gives us VPC control	Stack A all-Fargate ($50/mo idle); App Runner ($25/mo idle); SST/OpenNext W3 (~$0 idle, framework dep)
2026-05-26	Q1' — Split `/api/chat` into a separate Chat Lambda from day one (revises original Q1 single-Lambda)	Chat Lambda carries AI SDK + Anthropic SDK weight; Main Lambda stays lean for SSR. Cleaner cold-start profiles per route class. CloudFront routes by path	Single Lambda for everything: monolithic bundle, every cold start parses chat deps; defer-and-split-later: locks in the wrong shape early
2026-05-26	Q2 — Lambda Function URL only, no API Gateway	APIGW HTTP API buffers (kills `/api/chat` streaming); Function URL supports response streaming (GA all regions Apr 2026); cheaper; simpler. Reconsidered post-split — APIGW for Main doesn't pay back either	APIGW HTTP API + separate Function URL for `/api/chat`: two control planes; APIGW REST: more overhead
2026-05-26	Q3 — CloudFront multi-behavior routing: `/_next/static/*` + `/static/*` + favicon → static S3; `/api/cache/*` → cache S3; `/api/chat` → Chat Lambda; default → Main Lambda	Standard multi-behavior pattern; ~5 behaviors total; no Lambda@Edge needed; `/api/cache` bypasses Lambda entirely	CloudFront Functions / Lambda@Edge for routing: overkill
2026-05-26	Q4 — `db/client.ts` async init via top-level await on Lambda cold start; env-var branching preserves local dev	Next 14 supports top-level await in server modules; ~50-200ms cold-start hit acceptable; `DB_SECRET_ARN` present → fetch secret, absent → fallback to `DATABASE_URL`	Lazy `getPool(): Promise<Pool>`: touches all import sites; LocalStack for dev: heavier setup; build-time env injection: defeats secret rotation
2026-05-26	Q5 — Manual `pnpm deploy:web` script first; promote to GitHub Actions post-verification	D42 pattern (manual-first, automate-later); GHA needs OIDC + role setup (+1 day); manual gets W3 working end-to-end faster	GHA from day one: +1 day setup; auto-deploy on every commit might not be wanted yet
2026-05-26	Q6 — Build flow lives in `autri` repo; CDK shape in `autri-infra`	Same split as MCP pipeline; clean repo responsibility (code where code is; infra where infra is); cross-repo via stack outputs	All-in-`autri-infra`: infra owning app build logic feels wrong
2026-05-26	Q7 — Re-enable `output: 'standalone'` in `next.config.mjs`	Standalone is the right shape for Lambda packaging (self-contained server.js + minimal node_modules); Amplify reason for dropping doesn't apply	Keep non-standalone: would force packaging full `node_modules` tree, blowing past Lambda's 50MB cap
2026-05-26	Q8 — Lambda packaging: container image (locked post-measurement)	Bundle measurement on the actual autri codebase: ~38MB unzipped excluding `/api/cache` artifacts (which evaporate when route is removed per Q11). But hoisted pnpm causes Next's `outputFileTracing` to miss workspace-hoisted deps (AI SDK + Anthropic SDK + drizzle-orm) — `pnpm deploy --prod` is required to resolve, which produces container-image-friendly output. Zip-cap headroom is thin; container image's 10GB ceiling makes bundle size a non-design-constraint. ~150ms extra cold start is acceptable given the existing ~500-800ms floor	Zip + measure: bundle is on edge, fights add up over time; aggressive externals (Lambda Layers): fragile, breaks on each new dep
2026-05-26	Q9 — Build command sequence: `pnpm install --frozen-lockfile && pnpm --filter @autri/app build && pnpm deploy --filter @autri/app --prod /tmp/standalone-build`	Matches MCP server pattern. `pnpm deploy --prod` produces self-contained dir with hoisted deps + Next standalone output. Container image COPYs this	Standard Next standalone without `pnpm deploy`: misses workspace deps (`@autri/retrieval`); turborepo/nx: heavier than warranted
2026-05-26	Q10 — Deployment rollback via alias-promote pattern: `current` + `prev` Lambda aliases	Deploy: new version uploaded → `prev` re-points to old `current` → `current` re-points to new. Rollback: swap aliases. Static assets immutable in S3, untouched by rollback	Manual rollback via `aws lambda update-alias`: easy to forget under pressure; deploy-script-records-version: same as alias-promote but without named handles
2026-05-26	Q11 — `/api/cache/[...path]` routes directly from CloudFront to S3 cache bucket (no Lambda invocation)	Sub-10ms edge cache after first request; no Lambda cost per cache hit; cache key matches URL path layout. Also evaporates 197MB of local-cache artifacts from Next's output trace	Lambda reads S3 on demand: +50-200ms latency + Lambda cost per cache request; build-time sync: doesn't work for runtime-generated cache from ingestion
2026-05-26	Q12 — CloudFront Origin Request + Cache Policies pinned to AWS-managed: default behavior = `AllViewerExceptHostHeader` + `CachingDisabled`; static behaviors = `CORS-S3Origin` + `CachingOptimized`	Industry-standard combo for this exact topology. Cookie/auth headers forwarded to Lambda; Lambda responses never cached; static assets cache by path. Prevents cache-key trap	Custom policies with explicit allowlist: tighter control, more CDK, risk of missing a needed header; unspecified-let-CDK-be-truth: future readers must reverse-engineer
2026-05-26	Q13 — Minimal-but-real security headers via CloudFront Response Headers Policy	HSTS (max-age=63072000, includeSubDomains, preload), X-Frame-Options=DENY, X-Content-Type-Options=nosniff, Referrer-Policy=strict-origin-when-cross-origin, minimal CSP (`script-src 'self' 'unsafe-inline'`; others 'self'/'data:'). Tighten CSP in v1.1 once Next inline-script needs mapped	AWS managed `SecurityHeadersPolicy`: no CSP; defer entirely: bad audit posture
2026-05-26	Q14 — CSRF via Next 14 server actions' built-in defense + `SameSite=Lax` cookies	Per-build random action IDs serve as CSRF tokens; SameSite=Lax prevents most CSRF vectors. Privileged actions (connector creation) inherit the same defense	Explicit double-submit cookie: extra code, no clear threat-model justification yet; defer entirely: risks audit gap
2026-05-26	Q15 — Separate temporary ACM cert for `app-w3.autri.ai` parallel-validation domain	Cut by blue-team — see annotation on this section. Parallel domain validation removed from scope; direct cutover with fix-forward.	(kept in log as institutional context)
2026-05-26	Q16 — Connector `client_secret` recovery via "Rotate secret" button per connector	Deferred to v1.1 by blue-team — see annotation. Beta uses delete + recreate as recovery path.	(kept in log as institutional context)
2026-05-26	C1 (D43.C1) — During `app-w3.autri.ai` parallel validation, add `https://app-w3.autri.ai/api/auth/callback` to Cognito allowed callback URLs; remove on `app` CNAME swap	Cut by blue-team alongside Q15.	(kept in log as institutional context)
2026-05-26	C2 (D43.C2) — Connector-Management Lambda separated from SSR; IAM scoped to user pool ARN with verbs `cognito-idp:CreateUserPoolClient` + `UpdateUserPoolClient` + `DeleteUserPoolClient`	SSR Lambda's blast radius stays minimal — an RCE there can't backdoor Cognito clients. Connector-Mgmt Lambda is small, infrequent, tightly scoped	SSR Lambda holds Cognito Admin IAM: wider blast radius; defer to D44 implementation: risks defaulting to the easier-but-less-secure path
2026-05-26	C3 (D43.C3) — Two Cognito resource servers: `app.autri.ai` (web session audience) and `mcp.autri.ai` (MCP audience). JWT `aud` claim disambiguates; each Lambda validates audience explicitly	Closes the carryforward cross-domain SSO gap from prior sessions. Audit / scope separation between web sessions and MCP usage. Per-connector clients (D44) issue tokens with `mcp.autri.ai` audience	Single resource server with both subdomains in allowed audiences: loses audit separation; defer to first deploy: gap discovered in production

Known Issues / Tech Debt

Issue	Severity	Notes
Cold-start UX during low-traffic beta hours	Medium	First request every ~10-15 min idle is 500-800ms. Acceptable for beta; user-feedback signal will dictate provisioned-concurrency need
Cold-start hit on every deploy	Low-Medium	Dev workflow tax. Visible to beta users only if deploy lands during active session. Same mitigations as general cold-start
No measured bundle size for the standalone server	High (pre-implementation)	First task in implementation: build + measure. If >40MB unzipped, pivot to container image per Q8
`db/client.ts` async init may force broader refactor	Medium	If drizzle's singleton typing fights the top-level-await pattern, fallback is a lazy proxy. Touches `~30+` import sites if it cascades
No build pipeline yet (manual deploys only at Phase 1)	Low	Per D42 pattern. Promotion to GHA is planned but not gating beta
Per-connector Cognito client proliferation (D44 implication)	Low	Each connector creates a Cognito app client. Default limit: 1000 clients/pool. Not a beta concern; flagged for v1.1
Token-refresh UX in Claude Desktop (downstream of D44)	Medium	When a bearer token expires, does Claude Desktop transparently refresh via client_id/secret + refresh_token? Unknown empirically. Validate during first beta user setup
`hoisted` pnpm linker behavior on `ingestion/` and `retrieval/` workspaces	Low	Switched at workspace root last session for Amplify. Local dev for those packages should be re-verified at start of W3 implementation
`app-w3.autri.ai` parallel-validation domain not yet provisioned	Medium	Needs temporary ACM cert + DNS record + Cognito callback URL entry. All three tear down after `app` CNAME swap completes — script the teardown to avoid forgetting
Amplify Hosting resources cost ~$0 idle but consume CDK surface	Low	Plan: keep alive through W3 verification, then delete `CfnApp` + `CfnBranch` + `CfnDomain` from `auth-and-compute` stack
Connector `client_secret` rotation flow needs UX work (Q16)	Medium	"Rotate secret" button server action + modal showing new secret once. Implement during D44 connector-creation flow build; test old-secret invalidation behavior empirically (~1hr token still valid post-rotate)
Resumable streams for `/api/chat` (H4 v1.1 work)	Medium	60s CloudFront cap truncates long chat responses. Option (a) buffer-and-replay is ~1-2 days, simplest. Ship if beta usage shows truncation; otherwise defer
Tighter CSP in v1.1 (Q13)	Low	Beta ships with permissive `script-src 'self' 'unsafe-inline'`. Tighten once Next inline-script needs are mapped (likely needs nonce-based CSP)
WebSocket origin via APIGW (L1 v1.1)	Low	If real-time UX (live notifications, multi-user collaboration) becomes a v1.1 requirement, add APIGW WebSocket API as a separate origin behind CloudFront
ENI count CloudWatch alarm (H1)	Low	Alarm on ENI count > 50% subnet capacity. Implementation: CloudWatch custom metric from VPC describe-network-interfaces (Lambda on schedule), or VPC Insights if available
Connector-Management Lambda CDK scaffolding (C2)	Medium	New construct in `auth-and-compute` stack. Smallest viable Lambda; IAM execution role with narrowly-scoped Cognito Admin policy. Test path: invoke from SSR Lambda via `lambda:Invoke`, verify scope enforcement
`/api/cache/*` → S3 cache bucket OAC (Q11)	Medium	New CloudFront behavior routes directly to existing cache S3 bucket. Cache bucket needs OAC like static bucket. Verify cache file path layout matches URL `[...path]` semantics
CloudWatch alarm topic selection (alarms cross-cutting)	Low	Beta uses existing AWS Budgets SNS topic for all alarms. Post-beta: consider per-severity SNS topics + PagerDuty integration for paying-customer threshold

This sub-system defines the runtime topology for app.autri.ai. If removed, the entire user-facing web product would break — chat, inspector, KB management, connector creation, settings, auth callback all depend on it. Update this doc when topology, build pipeline, or cross-component interfaces change.
