Web Stack — W3 DIY (CloudFront + S3 + Lambda)
Sub-system design doc for autri's web app layer (app.autri.ai). Locked direction per D43: hand-rolled CloudFront + S3 + Lambda Function URL topology, scale-to-zero, Lambda VPC-attached to reach private RDS. Replaces the Amplify Hosting choice from Stack B / D39.
Drafted 2026-05-26 after the Amplify SSR ↔ VPC architectural gap surfaced during EPIC-4 Day 11. Implementation pending the next session's design pass through this doc.
Cost Shape
Build & Deploy
The runtime topology above is what's running. Getting code into it is a separate concern with its own seams.
Architecture
Per-Component Breakdown
Overview
Risks & Constraints
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Lambda cold-start latency on SSR routes (~500-800ms first byte) | High at beta scale (low traffic = frequent cold starts) | Medium (user-facing latency) | Accept for beta; revisit provisioned concurrency at v1.1 if user feedback warrants |
| Cold-start hit on every deploy (warm containers invalidated) | Certain | Low-Medium (dev workflow tax) | Document the cadence; revisit provisioned concurrency at 1-2 instances if dev iteration suffers |
| RDS connection pool exhaustion at warm-Lambda concurrency | Low at beta scale; rises sharply with traffic | High (500s on all SSR routes when hit) | Pool size 2 per Lambda; add RDS Proxy at v1.1 if connection exhaustion observed |
| Lambda bundle exceeds 50MB unzipped (zip mode) | Medium pre-measurement; lower with /api/chat split | High (deploy fails) | Split-Lambda strategy (Q1' below) keeps Main Lambda lean; pivot to container image (10GB ceiling) if zip cap hit; measure on first build |
| /api/chat 60s CloudFront origin response timeout truncates long streams | Certain for chat sessions >60s | Medium (chat UX degrades for very long responses) | Set CloudFront origin response timeout to 60s max for beta; build resumable streams (option (a) buffer-and-replay, ~1-2 days) in v1.1 if usage shows truncation |
| OAuth callback rejected during app-w3.autri.ai parallel validation | Certain without explicit handling | High (auth flow untestable on parallel domain) | Add https://app-w3.autri.ai/api/auth/callback to Cognito allowed callback URLs during parallel validation; remove after app CNAME swap completes |
| Server-action IAM blast radius for connector creation (D44) | Medium (depends on dep CVE landscape) | Critical (compromised SSR Lambda → backdoor Cognito clients) | Separate connector-management Lambda invoked from SSR Lambda via lambda:Invoke; IAM scoped to specific user pool ARN; only cognito-idp:CreateUserPoolClient + UpdateUserPoolClient + DeleteUserPoolClient |
| Cross-domain Cognito SSO ambiguity | Certain without explicit handling | High (auth may silently work web-side and fail MCP-side, or vice versa) | Two Cognito resource servers — app.autri.ai (web session audience) and mcp.autri.ai (MCP audience). JWT aud claim disambiguates; each Lambda validates audience explicitly |
| db/client.ts async init incompatible with drizzle's sync pool pattern | Medium (drizzle types assume sync pool) | High (forces refactor across call sites) | Validate top-level-await pattern early; lazy Promise<Pool> is the backup. Env-var branching preserves pnpm dev (DB_SECRET_ARN absent → DATABASE_URL fallback) |
| ENI exhaustion in private subnets | Low at beta scale (would need hundreds of concurrent Lambdas) | Medium (Lambdas fail to provision) | Subnet sizing /24 × 2 AZs ≈ 502 IPs, shared with migrations Lambda + future Fargate workers. Add CloudWatch alarm on ENI count > 50% subnet capacity; measure current ENI count during W3 implementation |
| CloudFront cache-key trap (auth pages served from cache to wrong user) | Low if configured carefully; catastrophic if misconfigured | Critical (data leak between users) | Use AWS-managed AllViewerExceptHostHeader + CachingDisabled policies for default behavior (Lambda origin); CORS-S3Origin + CachingOptimized for static behaviors. Verified in CDK code review |
| Cloudflare proxy-mode flip kills streaming | Low (currently DNS-only / gray cloud) | High (chat UX breaks) | Document that Cloudflare must stay DNS-only; only CloudFront does CDN duty |
| DNS swap to new CloudFront is one-way at propagation cycle | Medium | Medium (longer rollback window if W3 has issues) | Validate W3 end-to-end on parallel app-w3.autri.ai first; swap app CNAME last |
| Secrets Manager fetch on every cold start adds ~50-200ms | Certain | Low | Accept — VPC interface endpoint for Secrets Manager already provisioned, no internet trip |
| Lambda's 6 MB Function URL response cap | Low for autri use case (SSR HTML <100KB) | Critical when hit | Verify no SSR route returns >6 MB; flag in code review |
| Per-Lambda RDS connection: Lambda's single-request-per-container nature wastes pool | Certain | Low | Pool max=2 per Lambda; pool barely earns its keep but is harmless |
| WebSocket support absent on Function URL | Certain | Low for beta (notifications use bell-icon polling) | Polling stays the pattern for beta; APIGW WebSocket API as a separate origin if real-time UX becomes a requirement |
| Standalone output + hoisted pnpm interaction | Resolved (Pattern #2 lessons from Amplify session apply) | N/A | Re-enable output: 'standalone' in next.config.mjs; build via pnpm install --frozen-lockfile && pnpm --filter @autri/app build && pnpm deploy --filter @autri/app --prod /tmp/standalone-build |
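The ENI row's subnet arithmetic can be written as a checkable helper. This is a sketch that assumes AWS's five reserved addresses per IPv4 subnet and the 50% alarm fraction from the mitigation column; the function names are hypothetical.

```typescript
// AWS reserves 5 IPv4 addresses in every subnet (network address, VPC router,
// DNS, future use, broadcast), so usable = 2^(32 - prefix) - 5.
const AWS_RESERVED_PER_SUBNET = 5;

function usableIpsPerSubnet(prefixLength: number): number {
  // AWS IPv4 subnet CIDRs must be between /16 and /28.
  if (!Number.isInteger(prefixLength) || prefixLength < 16 || prefixLength > 28) {
    throw new RangeError(`unsupported subnet prefix /${prefixLength}`);
  }
  return 2 ** (32 - prefixLength) - AWS_RESERVED_PER_SUBNET;
}

// Usable IPs across all private subnets the Lambdas attach to.
function usableIps(prefixLength: number, subnetCount: number): number {
  return usableIpsPerSubnet(prefixLength) * subnetCount;
}

// ENI count at which the alarm should fire (50% of capacity by default).
function eniAlarmThreshold(prefixLength: number, subnetCount: number, fraction = 0.5): number {
  return Math.floor(usableIps(prefixLength, subnetCount) * fraction);
}

const privateSubnetIps = usableIps(24, 2); // 251 * 2 = 502
const alarmAt = eniAlarmThreshold(24, 2); // floor(502 * 0.5) = 251
```

The addresses are shared with the migrations Lambda and future Fargate workers, so the alarm should compare against all ENIs in those subnets, not only the web Lambdas' ENIs.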
Current Status
| Capability | Status |
|---|---|
| Topology design (this doc) | In Progress |
| CDK constructs (WebStack or rename existing amplify.ts) | Planned |
| Lambda handler wrapping Next.js standalone server | Planned |
| db/client.ts async init refactor | Planned |
| Build pipeline (manual pnpm deploy:web script, GHA later per D42 pattern) | Planned |
| CloudFront distribution + S3 bucket + OAC | Planned |
| RDS security group ingress from new Lambda SG | Planned |
| DNS swap from old Amplify CloudFront to new W3 CloudFront | Planned |
| Amplify CDK construct teardown (after W3 verified end-to-end) | Planned |
| Performance baseline (cold start measurements, SSR latency) | Planned |
The Story
The web layer's architecture has evolved through three locked directions in a month:
Stack A (D33, 2026-05-18). All-Fargate: web app, MCP server, and ingestion workers all on Fargate behind an ALB. ~$95/mo idle floor. Locked as the "AWS-native, container-everywhere" direction.
Stack B (D39, 2026-05-21). AgentCore Runtime + Amplify + Fargate Tasks. Surfaced through the EPIC-1 AgentCore spike: a purpose-built AWS service for MCP server hosting (scale-to-zero, OAuth-native, session-aware microVMs) materially cheaper than Fargate for the MCP layer. Amplify Hosting picked for the web layer because it's CloudFront + Lambda + S3 managed — push-to-deploy from GitHub, opinionated build pipeline. ~$35-50/mo idle floor. Web layer SSR runs in Amplify's managed Lambda compute, not in our VPC.
Stack B-prime / W3 DIY (D43, 2026-05-26 — current). Amplify works through TLS + DNS + build, but its SSR Lambda compute cannot be placed in a VPC. Confirmed via aws-amplify/amplify-hosting#3362 — feature request open since March 2023, reopened February 2025 by AWS staff to track as still-pending. CDK CfnApp has no vpcConfig prop. Our RDS lives in a private subnet by design (per the secure-by-default NetworkAndData stack), so the Amplify SSR Lambda has no network path. Empirical: app.autri.ai/kb returned HTTP 500 with ECONNREFUSED 127.0.0.1:5432. Stack B's web layer pivoted to W3 — hand-rolled CloudFront + S3 + Lambda Function URL, where we control Lambda VPC config directly. AgentCore Runtime (for MCP) unchanged.
Why DIY over a framework (SST / OpenNext). Both DIY and framework variants of W3 produce the same runtime topology with the same cost shape. The DIY choice trades ~10 extra hours of CDK wiring for owning the CDK code directly with no framework abstractions to debug through. Worth being explicit: if framework abstractions had a clear win (faster iteration, better default behavior), the choice would flip. They don't, for our scale.
What Is This Sub-system?
The web layer hosts everything served from app.autri.ai: the user-facing Next.js application, including the visual extraction inspector, the KB management UI, the chat surface, the connector-creation flow, and the auth callback. It is the runtime home of all server-rendered pages and all /api/* route handlers in the autri Next.js app.
It is distinct from the MCP layer (mcp.autri.ai, runs on AgentCore Runtime per D39/D40) and the ingestion layer (Fargate Tasks, on-demand). All three layers share the same VPC, RDS instance, Cognito user pool, and Secrets Manager.
The sub-system's external interface is HTTPS at app.autri.ai. Its internal interface is the set of CDK stack outputs (Lambda function name, S3 bucket name, CloudFront distribution ID) that the deploy script consumes.
The Big Idea
W3 is one move: separate cheap-static delivery from on-demand compute, with a smart edge in front routing between them.
Next.js is two things wearing one trench coat:
- A bunch of pre-rendered files (HTML shells, JS bundles, CSS, fonts, images in _next/static/) that never change between deploys
- A server that runs route handlers + Server Components, hits the DB, streams chat tokens
Serving both from the same compute pays Lambda cold-start tax for assets that could've been served from a CDN cache in 5ms. Serving everything from S3 means no server code. W3 splits them.
Architecture Diagram
                    app.autri.ai (HTTPS)
                              │
           [Cloudflare DNS] (DNS-only, gray cloud)
                              │
                      [CloudFront edge]
        ┌──────────────┬──────────────┬──────────────┐
        │              │              │              │
 /_next/static/  /api/cache/*     /api/chat   everything else
    /static/                               (SSR + other /api/*)
        │              │              │              │
   [Static S3]    [Cache S3]    [Chat Lambda]  [Main Lambda]
                (page renders)  Function URL   Function URL
                                      │              │
                                      └──────┬───────┘
                                             ▼
                                    [Both VPC-attached
                                     private subnets]
                           ┌─────────────────┼─────────────────┐
                           ↓                 ↓                 ↓
                    [RDS Postgres]   [Secrets Manager]  [Connector-Mgmt
                       port 5432     via VPC endpoint       Lambda]
                       via SG↔SG                         (invoked from
                                                           Main via
                                                        lambda:Invoke)
                                                               │
                                                               ▼
                                                      [Cognito user pool]
                                                    (CreateUserPoolClient,
                                                     UpdateUserPoolClient,
                                                     DeleteUserPoolClient)
Two Lambdas serving dynamic traffic (per Q1'):
- Main Lambda: SSR pages, /api/auth/*, /api/upload/*, all other /api/* routes. Lean bundle (no Anthropic SDK).
- Chat Lambda: /api/chat only. AI SDK + Anthropic SDK weight isolated here.
Connector-Management Lambda (per C2) — small, infrequent, tightly-scoped IAM. Invoked from Main Lambda's connector-creation server action via lambda:Invoke. Owns all cognito-idp Admin calls.
System Boundary
Owned by this sub-system:
- CloudFront distribution for app.autri.ai (and temporary app-w3.autri.ai during parallel validation)
- S3 bucket for static assets (separate from the uploads bucket)
- CloudFront behavior routing to the existing S3 cache bucket (page renders from NetworkAndData); the bucket itself stays under NetworkAndData
- Main Lambda function holding the Next.js standalone server (SSR pages + non-chat API routes)
- Chat Lambda function holding the /api/chat route handler (AI SDK + Anthropic SDK isolated)
- Connector-Management Lambda with narrowly-scoped Cognito Admin IAM
- Lambda execution roles + IAM policies for all three Lambdas
- Security groups for the SSR + Chat Lambdas (a shared SG is fine; same RDS/Secrets access pattern)
- ACM cert references (app.autri.ai cert reused from auth-and-compute; temporary cert issued for app-w3.autri.ai)
- The Lambda → RDS connection path (security group rule on RDS allows ingress from Lambda SG)
- The Lambda → Connector-Mgmt invocation path (lambda:Invoke IAM)
- CloudFront Response Headers Policy (security headers)
- CloudFront Origin Request + Cache Policies (pinned to AWS-managed combos)
- CloudWatch alarms pinned in Cross-Cutting Concerns (Lambda errors, CloudFront 5xx, ENI count, RDS connections)
- The build artifact upload script and CloudFront cache invalidation logic
- Lambda alias-promote rollback machinery (current + prev aliases per Q10)
Not owned by this sub-system but interfaced with:
- VPC + subnets (NetworkAndData stack)
- RDS Postgres (NetworkAndData stack): we only add the SG ingress rule
- S3 cache bucket (NetworkAndData stack): we only add the CloudFront behavior + OAC
- Cognito user pool (AuthAndCompute stack): Main + Chat Lambdas validate JWTs; Connector-Mgmt Lambda issues admin calls. Two resource servers per C3: app.autri.ai (web session audience) + mcp.autri.ai (MCP audience)
- Secrets Manager secret holding DB credentials (NetworkAndData stack): both Lambdas read it on cold start
- Secrets Manager secret holding ANTHROPIC_API_KEY (AuthAndCompute stack): Chat Lambda reads it on cold start
- VPC interface endpoints for AWS services (already provisioned in NetworkAndData)
- Cloudflare DNS records: manual updates outside CDK
- The autri application code itself (lives in the autri repo; this sub-system specifies how it's deployed but not what it does)
- Ingestion sub-system: user uploads land in the uploads S3 bucket; ingestion reads from there (handoff details in the ingestion sub-system doc)
CloudFront — the front door
The CDN. Globally distributed cache, terminates TLS at ~400 edge locations, decides which origin serves each request. Its job beyond caching: route by path to the right origin.
You've been watching CloudFront work this whole time — Amplify Hosting is literally CloudFront + an opinionated build pipeline. We're taking direct ownership of the distribution.
Origin Request + Cache Policies (per Q12):
- Default behavior (→ Main Lambda): AllViewerExceptHostHeader + CachingDisabled. Forwards everything except Host (a Function URL routes on its own hostname, so the viewer's Host must not be forwarded); Lambda responses never cache (no cache-key trap risk).
- /api/chat behavior (→ Chat Lambda): same as default, plus origin response timeout = 60s max (CloudFront cap; longer chats truncate per H4 Failure Mode).
- /_next/static/* + /static/* + favicon behaviors (→ Static S3): CORS-S3Origin + CachingOptimized. Long TTLs, no auth header forwarding.
- /api/cache/* behavior (→ Cache S3 bucket): CORS-S3Origin + CachingOptimized. Reads page-render PNGs directly from S3 via Origin Access Control; no Lambda invocation.
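The behavior routing can be sketched as a pure function, which is handy for reasoning about which origin a given path hits. This assumes CloudFront's path-pattern semantics (* matches any run of characters, ? matches exactly one, matching is case-sensitive, the first behavior in precedence order wins, the default catches the rest); the origin labels and the /favicon.ico pattern are hypothetical names.

```typescript
type Origin = "static-s3" | "cache-s3" | "chat-lambda" | "main-lambda";

interface Behavior {
  pathPattern: string; // CloudFront path pattern: '*' = any run of chars, '?' = one char
  origin: Origin;
}

// Precedence order: CloudFront uses the first behavior whose pattern matches.
const behaviors: Behavior[] = [
  { pathPattern: "/_next/static/*", origin: "static-s3" },
  { pathPattern: "/static/*", origin: "static-s3" },
  { pathPattern: "/favicon.ico", origin: "static-s3" }, // assumed pattern
  { pathPattern: "/api/cache/*", origin: "cache-s3" },
  { pathPattern: "/api/chat", origin: "chat-lambda" },
];
const defaultOrigin: Origin = "main-lambda";

// Translate a CloudFront-style glob into an anchored, case-sensitive RegExp.
function patternToRegExp(pattern: string): RegExp {
  const body = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*";
      if (ch === "?") return ".";
      // Escape regex metacharacters so '.' in 'favicon.ico' is literal.
      return ch.replace(/[.+^${}()|[\]\\\/]/g, "\\$&");
    })
    .join("");
  return new RegExp(`^${body}$`);
}

function routeOrigin(path: string): Origin {
  for (const b of behaviors) {
    if (patternToRegExp(b.pathPattern).test(path)) return b.origin;
  }
  return defaultOrigin;
}
```

One consequence worth noting: a pattern with no wildcard matches only that exact path, so routeOrigin("/api/chat/history") falls through to Main Lambda. Any future sub-route under /api/chat would need its own pattern.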
Security headers via Response Headers Policy (per Q13):
- HSTS: max-age=63072000 (2 years); includeSubDomains; preload
- X-Frame-Options: DENY
- X-Content-Type-Options: nosniff
- Referrer-Policy: strict-origin-when-cross-origin
- Content-Security-Policy (minimal v1): script-src 'self' 'unsafe-inline'; object-src 'none'; base-uri 'self'; remaining directives set to 'self' or 'data:'. Tighten in v1.1 once Next's inline-script needs are mapped.
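The header values above can also be expressed as data, with the HSTS max-age derived from its two-year meaning. In this sketch, default-src and img-src are assumptions that stand in for the unnamed remaining directives, and the object names are illustrative rather than the CDK ResponseHeadersPolicy API.

```typescript
// Two years in seconds, matching the HSTS max-age above.
const HSTS_MAX_AGE_SECONDS = 2 * 365 * 24 * 60 * 60; // 63072000

// Directives named in the v1 policy. default-src and img-src are assumptions
// standing in for "remaining directives set to 'self' or 'data:'".
const cspDirectives: Record<string, string[]> = {
  "default-src": ["'self'"],
  "script-src": ["'self'", "'unsafe-inline'"],
  "object-src": ["'none'"],
  "base-uri": ["'self'"],
  "img-src": ["'self'", "data:"],
};

// Serialize as "name source source; name source; ..." (CSP header syntax).
function serializeCsp(directives: Record<string, string[]>): string {
  return Object.keys(directives)
    .map((name) => `${name} ${directives[name].join(" ")}`)
    .join("; ");
}

// Values the Response Headers Policy would attach to every response.
const securityHeaders: Record<string, string> = {
  "Strict-Transport-Security": `max-age=${HSTS_MAX_AGE_SECONDS}; includeSubDomains; preload`,
  "X-Frame-Options": "DENY",
  "X-Content-Type-Options": "nosniff",
  "Referrer-Policy": "strict-origin-when-cross-origin",
  "Content-Security-Policy": serializeCsp(cspDirectives),
};
```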
What it provides:
- TLS termination using ACM cert for app.autri.ai
- Static asset caching (immutable hashed files → 1 year TTL; never re-hit origin)
- DDoS absorption (absorbs SYN floods before they hit Lambda)
- Compression (gzip/brotli)
- HTTP/2, HTTP/3, IPv6
- Origin Access Control (OAC) so both S3 buckets stay private
What it doesn't provide:
- Authentication (Lambdas handle Cognito JWT validation)
- Business logic (purely routing)
Idle cost: ~$0. Per-request + per-GB-out only. Zero traffic = zero cost.
S3 — the static origin
Two S3 origins served via CloudFront. Both buckets private; CloudFront accesses via Origin Access Control (OAC). Browsers only hit CloudFront, never S3 directly.
Static bucket (new, owned by this sub-system) — pre-built static files:
- _next/static/chunks/*.js: Next's compiled bundles (hashed filenames, immutable)
- _next/static/css/*.css
- _next/static/media/*: fonts, images bundled by Next
- static/*: files from app/public/
- favicon.ico, robots.txt
Cache bucket (existing, owned by NetworkAndData) — page renders + ingestion artifacts:
- PDF → PNG per-page renders (consumed by the /api/cache/[...path] route layout for the inspector overlay)
- Other ingestion-time cache files
Per Q11, the /api/cache/* CloudFront behavior routes directly to this bucket — no Lambda invocation. The cache key matches the URL path layout (e.g., /api/cache/<doc-id>/page-NN.png → S3 key <doc-id>/page-NN.png in the cache bucket).
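CloudFront forwards the viewer's path to the origin unchanged (an origin path can only prepend a prefix). For /api/cache/<doc-id>/page-NN.png to resolve to the S3 key <doc-id>/page-NN.png, something therefore has to strip the /api/cache prefix. The alternative is to store objects under an api/cache/ key prefix. One option is a viewer-request CloudFront Function on the /api/cache/* behavior. The sketch below assumes that approach, which this doc does not specify. CloudFront Functions run JavaScript, so this TypeScript would be compiled or hand-ported, and the event interfaces are a hand-written subset.

```typescript
// Minimal subset of the CloudFront Functions viewer-request event shape.
interface CfRequest {
  method: string;
  uri: string;
}
interface CfEvent {
  request: CfRequest;
}

const CACHE_PREFIX = "/api/cache/";

// Rewrites /api/cache/<doc-id>/page-NN.png to /<doc-id>/page-NN.png so the
// S3 origin sees the bucket's own key layout. Other paths pass through.
function handler(event: CfEvent): CfRequest {
  const request = event.request;
  if (request.uri.startsWith(CACHE_PREFIX)) {
    request.uri = "/" + request.uri.slice(CACHE_PREFIX.length);
  }
  return request;
}
```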
NOT in either bucket:
- HTML (server-rendered per request from Main Lambda)
- User uploads (the existing uploads bucket in NetworkAndData owns those)
Idle cost: ~$0.01-0.05/mo for ~10MB static + variable cache content. Egress to CloudFront is intra-AWS, free.
Lambda + Function URL — the dynamic origin
Two Lambda functions per Q1', each with its own Function URL:
Main Lambda — Next.js SSR + non-chat API routes (/api/auth/*, /api/upload/*, server actions). Holds drizzle + pg + Next runtime + Cognito SDK. Lean bundle without AI SDK / Anthropic SDK weight.
Chat Lambda — /api/chat only. Holds AI SDK + Anthropic SDK + retrieval-tool wiring. Separate cold-start profile, separate connection pool.
Both Lambdas:
- VPC-attached to private subnets (sharing the same SG is fine)
- Same db/client.ts async init pattern (top-level await + env-var branching per Q4)
- Same secrets read on cold start: DB credentials from Secrets Manager
- Chat Lambda additionally reads the ANTHROPIC_API_KEY secret on cold start
A Function URL is a public HTTPS endpoint AWS attaches to a Lambda. CloudFront uses each as an origin (default behavior → Main; /api/chat behavior → Chat).
Function URL chosen over API Gateway HTTP API because:
- Function URLs support response streaming (since 2023; GA in all regions Apr 2026 per AWS What's New). APIGW HTTP API does not; it buffers the entire response. Streaming matters for /api/chat.
- Function URLs are free; APIGW charges per million requests.
- Function URLs have 6 MB response cap vs APIGW's 10 MB. SSR HTML is <100KB; not a constraint.
- Reconsidered post-split (2026-05-26): even with chat isolated to its own Lambda, APIGW for Main doesn't pay back. Throttling / WAF / custom auth — all redundant with what we're already doing or attachable to CloudFront. Two Function URLs keeps the mental model coherent.
Lambda lifecycle (critical to internalize):
- A Lambda is NOT a long-running server. AWS spins up a container ("execution environment") on demand.
- Cold start: first request to a fresh environment pays the boot tax — container boot + ENI attach + Node parse + JS module init + our top-level await (Secrets Manager fetch + pool init). Total: ~500-800ms for Main; slightly more (~600-1000ms) for Chat due to larger bundle.
- Warm invocations: reuse the same container. Module-level state (pg pool, etc.) preserved. ~30-80ms typical.
- Idle eviction: after ~5-15 min of no traffic, AWS kills the container. Next request is cold again.
- Concurrency: 1 request per container at a time. 2 simultaneous requests → 2 containers.
- Cold-start on every deploy: version updates invalidate all warm containers. Each deploy → next ~5-10 requests pay cold-start.
Idle cost: $0. The truest pay-only-when-used.
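The cold/warm split is why expensive setup belongs at module scope. Below is a minimal sketch of the lazy Promise variant (the backup to top-level await in the risks table). It assumes a hypothetical expensiveInit that stands in for the Secrets Manager fetch and pool init. Caching the promise rather than the resolved value lets parallel callers within one render share a single initialization. Clearing it on failure keeps a transient error from poisoning the container.

```typescript
// Hypothetical stand-in for the cold-start work (secret fetch + pool init).
interface DbHandle {
  createdAt: number;
}

let initCount = 0;

async function expensiveInit(): Promise<DbHandle> {
  initCount += 1;
  return { createdAt: Date.now() };
}

// Module scope survives across warm invocations of the same container.
let dbPromise: Promise<DbHandle> | undefined;

function getDb(): Promise<DbHandle> {
  if (!dbPromise) {
    dbPromise = expensiveInit().catch((err: unknown) => {
      dbPromise = undefined; // let the next invocation retry
      throw err;
    });
  }
  return dbPromise;
}

// Simulated handler: every invocation awaits getDb(), but only the first
// (cold) one pays for expensiveInit().
async function handle(): Promise<number> {
  const db = await getDb();
  return db.createdAt;
}
```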
Packaging strategy (per Q8, locked 2026-05-26 post-measurement): container image. Measuring the bundle against the current autri codebase showed ~38MB unzipped, excluding the /api/cache directory (which disappears once that route is removed per Q11). The measurement also surfaced that hoisted pnpm + Next's outputFileTracing misses workspace-hoisted deps (AI SDK, Anthropic SDK, drizzle-orm). Resolving that requires a pnpm deploy --prod packaging step, which produces container-image-friendly output anyway. The container image's extra cold-start cost (~150ms) is small relative to the ~500-800ms cold-start floor we already accepted. The zip-cap fight isn't worth having.
Build command (per Q9):
pnpm install --frozen-lockfile
pnpm --filter @autri/app build
pnpm deploy --filter @autri/app --prod /tmp/standalone-build
Matches the MCP server pattern. pnpm deploy --prod produces a self-contained dir with hoisted deps + Next standalone output; the container Dockerfile COPYs this into the image.
The wrapper handler: Next 14 standalone exports a server.js that runs an HTTP server. To run inside Lambda, we add a ~30-line wrapper that imports Next's request handler and translates Function URL events into Node's IncomingMessage + ServerResponse. Pattern used by SST, OpenNext, etc.
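The core of that wrapper is an event translation. The following is a partial sketch, assuming the Function URL payload format 2.0 fields listed in the interface. The NodeRequestInit shape and the function name are hypothetical. Building the actual IncomingMessage/ServerResponse pair and streaming the response back are left out.

```typescript
// Subset of the Lambda Function URL event (payload format 2.0).
interface FunctionUrlEvent {
  rawPath: string;
  rawQueryString: string;
  cookies?: string[];
  headers: Record<string, string | undefined>;
  requestContext: { http: { method: string } };
  body?: string;
  isBase64Encoded: boolean;
}

// What the wrapper needs to construct a Node request (hypothetical shape).
interface NodeRequestInit {
  method: string;
  url: string;
  headers: Record<string, string>;
  body: string | undefined;
  bodyEncoding: "base64" | "utf8";
}

function toNodeRequestInit(event: FunctionUrlEvent): NodeRequestInit {
  const headers: Record<string, string> = {};
  for (const name of Object.keys(event.headers)) {
    const value = event.headers[name];
    if (value !== undefined) headers[name.toLowerCase()] = value;
  }
  // Payload 2.0 moves cookies out of the headers into a separate array;
  // Next reads them from a single Cookie header, so re-join them.
  if (event.cookies && event.cookies.length > 0) {
    headers["cookie"] = event.cookies.join("; ");
  }
  const url = event.rawQueryString
    ? `${event.rawPath}?${event.rawQueryString}`
    : event.rawPath;
  return {
    method: event.requestContext.http.method,
    url,
    headers,
    body: event.body,
    bodyEncoding: event.isBase64Encoded ? "base64" : "utf8",
  };
}
```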
Container deploy ergonomics: ECR push + Lambda image update is ~30-60s slower than zip on first deploy of a version, but container layer caching keeps subsequent updates fast (~10-15s). Use Lambda's "update function code from ECR image URI" path; AWS handles the rest.
Connector-Management Lambda (the privilege-isolation Lambda)
A separate, small, infrequently-invoked Lambda that owns all Cognito Admin API calls. Lives in its own deployment artifact with a tightly-scoped IAM execution role.
Why a separate Lambda (per C2): The D44 connector-creation flow requires cognito-idp:CreateUserPoolClient permission. Granting that to the SSR Lambda would mean any RCE in the SSR Lambda (via dep CVE, prompt injection in a Server Component, etc.) could backdoor Cognito clients — a critical privilege-elevation surface. Isolating these privileges in a small, infrequently-invoked Lambda with a tightly-scoped IAM execution role minimizes the blast radius.
IAM scope:
- Resource: specific Cognito user pool ARN (not *)
- Actions: only cognito-idp:CreateUserPoolClient + UpdateUserPoolClient + DeleteUserPoolClient
- No other Cognito Admin verbs (no AdminCreateUser, AdminDeleteUser, ListUsers, etc.; those stay in the post-confirmation Lambda's narrower scope)
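That scope can be expressed as an IAM policy document, built here as a plain object so it can be checked in a unit test. The region, account ID, and pool ID are placeholders, and connectorMgmtPolicy is a hypothetical helper. In CDK, the same statement would typically be added to the function's execution role.

```typescript
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string;
}
interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// The only three Cognito verbs the Connector-Mgmt Lambda may call.
const CONNECTOR_MGMT_ACTIONS = [
  "cognito-idp:CreateUserPoolClient",
  "cognito-idp:UpdateUserPoolClient",
  "cognito-idp:DeleteUserPoolClient",
];

// Scoped to a single user pool ARN, never "*".
function connectorMgmtPolicy(region: string, accountId: string, userPoolId: string): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [...CONNECTOR_MGMT_ACTIONS],
        Resource: `arn:aws:cognito-idp:${region}:${accountId}:userpool/${userPoolId}`,
      },
    ],
  };
}
```

The role will still need the standard CloudWatch Logs permissions every Lambda uses; those don't widen the Cognito scope.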
Invocation path:
- User clicks "Create Connector" or "Rotate Secret" in Main Lambda's UI
- Main Lambda's server action validates the session + checks user owns the library
- Main Lambda calls lambda:Invoke on the Connector-Mgmt Lambda with {action: "create"|"rotate"|"delete", userId, libraryId, ...}
- Connector-Mgmt Lambda makes the Cognito Admin API call(s), returns result
- Main Lambda inserts/updates the connectors DB row + renders the UI response
Why not direct Cognito Admin calls from Main Lambda: Main Lambda's IAM is broad (RDS, S3, Secrets Manager, lambda:Invoke). Adding Cognito Admin verbs would compound the attack surface. The lambda:Invoke boundary creates a clear audit point — every Cognito client creation is an explicit cross-Lambda call.
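On the receiving side of that boundary, the Connector-Mgmt Lambda can reject malformed payloads before touching Cognito. This sketch checks shape only, because ownership was already verified by Main Lambda's server action. The additional fields behind the "..." are left unspecified and passed through. The type and function names are hypothetical.

```typescript
type ConnectorAction = "create" | "rotate" | "delete";

interface ConnectorMgmtRequest {
  action: ConnectorAction;
  userId: string;
  libraryId: string;
  extra: Record<string, unknown>; // unspecified fields, passed through as-is
}

const ACTIONS: ConnectorAction[] = ["create", "rotate", "delete"];

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Fail fast on anything malformed; the invoke boundary is the audit point.
function parseConnectorMgmtRequest(payload: unknown): ConnectorMgmtRequest {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("payload must be an object");
  }
  const { action, userId, libraryId, ...extra } = payload as Record<string, unknown>;
  if (typeof action !== "string" || ACTIONS.indexOf(action as ConnectorAction) === -1) {
    throw new Error(`unknown action: ${String(action)}`);
  }
  if (!isNonEmptyString(userId) || !isNonEmptyString(libraryId)) {
    throw new Error("userId and libraryId are required");
  }
  return { action: action as ConnectorAction, userId, libraryId, extra };
}
```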
Implementation footprint: <100 lines of code. No VPC config needed: a Lambda outside the VPC reaches the public Cognito API endpoint directly, with no VPC endpoint or NAT involved. 128MB memory, default timeout. Cold-start cost is acceptable because invocations are user-driven (not on every request).
Not in scope: OAuth code-exchange flow itself (that runs in Main Lambda as a server action — Cognito's token endpoint is public, no Admin IAM needed). Only the Cognito app client creation + rotation + deletion requires Admin IAM.
VPC + private subnets — where the Lambda lives
The NetworkAndData stack already provisions VPC + public + private subnets + NAT Gateway + VPC interface endpoints for Secrets Manager, STS, SSM, and ECR.
W3 attaches the SSR Lambda to the private subnets. This gives the Lambda:
- A private IP via an ENI in our subnet
- Network reachability to RDS (port 5432 via SG-to-SG)
- Network reachability to the internet via NAT (for Anthropic API calls from /api/chat)
- Network reachability to AWS services via the VPC interface endpoints (Secrets Manager fetch on cold start uses these; no internet trip)
This is the exact thing Amplify can't do. Amplify's SSR Lambda runs in AWS's own VPC; you cannot attach it to yours. The W3 Lambda is in our VPC, with explicit SG and IAM control.
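Cold-start cost of VPC-attached Lambda: budget ~100-300ms extra. AWS heavily improved this in 2019 with the "Hyperplane ENI" architecture, which creates the shared ENI when the function is created or its VPC config changes rather than on each cold start, so treat that figure as a conservative budget. The latency is acceptable.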
RDS Postgres — the database
Unchanged from current state. Lives in private subnets. Has a security group. W3 adds one rule: "allow ingress on port 5432 from the new SSR Lambda's security group." That's the network handshake.
Connection management nuance:
- Pool size per Lambda: lean is max=2. Lambdas serve one request at a time per container, so a deep pool barely earns its keep. max=1 would work too, but max=2 gives a buffer for the rare case of a long-running request and an incidental health check.
- Total concurrent connections = Lambda concurrent containers × 2. At beta scale (~5 users × low traffic), this is in single digits. RDS t4g.small caps around 100 connections. Headroom is generous.
- If usage grows past ~30 concurrent Lambdas, add RDS Proxy as a connection multiplexer. v1.1 concern.
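The connection arithmetic above can be captured in a small helper. The inputs are illustrative: the example uses 30 concurrent containers, a 100-connection ceiling, and 10 connections for other clients, none of which are measured values. The interface and function names are hypothetical.

```typescript
interface ConnectionBudget {
  mainContainers: number; // concurrent Main Lambda containers
  chatContainers: number; // concurrent Chat Lambda containers
  poolMax: number; // pg pool max per container (2 in this design)
  otherClients: number; // migrations Lambda, MCP server, workers, etc.
  rdsMaxConnections: number; // the instance's connection ceiling
}

// Containers don't share pools, so each one can hold up to poolMax connections.
function peakConnections(b: ConnectionBudget): number {
  return (b.mainContainers + b.chatContainers) * b.poolMax + b.otherClients;
}

function headroom(b: ConnectionBudget): number {
  return b.rdsMaxConnections - peakConnections(b);
}

// ~30 concurrent Lambdas: (20 + 10) * 2 + 10 = 70 connections, 30 to spare.
const atThreshold: ConnectionBudget = {
  mainContainers: 20,
  chatContainers: 10,
  poolMax: 2,
  otherClients: 10,
  rdsMaxConnections: 100,
};
```

At the ~30-container mark, web traffic alone consumes 60 connections. That is roughly where RDS Proxy starts paying for itself.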
Secrets Manager — credential vault
RDS publishes credentials via Secrets Manager: {username, password, host, port, dbname}. Rotates automatically if rotation is enabled.
Currently app/lib/db/client.ts reads process.env.DATABASE_URL synchronously. W3 changes this: the Lambda fetches the secret on cold start, builds the connection URL, then initializes the pool. This is the async init pattern (see Decisions Log D43.4).
How the Lambda gets permission to read: IAM. The execution role has a policy secretsmanager:GetSecretValue scoped to the specific secret ARN. No long-lived credentials anywhere in code or env vars.
How the Lambda reaches Secrets Manager from inside the VPC: through the existing VPC interface endpoint for Secrets Manager. No NAT traversal needed. ~30-50ms latency on cold start.
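The env-var branching for the db/client.ts init can be sketched as follows, assuming the secret's JSON has the {username, password, host, port, dbname} shape described above. The fetcher is injected so the logic runs without AWS; in the Lambda it would wrap a Secrets Manager GetSecretValue call. resolveDatabaseUrl and SecretFetcher are hypothetical names.

```typescript
// Shape of the RDS credentials secret described above.
interface RdsSecret {
  username: string;
  password: string;
  host: string;
  port: number;
  dbname: string;
}

// Injected so the logic is testable; wiring to Secrets Manager is not shown.
type SecretFetcher = (secretArn: string) => Promise<string>;

async function resolveDatabaseUrl(
  env: Record<string, string | undefined>,
  fetchSecret: SecretFetcher,
): Promise<string> {
  const secretArn = env["DB_SECRET_ARN"];
  if (secretArn) {
    const secret = JSON.parse(await fetchSecret(secretArn)) as RdsSecret;
    // Credentials can contain URL-reserved characters; encode them.
    const user = encodeURIComponent(secret.username);
    const pass = encodeURIComponent(secret.password);
    return `postgres://${user}:${pass}@${secret.host}:${secret.port}/${secret.dbname}`;
  }
  // Local pnpm dev: DB_SECRET_ARN absent, fall back to DATABASE_URL.
  const url = env["DATABASE_URL"];
  if (!url) throw new Error("set DB_SECRET_ARN (Lambda) or DATABASE_URL (local dev)");
  return url;
}
```

Either top-level await on this function (the Q4 pattern) or a lazily cached promise can sit on top of it; the branching is the same in both.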
ACM cert + Cloudflare DNS
ACM cert for app.autri.ai exists today, validated via DNS CNAMEs in Cloudflare. The cert is tied to the domain, not the CloudFront distribution — so the W3 CloudFront attaches the same cert. No new validation needed.
Cloudflare DNS is configured DNS-only (gray cloud). It resolves app.autri.ai to CloudFront's CNAME. In this mode Cloudflare's DDoS protection covers the DNS service itself, while HTTP traffic goes straight to CloudFront without passing through Cloudflare's proxy or WAF. The W3 swap: repoint the app CNAME from the current Amplify CloudFront (d2bkdemcj0sjyg.cloudfront.net) to the new W3 CloudFront distribution's domain.
Must stay DNS-only: if Cloudflare is flipped to proxy mode (orange cloud), Cloudflare becomes another CDN layer in front of CloudFront. Cloudflare's proxy buffers responses — kills /api/chat streaming. The whole topology assumes only CloudFront does CDN duty.
Cognito (unchanged, but load-bearing for the request lifecycle)
The Cognito user pool + Google federated identity remain in the AuthAndCompute stack. The W3 Lambdas validate Cognito JWTs from session cookies (Server Components read cookies during render; auth middleware on /api/* routes does the same).
Cross-domain SSO model (per C3 — new): Cognito has two resource servers:
- app.autri.ai: audience for web session tokens (issued by the user's Google federation flow, used by Main + Chat Lambdas)
- mcp.autri.ai: audience for MCP tokens (issued by per-connector Cognito clients per D44, validated by the MCP server)
Each Lambda validates the JWT aud claim against its expected resource server identifier. Web Lambdas reject tokens with aud=mcp.autri.ai; MCP server rejects tokens with aud=app.autri.ai. Closes the cross-domain SSO carryforward from prior sessions.
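The audience check itself can be sketched as a small predicate. It runs after signature and expiry verification (not shown) and assumes the tokens carry the resource-server identifier in aud, as C3 describes. Cognito access tokens have historically expressed resource servers through scope prefixes (<identifier>/<scope>) and the app client through client_id rather than an aud claim, so confirm which claim the issued tokens actually carry. The names below are hypothetical.

```typescript
// Decoded, already-verified JWT claims. Per RFC 7519, aud may be a
// string or an array of strings.
interface JwtClaims {
  aud?: string | string[];
  sub?: string;
}

const WEB_AUDIENCE = "app.autri.ai";
const MCP_AUDIENCE = "mcp.autri.ai";

function hasExpectedAudience(claims: JwtClaims, expected: string): boolean {
  const aud = claims.aud;
  if (aud === undefined) return false; // fail closed when aud is missing
  return Array.isArray(aud) ? aud.indexOf(expected) !== -1 : aud === expected;
}
```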
OAuth callback URLs (per C1): Production callback is https://app.autri.ai/api/auth/callback. During parallel app-w3.autri.ai validation, also add https://app-w3.autri.ai/api/auth/callback to the Cognito user pool client's allowed callback URLs. Remove after app CNAME swap completes.
Server-side OAuth secret rotation (per Q16): Connector secrets shown once on creation. "Rotate secret" button calls Cognito update-user-pool-client --generate-secret via the Connector-Management Lambda; new secret shown once; user updates Claude Desktop. Old secret invalidated immediately; tokens issued under it remain valid until expiry (~1hr).
Request Lifecycle
Concrete trace for a logged-in user opening app.autri.ai/kb/abc-123/chat and sending a chat message.
1. Browser → Cloudflare DNS
DNS lookup app.autri.ai → Cloudflare returns CNAME → CloudFront edge IP.
2. Browser → CloudFront (TLS handshake)
CloudFront presents the ACM cert; browser validates. HTTP/2 connection established.
3. CloudFront → Main Lambda (the SSR page)
Path /kb/abc-123/chat doesn't match any specific behavior; falls to default → Main Lambda Function URL. CloudFront's Origin Request Policy (AllViewerExceptHostHeader) forwards cookies + headers + query string. Cache Policy (CachingDisabled) ensures the response is never cached.
4. Main Lambda cold start (first invocation only)
- ~100ms: container boots, ENI attaches
- ~200ms: Node starts, parses our bundle
- ~150ms: top-level await fires — Secrets Manager fetch via VPC endpoint + pg pool init
- Total cold start: ~500-800ms before first byte
5. Main Lambda runs the request
- Next dispatches to app/kb/[kbId]/chat/page.tsx
- Server Component reads the session cookie, validates the Cognito JWT (audience = app.autri.ai), gets user_id
- DB query: get KB by id, check user has access via a library_access row
- Renders React tree to HTML stream
- Lambda flushes HTML chunks back to CloudFront
6. CloudFront → Browser
HTML streams back; browser parses, sees <script src="/_next/static/chunks/..."> tags and fires off requests.
7. Static asset requests (parallel)
Path /_next/static/chunks/abc.js matches the static behavior. CloudFront either serves from edge cache (~5ms) or fetches from S3 (50-200ms first time per edge location). After first request per edge, cached for a year.
8. Inspector renders a page image
The chat page has inline <img src="/api/cache/<doc-id>/page-5.png"> references. Browser GET → CloudFront matches /api/cache/* behavior → CloudFront fetches from S3 cache bucket directly (no Lambda invocation). Sub-10ms latency from edge cache after first request.
9. User sends a chat message
Browser POSTs /api/chat. CloudFront matches the /api/chat behavior → forwards to Chat Lambda Function URL (separate Lambda from step 5). Chat Lambda kicks off Anthropic streamText, returns a streaming response. Function URL streams back to CloudFront → browser. AI SDK on the client parses chunks, updates UI in real time. CloudFront origin response timeout = 60s cap (per Q12 + H4 Failure Mode) — chat responses exceeding 60s of wall-clock streaming get truncated; resumable streams are v1.1.
10. User creates a connector (server action path)
User clicks "Create Connector" in /settings/connectors. Browser POSTs the form to Main Lambda's server action. Main Lambda:
- Validates JWT (audience = app.autri.ai)
- Calls lambda:Invoke on the Connector-Management Lambda with {action: "create", userId, libraryId, name}
- Connector-Mgmt Lambda calls Cognito CreateUserPoolClient (auth_code grant, app.autri.ai/api/auth/callback redirect URI), returns {clientId, clientSecret}
- Main Lambda inserts the connectors row + initiates auth_code exchange (user already authenticated; instant) → captures access_token
- Main Lambda renders the 3-paste-field UI: server URL + bearer token + client_id/client_secret (per D44 + D41)
Performance summary:
- Cold-start path: ~500-800ms before first byte (Main); ~600-1000ms (Chat)
- Warm path: ~30-80ms before first byte
- Static assets + cache renders after first edge-cache hit: ~5-20ms
- Streaming chat: continuous, no buffering anywhere in the path, capped at 60s wall-clock
Key Interfaces
| Interface | Type | Consumers |
|---|---|---|
| app.autri.ai HTTPS endpoint | External HTTP | end users (browsers), Cognito auth callbacks |
| app-w3.autri.ai HTTPS endpoint (temporary, parallel validation only) | External HTTP | smoke testers; tear down post-swap |
| CloudFront distribution ID (main) | CDK stack output | deploy script (for cache invalidation) |
| CloudFront distribution ID (parallel app-w3, temporary) | CDK stack output | deploy script during validation only |
| Static S3 bucket name | CDK stack output | deploy script (for static asset sync) |
| Main Lambda function name + current/prev alias names | CDK stack output | deploy script (alias-promote rollback per Q10) |
| Chat Lambda function name + aliases | CDK stack output | deploy script |
| Connector-Mgmt Lambda function ARN | CDK stack output | Main Lambda runtime env (CONNECTOR_MGMT_LAMBDA_ARN for lambda:Invoke) |
| Lambda execution role ARNs (Main, Chat, Connector-Mgmt) | CDK stack output | secrets + RDS access + cross-Lambda invoke (IAM policy attachments) |
| Lambda security group ID (shared by Main + Chat) | Internal | RDS security group (ingress rule) |
| Connector-Mgmt Lambda IAM scope | Internal contract | scoped to Cognito user pool ARN; verbs: cognito-idp:CreateUserPoolClient + UpdateUserPoolClient + DeleteUserPoolClient only |
| ACM cert ARN (main) | Stack input | CloudFront viewer cert config |
| ACM cert ARN (temporary app-w3) | Stack input (temporary) | parallel CloudFront viewer cert |
| DB Secrets Manager secret ARN | Stack input | Lambda IAM policy + Lambda runtime env (DB_SECRET_ARN) |
| ANTHROPIC_API_KEY Secrets Manager secret ARN | Stack input | Chat Lambda IAM policy + runtime env |
| Cognito user pool ID + client ID | Stack input | Lambda code (JWT validation, OAuth callback) |
| Cognito resource server IDs (app.autri.ai, mcp.autri.ai) | Stack input | each Lambda's audience-claim validation (per C3) |
| VPC + private subnet IDs | Stack input | Main + Chat + Connector-Mgmt Lambda VPC config |
| DB_SECRET_ARN env var | Lambda runtime env | db/client.ts async init |
| DATABASE_URL env var | Lambda runtime env (absent in prod) | db/client.ts fallback for local pnpm dev |
| AUTRI_APP_URL env var | Lambda runtime env | citation links via D38 unified agent surface |
| CONNECTOR_MGMT_LAMBDA_ARN env var | Main Lambda runtime env | server action invokes Connector-Mgmt Lambda |
| Cognito-related env vars | Lambda runtime env | session validation, OAuth callback URL, audience claim expectations |
| CloudWatch alarms (Main/Chat error rates, CloudFront 5xx/4xx, ENI count, RDS connections) | Internal contract | monitoring stack consumes alarm ARNs; alarms emit to AWS Budgets SNS topic for beta |
Build artifacts
Build produces two Lambda packaging artifacts + one static bundle, all from a single pnpm build in the app/ workspace (with output: 'standalone' re-enabled in next.config.mjs):
Build command sequence (per Q9):
```shell
pnpm install --frozen-lockfile
pnpm --filter @autri/app build
pnpm deploy --filter @autri/app --prod /tmp/standalone-build
```
Matches the MCP server pattern. pnpm deploy --prod resolves workspace deps (@autri/retrieval, etc.) into a real node_modules tree, then Next standalone packs only the imports actually traced from app code.
Artifacts:
Static bundle — app/.next/static/ (chunks, css, media) + app/public/. These are immutable per build; hashed filenames mean S3 can hold all versions forever without conflict.
Main Lambda artifact — server-side app/.next/standalone/ filtered to exclude Anthropic SDK + AI SDK imports (or accept the full bundle if measurement shows it fits zip cap). Holds drizzle, pg, Cognito SDK, Next runtime. Includes the ~30-line Function URL handler wrapper.
Chat Lambda artifact — separate package containing only /api/chat/route.ts + its deps (AI SDK + Anthropic SDK + retrieval-tool wiring). Built via a second build step that copies the relevant route + traces deps. Sized small (~5-10MB unzipped).
Packaging (per Q8): zip mode on first deploy; measure both Lambda artifacts; pivot to container image if Main Lambda exceeds 40MB unzipped. Container image escape hatch documented; cold-start cost (+~150ms) accepted as the trade for headroom.
Bundle size measurement is the first task of W3 implementation — drives the zip-vs-container call before CDK is finalized.
CDK provisioning vs deploy script split
Two axes that run on different cadences:
CDK (autri-infra repo) provisions the shape: Lambda function with VPC config + IAM role + env vars, S3 bucket with OAC + lifecycle policies, CloudFront distribution with behaviors, security group rules, all the IAM. Runs only when infra changes.
Deploy script (lives in autri repo) does the update: build, sync static to S3 with appropriate cache headers, zip server bundle, upload as new Lambda version, point alias at new version, invalidate CloudFront cache for HTML paths. Runs on every code change.
Same split as the MCP server pipeline already in place. Same conceptual cleanness: infra shape lives where infra lives; deploy artifacts live where code lives.
Build pipeline phasing
Per D42's pattern ("manual first, automate later"):
Phase 1 (first deploy, ~1 hour to set up): local pnpm deploy:web script in autri. Reads CDK stack outputs via aws cloudformation describe-stacks. Builds, syncs to S3, updates Lambda, invalidates CloudFront. Triggered manually.
Phase 2 (post-W3-verification, ~1 day to set up): GitHub Actions workflow on push to autri main. OIDC role assumed by GHA runner. Same deploy script, just triggered by CI. Audit trail via PR + workflow logs.
Manual-first is acceptable for the 1-dev cadence and matches D42's deployment pattern. Promotion to GHA triggered when frequency of deploys justifies CI cost.
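The Phase 1 script starts by reading stack outputs. A sketch of that step, assuming the JSON shape printed by aws cloudformation describe-stacks --output json; the output key names (StaticBucketName, DistributionId, MainFunctionName, ChatFunctionName) and readDeployTargets are hypothetical stand-ins for whatever the autri-infra stack actually exports.

```typescript
// Subset of `aws cloudformation describe-stacks --output json` that we read.
interface DescribeStacksOutput {
  Stacks?: Array<{
    StackName: string;
    Outputs?: Array<{ OutputKey?: string; OutputValue?: string }>;
  }>;
}

// Hypothetical output keys; the real names come from the autri-infra CDK stack.
const REQUIRED_OUTPUTS = ["StaticBucketName", "DistributionId", "MainFunctionName", "ChatFunctionName"] as const;
type OutputKey = (typeof REQUIRED_OUTPUTS)[number];
type DeployTargets = Record<OutputKey, string>;

// Flatten every stack's outputs into one lookup and fail fast if anything the
// deploy script needs is missing (e.g. the CDK stack hasn't been deployed yet).
function readDeployTargets(describeStacksJson: string): DeployTargets {
  const parsed = JSON.parse(describeStacksJson) as DescribeStacksOutput;
  const outputs: Record<string, string> = {};
  for (const stack of parsed.Stacks ?? []) {
    for (const o of stack.Outputs ?? []) {
      if (o.OutputKey !== undefined && o.OutputValue !== undefined) {
        outputs[o.OutputKey] = o.OutputValue;
      }
    }
  }
  const missing = REQUIRED_OUTPUTS.filter((key) => !(key in outputs));
  if (missing.length > 0) {
    throw new Error(`missing stack outputs: ${missing.join(", ")}`);
  }
  const targets = {} as DeployTargets;
  for (const key of REQUIRED_OUTPUTS) targets[key] = outputs[key];
  return targets;
}
```

Failing before the build starts keeps a half-provisioned stack from producing a half-finished deploy; the same function works unchanged when Phase 2 runs it under GHA.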
Cache invalidation strategy
CloudFront caches based on path. Static assets at /_next/static/* have hashed filenames — never need invalidation. New build → new hashes → new paths.
HTML responses from Lambda are NOT cached by CloudFront: the default behavior uses the AWS-managed CachingDisabled cache policy (Q12), whose min/default/max TTLs are all 0, and Lambda also returns Cache-Control: no-cache. No invalidation needed for HTML.
Only invalidation case: if /static/* (from app/public/) is updated. Those filenames are not hashed by Next. Deploy script issues a CloudFront CreateInvalidation for /static/* after the S3 sync. ~10 second propagation cost.
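The two asset classes translate into a small decision in the deploy script: which Cache-Control header each synced object gets, and whether a CreateInvalidation call is needed at all. A sketch, assuming S3 keys mirror the served paths; the max-age values, the CallerReference format, and the function names are hypothetical. The request object follows the CreateInvalidation shape (DistributionId plus InvalidationBatch with Paths and CallerReference).

```typescript
// Cache-Control for objects synced to the static bucket. Hashed Next.js build
// output never changes at a given path; app/public/ files keep stable names.
const IMMUTABLE = "public, max-age=31536000, immutable";
const SHORT_LIVED = "public, max-age=300"; // hypothetical TTL for unhashed files

function cacheControlFor(s3Key: string): string {
  return s3Key.startsWith("_next/static/") ? IMMUTABLE : SHORT_LIVED;
}

// Mirrors the CreateInvalidation request shape.
interface InvalidationRequest {
  DistributionId: string;
  InvalidationBatch: {
    Paths: { Quantity: number; Items: string[] };
    CallerReference: string;
  };
}

// Returns null when only hashed assets changed: new hashes mean new paths,
// so nothing cached at an old path can be stale.
function invalidationFor(distributionId: string, changedKeys: string[], buildId: string): InvalidationRequest | null {
  if (!changedKeys.some((key) => key.startsWith("static/"))) return null;
  const items = ["/static/*"];
  return {
    DistributionId: distributionId,
    InvalidationBatch: {
      Paths: { Quantity: items.length, Items: items },
      CallerReference: `deploy-${buildId}`, // must be unique per distinct invalidation
    },
  };
}
```

The returned object could be passed to the AWS SDK's CreateInvalidationCommand (or its fields to aws cloudfront create-invalidation); skipping the call when it returns null avoids the propagation wait on deploys that touch only hashed assets.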
Rollback Strategy
Per Q10, alias-promote pattern.
Deploy flow:
- Build artifacts; static bundle synced to S3 (immutable, no rollback risk)
- New Lambda version uploaded for Main + Chat
- Capture the version the current alias points at → store it as the prev alias (overwrites the old prev)
- Update the current alias to the new version
- CloudFront cache invalidation for /static/* (if needed)
Rollback flow (single command):
```shell
# FN = Main or Chat Lambda function name (CDK stack output); run for both.
aws lambda update-alias \
  --function-name "$FN" \
  --name current \
  --function-version "$(aws lambda get-alias --function-name "$FN" --name prev --query 'FunctionVersion' --output text)"
```
Static assets in S3 are immutable and additive — rollback only touches Lambda. New code referencing newly-uploaded static paths will fail gracefully (404 on the static asset), but the old code at the rolled-back version references the older static paths which still exist in S3.
Edge cases (Failure Mode section also covers):
- First deploy: no
prevalias exists. Deploy script must check and refuse rollback whenprevis unset - Manual aws CLI intervention between deploys:
currentandprevget out of sync. Deploy script logs both alias versions pre + post; rollback script can read logs for recovery - Cross-Lambda rollback (Main + Chat): roll back together (
pnpm rollback:web) to avoid version drift between SSR and chat surfaces
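The alias bookkeeping can be modeled as pure state transitions, which makes the first-deploy and Main + Chat edge cases easy to unit-test before wiring them to aws lambda update-alias. A sketch; the Aliases type and all function names are hypothetical.

```typescript
// Which Lambda version each alias should point at. Versions are the numeric
// strings Lambda assigns on publish (e.g. "42"); null = alias not created yet.
interface Aliases {
  current: string | null;
  prev: string | null;
}

// Deploy: prev takes whatever current pointed at, current moves to the new version.
function afterDeploy(aliases: Aliases, newVersion: string): Aliases {
  return { current: newVersion, prev: aliases.current };
}

// Rollback: current moves back to prev. prev is left alone, so repeating the
// rollback is a no-op rather than stepping further back in history.
function afterRollback(name: string, aliases: Aliases): Aliases {
  if (aliases.prev === null) {
    throw new Error(`refusing rollback: ${name} has no prev alias (first deploy?)`);
  }
  return { current: aliases.prev, prev: aliases.prev };
}

// Main + Chat roll back together. Validate every function before changing any,
// so a missing prev on one never leaves the pair half rolled back.
function rollbackAll(fns: Record<string, Aliases>): Record<string, Aliases> {
  const names = Object.keys(fns);
  for (const name of names) {
    if (fns[name].prev === null) throw new Error(`refusing rollback: ${name} has no prev alias`);
  }
  const next: Record<string, Aliases> = {};
  for (const name of names) next[name] = afterRollback(name, fns[name]);
  return next;
}
```

The script would then apply each resulting state with update-alias (or create-alias when an alias does not exist yet), logging both versions before and after as the edge cases above require.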
What rollback does NOT handle:
- Schema migrations applied during the deploy (D42's CDK custom-resource Lambda is the path for those, and migrations should always be backwards-compatible per D42)
- CDK infra changes: those are a separate cdk deploy operation, with their own rollback story
- DNS: the app CNAME is unchanged through deploys
Idle (zero traffic)
- CloudFront: ~$0 (only per-request + per-GB-out)
- S3: ~$0.01/mo (10MB stored, intra-AWS egress free)
- Lambda: $0 (no invocations)
- Lambda idle ENI: $0 (no charge for unused ENIs)
- ACM cert: $0
- Existing floor (RDS + NAT + Secrets Manager + Cognito): ~$90/mo (already there, unchanged)
W3 add-on idle cost: essentially $0/mo.
Compare to alternatives:
- Fargate web layer (Stack A): +$50/mo always-on
- App Runner: +$25/mo always-on
- SST/OpenNext (W3 via framework): same $0 idle as DIY
Beta load
5 users × 100 requests/day × 30 days = 15,000 requests/mo. Mostly SSR + a few /api/chat streaming sessions.
- Lambda: 15k × ~200ms × 1024MB memory = ~3,000 GB-seconds = $0.05
- CloudFront: 15k requests + ~5 GB egress = $0.50
- S3: $0.01
Total W3 add-on at beta load: ~$0.50-1/mo. Effectively zero above the existing floor.
Hypothetical 1k MAU
50k requests/day × 30 = 1.5M requests/mo, mixed SSR + static.
- Lambda: ~300,000 GB-seconds = $5
- CloudFront: ~$30 (mostly egress)
- S3: $0.01
Total: ~$35/mo above existing floor. Cost only matters at scale (>1M req/mo), and even then it's lean. This is the entire point of W3.
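The Lambda line items above follow GB-seconds = requests × duration × memory (GB). A sketch of that arithmetic, assuming on-demand x86 list prices in us-east-1 ($0.0000166667 per GB-second, $0.20 per million requests) and ignoring the free tier; lambdaMonthlyCost is a hypothetical helper, and current pricing should be checked before relying on the numbers.

```typescript
// Assumed on-demand x86 list prices (us-east-1); free tier ignored.
const USD_PER_GB_SECOND = 0.0000166667;
const USD_PER_MILLION_REQUESTS = 0.2;

interface LambdaMonthlyCost {
  gbSeconds: number;
  computeUsd: number;
  requestUsd: number;
}

function lambdaMonthlyCost(requests: number, avgDurationMs: number, memoryMb: number): LambdaMonthlyCost {
  // GB-seconds = invocations × billed seconds per invocation × configured memory in GB
  const gbSeconds = requests * (avgDurationMs / 1000) * (memoryMb / 1024);
  return {
    gbSeconds,
    computeUsd: gbSeconds * USD_PER_GB_SECOND,
    requestUsd: (requests / 1_000_000) * USD_PER_MILLION_REQUESTS,
  };
}

// Beta load: 15k requests/mo at ~200ms and 1024MB → ~3,000 GB-s, ~$0.05 compute.
const beta = lambdaMonthlyCost(15_000, 200, 1024);
// 1k MAU: 1.5M requests/mo with the same shape → ~300,000 GB-s, ~$5 compute.
const oneKMau = lambdaMonthlyCost(1_500_000, 200, 1024);
```

Request charges add about $0.003 at beta load and $0.30 at 1k MAU, small next to compute and CloudFront egress.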
Failure Modes
Operational characteristics worth knowing. Where W3 fails subtly:
Cold-start latency spikes. First request after Lambda eviction: 500-800ms. During quiet periods, the first request after each idle gap pays this. Mitigations: provisioned concurrency ($), keep-warm pings (hacky), or accept for beta (recommended). At scale, a small provisioned-concurrency floor scheduled for peak hours is the standard pattern.
Cold-start hit on every deploy. Every Lambda version update invalidates all warm containers. Dev cadence implication: each deploy → next ~5-10 requests pay cold-start. Beta-user-visible only if a deploy lands during active session. Mitigations same as the general cold-start case; v1.1 candidate for provisioned concurrency target=1-2.
/api/chat truncation at 60s. CloudFront's origin response timeout defaults to 30s and is configurable up to 60s (longer values require a quota increase). Streaming chat responses that exceed 60s of wall-clock time are assumed to be silently truncated mid-stream. AWS documents this timeout as applying both to the wait for the first response bytes and to the gap between successive packets, so a stream that keeps emitting tokens may outlive it; measure the real cutoff during implementation. For beta, most chat turns are <60s; long ones are the exception. v1.1 fix: resumable streams via AI SDK's useChat reconnect + server-side buffer-and-replay (option (a), ~1-2 days lift). For Lambda invocations exceeding the cutoff, the underlying Anthropic call may still complete on the server side; the failure is purely client-visible truncation.
RDS connection exhaustion. Each warm Lambda execution environment holds its own connections. A db.t4g.small RDS instance caps at ~100 connections. With pool max=2 per Lambda, ~50 concurrent warm execution environments across Main + Chat exhaust it. At beta scale, not a concern. At v1.1: RDS Proxy or capped Lambda concurrency.
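The exhaustion point is simple division, and the same number is a natural ceiling for reserved concurrency if v1.1 takes the capped-concurrency route. A sketch; maxWarmLambdas and the reserved-connections allowance are hypothetical.

```typescript
// How many warm Lambda execution environments (Main + Chat combined) fit in the
// RDS connection limit, assuming each holds at most `poolMax` open connections
// and `reserved` connections are kept back for migrations/admin sessions.
function maxWarmLambdas(maxConnections: number, poolMax: number, reserved = 0): number {
  if (poolMax <= 0) throw new Error("poolMax must be positive");
  return Math.floor(Math.max(0, maxConnections - reserved) / poolMax);
}
```

With ~100 connections, pool max=2, and nothing reserved this gives 50, matching the estimate above; holding back 10 connections lowers it to 45. Applying the same division to the 70% RDS-connections alarm (70 of ~100) puts the early warning at around 35 warm environments.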
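ENI exhaustion in the VPC. VPC-attached Lambdas use shared Hyperplane ENIs, created per unique subnet + security-group combination rather than per warm execution environment, so ENI count tracks configuration more than concurrency. Each ENI still consumes an IP in its subnet, and subnet IPs run out before account quotas (default: 5,000 network interfaces per Region; a /24 subnet = 251 usable IPs). Subnet sizing matters. CloudWatch alarm on ENI count > 50% subnet capacity provides early warning.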
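Subnet capacity is the number the ENI alarm is measured against. AWS reserves five addresses in every subnet (the first four and the last), which is where 251 usable IPs for a /24 comes from. A sketch of the threshold calculation; the function names and the choice to alarm on the sum across all Lambda subnets are assumptions.

```typescript
// AWS reserves 5 addresses in every subnet (first four + last), so usable
// IPv4 addresses = 2^(32 - prefix) - 5. Subnets must be between /16 and /28.
function usableSubnetIps(prefixLength: number): number {
  if (!Number.isInteger(prefixLength) || prefixLength < 16 || prefixLength > 28) {
    throw new Error(`invalid AWS subnet prefix: /${prefixLength}`);
  }
  return 2 ** (32 - prefixLength) - 5;
}

// Alarm threshold at `fraction` of the combined capacity of the private
// subnets the Lambdas attach to (assumption: alarm on the sum, not per subnet).
function eniAlarmThreshold(prefixLengths: number[], fraction = 0.5): number {
  const capacity = prefixLengths.reduce((sum, prefix) => sum + usableSubnetIps(prefix), 0);
  return Math.floor(capacity * fraction);
}
```

Two /24 private subnets give 502 usable addresses and a threshold of 251; other tenants of the same subnets (e.g. RDS) consume from that pool too.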
The CloudFront cache-key trap. If we forward cookies as cache key, every authenticated request becomes uncacheable (intended). If we don't forward cookies, we might cache an authenticated page and serve to another user — data leak. Pinned behavior: AWS-managed AllViewerExceptHostHeader Origin Request Policy + CachingDisabled Cache Policy for the Lambda origins; CORS-S3Origin + CachingOptimized for static origins. Verified in CDK code review.
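A routing table plus one invariant ("Lambda origins are never cached") captures Q3 and Q12 in a form a CDK unit test could assert against. A sketch, not CDK code: the behavior list mirrors Q3, while the /favicon.ico pattern, the S3 policies on the /api/cache/* behavior, and all names are assumptions. CloudFront's path-pattern wildcards (* for any run of characters, ? for exactly one) are emulated with a regex.

```typescript
type Origin = "static-s3" | "cache-s3" | "chat-lambda" | "main-lambda";

interface Behavior {
  pathPattern: string;
  origin: Origin;
  cachePolicy: "CachingOptimized" | "CachingDisabled";
  originRequestPolicy: "CORS-S3Origin" | "AllViewerExceptHostHeader";
}

const s3Behavior = (pathPattern: string, origin: Origin): Behavior => ({
  pathPattern, origin, cachePolicy: "CachingOptimized", originRequestPolicy: "CORS-S3Origin",
});
const lambdaBehavior = (pathPattern: string, origin: Origin): Behavior => ({
  pathPattern, origin, cachePolicy: "CachingDisabled", originRequestPolicy: "AllViewerExceptHostHeader",
});

// Listed in precedence order; CloudFront uses the first matching pattern.
const BEHAVIORS: Behavior[] = [
  s3Behavior("/_next/static/*", "static-s3"),
  s3Behavior("/static/*", "static-s3"),
  s3Behavior("/favicon.ico", "static-s3"),
  s3Behavior("/api/cache/*", "cache-s3"),
  lambdaBehavior("/api/chat", "chat-lambda"),
];
const DEFAULT_BEHAVIOR = lambdaBehavior("*", "main-lambda");

// Escape regex metacharacters, then map `*` → `.*` and `?` → `.`.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*").replace(/\?/g, ".")}$`);
}

function matchBehavior(path: string): Behavior {
  return BEHAVIORS.find((b) => patternToRegExp(b.pathPattern).test(path)) ?? DEFAULT_BEHAVIOR;
}

// The cache-key-trap invariant: a behavior that targets a Lambda origin must not cache.
function assertLambdaRoutesUncached(behaviors: Behavior[]): void {
  for (const b of behaviors) {
    if (b.origin.endsWith("-lambda") && b.cachePolicy !== "CachingDisabled") {
      throw new Error(`Lambda behavior ${b.pathPattern} must use CachingDisabled`);
    }
  }
}
```

Because CachingDisabled excludes cookies from the cache key and never stores responses, forwarding session cookies via AllViewerExceptHostHeader cannot leak one user's page to another; the invariant check keeps a future behavior from quietly breaking that.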
Cloudflare proxy mode flip. Currently DNS-only (gray cloud). If anyone flips to proxy mode (orange cloud), Cloudflare becomes another CDN layer. Cloudflare proxy buffers responses — kills /api/chat streaming. Keep DNS-only and document the constraint.
Streaming-killing intermediaries. APIGW HTTP API buffers (would kill streaming). Cloudflare proxy mode buffers (would kill streaming). Function URL + CloudFront, with Cloudflare left in DNS-only mode, preserves streaming end-to-end. The whole topology shape exists to keep the streaming path clean.
WebSocket absence. Function URL doesn't support WebSockets. Bell-icon notifications stay on polling for beta. If real-time UX (live notifications, multi-user collaboration, etc.) becomes a v1.1 requirement, add an APIGW WebSocket API as a separate origin behind CloudFront.
Lambda payload size cap (6 MB on Function URL). SSR HTML for our pages is <100KB; not a concern in normal operation. Could become one if a future feature dumps a large JSON response — flag in code review.
db/client.ts async init incompatibility with drizzle's singleton. Drizzle's pool singleton pattern assumes synchronous pool creation. Top-level await in the module breaks the type contract subtly: the exported db becomes Promise<NodePgDatabase> instead of NodePgDatabase. Two paths: (a) wrap call sites with await getDb(), touching dozens of import sites; (b) export a sync proxy that lazy-awaits on first call. Decision: validate option (a) early in implementation, fall back to the proxy pattern (b) if call-site changes are unmanageable. Local dev preserved via env-var branching: DB_SECRET_ARN present → fetch from Secrets Manager; absent → fall back to DATABASE_URL env var.
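The DB_SECRET_ARN / DATABASE_URL branch and a memoized getDb() for option (a) can be sketched without AWS or Postgres by injecting the two side effects. Assumptions: fetchSecret stands in for a Secrets Manager GetSecretValue call, createDb for drizzle(new Pool(...)), and the secret JSON carries username/password/host/port/dbname fields as RDS-managed secrets typically do; makeGetDb and resolveConnectionString are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;
type FetchSecret = (secretArn: string) => Promise<string>;

// Assumed secret JSON shape (fields RDS-managed secrets typically carry).
interface DbSecret {
  username: string;
  password: string;
  host: string;
  port: number;
  dbname: string;
}

// Lambda runtime: DB_SECRET_ARN is set, so build the URL from Secrets Manager.
// Local `pnpm dev`: fall back to DATABASE_URL from .env.local.
async function resolveConnectionString(env: Env, fetchSecret: FetchSecret): Promise<string> {
  const secretArn = env.DB_SECRET_ARN;
  if (secretArn) {
    const s = JSON.parse(await fetchSecret(secretArn)) as DbSecret;
    const user = encodeURIComponent(s.username);
    const pass = encodeURIComponent(s.password);
    return `postgres://${user}:${pass}@${s.host}:${s.port}/${s.dbname}`;
  }
  if (env.DATABASE_URL) return env.DATABASE_URL;
  throw new Error("neither DB_SECRET_ARN nor DATABASE_URL is set");
}

// Memoized async getter: call sites do `const db = await getDb()`.
function makeGetDb<Db>(env: Env, fetchSecret: FetchSecret, createDb: (url: string) => Db): () => Promise<Db> {
  let pending: Promise<Db> | undefined;
  return () => {
    if (!pending) {
      // Cache the promise, not the client, so concurrent first calls share one init;
      // drop it on failure so the next request retries instead of failing forever.
      pending = resolveConnectionString(env, fetchSecret)
        .then(createDb)
        .catch((err: unknown) => {
          pending = undefined;
          throw err;
        });
    }
    return pending;
  };
}
```

Caching the promise means concurrent first requests in a freshly started execution environment share one Secrets Manager call, and clearing it on failure keeps a transient error from pinning a rejected init for the life of the container.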
DNS swap is a one-way commitment for ~minutes. Once app CNAME points at the new CloudFront, rolling back means a propagation cycle (~5-15 min via Cloudflare). Mitigation: validate W3 end-to-end on parallel app-w3.autri.ai first; swap app CNAME only after smoke test passes. The parallel domain needs its own temporary ACM cert + a temporary Cognito callback URL entry (both removed post-swap).
Amplify teardown timing. Keep the existing Amplify app alive through W3 implementation (zero idle cost since Amplify scales to zero). Tear down CfnApp + CfnBranch + CfnDomain from CDK once W3 is verified end-to-end, not before. Don't delete the Amplify ACM cert validation CNAME prematurely either — the cert is shared.
Deploy rollback failure modes. Alias-promote pattern: if the prev alias is missing (first deploy), rollback isn't possible, so the rollback script must check and refuse to roll back when no prev alias exists. If current and prev get out of sync (manual aws CLI intervention), rollback could regress further than intended. Mitigation: deploy script logs both alias versions before + after; rollback script reads logs as recovery aid.
Why W3 Over Alternatives
Landscape positioning for red-team's "should we have chosen X instead?" angle:
| Approach | Idle cost | Setup complexity | Real-world usage | Why we didn't pick |
|---|---|---|---|---|
| Vercel | free → $$ at scale | trivial (push to deploy) | most Next.js teams | Hides infra; doesn't VPC-into our private RDS; lock-in to Vercel's pricing curve |
| Heroku / Render / Railway | $7-25/mo minimum | trivial | indie devs, prototypes | Always-on minimums; VPC story is complex; we already have an AWS commitment |
| ECS Fargate | ~$50/mo always-on | medium | teams that want long-running containers | $50/mo always-on with no upside vs W3 for our shape; Stack A baseline |
| App Runner | ~$25/mo always-on | low | AWS-curious devs who want containers | Managed Fargate; still no scale-to-zero; less control |
| Amplify Hosting | ~$0 idle | low | teams wanting AWS push-to-deploy | The original Stack B choice. Can't VPC. (D43 root cause.) |
| SST / OpenNext (W3 via framework) | ~$0 | medium (3-5h scaffold) | AWS-natives wanting serverless Next | Same topology, framework abstractions to debug through; we chose ownership |
| W3 DIY (us) | ~$0 | higher one-time (~10-15h) | AWS power users, cost-sensitive teams | Locked per D43 |
| EKS / K8s | varies | very high | enterprise w/ infra teams | Massive overkill for our scale; ops burden does not pay back |
Strategic fit for autri's business shape: beta SaaS + unknown traffic + capital-disciplined runway → scale-to-zero is exactly the right cost shape. The $0 idle floor means we can let it sit dormant for weeks between users without burning budget, then absorb a bursty spike without provisioning anything. That's a much better match than Fargate's $50/mo always-on (assumes constant traffic we don't have yet).
Red-team angle: SST/OpenNext. The most credible alternative challenge is "we chose DIY for ownership, but framework abstractions would've saved 10 hours and given us the same runtime shape." Counter: framework versions of W3 are themselves opinionated about Lambda packaging, edge functions, and ISR. If we hit framework limits in implementation, we'd be debugging through the framework AND the AWS layer. DIY keeps the AWS layer as our only abstraction boundary. Worth re-evaluating at v1.1 if the maintenance burden of hand-rolled CDK + deploy script grows.
Related Epics
| Epic | Doc | Status | Summary |
|---|---|---|---|
| EPIC-4: AWS Production Deploy | link | In Progress (Day 11+) | Originally implemented Stack B with Amplify; pivoted to W3 per D43. The W3 implementation work is folded into EPIC-4's amended scope. |
| EPIC-5: Beta Launch + Cost-Data Deliverable | link | Planned | Depends on W3 being live so beta users can reach the web app. |
Cross-Cutting Concerns
| Concern | How This Sub-system Is Affected |
|---|---|
| Authentication (Cognito, D34) | Main Lambda validates Cognito JWTs from session cookies on every request. JWT aud claim must equal app.autri.ai resource server identifier (cross-domain SSO contract — see Decisions Log). OAuth callback handler at /api/auth/callback runs in Main Lambda. Per-connector OAuth flow (D44) initiated from Main Lambda, executed by Connector-Management Lambda via lambda:Invoke. |
| MCP wedge (D19, D38) | Inspector links from MCP citations resolve to app.autri.ai/docs/[id]#chunk-<chunkId>. Main Lambda server-renders the inspector page; the #chunk-<chunkId> fragment never reaches the server, so client-side code on that page scrolls to + highlights the cited chunk. Inspector-as-citation-surface depends on W3 being live. |
| Library/connector model (D34, D35) | Connector creation UI (per D44, gated on W3 being live) is a server action in Main Lambda. Action invokes Connector-Management Lambda (which creates the per-connector Cognito app client + initiates auth_code exchange); Main Lambda then inserts connectors row + displays 3-paste-field UI. Connector secret rotation ("Rotate client_secret" button per Decisions Log) follows the same invocation path. |
| Multi-tenancy isolation (D13) | Server-side scope enforcement in Server Components and /api/* handlers — every DB query scoped by JWT's user_id + library access checks. RLS at DB layer pending. Main Lambda is the chokepoint for enforcement at the request layer. |
| Streaming chat (D12 split + AI SDK) | /api/chat runs in Chat Lambda (split from Main per Q1'). Uses AI SDK's streamText + Anthropic provider. CloudFront origin response timeout pinned to 60s max for this behavior; resumable streams deferred to v1.1. |
| /api/cache/[...path] page renders | Cache PNGs live in S3 cache bucket (provisioned in NetworkAndData for page renders). CloudFront behavior /api/cache/* routes directly to S3, with no Lambda invocation. Sub-10ms latency from edge cache after first request per edge. Cache S3 bucket needs Origin Access Control like the static bucket. |
| User uploads → ingestion handoff | Main Lambda accepts uploads, writes to existing S3 uploads bucket (in NetworkAndData). Ingestion sub-system reads from there (S3 event trigger or direct Fargate task launch — detailed in ingestion sub-system doc, not in W3 scope). |
| DB migrations (D42) | Unchanged. Migrations still run via CDK custom-resource Lambda in NetworkAndData. Main + Chat Lambdas are consumers of the schema, not managers. |
| Local dev compatibility (pnpm dev) | db/client.ts env-var branches: DB_SECRET_ARN present (Lambda runtime) → fetch from Secrets Manager; absent (local dev) → use DATABASE_URL from .env.local. Same code path; no LocalStack required. |
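| CSRF protection | Server actions use Next 14's built-in CSRF defense (POST-only invocation + Origin vs Host / X-Forwarded-Host check) + SameSite=Lax session cookies. Because the AllViewerExceptHostHeader policy (Q12) sends the Function URL hostname as Host, the Origin check needs experimental.serverActions.allowedOrigins set to app.autri.ai (or an X-Forwarded-Host header added at the edge). No explicit double-submit cookie or CSRF middleware needed for beta. Privileged actions (connector creation, secret rotation) inherit the same defense. Re-audit in pre-paid-customer security pass. |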
| Observability — Lambda + CloudFront metrics | Per-Lambda CloudWatch Log Groups (30-day retention per monitoring stack). Per-route structured JSON logs (request ID, user_id when known, route, duration, status). Cost Explorer tag: cost-bucket=web. |
| Observability — CloudWatch alarms | Pinned set: (1) Main Lambda error rate > 5% over 5 min, (2) Chat Lambda error rate > 5% over 5 min, (3) Lambda concurrent executions > 50% of account quota, (4) CloudFront 5xx rate > 1% over 5 min, (5) CloudFront 4xx rate > 5% (catches broken static asset references), (6) ENI count > 50% subnet capacity, (7) RDS active-connections > 70% of max_connections. Alarms emit to existing AWS Budgets SNS topic for now; consider per-severity SNS topics post-beta. |
| Cost & budget alarms | Lambda + CloudFront + S3 are line items in the existing AWS Budgets ($50/$100/$200 thresholds). $0 idle cost means W3 doesn't move the needle until traffic shows up — but a runaway error loop (Lambda failing → retrying → invoking) could spike costs; concurrency cap is the defense. |
Decisions Log
| Date | Decision | Rationale | Alternatives Considered |
|---|---|---|---|
| 2026-05-26 | D43 — W3 DIY for web layer; supersedes Amplify | Amplify SSR cannot VPC; private RDS unreachable from Amplify Lambda; W3 gives us VPC control | Stack A all-Fargate ($50/mo idle); App Runner ($25/mo idle); SST/OpenNext W3 (~$0 idle, framework dep) |
| 2026-05-26 | Q1' — Split /api/chat into a separate Chat Lambda from day one (revises original Q1 single-Lambda) | Chat Lambda carries AI SDK + Anthropic SDK weight; Main Lambda stays lean for SSR. Cleaner cold-start profiles per route class. CloudFront routes by path | Single Lambda for everything: monolithic bundle, every cold start parses chat deps; defer-and-split-later: locks in the wrong shape early |
| 2026-05-26 | Q2 — Lambda Function URL only, no API Gateway | APIGW HTTP API buffers (kills /api/chat streaming); Function URL supports response streaming (GA all regions Apr 2026); cheaper; simpler. Reconsidered post-split — APIGW for Main doesn't pay back either | APIGW HTTP API + separate Function URL for /api/chat: two control planes; APIGW REST: more overhead |
| 2026-05-26 | Q3 — CloudFront multi-behavior routing: /_next/static/* + /static/* + favicon → static S3; /api/cache/* → cache S3; /api/chat → Chat Lambda; default → Main Lambda | Standard multi-behavior pattern; ~5 behaviors total; no Lambda@Edge needed; /api/cache bypasses Lambda entirely | CloudFront Functions / Lambda@Edge for routing: overkill |
| 2026-05-26 | Q4 — db/client.ts async init via top-level await on Lambda cold start; env-var branching preserves local dev | Next 14 supports top-level await in server modules; ~50-200ms cold-start hit acceptable; DB_SECRET_ARN present → fetch secret, absent → fallback to DATABASE_URL | Lazy getPool(): Promise<Pool>: touches all import sites; LocalStack for dev: heavier setup; build-time env injection: defeats secret rotation |
| 2026-05-26 | Q5 — Manual pnpm deploy:web script first; promote to GitHub Actions post-verification | D42 pattern (manual-first, automate-later); GHA needs OIDC + role setup (+1 day); manual gets W3 working end-to-end faster | GHA from day one: +1 day setup; auto-deploy on every commit might not be wanted yet |
| 2026-05-26 | Q6 — Build flow lives in autri repo; CDK shape in autri-infra | Same split as MCP pipeline; clean repo responsibility (code where code is; infra where infra is); cross-repo via stack outputs | All-in-autri-infra: infra owning app build logic feels wrong |
| 2026-05-26 | Q7 — Re-enable output: 'standalone' in next.config.mjs | Standalone is the right shape for Lambda packaging (self-contained server.js + minimal node_modules); Amplify reason for dropping doesn't apply | Keep non-standalone: would force packaging full node_modules tree, blowing past Lambda's 50MB cap |
| 2026-05-26 | Q8 — Lambda packaging: container image (locked post-measurement) | Bundle measurement on the actual autri codebase: ~38MB unzipped excluding /api/cache artifacts (which evaporate when route is removed per Q11). But hoisted pnpm causes Next's outputFileTracing to miss workspace-hoisted deps (AI SDK + Anthropic SDK + drizzle-orm) — pnpm deploy --prod is required to resolve, which produces container-image-friendly output. Zip-cap headroom is thin; container image's 10GB ceiling makes bundle size a non-design-constraint. ~150ms extra cold start is acceptable given the existing ~500-800ms floor | Zip + measure: bundle is on edge, fights add up over time; aggressive externals (Lambda Layers): fragile, breaks on each new dep |
| 2026-05-26 | Q9 — Build command sequence: pnpm install --frozen-lockfile && pnpm --filter @autri/app build && pnpm deploy --filter @autri/app --prod /tmp/standalone-build | Matches MCP server pattern. pnpm deploy --prod produces self-contained dir with hoisted deps + Next standalone output. Container image COPYs this | Standard Next standalone without pnpm deploy: misses workspace deps (@autri/retrieval); turborepo/nx: heavier than warranted |
| 2026-05-26 | Q10 — Deployment rollback via alias-promote pattern: current + prev Lambda aliases | Deploy: new version uploaded → prev re-points to old current → current re-points to new. Rollback: swap aliases. Static assets immutable in S3, untouched by rollback | Manual rollback via aws lambda update-alias: easy to forget under pressure; deploy-script-records-version: same as alias-promote but without named handles |
| 2026-05-26 | Q11 — /api/cache/[...path] routes directly from CloudFront to S3 cache bucket (no Lambda invocation) | Sub-10ms edge cache after first request; no Lambda cost per cache hit; cache key matches URL path layout. Also evaporates 197MB of local-cache artifacts from Next's output trace | Lambda reads S3 on demand: +50-200ms latency + Lambda cost per cache request; build-time sync: doesn't work for runtime-generated cache from ingestion |
| 2026-05-26 | Q12 — CloudFront Origin Request + Cache Policies pinned to AWS-managed: default behavior = AllViewerExceptHostHeader + CachingDisabled; static behaviors = CORS-S3Origin + CachingOptimized | Industry-standard combo for this exact topology. Cookie/auth headers forwarded to Lambda; Lambda responses never cached; static assets cache by path. Prevents cache-key trap | Custom policies with explicit allowlist: tighter control, more CDK, risk of missing a needed header; unspecified-let-CDK-be-truth: future readers must reverse-engineer |
| 2026-05-26 | Q13 — Minimal-but-real security headers via CloudFront Response Headers Policy | HSTS (max-age=63072000, includeSubDomains, preload), X-Frame-Options=DENY, X-Content-Type-Options=nosniff, Referrer-Policy=strict-origin-when-cross-origin, minimal CSP (script-src 'self' 'unsafe-inline'; others 'self'/'data:'). Tighten CSP in v1.1 once Next inline-script needs mapped | AWS managed SecurityHeadersPolicy: no CSP; defer entirely: bad audit posture |
| 2026-05-26 | Q14 — CSRF via Next 14 server actions' built-in defense + SameSite=Lax cookies | Server actions are POST-only and Next compares the Origin header against Host (or X-Forwarded-Host), aborting on mismatch; SameSite=Lax prevents most CSRF vectors. Privileged actions (connector creation) inherit the same defense | Explicit double-submit cookie: extra code, no clear threat-model justification yet; defer entirely: risks audit gap |
| 2026-05-26 | Q15 — Separate temporary ACM cert for app-w3.autri.ai parallel-validation domain | Cut by blue-team — see annotation on this section. Parallel domain validation removed from scope; direct cutover with fix-forward. | (kept in log as institutional context) |
| 2026-05-26 | Q16 — Connector client_secret recovery via "Rotate secret" button per connector | Deferred to v1.1 by blue-team — see annotation. Beta uses delete + recreate as recovery path. | (kept in log as institutional context) |
| 2026-05-26 | C1 (D43.C1) — During app-w3.autri.ai parallel validation, add https://app-w3.autri.ai/api/auth/callback to Cognito allowed callback URLs; remove on app CNAME swap | Cut by blue-team alongside Q15. | (kept in log as institutional context) |
| 2026-05-26 | C2 (D43.C2) — Connector-Management Lambda separated from SSR; IAM scoped to user pool ARN with verbs cognito-idp:CreateUserPoolClient + UpdateUserPoolClient + DeleteUserPoolClient | SSR Lambda's blast radius stays minimal — an RCE there can't backdoor Cognito clients. Connector-Mgmt Lambda is small, infrequent, tightly scoped | SSR Lambda holds Cognito Admin IAM: wider blast radius; defer to D44 implementation: risks defaulting to the easier-but-less-secure path |
| 2026-05-26 | C3 (D43.C3) — Two Cognito resource servers: app.autri.ai (web session audience) and mcp.autri.ai (MCP audience). JWT aud claim disambiguates; each Lambda validates audience explicitly | Closes the carryforward cross-domain SSO gap from prior sessions. Audit / scope separation between web sessions and MCP usage. Per-connector clients (D44) issue tokens with mcp.autri.ai audience | Single resource server with both subdomains in allowed audiences: loses audit separation; defer to first deploy: gap discovered in production |
Known Issues / Tech Debt
| Issue | Severity | Notes |
|---|---|---|
| Cold-start UX during low-traffic beta hours | Medium | First request every ~10-15 min idle is 500-800ms. Acceptable for beta; user-feedback signal will dictate provisioned-concurrency need |
| Cold-start hit on every deploy | Low-Medium | Dev workflow tax. Visible to beta users only if deploy lands during active session. Same mitigations as general cold-start |
| No measured bundle size for the standalone server | High (pre-implementation) | First task in implementation: build + measure. If >40MB unzipped, pivot to container image per Q8 |
| db/client.ts async init may force broader refactor | Medium | If drizzle's singleton typing fights the top-level-await pattern, fallback is a lazy proxy. Touches ~30+ import sites if it cascades |
| No build pipeline yet (manual deploys only at Phase 1) | Low | Per D42 pattern. Promotion to GHA is planned but not gating beta |
| Per-connector Cognito client proliferation (D44 implication) | Low | Each connector creates a Cognito app client. Default limit: 1000 clients/pool. Not a beta concern; flagged for v1.1 |
| Token-refresh UX in Claude Desktop (downstream of D44) | Medium | When a bearer token expires, does Claude Desktop transparently refresh via client_id/secret + refresh_token? Unknown empirically. Validate during first beta user setup |
| hoisted pnpm linker behavior on ingestion/ and retrieval/ workspaces | Low | Switched at workspace root last session for Amplify. Local dev for those packages should be re-verified at start of W3 implementation |
| app-w3.autri.ai parallel-validation domain not yet provisioned | Medium | Needs temporary ACM cert + DNS record + Cognito callback URL entry. All three tear down after app CNAME swap completes; script the teardown to avoid forgetting |
| Amplify Hosting resources cost ~$0 idle but consume CDK surface | Low | Plan: keep alive through W3 verification, then delete CfnApp + CfnBranch + CfnDomain from auth-and-compute stack |
| Connector client_secret rotation flow needs UX work (Q16) | Medium | "Rotate secret" button server action + modal showing new secret once. Implement during D44 connector-creation flow build; test old-secret invalidation behavior empirically (~1hr token still valid post-rotate) |
| Resumable streams for /api/chat (H4 v1.1 work) | Medium | 60s CloudFront cap truncates long chat responses. Option (a) buffer-and-replay is ~1-2 days, simplest. Ship if beta usage shows truncation; otherwise defer |
| Tighter CSP in v1.1 (Q13) | Low | Beta ships with permissive script-src 'self' 'unsafe-inline'. Tighten once Next inline-script needs are mapped (likely needs nonce-based CSP) |
| WebSocket origin via APIGW (L1 v1.1) | Low | If real-time UX (live notifications, multi-user collaboration) becomes a v1.1 requirement, add APIGW WebSocket API as a separate origin behind CloudFront |
| ENI count CloudWatch alarm (H1) | Low | Alarm on ENI count > 50% subnet capacity. Implementation: CloudWatch custom metric from VPC describe-network-interfaces (Lambda on schedule), or VPC Insights if available |
| Connector-Management Lambda CDK scaffolding (C2) | Medium | New construct in auth-and-compute stack. Smallest viable Lambda; IAM execution role with narrowly-scoped Cognito Admin policy. Test path: invoke from SSR Lambda via lambda:Invoke, verify scope enforcement |
| /api/cache/* → S3 cache bucket OAC (Q11) | Medium | New CloudFront behavior routes directly to existing cache S3 bucket. Cache bucket needs OAC like static bucket. Verify cache file path layout matches URL [...path] semantics |
| CloudWatch alarm topic selection (alarms cross-cutting) | Low | Beta uses existing AWS Budgets SNS topic for all alarms. Post-beta: consider per-severity SNS topics + PagerDuty integration for paying-customer threshold |
This sub-system defines the runtime topology for app.autri.ai. If removed, the entire user-facing web product would break — chat, inspector, KB management, connector creation, settings, auth callback all depend on it. Update this doc when topology, build pipeline, or cross-component interfaces change.