AWS Infra Options

Drafted 2026-05-19 (session 2 follow-up). Comparison doc for the AWS hosting decision underneath the locked direction in infra-and-auth-plan.md. Triggered by review threads on the parent doc plus a meaningful discovery: Bedrock AgentCore Runtime gained stateful MCP features in March 2026, making it a purpose-built MCP server host, which changes the calculus.

This doc breaks the "where on AWS do we run Autri?" question into independent layers (MCP server, web app, ingestion workers, DB, auth) and prices out the composite stacks. The parent doc locked "AWS-native" as direction; this doc decides which AWS-native pattern — and surfaces the AgentCore Runtime option that should be evaluated before we lock anything.

Composite Stacks

Combining the layer picks:

Layer 1: MCP Server Hosting

TL;DR

Locked direction (pending Day 0 spike): Bedrock AgentCore Runtime for the MCP server + AWS Amplify for the Next.js app + Fargate Tasks for ingestion workers + RDS Postgres + Cognito. Materially cheaper at idle (~$35-50/mo vs ~$95/mo) and less ops overhead than the all-Fargate plan from D33. AgentCore Runtime is purpose-built for our exact MCP workload, and Amplify gives Dan a fast CI/CD path he's familiar with.

Research-validated 2026-05-19: AgentCore Runtime requires Streamable HTTP transport (MCP spec 2025-03-26+, not legacy HTTP+SSE). Cold-start is sub-second from a 10-microVM warm pool. 8h max compute lifecycle with Mcp-Session-Id persisting across microVM swaps. 100 TPM new-session rate, 1,000 concurrent sessions per account in us-east-1 — generous for beta + early growth. Details on the Open questions thread.

The all-Fargate plan from D33 stays as the documented fallback. Day 0 POC spike validates Cognito SSO flow + Claude Desktop reconnect behavior at the 8h compute boundary + cost telemetry wiring; if any of those surface a blocker, Stack A is the known-good escape hatch.

Why this doc exists

Annotation threads on the parent doc surfaced three independent questions:

Should we be on Fargate at all, or take a serverless approach (per Dan's brother's instinct, originally for QuoteAI)?
Is AWS Amplify worth reconsidering (Dan has used it before)?
Are there AWS-native MCP services we should evaluate?

Each is a real question. Rather than answer them piecemeal in the parent doc, this doc lays out the full design space.

The MCP-hosting landscape changed (May 2026)

The single most important update: Bedrock AgentCore Runtime (GA, with stateful MCP features added March 10, 2026) is an AWS service purpose-built to host MCP servers. Per-session isolated microVMs, OAuth 2.1 native, scales 0→thousands of sessions, no charge for I/O-wait idle time.

This answers "Are there AWS MCP services?" and materially changes our analysis. Most of the prior reasoning ("Fargate is the only viable host because Lambda has 15-min timeouts and API Gateway has 30-sec timeouts") was correct for the 2024 AWS landscape. AgentCore Runtime didn't exist when we wrote D33.

It's worth a careful evaluation before we commit to building on Fargate. Details under Option M1: Bedrock AgentCore Runtime below.

Separately, AWS MCP Server (GA May 6, 2026) is not a hosting product — it's a single AWS-published MCP server exposing AWS's own ~300 services to AI agents. Not relevant to our hosting decision; useful for Dan to know about as a potential consumer-side integration ("ask Autri's agent about its underlying AWS infra").

Architectural layers

The decision is layered. Each layer can be picked semi-independently:

| Layer | Options | Notes |
| --- | --- | --- |
| L1: MCP server | AgentCore Runtime / Fargate / App Runner | Needs SSE + OAuth; bursty load |
| L2: Web app | Amplify / Fargate / S3+CloudFront+API Gateway+Lambda / App Runner | Next.js with SSR for some routes |
| L3: Ingestion workers | Fargate Tasks / Step Functions + Lambda | Long-running (30+ min for novels) |
| L4: Database | RDS Postgres + pgvector | Effectively locked |
| L5: Auth | Cognito | Locked per D34 |

L4 and L5 are the same across all stacks. L1, L2, L3 are where the real choices live.

Option M1: Bedrock AgentCore Runtime (NEW)

Description: AWS-managed serverless runtime for MCP servers. Each MCP session runs in an isolated microVM (up to 8h lifetime, 15-min idle timeout). Routes by Mcp-Session-Id header. Auth: IAM, OAuth 2.1, Cognito/Entra ID/Okta.

Pricing: ~$0.0895/vCPU-hour + $0.00945/GB-hour, per-second billing, no charge for I/O-wait idle time. I.e., if your MCP server is waiting for the LLM to respond, you don't pay for that time. AWS quotes 10M sessions/mo ≈ $7.2k/mo for reference.

At our scale:

Idle (no users): ~$0
100 MAU, moderate use (1k sessions/mo): ~$5-15/mo
1k MAU, moderate use (10k sessions/mo): ~$50-150/mo
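
The pricing model above can be turned into a rough per-session estimate. The sketch below assumes vCPU is billed only for seconds the server is actively computing, while memory is billed for the whole session; that split and the example workload are assumptions for illustration, not AWS-published figures. Only the two hourly rates come from the pricing line above.

```python
VCPU_HOUR_USD = 0.0895   # rate quoted in the pricing line above
GB_HOUR_USD = 0.00945    # rate quoted in the pricing line above


def session_cost_usd(active_cpu_seconds, session_seconds, vcpus=1.0, memory_gb=1.0):
    """Estimate one session's cost under the assumed billing split.

    Assumption: CPU is charged only while active (I/O wait is free),
    memory is charged for the full session duration.
    """
    if active_cpu_seconds > session_seconds:
        raise ValueError("active CPU time cannot exceed session duration")
    cpu = active_cpu_seconds * vcpus * VCPU_HOUR_USD / 3600
    memory = session_seconds * memory_gb * GB_HOUR_USD / 3600
    return cpu + memory


def monthly_cost_usd(sessions_per_month, active_cpu_seconds, session_seconds,
                     vcpus=1.0, memory_gb=1.0):
    """Scale the per-session estimate linearly to a monthly session count."""
    per_session = session_cost_usd(active_cpu_seconds, session_seconds, vcpus, memory_gb)
    return sessions_per_month * per_session


# Hypothetical workload: 20-minute session, 60 s of active CPU, 1 vCPU, 2 GB.
estimate = monthly_cost_usd(1_000, active_cpu_seconds=60, session_seconds=1200,
                            vcpus=1.0, memory_gb=2.0)
print(f"1k sessions/mo: ${estimate:.2f}")
```

Under this hypothetical workload the estimate lands inside the $5-15/mo band quoted for 1k sessions; a CPU-heavier session profile would push it higher, which is the "pricing surprises" risk noted under Weaknesses.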

MCP compatibility: Native. Built for this. Supports stateful features (elicitation, sampling, progress notifications). OAuth 2.1 is first-class.

Setup complexity: Medium. Package MCP server as a container (per AgentCore Runtime spec), push to ECR, configure auth + session policy. AWS provides reference architectures.

Migration cost:

Into AgentCore from Fargate: moderate. Need to repackage the MCP server to AgentCore Runtime's lifecycle (session handlers, idle timeout handling). Not a full rewrite.
Out of AgentCore to Fargate: same as above, reversed. The MCP protocol is the same; you're swapping the runtime.

Strengths:

Purpose-built for our exact workload (long-lived MCP sessions, OAuth, bursty traffic)
Idle is free — at beta scale (5-10 users) this is essentially free
AWS-managed: no instances to patch, no Fargate tasks to scale
Stateful microVM model handles MCP's session semantics natively

Weaknesses:

Newer service — fewer reference implementations, less community knowledge
8-hour microVM lifetime is generous but not infinite; long-lived agent sessions need reconnection handling
Pricing surprises possible at scale (vCPU-hour adds up if sessions are CPU-heavy)
AWS-only — if we ever need multi-cloud, this doesn't port

Validated (research subagent, 2026-05-19):

Transport: Streamable HTTP required. AgentCore does NOT support legacy HTTP+SSE. MCP servers must listen on 0.0.0.0:8000/mcp and target MCP spec 2025-03-26+. Streaming responses use text/event-stream content-type within Streamable HTTP semantics (not legacy SSE transport). Implication: mcp-servers/doc-search adapter targets Streamable HTTP — small refactor, MCP SDK supports both.
Cold-start: sub-second from warm pool of 10 microVMs per endpoint. Beyond pool: 2-5s container deploys, 2-3s code deploys. No "provisioned concurrency" SKU exists — warm pool is automatic, not user-configurable. AWS-recommended pattern for true zero-cold-start: pre-emptive VM warmup (initialize a session before user's first real request). Confidence: medium — numbers from an AWS engineer's GitHub issue, not a published SLA.
Session lifecycle: 15 min idle timeout (adjustable via idleRuntimeSessionTimeout); 8h max compute lifecycle (adjustable via maxLifetime); logical session remains valid until AgentCore Runtime ARN is deleted; at the 8h compute boundary, a new microVM is provisioned with the same Mcp-Session-Id (in-memory state lost unless persisted via session storage or AgentCore Memory).
Scale: 1,000 concurrent active sessions per account in us-east-1/us-west-2 (500 elsewhere). New-session rate: 100 TPM for container deployments, 25 TPS for code deployments. 2 vCPU / 8 GB per session (fixed hardware).
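
The session lifecycle rules above can be modeled as a small state check. This is a toy model of the documented timeouts, not an AgentCore API; the class and function names are hypothetical, and the two constants are the defaults listed above.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60         # default idle timeout noted above
MAX_LIFETIME_S = 8 * 60 * 60     # default max compute lifetime noted above


@dataclass
class SessionClock:
    """Timestamps (in seconds) for the microVM currently backing a session."""
    started_at: float
    last_activity_at: float


def microvm_state(clock, now, idle_timeout_s=IDLE_TIMEOUT_S,
                  max_lifetime_s=MAX_LIFETIME_S):
    """Classify the microVM as running, idle-expired, or past its lifetime."""
    if now - clock.started_at >= max_lifetime_s:
        return "lifetime_exceeded"
    if now - clock.last_activity_at >= idle_timeout_s:
        return "idle_expired"
    return "running"


def needs_state_rehydration(state):
    """In-memory state is lost whenever the microVM is replaced."""
    return state != "running"


clock = SessionClock(started_at=0.0, last_activity_at=0.0)
print(microvm_state(clock, now=60.0))
```

The point of the model is the last function: in either expiry case the logical session (same Mcp-Session-Id) survives, but anything held only in memory must be reloaded from persisted storage.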

Remaining open question for Day 0 spike:

Client behavior at the 8h compute boundary — does Claude Desktop / Copilot Studio / Cursor gracefully reconnect on the same Mcp-Session-Id when AgentCore swaps the underlying microVM, or does it see a hard disconnect and require a fresh handshake? AWS docs describe server-side behavior; client behavior is empirical. Test in spike with Claude Desktop (may require waiting 8h to observe, OR forcing a microVM swap via redeploy).

Option M2: ECS Fargate behind ALB (D33 baseline)

Description: Run the MCP server as a long-running container task in ECS Fargate, behind an Application Load Balancer with ALB idle timeout raised to 300s+ for SSE.

Pricing: Fargate task (0.5 vCPU, 1GB) ~$20/mo + ALB $20/mo = ~$40/mo always-on, regardless of usage.

At our scale:

Idle: ~$40/mo
100 MAU: ~$40-50/mo
1k MAU: ~$60-150/mo (may scale to 2-3 tasks)

MCP compatibility: Manual but well-trodden. AWS published a reference solution (Deploying MCP Servers on AWS) using exactly this pattern.

Setup complexity: Medium-high. VPC, task definition, ALB target group, IAM, ECR. CDK helps.

Migration cost: Container is portable to any Kubernetes/Docker host. Easiest to leave.

Strengths:

Well-understood pattern, lots of AWS reference material
Container is portable (could move to App Runner, EKS, or off-AWS)
No 8h session lifetime concern
We already know this stack

Weaknesses:

Always-on cost even when nobody's using the MCP server
More moving parts (ALB, target groups, security groups, task definitions, NAT Gateway)
Doesn't scale down to zero

Option M3: AWS App Runner

Description: Managed container service. You push a container image; App Runner runs it behind a load balancer with TLS, auto-scaling, optional VPC integration.

Pricing: Provisioned: $0.064/vCPU-hour + $0.007/GB-hour (active), $0.009/vCPU-hour (idle on hot instance). Min 0.25 vCPU + 0.5 GB. ~$10-15/mo idle for one min-size service, $5-50/mo at modest traffic.

At our scale:

Idle: ~$10-15/mo
100 MAU: ~$15-30/mo
1k MAU: ~$50-150/mo

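MCP compatibility: Works for HTTP+SSE in principle, but App Runner caps each HTTP request at 120 seconds, so long-lived streams need reconnect handling (confirm the current limit before relying on it). A less direct fit than AgentCore, which is purpose-built.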

Setup complexity: Low. apprunner create-service with a container image. No VPC mandatory; can add for RDS access.

Migration cost: Container is portable. Easy to leave.

Strengths:

Simplest of the three for "just run my container"
Cheaper idle than Fargate (no ALB cost)
Auto-scales

Weaknesses:

Less control than Fargate
AWS-only DX (less well-known than ECS)
Idle-time billing exists (~$10/mo) — not zero like AgentCore

Layer 1 comparison

|  | AgentCore Runtime | Fargate + ALB | App Runner |
| --- | --- | --- | --- |
| Idle cost ($/mo) | $0 | $40 | $10-15 |
| 100 MAU ($/mo) | $5-15 | $40-50 | $15-30 |
| 1k MAU ($/mo) | $50-150 | $60-150 | $50-150 |
| MCP fit | Purpose-built | Manual | Manual |
| Setup time | Medium | Medium-high | Low |
| Maturity | New (Mar 2026) | Well-known | Mid-maturity |
| Auth built-in | Yes (Cognito/IAM) | Manual | Manual |
| Portability | AWS-only | Container | Container |

Layer 1 lean: AgentCore Runtime for the savings + purpose-built fit. Fargate stays in our back pocket as the proven fallback.
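
A quick break-even check makes the idle-cost argument concrete. The sketch below derives a per-session cost from the AgentCore estimate of $5-15/mo at 1k sessions and compares it with the fixed ~$40/mo Fargate + ALB baseline. It assumes AgentCore cost scales linearly with sessions and that Fargate stays at a single task; both are simplifications.

```python
def per_session_cost(monthly_cost_usd, sessions_per_month):
    """Average cost of one session at a given monthly total."""
    return monthly_cost_usd / sessions_per_month


def break_even_sessions(fixed_monthly_usd, cost_per_session_usd):
    """Sessions per month at which usage-based cost equals the fixed cost."""
    return fixed_monthly_usd / cost_per_session_usd


FARGATE_FIXED_USD = 40                      # Fargate task + ALB, from Option M2
low = per_session_cost(5, 1_000)            # optimistic AgentCore estimate
high = per_session_cost(15, 1_000)          # pessimistic AgentCore estimate

earliest = break_even_sessions(FARGATE_FIXED_USD, high)
latest = break_even_sessions(FARGATE_FIXED_USD, low)
print(f"Break-even between {earliest:,.0f} and {latest:,.0f} sessions/mo")
```

On these figures AgentCore stays cheaper than the fixed Fargate baseline until somewhere between roughly 2,700 and 8,000 sessions per month, which is well past beta volume.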

Layer 2: Web App Hosting (Next.js app)

The Next.js app hosts: the web chat UI, KB management, connector management, settings, auth callback, and a few short-lived API routes.

Option W4: AWS App Runner

Same shape as Layer 1 M3 but hosting the Next.js app instead of MCP. Pricing same: ~$10-15/mo idle.

Option W1: AWS Amplify

Description: Connect Amplify to a GitHub repo, Amplify builds and hosts Next.js app on CloudFront + Lambda. SSR via Lambda. CI/CD per push to main. Free SSL, custom domain hookup.

Pricing: $0.01/build-min (typical build = 2-5 min = pennies per deploy) + $0.15/GB served (static assets, edge-cached via CloudFront) + $0.06/GB-hour for SSR Lambda compute.

At our scale:

Idle: ~$0 (scale to zero)
100 MAU: ~$5-20/mo
1k MAU: ~$50-300/mo
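
For a sense of how the three Amplify line items combine, here is a minimal estimator using the rates quoted in the pricing line above. The workload numbers (deploys, traffic, SSR compute) are hypothetical placeholders, not measurements.

```python
BUILD_MINUTE_USD = 0.01    # per build minute, as quoted above
GB_SERVED_USD = 0.15       # per GB served, as quoted above
SSR_GB_HOUR_USD = 0.06     # per GB-hour of SSR compute, as quoted above


def amplify_monthly_usd(build_minutes, gb_served, ssr_gb_hours):
    """Sum the three usage-based line items for one month."""
    return (build_minutes * BUILD_MINUTE_USD
            + gb_served * GB_SERVED_USD
            + ssr_gb_hours * SSR_GB_HOUR_USD)


# Hypothetical month: 40 deploys at 4 minutes each, 20 GB served, 50 GB-hours of SSR.
total = amplify_monthly_usd(build_minutes=40 * 4, gb_served=20, ssr_gb_hours=50)
print(f"Estimated Amplify bill: ${total:.2f}/mo")
```

With every input at zero the function returns zero, which is the scale-to-zero property the idle estimate relies on.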

Production-scale viability (validated 2026-05-19): Amplify is a true production tool — it's AWS-managed packaging of primitives (CloudFront, Lambda, S3, Cognito) that scale to billions. Real production customers include Bose, Skyscanner, Lululemon, BMW, and parts of Disney+ (typically for portions of their stack, which is the pattern we'd follow). The underlying primitives are unbounded; Amplify's management layer is what's outgrown first, and migration off is mostly a CI/CD swap because the app code doesn't change.

What's outgrown first (none likely before 100k+ MAU, in approximate order):

Build pipeline concurrent-build limits (configurable up)
Edge caching control (Amplify abstracts CloudFront config; raw CloudFront more flexible)
SSR Lambda runtime config (Amplify picks memory/timeout per region)
VPC networking patterns (less granular than CDK-defined networking)

Migration cost off Amplify (to roll-your-own CloudFront + S3 + API Gateway + Lambda): ~1 week of dedicated effort — move build to GitHub Actions (~1 day), define CloudFront + S3 + Lambda in CDK (~2-3 days), domain swap via DNS, app code unchanged.

For Autri specifically: Amplify handles us comfortably through 100k+ MAU. By the time we'd consider migrating, we'd have the resources to do it carefully. The one real concern at scale is cost — Amplify SSR Lambda invocations get expensive vs. running Next.js as a long-lived Fargate container at sustained-high traffic (1M+ RPS). That's a "we made it" problem, not a beta-sprint problem.

Pros:

Push-to-deploy from GitHub (zero CI/CD config)
Dan has used it before
Free CloudFront in front
Cognito integration via Amplify CLI is one command
Scales to zero
Production-validated at scale (see above)

Cons:

Amplify SSR runs on Lambda with ~30s timeout — fine for chat API routes (sub-second LLM streaming responses fit; longer streams could be an issue)
Less control than rolling your own Next.js on Fargate
Amplify v1 had reputation issues; v2 (since 2024) is much better

Option W2: ECS Fargate

Description: Next.js app as a long-running container behind ALB. Same Fargate pattern as Layer 1 M2.

Pricing: ~$40/mo always-on (Fargate task + ALB share — though if MCP also on Fargate we share the ALB).

At our scale:

Idle: ~$40/mo (or $20/mo if sharing ALB with MCP server)
100 MAU: ~$40-50/mo
1k MAU: ~$60-200/mo

Pros:

Most control over runtime
SSR has no Lambda timeout constraint
Same pattern as MCP server (operational consistency)

Cons:

Always-on cost
More ops setup (CI/CD via GitHub Actions → ECR → ECS deploy)
We have to wire CloudFront + S3 for static asset caching ourselves

Option W3: CloudFront + S3 + API Gateway + Lambda (the "build it yourself serverless" path)

Description: Static Next.js export → S3 + CloudFront for the frontend. Dynamic API routes → API Gateway + Lambda.

Pricing: S3 + CloudFront ~$1-5/mo + API Gateway $1/M requests + Lambda $0.20/M requests + GB-second.

At our scale:

Idle: ~$1-2/mo
100 MAU: ~$5-15/mo
1k MAU: ~$30-150/mo

Pros:

Cheapest idle
Most "AWS native" feel — every component is a primitive

Cons:

Most setup work (Lambda, API Gateway, S3, and CloudFront all configured manually)
30-sec API Gateway timeout limits some patterns
Cold starts on infrequent Lambda invocations
This is basically "Amplify minus the convenience layer" — Amplify already wraps this pattern

Layer 2 comparison

|  | Amplify | Fargate | S3+CF+APIGW+Lambda | App Runner |
| --- | --- | --- | --- | --- |
| Idle cost ($/mo) | $0 | $20-40 | $1-2 | $10-15 |
| 100 MAU ($/mo) | $5-20 | $40-50 | $5-15 | $15-30 |
| 1k MAU ($/mo) | $50-300 | $60-200 | $30-150 | $50-150 |
| Setup time | Lowest | Medium-high | Highest | Low |
| Dan familiarity | Yes | New (CDK) | Most pieces familiar | New |
| GitHub CI/CD | Built in | Manual | Manual | Built in |
| 30s Lambda cap impact | Low (chat fits) | None | Same as Amplify | None |

Layer 2 lean: Amplify for dev velocity + zero idle cost + Dan's familiarity. Fargate is the fallback if Amplify SSR proves limiting.

Layer 3: Ingestion Workers

Ingestion is long-running (30+ min for a novel). Cannot be on Lambda directly.

Layer 3 lean: Fargate Tasks. Step Functions is for later if Fargate cold-start becomes a UX issue.

Option I1: Fargate Tasks (one-off)

Spawn a Fargate task per ingestion job. Task runs for the duration of the extraction, then exits. Pay per-second for what runs.

Pricing: ~$0.04 per task-hour (0.5 vCPU + 1GB). A 30-min novel ingestion = $0.02. At zero ingestion: $0.

Pros: Zero idle. Same Fargate concept we'd use anyway. Easy to wire from app trigger → RunTask API.

Cons: Startup latency (~30-60s task cold start). Acceptable for ingestion.
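
The "app trigger → RunTask API" wiring can be sketched as a function that builds the request for one ingestion job. The cluster, task definition, container name, and environment variable names below are hypothetical; the dictionary keys follow the ECS RunTask request shape as exposed by boto3's `run_task`.

```python
def build_run_task_request(job_id, source_uri, *, subnets, security_groups,
                           cluster="autri-ingest",                  # hypothetical name
                           task_definition="autri-ingest-worker",   # hypothetical name
                           container_name="worker"):                # hypothetical name
    """Build the keyword arguments for one one-off Fargate ingestion task."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # The worker reads its job parameters from the environment.
                    "environment": [
                        {"name": "JOB_ID", "value": job_id},
                        {"name": "SOURCE_URI", "value": source_uri},
                    ],
                }
            ]
        },
    }


request = build_run_task_request(
    "job-123", "s3://example-bucket/novel.epub",
    subnets=["subnet-aaa"], security_groups=["sg-bbb"],
)
# With AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("ecs").run_task(**request)
print(request["overrides"]["containerOverrides"][0]["environment"])
```

Keeping the request builder separate from the AWS call makes the job parameters easy to unit-test without touching ECS.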

Option I2: Step Functions + Lambda chained

Chain Lambdas via Step Functions, each Lambda processing a chunk of the ingestion. Total flow takes the same time but split across Lambdas under the 15-min cap.

Pricing: Lambda $0.20/M + GB-second + Step Functions transitions $0.025/1k.

Pros: Truly serverless. Pay only for execution.

Cons: Complexity. Need to redesign ingestion as discrete state-machine steps. Inter-Lambda context passing via S3 or DynamoDB. Not worth the complexity at our scale — Fargate Tasks are simpler.
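
To illustrate the redesign cost, here is a minimal sketch of the chunk planning that Option I2 would require: splitting a document into ranges that each fit under the 15-minute Lambda cap. The per-page processing time and the 12-minute step budget (leaving headroom under the cap) are assumptions.

```python
LAMBDA_CAP_S = 15 * 60       # hard Lambda timeout
STEP_BUDGET_S = 12 * 60      # assumed budget per step, leaving 3 minutes of headroom


def plan_chunks(total_pages, seconds_per_page, step_budget_s=STEP_BUDGET_S):
    """Return (start, end) page ranges, end exclusive, each within the step budget."""
    if step_budget_s > LAMBDA_CAP_S:
        raise ValueError("step budget exceeds the Lambda timeout")
    pages_per_chunk = step_budget_s // seconds_per_page
    if pages_per_chunk < 1:
        raise ValueError("a single page does not fit in one step")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


# Hypothetical novel: 300 pages at 6 s per page, i.e. 30 minutes of total work.
chunks = plan_chunks(total_pages=300, seconds_per_page=6)
print(chunks)
```

Each range would become one state-machine step, with intermediate results passed through S3 or DynamoDB as noted above. A single Fargate task needs none of this planning.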

Stack A: All-Fargate (D33 baseline)

L1 MCP: Fargate + ALB
L2 App: Fargate (sharing ALB)
L3 Workers: Fargate Tasks
L4 DB: RDS
L5 Auth: Cognito

Idle: ~$95/mo (1 Fargate task $20 + ALB $20 + RDS $30 + NAT $35 + misc)
100 MAU: ~$180/mo
1k MAU: ~$500-1200/mo

Pros: Consistency (one runtime for everything), well-understood, no AgentCore newness risk.
Cons: Always-on cost, most operational surface (CDK + Fargate + ALB).

Stack B: AgentCore + Amplify + Fargate Tasks (NEW LEAN)

L1 MCP: Bedrock AgentCore Runtime
L2 App: Amplify (GitHub-connected)
L3 Workers: Fargate Tasks
L4 DB: RDS
L5 Auth: Cognito

Idle: ~$35-50/mo without a NAT Gateway (RDS $30 + minimal Amplify/AgentCore). Keeping workers in a VPC behind a NAT Gateway adds ~$35, which leaves Stack B roughly $30 cheaper than Stack A.
100 MAU: ~$60-90/mo
1k MAU: ~$300-700/mo (significantly cheaper because AgentCore scales to zero between sessions)

Pros: Cheaper at every scale. Less ops surface. Push-to-deploy. Purpose-built MCP host. Dan knows Amplify.
Cons: Two newer services (AgentCore is 2 months old; Amplify v2 is mature but worth verifying our use case fits). Two control planes to learn instead of one. Deeper AWS-only lock-in.

Stack C: AgentCore + Fargate Web + Fargate Tasks

L1 MCP: AgentCore Runtime
L2 App: Fargate (no Amplify)
L3 Workers: Fargate Tasks
L4 DB: RDS
L5 Auth: Cognito

Idle: ~$75/mo (Fargate $20 + ALB $20 + RDS $30 + NAT $35; nearly the same always-on components as Stack A)
100 MAU: ~$120/mo
1k MAU: ~$400-900/mo

Pros: AgentCore savings on MCP side, Fargate consistency on app side.
Cons: Inherits Fargate's always-on cost for the app. AgentCore adds complexity for marginal benefit vs Stack A.

Stack D: Serverless-first (no Fargate at all)

L1 MCP: AgentCore Runtime
L2 App: Amplify
L3 Workers: Step Functions + Lambda (chained, not Fargate Tasks)
L4 DB: RDS + RDS Proxy (for Lambda connection pooling)
L5 Auth: Cognito

Idle: ~$30-45/mo (RDS $30 + RDS Proxy $15 + minimal others)
100 MAU: ~$50-80/mo
1k MAU: ~$250-600/mo

Pros: Cheapest. Zero containers. Closest to Dan's brother's serverless vision.
Cons: Step Functions complexity for ingestion (redesign needed). Adds RDS Proxy. Less control. If a Lambda step in ingestion fails, recovery is harder than for a single Fargate Task that crashes.

Stack comparison

|  | Stack A (D33) | Stack B (new lean) | Stack C | Stack D (serverless) |
| --- | --- | --- | --- | --- |
| Idle | $95/mo | $35-50/mo | $75/mo | $30-45/mo |
| 100 MAU | $180/mo | $60-90/mo | $120/mo | $50-80/mo |
| 1k MAU | $500-1200/mo | $300-700/mo | $400-900/mo | $250-600/mo |
| Setup time | 5-7 days | 3-5 days | 5-7 days | 7-10 days |
| Ops surface | High | Medium | Medium-high | Lowest |
| Dan familiarity | New | Mostly familiar | New | Least familiar |
| Migration cost out | Container is portable | AgentCore is AWS-only | Mixed | Step Functions = redesign work |
| Risk | Low (known) | Medium (AgentCore is 2 months old) | Medium | High (most components newer to us) |

Recommendation

New lean: Stack B (AgentCore Runtime + Amplify + Fargate Tasks).

Why this beats Stack A:

~$45-60/mo less at idle, ~$90-120/mo less at 100 MAU (per the stack comparison figures). Compounds over the months we'll be in beta.
AgentCore Runtime is purpose-built for our MCP workload. Idle-free, OAuth-native, session-aware. We were going to roll the equivalent on Fargate manually.
Amplify gives Dan a familiar push-to-deploy path without writing CDK for the app side. CDK is only needed for AgentCore, RDS, Cognito, and networking.
Containers are still in the picture (Fargate Tasks for ingestion, AgentCore Runtime runs containers). We're not abandoning the container ecosystem, just not running containers 24/7.
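
The savings argument is simple range arithmetic over the stack comparison figures. A minimal sketch, using the Stack A and Stack B monthly costs from that comparison and the $1k credit amount mentioned under Open questions:

```python
def savings_range(baseline_usd, candidate_low_usd, candidate_high_usd):
    """Return (min, max) monthly savings of the candidate against a fixed baseline."""
    return (baseline_usd - candidate_high_usd, baseline_usd - candidate_low_usd)


def credit_runway_months(credits_usd, monthly_usd):
    """Months of spend that a credit balance covers at a flat monthly cost."""
    return credits_usd / monthly_usd


idle_savings = savings_range(95, 35, 50)       # Stack A vs Stack B at idle
mau100_savings = savings_range(180, 60, 90)    # Stack A vs Stack B at 100 MAU
runway = credit_runway_months(1_000, 50)       # credits against Stack B's upper idle cost

print(idle_savings, mau100_savings, runway)
```

This gives $45-60/mo saved at idle, $90-120/mo at 100 MAU, and about 20 months of Stack B idle cost covered by the credits at the upper end of the idle estimate.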

What we'd want to verify before committing:

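AgentCore Runtime transport support (Streamable HTTP vs HTTP+SSE). Resolved 2026-05-19: Streamable HTTP is required; see Open questions.
AgentCore cold-start latency on first request to a fresh microVM. Resolved 2026-05-19: sub-second from the warm pool, 2-5s beyond it for container deployments, which is acceptable for an interactive MCP client at our scale.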
Amplify SSR + Cognito flow — does the JWT validation chain work cleanly with Cognito hosted UI?
NAT Gateway necessity — can we avoid it entirely? RDS connections from AgentCore/Amplify/Fargate Tasks all need VPC connectivity OR public-accessible RDS with strict security groups. Want to confirm the cheapest viable network topology.

Recommended path to lock the decision:

Day 0-1: spike Stack B locally + minimum AWS POC — deploy a "hello world" MCP server to AgentCore Runtime, connect Amplify to a dummy Next.js repo, verify the layers talk. ~4-6 hours of work.
If spike works: lock Stack B, update D33 to reflect, proceed with the (reshaped local-first) beta sprint.
If spike reveals AgentCore blockers: fall back to Stack A. We don't lose much time.

Open questions

Updated 2026-05-19 — most prior open questions resolved via subagent research + review threads. Remaining items go into the Day 0 spike checklist.

Resolved (no longer open):

~~Wait to commit until POC AgentCore?~~ → YES — Day 0 spike before locking. Stack A is the fallback if AgentCore has a blocker.
~~Multi-region readiness from day one?~~ → NO — us-east-1 only for beta, likely year 1. Multi-region triggered by first paying customer with a latency/compliance need.
~~Activate Founders credits offset?~~ → Post-beta: defer application until landing page is real. $1k credits will offset ~20 months of Stack B idle.
~~AgentCore transport (SSE vs Streamable HTTP)?~~ → Streamable HTTP required. Our mcp-servers/doc-search adapter targets MCP spec 2025-03-26+.
~~AgentCore cold-start latency?~~ → Sub-second from warm pool of 10 microVMs per endpoint; 2-5s beyond pool for container deployments (2-3s for code deployments). Fine for our scale.
~~Session lifecycle limits?~~ → 15 min idle, 8h max compute, logical session persists across microVM swaps. 1k concurrent sessions per account in us-east-1, 100 TPM new-session rate. Generous for beta.

Remaining items — Day 0 spike checklist:

Claude Desktop reconnect behavior at the 8h compute boundary. AWS docs describe server-side semantics cleanly; client behavior is empirical. Test: open a Claude Desktop MCP session against AgentCore, wait through the 8h boundary, verify graceful reconnect with same Mcp-Session-Id. (Alternative empirical: force a microVM swap via re-deploy and observe client behavior.)
Cognito SSO state across app.autri.ai and mcp.autri.ai. OAuth flow: user authenticates on app.autri.ai, generates a connector, the OAuth token validates against mcp.autri.ai's resource server. 30-min end-to-end test in the spike.
Domain routing topology. Cloudflare DNS → AWS endpoints with ACM certs for app.autri.ai (Amplify) and mcp.autri.ai (AgentCore). Apex autri.ai reserved for landing page (separate, deferred).
Cost telemetry wiring. CloudWatch dashboards + Cost Explorer tags (project=autri env=beta cost-bucket=<layer>) + AWS Budgets alerts at $50/$100/$200 thresholds + Cost Anomaly Detection daily emails. Verify all surfaces give us real-time visibility before any external beta user signs up.
AgentCore Runtime billing visibility specifically. Per-session vCPU-seconds + memory-GB-seconds in CloudWatch — confirm we can see per-session costs, not just aggregate.
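
The tagging and alert thresholds in the checklist can be pinned down in a few lines so the spike has something concrete to verify against. The tag keys and threshold values come from the checklist above; the function names and the example layer name are hypothetical.

```python
BUDGET_THRESHOLDS_USD = (50, 100, 200)   # alert thresholds from the checklist


def cost_tags(layer, env="beta"):
    """Cost-allocation tags to attach to every resource in a given layer."""
    return {"project": "autri", "env": env, "cost-bucket": layer}


def crossed_thresholds(month_to_date_usd, thresholds=BUDGET_THRESHOLDS_USD):
    """Thresholds that the current month-to-date spend has reached."""
    return [t for t in thresholds if month_to_date_usd >= t]


print(cost_tags("mcp"))            # "mcp" is a hypothetical layer name
print(crossed_thresholds(120))
```

During the spike, each AWS surface (Budgets, Cost Explorer, CloudWatch) should report the same buckets and fire at the same thresholds this model predicts.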

Deliverable from Day 0 spike: half-page spike notes documenting findings, with explicit "Stack B locked" or "Fallback to Stack A — blocker is X" decision. Goes in this doc's appendix or becomes its own short doc.

What this doc does NOT decide

IaC tool (CDK vs Terraform vs raw CloudFormation). Separate decision; lean CDK for TypeScript continuity with the app codebase.
Observability stack (CloudWatch vs Datadog vs others). Defer until beta produces a real monitoring need.
Backup / DR strategy. Beta-scale: RDS automated backups + S3 versioning is enough. Production-scale: revisit.
Multi-tenancy isolation patterns (single-DB vs DB-per-tenant). Per D13, single-DB with RLS is the chosen pattern. Not affected by infra choice.
