AWS Infra Options
Drafted 2026-05-19 (session 2 follow-up). Comparison doc for the AWS hosting decision underneath the locked direction in infra-and-auth-plan.md. Triggered by review threads on the parent doc + a meaningful discovery: AWS released Bedrock AgentCore Runtime in March 2026 as a purpose-built MCP server host, which changes the calculus.
This doc breaks the "where on AWS do we run Autri?" question into independent layers (MCP server, web app, ingestion workers, DB, auth) and prices out the composite stacks. The parent doc locked "AWS-native" as direction; this doc decides which AWS-native pattern — and surfaces the AgentCore Runtime option that should be evaluated before we lock anything.
TL;DR
Locked direction (pending Day 0 spike): Bedrock AgentCore Runtime for the MCP server + AWS Amplify for the Next.js app + Fargate Tasks for ingestion workers + RDS Postgres + Cognito. Materially cheaper at idle (~$35-50/mo vs ~$95/mo) AND less ops overhead than the all-Fargate plan from D33. AgentCore Runtime is purpose-built for our exact MCP workload, and Amplify gives Dan a fast CI/CD path he's familiar with.
Research-validated 2026-05-19: AgentCore Runtime requires Streamable HTTP transport (MCP spec 2025-03-26+, not legacy HTTP+SSE). Cold-start is sub-second from a 10-microVM warm pool. 8h max compute lifecycle with Mcp-Session-Id persisting across microVM swaps. 100 TPM new-session rate, 1,000 concurrent sessions per account in us-east-1 — generous for beta + early growth. Details on the Open questions thread.
The all-Fargate plan from D33 stays as the documented fallback. Day 0 POC spike validates Cognito SSO flow + Claude Desktop reconnect behavior at the 8h compute boundary + cost telemetry wiring; if any of those surface a blocker, Stack A is the known-good escape hatch.
Why this doc exists
Annotation threads on the parent doc surfaced three independent questions:
- Should we be on Fargate at all, or take a serverless approach (per Dan's brother's instinct, originally for QuoteAI)?
- Is AWS Amplify worth reconsidering (Dan has used it before)?
- Are there AWS-native MCP services we should evaluate?
Each is a real question. Rather than answer them piecemeal in the parent doc, this doc lays out the full design space.
The MCP-hosting landscape changed (May 2026)
The single most important update: Bedrock AgentCore Runtime (GA, with stateful MCP features added March 10, 2026) is an AWS service purpose-built to host MCP servers. Per-session isolated microVMs, OAuth 2.1 native, scales 0→thousands of sessions, no charge for I/O-wait idle time.
This is the answer to "Are there AWS MCP services?", and it materially changes our analysis. Most of the prior reasoning ("Fargate is the only viable host because Lambda has 15-min timeouts and API Gateway has 30-sec timeouts") was correct for the 2024 AWS landscape. AgentCore Runtime didn't exist when we wrote D33.
It's worth a careful evaluation before we commit to building on Fargate. Details in the § Layer 1: MCP Server Hosting section below.
Separately, AWS MCP Server (GA May 6, 2026) is not a hosting product — it's a single AWS-published MCP server exposing AWS's own ~300 services to AI agents. Not relevant to our hosting decision; useful for Dan to know about as a potential consumer-side integration ("ask Autri's agent about its underlying AWS infra").
Architectural layers
The decision is layered. Each layer can be picked semi-independently:
| Layer | Options | Notes |
|---|---|---|
| L1: MCP server | AgentCore Runtime / Fargate / App Runner | Needs SSE + OAuth; bursty load |
| L2: Web app | Amplify / Fargate / S3+CloudFront+API Gateway+Lambda / App Runner | Next.js with SSR for some routes |
| L3: Ingestion workers | Fargate Tasks / Step Functions + Lambda | Long-running (30+ min for novels) |
| L4: Database | RDS Postgres + pgvector | Effectively locked |
| L5: Auth | Cognito | Locked per D34 |
L4 and L5 are the same across all stacks. L1, L2, L3 are where the real choices live.
Layer 1: MCP Server Hosting
Option M1: Bedrock AgentCore Runtime (NEW)
Description: AWS-managed serverless runtime for MCP servers. Each MCP session runs in an isolated microVM (up to 8h lifetime, 15-min idle timeout). Routes by Mcp-Session-Id header. Auth: IAM, OAuth 2.1, Cognito/Entra ID/Okta.
Pricing: ~$0.0895/vCPU-hour + $0.00945/GB-hour, per-second billing, no charge for I/O-wait idle time. I.e., if your MCP server is waiting for the LLM to respond, you don't pay for that time. AWS quotes 10M sessions/mo ≈ $7.2k/mo for reference.
At our scale:
- Idle (no users): ~$0
- 100 MAU, moderate use (~1k sessions/mo): ~$5-15/mo
- 1k MAU, moderate use (~10k sessions/mo): ~$50-150/mo
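As a rough sanity check on these estimates, the per-session arithmetic can be sketched from the two quoted rates. This is a minimal sketch, not AWS's billing formula: it assumes CPU is billed only for active seconds while memory is billed for the full wall-clock session, and the session profile (10 minutes long, 2 minutes of active CPU, 1 vCPU, 2 GB) is a hypothetical placeholder rather than measured Autri data.

```python
# Rates quoted in the pricing line above (USD).
VCPU_HOUR_USD = 0.0895
GB_HOUR_USD = 0.00945


def session_cost(wall_seconds: float, active_cpu_seconds: float,
                 vcpus: float = 1.0, memory_gb: float = 2.0) -> float:
    """Estimate the cost of one MCP session.

    Assumption: CPU is billed only for active seconds (I/O wait is free),
    while memory is billed for the session's full wall-clock duration.
    """
    cpu_cost = VCPU_HOUR_USD * vcpus * active_cpu_seconds / 3600
    memory_cost = GB_HOUR_USD * memory_gb * wall_seconds / 3600
    return cpu_cost + memory_cost


def monthly_cost(sessions_per_month: int, **profile: float) -> float:
    """Scale the per-session estimate to a monthly session count."""
    return sessions_per_month * session_cost(**profile)


# Hypothetical profile: 10-minute session with 2 minutes of active CPU.
PROFILE = {"wall_seconds": 600, "active_cpu_seconds": 120}

if __name__ == "__main__":
    for sessions in (1_000, 10_000):
        print(f"{sessions:>6} sessions/mo: ${monthly_cost(sessions, **PROFILE):.2f}")
```

Under that assumed profile the result lands inside both quoted ranges; shorter or less CPU-heavy sessions would come in lower, so real telemetry from the spike should replace the placeholder numbers.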
MCP compatibility: Native. Built for this. Supports stateful features (elicitation, sampling, progress notifications). OAuth 2.1 is first-class.
Setup complexity: Medium. Package MCP server as a container (per AgentCore Runtime spec), push to ECR, configure auth + session policy. AWS provides reference architectures.
Migration cost:
- Into AgentCore from Fargate: moderate. Need to repackage the MCP server to AgentCore Runtime's lifecycle (session handlers, idle timeout handling). Not a full rewrite.
- Out of AgentCore to Fargate: same as above, reversed. The MCP protocol is the same; you're swapping the runtime.
Strengths:
- Purpose-built for our exact workload (long-lived MCP sessions, OAuth, bursty traffic)
- Idle is free — at beta scale (5-10 users) this is essentially free
- AWS-managed: no instances to patch, no Fargate tasks to scale
- Stateful microVM model handles MCP's session semantics natively
Weaknesses:
- Newer service — fewer reference implementations, less community knowledge
- 8-hour microVM lifetime is generous but not infinite; long-lived agent sessions need reconnection handling
- Pricing surprises possible at scale (vCPU-hour adds up if sessions are CPU-heavy)
- AWS-only — if we ever need multi-cloud, this doesn't port
Validated (research subagent, 2026-05-19):
- Transport: Streamable HTTP required. AgentCore does NOT support legacy HTTP+SSE. MCP servers must listen on `0.0.0.0:8000/mcp` and target MCP spec 2025-03-26+. Streaming responses use `text/event-stream` content-type within Streamable HTTP semantics (not legacy SSE transport). Implication: `mcp-servers/doc-search` adapter targets Streamable HTTP (small refactor; MCP SDK supports both).
- Cold-start: sub-second from warm pool of 10 microVMs per endpoint. Beyond pool: 2-5s container deploys, 2-3s code deploys. No "provisioned concurrency" SKU exists; the warm pool is automatic, not user-configurable. AWS-recommended pattern for true zero-cold-start: pre-emptive VM warmup (initialize a session before user's first real request). Confidence: medium (numbers from an AWS engineer's GitHub issue, not a published SLA).
- Session lifecycle: 15 min idle timeout (adjustable via `idleRuntimeSessionTimeout`); 8h max compute lifecycle (adjustable via `maxLifetime`); logical session remains valid until AgentCore Runtime ARN is deleted; at the 8h compute boundary, a new microVM is provisioned with the same `Mcp-Session-Id` (in-memory state lost unless persisted via session storage or AgentCore Memory).
- Scale: 1,000 concurrent active sessions per account in us-east-1/us-west-2 (500 elsewhere). 100 TPM new container deployments / 25 TPS code deployments. 2 vCPU / 8 GB per session (fixed hardware).
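The two lifecycle timers can be modeled in a few lines to reason about when in-memory state is lost. This is a toy model of the behavior described above, not AgentCore's implementation: the `MicroVM` class and `on_request` function are hypothetical names, and the logical `Mcp-Session-Id` is assumed to stay constant across swaps and is therefore not modeled.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60        # default idle timeout from the notes above
MAX_LIFETIME_S = 8 * 60 * 60    # default max compute lifetime


@dataclass
class MicroVM:
    started_at: float        # seconds, when this microVM was provisioned
    last_activity_at: float  # seconds, when it last handled a request


def on_request(vm: MicroVM, now: float) -> tuple[MicroVM, bool]:
    """Return the microVM serving a request at `now` and whether
    in-memory state survived (True) or a fresh microVM was needed (False)."""
    expired = (now - vm.last_activity_at >= IDLE_TIMEOUT_S
               or now - vm.started_at >= MAX_LIFETIME_S)
    if expired:
        # Same logical session, new compute: in-memory state is gone.
        return MicroVM(started_at=now, last_activity_at=now), False
    return MicroVM(vm.started_at, now), True
```

The practical takeaway from the model is that any state the MCP server needs after a 15-minute pause or an 8-hour run has to live outside process memory.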
Remaining open question for Day 0 spike:
- Client behavior at the 8h compute boundary: does Claude Desktop / Copilot Studio / Cursor gracefully reconnect on the same `Mcp-Session-Id` when AgentCore swaps the underlying microVM, or does it see a hard disconnect and require a fresh handshake? AWS docs describe server-side behavior; client behavior is empirical. Test in spike with Claude Desktop (may require waiting 8h to observe, OR forcing a microVM swap via redeploy).
Option M2: ECS Fargate behind ALB (D33 baseline)
Description: Run the MCP server as a long-running container task in ECS Fargate, behind an Application Load Balancer with ALB idle timeout raised to 300s+ for SSE.
Pricing: Fargate task (0.5 vCPU, 1GB) ~$20/mo + ALB $20/mo = ~$40/mo always-on, regardless of usage.
At our scale:
- Idle: ~$40/mo
- 100 MAU: ~$40-50/mo
- 1k MAU: ~$60-150/mo (may scale to 2-3 tasks)
MCP compatibility: Manual but well-trodden. AWS published a reference solution (Deploying MCP Servers on AWS) using exactly this pattern.
Setup complexity: Medium-high. VPC, task definition, ALB target group, IAM, ECR. CDK helps.
Migration cost: Container is portable to any Kubernetes/Docker host. Easiest to leave.
Strengths:
- Well-understood pattern, lots of AWS reference material
- Container is portable (could move to App Runner, EKS, or off-AWS)
- No 8h session lifetime concern
- We already know this stack
Weaknesses:
- Always-on cost even when nobody's using the MCP server
- More moving parts (ALB, target groups, security groups, task definitions, NAT Gateway)
- Doesn't scale down to zero
Option M3: AWS App Runner
Description: Managed container service. You push a container image; App Runner runs it behind a load balancer with TLS, auto-scaling, optional VPC integration.
Pricing: Provisioned: $0.064/vCPU-hour + $0.007/GB-hour (active), $0.009/vCPU-hour (idle on hot instance). Min 0.25 vCPU + 0.5 GB. ~$10-15/mo idle for one min-size service, $5-50/mo at modest traffic.
At our scale:
- Idle: ~$10-15/mo
- 100 MAU: ~$15-30/mo
- 1k MAU: ~$50-150/mo
MCP compatibility: Works for HTTP+SSE; App Runner supports long-running requests. A less direct fit than AgentCore, which is purpose-built.
Setup complexity: Low. `aws apprunner create-service` with a container image. A VPC is not mandatory; one can be added for RDS access.
Migration cost: Container is portable. Easy to leave.
Strengths:
- Simplest of the three for "just run my container"
- Cheaper idle than Fargate (no ALB cost)
- Auto-scales
Weaknesses:
- Less control than Fargate
- AWS-only DX (less well-known than ECS)
- Idle-time billing exists (~$10/mo) — not zero like AgentCore
Layer 1 comparison
| | AgentCore Runtime | Fargate + ALB | App Runner |
|---|---|---|---|
| Idle cost | $0 | $40 | $10-15 |
| 100 MAU | $5-15 | $40-50 | $15-30 |
| 1k MAU | $50-150 | $60-150 | $50-150 |
| MCP fit | Purpose-built | Manual | Manual |
| Setup time | Medium | Medium-high | Low |
| Maturity | New (Mar 2026) | Well-known | Mid-maturity |
| Auth built-in | Yes (Cognito/IAM) | Manual | Manual |
| Portability | AWS-only | Container | Container |
Layer 1 lean: AgentCore Runtime for the savings + purpose-built fit. Fargate stays in our back pocket as the proven fallback.
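The table implies a break-even point between usage-based AgentCore pricing and the flat Fargate + ALB cost. A minimal sketch using only the estimates quoted in this section (the ~$40/mo flat cost and $5-15/mo per roughly 1k sessions); it assumes AgentCore cost scales linearly with session count and ignores Fargate scaling beyond one task, so it is a rough guide rather than a forecast.

```python
FARGATE_ALB_FLAT_USD = 40.0  # always-on Fargate + ALB estimate from Option M2

# AgentCore estimate from Option M1: $5-15/mo at roughly 1k sessions/mo.
AGENTCORE_USD_PER_1K_SESSIONS = (5.0, 15.0)


def break_even_sessions(flat_monthly_usd: float, usd_per_1k_sessions: float) -> float:
    """Sessions per month at which usage-based cost equals the flat cost."""
    return flat_monthly_usd / (usd_per_1k_sessions / 1000)


cheap, pricey = AGENTCORE_USD_PER_1K_SESSIONS
BREAK_EVEN_RANGE = (
    break_even_sessions(FARGATE_ALB_FLAT_USD, pricey),  # pessimistic AgentCore cost
    break_even_sessions(FARGATE_ALB_FLAT_USD, cheap),   # optimistic AgentCore cost
)

if __name__ == "__main__":
    low, high = BREAK_EVEN_RANGE
    print(f"Break-even between {low:,.0f} and {high:,.0f} sessions/mo")
```

On these inputs the crossover sits somewhere between the 100 MAU and 1k MAU rows, which matches the table: AgentCore is clearly cheaper at beta scale and roughly comparable at 1k MAU.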
Layer 2: Web App Hosting (Next.js app)
The Next.js app hosts: the web chat UI, KB management, connector management, settings, auth callback, and a few short-lived API routes.
Option W4: AWS App Runner
Same shape as Layer 1 M3 but hosting the Next.js app instead of MCP. Pricing same: ~$10-15/mo idle.
Option W1: AWS Amplify
Description: Connect Amplify to a GitHub repo; Amplify builds and hosts the Next.js app on CloudFront + Lambda. SSR via Lambda. CI/CD per push to main. Free SSL, custom domain hookup.
Pricing: $0.01/build-min (typical build = 2-5 min = pennies per deploy) + $0.15/GB served (static assets, edge-cached via CloudFront) + $0.06/GB-hour for SSR Lambda compute.
At our scale:
- Idle: ~$0 (scale to zero)
- 100 MAU: ~$5-20/mo
- 1k MAU: ~$50-300/mo
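The three pricing components above combine into a simple monthly estimate. A minimal sketch using the rates as quoted in this doc; the usage figures in the example (30 deploys of 4 build minutes each, 20 GB served, 50 GB-hours of SSR compute) are hypothetical placeholders, not measurements.

```python
# Rates as quoted in the pricing line above (USD).
BUILD_MINUTE_USD = 0.01
GB_SERVED_USD = 0.15
SSR_GB_HOUR_USD = 0.06


def amplify_monthly_cost(deploys: int, build_minutes_per_deploy: float,
                         gb_served: float, ssr_gb_hours: float) -> float:
    """Sum build, data-transfer, and SSR compute charges for one month."""
    build = deploys * build_minutes_per_deploy * BUILD_MINUTE_USD
    transfer = gb_served * GB_SERVED_USD
    ssr = ssr_gb_hours * SSR_GB_HOUR_USD
    return build + transfer + ssr


if __name__ == "__main__":
    # Hypothetical 100-MAU month.
    estimate = amplify_monthly_cost(deploys=30, build_minutes_per_deploy=4,
                                    gb_served=20, ssr_gb_hours=50)
    print(f"${estimate:.2f}/mo")
```

With no deploys and no traffic every term is zero, which is the scale-to-zero property the idle row relies on.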
Production-scale viability (validated 2026-05-19): Amplify is a true production tool — it's AWS-managed packaging of primitives (CloudFront, Lambda, S3, Cognito) that scale to billions. Real production customers include Bose, Skyscanner, Lululemon, BMW, and parts of Disney+ (typically for portions of their stack, which is the pattern we'd follow). The underlying primitives are unbounded; Amplify's management layer is what's outgrown first, and migration off is mostly a CI/CD swap because the app code doesn't change.
What's outgrown first (none likely before 100k+ MAU, in approximate order):
- Build pipeline concurrent-build limits (configurable up)
- Edge caching control (Amplify abstracts CloudFront config; raw CloudFront more flexible)
- SSR Lambda runtime config (Amplify picks memory/timeout per region)
- VPC networking patterns (less granular than CDK-defined networking)
Migration cost off Amplify (to roll-your-own CloudFront + S3 + API Gateway + Lambda): ~1 week of dedicated effort — move build to GitHub Actions (~1 day), define CloudFront + S3 + Lambda in CDK (~2-3 days), domain swap via DNS, app code unchanged.
For Autri specifically: Amplify handles us comfortably through 100k+ MAU. By the time we'd consider migrating, we'd have the resources to do it carefully. The one real concern at scale is cost — Amplify SSR Lambda invocations get expensive vs. running Next.js as a long-lived Fargate container at sustained-high traffic (1M+ RPS). That's a "we made it" problem, not a beta-sprint problem.
Pros:
- Push-to-deploy from GitHub (zero CI/CD config)
- Dan has used it before
- Free CloudFront in front
- Cognito integration via Amplify CLI is one command
- Scales to zero
- Production-validated at scale (see above)
Cons:
- Amplify SSR runs on Lambda with ~30s timeout. Fine for chat API routes as long as streamed LLM responses finish well inside that window; longer streams could be an issue
- Less control than rolling your own Next.js on Fargate
- Amplify v1 had reputation issues; v2 (since 2024) is much better
Option W2: ECS Fargate
Description: Next.js app as a long-running container behind ALB. Same Fargate pattern as Layer 1 M2.
Pricing: ~$40/mo always-on (Fargate task + ALB share — though if MCP also on Fargate we share the ALB).
At our scale:
- Idle: ~$40/mo (or $20/mo if sharing ALB with MCP server)
- 100 MAU: ~$40-50/mo
- 1k MAU: ~$60-200/mo
Pros:
- Most control over runtime
- SSR has no Lambda timeout constraint
- Same pattern as MCP server (operational consistency)
Cons:
- Always-on cost
- More ops setup (CI/CD via GitHub Actions → ECR → ECS deploy)
- We have to wire CloudFront + S3 for static asset caching ourselves
Option W3: CloudFront + S3 + API Gateway + Lambda (the "build it yourself serverless" path)
Description: Static Next.js export → S3 + CloudFront for the frontend. Dynamic API routes → API Gateway + Lambda.
Pricing: S3 + CloudFront ~$1-5/mo + API Gateway $1/M requests + Lambda $0.20/M requests + GB-second.
At our scale:
- Idle: ~$1-2/mo
- 100 MAU: ~$5-15/mo
- 1k MAU: ~$30-150/mo
Pros:
- Cheapest idle
- Most "AWS native" feel — every component is a primitive
Cons:
- Most setup work (Lambda, API Gateway, S3, and CloudFront all configured manually)
- 30-sec API Gateway timeout limits some patterns
- Cold starts on infrequent Lambda invocations
- This is basically "Amplify minus the convenience layer" — Amplify already wraps this pattern
Layer 2 comparison
| | Amplify | Fargate | S3+CF+APIGW+Lambda | App Runner |
|---|---|---|---|---|
| Idle cost | $0 | $20-40 | $1-2 | $10-15 |
| 100 MAU | $5-20 | $40-50 | $5-15 | $15-30 |
| 1k MAU | $50-300 | $60-200 | $30-150 | $50-150 |
| Setup time | Lowest | Medium-high | Highest | Low |
| Dan familiarity | Yes | New (CDK) | Most pieces familiar | New |
| GitHub CI/CD | Built in | Manual | Manual | Built in |
| 30s Lambda cap impact | Low (chat fits) | None | Same as Amplify | None |
Layer 2 lean: Amplify for dev velocity + zero idle cost + Dan's familiarity. Fargate is the fallback if Amplify SSR proves limiting.
Layer 3: Ingestion Workers
Ingestion is long-running (30+ min for a novel), so it cannot run in a single Lambda invocation (15-min cap).
Layer 3 lean: Fargate Tasks. Step Functions is for later if Fargate cold-start becomes a UX issue.
Option I1: Fargate Tasks (one-off)
Spawn a Fargate task per ingestion job. Task runs for the duration of the extraction, then exits. Pay per-second for what runs.
Pricing: ~$0.04 per task-hour (0.5 vCPU + 1GB). A 30-min novel ingestion = $0.02. At zero ingestion: $0.
Pros: Zero idle. Same Fargate concept we'd use anyway. Easy to wire from app trigger → RunTask API.
Cons: Startup latency (~30-60s task cold start). Acceptable for ingestion.
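The per-job arithmetic is simple enough to sketch. This uses the ~$0.04 per task-hour figure quoted above; the monthly job count in the example is a hypothetical placeholder, and the cold-start share assumes the 30-60s startup is billed like any other task time.

```python
TASK_HOUR_USD = 0.04  # 0.5 vCPU + 1 GB task, as quoted above


def ingestion_job_cost(runtime_minutes: float,
                       task_hour_usd: float = TASK_HOUR_USD) -> float:
    """Cost of one ingestion task billed per second of runtime."""
    return task_hour_usd * runtime_minutes / 60


def cold_start_share(runtime_minutes: float, cold_start_seconds: float) -> float:
    """Fraction of total job time spent waiting on task startup."""
    total_seconds = runtime_minutes * 60 + cold_start_seconds
    return cold_start_seconds / total_seconds


if __name__ == "__main__":
    print(f"30-min novel: ${ingestion_job_cost(30):.2f}")
    # Hypothetical month of 200 half-hour ingestions.
    print(f"200 jobs/mo: ${200 * ingestion_job_cost(30):.2f}")
    print(f"60s cold start on a 30-min job: {cold_start_share(30, 60):.1%} of wall time")
```

A one-minute startup is a small slice of a half-hour job, which is why the latency is tolerable for ingestion even though it would not be for interactive requests.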
Option I2: Step Functions + Lambda chained
Chain Lambdas via Step Functions, each Lambda processing a chunk of the ingestion. Total flow takes the same time but split across Lambdas under the 15-min cap.
Pricing: Lambda $0.20/M + GB-second + Step Functions transitions $0.025/1k.
Pros: Truly serverless. Pay only for execution.
Cons: Complexity. Need to redesign ingestion as discrete state-machine steps. Inter-Lambda context passing via S3 or DynamoDB. Not worth the complexity at our scale — Fargate Tasks are simpler.
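To make the redesign cost concrete, the chunking that Option I2 requires can be sketched as follows. This is a planning sketch only: the 10-minute per-step budget and the two state transitions per step are hypothetical assumptions, and real chunk boundaries would depend on how the ingestion pipeline can be split.

```python
import math

LAMBDA_CAP_MINUTES = 15          # Lambda's per-invocation limit
TRANSITION_USD_PER_1K = 0.025    # Step Functions rate quoted above


def plan_steps(total_minutes: float, step_budget_minutes: float = 10) -> int:
    """Number of Lambda steps needed if each step stays under the cap."""
    if not 0 < step_budget_minutes < LAMBDA_CAP_MINUTES:
        raise ValueError("step budget must be positive and below the Lambda cap")
    return math.ceil(total_minutes / step_budget_minutes)


def transition_cost(steps: int, transitions_per_step: int = 2) -> float:
    """Step Functions charge for one ingestion run (assumed transitions per step)."""
    return steps * transitions_per_step * TRANSITION_USD_PER_1K / 1000


if __name__ == "__main__":
    steps = plan_steps(45)
    print(f"45-min ingestion -> {steps} steps, ${transition_cost(steps):.6f} in transitions")
```

The transition charge is negligible; the real cost of this option is the engineering work to make each chunk resumable and to pass context between steps.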
Composite Stacks
Combining the layer picks:
Stack A: All-Fargate (D33 baseline)
- L1 MCP: Fargate + ALB
- L2 App: Fargate (sharing ALB)
- L3 Workers: Fargate Tasks
- L4 DB: RDS
- L5 Auth: Cognito
Idle: ~$95/mo (1 Fargate task $20 + ALB $20 + RDS $30 + NAT $35 + misc)
100 MAU: ~$180/mo
1k MAU: ~$500-1200/mo
Pros: Consistency (one runtime for everything), well-understood, no AgentCore newness risk.
Cons: Always-on cost, most operational surface (CDK + Fargate + ALB).
Stack B: AgentCore + Amplify + Fargate Tasks (NEW LEAN)
- L1 MCP: Bedrock AgentCore Runtime
- L2 App: Amplify (GitHub-connected)
- L3 Workers: Fargate Tasks
- L4 DB: RDS
- L5 Auth: Cognito
Idle: ~$35-50/mo (RDS $30 + minimal Amplify/AgentCore; a NAT Gateway, if we keep workers in a VPC, adds ~$35 and narrows the gap to ~$30 below Stack A)
100 MAU: ~$60-90/mo
1k MAU: ~$300-700/mo (significantly cheaper because AgentCore scales to zero between sessions)
Pros: Cheaper at every scale. Less ops surface. Push-to-deploy. Purpose-built MCP host. Dan knows Amplify.
Cons: Two newer services (AgentCore is 2 months old; Amplify v2 is mature but worth verifying our use case fits). Two control planes to learn instead of one. Deeper AWS-only lock-in.
Stack C: AgentCore + Fargate Web + Fargate Tasks
- L1 MCP: AgentCore Runtime
- L2 App: Fargate (no Amplify)
- L3 Workers: Fargate Tasks
- L4 DB: RDS
- L5 Auth: Cognito
Idle: ~$75/mo (Fargate $20 + ALB $20 + RDS $30 + NAT $35; nearly the same line items as Stack A)
100 MAU: ~$120/mo
1k MAU: ~$400-900/mo
Pros: AgentCore savings on MCP side, Fargate consistency on app side.
Cons: Inherits Fargate's always-on cost for the app. AgentCore adds complexity for marginal benefit vs Stack A.
Stack D: Serverless-first (no Fargate at all)
- L1 MCP: AgentCore Runtime
- L2 App: Amplify
- L3 Workers: Step Functions + Lambda (chained, not Fargate Tasks)
- L4 DB: RDS + RDS Proxy (for Lambda connection pooling)
- L5 Auth: Cognito
Idle: ~$30-45/mo (RDS $30 + RDS Proxy $15 + minimal others)
100 MAU: ~$50-80/mo
1k MAU: ~$250-600/mo
Pros: Cheapest. Zero containers. Closest to Dan's brother's serverless vision.
Cons: Step Functions complexity for ingestion (redesign needed). Adds RDS Proxy. Less control. If a Lambda step in ingestion fails, recovery is harder than restarting a single crashed Fargate Task.
Stack comparison
| | Stack A (D33) | Stack B (new lean) | Stack C | Stack D (serverless) |
|---|---|---|---|---|
| Idle | $95/mo | $35-50/mo | $75/mo | $30-45/mo |
| 100 MAU | $180/mo | $60-90/mo | $120/mo | $50-80/mo |
| 1k MAU | $500-1200/mo | $300-700/mo | $400-900/mo | $250-600/mo |
| Setup time | 5-7 days | 3-5 days | 5-7 days | 7-10 days |
| Ops surface | High | Medium | Medium-high | Lowest |
| Dan familiarity | New | Mostly familiar | New | Least familiar |
| Migration cost out | Container is portable | AgentCore is AWS-only | Mixed | Step Functions = redesign work |
| Risk | Low (known) | Medium (AgentCore is 2 months old) | Medium | High (most components newer to us) |
Recommendation
New lean: Stack B (AgentCore Runtime + Amplify + Fargate Tasks).
Why this beats Stack A:
- ~$45-60/mo less at idle, ~$90-120/mo less at 100 MAU (per the stack comparison table). Compounds over the months we'll be in beta.
- AgentCore Runtime is purpose-built for our MCP workload. Idle-free, OAuth-native, session-aware. We were going to roll the equivalent on Fargate manually.
- Amplify gives Dan a familiar push-to-deploy path without writing CDK for the app side. CDK only needed for AgentCore, RDS, Cognito, networking.
- Container is still in the picture (Fargate Tasks for ingestion, AgentCore Runtime runs containers). Not abandoning the container ecosystem — just not running them 24/7.
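The savings figures follow directly from the stack comparison table. A minimal sketch that subtracts the quoted ranges for Stack A and Stack B; it takes the table values at face value and treats each range's endpoints as best and worst cases, which is an assumption about how the estimates were meant.

```python
# (low, high) monthly cost in USD, copied from the stack comparison table.
STACK_COSTS = {
    "A": {"idle": (95, 95), "100 MAU": (180, 180), "1k MAU": (500, 1200)},
    "B": {"idle": (35, 50), "100 MAU": (60, 90), "1k MAU": (300, 700)},
}


def savings_range(baseline: tuple[int, int], candidate: tuple[int, int]) -> tuple[int, int]:
    """(min, max) monthly savings; a negative minimum means the candidate
    could cost more than the baseline at that tier."""
    return (baseline[0] - candidate[1], baseline[1] - candidate[0])


SAVINGS = {
    tier: savings_range(STACK_COSTS["A"][tier], STACK_COSTS["B"][tier])
    for tier in STACK_COSTS["A"]
}

if __name__ == "__main__":
    for tier, (low, high) in SAVINGS.items():
        print(f"{tier:>8}: ${low} to ${high}/mo saved by Stack B")
```

The quoted ranges overlap at 1k MAU, so the size of the gap at that tier depends on where each stack actually lands within its range; the idle and 100 MAU savings are unambiguous.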
What we'd want to verify before committing:
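- AgentCore Runtime transport support (Streamable HTTP vs HTTP+SSE). Resolved 2026-05-19: Streamable HTTP is required; legacy HTTP+SSE is not supported.
- AgentCore cold-start latency on first request to a fresh microVM. Resolved 2026-05-19: sub-second from the warm pool, 2-5s beyond it; acceptable for an interactive MCP client at our scale.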
- Amplify SSR + Cognito flow — does the JWT validation chain work cleanly with Cognito hosted UI?
- NAT Gateway necessity — can we avoid it entirely? RDS connections from AgentCore/Amplify/Fargate Tasks all need VPC connectivity OR public-accessible RDS with strict security groups. Want to confirm the cheapest viable network topology.
Recommended path to lock the decision:
- Day 0-1: spike Stack B locally + minimum AWS POC — deploy a "hello world" MCP server to AgentCore Runtime, connect Amplify to a dummy Next.js repo, verify the layers talk. ~4-6 hours of work.
- If spike works: lock Stack B, update D33 to reflect, proceed with the (reshaped local-first) beta sprint.
- If spike reveals AgentCore blockers: fall back to Stack A. We don't lose much time.
Open questions
Updated 2026-05-19 — most prior open questions resolved via subagent research + review threads. Remaining items go into the Day 0 spike checklist.
Resolved (no longer open):
- Wait to commit until POC AgentCore? → YES. Day 0 spike before locking. Stack A is the fallback if AgentCore has a blocker.
- Multi-region readiness from day one? → NO. us-east-1 only for beta, likely year 1. Multi-region triggered by first paying customer with a latency/compliance need.
- Activate Founders credits offset? → Post-beta: defer application until landing page is real. $1k credits will offset ~20-28 months of Stack B idle (at $35-50/mo).
- AgentCore transport (SSE vs Streamable HTTP)? → Streamable HTTP required. Our `mcp-servers/doc-search` adapter targets MCP spec 2025-03-26+.
- AgentCore cold-start latency? → Sub-second from warm pool of 10 microVMs per endpoint; beyond the pool, 2-5s for container deploys and 2-3s for code deploys. Fine for our scale.
- Session lifecycle limits? → 15 min idle, 8h max compute, logical session persists across microVM swaps. 1k concurrent sessions per account in us-east-1, 100 TPM new-session rate. Generous for beta.
Remaining items — Day 0 spike checklist:
- Claude Desktop reconnect behavior at the 8h compute boundary. AWS docs describe server-side semantics cleanly; client behavior is empirical. Test: open a Claude Desktop MCP session against AgentCore, wait through the 8h boundary, verify graceful reconnect with same `Mcp-Session-Id`. (Alternative empirical: force a microVM swap via re-deploy and observe client behavior.)
- Cognito SSO state across `app.autri.ai` and `mcp.autri.ai`. OAuth flow: user authenticates on `app.autri.ai`, generates a connector, the OAuth token validates against `mcp.autri.ai`'s resource server. 30-min end-to-end test in the spike.
- Domain routing topology. Cloudflare DNS → AWS endpoints with ACM certs for `app.autri.ai` (Amplify) and `mcp.autri.ai` (AgentCore). Apex `autri.ai` reserved for landing page (separate, deferred).
- Cost telemetry wiring. CloudWatch dashboards + Cost Explorer tags (`project=autri env=beta cost-bucket=<layer>`) + AWS Budgets alerts at $50/$100/$200 thresholds + Cost Anomaly Detection daily emails. Verify all surfaces give us real-time visibility before any external beta user signs up.
- AgentCore Runtime billing visibility specifically. Per-session vCPU-seconds + memory-GB-seconds in CloudWatch: confirm we can see per-session costs, not just aggregate.
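The tagging scheme and budget thresholds from the checklist can be captured as plain data before wiring them into AWS. This sketch deliberately avoids any AWS SDK calls; the `LAYERS` values are hypothetical names for the `<layer>` placeholder, and how the tags and alerts get applied (CDK, console, or CLI) is left open.

```python
BUDGET_THRESHOLDS_USD = (50, 100, 200)  # alert levels from the checklist above

# Hypothetical values for the cost-bucket=<layer> tag, one per architectural layer.
LAYERS = ("mcp", "web", "ingestion", "db", "auth")


def cost_tags(layer: str) -> dict[str, str]:
    """Build the tag set for a resource in the given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return {"project": "autri", "env": "beta", "cost-bucket": layer}


def crossed_thresholds(month_to_date_usd: float) -> list[int]:
    """Budget alert levels that the current month-to-date spend has reached."""
    return [t for t in BUDGET_THRESHOLDS_USD if month_to_date_usd >= t]


if __name__ == "__main__":
    print(cost_tags("mcp"))
    print(crossed_thresholds(120.0))
```

Keeping the layer names in one place makes it easier to check during the spike that every resource carries a `cost-bucket` tag that Cost Explorer can group by.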
Deliverable from Day 0 spike: half-page spike notes documenting findings, with explicit "Stack B locked" or "Fallback to Stack A — blocker is X" decision. Goes in this doc's appendix or becomes its own short doc.
What this doc does NOT decide
- IaC tool (CDK vs Terraform vs raw CloudFormation). Separate decision; lean CDK for TypeScript continuity with the app codebase.
- Observability stack (CloudWatch vs Datadog vs others). Defer until beta produces a real monitoring need.
- Backup / DR strategy. Beta-scale: RDS automated backups + S3 versioning is enough. Production-scale: revisit.
- Multi-tenancy isolation patterns (single-DB vs DB-per-tenant). Per D13, single-DB with RLS is the chosen pattern. Not affected by infra choice.
Sources
- AWS What's New — AgentCore Runtime stateful MCP (Mar 2026)
- Bedrock AgentCore Runtime FAQs
- AgentCore Runtime pricing
- AWS Solutions Guidance: Deploying MCP Servers on AWS
- AWS What's New — AWS MCP Server GA (May 2026)
- AWS Amplify hosting docs
- AWS App Runner pricing
- re:Invent 2025 — AgentCore Gateway session recap