
Autri — Roadmap

High-level priorities and order. This sequences the work that delivers the objectives in the North Star (B1–B7). It is deliberately not a story-level breakdown: there are too many unknowns, so designs happen as each epic opens ("scope before building, design while building"). Read alongside the North Star (the why) and decisions.md (the active calls).

Created 2026-06-06 from the north-star coverage red-team. Horizons are ordered; epics within a horizon are roughly ordered but flexible.


How to read this

  • Each epic is tagged with the objective(s) it serves (→ B1…B7), a rough size (S / M / L), and a build-style:
    • 🧱 Foundation — design-heavy, high-unknown, touches shared architecture. Human-led, sequential. Risky to fan out to parallel agents.
    • 🔀 Parallelizable — well-specified, independent, strong verification available. Good /hl:ship-wave or ultracode candidate once specified.
  • The build-style tag is also the "is this safe to fan out with agents?" tag. The gate on safe fan-out is design-certainty, not tooling — we already have the parallel-agent pipeline + eval/QA gates.

The critical path (the spine everything hangs on)

Current state (verified 2026-06-06): the web app is already deployed and feature-complete — live at app.autri.ai, real Cognito + Google auth with an email allowlist (curated beta gating), multi-tenancy enforced in code across read/mutation/query surfaces, and upload → inspect → chat → cost-display all working. This corrects the earlier draft (and the red-team's "nothing deployed" finding), which were based on a stale CLAUDE.md. Deploy + auth exist.

So the customer beta is a deploy-current-main → verify → invite → measure effort, not a foundation build. Everything cloud-facing (API, billing, integration) builds on top of a live platform.

[deploy current main + reconcile CDK drift]   ← small; also lands the last tenancy-leak fix
        ▼
[verify tenancy + no-silent-loss in prod]  →  invite the STEM Racing cohort
        ▼
   real beta usage → cost + quality + behavior data
        ▼
   monetize (pricing/billing)  +  API/key-auth (integration & dogfood)
        ▼
   enterprise (silo, compliance, SSO)  +  substrate (public API, SDK, agents)

The real remaining "foundation" is narrow: verify tenancy in prod, reliability/monitoring, and (for enterprise, later) SSO + silo. Not a deploy+auth lift.

Horizon 0 — Make the beta web app genuinely good (mostly in hand)

Corrected 2026-06-06: since the app is already live, these items are quality/fast-follow work that runs alongside or just after the beta launch; they do not block it. The one exception, source view, is cohort-gated and has moved to Horizon 1. Most are parallelizable.

  • Batch ingestion build → B2. M · 🧱→🔀 Validated (~50% cheaper, ~2× faster for LLM-routed docs); epic drafted, needs red-team then build. COGS win that matters most at scale — fast-follow, not launch-blocking. Also serves speed-to-trust (faster first ingest).
  • Verifiable-answers loop + corrective action → B1. M · 🔀 Wire citation → source → bbox end-to-end on the answer path (verify what already exists first), plus a fix-path when an answer/citation is clearly wrong.
  • Prose keywords / FTS enrichment → B1. S–M · 🔀 During grouping, emit per-chunk keywords + header → prose's first real 3-index hybrid. Nearly-free output; eval-testable before any prod spend. (Lever B.)
  • Quality across more doc types → B1. M · 🔀 Tables/figures recall; investigate the "PDF scripture routed to LLM not the deterministic verse path" surprise (possibly a bigger verse win).
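
The "3-index hybrid" in the prose keywords bullet implies merging ranked results from several indexes. The roadmap does not say which three indexes are meant or how they would be fused, so the following is only a sketch under assumptions: it uses reciprocal rank fusion (a common, generic technique, not a confirmed design choice), and the index names and chunk ids are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of chunk ids into one ranking.

    k is a damping constant; 60 is a conventional default and an
    assumption here, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each index contributes more for chunks it ranks higher.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical hits from three indexes over the same chunks.
vector_hits = ["c3", "c1", "c7"]
body_fts_hits = ["c1", "c3", "c9"]
keyword_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, body_fts_hits, keyword_hits])
print(fused)
```

Because fusion works on ranks rather than raw scores, a new keyword index can be added or removed in an eval run without rescaling the other indexes, which fits the "eval-testable before any prod spend" note.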

Horizon 1 — Deploy the beta (THE chokepoint · foundation · human-led)

Corrected 2026-06-06: the app is already deployed and feature-complete (see Current State above), so this is shipping current main + verifying the risky things hold in prod + inviting real users — not a foundation build.

  • Deploy current main + reconcile CDK drift → B4, B2. S–M · 🧱 Main is ~45 commits ahead of prod. Deploying ships cost-observability AND lands the query-playground tenancy enforcement that's committed-but-not-deployed (D13's last read-leak surface) — so it's partly a security action. Reconcile the cert-export + AgentCore-runtime drift (D56) first; diff deployed-vs-main; prune Docker.
  • Verify multi-tenant isolation in prod with ≥2 real users → B4. S · 🧱 Enforced in code, never exercised with real multiple users. A cross-org leak in a trust-first beta is catastrophic — verify end-to-end before the cohort. Highest real risk in the whole plan.
  • Confirm no silent data loss + basic monitoring/backups → B7. S · 🧱 A failed ingest must surface to the user, not vanish; real user data needs monitoring + backups.
  • Real worker cost verification → B2. S · 🔀 One real ingest through the deployed worker to confirm cost columns populate (the "measure first" premise). Lands once main deploys.
  • (Cohort-gated) Source view for all doc types → B1. S · 🔀 Only if the first cohort brings non-PDF docs — STEM Racing = PDFs (inspector works fully); prose/docx users hit a dead source pane.
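
The isolation check in the second bullet above can be phrased as a small two-org probe: each org lists its own documents, then tries to fetch the other org's documents by id. The sketch below runs against an in-memory stand-in; `FakeStore`, `list_documents`, and `get_document` are hypothetical names for illustration, not the deployed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    org_id: str


class FakeStore:
    """In-memory stand-in for the deployed API (hypothetical)."""

    def __init__(self, docs):
        self._docs = list(docs)

    def list_documents(self, caller_org_id):
        # Tenancy filter: a caller only sees documents from its own org.
        return [d for d in self._docs if d.org_id == caller_org_id]

    def get_document(self, caller_org_id, doc_id):
        for d in self._docs:
            if d.doc_id == doc_id and d.org_id == caller_org_id:
                return d
        return None  # Same answer for "missing" and "not yours".


def check_isolation(store, org_a, org_b):
    """Return a list of leak descriptions; empty means no leak was found."""
    leaks = []
    docs_a = store.list_documents(org_a)
    docs_b = store.list_documents(org_b)
    for caller, own, others in ((org_a, docs_a, docs_b), (org_b, docs_b, docs_a)):
        for d in own:
            if d.org_id != caller:
                leaks.append(f"{caller} listed {d.doc_id} owned by {d.org_id}")
        for d in others:
            # Direct fetch by id must not cross the org boundary.
            if store.get_document(caller, d.doc_id) is not None:
                leaks.append(f"{caller} fetched {d.doc_id} owned by {d.org_id}")
    return leaks


store = FakeStore([Document("d1", "org-a"), Document("d2", "org-b")])
leaks = check_isolation(store, "org-a", "org-b")
print(leaks)
```

In prod the same probe would be pointed at two real accounts in different orgs and extended to the mutation and query surfaces; an empty list only shows that this particular probe found nothing.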

Beta launches here. Then Horizon 2 unblocks on real usage data. Success metric per D56: do users trust + use the inspector / KB / chat.

Horizon 2 — Learn from beta, then monetize + open the API

Unlocked by real beta data. Pricing depends on knowing cost + quality (B1+B2). The API is the integration path AND the dogfood harness.

  • Cost + quality + margin measurement at beta scale → B2, B5. S–M · 🔀 Real per-doc/query cost; validate the 70% blended target; stand up margin-per-customer + never-negative-floor monitoring. Unblocks pricing.
  • Pricing · metering · billing → B5. L · 🧱→🔀 Usage metering (chunk caps, query limits, fair-use), tier enforcement, Stripe, overage/dunning, customer-facing usage visibility.
  • Authenticated REST API + API-key auth → B3, B6. M · 🧱 The decided path (D56 amendment): clean API + static API-key auth, sidestepping the OAuth/DCR/resource-indicator pain. Foundational and reusable by the web app — never throwaway.
    • 🐕 Dogfood track (the API epic's acceptance test): ingest our own corpus — Foundry docs + decisions.md/CLAUDE.md/next.md (a superset Foundry can't even index) — into autri and point dev sub-agents at the API.
      • A/B scorecard: autri retrieval vs Foundry search on the same questions over the same corpus. Foundry search measured weak (a pointed "why defer MCP?" query missed D56 entirely). Crisp, eval-harness-able success metric, and we're the harshest judges. First real consumer of B6, zero external risk.
      • Early Foundry→autri webhook (pull a thin slice of connectors forward): write a doc → it lands in autri automatically → instantly queryable. Makes the dogfood live. Depends on: the API + a deployed ingest endpoint, so it's early-in-H2, not before the API exists.
    • Per-host plugin glue → B3. S · 🔀 Claude Project / GPT Action (OpenAPI) / optional thin static-token MCP server. Agents need this to know how to call the API. One per host; independent.
  • Live data connectors + freshness (customer-facing) → B3, B1. L · 🧱 Connect to where customer data lives (SharePoint, OneDrive, Google Drive); change detection (webhook vs poll); auto re-ingest; deletion propagation. Big epic-cluster — the Foundry webhook above is our internal first slice of it. May extend into Horizon 3.
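
The margin monitoring named in the first bullet of this horizon comes down to simple arithmetic per customer and across the cohort. The sketch below assumes the "70% blended target" is a blended gross-margin target, which the roadmap does not state outright; the customer names and figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CustomerUsage:
    customer: str
    revenue: float  # billed amount for the period
    cogs: float     # measured ingest + query cost for the period


def margin(revenue, cogs):
    """Gross margin as a fraction of revenue; None when there is no revenue."""
    if revenue <= 0:
        return None
    return (revenue - cogs) / revenue


def margin_report(usages, blended_target=0.70):
    # blended_target=0.70 is an assumed reading of the "70% blended target".
    total_revenue = sum(u.revenue for u in usages)
    total_cogs = sum(u.cogs for u in usages)
    blended = margin(total_revenue, total_cogs)
    # Never-negative floor: flag any customer whose cost exceeds revenue.
    below_floor = [u.customer for u in usages if u.cogs > u.revenue]
    return {
        "blended_margin": blended,
        "meets_target": blended is not None and blended >= blended_target,
        "below_floor": below_floor,
    }


usages = [
    CustomerUsage("acme", revenue=100.0, cogs=20.0),
    CustomerUsage("globex", revenue=50.0, cogs=60.0),
    CustomerUsage("initech", revenue=200.0, cogs=40.0),
]
report = margin_report(usages)
print(report)
```

Reporting the blended figure and the per-customer floor separately matters because a healthy blended margin can hide a single customer who costs more than they pay.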

Horizon 3 — Enterprise & substrate

The long game. Each item gates on its Horizon-1/2 foundation.

  • Silo deploy (parameterized IaC) → B4. L · 🧱 A dedicated enterprise deployment as config, not a fork.
  • Enterprise compliance → B4. L · 🧱 SOC2 posture, audit logs, data-residency enforcement, IP allowlisting, pen-testing / vuln management.
  • Reliability hardening + SLAs → B7. M–L · 🧱 Autoscaling, disaster recovery, incident response, and the contractual SLA that pricing already promises.
  • Full MCP OAuth (WorkOS Connect) → B3. L · 🧱 Only when an enterprise customer requires SSO (per D56 amendment) — deferred, possibly a long while.
  • Substrate productization → B6. L · 🧱→🔀 Public versioned API, SDK, developer keys, the agent-layer framework QuoteAI-style products sit on. API-usage metering for third-party consumers.
  • Broad source-format coverage (continuing) → B1. 🔀 Spreadsheets, slides, HTML, images/OCR, transcripts — prioritized as demand surfaces. Strong fan-out candidate: N independent parsers with eval-gated verification.

Sequencing principles (the "why this order")

  1. Quality you can ship without deploying comes cheap, so do it while the beta deploy is being verified. Horizon 0 runs alongside Horizon 1, not before it.
  2. Deploying and verifying current main is the chokepoint, so start it now. Deploy + auth already exist (see Current state), so this is a small lift rather than the longest pole, but B3/B5/B6/B7 all wait on a verified prod. Delaying it delays everything downstream.
  3. Don't price before you can measure. B5 waits on real B1+B2 data from beta. ("Measure first.")
  4. API-key over OAuth; dogfood before strangers. The API advances B3 and B6 at once and gets validated internally on our own docs before any external consumer.
  5. Enterprise and substrate are earned, not front-loaded. They follow validated product + cost + integration, not precede them.
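
Principle 4's static API-key auth can be illustrated with a short sketch. Everything here is an assumption for illustration: the `ak_` prefix, the `Authorization: Bearer` header, and the in-memory key table are hypothetical, not the decided design in D56. It stores only a hash of each key, maps keys to an org so tenancy still applies, and compares hashes in constant time.

```python
import hashlib
import hmac
import secrets


def hash_key(key):
    # Store only a hash so a leaked table does not leak usable keys.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def new_api_key():
    """Return (key shown once to the caller, hash the server stores)."""
    key = "ak_" + secrets.token_urlsafe(32)  # "ak_" prefix is hypothetical
    return key, hash_key(key)


def authenticate(headers, key_table):
    """Return the org id for a valid 'Authorization: Bearer <key>' header, else None."""
    auth = headers.get("Authorization", "")
    scheme, _, presented = auth.partition(" ")
    if scheme != "Bearer" or not presented:
        return None
    presented_hash = hash_key(presented)
    for stored_hash, org_id in key_table.items():
        # Constant-time comparison avoids leaking information via timing.
        if hmac.compare_digest(presented_hash, stored_hash):
            return org_id
    return None


key, key_hash = new_api_key()
key_table = {key_hash: "org-a"}  # hypothetical store: key hash -> org id

ok = authenticate({"Authorization": f"Bearer {key}"}, key_table)
bad = authenticate({"Authorization": "Bearer wrong"}, key_table)
print(ok, bad)
```

Resolving a key to an org id at the edge means every downstream handler can reuse the same tenancy checks the web app already enforces, which is what makes the API path reusable rather than throwaway.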

Open questions / to resolve as horizons open

  • What's the minimum credible beta? Which Horizon-0 items are truly required before deploy vs. fast-follow?
  • Corrective-action mechanics — what actually happens when a user flags a wrong answer/citation? (Design when the verifiable-answers epic opens.)
  • Connectors scope — which source first (Foundry for dogfood? SharePoint for enterprise?), webhook vs poll, deletion semantics.
  • The full objective → epic → feature → story breakdown — per epic, when that epic opens, not now.
