Autri — Roadmap
High-level priorities and order. This sequences the work that delivers the objectives in North Star (B1–B7). Deliberately not a story-level breakdown: there are too many unknowns, so designs happen as each epic opens ("scope before building, design while building"). Read alongside the North Star (the why) and decisions.md (the active calls).
Created 2026-06-06 from the north-star coverage red-team. Horizons are ordered; epics within a horizon are roughly ordered but flexible.
How to read this
- Each epic is tagged with the objective(s) it serves (→ B1…B7), a rough size (S / M / L), and a build-style:
  - 🧱 Foundation: design-heavy, high-unknown, touches shared architecture. Human-led, sequential. Risky to fan out to parallel agents.
  - 🔀 Parallelizable: well-specified, independent, strong verification available. Good /hl:ship-wave or ultracode candidate once specified.
- The build-style tag is also the "is this safe to fan out with agents?" tag. The gate on safe fan-out is design-certainty, not tooling — we already have the parallel-agent pipeline + eval/QA gates.
The critical path (the spine everything hangs on)
Current state (verified 2026-06-06): the web app is already deployed and feature-complete — live at app.autri.ai, real Cognito + Google auth with an email allowlist (curated beta gating), multi-tenancy enforced in code across read/mutation/query surfaces, and upload → inspect → chat → cost-display all working. This corrects the earlier draft (and the red-team's "nothing deployed" finding), which were based on a stale CLAUDE.md. Deploy + auth exist.
So the customer beta is a deploy-current-main → verify → invite → measure effort, not a foundation build. Everything cloud-facing (API, billing, integration) builds on top of a live platform.
[deploy current main + reconcile CDK drift] ← small; also lands the last tenancy-leak fix
▼
[verify tenancy + no-silent-loss in prod] → invite the STEM Racing cohort
▼
real beta usage → cost + quality + behavior data
▼
monetize (pricing/billing) + API/key-auth (integration & dogfood)
▼
enterprise (silo, compliance, SSO) + substrate (public API, SDK, agents)
The real remaining "foundation" is narrow: verify tenancy in prod, reliability/monitoring, and (for enterprise, later) SSO + silo. Not a deploy+auth lift.
Horizon 0 — Make the beta web app genuinely good (mostly in hand)
Corrected 2026-06-06: since the app is already live, these are quality and fast-follow items that run alongside or just after the beta launch; they do not block it (except source-view, which is cohort-gated and moved to Horizon 1). Most are parallelizable.
- Batch ingestion build → B2. M · 🧱→🔀 Validated (~50% cheaper, ~2× faster for LLM-routed docs); epic drafted, needs red-team then build. COGS win that matters most at scale — fast-follow, not launch-blocking. Also serves speed-to-trust (faster first ingest).
- Verifiable-answers loop + corrective action → B1. M · 🔀 Wire citation → source → bbox end-to-end on the answer path (verify what already exists first), plus a fix-path when an answer/citation is clearly wrong.
- Prose keywords / FTS enrichment → B1. S–M · 🔀 During grouping, emit per-chunk keywords + header → prose's first real 3-index hybrid. Nearly-free output; eval-testable before any prod spend. (Lever B.)
- Quality across more doc types → B1. M · 🔀 Tables/figures recall; investigate the "PDF scripture routed to LLM not the deterministic verse path" surprise (possibly a bigger verse win).
Horizon 1 — Deploy the beta (THE chokepoint · foundation · human-led)
Corrected 2026-06-06: the app is already deployed and feature-complete (see Current State above), so this is shipping current main + verifying the risky things hold in prod + inviting real users — not a foundation build.
- Deploy current main + reconcile CDK drift → B4, B2. S–M · 🧱 Main is ~45 commits ahead of prod. Deploying ships cost-observability AND lands the query-playground tenancy enforcement that's committed-but-not-deployed (D13's last read-leak surface) — so it's partly a security action. Reconcile the cert-export + AgentCore-runtime drift (D56) first; diff deployed-vs-main; prune Docker.
- Verify multi-tenant isolation in prod with ≥2 real users → B4. S · 🧱 Enforced in code, never exercised with real multiple users. A cross-org leak in a trust-first beta is catastrophic — verify end-to-end before the cohort. Highest real risk in the whole plan.
- Confirm no silent data loss + basic monitoring/backups → B7. S · 🧱 A failed ingest must surface to the user, not vanish; real user data needs monitoring + backups.
- Real worker cost verification → B2. S · 🔀 One real ingest through the deployed worker to confirm cost columns populate (the "measure first" premise). Lands once main deploys.
- (Cohort-gated) Source view for all doc types → B1. S · 🔀 Only if the first cohort brings non-PDF docs — STEM Racing = PDFs (inspector works fully); prose/docx users hit a dead source pane.
→ Beta launches here. Then Horizon 2 unblocks on real usage data. Success metric per D56: do users trust + use the inspector / KB / chat.
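The two-user isolation check in Horizon 1 could be expressed as a small set of assertions. The sketch below is illustrative only: `DocStore`, `list_docs`, `get_doc`, and `find_leaks` are hypothetical names standing in for autri's real read surfaces, and the store is in-memory. In prod, the same assertions would run against the deployed endpoints with two real accounts in different orgs.

```python
from dataclasses import dataclass, field


@dataclass
class DocStore:
    """Hypothetical stand-in for a tenant-scoped document store."""
    _docs: dict = field(default_factory=dict)  # doc_id -> (org_id, title)

    def add_doc(self, org_id: str, doc_id: str, title: str) -> None:
        self._docs[doc_id] = (org_id, title)

    def list_docs(self, org_id: str) -> list:
        # Every listing is filtered by the caller's org.
        return sorted(d for d, (o, _) in self._docs.items() if o == org_id)

    def get_doc(self, org_id: str, doc_id: str):
        # Return None instead of the record when the org does not match.
        rec = self._docs.get(doc_id)
        if rec is None or rec[0] != org_id:
            return None
        return rec[1]


def find_leaks(store: DocStore, viewer_org: str, owner_org: str) -> list:
    """Return owner_org doc ids that viewer_org can see (should be empty)."""
    visible = set(store.list_docs(viewer_org))
    leaks = []
    for doc_id in store.list_docs(owner_org):
        if doc_id in visible or store.get_doc(viewer_org, doc_id) is not None:
            leaks.append(doc_id)
    return leaks
```

Running `find_leaks` in both directions (A viewing B, then B viewing A) covers list and direct-fetch paths; the mutation and query surfaces named in the current-state note would need their own equivalent checks.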
Horizon 2 — Learn from beta, then monetize + open the API
Unlocked by real beta data. Pricing depends on knowing cost + quality (B1+B2). The API is the integration path AND the dogfood harness.
- Cost + quality + margin measurement at beta scale → B2, B5. S–M · 🔀 Real per-doc/query cost; validate the 70% blended target; stand up margin-per-customer + never-negative-floor monitoring. Unblocks pricing.
- Pricing · metering · billing → B5. L · 🧱→🔀 Usage metering (chunk caps, query limits, fair-use), tier enforcement, Stripe, overage/dunning, customer-facing usage visibility.
- Authenticated REST API + API-key auth → B3, B6. M · 🧱 The decided path (D56 amendment): clean API + static API-key auth, sidestepping the OAuth/DCR/resource-indicator pain. Foundational and reusable by the web app — never throwaway.
  - 🐕 Dogfood track (the API epic's acceptance test): ingest our own corpus (Foundry docs + decisions.md / CLAUDE.md / next.md, a superset Foundry can't even index) into autri and point dev sub-agents at the API.
    - A/B scorecard: autri retrieval vs Foundry search on the same questions over the same corpus. Foundry search measured weak (a pointed "why defer MCP?" query missed D56 entirely). Crisp, eval-harness-able success metric, and we're the harshest judges. First real consumer of B6, zero external risk.
    - Early Foundry→autri webhook (pull a thin slice of connectors forward): write a doc → it lands in autri automatically → instantly queryable. Makes the dogfood live. Depends on: the API + a deployed ingest endpoint, so it's early-in-H2, not before the API exists.
- Per-host plugin glue → B3. S · 🔀 Claude Project / GPT Action (OpenAPI) / optional thin static-token MCP server. Agents need this to know how to call the API. One per host; independent.
- Live data connectors + freshness (customer-facing) → B3, B1. L · 🧱 Connect to where customer data lives (SharePoint, OneDrive, Google Drive); change detection (webhook vs poll); auto re-ingest; deletion propagation. Big epic-cluster — the Foundry webhook above is our internal first slice of it. May extend into Horizon 3.
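As a rough illustration of the static API-key path chosen for the REST API epic above, the sketch below issues a key, keeps only its hash, and resolves a presented key to an org. Everything here is an assumption for illustration: the `ApiKeyRegistry` class, the `ak_` prefix, the `Bearer` header scheme, and the in-memory dict are hypothetical and not autri's actual design.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


def hash_key(raw_key: str) -> str:
    """SHA-256 hex digest of the raw key; only this value is stored."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class ApiKeyRegistry:
    """Hypothetical registry mapping key hashes to an org id."""

    def __init__(self) -> None:
        self._org_by_hash: Dict[str, str] = {}

    def issue(self, org_id: str) -> str:
        raw = "ak_" + secrets.token_urlsafe(32)
        self._org_by_hash[hash_key(raw)] = org_id
        return raw  # shown to the caller once; only the hash is kept

    def revoke(self, raw_key: str) -> None:
        self._org_by_hash.pop(hash_key(raw_key), None)

    def authenticate(self, authorization_header: Optional[str]) -> Optional[str]:
        """Return the org id for a valid 'Bearer <key>' header, else None."""
        if not authorization_header:
            return None
        scheme, _, presented = authorization_header.partition(" ")
        if scheme.lower() != "bearer" or not presented.strip():
            return None
        candidate = hash_key(presented.strip())
        for stored_hash, org_id in self._org_by_hash.items():
            # Constant-time comparison of the two hex digests.
            if hmac.compare_digest(stored_hash, candidate):
                return org_id
        return None
```

The returned org id is what every downstream read or mutation would be scoped by, so the same tenancy enforcement the web app relies on applies to API callers.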
Horizon 3 — Enterprise & substrate
The long game. Each item gates on its Horizon-1/2 foundation.
- Silo deploy (parameterized IaC) → B4. L · 🧱 A dedicated enterprise deployment as config, not a fork.
- Enterprise compliance → B4. L · 🧱 SOC2 posture, audit logs, data-residency enforcement, IP allowlisting, pen-testing / vuln management.
- Reliability hardening + SLAs → B7. M–L · 🧱 Autoscaling, disaster recovery, incident response, and the contractual SLA that pricing already promises.
- Full MCP OAuth (WorkOS Connect) → B3. L · 🧱 Only when an enterprise customer requires SSO (per D56 amendment) — deferred, possibly a long while.
- Substrate productization → B6. L · 🧱→🔀 Public versioned API, SDK, developer keys, the agent-layer framework QuoteAI-style products sit on. API-usage metering for third-party consumers.
- Broad source-format coverage (continuing) → B1. 🔀 Spreadsheets, slides, HTML, images/OCR, transcripts — prioritized as demand surfaces. Strong fan-out candidate: N independent parsers with eval-gated verification.
Sequencing principles (the "why this order")
- Quality you can ship without deploying comes cheap — do it while the foundation is built. Horizon 0 runs alongside Horizon 1, not before it.
- Deploying current main and verifying it in prod is the chokepoint; start it now. Deploy + auth already exist (see Current state), but B3/B5/B6/B7 all wait on a verified, live beta. Delaying it delays everything downstream.
- Don't price before you can measure. B5 waits on real B1+B2 data from beta. ("Measure first.")
- API-key over OAuth; dogfood before strangers. The API advances B3 and B6 at once and gets validated internally on our own docs before any external consumer.
- Enterprise and substrate are earned, not front-loaded. They follow validated product + cost + integration, not precede them.
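The "measure first" principle, and the margin monitoring it gates in Horizon 2, comes down to simple arithmetic once real cost data exists. The sketch below assumes the 70% blended target refers to gross margin as a fraction of revenue, which the roadmap does not state explicitly; `CustomerUsage`, `margin`, and `margin_report` are hypothetical names, and the figures in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerUsage:
    customer: str
    revenue_usd: float  # what the customer paid in the period
    cost_usd: float     # measured ingest + query cost in the period


def margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    if revenue <= 0:
        # No revenue: neutral if nothing was spent, otherwise unboundedly bad.
        return 0.0 if cost == 0 else float("-inf")
    return (revenue - cost) / revenue


def margin_report(usages, blended_target: float = 0.70) -> dict:
    """Blended margin across all customers plus a never-negative floor check."""
    total_rev = sum(u.revenue_usd for u in usages)
    total_cost = sum(u.cost_usd for u in usages)
    per_customer = {u.customer: margin(u.revenue_usd, u.cost_usd) for u in usages}
    blended = margin(total_rev, total_cost)
    return {
        "blended_margin": blended,
        "meets_target": blended >= blended_target,
        "below_floor": sorted(c for c, m in per_customer.items() if m < 0.0),
        "per_customer": per_customer,
    }
```

For example, a customer paying 100 USD with 20 USD of cost and another paying 50 USD with 60 USD of cost give a blended margin of 70 / 150, roughly 0.47: below the assumed target, with the second customer flagged under the floor.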
Open questions / to resolve as horizons open
- What's the minimum credible beta? Which Horizon-0 items are truly required before deploy vs. fast-follow?
- Corrective-action mechanics — what actually happens when a user flags a wrong answer/citation? (Design when the verifiable-answers epic opens.)
- Connectors scope — which source first (Foundry for dogfood? SharePoint for enterprise?), webhook vs poll, deletion semantics.
- The full objective → epic → feature → story breakdown — per epic, when that epic opens, not now.