Autri — Functional MVP Spec
Created 2026-05-15. Scopes the Functional MVP — the smallest version of Autri that validates "is this product worth shipping?" through dogfooding by Dan and a hand-onboarded test cohort (STEM Racing kids + Dan's dad). Local-only, no commerce. The Deployed MVP (auth, onboarding for strangers, AWS deploy, Stripe, public landing) is a separate doc, deferred until Functional MVP validates.
Refined iteratively; once each H2 is tight, epic docs spin out per item.
The 5 MVP items
Each H2 below gets refined (design questions, dependencies, implementation direction) before spinning out to its own epic doc.
Goal
Validate the product locally — to Dan first, then to a hand-onboarded test cohort — before any commerce wrap. The Brehob deal outcome (post-Andy-meeting 2026-05-15) funds the commerce wrap if it lands; either way, Functional MVP comes first. A chargeable shell wrapped around a mediocre product is worse than a great product without billing.
Two MVPs
Autri ships in two distinct MVPs stacked on each other:
| | Functional MVP (this doc) | Deployed MVP (separate doc, deferred) |
|---|---|---|
| Question it answers | Is this product worth shipping? | Can we charge for it? |
| Audience | Dan + hand-onboarded test cohort | First paying customers |
| Validation surface | Local dev environment | Production (autri.ai) |
| In scope | The 5 items below + visual QA via inspector | Auth, RLS enforcement, sample KB / onboarding for strangers, AWS-native deploy, Bedrock prod path, Stripe + tier enforcement, public landing |
| Test corpora | STEM Racing PDFs + Dad's QuoteAI corpus | Add: real customer corpora as they sign on |
Current state — what's already built
Ingestion (D10/D11):
- PDF text-layer first, vision fallback for figure-heavy/sparse pages
- Operations-based extractor (LLM semantics, code mechanics) — verbatim content + bbox by construction
- Haiku default, Sonnet retry button per-doc
- Continue-on-error per-file
Retrieval (D4/D14):
- Three KB-scoped primitives: `lookup_section`, `fts_search`, `vector_search` (in `@autri/retrieval`)
- Agentic router via MCP (stdio for dev): `@autri/mcp-doc-search`
- `retrieval_log` table powers source-of-result attribution
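A minimal TypeScript sketch of source-of-result attribution, assuming each primitive returns chunk IDs with a score (the `Hit` and `AttributedHit` shapes are hypothetical, not the `@autri/retrieval` API). It merges per-primitive results into one row per chunk and records every primitive that surfaced it, which is what a color-coded trace needs. It deliberately does no rank fusion, since RRF is post-MVP.

```typescript
// Hypothetical shapes: the real @autri/retrieval return types may differ.
type Primitive = "lookup_section" | "fts_search" | "vector_search";

interface Hit {
  chunkId: string;
  score: number; // primitive-specific; not comparable across primitives
}

interface AttributedHit {
  chunkId: string;
  sources: Primitive[]; // every primitive that returned this chunk
}

// Merge per-primitive results into one row per chunk, recording which
// primitives surfaced it. No rank fusion: rows appear in first-seen order,
// walking the primitives in the order listed below.
function attribute(results: Partial<Record<Primitive, Hit[]>>): AttributedHit[] {
  const order: Primitive[] = ["lookup_section", "fts_search", "vector_search"];
  const byChunk = new Map<string, AttributedHit>();
  for (const source of order) {
    for (const hit of results[source] ?? []) {
      const row = byChunk.get(hit.chunkId);
      if (row === undefined) {
        byChunk.set(hit.chunkId, { chunkId: hit.chunkId, sources: [source] });
      } else if (!row.sources.includes(source)) {
        row.sources.push(source);
      }
    }
  }
  return Array.from(byChunk.values());
}
```

A chunk returned by both `fts_search` and `vector_search` carries both sources, so the UI can show both colors on one chip.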
UI:
- `/docs`: flat doc list with confidence tier chips
- `/docs/[id]`: inspector with bbox overlays + parsed text
- `/docs/[id]/query`: query playground with color-coded source-of-result traces
- `/kb`: KB list
- `/kb/[id]`: doc list within a KB
- `/kb/[id]/chat`: chat surface with markdown answers, source chips, bbox preview
Schema (D13/D14/D15):
- Multi-tenancy FK chain
- Logical-documents + supersession + default-latest filter
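One way to read the default-latest filter, as a sketch over in-memory rows. The `DocumentVersion` fields are hypothetical stand-ins for the D13-D15 columns; in practice this would likely be a query predicate, and the TypeScript only pins down the intended semantics: a version is live when nothing supersedes it, and the newest live version wins per logical document unless history is requested.

```typescript
// Hypothetical row shape; the real columns live in the D13-D15 schema.
interface DocumentVersion {
  id: string;
  logicalDocumentId: string;
  version: number;
  supersededById: string | null; // set when a newer version replaces this one
}

// Default-latest: per logical document, keep the newest version that nothing
// supersedes. Passing includeHistory = true opts out of the filter.
function defaultLatest(docs: DocumentVersion[], includeHistory = false): DocumentVersion[] {
  if (includeHistory) return docs;
  const latest = new Map<string, DocumentVersion>();
  for (const d of docs) {
    if (d.supersededById !== null) continue;
    const current = latest.get(d.logicalDocumentId);
    if (current === undefined || d.version > current.version) {
      latest.set(d.logicalDocumentId, d);
    }
  }
  return Array.from(latest.values());
}
```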
Pilot state: 1272 chunks across 2 STEM Racing PDF docs. Default org Hannah Labs, default KB STEM Racing Charlotte.
Test cohort and corpora
Two cohorts, two corpus shapes, chosen for maximum surface coverage with zero new-acquisition cost. The work compounds across QuoteAI (Dad's corpus is already ingested there) and Autri.
STEM Racing kids (Charlotte team):
- Corpus: World Finals Technical + Competition Regulations (already ingested — 1272 chunks)
- Content shape: figure-heavy PDFs, technical regulations
- Query pattern: rule lookup + creative-interpretation ("can we use X mod given C7.6.2?")
- Tests: PDF path + figure handling + rule-lookup retrieval
Dad's QuoteAI corpus (Brehob quote history):
- Subset (a): structured quote spreadsheets (XLSX) — past-quote line-items in QuoteAI's format
- Subset (b): raw past-quote PDFs — the originals before being spreadsheet-ified
- Content shape: business documents, table-heavy, figure-light
- Query pattern: "what did we quote for similar scenarios?" semantic search across past work
- Tests: XLSX path + PDF path on a totally different content type from STEM (cross-product compound)
Together: two real users with two distinct content shapes, four source-type exercises (PDF technical, PDF business, XLSX structured, eventually DOCX as item 3 expands).
1. UI/UX flow
Problem. A lot of functionality exists (inspector, chat, KB nav, query playground), but the UX flow doesn't yet tell a real user "the system is working and I trust it." Today ingestion is CLI-only, navigation is a flat doc list, and retrieval traces surface only on the playground page (not in chat).
MVP needs:
Chat as the homepage.
- Familiar webchat pattern (Claude-web-chat feel). Center column for chat; right rail for sources (initial lean — iterate until it feels right).
- Color-coded retrieval traces surfaced inline with each source — the differentiator can't live only on the playground page.
- Multi-turn chat history threaded into the router.
- Inline `[N]` citations rendered in the assistant answer body.
- Empty state: if the user has zero KBs, redirect to the KB upload page (or block chat input with a "create a KB first" prompt).
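A sketch of how inline `[N]` citations could be split out of an answer so each marker renders as a source chip, assuming `N` is a 1-based index into the sources returned for that turn (the `Source` fields and `parseCitations` name are hypothetical). Out-of-range markers resolve to `null` so the UI can flag them instead of linking to the wrong chunk.

```typescript
// Hypothetical source shape returned alongside each assistant turn.
interface Source {
  chunkId: string;
  docTitle: string;
  page: number;
}

type Segment =
  | { kind: "text"; text: string }
  | { kind: "cite"; n: number; source: Source | null };

// Split an answer into plain text and [N] citation segments. N is 1-based
// into `sources`; markers with no matching source resolve to null.
function parseCitations(answer: string, sources: Source[]): Segment[] {
  const segments: Segment[] = [];
  const marker = /\[(\d+)\]/g;
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(answer)) !== null) {
    if (m.index > last) {
      segments.push({ kind: "text", text: answer.slice(last, m.index) });
    }
    const n = Number(m[1]);
    const source = n >= 1 && n <= sources.length ? sources[n - 1] : null;
    segments.push({ kind: "cite", n, source });
    last = m.index + m[0].length;
  }
  if (last < answer.length) {
    segments.push({ kind: "text", text: answer.slice(last) });
  }
  return segments;
}
```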
Knowledge base management view.
- List existing KBs, create new KB, delete KB (no per-KB ACL UI in Functional MVP — that's Deployed).
- Click into a KB → high-level source-doc view (today's `/kb/[id]`, polished).
Ingestion pipeline view (the cool one).
- Pipeline-style flow visualization with stages: File upload → Ingestion → Agent validation → Human review → Ready
- Pattern reference: QuoteAI's streaming-checklist for quote generation — gives the user confidence the system is doing real work.
- Status doesn't need to be real-time-streaming (polling is fine); a status bar that updates per stage is enough.
- Per-file failure visibility — when one doc fails (vision timeout, parse error, schema validation failure), show why without blowing up the batch.
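A sketch of the state behind the pipeline view, using the five stages above; `FileStatus`, `BatchSummary`, `progress`, and `summarize` are hypothetical names. Each file carries its own stage and optional error, so one failure is reported with its reason while the rest of the batch keeps going, and a polling client only needs to re-fetch this summary.

```typescript
// Stage names follow the pipeline above; field names are hypothetical.
const STAGES = ["upload", "ingestion", "agent_validation", "human_review", "ready"] as const;
type Stage = (typeof STAGES)[number];

interface FileStatus {
  fileName: string;
  stage: Stage; // furthest stage reached
  error?: { reason: string }; // set when the file failed at `stage`
}

interface BatchSummary {
  total: number;
  ready: number;
  inFlight: number;
  failed: { fileName: string; stage: Stage; reason: string }[];
}

// Fraction of the pipeline a file has completed, for a per-file status bar.
function progress(stage: Stage): number {
  return STAGES.indexOf(stage) / (STAGES.length - 1);
}

// Batch view: failures are listed with their reason and never block the
// other files from being counted as ready or in flight.
function summarize(files: FileStatus[]): BatchSummary {
  const failed: BatchSummary["failed"] = [];
  let ready = 0;
  for (const f of files) {
    if (f.error !== undefined) {
      failed.push({ fileName: f.fileName, stage: f.stage, reason: f.error.reason });
    } else if (f.stage === "ready") {
      ready++;
    }
  }
  return { total: files.length, ready, inFlight: files.length - ready - failed.length, failed };
}
```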
Source-doc drill-in.
- Today's `/docs/[id]` inspector is the core surface, polished.
- Need: navigate within a doc (page-by-page, section-by-section).
Retrieval trace surface inside chat.
- The color-coded "which index returned this chunk" visualization is a first-class part of the chat UX, not a separate playground page. Trust comes from legibility.
Empty states + failure UX.
- No KBs yet → routed to upload page
- KB has no docs → empty state with "Upload your first document" CTA
- Ingestion failures → per-doc error display, retry, partial-success
- LLM returns invalid JSON → automatic Sonnet retry, fall through to "needs human review" tier
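The invalid-JSON fallback chain as a sketch: Haiku first, one Sonnet retry, then the needs-human-review tier. `callModel` is an injected, hypothetical stand-in for the real model call (kept synchronous here for brevity), and only JSON parsing is checked; schema validation would slot in at the same point.

```typescript
// Hypothetical model identifiers; the extractor's real config names may differ.
type Model = "haiku" | "sonnet";

type ExtractionResult =
  | { status: "ok"; model: Model; data: unknown }
  | { status: "needs_human_review"; errors: string[] };

// Haiku first, one Sonnet retry on unparseable output, then the review tier.
// `callModel` stands in for the real (async) model call.
function extractWithFallback(callModel: (model: Model) => string): ExtractionResult {
  const attempts: Model[] = ["haiku", "sonnet"];
  const errors: string[] = [];
  for (const model of attempts) {
    const raw = callModel(model);
    try {
      return { status: "ok", model, data: JSON.parse(raw) as unknown };
    } catch (err) {
      errors.push(`${model}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return { status: "needs_human_review", errors };
}
```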
Branding/visual design: Claude Design handles the visual layer (typography, color, layout polish) in parallel.
Design questions to refine:
- (a) Right-rail sources or different layout? Initial lean is right-rail (matches QuoteAI's rail pattern, keeps chat as focal column). Iterate until it feels right.
- (b) Async ingestion: poll or SSE? Probably polling for v1 (one less moving part); upgrade to SSE if it feels slow.
- (c) Bulk upload of related docs. If user drops 50 files, treat as 50 separate docs or one logical doc with 50 sections? Probably 50 separate docs.
Dependencies: none — parallelizable with the extraction spike. Out of scope (Deployed MVP or later): onboarding for strangers, sample KB / try-without-uploading, per-KB ACLs, mobile responsiveness.
2. MCP over SSE + OAuth
Problem. Current MCP is stdio (@autri/mcp-doc-search). Stdio is local-only — won't survive deploy. The strategic positioning ("be the MCP server they consume") demands hosted SSE. Per D5 pruned-note: stdio MCP is planned for retirement in favor of SSE+OAuth.
MVP needs:
- SSE transport for the doc-search server (retire stdio)
- OAuth 2.0 flow for token issuance — user clicks "Connect Claude / Cursor / Whatever" → consent → token issued
- Token scope model: per-token list of allowed KB IDs + allowed tools
- Token management UI in-app: list active, revoke, re-scope
- Lift E12 wholesale from Foundry — already designed there
Design questions to refine:
- (a) Local dev OAuth: real Cognito dev pool, or stub? Lean: real Cognito dev pool — normalizes the auth pathway from day one (~1-2 day setup), no two-pathway drift. Reuses primitive between Functional and Deployed MVP.
- (b) Tool surface on the MCP server. Add `list_knowledge_bases` and `list_documents` to support D17 (hybrid agent + KB selection). Figure access is v1.5.
- (c) Per-tool authorization. Read-only vs. write tokens? Probably yes, defaulting to read-only.
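A sketch of the per-token scope check implied by the scope model and the read-only default; `TokenScope`, `issueScope`, `authorize`, `visibleKbs`, and the `upload_document` write tool are all hypothetical. The same scope also filters what `list_knowledge_bases` reveals, so a token scoped to one KB cannot discover the others.

```typescript
// Hypothetical token scope: allowed KB IDs + allowed tools, read-only by default.
interface TokenScope {
  kbIds: string[];
  tools: string[];
  write: boolean;
}

// Illustrative: which tools mutate state. `upload_document` is a hypothetical name.
const WRITE_TOOLS = new Set<string>(["upload_document"]);

function issueScope(kbIds: string[], tools: string[], write = false): TokenScope {
  return { kbIds, tools, write };
}

// A call passes only if the tool is granted, write tools have write scope,
// and any KB it touches is in the token's KB list.
function authorize(scope: TokenScope, tool: string, kbId?: string): boolean {
  if (!scope.tools.includes(tool)) return false;
  if (WRITE_TOOLS.has(tool) && !scope.write) return false;
  if (kbId !== undefined && !scope.kbIds.includes(kbId)) return false;
  return true;
}

// list_knowledge_bases should reveal only the KBs in scope.
function visibleKbs(scope: TokenScope, allKbIds: string[]): string[] {
  return allKbIds.filter((id) => scope.kbIds.includes(id));
}
```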
Dependencies: local OAuth setup (real Cognito or stub) for issuance. Out of scope: rate-limiting per-token (Deployed MVP), audit log surface, OAuth client registration UI.
3. Multi-doc-type extraction (spike-and-iterate)
Problem. Today's extractor handles PDFs only (text layer first, vision fallback). DOCX/XLSX/MD need different parsing approaches. Figures/diagrams in PDFs are bbox-overlaid but lack semantic content.
Approach: spike first, design after. Per process.md: "Make design decisions during implementation as they surface — that's when the real constraints are visible." Trying to design the perfect agent.md schema before we've shipped DOCX is exactly the kind of design-before-real-constraints the methodology warns against.
Spike plan:
- Refactor the current PDF extractor into `extractors/pdf/`; this defines what a doc-type extractor IS structurally.
- Build DOCX as the second type, which validates that the abstraction holds.
- Iterate against the test corpora (STEM Racing + Dad's quotes) — what works, what doesn't, what cross-type abstraction emerges.
- Once two types ship, write up the abstraction that actually emerged (not the one we guessed). That writeup becomes the proper design doc.
- Then expand to Markdown/plain text and XLSX.
Source types in priority order:
- PDF (refactor existing → first source-type-specific extractor)
- DOCX (validates abstraction; key for author segment + Dad's templates)
- Plain text + Markdown (trivial off DOCX; covers dev/prosumer)
- XLSX (Dad's quote spreadsheets — exercises a totally different shape than text-flow docs)
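A guess at the shape the PDF refactor might pin down, offered only as a starting point, since the spike is meant to discover the real abstraction. `DocTypeExtractor`, `ExtractedChunk`, `BBox`, and `ExtractorRegistry` are hypothetical names; the registry simply dispatches by file extension.

```typescript
// Hypothetical contract; the spike decides the real one.
interface BBox {
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ExtractedChunk {
  text: string;
  sectionId: string | null;
  bbox: BBox | null; // PDFs have bboxes; DOCX/MD/XLSX may not
  chunkType: string;
}

type DocType = "pdf" | "docx" | "markdown" | "xlsx";

interface DocTypeExtractor {
  readonly docType: DocType;
  readonly extensions: string[]; // without the leading dot
  extract(bytes: Uint8Array): ExtractedChunk[];
}

// Dispatch by file extension (case-insensitive).
class ExtractorRegistry {
  private readonly byExtension = new Map<string, DocTypeExtractor>();

  register(extractor: DocTypeExtractor): void {
    for (const ext of extractor.extensions) {
      this.byExtension.set(ext.toLowerCase(), extractor);
    }
  }

  forFile(fileName: string): DocTypeExtractor | undefined {
    const dot = fileName.lastIndexOf(".");
    if (dot < 0) return undefined;
    return this.byExtension.get(fileName.slice(dot + 1).toLowerCase());
  }
}
```

Keeping bbox nullable is the main design bet here: it lets text-flow formats share the contract without faking coordinates.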
Agent-validation pipeline stage (new).
Today's pipeline: extract → finalize (compute confidence tier) → human review.
Proposed new stage between extract and human review: agent validation.
- LLM reads extracted chunks against the source pages
- Flags suspicious chunks for human review (hallucinated content, structurally wrong, semantically off)
- Auto-approves high-confidence chunks
- Tightens the human-review loop — humans only see what needs human judgment
This complements D11 (operations-based extractor → verbatim text by construction) by catching errors in semantic chunking even when the text is verbatim.
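A sketch of the routing step after the validator runs, assuming it returns a per-chunk confidence and a list of flags. The `Verdict` shape, the flag names (taken from the list above), and the 0.9 threshold are assumptions to tune during the spike. Unflagged, high-confidence chunks are auto-approved; everything else goes to human review with its verdict attached.

```typescript
// Hypothetical validator output per chunk.
type Flag = "hallucinated" | "structural" | "semantic";

interface Verdict {
  chunkId: string;
  confidence: number; // 0..1: validator's confidence the chunk matches its source page
  flags: Flag[];
}

// Assumed threshold; tune against the test corpora during the spike.
const AUTO_APPROVE_AT = 0.9;

// Auto-approve only unflagged, high-confidence chunks; everything else
// lands in the human-review queue.
function routeVerdicts(verdicts: Verdict[]): { approved: string[]; review: Verdict[] } {
  const approved: string[] = [];
  const review: Verdict[] = [];
  for (const v of verdicts) {
    if (v.flags.length === 0 && v.confidence >= AUTO_APPROVE_AT) {
      approved.push(v.chunkId);
    } else {
      review.push(v);
    }
  }
  return { approved, review };
}
```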
Figures/pictures in PDFs (sub-problem):
- Today: figures are bbox-overlaid; chunks referencing them have no semantic content of their own
- Proposed: a `describe_figure` operation per figure region → Haiku vision generates a text description → stored as `chunk_type = 'figure_description'`, embedded with surrounding text context
- Cost: ~$0.0003 per figure (tractable)
- Spike-test this alongside the doc-type work
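A sketch of how a `figure_description` chunk might be assembled for embedding, assuming `describe_figure` has already returned Haiku's description. `buildFigureChunk`, `describeCostUsd`, the field names, and the 500-character context cap are hypothetical. The cost helper turns the ~$0.0003 per-figure estimate into a quick check: 1,000 figures come to roughly $0.30.

```typescript
// Hypothetical chunk shape for a described figure.
interface FigureChunk {
  chunkType: "figure_description";
  page: number;
  description: string;
  embeddingInput: string; // text that gets embedded
}

// Combine the generated description with nearby page text. Context is capped
// so it informs the embedding without drowning out the description: the
// tail of the preceding text and the head of the following text are kept.
function buildFigureChunk(
  page: number,
  description: string,
  textBefore: string,
  textAfter: string,
  maxContextChars = 500,
): FigureChunk {
  const desc = description.trim();
  const beforeTrimmed = textBefore.trim();
  const before = beforeTrimmed.slice(Math.max(0, beforeTrimmed.length - maxContextChars));
  const after = textAfter.trim().slice(0, maxContextChars);
  const embeddingInput = [
    `Figure (page ${page}): ${desc}`,
    `Context before: ${before}`,
    `Context after: ${after}`,
  ].join("\n");
  return { chunkType: "figure_description", page, description: desc, embeddingInput };
}

// Back-of-envelope cost at the spec's ~$0.0003 per figure.
function describeCostUsd(figureCount: number, perFigureUsd = 0.0003): number {
  return figureCount * perFigureUsd;
}
```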
Open during the spike (not pre-decided):
- `agent.md` schema (structured vs. freeform)
- DOCX chunking strategy (paragraph-bound, heading-bound, hybrid)
- XLSX semantics (named ranges, detected tables, cell-level)
- Figure description embedding (text-only with context vs. multi-modal CLIP-style — text-only likely right)
Dependencies: test corpora (already have). Out of scope: OCR'd PDFs (image-only, no text layer), HTML/URL ingestion, audio/video.
4. KB update flow
Problem. Today: manual one-doc-at-a-time CLI ingest. Authors and legal teams accumulate docs continuously; they need a recurring update path. (Manual upload UI lands as part of item 1; the more nuanced version-detection lives here.)
MVP needs:
- Manual upload covers Functional MVP (drag-drop in item 1's surface)
- D15 version-detection heuristic on upload: filename + title + structural overlap identify likely prior versions (detection only; superseding goes through the confirm prompt)
- Update vs. supersede UX — when uploading a new doc that matches an existing logical-name, surface "looks like a new version of X — supersede?" prompt; don't auto-supersede silently
- Re-extract trigger — button on doc inspector to re-run extraction with the current extractor version (useful as the extractor improves during the spike)
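D15 names the signals (filename, title, structural overlap) but not a formula, so this scoring is an assumption: token Jaccard on filename and title plus Jaccard on section IDs, with illustrative weights and threshold. `DocFingerprint`, `versionScore`, and `suggestSupersede` are hypothetical names. In line with the confirm-first lean, a match only produces the "supersede?" prompt; it never supersedes on its own.

```typescript
// Hypothetical fingerprint of a document for version matching.
interface DocFingerprint {
  id: string;
  fileName: string;
  title: string;
  sectionIds: string[];
}

function tokens(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

function fileNameTokens(fileName: string): Set<string> {
  return tokens(fileName.replace(/\.[A-Za-z0-9]+$/, "")); // drop the extension
}

function jaccard<T>(a: Set<T>, b: Set<T>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((x) => {
    if (b.has(x)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Illustrative weights and prompt threshold; tune on real re-uploads.
const WEIGHTS = { fileName: 0.3, title: 0.3, structure: 0.4 };
const PROMPT_AT = 0.6;

function versionScore(incoming: DocFingerprint, existing: DocFingerprint): number {
  return (
    WEIGHTS.fileName * jaccard(fileNameTokens(incoming.fileName), fileNameTokens(existing.fileName)) +
    WEIGHTS.title * jaccard(tokens(incoming.title), tokens(existing.title)) +
    WEIGHTS.structure * jaccard(new Set(incoming.sectionIds), new Set(existing.sectionIds))
  );
}

// Best candidate at or above the threshold, or null. The caller shows the
// "looks like a new version of X, supersede?" prompt; nothing is overwritten here.
function suggestSupersede(incoming: DocFingerprint, candidates: DocFingerprint[]): DocFingerprint | null {
  let best: DocFingerprint | null = null;
  let bestScore = PROMPT_AT;
  for (const c of candidates) {
    const score = versionScore(incoming, c);
    if (score >= bestScore) {
      best = c;
      bestScore = score;
    }
  }
  return best;
}
```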
Design questions:
- (a) Manual confirm vs. auto-supersede. Lean: confirm. Cheaper to be wrong than to silently overwrite.
- (b) Bulk re-extract. If extractor improves significantly mid-spike, offer "re-extract all docs in this KB"? Probably yes as a doc-level action; surface in the inspector.
Dependencies: D15 version-detection heuristic needs implementation (designed in Foundry, not built). Out of scope (post-MVP / Deployed MVP): Google Drive folder sync, Dropbox sync, SharePoint, S3 bucket, Git repo sync, scheduled refreshes, webhook-driven ingest.
5. File diff mechanism
Problem. When v2 of a doc is uploaded, show "what changed since v1." Schema is in (D15); algorithm + UI are not.
Approach: chunk-level structural diff, not git diff.
Git-diff is the wrong primitive here. Git diffs are line-based on text; our content is chunk-based with embeddings and bboxes; PDF binaries don't diff. Instead:
- For each new version, run the version-detection heuristic (D15) → match to an existing `logical_document`
- For each matched logical doc, run a chunk-level diff:
  - For each new chunk, find its nearest neighbor in the prior version (cosine similarity + section ID match)
  - High similarity (>0.95, same section ID) → unchanged
  - Medium → changed (render side-by-side text diff)
  - No match → added
  - Old chunks unmatched → removed
- Surface in `/kb/[id]/doc/[id]/diff?compare=v1,v2`: green/yellow/red sections
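A sketch of the inter-chunk diff above, assuming each chunk has an embedding and an optional section ID (the `Chunk` and `DiffEntry` shapes and `chunkDiff` name are hypothetical). The 0.95 cutoff comes from the spec; the 0.8 lower bound for "medium" is an assumption. Matching is greedy in new-chunk order, which is simple but can mis-pair near-duplicate chunks; a global assignment would be more robust.

```typescript
// Hypothetical chunk shape; embeddings come from the existing pipeline.
interface Chunk {
  id: string;
  sectionId: string | null;
  embedding: number[];
}

type DiffEntry =
  | { kind: "unchanged" | "changed"; newId: string; oldId: string; similarity: number }
  | { kind: "added"; newId: string }
  | { kind: "removed"; oldId: string };

const UNCHANGED_ABOVE = 0.95; // from the spec (similarity > 0.95, same section ID)
const CHANGED_AT = 0.8; // assumed lower bound for "medium"; tune on real versions

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy one-to-one matching: each new chunk takes its nearest still-unmatched
// old chunk, then similarity and section ID decide the label.
function chunkDiff(oldChunks: Chunk[], newChunks: Chunk[]): DiffEntry[] {
  const entries: DiffEntry[] = [];
  const matchedOld = new Set<string>();
  for (const n of newChunks) {
    let best: Chunk | null = null;
    let bestSim = -Infinity;
    for (const o of oldChunks) {
      if (matchedOld.has(o.id)) continue;
      const sim = cosineSimilarity(n.embedding, o.embedding);
      if (sim > bestSim) {
        best = o;
        bestSim = sim;
      }
    }
    if (best !== null && bestSim > UNCHANGED_ABOVE && best.sectionId === n.sectionId) {
      matchedOld.add(best.id);
      entries.push({ kind: "unchanged", newId: n.id, oldId: best.id, similarity: bestSim });
    } else if (best !== null && bestSim >= CHANGED_AT) {
      matchedOld.add(best.id);
      entries.push({ kind: "changed", newId: n.id, oldId: best.id, similarity: bestSim });
    } else {
      entries.push({ kind: "added", newId: n.id });
    }
  }
  for (const o of oldChunks) {
    if (!matchedOld.has(o.id)) entries.push({ kind: "removed", oldId: o.id });
  }
  return entries;
}
```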
Git-style diff IS useful for the content of a single matched chunk: a line-by-line diff for inner-chunk rendering. In short, git-style diff for the inner-chunk view, structural diff for the inter-chunk view.
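For the inner-chunk view, a textbook LCS diff over tokens is enough. This sketch is not git's own (Myers) algorithm, but it produces the same kind of equal/insert/delete script; `tokenDiff`, `lines`, and `words` are hypothetical helper names. Splitting on lines or on words gives the line-level and word-level views, and the O(n*m) table is fine at single-chunk sizes.

```typescript
type Op = { op: "equal" | "insert" | "delete"; token: string };

// LCS dynamic-programming diff over token arrays.
function tokenDiff(a: string[], b: string[]): Op[] {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table, preferring deletions before insertions on ties.
  const ops: Op[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      ops.push({ op: "equal", token: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ op: "delete", token: a[i++] });
    } else {
      ops.push({ op: "insert", token: b[j++] });
    }
  }
  while (i < n) ops.push({ op: "delete", token: a[i++] });
  while (j < m) ops.push({ op: "insert", token: b[j++] });
  return ops;
}

// Tokenizers for the two inner-chunk views.
const lines = (s: string): string[] => s.split("\n");
const words = (s: string): string[] => s.split(/\s+/).filter((w) => w.length > 0);
```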
Build sequence:
- Version-detection heuristic on upload (part of item 4)
- Chunk-level diff algorithm
- `/diff` UI
Design questions:
- (a) Diff scope. Just v_prev vs v_current, or arbitrary version-pair (v1 vs v5)? Lean: arbitrary-pair.
- (b) Inner-chunk rendering. Word-level vs. line-level diff? Probably both — toggle in UI.
Dependencies: item 4 (need multiple versions to diff); item 1 (UI home for the diff view). Out of scope: three-way merge, branch-based collaboration, semantic-change classification.
Sequencing — order of attack
Most items parallelize cleanly.
Parallel track A — Visual design (Claude Design): UI/UX visual design for item 1 — chat pattern, KB management, ingestion pipeline view, doc drill-in.
Parallel track B — Extraction spike (Dan): Item 3 spike-and-iterate against test corpora. PDF refactor first, then DOCX. Iterate. Write up emergent abstraction.
Parallel track C — MCP foundation (whenever): Item 2 — independent of the others. Probably wait until Cognito dev pool setup is convenient anyway.
Serial after tracks A + B converge:
- Item 1 implementation (UX flow) — uses both the visual design and the now-shipped multi-doc-type extractor
- Item 4 (KB update flow) — needs item 1's upload UI shipped
- Item 5 (file diff) — needs item 4's multiple-version state
What's NOT in Functional MVP
Explicitly cut from Functional MVP:
Moved to Deployed MVP:
- Auth + multi-tenancy enforcement (RLS policies, scope-passing contract)
- Onboarding for strangers (sample KB, "try without uploading," first-run tour)
- AWS-native deploy (ECS, RDS, S3, CloudFront)
- Bedrock prod path swap (D16)
- Stripe + tier enforcement
- Public landing page
- Per-KB ACLs / sharing
Post-MVP (after Deployed MVP):
- Quality baseline / regression-test loop (visual QA via inspector + chat is enough for Functional MVP)
- Auto-sync integrations (Google Drive, Dropbox, SharePoint, S3, Git)
- Data export / KB dump
- HTML/URL ingestion
- OCR for image-only PDFs
- Multi-modal embeddings beyond text
- Pro tier, Studio tier, Enterprise features
- Hover popovers on source chips, reciprocal rank fusion, pgvector tuning beyond `lists=20`
Success criteria — how do we know Functional MVP is done?
Concrete behaviors that count as success:
- Inspector trust. Dan drops a fresh 50-page PDF, watches the ingestion pipeline view, and trusts the extraction without manual chunk review. Color-coded retrieval traces explain every chunk surfaced in chat.
- Multi-doc-type breadth. PDFs, DOCX, and MD/plain text work cleanly. XLSX is a stretch goal but likely lands, given Dad's spreadsheets.
- Hosted MCP works. Dan connects Claude Desktop / Cursor / Claude Code to a local Autri MCP server via OAuth, scopes a token to one KB, and runs successful agent queries.
- STEM Racing kids cohort. At least one kid imports the regs (or uses the existing KB) and runs 5 creative-interpretation queries with correctly cited sources, without Dan's intervention.
- Dad cohort. Dad's spreadsheets and raw PDFs both ingest cleanly; he runs at least 3 "what did we quote for similar scenarios?" queries that return useful results (validates the cross-product compound thesis).
- Versioning + diff works. Dan re-uploads a STEM Racing reg with edits and the diff view shows the changes correctly across both inter-chunk and inner-chunk views.
If 5 of 6 land, Functional MVP is done. Move to Deployed MVP.
Open questions / refinement areas
Highest-leverage refinement areas across items:
- The right-rail-vs-other-layout decision (item 1) — drives the rest of the navigation model.
- Agent-validation prompt design (item 3) — what does the validator look for? Spike will tell us.
- Local OAuth approach (item 2) — drives how much Deployed-MVP foundation we accidentally land in Functional MVP.
These three constrain the design space for everything else.
Next steps
- Refine this doc together (this session + future sessions). Surface design questions as we hit them; iterate.
- Once H2s are tight, spin out epic docs per item: `projects/autri/epics/e1-ui-ux.md`, `e2-mcp-sse-oauth.md`, `e3-multi-doc-type.md`, `e4-update-flow.md`, `e5-file-diff.md`.
- Each epic gets implementation-grade detail before any feature branch opens.
- Claude Design takes UI/UX visual design as a parallel track.