Autri — Project Design Doc
Migrated from projects/lorebase/design.md on 2026-05-15 as part of the Autri brand commit (D1). The source doc had 12 historical annotations on ## Glossary (all resolved/orphaned/replied; captured in decisions.md D13/D14/D15), which were not migrated. Source doc deleted post-migration.
Project status: Pre-S0 spike. Foundation built end-to-end in one session (2026-05-06). First corpus — STEM Racing World Finals 2026 regulations, 126 pages — fully ingested with the v3 pipeline.
Sub-system Index
None yet — single-package monorepo (app/, ingestion/, mcp-servers/*).
Overview
What Is This?
Autri is a generic knowledge-base platform with a visual extraction inspector as its product wedge. Users upload a document corpus; Autri parses, chunks, embeds, and indexes it; the inspector UI gives full visibility into how every document was processed — anchored to the original source — so users can vet quality before trusting answers from the system.
The thing most RAG tools get wrong is they're black boxes: when retrieval is bad, you can't tell why. Autri shows you exactly which fragment of the source PDF became which chunk, where on the page it sits, and how it was classified — bidirectionally. Hover a chunk in the text pane, the bbox lights up on the source page. Hover a region on the source, the corresponding chunk lights up.
Who Is It For?
Pilot corpora — two of them, deliberately different:
| Pilot | Corpus | Why it stresses the architecture |
|---|---|---|
| STEM Racing Charlotte | World Finals 2026 regulations (PDFs, ~120 pages) | Tests the inspector wedge: trust-via-visibility for non-technical users (high-school students). Annual update cadence, public-ish content, rule-lookup query patterns. |
| QuoteAI | Customer quote history (structured records, growing daily) | Tests the universal-chunks-model claim against a fundamentally different source type. Daily ingestion cadence, commercial sensitivity, semantic search across past quotes, large + growing doc count. |
If the architecture works on both, it works on most things. STEM tests the inspector wedge; QuoteAI tests incremental ingestion + structured-record source type + many-docs-in-one-KB at scale. Together they force the design to actually be generic instead of accidentally STEM-specific or QuoteAI-specific.
Long-term users: any organization that needs to trust an LLM's answers about their own documents — legal, regulatory, technical, or internal-knowledge teams. The inspector is the trust-building wedge for both small users (the racing kids) and enterprise pitches.
Business Model
This project is the foundation underneath QuoteAI. QuoteAI is "knowledge base + quoting app on top"; Autri is the knowledge base, made generic and inspectable.
Near-term: QuoteAI rebases onto Autri to ensure quality + accuracy. Without Autri's verbatim extraction + hybrid retrieval, QuoteAI's answers can't be trusted in front of a paying customer.
Long-term: Autri is the foundation for many products; QuoteAI is the first of many. If QuoteAI lands at John Hannah's company, the same KB engine pitches to other corpora-heavy buyers. If Autri ships standalone, the inspector wedge is sufficient differentiation.
Tech Stack
| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 14 (App Router) + TypeScript + Tailwind + shadcn/ui | Mirror QuoteAI exactly so the eventual rebase is friction-free |
| State | React Server Components + Drizzle queries | Inspector is read-mostly; RSC keeps the app simple |
| Backend / API | Next.js API routes | One stack; deploy as a unit |
| Database | Postgres 16 + pgvector (Docker for local) | Hybrid retrieval — Full-Text Search (FTS) + vector + structural lookup; QuoteAI parity. See § Hybrid Retrieval. |
| Auth | None yet (single-user dev) | Multi-tenant in productization |
| Deployment | Local-only for now | Productization is post-pilot |
| Payments | N/A | |
Key Libraries & Dependencies
| Library | Purpose | Notes |
|---|---|---|
| pdftoppm (poppler) | PDF → per-page PNG render | Shell-out from ingestion/render.ts; no node-side image library |
| pdftotext -bbox-layout (poppler) | PDF text-layer extraction with positions | Shell-out from ingestion/parse.ts; emits XHTML with normalized fragment bboxes |
| claude CLI subprocess | LLM extraction calls | Bills against user's Max plan; the Anthropic Agent SDK explicitly does not support Max billing |
| drizzle-orm | Typed query builder | Schema mirrors QuoteAI |
| pgvector | Vector index | ivfflat with vector_cosine_ops, 1536-dim |
| fast-xml-parser | XHTML parser for poppler output | parseTagValue: false is critical: the default coerces numeric content like "7.6" / "2026" to numbers, breaking text reconstruction |
System Architecture
The Core Insight
The LLM's only job is determining which content fragments belong together as semantic chunks. Everything else — text extraction, coordinate math, persistence — is mechanical work that belongs in deterministic code.
This is the load-bearing architectural decision behind v3 of the extractor and behind everything that grows on top of Autri. It came from realizing we'd been asking the LLM to do everything: OCR text it could read for free from the PDF text layer, compute bbox coordinates by visual estimation from pixel images, draw bounding boxes, write captions, and make chunking decisions. All but the last are mechanical. The chunking decision is irreducibly LLM-shaped.
When you carve the LLM job down to its semantic core:
- Verbatim text becomes guaranteed. Content comes from the PDF text layer, never regenerated by the model. The model emits operations like register_chunk(fragment_ids: ["f4", "f5", "f6"]); code looks up fragments[fid].text and concatenates. The model literally cannot hallucinate text.
- Bboxes become mathematically exact. Each chunk's bbox is the union of its fragments' bboxes (computed math), not a vision-estimated rectangle. The drift-toward-the-bottom-of-the-page issue that plagued vision extraction is structurally impossible.
- Cost drops dramatically. Smaller prompts (no transcription instructions), smaller outputs (operation lists with IDs, not full content), fewer turns. ~5x cheaper at higher quality.
- Smaller models match bigger ones. Haiku now performs at Sonnet's quality on this corpus, because the LLM job is small enough that model size stops mattering.
- The contract is stable across model swaps. Future model upgrades work without prompt rework — same operations contract.
This principle generalizes beyond Autri. It's also captured in workspace memory as feedback_llm_does_semantics_code_does_mechanics.md.
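The verbatim-text and exact-bbox guarantees described above can be sketched in a few lines. This is a minimal illustration, not the project's extractor: the type names, the commitChunk helper, and the single-space separator used when concatenating fragments are all assumptions.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface Bbox { page: number; x: number; y: number; w: number; h: number }
interface Fragment { id: string; text: string; bbox: Bbox }
interface RegisterChunkOp { fragment_ids: string[] }
interface CommittedChunk { content: string; bbox: Bbox }

// Smallest rectangle that covers every input box (all boxes must share a page).
function unionBbox(boxes: Bbox[]): Bbox {
  const first = boxes[0];
  if (first === undefined) throw new Error("cannot union zero boxes");
  let left = first.x;
  let top = first.y;
  let right = first.x + first.w;
  let bottom = first.y + first.h;
  for (const b of boxes) {
    if (b.page !== first.page) throw new Error("fragments span multiple pages");
    left = Math.min(left, b.x);
    top = Math.min(top, b.y);
    right = Math.max(right, b.x + b.w);
    bottom = Math.max(bottom, b.y + b.h);
  }
  return { page: first.page, x: left, y: top, w: right - left, h: bottom - top };
}

// The model supplies only fragment IDs; text and geometry come from parsed data.
function commitChunk(op: RegisterChunkOp, fragments: Map<string, Fragment>): CommittedChunk {
  const resolved: Fragment[] = [];
  for (const fid of op.fragment_ids) {
    const fragment = fragments.get(fid);
    if (fragment === undefined) throw new Error(`unknown fragment id: ${fid}`);
    resolved.push(fragment);
  }
  return {
    content: resolved.map((f) => f.text).join(" "), // separator is an assumption
    bbox: unionBbox(resolved.map((f) => f.bbox)),
  };
}
```

Because the operation carries IDs rather than text, an unknown ID fails loudly instead of producing invented content.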
Pipeline
End-to-end flow from a source PDF to indexed chunks. The agentic stage (highlighted) is the only LLM call; everything else is deterministic, shell-based, or pure SQL.
flowchart TD
PDF[PDF source]
PDF --> RENDER["render — pdftoppm<br/>~200ms/page @ 200 DPI"]
PDF --> PARSE["parse — pdftotext + fast-xml-parser<br/>~50ms/page"]
RENDER --> PNG["page-NN.png<br/>visual anchor for inspector overlay"]
PARSE --> JSON["page-NN-text.json<br/>fragments [id, text, bbox]<br/>+ page_size + text_density"]
PNG --> CLASSIFY["classify — text density routing<br/><i>not yet wired</i>"]
JSON --> CLASSIFY
CLASSIFY --> EXTRACT["<b>extract — agentic stage</b><br/>Claude Haiku via claude CLI subprocess<br/>Reads JSON (and PNG for figures)<br/>Emits register_section / register_chunk / register_figure ops<br/>+ confidence (0..1), notes"]
EXTRACT --> COMMIT["commit — deterministic, transactional<br/>chunk.content = concat(fragments[i].text) ← VERBATIM<br/>chunk.bbox = union(fragments[i].bbox) ← EXACT<br/>section.title = same composition<br/>figure.bbox = vision-estimated (only vision math)<br/>heading chunks auto-emitted per register_section"]
COMMIT --> DB[(Postgres<br/>documents → pages →<br/>sections, chunks, figures)]
classDef agentic fill:#3b1d6e,stroke:#7c3aed,stroke-width:2px,color:#fff
class EXTRACT agentic
Layer Descriptions
| Module | Owns | Notes |
|---|---|---|
| ingestion/render.ts | PDF → PNG | Pure pdftoppm wrapper, configurable DPI |
| ingestion/parse.ts | PDF → text-layer JSON | pdftotext -bbox-layout + XML parsing; emits per-page fragment files |
| ingestion/load.ts | DB writes for documents + pages | Idempotent on source_hash; PNG dimensions read from native PNG header (no image deps) |
| ingestion/extractor/cli-client.ts | claude CLI subprocess invocation | Schema-validated output via --json-schema; envelope-fallback parser for the result-vs-structured_output quirk |
| ingestion/extractor/operations.ts | Operations Zod schema + validator | Validates every fragment_id resolves to a real fragment |
| ingestion/extractor/extractor.ts | Operations → DB writes | Transactional commit; cross-page section-ID resolution via DB lookup |
| ingestion/extractor/prompts/ | Versioned prompt files | page-extract-v3.md is current; version stored in extractor_version per row |
| app/lib/db/ | Drizzle client + typed schema | HMR-safe singleton pool |
| app/app/docs/[id]/page.tsx | Inspector UI (Server Component) | Data fetch + composition |
| app/components/PageInspector.tsx | Hover-bbox overlay (Client Component) | Bidirectional highlight (text → bbox, bbox → text) |
| app/app/api/cache/[...path] | Static file server for ingestion/cache/ | Path-traversal guarded |
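The "PNG dimensions read from native PNG header" note for load.ts is worth a small illustration, since it explains why no image library is needed. The sketch below is one way to do it, assuming the file bytes are already in memory; readPngSize is a hypothetical name, not the function in load.ts. It relies on the PNG layout: an 8-byte signature, then the IHDR chunk whose width and height are big-endian 32-bit integers at byte offsets 16 and 20.

```typescript
// Hypothetical helper; reads width/height straight from the PNG IHDR chunk.
function readPngSize(bytes: Uint8Array): { width: number; height: number } {
  if (bytes.byteLength < 24) throw new Error("too short to be a PNG");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The 8-byte PNG signature, compared as two big-endian 32-bit words.
  if (view.getUint32(0) !== 0x89504e47 || view.getUint32(4) !== 0x0d0a1a0a) {
    throw new Error("not a PNG: bad signature");
  }
  // Bytes 12..15 hold the first chunk type, which must be "IHDR".
  if (view.getUint32(12) !== 0x49484452) {
    throw new Error("not a PNG: first chunk is not IHDR");
  }
  return { width: view.getUint32(16), height: view.getUint32(20) };
}
```

Only the first 24 bytes are inspected, so a caller could read just the head of each page-NN.png rather than the whole file.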
Architecture Diagram
The dependency graph from cache files through the ingestion package to the database, then back out through the Next.js app and the agentic router. cache/ is gitignored and reproducible from source PDFs; everything below it is the durable artifact.
flowchart TB
subgraph CACHE["ingestion/cache/ (gitignored)"]
PDF["source PDFs"]
PNG["page-NN.png"]
JSON["page-NN-text.json"]
end
subgraph ING["@autri/ingestion"]
RENDER["render.ts<br/><i>pdftoppm shell</i>"]
PARSE["parse.ts<br/><i>pdftotext shell</i>"]
LOAD["load.ts<br/><i>idempotent on source_hash</i>"]
CLI["cli-client.ts<br/><i>claude CLI subprocess</i>"]
EXTRACTOR["extractor.ts<br/><i>commits agent ops</i>"]
FINALIZE["finalize.ts + link-sections.ts<br/><i>doc-level post-process</i>"]
EMBED["embed.ts<br/><i>OpenAI text-embedding-3-small</i>"]
end
PDF --> RENDER
PDF --> PARSE
PDF --> LOAD
RENDER --> PNG
PARSE --> JSON
PNG --> CLI
JSON --> CLI
CLI --> EXTRACTOR
EXTRACTOR --> FINALIZE
LOAD --> DB[(Postgres + pgvector<br/>documents → pages →<br/>sections, chunks, figures<br/>+ retrieval_log)]
EXTRACTOR --> DB
FINALIZE --> DB
EMBED --> DB
DB --> APP["Next.js app<br/>/docs (corpus listing)<br/>/docs/[id] (inspector)<br/>/docs/[id]/query (playground)<br/>/api/cache/... (static)"]
DB --> RETRIEVAL["@autri/retrieval<br/>lookup_section · fts_search · vector_search<br/>+ router"]
RETRIEVAL --> MCP["@autri/mcp-doc-search<br/><i>stdio MCP server</i>"]
RETRIEVAL --> APP
MCP --> AGENT["claude CLI<br/>(agentic router)"]
AGENT --> APP
Data Model
Core Entities
documents (
id, name, source_type, source_path, source_hash,
page_count, status, confidence_tier, extractor_version,
ingested_at, approved_at, approved_by
)
pages (
id, document_id, page_number, image_path,
width_px, height_px, parsed_text,
extraction_confidence, extractor_version,
UNIQUE (document_id, page_number)
)
sections (
id, document_id, section_id, parent_section_id, title, depth,
UNIQUE (document_id, section_id)
)
chunks (
id, document_id, section_id, chunk_index,
content, content_hash,
chunk_type ('text' | 'table' | 'figure' | 'heading'),
bbox JSONB,
embedding VECTOR(1536), embedder_version, extractor_version,
confidence, flagged, flagged_reason,
UNIQUE (document_id, chunk_index)
)
+ ivfflat (embedding vector_cosine_ops) WITH (lists = 20)
+ gin (to_tsvector('english', content))
figures (
id, document_id, page_id, chunk_id,
image_path, bbox JSONB, caption, figure_type, extractor_version
)
retrieval_log (
id, query, tool_name, tool_params JSONB,
result_chunk_ids UUID[], result_scores REAL[],
latency_ms, created_at
)
Bbox Convention
chunks.bbox is a JSONB array of regions: [{"page": 5, "x": 0.12, "y": 0.34, "w": 0.76, "h": 0.18}, ...]. Coordinates are normalized 0..1 of the page, top-left origin. This means:
- Bboxes survive image resizing / DPI changes for display
- The same chunk can in principle span multiple pages (not yet exercised — current chunks are single-page)
- Inspector overlay positioning is pure percentage math, no pixel coordinates
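To make the "pure percentage math" point concrete, here is a minimal sketch of turning one stored region into overlay positioning. The function names and the two-decimal rounding are assumptions; the inspector component may do this differently.

```typescript
// One entry of chunks.bbox, as described in the convention above.
interface Region { page: number; x: number; y: number; w: number; h: number }

interface OverlayStyle { left: string; top: string; width: string; height: string }

// Normalized 0..1 coordinates map directly to CSS percentages of the page image.
function overlayStyle(region: Region): OverlayStyle {
  const pct = (value: number): string => `${(value * 100).toFixed(2)}%`;
  return {
    left: pct(region.x),
    top: pct(region.y),
    width: pct(region.w),
    height: pct(region.h),
  };
}

// Pixel coordinates are only needed when a caller insists on them,
// and are derived from whatever size the image is displayed at.
function toPixels(region: Region, displayWidth: number, displayHeight: number) {
  return {
    left: region.x * displayWidth,
    top: region.y * displayHeight,
    width: region.w * displayWidth,
    height: region.h * displayHeight,
  };
}
```

The same stored region yields correct positions at any render DPI, because the display size enters only at the last step.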
Figure Duplication (Intentional)
Figures appear in BOTH figures (with caption + figure_type metadata) AND chunks (as chunk_type='figure' rows with caption as content). This is deliberate: the inspector renders all chunks uniformly via the hover-overlay machinery, so figures get the same visual treatment as text chunks. The figures table is the source of truth for figure-specific metadata (caption, type); the chunks row provides the bbox + section-FK that the inspector consumes.
AI Interface Architecture
Extractor Surface (built)
The agent runs as a claude -p subprocess with --tools Read --add-dir <cache> --json-schema <ops-schema>. It reads page-NN-text.json (and page-NN.png if needed for figures) via the Read tool, returns structured JSON validated against operations.ts's Zod schema, then exits.
Why subprocess (not Anthropic SDK):
- Bills against Max: the user's Claude CLI OAuth session is inherited. The Agent SDK explicitly does not support claude.ai login per Anthropic policy.
- Sandboxed: each page invocation is isolated; failure of one doesn't affect others.
- Visible: the trace is shell output, easy to debug.
For production / CI paths, swap to the Anthropic SDK with ANTHROPIC_API_KEY. The CLI subprocess is contained behind cli-client.ts; the swap is mechanical.
Hybrid Retrieval
The retrieval layer is hybrid by default — three primitives, each with a different sweet spot, exposed both as native tool-use in the production query path and as MCP servers for dev/debug from Claude Code or Claude Desktop. The agent picks. Most RAG systems are vector-only; vector-only silently fails on exact-match queries. Hybrid is the architectural bet, and the part of the IP we're underselling in this doc.
The three primitives:
| Primitive | Index | Wins for | Latency |
|---|---|---|---|
| lookup_section(documentId, sectionId, recursive?) | Postgres unique index on (document_id, section_id) | Direct rule lookup. "What does C7.6.2 say?" Sub-millisecond, deterministic. Recursive variant walks parent_section_id for "everything in C7." | <10ms |
| fts_search(documentId, query, k) | Postgres gin (to_tsvector('english', content)) | Keyword / exact-phrase / rule-number queries. Stemming + stop-words handled. Beats vector when the user's wording matches the doc's wording, which is more often than vector-only proponents admit. | <50ms |
| vector_search(documentId, query, k) | pgvector ivfflat cosine over text-embedding-3-small (1536-dim) | Conceptual / paraphrase / synonym queries where the user's wording differs from the doc's. The "find me the rule about creativity even though the doc says 'innovation'" path. | <100ms |
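The recursive variant of lookup_section can be illustrated with an in-memory walk. This is a sketch under stated assumptions: it treats parent_section_id as holding the parent's section_id string, operates on an already-loaded array rather than Postgres, and uses hypothetical names throughout. A database implementation would more likely use a recursive query.

```typescript
// Hypothetical in-memory mirror of the sections table.
interface Section { sectionId: string; parentSectionId: string | null; title: string }

// Returns the section itself, plus all descendants when recursive is true,
// in depth-first order so children follow their parent.
function lookupSection(sections: Section[], sectionId: string, recursive = false): Section[] {
  const root = sections.find((s) => s.sectionId === sectionId);
  if (root === undefined) return [];
  if (!recursive) return [root];

  // Index children by parent once, so the walk is linear in section count.
  const childrenOf = new Map<string, Section[]>();
  for (const s of sections) {
    if (s.parentSectionId === null) continue;
    const siblings = childrenOf.get(s.parentSectionId) ?? [];
    siblings.push(s);
    childrenOf.set(s.parentSectionId, siblings);
  }

  const result: Section[] = [];
  const seen = new Set<string>(); // guards against accidental cycles
  const stack: Section[] = [root];
  while (stack.length > 0) {
    const current = stack.pop();
    if (current === undefined) break;
    if (seen.has(current.sectionId)) continue;
    seen.add(current.sectionId);
    result.push(current);
    const children = childrenOf.get(current.sectionId) ?? [];
    // Push in reverse so the first child is visited first.
    for (const child of children.slice().reverse()) stack.push(child);
  }
  return result;
}
```

With this shape, "everything in C7" is lookupSection(sections, "C7", true), while a direct rule lookup leaves recursive off.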
Router architecture (D5):
The router spawns the local claude CLI as a subprocess (D12, Max-billed) with the doc-search MCP server attached over stdio and only the three tools allowed. The agent reads the user's query, picks tool(s), and emits a natural-language answer. System prompt instructs tool selection by query shape:
- Exact rule IDs → lookup_section
- Keyword / phrase → fts_search
- Conceptual / paraphrase → vector_search
- Ambiguous → run multiple in parallel; let the score ranking sort it out
Every tool call writes a row to retrieval_log (tool_name, query, tool_params, result_chunk_ids, result_scores, latency_ms, created_at). The router harvests these rows post-call within its wall-clock window and returns them as a unified hit list with source-of-result attribution — every chunk knows which index found it.
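The harvesting step described above can be sketched as a filter plus a flatten over log rows. This is an illustration only: the row shape is a camelCase guess at the retrieval_log columns, timestamps are simplified to epoch milliseconds, and harvestHits is a hypothetical name.

```typescript
// Hypothetical in-memory view of retrieval_log rows.
interface RetrievalLogRow {
  toolName: string;
  resultChunkIds: string[];
  resultScores: number[];
  createdAt: number; // epoch milliseconds, an assumption for this sketch
}

interface Hit { chunkId: string; score: number; foundBy: string }

// Keeps rows written inside the router call's wall-clock window and
// flattens them into hits that remember which tool produced them.
function harvestHits(rows: RetrievalLogRow[], startedAt: number, finishedAt: number): Hit[] {
  const hits: Hit[] = [];
  for (const row of rows) {
    if (row.createdAt < startedAt || row.createdAt > finishedAt) continue;
    row.resultChunkIds.forEach((chunkId, index) => {
      const score = row.resultScores[index];
      if (score === undefined) return; // skip malformed rows with missing scores
      hits.push({ chunkId, score, foundBy: row.toolName });
    });
  }
  return hits;
}
```

A chunk returned by two tools appears twice here, once per tool, which keeps the attribution explicit; whether the real router deduplicates is not specified in this doc.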
flowchart LR
USER[User query] --> ROUTER["claude CLI subprocess<br/>(router with system prompt)"]
ROUTER --> MCP["@autri/mcp-doc-search<br/>(stdio MCP server)"]
MCP --> P1[lookup_section]
MCP --> P2[fts_search]
MCP --> P3[vector_search]
P1 --> DB[(Postgres)]
P2 --> DB
P3 --> DB
P1 -.logs.-> LOG[(retrieval_log)]
P2 -.logs.-> LOG
P3 -.logs.-> LOG
DB --> ROUTER
LOG --> ROUTER
ROUTER --> ANSWER[Answer + hits with source attribution]
The IP angle — three differentiators most RAG systems lack:
- Hybrid by default. Three indexes always available, agent picks. Vector-only systems silently fail on exact-match queries (an embedding turns "C7.6.2" into a dense vector that captures meaning rather than the exact string, so an exact identifier match is not guaranteed). Hybrid catches the failure modes that single-index systems can't see.
- Source-of-result attribution. Every chunk in the result list shows which index found it (color-coded borders in the playground UI: blue for lookup_section, amber for fts_search, green for vector_search). Trust comes from legibility: if you can see why a chunk was returned, you can decide whether to trust it.
- The retrieval_log + inspector overlay makes the trace a UX feature, not a debug log. Users see the agent's reasoning replay as part of normal use, not as an opt-in. This is the same trust-via-visibility wedge that drives the extraction inspector, applied to retrieval.
Open question: when to ensemble. If no single index dominates a query, reciprocal rank fusion (RRF) is the natural next step: score each chunk by summing 1/(rank + k) over the indexes that returned it, then sort by that sum. Not built yet; flagged as an open question because we need data on agent-pick patterns first. If the agent already runs multiple primitives in parallel for ambiguous queries, RRF may be redundant if the score ordering already does this implicitly. One caveat: raw scores from FTS and vector search are on different scales, which is the problem rank-based fusion sidesteps.
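For reference, reciprocal rank fusion is small enough to sketch in full. This is not built in Autri; the function name is hypothetical, and k = 60 is the constant commonly used in the RRF literature, chosen here as an assumption rather than a tuned value.

```typescript
interface FusedResult { id: string; score: number }

// Each ranking is a list of chunk IDs, best first. A chunk's fused score is
// the sum of 1 / (rank + k) over every ranking that contains it (rank is 1-based).
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rank + k));
    });
  }
  const fused: FusedResult[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  fused.sort((a, b) => b.score - a.score); // highest fused score first
  return fused;
}
```

Only ranks are used, never raw scores, so the fts_search and vector_search outputs can be combined without normalizing their score scales.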
Open question — KB-scoped variants. Today's primitives take documentId. As the unit shifts from document → knowledge base (see § Multi-tenancy & Knowledge Bases > The Unit Shift), the primitives widen to (knowledgeBaseId, ..., documentIds?). The chunks model and pgvector index don't change; the WHERE clauses do.
Pattern symmetry with extraction (D11): the retrieval architecture mirrors the extraction architecture. The LLM emits structured tool calls referencing IDs from typed inputs; code applies the operations deterministically. Same shape on both ends of the pipeline. The chunks are the IDs the router operates on; the fragments are the IDs the extractor operates on.
Pattern Symmetry
The retrieval agent (planned) mirrors the extraction agent (built): emit structured operations / tool calls referencing IDs from typed inputs; code applies the operations deterministically. Same shape on both ends of the pipeline.
Multi-tenancy & Knowledge Bases
Captures the multi-tenancy and knowledge-base architecture for Autri. The product wedge (the visual extraction inspector) shipped in v0 as a single-tenant, document-scoped artifact; this section locks in the shape we'll grow into as Autri becomes the foundation underneath QuoteAI and other downstream products.
The architectural pivot driving this section: the right primitive is the knowledge base, not the document. Once that lands, the rest (multi-tenancy pattern, scope passing, agent + KB selection, source-type-specific ingestion) falls out cleanly.
Subsections:
- The unit shift: document → knowledge base. Why retrieval primitives become KB-scoped, what changes mechanically.
- Tenant model. The organizations / users / knowledge_bases / documents chain, why pattern-A multi-tenancy (shared DB + row-level isolation), how the operator-managed-now → customer-managed-later trajectory plays out.
Future subsections (TODO this session, after annotations on 1+2 settle): scope passing contract, agent + KB selection (hybrid model), source-type-specific ingestion, open questions on DB host (Neon vs Supabase) and auth provider.
The unit shift: document → knowledge base
Today's retrieval primitives are document-scoped. Every lookup_section, fts_search, vector_search call takes a documentId. That fits "I'm browsing this one rulebook" but breaks at the next horizon: a real organization doesn't operate at the document level, they operate at the knowledge base level — a coherent collection of documents that share access, retention, sensitivity, and indexing decisions.
A knowledge base is the natural unit because the things that vary in real systems vary KB-by-KB, not doc-by-doc:
- Access maps to it. A user has access to a KB; whether the KB has 1 doc or 1,000 is irrelevant for permissioning.
- Sensitivity maps to it. "Public rules" and "internal team comms" are different KBs, not different docs in the same KB. Mixing them under one ID is a leak waiting to happen.
- Update cadence maps to it. Rules change once a year; quote history grows daily. Different update strategies apply per KB, not per doc.
- Retrieval ergonomics map to it. Cross-doc queries within a KB are expected ("find me everything in our rulebook corpus about pit displays"). Cross-KB queries are unusual and explicit.
What changes mechanically:
The retrieval primitives shift from (documentId, ...) to (knowledgeBaseId, ..., documentIds?). The chunks model and pgvector index don't change — just the WHERE clauses. The agent's MCP tools widen accordingly: fts_search(knowledgeBaseId, query, k) searches all docs in the KB; an optional documentIds: string[] arg narrows back to specific docs when needed.
Document-scoped queries don't disappear — they're a natural special case (documentIds: [oneId]). The existing playground at /docs/[id]/query becomes /kb/[id]/query with an optional ?docs=... filter. Bookmarks keep working through a redirect from the old path.
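The "only the WHERE clauses change" claim can be illustrated with a query builder for the widened fts_search. This is a sketch, not the retrieval package's code: the function name, selected columns, and the use of plainto_tsquery are assumptions, and it only builds a parameterized statement without touching a database.

```typescript
interface ScopedQuery { sql: string; params: unknown[] }

// Builds a KB-scoped full-text query. documentIds is optional; passing a
// single ID reproduces today's document-scoped behaviour.
function buildFtsSearch(
  knowledgeBaseId: string,
  query: string,
  k: number,
  documentIds?: string[],
): ScopedQuery {
  const params: unknown[] = [knowledgeBaseId, query];
  let docFilter = "";
  if (documentIds !== undefined && documentIds.length > 0) {
    params.push(documentIds);
    docFilter = ` AND d.id = ANY($${params.length})`;
  }
  params.push(k);
  const sql =
    "SELECT c.id, c.content FROM chunks c" +
    " JOIN documents d ON d.id = c.document_id" +
    " WHERE d.knowledge_base_id = $1" +
    docFilter +
    " AND to_tsvector('english', c.content) @@ plainto_tsquery('english', $2)" +
    ` LIMIT $${params.length}`;
  return { sql, params };
}
```

The chunks table and its indexes are untouched; the KB scope arrives through the join to documents, and the optional filter narrows back to specific documents.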
Rejected alternatives:
- Keep doc-as-primitive, add a multi-doc query primitive. Rejected: doubles the retrieval-primitive surface area, and "search across docs in a KB" becomes a special-case feature instead of the default. The right unit shifts because the wrong shape becomes a maintenance tax.
- Folder-style nested KBs. Rejected for v0: KBs are flat. Hierarchical KBs are a v2 concern when we have a customer asking; for now, a free-form description on each KB handles categorization needs.
- Workspaces above KBs. Rejected: adds a layer with no current use case. organizations provides the grouping; if "workspace" emerges as a need (e.g., a customer wants to separate prod KBs from staging KBs), revisit then.
Tenant model
Multi-tenant from day one (per the deployment-pattern-A choice — shared DB, row-level isolation, defense-in-depth via RLS). The shape:
organizations (
id UUID PK,
name TEXT NOT NULL,
slug TEXT UNIQUE NOT NULL, -- for URLs
created_at TIMESTAMPTZ
)
users (
id UUID PK,
email TEXT UNIQUE NOT NULL,
name TEXT,
organization_id UUID REFERENCES organizations(id),
role TEXT NOT NULL DEFAULT 'member', -- 'admin' | 'member'
created_at TIMESTAMPTZ
)
knowledge_bases (
id UUID PK,
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL,
slug TEXT NOT NULL,
description TEXT,
created_by UUID REFERENCES users(id),
created_at TIMESTAMPTZ,
UNIQUE (organization_id, slug)
)
documents (existing, +)
knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id) ON DELETE CASCADE
-- existing fields preserved (name, source_type, source_path, etc.)
flowchart TB
ORG[organizations]
USERS[users]
KB[knowledge_bases]
DOCS[documents]
DERIVED["pages · sections · chunks · figures · retrieval_log<br/><i>tenant inherited via FK chain</i>"]
ORG --> USERS
ORG --> KB
KB --> DOCS
DOCS --> DERIVED
USERS -.created_by.-> KB
classDef tenant fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#fff
classDef new fill:#1e3a5f,stroke:#3b82f6,stroke-width:1px,color:#fff
class ORG,USERS tenant
class KB,DOCS new
FK chain is the canonical tenant identifier. Every retrieval-touched row inherits its tenant via documents.knowledge_base_id → knowledge_bases.organization_id. Joins gate every query. We deliberately do NOT denormalize tenant_id onto every table — the FK chain is the source of truth, and adding redundant columns invites drift on ingestion paths that forget to set them. We'll revisit if query-plan analysis shows the joins are too expensive (likely never at our scale).
Access model (v0): all members of an organization have read access to all KBs in that organization. Per-KB ACLs are deferred — YAGNI until a customer asks for "this KB is read-only for these users." When that happens, add a knowledge_base_acls table without touching existing rows.
Defense-in-depth: RLS as safety net. Once the DB host is decided (Neon vs Supabase, see Open Questions), enable row-level security policies that enforce organization_id = current_user.organization_id at the database layer. App-layer filtering is the primary gate; RLS is the backstop for the inevitable bug where a developer forgets a WHERE clause. It doesn't replace careful code; it keeps a missed filter from turning into a cross-tenant leak.
Operator-managed → customer-managed trajectory:
| Phase | Who creates the org/KB | Autri admin UI | Auth |
|---|---|---|---|
| v0 (now) | We do, on customer's behalf | Internal-only | None / dev |
| v1 | Customer self-signs-up | Customer sees their org's KBs only | Real auth (Clerk / Supabase / etc.) |
| v2 | Same, plus per-KB ACLs and team management | Same, plus role-based gating | Existing |
Schema doesn't change between phases. v0 → v1 is purely UI + auth-flow work. v1 → v2 adds a knowledge_base_acls table without touching any existing rows. The IP / focus argument (operator-managed first) is a productization-strategy choice, not an architectural one; the schema is ready for v1 from the moment this lands.
Source-type-agnostic. A KB can contain any mix of documents.source_type values — PDFs, emails, HTML, structured records, agent-generated content (e.g., new quotes from QuoteAI). Ingestion paths fork by source type but converge into the same chunks table. (Fully treated in the future "Source-type-specific ingestion" subsection.)
Rejected alternatives:
- Per-tenant database (Aurora-per-customer). Rejected: operational cost dominates (~$50/mo Aurora minimum × N tenants, migrations × N, monitoring × N, backups × N). pgvector ivfflat recall improves with more data, so one shared index outperforms many small ones at our scale. Becomes correct only with hard compliance requirements (HIPAA / SOC2 with data-residency) or one customer at 100x scale of the others.
- Schema-per-tenant. Rejected: an awkward middle ground. Better isolation than shared-table at higher operational complexity than shared-DB-with-RLS. Pick one extreme or the other; the middle pays for both.
- Denormalize tenant_id onto every table. Rejected as default: invites drift. Use the FK chain. Revisit if query-plan analysis shows join cost is real.
- No users table; only auth provider IDs. Rejected: even if we use a third-party auth provider (Clerk, Supabase Auth), we still want a local users row to associate KBs with creators, track invitations, log activity. The auth provider's user ID becomes one column on users, not a replacement for the table.
Deployment & isolation tiers
Three deployment tiers, each appropriate for a different customer shape. The tier determines the database isolation, AWS account boundary, and pricing model. The application code is identical across tiers — what changes is the infrastructure config (Terraform variables) and which Postgres pool the request hits.
flowchart TB
subgraph T1["Tier 1 — Shared Multi-tenant (default)"]
direction LR
T1A[Many customers] --> T1B[Hannah Labs AWS account]
T1B --> T1C[(Shared RDS Postgres + RLS)]
end
subgraph T2["Tier 2 — Dedicated Database (Brehob lands here)"]
direction LR
T2A["Enterprise customer<br/>(e.g., Brehob — Cognito + Entra federation)"] --> T2B[Hannah Labs AWS account]
T2B --> T2C[(Dedicated Aurora<br/>per customer)]
end
subgraph T3["Tier 3 — Customer-hosted (optional uplift)"]
direction LR
T3A["Top-tier enterprise demanding<br/>full data sovereignty"] --> T3B[Customer's AWS account<br/>via Terraform deploy]
T3B --> T3C[(Customer's Aurora<br/>their data, their hardware)]
end
Tier 1 — Shared multi-tenant (default)
- Who: small / startup / free-tier customers, self-serve signups
- Where: Hannah Labs' AWS account, shared infrastructure
- DB: Shared RDS Postgres + pgvector with row-level security policies enforcing organization_id = current_user.organization_id
- Compute: Shared ECS Fargate cluster, multi-tenant Next.js app
- Cost basis: Per-org cost is fractional — RDS + ECS amortized across N customers
- Pricing: Pay-as-you-go or low-tier subscription
Tier 2 — Dedicated database (Brehob's tier)
- Who: mid-market / paid enterprise customers requesting data isolation. Brehob lands here under QuoteAI's PRICING-D2 ($3,500/mo + $12,500 setup).
- Where: Hannah Labs' AWS account, customer-specific Aurora cluster
- DB: Per-customer Aurora Serverless v2 with pgvector. tenants.dedicated_db_url column points at the cluster; app code selects pool by tenant lookup at request time.
- Compute: Shared ECS Fargate cluster (compute is cheap; data isolation is the value)
- Auth: Cognito user pool with customer's IdP federated in (Brehob = Microsoft Entra / Azure AD per QuoteAI INFRA-D3). Customer's users authenticate against their own corporate identity; Cognito issues session tokens; ~$0/mo at expected scale.
- Cost basis: Aurora cluster is the major variable cost. AWS run cost validated at ~$771/mo mid-scenario for Brehob shape (Bedrock chat + drafting + embeddings + RDS + Fargate + supporting services).
- Pricing: ~$3,500/mo subscription, a ~4.5× multiple of the validated ~$771/mo run cost. Setup fee $12,500 ($5K base + $7,500 Phase 1 corpus ingestion at ~$1.50/doc).
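The Tier 2 note that "app code selects pool by tenant lookup at request time" can be sketched as a small registry. Everything here is illustrative: TenantRecord, PoolRegistry, and the injected pool factory are hypothetical, and the pool type is left generic so the sketch does not depend on any particular Postgres driver.

```typescript
// Hypothetical tenant row; dedicatedDbUrl is null for Tier 1 tenants.
interface TenantRecord { id: string; dedicatedDbUrl: string | null }

// Caches one pool per connection URL so Tier 1 tenants share a pool
// and each Tier 2 tenant gets its own.
class PoolRegistry<P> {
  private readonly pools = new Map<string, P>();
  private readonly createPool: (url: string) => P;
  private readonly sharedDbUrl: string;

  constructor(createPool: (url: string) => P, sharedDbUrl: string) {
    this.createPool = createPool;
    this.sharedDbUrl = sharedDbUrl;
  }

  poolFor(tenant: TenantRecord): P {
    const url = tenant.dedicatedDbUrl ?? this.sharedDbUrl;
    const existing = this.pools.get(url);
    if (existing !== undefined) return existing;
    const created = this.createPool(url);
    this.pools.set(url, created);
    return created;
  }
}
```

Application code above this layer stays identical across tiers; only the tenant row decides which database a request reaches.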
Tier 3 — Customer-hosted (optional sovereignty uplift)
- Who: top-tier enterprise demanding full data sovereignty (regulated industries, customers with strict data-residency clauses, or specific procurement requirements). Optional upgrade from Tier 2 when justified.
- Where: Customer's AWS account, deployed via shared Terraform modules
- DB: Customer's Aurora cluster, in customer's VPC, customer-managed backups
- Compute: Customer's ECS Fargate. Customer's Bedrock quotas. Customer's Cognito (or federated to ours).
- Cost basis: Customer pays AWS direct; Hannah Labs bills for software license + implementation + ongoing support contract
- Pricing: Highest tier — software license + implementation services + ongoing support. Brehob has the option to migrate here later if their procurement / security review demands it; we don't lead with it.
Tier escalation triggers:
| Trigger | Move to |
|---|---|
| Customer asks "where is our data?" with concern in their voice | Tier 2 |
| Customer's procurement asks about SOC 2 / ISO 27001 / data residency | Tier 2 or 3 |
| Customer wants their data in their AWS account, full stop | Tier 3 |
| Customer is regulated (HIPAA, financial, defense) | Tier 3 (often required) |
Bedrock as the model provider in production:
Production runtime uses AWS Bedrock for Claude calls (better uptime than direct Anthropic API, AWS-native billing, regional failover, data-residency guarantees). Dev keeps the claude CLI subprocess pattern (D12, Max-billed) for local iteration; the abstraction in cli-client.ts makes the dev → prod swap mechanical.
Bedrock model lag (Anthropic ships Claude updates direct API first, Bedrock catches up days to weeks later) is the only real downside. Mitigation: pin models to specific versions in prod; upgrade on a schedule. Y1 must-ships per QuoteAI cost validation: prompt caching (~30-50% chat reduction), sliding window (~20 turns), daily Bedrock budget alarm.
Y1.5 cost optimization — validated headroom: A/B Haiku-rerank vs Sonnet on a 15-20 query panel post-trip. If Haiku passes quality bar, swap retrieval/rerank to Haiku and unlock margin headroom for high-usage chat scenarios.
Terraform-managed infrastructure:
All three tiers share the same Terraform module library:
infra/
├── modules/
│ ├── autri-stack/ # ECS + RDS/Aurora + S3 + Bedrock policies
│ ├── tenant-database/ # Per-tenant Aurora (Tier 2/3)
│ └── monitoring/ # CloudWatch + alarms
└── deployments/
├── shared/ # Tier 1 — multi-tenant
├── tenant-{name}/ # Tier 2 — per-customer in Hannah Labs AWS account
└── customer-{name}/ # Tier 3 — config in customer's AWS account
Same modules, different config. Adding a Tier-2 customer = clone tenant-template/, update variables, terraform apply. Tier 3 = same idea, deployed into the customer's AWS account instead. Reproducible, version-controlled, fast.
Rejected alternatives:
- All customers on dedicated DBs from day 1. Rejected — operational cost dominates, kills startup-tier economics. Per-tenant DB is the value-added tier, not the default.
- All customers customer-hosted. Rejected — most customers don't have AWS expertise / don't want operational responsibility. Self-managed is a feature for those who specifically want it.
- No shared deployment at all (every customer is its own AWS account). Rejected — adds bureaucracy and infra cost where most customers want managed simplicity.
- Multi-cloud (Azure + GCP) from day 0. Rejected — AWS-only is sufficient for the foreseeable customer pipeline; multi-cloud is a productization concern that adds 3× infra complexity. Add when a deal demands it.
- Vercel + Neon for prod. Rejected — fine for early SaaS but doesn't pass enterprise procurement (no isolation tier, no AWS-native deploy option). Autri needs to be sellable to AWS-native enterprises from day 0.
Document versioning & supersession
Real-world corpora aren't static. The FIA Technical Regulations get a 2026 version, then a 2026-W42 update, then a 2027 version. Quote-history KBs grow daily but with each quote being its own logical doc (no supersession needed). The schema needs to handle both shapes — versioned logical docs AND single-version-only docs — without forcing one model on every corpus.
Three things users want:
- Default to the latest — most queries should hit current rules, not historical drafts.
- Pin to a specific version — "what does the 2025 version say about wing dimensions?"
- Diff between versions — "what changed between v1 and v2?"
The shape that supports all three without re-extracting on every update:
CREATE TABLE logical_documents (
  id UUID PRIMARY KEY,
  knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id) ON DELETE CASCADE,
  name TEXT NOT NULL,                                        -- 'FIA Technical Regulations'
  slug TEXT NOT NULL,                                        -- 'fia-technical-regs'
  current_version_document_id UUID REFERENCES documents(id), -- pointer to current version
  created_at TIMESTAMPTZ,
  UNIQUE (knowledge_base_id, slug)
);
-- documents (existing table) gains three nullable columns
ALTER TABLE documents
  ADD COLUMN logical_document_id UUID NULL REFERENCES logical_documents(id),
  ADD COLUMN version_label TEXT,              -- '2026', '2026-W42 update', '2027 draft', or NULL
  ADD COLUMN superseded_at TIMESTAMPTZ NULL;  -- when a newer version replaced this one
flowchart TB
KB[knowledge_bases]
LD["logical_documents<br/><i>FIA Technical Regs</i>"]
D1["documents v1<br/>2025 edition<br/><i>superseded_at: 2026-01-01</i>"]
D2["documents v2<br/>2026 edition<br/><i>superseded_at: 2026-10-15</i>"]
D3["documents v3<br/>2026-W42 update<br/><i>superseded_at: NULL (current)</i>"]
KB --> LD
LD --> D1
LD --> D2
LD --> D3
LD -.current_version_document_id.-> D3
classDef current fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#fff
class D3 current
Single-version docs (e.g., one quote in a quote-history KB) skip logical_documents entirely — logical_document_id is nullable, so the existing model just keeps working. Versioning is opt-in: a doc gets a logical_document only when versioning makes sense for that source type.
Default behavior — latest only. Retrieval primitives default-filter WHERE superseded_at IS NULL. Users always query the current version unless they explicitly say otherwise:
// Default: scope to current versions only
fts_search({ knowledgeBaseId, query, k })
// Explicit: scope to a specific version
fts_search({ knowledgeBaseId, query, k, documentIds: [v1Id] })
// Explicit: include superseded versions
fts_search({ knowledgeBaseId, query, k, includeSuperseded: true })
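One way the three call shapes above could map onto a single WHERE fragment is sketched below. The `VersionScope` type, the `versionFilter` helper, and the `d` table alias are illustrative assumptions, not names from the codebase.

```typescript
// Hypothetical option shape mirroring the fts_search calls above.
interface VersionScope {
  documentIds?: string[];
  includeSuperseded?: boolean;
}

// Returns a SQL fragment plus its bind parameters.
// `startIndex` is the first free $n placeholder in the surrounding query;
// `d` is assumed to be the alias of the documents table.
function versionFilter(
  scope: VersionScope,
  startIndex: number,
): { sql: string; params: unknown[] } {
  // An explicit pin wins. An empty array deliberately matches nothing,
  // so a fully filtered-out pin never widens back to "all documents".
  if (scope.documentIds !== undefined) {
    return { sql: `d.id = ANY($${startIndex}::uuid[])`, params: [scope.documentIds] };
  }
  // Opt-in: search current and superseded versions alike.
  if (scope.includeSuperseded) {
    return { sql: "TRUE", params: [] };
  }
  // Default: current versions only.
  return { sql: "d.superseded_at IS NULL", params: [] };
}
```

Keeping the default in one helper means each retrieval primitive gets latest-only behavior without repeating the condition.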
Diff workflows fall out as a thin layer. No new primitives needed; existing ones compose:
- Section-level diff (lookup_section against v1 vs v2): trivially shows additions/removals/changes by section_id.
- Semantic-level diff: vector_search against v1's chunks; for each top-k hit, find the cosine-nearest chunk in v2. Chunks with no good match are "added" or "removed." Cosine threshold + manual review for v0; tighten heuristics later.
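Both diff styles can be sketched over in-memory chunk arrays. The `DiffChunk` shape and the function names are assumptions for illustration; a real version would go through the retrieval primitives, and the threshold would need tuning.

```typescript
// Hypothetical chunk shape; embeddings are assumed to share one dimension.
interface DiffChunk {
  id: string;
  sectionId: string;
  content: string;
  embedding: number[];
}

// Section-level diff: compare section_id sets and the text under each id.
function sectionDiff(v1: DiffChunk[], v2: DiffChunk[]) {
  const textBySection = (chunks: DiffChunk[]): Map<string, string> => {
    const m = new Map<string, string>();
    chunks.forEach((c) => m.set(c.sectionId, (m.get(c.sectionId) ?? "") + c.content + "\n"));
    return m;
  };
  const a = textBySection(v1);
  const b = textBySection(v2);
  const aKeys = Array.from(a.keys());
  const bKeys = Array.from(b.keys());
  return {
    added: bKeys.filter((s) => !a.has(s)),
    removed: aKeys.filter((s) => !b.has(s)),
    changed: aKeys.filter((s) => b.has(s) && a.get(s) !== b.get(s)),
  };
}

function cosine(x: number[], y: number[]): number {
  let dot = 0;
  let nx = 0;
  let ny = 0;
  for (let i = 0; i < x.length; i++) {
    dot += x[i] * y[i];
    nx += x[i] * x[i];
    ny += y[i] * y[i];
  }
  return nx === 0 || ny === 0 ? 0 : dot / Math.sqrt(nx * ny);
}

// Semantic diff: ids of chunks in `from` whose best cosine match in `to`
// falls below the threshold. unmatched(v1, v2) approximates "removed";
// unmatched(v2, v1) approximates "added".
function unmatched(from: DiffChunk[], to: DiffChunk[], threshold: number): string[] {
  return from
    .filter((c) => Math.max(0, ...to.map((t) => cosine(c.embedding, t.embedding))) < threshold)
    .map((c) => c.id);
}
```

The brute-force nearest-match scan is quadratic in chunk count, which is acceptable for a manual-review v0 but would move to an indexed vector query later.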
Agent visibility into versions.
The agent needs to know multiple versions exist while it's retrieving, not after the user complains about a stale answer. Two surfaces:
Per-chunk version metadata in every retrieval result. Every chunk returned from lookup_section / fts_search / vector_search carries a small version envelope alongside the existing chunk fields:
{
  chunk: { /* existing fields */ },
  version: {
    label: '2026-W42 update',   // human label (free-form)
    index: 3,                   // 1-indexed by created_at within logical_document_id
    is_current: true,           // superseded_at IS NULL
    logical_document_name: 'FIA Technical Regulations',
  } | null                      // null when document has no logical_document_id (single-version-only)
}
The join is chunks → documents → logical_documents plus a windowed ROW_NUMBER() OVER (PARTITION BY logical_document_id ORDER BY created_at) — essentially free at our scale. The agent never accidentally treats a stale chunk as current because every result is annotated with its place in the version sequence. Single-version-only documents send version: null so consumers can branch trivially.
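The envelope logic can be sketched in memory as a stand-in for the SQL window function. `DocRow`, `VersionEnvelope`, and `versionEnvelope` are hypothetical names; timestamps are assumed to be ISO-8601 strings.

```typescript
// Hypothetical row shape for documents, with ISO-8601 timestamps.
interface DocRow {
  id: string;
  logicalDocumentId: string | null;
  versionLabel: string | null;
  createdAt: string;
  supersededAt: string | null;
}

interface VersionEnvelope {
  label: string | null;
  index: number;
  is_current: boolean;
  logical_document_name: string;
}

// In-memory equivalent of
// ROW_NUMBER() OVER (PARTITION BY logical_document_id ORDER BY created_at).
function versionEnvelope(
  doc: DocRow,
  allDocs: DocRow[],
  logicalNames: Map<string, string>,
): VersionEnvelope | null {
  // Single-version-only documents carry no envelope.
  if (doc.logicalDocumentId === null) return null;
  const siblings = allDocs
    .filter((d) => d.logicalDocumentId === doc.logicalDocumentId)
    .sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
  return {
    label: doc.versionLabel,
    index: siblings.findIndex((d) => d.id === doc.id) + 1, // 1-indexed
    is_current: doc.supersededAt === null,
    logical_document_name: logicalNames.get(doc.logicalDocumentId) ?? "",
  };
}
```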
A list_documents(knowledgeBaseId) MCP tool that returns the version tree:
list_documents({ knowledgeBaseId }) => {
  logical_documents: Array<{
    id: string,
    name: string,
    versions: Array<{ document_id: string, label: string, index: number, is_current: boolean }>
  }>,
  unversioned_documents: Array<{ id: string, title: string }>,
}
Pull-based — the agent calls when the query smells version-scoped ("what changed", "in last year's regs", "compare versions"). Don't pollute the system prompt with version trees by default; it scales badly past a few logical_documents and most queries don't need it.
Why both label (text) and index (int): label preserves how the org names versions (FIA uses "2026 + Update 1 + Update 2", not v1/v2/v3); index gives the agent a normalized chronological position for "is this newer than that" logic without forcing semver onto asymmetric versioning.
Ingestion path with versioning:
When a customer uploads a new version of an existing logical document:
1. Ingest as a new documents row with its own chunks (full pipeline: render, parse, extract, embed).
2. Set documents.logical_document_id to the existing logical document.
3. Atomically: set the previous current version's superseded_at = now(), and update logical_documents.current_version_document_id to the new row.
4. The previous version's chunks stay in the index, so semantic search still finds them when explicitly scoped.
Idempotency on source_hash still applies — uploading the exact same file again is a no-op (matches the existing documents row, no new version created).
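The supersession steps and the source_hash no-op can be sketched against an in-memory store. `Store`, `attachNewVersion`, and the field names are hypothetical; in Postgres the two writes in step 3 would share one transaction.

```typescript
interface VersionedDoc {
  id: string;
  logicalDocumentId: string | null;
  sourceHash: string;
  supersededAt: Date | null;
}
interface LogicalDoc {
  id: string;
  currentVersionDocumentId: string | null;
}
interface Store {
  documents: VersionedDoc[];
  logicalDocuments: LogicalDoc[];
}

function attachNewVersion(
  store: Store,
  logicalId: string,
  incoming: { id: string; sourceHash: string },
  now: Date,
): "created" | "noop" {
  // Idempotency: the exact same bytes were already ingested.
  if (store.documents.some((d) => d.sourceHash === incoming.sourceHash)) return "noop";
  const logical = store.logicalDocuments.find((l) => l.id === logicalId);
  if (!logical) throw new Error(`unknown logical document ${logicalId}`);
  // Steps 1-2: new documents row, linked to the logical document.
  store.documents.push({
    id: incoming.id,
    logicalDocumentId: logicalId,
    sourceHash: incoming.sourceHash,
    supersededAt: null,
  });
  // Step 3: mark the old current version superseded and repoint the handle.
  const previous = store.documents.find((d) => d.id === logical.currentVersionDocumentId);
  if (previous) previous.supersededAt = now;
  logical.currentVersionDocumentId = incoming.id;
  // Step 4: nothing is deleted; the previous version's chunks stay indexed.
  return "created";
}
```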
Detecting that an upload is a new version.
Step (2) above assumes we know which logical_document to attach to. Inferring it silently is a high-stakes failure mode: a false positive buries a new doc under an unrelated lineage; a false negative creates two parallel logical_documents that should have been one.
The pattern: score, suggest, never silently group. On upload, score the new file against every logical_document already in the destination KB. Surface the top match in the inspector before finalize. The user confirms or rejects.
Three signals stacked into the score:
- Filename similarity after normalizing year/version tokens. FIA-tech-regs-2026.pdf ↔ FIA-tech-regs-2025.pdf reads as the same lineage once 2026/2025 is collapsed to a placeholder. Cheapest signal, surprisingly strong in practice. Year tokens (19\d\d, 20\d\d), vN/V\d, ISO dates (2026-W42), and explicit "draft" / "final" / "update" markers all normalize to a single <VER> placeholder before string comparison (Jaro-Winkler or token-set ratio).
- PDF title metadata: the /Title entry in the PDF info dictionary. Often filled with the doc's canonical name independent of filename. (Per source type the analog is: xlsx workbook properties, email Subject normalized, HTML <title> tag.)
- Structural overlap: once first-pass extraction has run, the section_id set intersection (C7, C7.1, …) is a strong signal that two docs are versions of the same lineage. Compute Jaccard over section_id sets. For non-PDF source types: header-row overlap (xlsx), thread-id (email), URL path stem (HTML).
If the top score crosses a threshold (tunable; start ~0.7 weighted average — filename 0.5, title 0.2, structure 0.3) the inspector shows: "Looks like a new version of FIA Technical Regulations. Link as version, or treat as new doc?" Below threshold: defaults to a new logical_document; user can manually link later in the inspector.
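A minimal sketch of the normalizer and the weighted score, using the weights and threshold quoted above. All function names are assumptions, and plain Jaccard over tokens stands in for Jaro-Winkler or a token-set ratio to keep the sketch dependency-free.

```typescript
// Tokens that collapse to the <VER> placeholder: years, vN, week numbers,
// and draft/final/update markers.
const VERSION_TOKEN = /^((19|20)\d\d|v\d+|w\d\d|draft|final|update)$/;

function normalizedTokens(filename: string): string[] {
  const stem = filename.toLowerCase().replace(/\.[a-z0-9]+$/, ""); // drop extension
  const tokens = stem
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0)
    .map((t) => (VERSION_TOKEN.test(t) ? "<VER>" : t));
  // Collapse runs such as "2026", "w42", "update" into one placeholder.
  return tokens.filter((t, i) => !(t === "<VER>" && tokens[i - 1] === "<VER>"));
}

// Jaccard similarity over token (or section_id) sets.
function jaccard(a: string[], b: string[]): number {
  const sb = new Set(b);
  const inter = Array.from(new Set(a)).filter((x) => sb.has(x)).length;
  const union = new Set(a.concat(b)).size;
  return union === 0 ? 0 : inter / union;
}

interface Signals {
  filename: number;
  title: number;
  structure: number;
}
const WEIGHTS: Signals = { filename: 0.5, title: 0.2, structure: 0.3 };
const THRESHOLD = 0.7; // tunable starting point from the text above

function lineageScore(s: Signals): number {
  return WEIGHTS.filename * s.filename + WEIGHTS.title * s.title + WEIGHTS.structure * s.structure;
}

// The score only ever produces a suggestion; the user confirms in the inspector.
function suggestion(score: number): "suggest-link" | "new-logical-document" {
  return score >= THRESHOLD ? "suggest-link" : "new-logical-document";
}
```

For the V0 filename-only build, the title and structure signals would simply be absent, so the weights and threshold would need re-tuning by hand as the text says.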
Subtle vs big content changes don't matter for the grouping decision. Whether the diff is one paragraph or a complete rewrite, it's still "version 3 of the FIA regs" if it's the same lineage. The change-magnitude question is downstream — it informs whether to copy unchanged chunks forward or re-extract everything (a v1 optimization on top of chunks.content_hash).
V0 build: filename-only signal with the year/version normalizer, threshold tuning by hand. Title-metadata and structural-overlap signals layer in once we have a few corpora to tune against. The score → suggestion UX shape is settled now so we don't paint ourselves into a corner.
Why not delete superseded chunks:
- Diff workflows need the historical content live in the index.
- pgvector ivfflat is append-friendly; deletes don't reclaim space efficiently.
- Storage cost is tiny relative to the value of "what changed" queries.
- If retention becomes a concern (compliance, GDPR), add a documents.purge_after column and a scheduled job. Not v0.
Cost & timing:
| Component | When |
|---|---|
| Schema migration (add logical_documents + 3 columns on documents) | NOW, as part of the multi-tenancy migration. Zero marginal cost while we're already touching documents. All new fields nullable, so existing single-version docs work unchanged. |
| Default superseded_at IS NULL filter in retrieval primitives | NOW. Trivial WHERE-clause addition, immediately useful even without UI. |
| Per-chunk version envelope on retrieval results | NOW. Same join pass, additive contract. |
| list_documents MCP tool | NOW. Tiny tool, one query. |
| Filename-only version-detection on upload | When upload UI lands (v1). Defer until we have a real upload flow to attach to. |
| Title-metadata + structural-overlap detection signals | v1+ — tune once we have multiple real corpora. |
| Version-picker UI on doc cards | v1 — when a customer (probably the FIA case) demands it. |
| Diff view (section-level + semantic) | v1 — same trigger. |
| Auto-supersede on upload | v1 — needs upload UI anyway. |
Schema-first, features-when-needed. The schema additions are zero-cost now; the UX features wait for real demand.
Rejected alternatives:
- Single documents row with version chain via previous_version_id. Rejected: awkward to query (recursive CTE for "find the latest"), no clear "logical document" handle for the user-facing concept ("the FIA Regs" as opposed to "the FIA Regs v3"). The two-table model maps cleanly to the user's mental model.
- Delete superseded chunks. Rejected: kills diff workflows, doesn't save much storage, can't be undone if a customer asks "what did v2 say in section 7?"
- Treat each version as a totally separate document. Rejected: that's where we are today, and the FIA case is the explicit demonstration that it's wrong. Users want "the doc" with a version history, not "N separate docs that happen to share a name."
- Force versioning on all docs. Rejected: adds bureaucracy for source types that don't need it (each quote in QuoteAI is its own thing, not a version of a parent quote). Versioning is opt-in via logical_document_id.
- Silently auto-link uploads to the closest existing logical_document above some threshold. Rejected: false-positive cost (a new doc gets buried as a version of an unrelated lineage) is worse than the friction of one confirmation click. Always suggest, always confirm.
- Bake the full version tree into the system prompt for every chat. Rejected: scales badly past a handful of logical_documents, costs tokens on every turn even when the query has nothing to do with versioning. Prefer per-chunk version envelopes (annotated where it matters) plus the pull-based list_documents tool.
Scope passing contract
Every retrieval request needs a scope — the set of knowledge bases the requesting user is allowed to see. The scope is derived from auth at the edge, propagated through every layer as a typed contract, and re-enforced at the SQL layer as defense-in-depth. The contract is allowedKnowledgeBaseIds: string[] — flowing from auth through to the agent's MCP tools.
flowchart LR
AUTH["Auth provider<br/><i>user_id, organization_id</i>"]
RESOLVER["KB scope resolver<br/><i>SELECT id FROM knowledge_bases<br/>WHERE organization_id = $1</i>"]
ACTION["Server action<br/><i>receives allowedKnowledgeBaseIds<br/>from middleware</i>"]
PROMPT["System prompt template<br/><i>KB list interpolated</i>"]
ROUTER["claude CLI router<br/><i>invokes MCP tools</i>"]
MCP["MCP doc-search tools<br/><i>SQL filter: kb_id IN (allowed_set)</i>"]
DB[(Postgres + RLS)]
AUTH --> RESOLVER --> ACTION --> PROMPT --> ROUTER --> MCP --> DB
Layer responsibilities:
| Layer | What it does | What goes wrong if it fails |
|---|---|---|
| Auth provider | Identifies the user + their org | Fails closed — no auth means no scope means empty result |
| KB scope resolver | Translates user → allowed KB IDs (incl. future per-KB ACLs) | Bug here = user gets wrong KBs (the dangerous case) |
| Server action | Receives scope from auth context, never trusts client-supplied IDs | Client-side scope spoofing |
| System prompt template | Tells the agent what it can search | Agent asks for a KB it can't access — gets empty result, looks confusing but isn't a leak |
| MCP tools | SQL filtering on kb_id IN (allowed_set) | The backstop — even if every layer above is buggy, no chunk leaks unless RLS also fails |
| Postgres RLS | DB-layer enforcement of organization_id = current_user.organization_id | Last line of defense |
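The MCP-tool backstop might look like the sketch below, which builds a parameterized query in the `{ text, values }` shape that node-postgres accepts. The column names `c.document_id` and `c.tsv` and the `scopedChunkQuery` helper are assumptions for illustration.

```typescript
// Builds the scoped full-text query, or null when the scope is empty.
function scopedChunkQuery(
  allowedKnowledgeBaseIds: string[],
  query: string,
  k: number,
): { text: string; values: unknown[] } | null {
  // Fail closed: an empty scope means no query is issued at all.
  if (allowedKnowledgeBaseIds.length === 0) return null;
  return {
    text: [
      "SELECT c.id, c.content",
      "FROM chunks c",
      "JOIN documents d ON d.id = c.document_id",
      "WHERE d.knowledge_base_id = ANY($1::uuid[])", // the kb_id IN (allowed_set) filter
      "AND d.superseded_at IS NULL",
      "AND c.tsv @@ plainto_tsquery('english', $2)",
      "LIMIT $3",
    ].join("\n"),
    values: [allowedKnowledgeBaseIds, query, k],
  };
}
```

Passing the allowed set as a bind parameter, never by string interpolation, keeps the filter intact even if an upstream layer hands over unexpected values.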
The system prompt template has KB scope baked in at request time:
You are a retrieval router for the Autri corpus.
You have access to these knowledge bases:
- {kb_1.name} ({kb_1.id}): {kb_1.description}
- {kb_2.name} ({kb_2.id}): {kb_2.description}
- ...
You can ONLY search within these KBs. If the user's query implies they
want a different one, suggest expanding scope via list_knowledge_bases.
[rest of the existing router instructions]
The agent sees a static list and picks. No round-trip to discover allowed KBs unless the user query explicitly implies "search elsewhere" (handled by the list_knowledge_bases tool — see § Agent + KB Selection).
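Per-request interpolation of the template could be as small as the following sketch. `renderRouterPrompt` and `KbSummary` are hypothetical names; the fixed lines are copied from the template above.

```typescript
interface KbSummary {
  id: string;
  name: string;
  description: string;
}

// Renders the router prompt from the auth-derived KB list for this request.
function renderRouterPrompt(kbs: KbSummary[], routerInstructions: string): string {
  const kbLines = kbs.map((kb) => `- ${kb.name} (${kb.id}): ${kb.description}`);
  return [
    "You are a retrieval router for the Autri corpus.",
    "You have access to these knowledge bases:",
    ...kbLines,
    "You can ONLY search within these KBs. If the user's query implies they",
    "want a different one, suggest expanding scope via list_knowledge_bases.",
    routerInstructions,
  ].join("\n");
}
```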
Why client never sets allowedKnowledgeBaseIds:
Server actions and API routes derive scope from auth context, ignore any client-supplied scope, and pass the auth-derived scope to the router. The agent sees only what auth says it can see. A malicious client could send allowedKnowledgeBaseIds: [some-other-orgs-kb] but the server discards that field and reconstructs from auth.
Server-action contract (typed):
// Wrong (DO NOT DO THIS):
async function runQuery(query: string, allowedKnowledgeBaseIds: string[]) { ... }
// Right:
async function runQuery(query: string, scope?: { documentIds?: string[] }) {
  const session = await getServerSession();     // from auth provider
  const allowedKnowledgeBaseIds =
    await resolveAllowedKBs(session);           // derived, NOT passed
  const scopeFiltered =
    filterScopeByAllowed(scope, allowedKnowledgeBaseIds);
  return route(pool, {
    query,
    allowedKnowledgeBaseIds,
    scope: scopeFiltered,
  });
}
The scope parameter is the user's intent (which KBs they want to search within their allowed set, optionally narrowed to specific docs). The allowedKnowledgeBaseIds is the permission (what auth says they can see). The agent gets the intersection.
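One possible body for filterScopeByAllowed is sketched below. The third parameter, `kbOfDocument` (a document id to knowledge base id lookup), is an assumption added so the sketch is self-contained; the real helper may resolve that mapping differently.

```typescript
interface RequestedScope {
  documentIds?: string[];
}

// Keeps only the requested documents that live in an allowed knowledge base.
// `kbOfDocument` is a hypothetical lookup, e.g. loaded with one SELECT.
function filterScopeByAllowed(
  scope: RequestedScope | undefined,
  allowedKnowledgeBaseIds: string[],
  kbOfDocument: Map<string, string>,
): RequestedScope {
  // No narrowing requested: permission alone defines the scope.
  if (!scope || !scope.documentIds) return {};
  const allowed = new Set(allowedKnowledgeBaseIds);
  return {
    documentIds: scope.documentIds.filter((id) => {
      const kb = kbOfDocument.get(id);
      // Unknown documents are dropped rather than trusted.
      return kb !== undefined && allowed.has(kb);
    }),
  };
}
```

If every requested document is filtered out, the result is an empty documentIds array, which downstream code should read as "match nothing" and not as "no narrowing".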
Rejected alternatives:
- Trust the client to send scope. Rejected — that's how data leaks in B2B SaaS. The client says what it wants; auth says what it can have.
- Skip the MCP-tool-level validation; rely on app filtering. Rejected — defense-in-depth principle. App layer is the primary gate; MCP-tool layer is the backstop catching the inevitable bug.
- No system-prompt KB list; agent always calls list_knowledge_bases first. Rejected — extra round-trip for every query, and the agent's choices are noisier without explicit context. The hybrid approach (list in prompt + tool for expansion) is faster and more legible.
Agent + KB selection
The agent operates within a scope of knowledge bases. The hybrid model: user sets explicit scope via the UI; agent can suggest expanding scope via the list_knowledge_bases MCP tool when the query implies cross-KB reasoning. User-set scope is the default; agent expansion is opt-in per query and always confirmed with the user before crossing.
flowchart TB
UI["User in chat UI<br/><i>scope picker: single KB / multiple / all my KBs</i>"]
AUTH["Server: auth resolves allowed KBs"]
INTER["Intersect (user-selected ∩ auth-allowed)"]
PROMPT["System prompt template<br/><i>KB list interpolated</i>"]
AGENT["Agent — picks tool(s) within scope<br/><i>lookup_section · fts_search · vector_search</i>"]
LIST["list_knowledge_bases tool<br/><i>opt-in — agent calls only when query implies cross-KB</i>"]
UI --> INTER
AUTH --> INTER
INTER --> PROMPT --> AGENT
AGENT -.expansion path.-> LIST
LIST --> AGENT
Why hybrid (vs alternatives):
| Model | Pros | Cons |
|---|---|---|
| Pure agent-routes (every query starts with list_knowledge_bases) | Most flexible | Round-trip cost on every query; user can't see what's being searched without inspection; mis-routing risk |
| Per-KB chat (one chat per KB) | Explicit, simple | Forces context-switching; agent can't combine across KBs even when useful |
| Hybrid (chosen) | Explicit user intent + agent intelligence; fast common case; cross-KB still possible | Slightly more UI complexity (scope picker) |
UI shape:
The chat interface has a scope picker — a sidebar control that lets the user select:
- A single KB ("rules KB" only)
- Multiple KBs ("rules KB" + "team comms KB")
- All my KBs (everything I have access to)
The scope picker shows a count next to each KB name and description (e.g., rules KB · 2 docs · 1,272 chunks) so users see the size of what they're querying. Default selection is whatever they had last for that workspace; first-time users get "All my KBs."
The list_knowledge_bases tool:
Purpose: let the agent suggest expanding scope when the user's query implies cross-KB reasoning. Returns only KBs the user has permission to see.
list_knowledge_bases() => Array<{
  id: string;
  name: string;
  description: string;
  doc_count: number;
  current_scope: boolean; // true if currently in user's selected scope
}>
System-prompt instruction: "If the user's query suggests they want to search a KB outside the current scope, call list_knowledge_bases to see what's available, then ask the user if they want to expand."
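The tool's result shape can be sketched with an in-memory filter standing in for the SQL-layer auth filter. `listKnowledgeBases`, `KbRow`, and the parameter names are illustrative assumptions.

```typescript
interface KbRow {
  id: string;
  name: string;
  description: string;
  doc_count: number;
}
interface KbListing extends KbRow {
  current_scope: boolean;
}

// Returns only KBs the user may see, flagged by whether they are already
// in the user's selected scope.
function listKnowledgeBases(
  orgKbs: KbRow[],
  allowedKnowledgeBaseIds: string[],
  selectedKnowledgeBaseIds: string[],
): KbListing[] {
  const allowed = new Set(allowedKnowledgeBaseIds);
  const selected = new Set(selectedKnowledgeBaseIds);
  return orgKbs
    .filter((kb) => allowed.has(kb.id))
    .map((kb) => ({ ...kb, current_scope: selected.has(kb.id) }));
}
```

KBs with current_scope false are the expansion candidates the agent can offer to the user.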
Example interaction:
User: "Compare what the rules say about pit displays with what our team discussed in the planning emails."
Agent (scope: rules KB only): Calls list_knowledge_bases, sees "team-emails KB" available.
"I can search the rules KB for pit display content. I notice you also have a team-emails KB — would you like me to search both?"
The agent never silently expands scope; it always asks. This keeps users in control and avoids surprise cross-KB queries that mix sensitive corpora.
Permission boundary:
list_knowledge_bases returns only KBs the user has access to (filtered by auth at the SQL layer). The agent can't suggest a KB the user can't access — that path simply doesn't exist in its world. This is enforced as part of the scope-passing contract (§ above).
KB descriptions matter. The agent's tool-selection accuracy depends on KB descriptions being clear about what's in each KB. "Internal team comms" vs "World Finals 2026 regulations" gives the agent enough signal to suggest expansion intelligently. We surface this in the KB-creation UI as required-on-create with a placeholder "what kind of content is in this KB? (e.g., 'rules and regulations', 'customer quote history', 'engineering specs')."
Rejected alternatives:
- Bake KB list into a static system prompt file (.md). Rejected: goes stale the moment a KB is created. The system prompt is a template the server interpolates per request, not a static file.
- Make every primitive call list_knowledge_bases first. Rejected: extra round-trip for every query in the common case, no benefit when the user has explicit intent.
- Show the agent ALL KBs in the org regardless of UI scope. Rejected: defeats the user's explicit scope choice. The scope picker exists because users want to know what's being searched.
- Auto-expand without asking when query confidence is high. Rejected: surprises users, mixes sensitive corpora, and gives no obvious revert. Always ask.
Source-type-specific ingestion
The chunks model is universal. The ingestion path forks by source type. documents.source_type is the dispatch key — already in the schema, used today to distinguish (e.g.) 'pdf' from future types. Each source type has its own ingestion module producing the same shape: documents row + chunks (+ optional sections / figures / pages).
flowchart TB
PDF["PDF source"]
XLSX["Excel / spreadsheet (xlsx)"]
EMAIL["Email (MIME)"]
HTML["Web page"]
QUOTE["Quote (structured record from QuoteAI)"]
TEXT["Plain text / markdown"]
PDF --> P_PDF["render → parse → extract → embed<br/><i>existing pipeline (D10/D11)</i>"]
XLSX --> P_XLSX["sheetjs/exceljs parse → table-block detect →<br/>row-as-chunk w/ header context → embed"]
EMAIL --> P_EMAIL["MIME parse → header extraction →<br/>body chunking → embed"]
HTML --> P_HTML["scrape → readability → chunk → embed"]
QUOTE --> P_QUOTE["structured field map → chunk → embed<br/><i>(synchronous, ~1s)</i>"]
TEXT --> P_TEXT["heading split → chunk → embed"]
P_PDF --> CHUNKS[chunks table<br/><i>universal model</i>]
P_XLSX --> CHUNKS
P_EMAIL --> CHUNKS
P_HTML --> CHUNKS
P_QUOTE --> CHUNKS
P_TEXT --> CHUNKS
Per-source-type paths:
| source_type | Ingestion shape | When it runs | Chunks shape |
|---|---|---|---|
| pdf | render → parse text-layer → vision-extract w/ ops → commit (D10/D11) | Batch CLI: pnpm ingest extract | text / table / figure / heading |
| xlsx | sheetjs parse → per-sheet section → table-block detection → row-as-chunk with header row baked into embedding text | Inline (upload) or batch (folder sync) | text / table; bbox = sheet name + cell range (e.g., Sheet1!A3:F12) |
| email | MIME parse → split headers vs body → optional thread-aware chunking | Inline (when received) or batch (mailbox sync) | text chunks per paragraph; metadata chunk for headers |
| html | fetch → readability extraction → DOM-aware chunk split | Inline (when URL added) | text chunks; figures from <img> |
| quote (QuoteAI) | structured field → flatten to text → chunk | Inline at quote-creation time (~1s) | structured chunk type per field group |
| text / markdown | heading split → paragraph chunk → embed | Inline or batch | text / heading |
Excel ingestion specifically (big one for QuoteAI):
Quote history, customer DBs, pricing matrices, and engineering specs all routinely live in spreadsheets. The xlsx ingestion module:
- Reads with sheetjs (xlsx package) or exceljs; both are Node-native, no shell-out needed.
- Each sheet → sections row (section_id = sheet name).
- Detects "table blocks" (contiguous header row + data rows). Most spreadsheets have a clear header row at row 1 or 2; detect by looking for the first row whose cell values are all non-numeric / consistent type.
- Each data row → one chunks row, with header row context baked into the embedding text. Critical for recall: searching "find quotes for industrial robot arm controllers" should match a row that has Customer: ABC, Item: PLC controller for robotic arm even though the literal phrase doesn't appear. Headers as context = much better embedding alignment.
- bbox for each chunk = {sheet: "Sheet1", range: "A3:F3"}: same shape concept as PDF bbox, different coordinate space.
- Inspector renders xlsx chunks with a sheet-region view: render the spreadsheet with a highlighted cell range when a chunk is hovered.
Anthropic ships an xlsx skill we can lean on for the parsing layer if we don't want to write it from scratch — saves a few hours.
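The row-as-chunk step can be sketched on a plain 2-D array, independent of which parsing library produced it. `rowsToChunks`, `columnLetter`, and the `RowChunk` shape are assumptions for illustration; header detection is taken as already done.

```typescript
type Cell = string | number | null;

interface RowChunk {
  content: string;
  chunk_type: "table";
  bbox: { sheet: string; range: string };
}

// 0 -> "A", 25 -> "Z", 26 -> "AA"
function columnLetter(index: number): string {
  let n = index + 1;
  let out = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    out = String.fromCharCode(65 + rem) + out;
    n = Math.floor((n - 1) / 26);
  }
  return out;
}

// `rows` is one sheet as a 2-D array; `headerRowIndex` is 0-based.
function rowsToChunks(sheet: string, rows: Cell[][], headerRowIndex: number): RowChunk[] {
  const headers = rows[headerRowIndex].map((h) => String(h ?? ""));
  const lastCol = columnLetter(headers.length - 1);
  return rows.slice(headerRowIndex + 1).map((row, i) => {
    const excelRow = headerRowIndex + 2 + i; // spreadsheet rows are 1-based
    // Bake the header into each cell so the embedding text carries context.
    const content = headers
      .map((h, c) => ({ h, v: row[c] }))
      .filter((p) => p.v !== null && p.v !== undefined && p.v !== "")
      .map((p) => `${p.h}: ${p.v}`)
      .join(", ");
    return {
      content,
      chunk_type: "table" as const,
      bbox: { sheet, range: `A${excelRow}:${lastCol}${excelRow}` },
    };
  });
}
```

The sketch assumes the table block starts in column A; a real detector would carry the block's first column through to the range.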
Common contract — what every source-type ingestion produces:
- A documents row (with source_type set, knowledge_base_id set, logical_document_id set if versioning applies).
- Optional pages rows (only for PDFs and other paginated sources).
- sections rows where structure exists (PDF section IDs, email subject + thread, HTML headings, structured-record field groups, xlsx sheets).
- chunks rows with content, chunk_type, bbox (where applicable, NULL for non-spatial sources), content_hash.
- Optional figures rows for image-bearing sources.
The retrieval primitives don't care which path produced the chunks — they query against the same chunks table. Whoever wires up a new source type writes the ingestion module; no retrieval changes are needed.
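The common contract lends itself to a small registry keyed on source_type. The sketch below is one possible shape; `IngestionModule`, `registerIngestion`, and `ingest` are hypothetical names, and the result type is trimmed to documents and chunks.

```typescript
type SourceType = "pdf" | "xlsx" | "email" | "html" | "quote" | "text";

// The shared output contract, trimmed to the two mandatory parts.
interface IngestResult {
  document: { source_type: SourceType; knowledge_base_id: string; source_hash: string };
  chunks: Array<{
    content: string;
    chunk_type: string;
    bbox: object | null; // NULL for non-spatial sources
    content_hash: string;
  }>;
}

type IngestionModule = (input: Uint8Array, knowledgeBaseId: string) => IngestResult;

const registry = new Map<SourceType, IngestionModule>();

// Each source type contributes one module; retrieval code never changes.
function registerIngestion(type: SourceType, mod: IngestionModule): void {
  registry.set(type, mod);
}

// documents.source_type is the dispatch key.
function ingest(type: SourceType, input: Uint8Array, knowledgeBaseId: string): IngestResult {
  const mod = registry.get(type);
  if (!mod) throw new Error(`no ingestion module registered for source_type "${type}"`);
  return mod(input, knowledgeBaseId);
}
```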
Inline vs batch:
- Inline ingestion for small, real-time sources: a new email arrives, a quote is created, a URL is added, a single xlsx is uploaded. The user expects "ingested + searchable" within a couple of seconds. Synchronous in the request handler is fine for ~1MB of input.
- Batch ingestion for large, scheduled sources: the existing PDF CLI pipeline, multi-sheet xlsx with thousands of rows, full mailbox sync. Run via pnpm ingest; takes minutes to hours for big corpora.
Both produce the same chunks shape; only the orchestration differs.
Idempotency and incremental updates:
Every ingestion path computes documents.source_hash from the canonical content (file bytes for PDF, normalized MIME for email, sheet-and-row content for xlsx, etc.). Re-ingesting the same source is a no-op (matches existing row). Updates create a new documents row + new chunks; if versioning applies (see § Document versioning & supersession), the old version is superseded.
Per-chunk content_hash keeps the embed step idempotent: if we re-extract a doc but most chunks are unchanged, only the changed chunks need re-embedding. Saves cost on large corpora with small edits — especially relevant for xlsx where customers update one cell and re-upload.
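The re-embedding saving can be sketched as a partition of the new chunks by content_hash. `planEmbedding` and `HashedChunk` are hypothetical names; hashes are assumed to be computed upstream.

```typescript
interface HashedChunk {
  content: string;
  content_hash: string;
  embedding?: number[];
}

// Splits the re-extracted chunks into those that can reuse an existing
// embedding and those that still need an embedding call.
function planEmbedding(
  previous: HashedChunk[],
  next: HashedChunk[],
): { reuse: HashedChunk[]; embed: HashedChunk[] } {
  const known = new Map<string, number[]>();
  previous.forEach((c) => {
    if (c.embedding) known.set(c.content_hash, c.embedding);
  });
  const reuse: HashedChunk[] = [];
  const embed: HashedChunk[] = [];
  for (const c of next) {
    const existing = known.get(c.content_hash);
    if (existing) reuse.push({ ...c, embedding: existing });
    else embed.push(c);
  }
  return { reuse, embed };
}
```

For an xlsx re-upload with one edited cell, only the row containing that cell would land in embed.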
Bbox handling — non-spatial sources:
chunks.bbox is NULL for sources without spatial layout (emails, plain text, structured records). For xlsx, bbox is the sheet name + cell range — different coordinate space than PDF (which uses normalized 0..1 page coords) but the inspector handles both. The retrieval primitives don't care.
The QuoteAI ingestion path specifically (because it's the second pilot):
Quotes arrive as structured records from QuoteAI's existing flow — fields like customer info, line items, pricing, terms, internal notes. The ingestion module:
- Receives the structured record at quote-creation time (synchronous in the create-quote handler).
- Maps each field group (customer info, line items, terms, notes) to a sections row; section_id is the field-group name.
- Flattens each field group's content into a chunks row (chunk_type = structured).
- Embeds inline. Total time: ~1s for a typical quote.
- Indexed and searchable before the create-quote response returns to the user.
This pattern unblocks "search past quotes for similar customer asks" — the retrieval primitives Just Work against the QuoteAI quote-history KB the same way they work against the STEM Racing rules KB. Different source type, same chunks model, same retrieval surface.
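The field-group mapping can be sketched as follows. The `QuoteRecord` shape is a guess at what QuoteAI sends, not its actual schema, and `quoteToChunks` is a hypothetical name.

```typescript
// Hypothetical quote shape: named field groups of flat key/value pairs.
type FieldGroup = Record<string, string | number>;
interface QuoteRecord {
  id: string;
  groups: Record<string, FieldGroup>;
}

interface StructuredChunk {
  section_id: string; // the field-group name
  chunk_type: "structured";
  content: string;
  bbox: null; // structured records have no spatial layout
}

// One chunk per field group, flattened to "key: value" lines.
function quoteToChunks(quote: QuoteRecord): StructuredChunk[] {
  return Object.keys(quote.groups).map((group) => {
    const fields = quote.groups[group];
    const content = Object.keys(fields)
      .map((k) => `${k}: ${fields[k]}`)
      .join("\n");
    return { section_id: group, chunk_type: "structured" as const, content, bbox: null };
  });
}
```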
Rejected alternatives:
- Universal ingestion pipeline that takes all source types. Rejected — different sources need fundamentally different processing (PDFs need vision; emails need MIME parsing; xlsx needs cell-aware structure detection; quotes need field mapping). One pipeline-with-many-branches becomes a mess. Better: many small focused modules with a common output contract.
- Ingest everything as text only (drop structure). Rejected — kills the inspector wedge. We want section structure, page anchoring, figure detection, sheet/cell awareness. Source-specific paths preserve as much structure as the source has.
- Always run via batch jobs. Rejected — quote-history KBs and single-file uploads need real-time updates. Batch-only forces a worker queue + polling for "is my quote searchable yet" UX.
- Always run inline. Rejected — large PDF corpora and multi-sheet xlsx with thousands of rows take minutes; can't synchronous-block a user upload for that.
- Render xlsx → PDF → use existing PDF pipeline. Rejected — destroys cell structure (rows become "lines of text," columns become "spaces"), kills row-aware retrieval. Worth it only if the xlsx pipeline becomes too expensive to maintain; not yet.
Y1 Launch Plan
Captured 2026-05-07. The path from working spike to chargeable SaaS, with the unit economics and tier shape that go with it. The architecture is settled (D13–D17, § Multi-tenancy & Knowledge Bases above); this section is the go-to-market plan that those design decisions enable.
MVP threshold — five hard blockers
In dependency order. None are skippable; each gates the next.
| # | Blocker | What it is | Effort |
|---|---|---|---|
| 1 | Auth | Cognito (per QuoteAI INFRA-D3) with email + Google OAuth federation, session middleware that derives allowedKnowledgeBaseIds from the request. Today any URL grants any KB — must close before charging. | ~1 wk |
| 2 | Multi-tenancy enforcement | RLS policies on Postgres + the scope-passing contract from § Scope passing contract above. Schema is in (D13); the gates aren't. | ~3-4 days |
| 3 | Self-serve ingestion | File upload UI → background job (Inngest or Trigger.dev) → progress reporting. Today docs only land via pnpm ingest extract CLI. | ~1-1.5 wk |
| 4 | AWS-native deploy (D16) | ECS Fargate + RDS + S3 + Bedrock + Cognito + CloudFront + budget alarms. Coordinated with QuoteAI's deploy so Terraform modules are shared. | ~2-3 wk |
| 5 | Billing + tier gating | Stripe subscriptions, subscriptions table on organizations, middleware that blocks ingestion past quota. | ~1 wk |
Total: ~8-10 weeks focused, sequenced. #1 and #2 land first because everything else assumes them.
Soft blockers — needed before charging gracefully
- Source-type expansion. PDF-only is a TAM cap. xlsx (huge for QuoteAI overlap, half-built in QuoteAI's stack) + plain text/markdown + URL/HTML. ~1-2 wk for the three cheap formats. See § Source-type-specific ingestion.
- Brand identity. Resolved 2026-05-15: D1 settled on Autri (it previously treated "Lorebase" as a placeholder). Domain + landing-page polish remains pre-launch work.
Pricing tiers and unit economics
What we actually meter
Doc count alone is too coarse — a 1-page memo and a 500-page regulation cost wildly different things to ingest and store. Three axes, each tied to a real cost driver:
| Axis | What it caps | Cost driver | Why this axis |
|---|---|---|---|
| KB count | Number of knowledge_bases per organization | None directly — pure UX | Encourages organization; not a cost lever |
| Total chunks stored | Sum of chunks rows across all KBs in the org | Storage (recurring, low) + index size | Long-tail recurring cost; chunks ≈ 5 per page typically |
| Monthly pages ingested | Sum of pages rows created in calendar month | Vision extraction via Bedrock (~$0.001/page with prompt caching) | This is where money burns — vision API is the expensive call |
Web chat query volume is also metered (router cost ≈ $0.005/query) but only on lower tiers; MCP-driven queries are effectively free (see § The MCP economic shift below) so we don't meter them.
Tier ladder
| Tier | Price | KBs | Total chunks | Monthly pages | Web queries | MCP | Notes |
|---|---|---|---|---|---|---|---|
| Free | $0 | 1 | 5,000 | 50 | 50/mo | ❌ | Web-only. The funnel. |
| Personal | $20/mo | 3 | 50,000 | 500 | 500/mo | ✅ (1 client) | Lone-pro tier. |
| Pro | $50/mo (or $39/mo annual) | 10 | 500,000 | 5,000 | 2,000/mo | ✅ (multi-client) | Power user; multi-KB; version history (D15). |
| Enterprise | $3.5k+/mo + $12.5k setup | unlimited | unlimited | custom | unlimited | ✅ | Dedicated Aurora (D13 Tier 2), Cognito + Entra federation, per-tenant KMS. Brehob shape. |
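A minimal sketch of how the quota-blocking middleware could read the tier ladder above. `TIER_LIMITS`, `OrgUsage`, and `checkIngestion` are hypothetical names, not taken from the codebase. The numeric caps are copied from the table, and Enterprise's "custom" page allowance is modelled as unlimited purely for illustration.

```typescript
// Hypothetical shapes; only the numeric caps come from the tier ladder.
type Tier = "free" | "personal" | "pro" | "enterprise";

interface TierLimits {
  kbs: number;          // max knowledge_bases per organization
  totalChunks: number;  // max chunks rows stored across all KBs
  monthlyPages: number; // max pages rows created per calendar month
  mcp: boolean;         // whether MCP access is included
}

const TIER_LIMITS: Record<Tier, TierLimits> = {
  free:       { kbs: 1,        totalChunks: 5_000,    monthlyPages: 50,       mcp: false },
  personal:   { kbs: 3,        totalChunks: 50_000,   monthlyPages: 500,      mcp: true },
  pro:        { kbs: 10,       totalChunks: 500_000,  monthlyPages: 5_000,    mcp: true },
  // Assumption: "custom" is treated as unlimited in this sketch.
  enterprise: { kbs: Infinity, totalChunks: Infinity, monthlyPages: Infinity, mcp: true },
};

interface OrgUsage {
  chunksStored: number;   // current chunks rows for the org
  pagesThisMonth: number; // pages rows created this calendar month
}

type IngestDecision =
  | { allowed: true }
  | { allowed: false; reason: "monthly_pages" | "total_chunks" };

// Decide whether a document of `pages` pages may be ingested.
// `chunksPerPage` defaults to the ~5 chunks/page rule of thumb from the metering table.
function checkIngestion(
  tier: Tier,
  usage: OrgUsage,
  pages: number,
  chunksPerPage = 5,
): IngestDecision {
  const limits = TIER_LIMITS[tier];
  if (usage.pagesThisMonth + pages > limits.monthlyPages) {
    return { allowed: false, reason: "monthly_pages" };
  }
  if (usage.chunksStored + pages * chunksPerPage > limits.totalChunks) {
    return { allowed: false, reason: "total_chunks" };
  }
  return { allowed: true };
}
```

Under these caps, a 126-page corpus such as the STEM Racing regulations would be rejected on Free (50 pages/month) and accepted on Personal. The chunk check is an estimate before extraction; a real implementation would presumably re-check against the actual chunk count after commit.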
Cost & margin per tier
Numbers below assume Bedrock prompt caching shipped (Y1 must-ship per QuoteAI cost validation), Haiku for vision extraction, OpenAI text-embedding-3-small at $0.02/1M tokens. All gross margins — net needs Stripe fees (~3%), CAC, support, overhead removed.
| Tier | Revenue | First-month cost | Steady-state cost | Steady margin | Margin % |
|---|---|---|---|---|---|
| Free | $0 | ~$1 ingest + $5 infra ≈ $6 | ~$5/mo | -$5/mo | n/a (loss leader) |
| Personal | $20/mo | ~$10 + $1 q + $5 infra ≈ $16 | ~$5-7/mo | $13-15/mo | ~70% |
| Pro | $50/mo | ~$100 + $5 q + $10 infra ≈ $115 | ~$15-20/mo | $30-35/mo | ~65% |
| Enterprise | $3.5k/mo | $771/mo (per QuoteAI cost validation) | $771/mo | ~$2.7k/mo | ~78% |
Cost-shape risks
- Pro first-month is a loss. ~$115 cost vs $50 revenue. A Pro user who churns at 30 days costs us ~$65. Mitigations: (a) push annual plan with discount that locks the recoup window, (b) raise Pro to $79-99/mo monthly with $39/mo annual, or (c) gate bulk-import behind a step that prevents casual Pro sign-ups dumping a 1000-doc historical archive. Lean (a)+(c).
- Vision-extraction cost is the biggest uncertainty. $0.10/100-page-doc assumes Haiku + aggressive prompt caching. If reality is closer to $0.50-1 (vision is hard to cache page-by-page), Personal tier inverts and Pro thins. Y1 must-ship items per cost validation: prompt caching, sliding window, daily Bedrock budget alarm.
- Free-tier loss is acceptable if conversion >5%. $5/mo × 100 free users = $500/mo loss, recouped by ~25 paying Personal users on a revenue basis ($20/mo each), or ~33-39 on steady-state margin ($13-15/mo each). The MCP gate is the conversion lever: free can't plug Autri into Claude Code, and that's the upgrade trigger.
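The first-month-loss risk above is easier to reason about as a payback period. The sketch below is a hypothetical helper (`monthsToBreakEven` is not an existing function) that accumulates gross margin month by month, using only the estimates from the cost table.

```typescript
// Months until cumulative gross margin turns non-negative, given a
// front-loaded first-month cost. Inputs are the doc's own estimates (USD).
// Returns null if break-even is not reached within `maxMonths`.
function monthsToBreakEven(
  monthlyRevenue: number,
  firstMonthCost: number,
  steadyStateCost: number,
  maxMonths = 120,
): number | null {
  let cumulative = monthlyRevenue - firstMonthCost; // position at end of month 1
  let month = 1;
  while (cumulative < 0) {
    if (month >= maxMonths) return null; // e.g. the Free tier never recoups
    month += 1;
    cumulative += monthlyRevenue - steadyStateCost;
  }
  return month;
}

// Pro tier at the two ends of the steady-state cost range.
const proBestCase = monthsToBreakEven(50, 115, 15);
const proWorstCase = monthsToBreakEven(50, 115, 20);
```

With the table's Pro numbers this gives break-even in month 3 at $15/mo steady-state cost and month 4 at $20/mo. That is the recoup window an annual plan would need to lock in.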
One-time overage fees
Tied to monthly pages ingested (the actual cost driver) — pay-as-you-go for the resource we're paying for. Avoids forcing tier upgrades for one-off spikes (e.g., "I just need to load this archive of 2000 pages once") and reduces refund pressure when someone over-ingests.
| Add-on | Price | Resets |
|---|---|---|
| +500 pages this month | $10 | One-time, calendar month |
| +5000 pages this month | $50 | One-time, calendar month |
| +500 pages permanent capacity | $25 | Persists; raises monthly base |
Web chat query overages and chunk-storage overages are not sold as one-offs — those grow proportionally with ingestion, so the ingestion add-on covers them implicitly.
The MCP economic shift
The biggest economic insight from this design pass: when a user is querying via their own Claude (Claude Code, Claude Desktop, Cursor, etc.) over MCP, our cost per query collapses by 10-50×.
| Path | Components per query | Cost |
|---|---|---|
| In-app web chat | Router LLM (Haiku, 2-3 turns) + embedding + Postgres + bandwidth | ~$0.005/query |
| MCP via user's Claude | Embedding (only if vector_search called) + Postgres + bandwidth | ~$0.0001-0.0005/query |
The user's Claude subscription bears the LLM cost; we're just serving the typed retrieval primitives. This is the economic argument for MCP being a paid-tier-only feature: it's the highest-value workflow for the user and the lowest-marginal-cost workflow for us.
For the Pro tier in particular, a customer who uses MCP heavily is essentially pure margin past the steady-state infra cost. A Pro user costing us ~$15-20/mo at the in-app-chat-only path drops to ~$5-10/mo when they're MCP-dominant. That's the segment to optimize for.
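To make the shift concrete, here is a small sketch that blends the two per-query costs from the table by the share of traffic going through MCP. The function and constant names are hypothetical, and the per-query figures are the doc's estimates rather than measured values.

```typescript
// Per-query cost estimates from the table above (USD).
const WEB_CHAT_COST = 0.005;
const MCP_COST_RANGE = { low: 0.0001, high: 0.0005 };

// Monthly query cost for a user issuing `queries` per month, of which
// `mcpShare` (0..1) go through MCP and the rest through in-app web chat.
function monthlyQueryCost(queries: number, mcpShare: number, mcpCost: number): number {
  if (mcpShare < 0 || mcpShare > 1) {
    throw new RangeError("mcpShare must be in [0, 1]");
  }
  const mcpQueries = queries * mcpShare;
  const webQueries = queries - mcpQueries;
  return webQueries * WEB_CHAT_COST + mcpQueries * mcpCost;
}

// The same 1,000 queries/month, served entirely by web chat vs entirely by MCP.
const allWeb = monthlyQueryCost(1000, 0, MCP_COST_RANGE.high);
const allMcpHigh = monthlyQueryCost(1000, 1, MCP_COST_RANGE.high);
const allMcpLow = monthlyQueryCost(1000, 1, MCP_COST_RANGE.low);
```

For 1,000 queries a month this works out to about $5.00 all-web versus roughly $0.10 to $0.50 all-MCP, which is the 10-50× range quoted above.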
Implementation work: hosted MCP server (HTTPS+SSE, not stdio), OAuth 2.0 between MCP client and Autri, per-token scope enforcement. Foundry's E12 (MCP Authorization) solved this exact problem — we lift its design wholesale rather than re-deriving. ~2 wk implementation if we lean on E12 patterns.
KB-size scaling notes
pgvector index sizing
The chunks.embedding index uses ivfflat with a lists parameter. Optimal lists ≈ sqrt(N):
- 656 chunks (today, single-tenant spike) → lists=20: fine
- 50k chunks (Personal tier max) → lists=224: needs index rebuild
- 500k chunks (Pro tier max per KB) → lists=707: needs index rebuild
Mitigation: schedule periodic index rebuilds when a KB crosses ~10×, ~100×, etc. its initial size. Or move to HNSW once we're on Postgres 16 + pgvector 0.7+ (better recall at scale, no lists tuning).
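The sizing rule and the rebuild trigger can be sketched as two small functions. Both names are hypothetical, and the 2× drift tolerance is an assumption chosen for illustration, not a measured threshold. pgvector's README suggests a different starting point for tables under 1M rows (rows / 1000), so the formula itself should be treated as tunable.

```typescript
// The doc's rule of thumb: lists ≈ sqrt(N) for N chunks.
function targetLists(chunkCount: number): number {
  return Math.max(1, Math.round(Math.sqrt(chunkCount)));
}

// Flag a rebuild once the target drifts from the configured value by more
// than `tolerance`x in either direction. The 2x default is an assumption.
function needsRebuild(currentLists: number, chunkCount: number, tolerance = 2): boolean {
  const target = targetLists(chunkCount);
  const ratio = target > currentLists ? target / currentLists : currentLists / target;
  return ratio > tolerance;
}

const spikeNeedsRebuild = needsRebuild(20, 656);       // today's index
const personalNeedsRebuild = needsRebuild(20, 50_000); // same index at Personal max
```

Under these assumptions the current lists=20 index stays within tolerance at 656 chunks and is flagged well before a KB reaches the Personal cap. A scheduled job could run this check per KB and queue the rebuild.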
Pro/Enterprise inflection at ~500k chunks/KB
Shared-DB performance (Pattern A from § Tenant model) starts to degrade past ~500k chunks per KB on commodity RDS sizes. That's the natural pressure point pushing customers from Pro to Enterprise:
- Below 500k chunks/KB: Pattern A shared DB is fine, sub-100ms queries, $50/mo Pro tier covers it.
- Above 500k chunks/KB: Pattern B dedicated Aurora per tenant is the answer — better tail latency, bigger index, customer-isolated load. Enterprise pricing covers the dedicated infra (per QuoteAI cost validation: $771/mo mid).
This means the Pro tier's chunk cap is a real ceiling, not a marketing line. A user trying to load a 1M-chunk corpus into Pro will hit slow queries; we point them to Enterprise as the answer. Tier boundaries align with actual technical inflection points.
Strategic sequencing — open questions
- Brehob first or self-serve first? If Brehob signs ($3.5k/mo recurring + $12.5k setup), that's the income that funds Autri MVP. Argues for: Brehob first (QuoteAI ships), Autri MVP self-serve second, with the QuoteAI-rebase-onto-Autri happening when there's paying-customer pressure on the foundation. Don't parallelize — context-switching tax is huge for a solo founder.
- GMPPU IP separation. Already flagged HIGH priority in next.md. Lawyer consult ($500-1500) before any paying customer; the FIA-docs overlap risk between GMPPU and Autri is real if the latter's first vertical is FIA-style technical regs.
- Naming commitment. Resolved 2026-05-15: D1 locked on Autri (it previously treated "Lorebase" as a placeholder).
- Free-tier abuse vs lead generation. The MCP gate above is the cleanest answer: free users can't get the agentic-IDE workflow, and that's the upgrade trigger. If conversion stays >5%, the model works.
- Vision-extraction cost at production scale. The biggest model unknown. Recommend: stand up the prompt caching + budget alarm before opening sign-ups, run a 100-doc ingestion on real test data, validate the $0.10/100-page-doc assumption before committing the tier prices in copy.
Recommended first-step after Brehob
Once Brehob outcome is known (sign or not, by ~late May 2026):
- If Brehob signs: focused 8-10 week MVP push on Autri. Sequencing per the five hard blockers above. Self-serve open by ~end of Q3 2026.
- If Brehob doesn't sign: re-evaluate go-to-market. Autri still ships, but on a slower cadence funded out-of-pocket. Maybe inverts the order — open free tier earlier as a lead-gen funnel for direct enterprise sales rather than self-serve scale.
Either way, the technical MVP plan stays the same; the go-to-market posture changes.
Rejected alternatives
- Per-doc pricing instead of per-page-ingested-monthly. Rejected: doc count is too coarse a proxy for cost. At ~$0.001/page, a user who uploads 100 thousand-page docs costs us ~$100 in vision extraction; the same doc count in single-page memos costs ~$0.10. Pages catches both. (Adopted: pages-monthly + total-chunks-stored, as per § What we actually meter.)
- Single chunk-count cap, no monthly axis. Rejected — chunks are recurring (storage) not burst (ingestion). Without a monthly-pages cap, a user could ingest a million pages on day one and exhaust the cost-driving resource before churning. Monthly axis matches the cost shape.
- Free tier with MCP access. Rejected — MCP is the highest-value workflow and the cheapest one to serve. Giving it free makes the upgrade case for every paid tier weaker. Free is web-chat-only; MCP unlocks at Personal.
- No first-month-loss mitigation on Pro. Rejected — $115 first-month cost vs $50/mo means churn at day 31 is a $65 loss. Annual plan with discount + bulk-import gating brings the break-even forward.
Open questions
The architecture above leaves a few productization decisions deliberately open. Listed here so they're tracked. Several originally-open questions have been resolved by the Deployment & isolation tiers decision (AWS-native + Bedrock + Terraform from day 0) and by QuoteAI's locked decisions (auth, pricing, cost validation). Resolved items are at the bottom under "Resolved."
Still open:
1. Self-serve onboarding flow (v1)
Operator-managed today (we create orgs and KBs on customer's behalf). The v1 self-serve flow needs:
- Sign-up / email verification
- Org creation (auto-create on first login? or explicit?)
- First-KB onboarding wizard ("upload a doc, see it parsed, ask a question")
- Pricing model + billing integration
None of this affects the schema — it's pure UI + auth-flow work. Open question: do we build self-serve when QuoteAI's first paying customer (post-Brehob #2+) demands it, or proactively for inbound interest?
2. Reciprocal rank fusion in the router
The agent currently picks one tool (occasionally two). If we observe consistent patterns where multi-index ensembling outperforms single-tool picks, RRF becomes worth implementing. Need data first — the existing retrieval_log is the data source. Open question: what's the threshold for "ensembling helps"? Probably look at chunks that scored well in two indexes vs only one and see if those are higher-quality answers.
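For reference, a minimal sketch of reciprocal rank fusion over per-tool rankings. Each chunk's fused score is the sum of 1 / (k + rank) across the lists it appears in. The function name and input shape are hypothetical, and k = 60 is the constant commonly used for RRF, so treat it as tunable.

```typescript
interface FusedResult {
  chunkId: string;
  score: number;
}

// Each inner array is one tool's ranking: chunk IDs, best first, no duplicates.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((chunkId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(chunkId, (scores.get(chunkId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([chunkId, score]) => ({ chunkId, score }))
    .sort((a, b) => b.score - a.score);
}

// A chunk found by both indexes outranks chunks found by only one.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // e.g. fts_search results
  ["b", "d"],      // e.g. vector_search results
]);
```

In this example chunk `b` comes out on top because it appears in both lists. That is the "scored well in two indexes" signal the open question proposes to evaluate against retrieval_log.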
3. Inline citation markers in agent answers
Today the playground shows the agent's answer separately from the chunks below. v0.2 lifts citation markers inline ([chunk_abc]-style refs the UI parses and renders as hover-tooltips with source bbox). QuoteAI's chat spike already has inline citation pills with hover popovers — that pattern is the migration target for Autri's playground. Open question: is this v1 or v2? Big UX win, moderate implementation effort.
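A sketch of the UI-side parsing step, assuming markers look like `[chunk_<id>]` with IDs drawn from letters, digits, underscores, and hyphens. The exact marker grammar is not specified here, so the pattern and the `splitCitations` name are illustrative only.

```typescript
type Segment =
  | { kind: "text"; text: string }
  | { kind: "citation"; chunkId: string };

// Assumption: markers look like [chunk_<id>] with id chars in [A-Za-z0-9_-].
const CITATION_PATTERN = "\\[(chunk_[A-Za-z0-9_-]+)\\]";

// Split an agent answer into plain-text runs and citation markers, in order,
// so the UI can render each citation as a hover target.
function splitCitations(answer: string): Segment[] {
  const re = new RegExp(CITATION_PATTERN, "g"); // fresh regex: no shared lastIndex
  const segments: Segment[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(answer)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: "text", text: answer.slice(cursor, match.index) });
    }
    segments.push({ kind: "citation", chunkId: match[1] });
    cursor = match.index + match[0].length;
  }
  if (cursor < answer.length) {
    segments.push({ kind: "text", text: answer.slice(cursor) });
  }
  return segments;
}

const segments = splitCitations("Pit display rules apply [chunk_abc]. See also [chunk_x-1]");
```

Each `citation` segment carries the chunk ID the UI would resolve to a page and bbox for the hover tooltip. Text that merely looks like a bracketed word is left untouched.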
4. Haiku-for-retrieval validation (Y1.5 per QuoteAI cost model)
QuoteAI's AWS cost validation has a Y1.5 plan to A/B Haiku-rerank vs Sonnet on a 15-20 query panel post-Brehob trip. If Haiku passes quality bar, swap retrieval/rerank to Haiku — unlocks margin headroom for high-usage chat scenarios. Same opportunity applies to Autri. Open question: what's the query panel for Autri's STEM Racing corpus? Likely overlap with QuoteAI's panel methodology, different content.
5. Multi-org users (contractors / consultants)
Current schema has users.organization_id as a single FK — each user belongs to exactly one org. Sufficient for v0 + most B2B SaaS shapes. Open question: if we hit a use case where one user works across multiple orgs (consultant working with multiple customers, contractor on multiple teams), this becomes a user_organizations join table. Cheap migration when needed, but worth flagging.
6. KB-level ACLs (per-user permissions within an org)
v0 model: all org members see all org KBs. Open question: when a customer asks for "the legal team can see the legal KB but the engineering team can't," we add a knowledge_base_acls table. Triggered by demand, not proactive.
7. Bedrock model version pinning + upgrade cadence
Bedrock lags Anthropic's direct API for model updates. Open question: do we always pin to a specific Claude version in prod, or follow Bedrock's default? Pinning gives predictability but requires manual upgrades. Following gets new features automatically but risks unexpected behavior shifts. Lean toward pinning with quarterly review — same pattern most enterprise customers will expect.
8. RLS policy authoring
We still need to write the actual RLS policies. The shape is "every row visible only if its organization_id (resolved via FK chain) matches the current session's current_user_organization_id." The Postgres mechanism: set a session variable in the server action, write policies that read it. Open question: do we manage policies via Drizzle migrations, or a separate migrations/policies/ directory? Drizzle doesn't have first-class RLS support yet.
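One possible shape for the policies, sketched as TypeScript helpers that emit plain SQL so the output could live in either a Drizzle custom migration or a separate policies directory. The setting name `app.current_user_organization_id`, the `uuid` column type, the `id` primary key, and the table and column names in the usage lines are all assumptions for illustration.

```typescript
// Assumption: the session variable is a custom setting with this name,
// and organization_id columns are uuid.
const ORG_SETTING = "app.current_user_organization_id";

// nullif(..., '') guards against an empty value left behind after a
// transaction-local set_config; a NULL comparison hides every row.
const CURRENT_ORG = `nullif(current_setting('${ORG_SETTING}', true), '')::uuid`;

// Policy for a table that carries organization_id directly.
function directOrgPolicy(table: string): string[] {
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_org_isolation ON ${table}
  USING (organization_id = ${CURRENT_ORG});`,
  ];
}

// Policy for a child table that reaches organization_id through one parent FK.
function fkChainPolicy(table: string, fkColumn: string, parent: string): string[] {
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_org_isolation ON ${table}
  USING (EXISTS (
    SELECT 1 FROM ${parent} p
    WHERE p.id = ${table}.${fkColumn}
      AND p.organization_id = ${CURRENT_ORG}));`,
  ];
}

// Statement a server action would run at the start of each transaction.
// The third argument (true) scopes the value to the current transaction.
const SET_ORG_SQL = `SELECT set_config('${ORG_SETTING}', $1, true);`;

// Hypothetical usage: knowledge_bases holds organization_id; documents
// reaches it via an assumed knowledge_base_id foreign key.
const policySql = [
  ...directOrgPolicy("knowledge_bases"),
  ...fkChainPolicy("documents", "knowledge_base_id", "knowledge_bases"),
].join("\n");
```

Two caveats worth checking before adopting anything like this: table owners bypass RLS unless `FORCE ROW LEVEL SECURITY` is set, and deeper FK chains (for example chunks) would need either nested `EXISTS` clauses or a denormalized organization_id column.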
Resolved:
By the Deployment & isolation tiers decision:
- DB host (Neon vs Supabase vs Aurora). → Aurora (RDS Postgres + pgvector for shared multi-tenant; Aurora Serverless v2 for dedicated tiers). AWS-native from day 0.
- MCP-via-CLI vs MCP-via-SDK switch. → CLI for dev (Max-billed, D12), Bedrock SDK for prod. Same router code, swap behind `cli-client.ts` abstraction.
- Storage location for source files. → S3. Per-tenant prefix in the bucket; KMS encryption with per-tenant keys for Tier 2/3.
- Per-tenant database: when? → Tier 2 onward (mid-market and up). Tier 1 stays shared with RLS.
By QuoteAI's locked decisions (carried forward — Autri + QuoteAI converge on the same stack):
- Auth provider (Cognito vs Clerk). → Cognito + Entra federation (per QuoteAI INFRA-D3). Hannah Labs runs the Cognito user pool; enterprise customers federate their identity provider (Brehob = Microsoft Entra / Azure AD) into Cognito. ~$0/mo at expected scale.
- First-customer pilot pricing. → $3,500/mo + $12,500 setup (per QuoteAI PRICING-D2). Multi-phase growth via scope additions ($5K Phase 1.5, $3K re-ingestion, $10-25K Phase 2 corpus).
- Cost-multiple validation. → ×4.5 multiple at first-customer pilot pricing, against ~$771/mo mid-scenario AWS run cost (Bedrock + RDS + ECS + supporting services). Andy's "5-7× markup" rule of thumb landed at 4.5× after AWS validation. Headroom available via Y1.5 Haiku-retrieval swap if usage spikes.
- Setup fee structure. → $5K base (AWS env + auth/SSO + customer config + PM) + $7.5K Phase 1 corpus ingestion (~5K docs at ~$1.50/doc) = $12.5K entry. Add-ons priced separately.
Competitive Landscape
Captured 2026-05-08 during strategic refinement. The competitive set is wider than "RAG tools" because Autri touches several adjacent markets — general AI knowledge bases, vertical chat-with-docs tools, author-specific writing tools, dev/infra, and AI-platform-native KBs. Each ring has different competitors and a different attack/defense posture. This section captures the positioning rationale and the moat claims, so future product decisions can justify against a coherent competitive picture rather than re-litigating it ad hoc.
The five competitive rings
The set of products users might choose instead of Autri falls into five buckets. Each ring competes on a different dimension; Autri's positioning is to be defensibly differentiated in every ring, not best-on-features in any single one.
Ring 1 — General-purpose AI knowledge bases (closest)
| Competitor | Position | Where they win | Where they lose |
|---|---|---|---|
| NotebookLM (Google) | Free, multi-modal, audio summaries | Distribution + free + Gemini quality | No API/MCP, source caps, no enterprise tier, black-box extraction |
| Notion AI | "AI inside the doc tool you use" | Massive distribution, embedded workflow | Mediocre RAG; not a real KB; no inspection |
| Mem.ai | "Self-organizing AI workspace" | Slick UX, fast capture | Pricing churned; positioning unstable; not source-of-truth |
| Obsidian + AI plugins | Power-user choice | Local-first, plugin ecosystem, free | Setup tax; no managed RAG; no MCP |
| Reflect / RemNote / Fabric | Personal AI notes | Niche fit | Don't scale to large corpora; black-box |
The structural threat: NotebookLM. Free, Google-distributed, audio overviews are genuinely novel. But NotebookLM has architectural ceilings it won't break through without disrupting its own positioning: no MCP/API (would cannibalize Workspace integration narrative), source caps (would hurt cost margins on free tier), no enterprise tier (Google has Vertex/Workspace for that), and structurally cannot show extraction quality because their parsing is internal. That last gap is the inspector wedge.
Ring 2 — Vertical "chat with docs" tools
| Competitor | Position | Notes |
|---|---|---|
| ChatPDF / AskYourPDF / Humata | Per-doc chat, low-friction | Toy-tier; commodity pricing race; no KB across docs; no MCP |
| Hebbia | Enterprise document analysis (finance/legal) | Strong incumbents; high-touch sales; ~$50–500k/yr |
| Glean | Enterprise search/RAG over corp data | Massive funding; SSO/connectors; not self-serve |
| Sana AI | Enterprise learning + KB | Newer, well-funded, training/onboarding focus |
Autri's enterprise tier (~$3.5k+/mo) sits between toy-tier and Glean/Hebbia. That's the gap for companies needing real RAG but not ready for $200k contracts. QuoteAI/Brehob-style customers live here.
Ring 3 — Author-specific tools
| Competitor | Position | What they do | The gap they leave |
|---|---|---|---|
| Sudowrite | "AI for fiction writers" — $19–59/mo | Generative assist, plot help, "Story Bible" | Story Bible is shallow; no MCP; no cross-book queries; locked to their UI |
| NovelAI | Generative fiction AI | Strong generation | No KB / RAG over user's prior work |
| Plottr / Campfire / World Anvil | Story planning / wikis | Manual structure | No semantic retrieval |
The key insight in this ring: none of these products treat the author's prior body of work as a retrieval substrate. They're generation-first, reference-second. Autri is reference-first and integrates with whatever the author writes in (via MCP), rather than being yet another writing UI to learn.
Ring 4 — Developer/infra (framing competitors)
LlamaIndex, LangChain, Pinecone, Weaviate, Qdrant, Vectara. These are frameworks/infra, not products. Autri competes by being the hosted, opinionated, inspector-equipped product someone could otherwise stitch together themselves with these tools — for orders of magnitude more effort and without the inspector or operations-based extraction.
The pitch: "You could build this in 6 months with LlamaIndex + Pinecone + a custom inspector UI; or you could pay us $35/mo."
Ring 5 — AI-platform native KBs (existential watch)
| Competitor | Position | Threat level |
|---|---|---|
| Claude Projects (Anthropic) | KB inside claude.ai | High — same vendor, same model, free with Claude Pro |
| ChatGPT Custom GPTs / Files | KB inside ChatGPT | High — massive distribution |
| Cursor knowledge bases | KB inside the IDE | Medium — different audience but converging |
| Gemini-in-Workspace KBs | KB inside Google Drive/Docs | High for Google-native customers |
This is the scariest ring because the AI platforms can absorb our wedge if they decide to. Mitigation: position Autri as the MCP server they consume, not the alternative they replace. If Claude Projects deepens, Autri is what plugs into it for users with corpora that exceed Claude Projects' source limits or who need the inspector for trust.
Autri's differentiators (ranked by defensibility)
- Visual extraction inspector: unique. No competitor lets you see how each chunk maps back to the source page. This requires per-page-as-image source modeling end-to-end, which is hard to retrofit. The inspector is the wedge from § Overview > What Is This?, and competitive analysis confirms it's defensible.
- Color-coded retrieval traces: near-unique. Showing which index returned which chunk (lookup_section / FTS / vector) is rare. NotebookLM hides it. Glean shows source docs but not retrieval method. This makes Autri's outputs legibly trustworthy (§ Hybrid Retrieval).
- Operations-based extraction: technically hard to copy. D11 invariant: LLM emits operations against fragment IDs; code does mechanical work. Verbatim text and exact bboxes guaranteed by construction. Most competitors do token-window chunking with hallucination risk on metadata. Replicating this requires re-architecting their pipeline, not adding a feature.
- Hosted MCP with KB-scope: strategic + economic. Most KB tools are web-only. None of NotebookLM, Notion AI, Sudowrite have MCP. Per § Y1 Launch Plan > The MCP economic shift, MCP is both the highest-value workflow and the cheapest to serve (10–50× less inference cost than web router). MCP-as-paid-tier (D19) aligns user value with marginal cost, a structural advantage no competitor has framed yet.
- Coherent self-serve → enterprise ladder: positioning. Most competitors pick one segment. NotebookLM = consumer. Glean = enterprise. Sudowrite = author. Autri has Free → Author → Pro → Enterprise on the same backend, with the inspector and MCP being the through-line value at every tier.
Vulnerabilities (and mitigations)
- NotebookLM is free + Google-distributed. Hard to compete on price.
  - Mitigation: MCP, vet-extraction, and enterprise tier, all features they structurally won't add.
- Claude Projects could absorb the inspector wedge. Anthropic could ship "show how each chunk was extracted" as a Claude Projects feature.
  - Mitigation: be the MCP server they call, not the alternative they replace. Autri is stronger as platform-agnostic infra than as Claude-platform-locked competitor.
- No mobile/iOS app. Authors use iPads; researchers travel.
  - Watch: this becomes a real complaint by month 6 of self-serve. Plan a thin mobile read-only app for v1.5.
- No collaboration features yet. Sudowrite, Notion, World Anvil all do multi-user.
  - Mitigation: planned Studio tier is the answer; ship when collab features are real, not before.
- Single LLM provider lock-in (Anthropic via Bedrock). Some users will want OpenAI / Gemini / local.
  - Mitigation: D12/D16 abstraction layer keeps this swappable. Not a pre-launch blocker.
- Cold-start brand discoverability. "Autri" is unknown vs. SEO incumbents.
  - Mitigation: the inspector demo is highly visual and shareable. Lean into "show, don't tell" content marketing: STEM Racing pilot, mom's author-KB testimonial, Brehob enterprise case study all generate evocative content.
- Solo founder vs. funded teams. Sudowrite has 20+; Glean has hundreds.
  - Mitigation: depth not breadth. Don't try to outship them on features; outship them on the one thing each segment cares about (inspection for skeptical pros; MCP for power users; per-KB ACLs for enterprise).
Strategic positioning
One-line pitch:
Autri is the inspectable knowledge base — see how every chunk was extracted, see which index retrieved it, and plug your knowledge into Claude / Cursor / your editor via MCP.
Three-segment marketing (same SKU, segment-specific copy):
| Segment | Headline | Hero feature |
|---|---|---|
| Prosumer / authors | "Your library, your reference. Per-book KBs. Plugs into your editor." | MCP + inspector |
| Research / regulatory | "Trust your retrieval. Inspect every chunk. See what the agent saw." | Inspector + retrieval traces |
| Business / enterprise | "Self-host or managed. Per-KB ACLs. Bedrock-backed. SOC2 path." | Multi-tenancy + Bedrock + RLS |
The thing nobody else can credibly say: "You can see exactly what we extracted, and exactly how the agent retrieved it." Inspection is the moat; MCP is the distribution + economic shift; the Free→Enterprise ladder is the GTM coverage.
Author market: starting segment, not the wedge
The author market (mom's profile and similar) is one validated starting segment among several, not the primary GTM thesis. The reason to start there is opportunistic — direct customer access via personal network — not strategic primacy. The broader self-serve market includes researchers, regulatory writers, technical documentation owners, and consultants; all share the same need (trustworthy KB over a personal corpus) and the same MCP-shaped workflow.
Why this distinction matters for product decisions:
- Don't over-tune the inspector or extraction pipeline to fiction-shaped sources. The corpus generality (PDFs, regulations, structured records) is the architectural bet (§ Overview > Who Is It For?).
- Don't over-invest in author-specific features (Continuity Check templates, series mode) before validating they're the highest-leverage feature work across the segment portfolio.
- Use mom-as-design-partner for early UX feedback on the generic product, not as a request queue for author-specific features.
Build decision rationale
This section answers the gut-check question: given the competitive landscape, is Autri worth building?
Yes, with caveats. The reasoning:
Reasons to build:
- The inspector is a genuinely differentiated philosophy of trust. Not a feature, but a different stance on the human/AI relationship for knowledge work. That's defensible because competitors would have to re-architect, not add a checkbox.
- The MCP economic insight is rare and right. Pushing power users to MCP is both cheaper to serve and more valuable to them. We'd be early to that pattern, and the architecture (D5/D14/D19) is already aligned with it.
- Customers exist before the product is finished. STEM Racing (pilot), mom (author segment validation), Brehob (enterprise prospect via QuoteAI). That's not always true at this stage.
- Marginal cost to ship is low. Foundation extracted from QuoteAI, not greenfield. Per D2, architectural decisions converged across both products, so work on Autri strengthens QuoteAI and vice versa.
- The decision is reversible. Build MVP → validate → pivot or pause if interest turns out to be weak. The cost of not shipping is missing the window.
Caveats / honest risks:
- Author market is a starting point, not the destination. Don't bet the company on it alone. Multi-segment from day one.
- NotebookLM is the existential threat. Watch their roadmap. If they add MCP or expose extraction inspection, reassess immediately.
- Vision-extraction cost validation is the most important pre-launch check (§ Y1 Launch Plan > Cost-shape risks). If real costs are 10× our assumptions, the unit economics break and the whole pricing model needs rethinking.
- Don't ship until the moat is real. Inspection working perfectly + MCP integration that genuinely provides value. A weak v1 of either kills the differentiation story.
- The product needs to be demoable to convince anyone. Lean hard into video/screenshots showing the inspector. The story doesn't tell itself in text.
Rejected positioning alternatives
- "Autri as Sudowrite competitor." Rejected. Sudowrite is generation-first; Autri is reference-first. Pivoting to writing-UI would abandon the inspector wedge and the corpus-generality bet, and put us in direct feature competition with a funded incumbent.
- "Autri as Glean competitor." Rejected. Enterprise search has long sales cycles, requires SSO/SOC2 from day one, and demands a sales motion solo founders can't run. Enterprise tier is a future path enabled by self-serve traction, not the launch motion.
- "Autri as ChatPDF commodity." Rejected. Race-to-the-bottom pricing on per-doc chat. The inspector is wasted in this positioning because users uploading one PDF for one chat session don't need to vet extraction quality.
- "Autri as developer infra (LlamaIndex competitor)." Rejected. Different audience (developers vs. end users); different revenue shape (open source + cloud vs. subscription); different moats. The hosted opinionated product is the right shape; framework competition is not.
Cross-Cutting Concerns
| Concern | Summary | Affected | Dedicated Doc? |
|---|---|---|---|
| Versioning | extractor_version column on pages and chunks is <prompt-version>/<model> (e.g., page-extract-v3/claude-haiku-4-5-20251001); lets us A/B prompts and models | extractor | N/A (in-code) |
| Idempotency | Render and load are idempotent on file content / source_hash; extract is destructive but reset preserves doc + pages rows so iteration doesn't break URLs | ingestion | N/A |
| Cache lifecycle | ingestion/cache/ is gitignored and fully reproducible from source PDFs (small, committed). Render + parse cache outputs can be regenerated freely | ingestion | N/A |
Risks & Constraints
Tech Debt
- `documents.status` and `confidence_tier` not auto-populated after extraction; one-line fix pending in `writeExtraction`
- `sections.parent_section_id` is NULL for all 446 rows; derivable deterministically from `section_id` strings, not yet computed
- `chunks.embedding` is NULL for all 1,272 rows; embedding pass not done
- `chunks.confidence` and `chunks.flagged` unset; only per-page confidence today
- Section title artifacts preserved verbatim (`"C 7.6 Pit D isplay setup and parameters"`, PDF kerning quirks). This IS the desired behavior; if we ever want canonical titles, do it as a deterministic post-process
Known Limitations
- Image-only PDFs (scanned legacy with no text layer) currently extract poorly. Need a vision-only branch in the classifier. None in current corpus.
- Agentic router for retrieval doesn't exist yet; the corpus is queryable today only via direct SQL.
- No formal test suite: the extractor self-validates via Zod + the operations validator (every `fragment_id` must resolve), but there's no end-to-end test.
- `pgvector` `ivfflat` `lists=20` is tuned for small datasets (~thousands of rows). At scale we'll need to revisit; HNSW is the likely upgrade.
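As a possible starting point for an end-to-end test, the "every fragment_id must resolve" invariant can be written as a pure function. The shapes below are simplified and hypothetical: the real operations contract is Zod-validated and likely carries more fields than `op` and `fragmentIds`.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface Fragment {
  id: string;
  text: string;
}

interface Operation {
  op: "register_section" | "register_chunk" | "register_figure";
  fragmentIds: string[];
}

interface UnresolvedReference {
  opIndex: number;
  op: Operation["op"];
  missingFragmentId: string;
}

// Return every fragment reference that does not resolve to a parsed fragment.
// An empty result means the operations are safe to commit.
function findUnresolvedFragments(
  fragments: Fragment[],
  operations: Operation[],
): UnresolvedReference[] {
  const known = new Set(fragments.map((f) => f.id));
  const unresolved: UnresolvedReference[] = [];
  operations.forEach((operation, opIndex) => {
    for (const fragmentId of operation.fragmentIds) {
      if (!known.has(fragmentId)) {
        unresolved.push({ opIndex, op: operation.op, missingFragmentId: fragmentId });
      }
    }
  });
  return unresolved;
}

const unresolved = findUnresolvedFragments(
  [{ id: "p1-f1", text: "C7.6" }, { id: "p1-f2", text: "Pit display" }],
  [
    { op: "register_chunk", fragmentIds: ["p1-f1", "p1-f2"] },
    { op: "register_section", fragmentIds: ["p1-f9"] }, // dangling reference
  ],
);
```

A fixture-based test could feed a cached parse output and a recorded set of operations through a check like this, then assert that committed chunk text equals the concatenation of the referenced fragments.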
Technical Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Claude CLI subprocess hits Max-plan rate limits at scale | Low at current usage | Stops ingestion mid-run | Add retry/backoff in cli-client.ts; fall back to API key for prod |
| Anthropic Max-plan policy changes (e.g., disallows subprocess use) | Low | Forces SDK + API path | LLM calls already isolated behind cli-client.ts; swap is contained |
| pgvector ivfflat performance on large corpora (>100k chunks) | Medium | Slow vector search | Switch to HNSW or tune lists parameter; defer until measured |
| PDF text layer absent on a real corpus PDF | Medium (legacy scans) | Falls back to vision (more expensive, less accurate) | Per-page classifier picks routing; vision path is already proven |
Epic Index
Pre-S0 spike phase; epics will be defined when scope solidifies past prototype.
Likely first epics (when defined):
- E1: Retrieval layer (`lookup_section` / `fts_search` / `vector_search` + agentic router + query playground UI)
- E2: Status + tier UI fixes (the small one-line follow-ups + classifier wiring)
- E3: Section parent linking + recursive queries
- E4: Embedding pass + content hash management
Decisions Log
The active decisions list lives in ~/Documents/Code/autri/decisions.md (in the project repo, per workspace convention). Highlights graduate here when the rationale is settled-but-non-obvious.
| Date | Decision | Rationale | Alternatives Considered |
|---|---|---|---|
| 2026-05-06 | D10 — PDF text-layer first; vision fallback | Verbatim accuracy + exact bboxes guaranteed by construction. Cost down 5x at higher quality on the same corpus. | Pure vision (rejected — cost, drift, OCR errors). Hybrid heuristic (deferred — agentic classifier reads density score). |
| 2026-05-06 | D11 — Operations-based extractor (LLM does semantics, code does mechanics) | LLM jobs scoped to irreducibly-semantic tasks; mechanical work is deterministic. Same Haiku now matches Sonnet quality. | Vision-only with full-content output (rejected — proven worse on every axis). Pure heuristic chunker (rejected — misses semantic boundaries). |
| 2026-05-06 | D12 — claude CLI subprocess (not Anthropic SDK) | Bills against user's Max plan; the SDK doesn't support Max billing per Anthropic policy. | Anthropic SDK with API key (deferred to production deploy). Files API workaround (rejected — still requires API key). |
| 2026-05-06 | D5 — Agentic router via MCP tools (for retrieval) | Same retrieval functions exposed via MCP for dev + native tool-use for prod. Agentic routing handles edge cases the heuristic can't, and the trace is itself a UX feature for the inspector. | Hardcoded heuristic router (deferred — agentic-first while iterating). |
| 2026-05-06 | D7 — Vetting > tuning as primary user job | Trust is the wedge for both the STEM team and the enterprise pitch. Tuning is power-user territory, v0.3+. | Combined "everything UI" (rejected — overwhelming). Tuning-first (rejected — no users without trust). |
| 2026-05-06 | D2 — Foundation extracted from QuoteAI, not greenfield | QuoteAI's KB layer is what Autri IS. Mirror QuoteAI's stack so the eventual rebase is friction-free. | Greenfield with different stack (rejected — duplicates work, blocks rebase). |
Glossary
| Term | Definition |
|---|---|
| Fragment | A line-level text unit from the PDF text layer, with id, text, and normalized bbox. The atom the agent groups into chunks. Produced by the parse stage. |
| Operation | A register_section / register_chunk / register_figure directive emitted by the extractor agent. References fragment IDs; never regenerates content. |
| Chunk | A semantic content unit (text paragraph, table, figure, heading) stored in chunks. Content is deterministically composed from fragments by the commit step. |
| Section | A formal numbered ID heading (e.g., C7.6.1). Stored in sections with hierarchy via parent_section_id. |
| Inspector | The /docs/[id] UI that overlays chunk bboxes on rendered pages with bidirectional hover. The product wedge. |
| Tier | Per-doc confidence classification: green (≥0.85) / yellow (≥0.70) / red. Computed from avg page extraction confidence. |
| Verbatim | Text copied character-for-character from the PDF text layer, including any typesetting artifacts (kerning gaps like "C 7.6"). Guaranteed by the operations architecture — the LLM never types content. |
| Operations contract | The Zod-validated set of register_* directives the extractor agent emits. Stable across model swaps. |
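The Tier definition above maps directly onto a small function. A sketch, with the thresholds taken from the glossary and the handling of a document with no pages (returning `null`) added as an assumption:

```typescript
type ConfidenceTier = "green" | "yellow" | "red";

// Classify a document from its per-page extraction confidences (0..1).
// Thresholds come from the glossary: green >= 0.85, yellow >= 0.70, else red.
function confidenceTier(pageConfidences: number[]): ConfidenceTier | null {
  if (pageConfidences.length === 0) return null; // assumption: no pages, no tier
  const total = pageConfidences.reduce((sum, c) => sum + c, 0);
  const average = total / pageConfidences.length;
  if (average >= 0.85) return "green";
  if (average >= 0.7) return "yellow";
  return "red";
}

const tier = confidenceTier([0.9, 0.95]);
```

Because the tier is an average, one very poor page can hide inside an otherwise green document. Surfacing per-page confidence in the inspector alongside the tier would cover that case.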
This design doc is the source of truth for project-level architecture. Active decisions live in the project repo's decisions.md. Session handoffs live in the project repo's next.md. Code is the source of truth for implementation; this doc captures the WHY.