
Autri — Project Design Doc

Migrated from projects/lorebase/design.md on 2026-05-15 as part of the Autri brand commit (D1). The source doc had 12 historical annotations on ## Glossary (all resolved/orphaned/replied — captured in decisions.md D13/D14/D15) which were not migrated. Source doc deleted post-migration.

Project status: Pre-S0 spike. Foundation built end-to-end in one session (2026-05-06). First corpus — STEM Racing World Finals 2026 regulations, 126 pages — fully ingested with the v3 pipeline.


Sub-system Index

None yet — single-package monorepo (app/, ingestion/, mcp-servers/*).

Overview

What Is This?

Autri is a generic knowledge-base platform with a visual extraction inspector as its product wedge. Users upload a document corpus; Autri parses, chunks, embeds, and indexes it; the inspector UI gives full visibility into how every document was processed — anchored to the original source — so users can vet quality before trusting answers from the system.

The thing most RAG tools get wrong is they're black boxes: when retrieval is bad, you can't tell why. Autri shows you exactly which fragment of the source PDF became which chunk, where on the page it sits, and how it was classified — bidirectionally. Hover a chunk in the text pane, the bbox lights up on the source page. Hover a region on the source, the corresponding chunk lights up.

Who Is It For?

Pilot corpora — two of them, deliberately different:

| Pilot | Corpus | Why it stresses the architecture |
| --- | --- | --- |
| STEM Racing Charlotte | World Finals 2026 regulations (PDFs, ~120 pages) | Tests the inspector wedge: trust-via-visibility for non-technical users (high-school students). Annual update cadence, public-ish content, rule-lookup query patterns. |
| QuoteAI | Customer quote history (structured records, growing daily) | Tests the universal-chunks-model claim against a fundamentally different source type. Daily ingestion cadence, commercial sensitivity, semantic search across past quotes, large + growing doc count. |

If the architecture works on both, it works on most things. STEM tests the inspector wedge; QuoteAI tests incremental ingestion + structured-record source type + many-docs-in-one-KB at scale. Together they force the design to actually be generic instead of accidentally STEM-specific or QuoteAI-specific.

Long-term users: any organization that needs to trust an LLM's answers about their own documents — legal, regulatory, technical, or internal-knowledge teams. The inspector is the trust-building wedge for both small users (the racing kids) and enterprise pitches.

Business Model

This project is the foundation underneath QuoteAI. QuoteAI is "knowledge base + quoting app on top"; Autri is the knowledge base, made generic and inspectable.

Near-term: QuoteAI rebases onto Autri to ensure quality + accuracy. Without Autri's verbatim extraction + hybrid retrieval, QuoteAI's answers can't be trusted in front of a paying customer.

Long-term: Autri is the foundation for many products, and QuoteAI is the first. If QuoteAI lands at John Hannah's company, the same KB engine can be pitched to other corpora-heavy buyers. If Autri ships standalone, the inspector wedge is sufficient differentiation.


Tech Stack

| Layer | Technology | Rationale |
| --- | --- | --- |
| Frontend | Next.js 14 (App Router) + TypeScript + Tailwind + shadcn/ui | Mirror QuoteAI exactly so the eventual rebase is friction-free |
| State | React Server Components + Drizzle queries | Inspector is read-mostly; RSC keeps the app simple |
| Backend / API | Next.js API routes | One stack; deploy as a unit |
| Database | Postgres 16 + pgvector (Docker for local) | Hybrid retrieval: Full-Text Search (FTS) + vector + structural lookup; QuoteAI parity. See § Hybrid Retrieval. |
| Auth | None yet (single-user dev) | Multi-tenant in productization |
| Deployment | Local-only for now | Productization is post-pilot |
| Payments | N/A | |

Key Libraries & Dependencies

| Library | Purpose | Notes |
| --- | --- | --- |
| pdftoppm (poppler) | PDF → per-page PNG render | Shell-out from ingestion/render.ts; no node-side image library |
| pdftotext -bbox-layout (poppler) | PDF text-layer extraction with positions | Shell-out from ingestion/parse.ts; emits XHTML with normalized fragment bboxes |
| claude CLI subprocess | LLM extraction calls | Bills against user's Max plan; the Anthropic Agent SDK explicitly does not support Max billing |
| drizzle-orm | Typed query builder | Schema mirrors QuoteAI |
| pgvector | Vector index | ivfflat with vector_cosine_ops, 1536-dim |
| fast-xml-parser | XHTML parser for poppler output | parseTagValue: false is critical: the default coerces numeric content like "7.6" / "2026" to numbers, breaking text reconstruction |

System Architecture

The Core Insight

The LLM's only job is determining which content fragments belong together as semantic chunks. Everything else — text extraction, coordinate math, persistence — is mechanical work that belongs in deterministic code.

This is the load-bearing architectural decision behind v3 of the extractor and behind everything that grows on top of Autri. It came from realizing we'd been asking the LLM to do everything: OCR text it could read for free from the PDF text layer, compute bbox coordinates by visual estimation from pixel images, draw bounding boxes, write captions, and make chunking decisions. The first three are mechanical. The last, deciding which fragments belong together, is irreducibly LLM-shaped.

When you carve the LLM job down to its semantic core:

  • Verbatim text becomes guaranteed. Content comes from the PDF text layer, never regenerated by the model. The model emits operations like register_chunk(fragment_ids: ["f4", "f5", "f6"]); code looks up fragments[fid].text and concatenates. The model literally cannot hallucinate text.
  • Bboxes become mathematically exact. Each chunk's bbox is the union of its fragments' bboxes (computed math), not a vision-estimated rectangle. The drift-toward-the-bottom-of-the-page issue that plagued vision extraction is structurally impossible.
  • Cost drops dramatically. Smaller prompts (no transcription instructions), smaller outputs (operation lists with IDs, not full content), fewer turns. ~5x cheaper at higher quality.
  • Smaller models match bigger ones. Haiku now performs at Sonnet's quality on this corpus, because the LLM job is small enough that model size stops mattering.
  • The contract is stable across model swaps. Future model upgrades work without prompt rework — same operations contract.

This principle generalizes beyond Autri. It's also captured in workspace memory as feedback_llm_does_semantics_code_does_mechanics.md.
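To make the verbatim-text and exact-bbox guarantees concrete, here is a minimal sketch of the composition step. The type names, the `composeChunk` helper, and the single-space join are assumptions for illustration; only the fragment fields (id, text, bbox) and the concat/union behaviour come from this doc.

```typescript
// Hypothetical types; field names follow the fragment JSON described in this doc.
type BBox = { page: number; x: number; y: number; w: number; h: number };
type Fragment = { id: string; text: string; bbox: BBox };

// Build a chunk from the fragment IDs chosen by the model.
// Text is copied from the text layer; bboxes are unioned per page.
function composeChunk(
  fragmentIds: string[],
  fragments: Map<string, Fragment>,
): { content: string; bbox: BBox[] } {
  const parts = fragmentIds.map((id) => {
    const f = fragments.get(id);
    if (!f) throw new Error(`unknown fragment id: ${id}`);
    return f;
  });

  // Assumption: fragments are joined with a single space.
  const content = parts.map((f) => f.text).join(" ");

  // Track the outer edges per page, then convert back to x/y/w/h.
  const edges = new Map<number, { x1: number; y1: number; x2: number; y2: number }>();
  for (const part of parts) {
    const b = part.bbox;
    const e = edges.get(b.page);
    if (!e) {
      edges.set(b.page, { x1: b.x, y1: b.y, x2: b.x + b.w, y2: b.y + b.h });
    } else {
      e.x1 = Math.min(e.x1, b.x);
      e.y1 = Math.min(e.y1, b.y);
      e.x2 = Math.max(e.x2, b.x + b.w);
      e.y2 = Math.max(e.y2, b.y + b.h);
    }
  }
  const bbox = Array.from(edges.entries()).map(([page, e]) => ({
    page,
    x: e.x1,
    y: e.y1,
    w: e.x2 - e.x1,
    h: e.y2 - e.y1,
  }));
  return { content, bbox };
}
```

Because the model only supplies IDs, an unknown ID fails loudly instead of producing invented text.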

Pipeline

End-to-end flow from a source PDF to indexed chunks. The agentic stage (highlighted) is the only LLM call; everything else is deterministic, shell-based, or pure SQL.

flowchart TD
    PDF[PDF source]
    PDF --> RENDER["render — pdftoppm<br/>~200ms/page @ 200 DPI"]
    PDF --> PARSE["parse — pdftotext + fast-xml-parser<br/>~50ms/page"]
    RENDER --> PNG["page-NN.png<br/>visual anchor for inspector overlay"]
    PARSE --> JSON["page-NN-text.json<br/>fragments [id, text, bbox]<br/>+ page_size + text_density"]
    PNG --> CLASSIFY["classify — text density routing<br/><i>not yet wired</i>"]
    JSON --> CLASSIFY
    CLASSIFY --> EXTRACT["<b>extract — agentic stage</b><br/>Claude Haiku via claude CLI subprocess<br/>Reads JSON (and PNG for figures)<br/>Emits register_section / register_chunk / register_figure ops<br/>+ confidence (0..1), notes"]
    EXTRACT --> COMMIT["commit — deterministic, transactional<br/>chunk.content = concat(fragments[i].text)  ← VERBATIM<br/>chunk.bbox = union(fragments[i].bbox)  ← EXACT<br/>section.title = same composition<br/>figure.bbox = vision-estimated (only vision math)<br/>heading chunks auto-emitted per register_section"]
    COMMIT --> DB[(Postgres<br/>documents → pages →<br/>sections, chunks, figures)]

    classDef agentic fill:#3b1d6e,stroke:#7c3aed,stroke-width:2px,color:#fff
    class EXTRACT agentic

Layer Descriptions

| Module | Owns | Notes |
| --- | --- | --- |
| ingestion/render.ts | PDF → PNG | Pure pdftoppm wrapper, configurable DPI |
| ingestion/parse.ts | PDF → text-layer JSON | pdftotext -bbox-layout + XML parsing; emits per-page fragment files |
| ingestion/load.ts | DB writes for documents + pages | Idempotent on source_hash; PNG dimensions read from native PNG header (no image deps) |
| ingestion/extractor/cli-client.ts | claude CLI subprocess invocation | Schema-validated output via --json-schema; envelope-fallback parser for the result-vs-structured_output quirk |
| ingestion/extractor/operations.ts | Operations Zod schema + validator | Validates every fragment_id resolves to a real fragment |
| ingestion/extractor/extractor.ts | Operations → DB writes | Transactional commit; cross-page section-ID resolution via DB lookup |
| ingestion/extractor/prompts/ | Versioned prompt files | page-extract-v3.md is current; version stored in extractor_version per row |
| app/lib/db/ | Drizzle client + typed schema | HMR-safe singleton pool |
| app/app/docs/[id]/page.tsx | Inspector UI (Server Component) | Data fetch + composition |
| app/components/PageInspector.tsx | Hover-bbox overlay (Client Component) | Bidirectional highlight (text → bbox, bbox → text) |
| app/app/api/cache/[...path] | Static file server for ingestion/cache/ | Path-traversal guarded |
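The "PNG dimensions read from native PNG header" note for load.ts can be illustrated with a short sketch. The `readPngSize` name is hypothetical and this is not the actual load.ts code; it relies only on the PNG file layout (8-byte signature, then the IHDR chunk with big-endian width and height).

```typescript
// Read width/height from a PNG's IHDR chunk without an image library.
// Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" type,
// then width and height as big-endian 32-bit integers (offsets 16 and 20).
function readPngSize(bytes: Uint8Array): { width: number; height: number } {
  const signature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (bytes.length < 24) throw new Error("too short to be a PNG");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  for (let i = 0; i < signature.length; i++) {
    if (view.getUint8(i) !== signature[i]) throw new Error("not a PNG");
  }
  const type = String.fromCharCode(
    view.getUint8(12),
    view.getUint8(13),
    view.getUint8(14),
    view.getUint8(15),
  );
  if (type !== "IHDR") throw new Error("first chunk is not IHDR");
  return { width: view.getUint32(16, false), height: view.getUint32(20, false) };
}
```

Only the first 24 bytes of the file are needed, so the caller can read a small prefix rather than the whole image.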

Architecture Diagram

The dependency graph from cache files through the ingestion package to the database, then back out through the Next.js app and the agentic router. cache/ is gitignored and reproducible from source PDFs; everything below it is the durable artifact.

flowchart TB
    subgraph CACHE["ingestion/cache/ (gitignored)"]
        PDF["source PDFs"]
        PNG["page-NN.png"]
        JSON["page-NN-text.json"]
    end

    subgraph ING["@autri/ingestion"]
        RENDER["render.ts<br/><i>pdftoppm shell</i>"]
        PARSE["parse.ts<br/><i>pdftotext shell</i>"]
        LOAD["load.ts<br/><i>idempotent on source_hash</i>"]
        CLI["cli-client.ts<br/><i>claude CLI subprocess</i>"]
        EXTRACTOR["extractor.ts<br/><i>commits agent ops</i>"]
        FINALIZE["finalize.ts + link-sections.ts<br/><i>doc-level post-process</i>"]
        EMBED["embed.ts<br/><i>OpenAI text-embedding-3-small</i>"]
    end

    PDF --> RENDER
    PDF --> PARSE
    PDF --> LOAD
    RENDER --> PNG
    PARSE --> JSON
    PNG --> CLI
    JSON --> CLI
    CLI --> EXTRACTOR
    EXTRACTOR --> FINALIZE

    LOAD --> DB[(Postgres + pgvector<br/>documents → pages →<br/>sections, chunks, figures<br/>+ retrieval_log)]
    EXTRACTOR --> DB
    FINALIZE --> DB
    EMBED --> DB

    DB --> APP["Next.js app<br/>/docs (corpus listing)<br/>/docs/[id] (inspector)<br/>/docs/[id]/query (playground)<br/>/api/cache/... (static)"]

    DB --> RETRIEVAL["@autri/retrieval<br/>lookup_section · fts_search · vector_search<br/>+ router"]
    RETRIEVAL --> MCP["@autri/mcp-doc-search<br/><i>stdio MCP server</i>"]
    RETRIEVAL --> APP
    MCP --> AGENT["claude CLI<br/>(agentic router)"]
    AGENT --> APP

Data Model

Core Entities

documents (
  id, name, source_type, source_path, source_hash,
  page_count, status, confidence_tier, extractor_version,
  ingested_at, approved_at, approved_by
)

pages (
  id, document_id, page_number, image_path,
  width_px, height_px, parsed_text,
  extraction_confidence, extractor_version,
  UNIQUE (document_id, page_number)
)

sections (
  id, document_id, section_id, parent_section_id, title, depth,
  UNIQUE (document_id, section_id)
)

chunks (
  id, document_id, section_id, chunk_index,
  content, content_hash,
  chunk_type ('text' | 'table' | 'figure' | 'heading'),
  bbox JSONB,
  embedding VECTOR(1536), embedder_version, extractor_version,
  confidence, flagged, flagged_reason,
  UNIQUE (document_id, chunk_index)
)
+ ivfflat (embedding vector_cosine_ops) WITH (lists = 20)
+ gin (to_tsvector('english', content))

figures (
  id, document_id, page_id, chunk_id,
  image_path, bbox JSONB, caption, figure_type, extractor_version
)

retrieval_log (
  id, query, tool_name, tool_params JSONB,
  result_chunk_ids UUID[], result_scores REAL[],
  latency_ms, created_at
)

Bbox Convention

chunks.bbox is a JSONB array of regions: [{"page": 5, "x": 0.12, "y": 0.34, "w": 0.76, "h": 0.18}, ...]. Coordinates are normalized 0..1 of the page, top-left origin. This means:

  • Bboxes survive image resizing / DPI changes for display
  • The same chunk can in principle span multiple pages (not yet exercised — current chunks are single-page)
  • Inspector overlay positioning is pure percentage math, no pixel coordinates
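As a small illustration of the "pure percentage math" point, a normalized region can be turned into overlay styles like this. The `regionToStyle` helper and its return shape are assumptions, not the PageInspector implementation.

```typescript
type Region = { page: number; x: number; y: number; w: number; h: number };

// Convert a normalized (0..1, top-left origin) region into CSS percentages,
// for an absolutely positioned box inside a container sized to the page image.
function regionToStyle(r: Region): { left: string; top: string; width: string; height: string } {
  const pct = (v: number): string => `${(v * 100).toFixed(2)}%`;
  return { left: pct(r.x), top: pct(r.y), width: pct(r.w), height: pct(r.h) };
}
```

The same output is valid at any rendered size, which is why the overlay never needs to know the page image's pixel dimensions.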

Figure Duplication (Intentional)

Figures appear in BOTH figures (with caption + figure_type metadata) AND chunks (as chunk_type='figure' rows with caption as content). This is deliberate: the inspector renders all chunks uniformly via the hover-overlay machinery, so figures get the same visual treatment as text chunks. The figures table is the source of truth for figure-specific metadata (caption, type); the chunks row provides the bbox + section-FK that the inspector consumes.


AI Interface Architecture

Extractor Surface (built)

The agent runs as a claude -p subprocess with --tools Read --add-dir <cache> --json-schema <ops-schema>. It reads page-NN-text.json (and page-NN.png if needed for figures) via the Read tool, returns structured JSON validated against operations.ts's Zod schema, then exits.
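The fragment-ID check that runs on the returned operations might look roughly like the sketch below. It uses plain TypeScript rather than Zod to stay self-contained, and the operation shapes are assumptions: only the op names and the `fragment_ids` field appear in this doc.

```typescript
// Hypothetical operation shapes for illustration.
type Operation =
  | { op: "register_section"; section_id: string; fragment_ids: string[] }
  | { op: "register_chunk"; fragment_ids: string[] }
  | { op: "register_figure"; caption: string };

// Return one message per fragment ID that does not exist on the page.
// An empty result means every reference resolves.
function findUnresolvedFragments(ops: Operation[], knownIds: Set<string>): string[] {
  const errors: string[] = [];
  ops.forEach((op, i) => {
    // Figures carry no fragment IDs in this sketch.
    if (op.op === "register_figure") return;
    for (const id of op.fragment_ids) {
      if (!knownIds.has(id)) errors.push(`op ${i} (${op.op}): unknown fragment ${id}`);
    }
  });
  return errors;
}
```

Rejecting the whole page on any unresolved ID keeps the commit step from ever writing a chunk whose text cannot be traced to the source.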

Why subprocess (not Anthropic SDK):

  • Bills against Max — the user's Claude CLI OAuth session is inherited. The Agent SDK explicitly does not support claude.ai login per Anthropic policy.
  • Sandboxed — each page invocation is isolated; failure of one doesn't affect others.
  • Visible — the trace is shell output, easy to debug.

For production / CI paths, swap to the Anthropic SDK with ANTHROPIC_API_KEY. The CLI subprocess is contained behind cli-client.ts; the swap is mechanical.

Hybrid Retrieval

The retrieval layer is hybrid by default — three primitives, each with a different sweet spot, exposed both as native tool-use in the production query path and as MCP servers for dev/debug from Claude Code or Claude Desktop. The agent picks. Most RAG systems are vector-only; vector-only silently fails on exact-match queries. Hybrid is the architectural bet, and the part of the IP we're underselling in this doc.

The three primitives:

| Primitive | Index | Wins for | Latency |
| --- | --- | --- | --- |
| lookup_section(documentId, sectionId, recursive?) | Postgres unique index on (document_id, section_id) | Direct rule lookup. "What does C7.6.2 say?" Sub-millisecond, deterministic. Recursive variant walks parent_section_id for "everything in C7." | <10ms |
| fts_search(documentId, query, k) | Postgres gin (to_tsvector('english', content)) | Keyword / exact-phrase / rule-number queries. Stemming + stop-words handled. Beats vector when the user's wording matches the doc's wording, which is more often than vector-only proponents admit. | <50ms |
| vector_search(documentId, query, k) | pgvector ivfflat cosine over text-embedding-3-small (1536-dim) | Conceptual / paraphrase / synonym queries where the user's wording differs from the doc's. The "find me the rule about creativity even though the doc says 'innovation'" path. | <100ms |
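The recursive variant of lookup_section can be pictured with an in-memory sketch. The real primitive runs against Postgres; this version only shows the parent_section_id walk, and it assumes parent_section_id holds the parent's section_id string, which the schema above does not state explicitly.

```typescript
type Section = { section_id: string; parent_section_id: string | null; title: string };

// Return the requested section, plus all descendants when `recursive` is set.
function lookupSection(sections: Section[], sectionId: string, recursive = false): Section[] {
  const root = sections.find((s) => s.section_id === sectionId);
  if (!root) return [];
  if (!recursive) return [root];

  const out: Section[] = [];
  const queue: Section[] = [root];
  const seen = new Set<string>();
  while (queue.length > 0) {
    const current = queue.shift() as Section;
    if (seen.has(current.section_id)) continue; // guard against cycles in bad data
    seen.add(current.section_id);
    out.push(current);
    for (const s of sections) {
      if (s.parent_section_id === current.section_id) queue.push(s);
    }
  }
  return out;
}
```

In SQL the same walk would typically be a recursive query over the sections table; the breadth-first order here is just one reasonable choice.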

Router architecture (D5):

The router spawns the local claude CLI as a subprocess (D12, Max-billed) with the doc-search MCP server attached over stdio and only the three tools allowed. The agent reads the user's query, picks tool(s), and emits a natural-language answer. System prompt instructs tool selection by query shape:

  • Exact rule IDs → lookup_section
  • Keyword / phrase → fts_search
  • Conceptual / paraphrase → vector_search
  • Ambiguous → run multiple in parallel; let the score ranking sort it out

Every tool call writes a row to retrieval_log (tool_name, query, tool_params, result_chunk_ids, result_scores, latency_ms, created_at). The router harvests these rows post-call within its wall-clock window and returns them as a unified hit list with source-of-result attribution — every chunk knows which index found it.
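A minimal sketch of the harvest step, assuming log rows have already been fetched: it filters rows to the router's wall-clock window and merges them into one hit list that records which tool found each chunk. The `LogRow` and `Hit` shapes and the `harvestHits` name are hypothetical; the column names mirror retrieval_log.

```typescript
type LogRow = {
  tool_name: string;
  result_chunk_ids: string[];
  result_scores: number[];
  created_at: Date;
};
type Hit = { chunkId: string; foundBy: { tool: string; score: number }[] };

// Merge log rows inside [windowStart, windowEnd] into per-chunk hits.
function harvestHits(rows: LogRow[], windowStart: Date, windowEnd: Date): Hit[] {
  const byChunk = new Map<string, Hit>();
  for (const row of rows) {
    const t = row.created_at.getTime();
    if (t < windowStart.getTime() || t > windowEnd.getTime()) continue;
    row.result_chunk_ids.forEach((chunkId, i) => {
      const score = row.result_scores[i] ?? 0;
      let hit = byChunk.get(chunkId);
      if (!hit) {
        hit = { chunkId, foundBy: [] };
        byChunk.set(chunkId, hit);
      }
      // Keep one entry per tool; scores stay per-tool because the scales differ.
      if (!hit.foundBy.some((f) => f.tool === row.tool_name)) {
        hit.foundBy.push({ tool: row.tool_name, score });
      }
    });
  }
  return Array.from(byChunk.values());
}
```

Keeping the per-tool score alongside the tool name is what lets the playground color a chunk by the index that found it.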

flowchart LR
    USER[User query] --> ROUTER["claude CLI subprocess<br/>(router with system prompt)"]
    ROUTER --> MCP["@autri/mcp-doc-search<br/>(stdio MCP server)"]
    MCP --> P1[lookup_section]
    MCP --> P2[fts_search]
    MCP --> P3[vector_search]
    P1 --> DB[(Postgres)]
    P2 --> DB
    P3 --> DB
    P1 -.logs.-> LOG[(retrieval_log)]
    P2 -.logs.-> LOG
    P3 -.logs.-> LOG
    DB --> ROUTER
    LOG --> ROUTER
    ROUTER --> ANSWER[Answer + hits with source attribution]

The IP angle — three differentiators most RAG systems lack:

  1. Hybrid by default. Three indexes always available, agent picks. Vector-only systems silently fail on exact-match queries (an embedding maps an identifier like "C7.6.2" to a dense vector that sits close to other, similar-looking rule numbers, so the exact match is not guaranteed to rank first). Hybrid catches the failure modes that single-index systems can't see.

  2. Source-of-result attribution. Every chunk in the result list shows which index found it (color-coded borders in the playground UI: blue for lookup_section, amber for fts_search, green for vector_search). Trust comes from legibility — if you can see why a chunk was returned, you can decide whether to trust it.

  3. The retrieval_log + inspector overlay makes the trace a UX feature, not a debug log. Users see the agent's reasoning replay as part of normal use, not as an opt-in. This is the same trust-via-visibility wedge that drives the extraction inspector — applied to retrieval.

Open question — when to ensemble: if no single index dominates a query, reciprocal rank fusion is the natural next step (combine the rankings from all three indexes weighted by 1/(rank + k)). Not built yet; flagged as an open question because we need data on agent-pick patterns first. If the agent already runs multiple primitives in parallel for ambiguous queries, RRF may be redundant — the score ordering already does this implicitly.
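For reference, reciprocal rank fusion is small enough to sketch in full. Nothing here is built yet; the function name and the k = 60 default (a commonly used constant) are assumptions. Each chunk scores the sum of 1/(k + rank) over every ranking it appears in, with rank starting at 1.

```typescript
// Fuse several ranked lists of chunk IDs into one list, best first.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

RRF uses only rank positions, so it sidesteps the fact that an FTS rank and a cosine similarity are not on a shared scale. That is worth weighing when deciding whether raw score ordering across primitives is a sufficient substitute.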

Open question — KB-scoped variants. Today's primitives take documentId. As the unit shifts from document → knowledge base (see § Multi-tenancy & Knowledge Bases > The Unit Shift), the primitives widen to (knowledgeBaseId, ..., documentIds?). The chunks model and pgvector index don't change; the WHERE clauses do.

Pattern symmetry with extraction (D11): the retrieval architecture mirrors the extraction architecture. The LLM emits structured tool calls referencing IDs from typed inputs; code applies the operations deterministically. Same shape on both ends of the pipeline. The chunks are the IDs the router operates on; the fragments are the IDs the extractor operates on.

Pattern Symmetry

The retrieval agent mirrors the extraction agent: both emit structured operations or tool calls referencing IDs from typed inputs, and code applies them deterministically. Same shape on both ends of the pipeline.


Multi-tenancy & Knowledge Bases

Captures the multi-tenancy and knowledge-base architecture for Autri. The product wedge (the visual extraction inspector) shipped in v0 as a single-tenant, document-scoped artifact; this section locks in the shape we'll grow into as Autri becomes the foundation underneath QuoteAI and other downstream products.

The architectural pivot driving this section: the right primitive is the knowledge base, not the document. Once that lands, the rest (multi-tenancy pattern, scope passing, agent + KB selection, source-type-specific ingestion) falls out cleanly.

Subsections:

  1. The unit shift: document → knowledge base. Why retrieval primitives become KB-scoped, what changes mechanically.
  2. Tenant model. The organizations / users / knowledge_bases / documents chain, why pattern-A multi-tenancy (shared DB + row-level isolation), how the operator-managed-now → customer-managed-later trajectory plays out.

Future subsections (TODO this session, after annotations on 1+2 settle): scope passing contract, agent + KB selection (hybrid model), source-type-specific ingestion, open questions on DB host (Neon vs Supabase) and auth provider.


The unit shift: document → knowledge base

Today's retrieval primitives are document-scoped. Every lookup_section, fts_search, and vector_search call takes a documentId. That fits "I'm browsing this one rulebook" but breaks at the next horizon: real organizations don't operate at the document level; they operate at the knowledge-base level, meaning a coherent collection of documents that share access, retention, sensitivity, and indexing decisions.

A knowledge base is the natural unit because the things that vary in real systems vary KB-by-KB, not doc-by-doc:

  • Access maps to it. A user has access to a KB; whether the KB has 1 doc or 1,000 is irrelevant for permissioning.
  • Sensitivity maps to it. "Public rules" and "internal team comms" are different KBs, not different docs in the same KB. Mixing them under one ID is a leak waiting to happen.
  • Update cadence maps to it. Rules change once a year; quote history grows daily. Different update strategies apply per KB, not per doc.
  • Retrieval ergonomics map to it. Cross-doc queries within a KB are expected ("find me everything in our rulebook corpus about pit displays"). Cross-KB queries are unusual and explicit.

What changes mechanically:

The retrieval primitives shift from (documentId, ...) to (knowledgeBaseId, ..., documentIds?). The chunks model and pgvector index don't change — just the WHERE clauses. The agent's MCP tools widen accordingly: fts_search(knowledgeBaseId, query, k) searches all docs in the KB; an optional documentIds: string[] arg narrows back to specific docs when needed.

Document-scoped queries don't disappear — they're a natural special case (documentIds: [oneId]). The existing playground at /docs/[id]/query becomes /kb/[id]/query with an optional ?docs=... filter. Bookmarks keep working through a redirect from the old path.
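The "only the WHERE clauses change" claim can be sketched as a small filter builder. The `Scope` type, the `d` alias for documents, and the `$n` placeholder style are assumptions for illustration, not the actual query code.

```typescript
type Scope = { knowledgeBaseId: string; documentIds?: string[] };

// Build a parameterized WHERE fragment for KB-scoped retrieval.
// With no documentIds the whole KB is searched; with them, the search narrows.
function buildScopeFilter(scope: Scope): { sql: string; params: unknown[] } {
  const params: unknown[] = [scope.knowledgeBaseId];
  let sql = "d.knowledge_base_id = $1";
  if (scope.documentIds && scope.documentIds.length > 0) {
    params.push(scope.documentIds);
    sql += " AND d.id = ANY($2::uuid[])";
  }
  return { sql, params };
}
```

A document-scoped query is then just `buildScopeFilter({ knowledgeBaseId, documentIds: [oneId] })`, which matches the special-case framing above.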

Rejected alternatives:

  • Keep doc-as-primitive, add a multi-doc query primitive. Rejected — doubles the retrieval-primitive surface area, and "search across docs in a KB" becomes a special-case feature instead of the default. The right unit shifts because the wrong shape becomes a maintenance tax.
  • Folder-style nested KBs. Rejected for v0 — KBs are flat. Hierarchical KBs are a v2 concern when we have a customer asking; for now, free-form description on each KB handles categorization needs.
  • Workspaces above KBs. Rejected — adds a layer with no current use case. organizations provides the grouping; if "workspace" emerges as a need (e.g., a customer wants to separate prod KBs from staging KBs), revisit then.

Tenant model

Multi-tenant from day one (per the deployment-pattern-A choice — shared DB, row-level isolation, defense-in-depth via RLS). The shape:

organizations (
  id          UUID PK,
  name        TEXT NOT NULL,
  slug        TEXT UNIQUE NOT NULL,    -- for URLs
  created_at  TIMESTAMPTZ
)

users (
  id              UUID PK,
  email           TEXT UNIQUE NOT NULL,
  name            TEXT,
  organization_id UUID REFERENCES organizations(id),
  role            TEXT NOT NULL DEFAULT 'member',  -- 'admin' | 'member'
  created_at      TIMESTAMPTZ
)

knowledge_bases (
  id              UUID PK,
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  name            TEXT NOT NULL,
  slug            TEXT NOT NULL,
  description     TEXT,
  created_by      UUID REFERENCES users(id),
  created_at      TIMESTAMPTZ,
  UNIQUE (organization_id, slug)
)

documents (existing, +)
  knowledge_base_id UUID NOT NULL REFERENCES knowledge_bases(id) ON DELETE CASCADE
  -- existing fields preserved (name, source_type, source_path, etc.)

flowchart TB
    ORG[organizations]
    USERS[users]
    KB[knowledge_bases]
    DOCS[documents]
    DERIVED["pages · sections · chunks · figures · retrieval_log<br/><i>tenant inherited via FK chain</i>"]

    ORG --> USERS
    ORG --> KB
    KB --> DOCS
    DOCS --> DERIVED
    USERS -.created_by.-> KB

    classDef tenant fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#fff
    classDef new fill:#1e3a5f,stroke:#3b82f6,stroke-width:1px,color:#fff
    class ORG,USERS tenant
    class KB,DOCS new

FK chain is the canonical tenant identifier. Every retrieval-touched row inherits its tenant via documents.knowledge_base_id → knowledge_bases.organization_id. Joins gate every query. We deliberately do NOT denormalize tenant_id onto every table — the FK chain is the source of truth, and adding redundant columns invites drift on ingestion paths that forget to set them. We'll revisit if query-plan analysis shows the joins are too expensive (likely never at our scale).

Access model (v0): all members of an organization have read access to all KBs in that organization. Per-KB ACLs are deferred — YAGNI until a customer asks for "this KB is read-only for these users." When that happens, add a knowledge_base_acls table without touching existing rows.
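The v0 access rule, resolved through the FK chain rather than a denormalized tenant column, reduces to a check like the one below. The row types and the `canReadDocument` helper are hypothetical; in the real system this gate would be a join in the query itself.

```typescript
type User = { id: string; organization_id: string };
type KnowledgeBase = { id: string; organization_id: string };
type DocumentRow = { id: string; knowledge_base_id: string };

// v0 rule: any member of the owning organization may read.
// Tenant is resolved via document -> knowledge base -> organization.
function canReadDocument(
  user: User,
  doc: DocumentRow,
  knowledgeBases: Map<string, KnowledgeBase>,
): boolean {
  const kb = knowledgeBases.get(doc.knowledge_base_id);
  if (!kb) return false; // orphaned document: deny by default
  return kb.organization_id === user.organization_id;
}
```

When per-KB ACLs arrive, the extra check would slot in after the organization match without changing the chain that precedes it.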

Defense-in-depth: RLS as safety net. Once the DB host is decided (Neon vs Supabase, see Open Questions), enable row-level security policies that enforce organization_id = current_user.organization_id at the database layer. App-layer filtering is the primary gate; RLS is the backstop for the inevitable bug where a developer forgets a WHERE clause. It doesn't replace careful code; it turns a forgotten filter into a query that still returns only the caller's rows instead of a cross-tenant leak.

Operator-managed → customer-managed trajectory:

| Phase | Who creates the org/KB | Autri admin UI | Auth |
| --- | --- | --- | --- |
| v0 (now) | We do, on customer's behalf | Internal-only | None / dev |
| v1 | Customer self-signs-up | Customer sees their org's KBs only | Real auth (Clerk / Supabase / etc.) |
| v2 | Same, plus per-KB ACLs and team management | Same, plus role-based gating | Existing |

Schema doesn't change between phases. v0 → v1 is purely UI + auth-flow work. v1 → v2 adds a knowledge_base_acls table without touching any existing rows. The IP / focus argument (operator-managed first) is a productization-strategy choice, not an architectural one: the schema is ready for v1 from the moment this lands.

Source-type-agnostic. A KB can contain any mix of documents.source_type values — PDFs, emails, HTML, structured records, agent-generated content (e.g., new quotes from QuoteAI). Ingestion paths fork by source type but converge into the same chunks table. (Fully treated in the future "Source-type-specific ingestion" subsection.)

Rejected alternatives:

  • Per-tenant database (Aurora-per-customer). Rejected as the default: operational cost dominates (~$50/mo Aurora minimum × N tenants, migrations × N, monitoring × N, backups × N). pgvector ivfflat builds its clusters from the rows present at index time, so one well-populated shared index is easier to keep healthy than many small ones at our scale. Becomes correct only with hard compliance requirements (HIPAA / SOC2 with data-residency) or one customer at 100x scale of the others.
  • Schema-per-tenant. Rejected: an awkward middle ground. It offers better isolation than shared tables, but at higher operational complexity than shared-DB-with-RLS. Pick one extreme or the other; the middle pays for both.
  • Denormalize tenant_id onto every table. Rejected as default — invites drift. Use the FK chain. Revisit if query-plan analysis shows join cost is real.
  • No users table; only auth provider IDs. Rejected: even if we use third-party auth (Clerk, Supabase Auth), we still want a local users row to associate KBs with creators, track invitations, and log activity. The auth provider's user ID becomes one column on users, not a replacement for the table.

Deployment & isolation tiers

Three deployment tiers, each appropriate for a different customer shape. The tier determines the database isolation, AWS account boundary, and pricing model. The application code is identical across tiers — what changes is the infrastructure config (Terraform variables) and which Postgres pool the request hits.

flowchart TB
    subgraph T1["Tier 1 — Shared Multi-tenant (default)"]
        direction LR
        T1A[Many customers] --> T1B[Hannah Labs AWS account]
        T1B --> T1C[(Shared RDS Postgres + RLS)]
    end

    subgraph T2["Tier 2 — Dedicated Database (Brehob lands here)"]
        direction LR
        T2A["Enterprise customer<br/>(e.g., Brehob — Cognito + Entra federation)"] --> T2B[Hannah Labs AWS account]
        T2B --> T2C[(Dedicated Aurora<br/>per customer)]
    end

    subgraph T3["Tier 3 — Customer-hosted (optional uplift)"]
        direction LR
        T3A["Top-tier enterprise demanding<br/>full data sovereignty"] --> T3B[Customer's AWS account<br/>via Terraform deploy]
        T3B --> T3C[(Customer's Aurora<br/>their data, their hardware)]
    end

Tier 1 — Shared multi-tenant (default)

  • Who: small / startup / free-tier customers, self-serve signups
  • Where: Hannah Labs' AWS account, shared infrastructure
  • DB: Shared RDS Postgres + pgvector with row-level security policies enforcing organization_id = current_user.organization_id
  • Compute: Shared ECS Fargate cluster, multi-tenant Next.js app
  • Cost basis: Per-org cost is fractional — RDS + ECS amortized across N customers
  • Pricing: Pay-as-you-go or low-tier subscription

Tier 2 — Dedicated database (Brehob's tier)

  • Who: mid-market / paid enterprise customers requesting data isolation. Brehob lands here under QuoteAI's PRICING-D2 ($3,500/mo + $12,500 setup).
  • Where: Hannah Labs' AWS account, customer-specific Aurora cluster
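  • DB: Per-customer Aurora Serverless v2 with pgvector. A dedicated_db_url column on the tenant record (written here as tenants.dedicated_db_url; the tenant table in the Tenant model schema is organizations) points at the cluster; app code selects the pool by tenant lookup at request time.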
  • Compute: Shared ECS Fargate cluster (compute is cheap; data isolation is the value)
  • Auth: Cognito user pool with customer's IdP federated in (Brehob = Microsoft Entra / Azure AD per QuoteAI INFRA-D3). Customer's users authenticate against their own corporate identity; Cognito issues session tokens; ~$0/mo at expected scale.
  • Cost basis: Aurora cluster is the major variable cost. AWS run cost validated at ~$771/mo mid-scenario for Brehob shape (Bedrock chat + drafting + embeddings + RDS + Fargate + supporting services).
  • Pricing: ~$3,500/mo subscription, roughly 4.5× the validated ~$771/mo run cost. Setup fee $12,500 ($5K base + $7,500 Phase 1 corpus ingestion at ~$1.50/doc).
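The "select pool by tenant lookup at request time" step for Tier 2 could be sketched as a small registry that falls back to the shared database when a tenant has no dedicated URL. The `Tenant` shape, the `createPoolRegistry` name, and the generic pool factory are assumptions; the sketch is driver-agnostic on purpose.

```typescript
type Tenant = { id: string; dedicated_db_url: string | null };

// Returns a lookup function that reuses one pool per connection URL.
// `createPool` stands in for whatever driver-specific factory the app uses.
function createPoolRegistry<P>(
  sharedUrl: string,
  createPool: (url: string) => P,
): (tenant: Tenant) => P {
  const pools = new Map<string, P>();
  return function poolFor(tenant: Tenant): P {
    const url = tenant.dedicated_db_url ?? sharedUrl;
    let pool = pools.get(url);
    if (pool === undefined) {
      pool = createPool(url);
      pools.set(url, pool);
    }
    return pool;
  };
}
```

Keying the cache by URL means every Tier 1 tenant shares one pool, while each Tier 2 tenant gets exactly one of its own.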

Tier 3 — Customer-hosted (optional sovereignty uplift)

  • Who: top-tier enterprise demanding full data sovereignty (regulated industries, customers with strict data-residency clauses, or specific procurement requirements). Optional upgrade from Tier 2 when justified.
  • Where: Customer's AWS account, deployed via shared Terraform modules
  • DB: Customer's Aurora cluster, in customer's VPC, customer-managed backups
  • Compute: Customer's ECS Fargate. Customer's Bedrock quotas. Customer's Cognito (or federated to ours).
  • Cost basis: Customer pays AWS direct; Hannah Labs bills for software license + implementation + ongoing support contract
  • Pricing: Highest tier — software license + implementation services + ongoing support. Brehob has the option to migrate here later if their procurement / security review demands it; we don't lead with it.

Tier escalation triggers:

| Trigger | Move to |
| --- | --- |
| Customer asks "where is our data?" with concern in their voice | Tier 2 |
| Customer's procurement asks about SOC 2 / ISO 27001 / data residency | Tier 2 or 3 |
| Customer wants their data in their AWS account, full stop | Tier 3 |
| Customer is regulated (HIPAA, financial, defense) | Tier 3 (often required) |

Bedrock as the model provider in production:

Production runtime uses AWS Bedrock for Claude calls (better uptime than direct Anthropic API, AWS-native billing, regional failover, data-residency guarantees). Dev keeps the claude CLI subprocess pattern (D12, Max-billed) for local iteration; the abstraction in cli-client.ts makes the dev → prod swap mechanical.

Bedrock model lag (Anthropic ships Claude updates to its direct API first; Bedrock catches up days to weeks later) is the only real downside. Mitigation: pin models to specific versions in prod; upgrade on a schedule. Y1 must-ships per QuoteAI cost validation: prompt caching (~30-50% chat cost reduction), sliding window (~20 turns), daily Bedrock budget alarm.
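The sliding-window must-ship can be sketched as a pure function over the chat history. This is one possible reading, not the QuoteAI implementation: it assumes a "turn" starts at a user message and that system messages are always kept. The `Message` type and `applySlidingWindow` name are hypothetical.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };

// Keep all system messages plus the most recent `maxTurns` turns,
// where a turn begins at a user message and includes the replies after it.
function applySlidingWindow(messages: Message[], maxTurns = 20): Message[] {
  const system = messages.filter((m) => m.role === "system");
  if (maxTurns <= 0) return system;
  const dialogue = messages.filter((m) => m.role !== "system");
  const userIndexes: number[] = [];
  dialogue.forEach((m, i) => {
    if (m.role === "user") userIndexes.push(i);
  });
  if (userIndexes.length <= maxTurns) return system.concat(dialogue);
  const start = userIndexes[userIndexes.length - maxTurns];
  return system.concat(dialogue.slice(start));
}
```

Cutting at a user message keeps each retained reply paired with the question that prompted it.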

Y1.5 cost optimization — validated headroom: A/B Haiku-rerank vs Sonnet on a 15-20 query panel post-trip. If Haiku passes quality bar, swap retrieval/rerank to Haiku and unlock margin headroom for high-usage chat scenarios.

Terraform-managed infrastructure:

All three tiers share the same Terraform module library:

infra/
├── modules/
│   ├── autri-stack/           # ECS + RDS/Aurora + S3 + Bedrock policies
│   ├── tenant-database/       # Per-tenant Aurora (Tier 2/3)
│   └── monitoring/            # CloudWatch + alarms
└── deployments/
    ├── shared/                # Tier 1 — multi-tenant
    ├── tenant-{name}/         # Tier 2 — per-customer in Hannah Labs AWS account
    └── customer-{name}/       # Tier 3 — config in customer's AWS account

Same modules, different config. Adding a Tier-2 customer = clone tenant-template/, update variables, terraform apply. Tier 3 = same idea, deployed into the customer's AWS account instead. Reproducible, version-controlled, fast.

Rejected alternatives:

  • All customers on dedicated DBs from day 1. Rejected — operational cost dominates, kills startup-tier economics. Per-tenant DB is the value-added tier, not the default.
  • All customers customer-hosted. Rejected — most customers don't have AWS expertise / don't want operational responsibility. Self-managed is a feature for those who specifically want it.
  • No shared deployment at all (every customer is its own AWS account). Rejected — adds bureaucracy and infra cost where most customers want managed simplicity.
  • Multi-cloud (Azure + GCP) from day 0. Rejected — AWS-only is sufficient for the foreseeable customer pipeline; multi-cloud is a productization concern that adds 3× infra complexity. Add when a deal demands it.
  • Vercel + Neon for prod. Rejected — fine for early SaaS but doesn't pass enterprise procurement (no isolation tier, no AWS-native deploy option). Autri needs to be sellable to AWS-native enterprises from day 0.

Document versioning & supersession

Real-world corpora aren't static. The FIA Technical Regulations get a 2026 version, then a 2026-W42 update, then a 2027 version. Quote-history KBs grow daily but with each quote being its own logical doc (no supersession needed). The schema needs to handle both shapes — versioned logical docs AND single-version-only docs — without forcing one model on every corpus.

Three things users want:

  1. Default to the latest — most queries should hit current rules, not historical drafts.
  2. Pin to a specific version — "what does the 2025 version say about wing dimensions?"
  3. Diff between versions — "what changed between v1 and v2?"

The shape that supports all three without re-extracting on every update:

CREATE TABLE logical_documents (
  id                          UUID PRIMARY KEY,
  knowledge_base_id           UUID NOT NULL REFERENCES knowledge_bases(id) ON DELETE CASCADE,
  name                        TEXT NOT NULL,                  -- 'FIA Technical Regulations'
  slug                        TEXT NOT NULL,                  -- 'fia-technical-regs'
  current_version_document_id UUID REFERENCES documents(id),  -- pointer to current version
  created_at                  TIMESTAMPTZ,
  UNIQUE (knowledge_base_id, slug)
);

-- documents (existing table): three new nullable columns
ALTER TABLE documents
  ADD COLUMN logical_document_id UUID NULL REFERENCES logical_documents(id),
  ADD COLUMN version_label       TEXT,             -- '2026', '2026-W42 update', '2027 draft', or NULL
  ADD COLUMN superseded_at       TIMESTAMPTZ NULL; -- when a newer version replaced this one

flowchart TB
    KB[knowledge_bases]
    LD["logical_documents<br/><i>FIA Technical Regs</i>"]
    D1["documents v1<br/>2025 edition<br/><i>superseded_at: 2026-01-01</i>"]
    D2["documents v2<br/>2026 edition<br/><i>superseded_at: 2026-10-15</i>"]
    D3["documents v3<br/>2026-W42 update<br/><i>superseded_at: NULL (current)</i>"]

    KB --> LD
    LD --> D1
    LD --> D2
    LD --> D3
    LD -.current_version_document_id.-> D3

    classDef current fill:#1e3a5f,stroke:#3b82f6,stroke-width:2px,color:#fff
    class D3 current

Single-version docs (e.g., one quote in a quote-history KB) skip logical_documents entirely — logical_document_id is nullable, so the existing model just keeps working. Versioning is opt-in: a doc gets a logical_document only when versioning makes sense for that source type.

Default behavior — latest only. Retrieval primitives default-filter WHERE superseded_at IS NULL. Users always query the current version unless they explicitly say otherwise:

// Default: scope to current versions only
fts_search({ knowledgeBaseId, query, k })

// Explicit: scope to a specific version
fts_search({ knowledgeBaseId, query, k, documentIds: [v1Id] })

// Explicit: include superseded versions
fts_search({ knowledgeBaseId, query, k, includeSuperseded: true })
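
A minimal sketch of how a retrieval primitive might turn those three call shapes into a WHERE fragment. The helper name buildVersionFilter, the SqlFragment shape, and the table alias d (for documents) are assumptions made for illustration, not existing code.

```typescript
// Hypothetical helper: picks the version filter a retrieval primitive appends
// to its query. The alias `d` for the documents table is an assumption.
interface VersionScope {
  documentIds?: string[];
  includeSuperseded?: boolean;
}

interface SqlFragment {
  text: string;
  params: unknown[];
}

function buildVersionFilter(scope: VersionScope, firstParamIndex: number): SqlFragment {
  // Explicit pinning wins: the caller named the versions it wants,
  // so superseded documents must stay reachable.
  if (scope.documentIds && scope.documentIds.length > 0) {
    return {
      text: `d.id = ANY($${firstParamIndex}::uuid[])`,
      params: [scope.documentIds],
    };
  }
  // Explicit opt-in to historical versions: no version filter at all.
  if (scope.includeSuperseded) {
    return { text: "TRUE", params: [] };
  }
  // Default: current versions only.
  return { text: "d.superseded_at IS NULL", params: [] };
}
```

Pinning by documentIds bypasses the default filter on purpose: a pinned v1 is superseded by definition, so applying superseded_at IS NULL on top would return nothing.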

Diff workflows fall out as a thin layer. No new primitives needed; existing ones compose:

  • Section-level diff (lookup_section against v1 vs v2): trivially shows additions/removals/changes by section_id.
  • Semantic-level diff: for each chunk in v1, find the cosine-nearest chunk in v2 via vector_search, then repeat in the other direction. A v1 chunk with no good match in v2 is "removed"; a v2 chunk with no good match in v1 is "added." Cosine threshold + manual review for v0; tighten heuristics later.
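
The semantic-level diff can be sketched in memory as below. This is an illustration under stated assumptions: embeddings are plain number arrays, the 0.85 threshold is a placeholder to be tuned by hand, and the brute-force comparison stands in for what vector_search would do against the index.

```typescript
type Vec = number[];

interface EmbeddedChunk {
  id: string;
  embedding: Vec;
}

// Cosine similarity; returns 0 for zero-length vectors to avoid NaN.
function cosine(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Ids of chunks in `from` that have no sufficiently similar chunk in `to`.
function unmatched(from: EmbeddedChunk[], to: EmbeddedChunk[], threshold: number): string[] {
  return from
    .filter((c) => !to.some((t) => cosine(c.embedding, t.embedding) >= threshold))
    .map((c) => c.id);
}

// Threshold default is an assumption, not a tuned value.
function semanticDiff(v1: EmbeddedChunk[], v2: EmbeddedChunk[], threshold = 0.85) {
  return {
    removed: unmatched(v1, v2, threshold), // present in v1, nothing similar in v2
    added: unmatched(v2, v1, threshold),   // present in v2, nothing similar in v1
  };
}
```

Everything that falls near the threshold is a candidate for the manual review step mentioned above.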

Agent visibility into versions.

The agent needs to know multiple versions exist while it's retrieving, not after the user complains about a stale answer. Two surfaces:

Per-chunk version metadata in every retrieval result. Every chunk returned from lookup_section / fts_search / vector_search carries a small version envelope alongside the existing chunk fields:

{
  chunk: { /* existing fields */ },
  version: {
    label: '2026-W42 update',     // human label (free-form)
    index: 3,                      // 1-indexed by created_at within logical_document_id
    is_current: true,              // superseded_at IS NULL
    logical_document_name: 'FIA Technical Regulations',
  } | null   // null when document has no logical_document_id (single-version-only)
}

The join is chunks → documents → logical_documents plus a windowed ROW_NUMBER() OVER (PARTITION BY logical_document_id ORDER BY created_at) — essentially free at our scale. The agent never accidentally treats a stale chunk as current because every result is annotated with its place in the version sequence. Single-version-only documents send version: null so consumers can branch trivially.
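
For illustration, here is an in-memory equivalent of that join and window function. The row shapes are assumptions modelled on the schema above, and timestamps are assumed to be ISO-8601 strings in a single timezone so that string comparison matches chronological order.

```typescript
interface DocRow {
  id: string;
  logical_document_id: string | null;
  version_label: string | null;
  superseded_at: string | null; // ISO timestamp, or null when current
  created_at: string;           // ISO timestamp
}

interface VersionEnvelope {
  label: string | null;
  index: number;
  is_current: boolean;
  logical_document_name: string;
}

// Mirrors ROW_NUMBER() OVER (PARTITION BY logical_document_id ORDER BY created_at).
function versionEnvelope(
  doc: DocRow,
  allDocs: DocRow[],
  logicalNames: Map<string, string>,
): VersionEnvelope | null {
  // Single-version-only documents carry no envelope.
  if (doc.logical_document_id === null) return null;

  const siblings = allDocs
    .filter((d) => d.logical_document_id === doc.logical_document_id)
    .sort((a, b) => (a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0));

  return {
    label: doc.version_label,
    index: siblings.findIndex((d) => d.id === doc.id) + 1, // 1-indexed
    is_current: doc.superseded_at === null,
    logical_document_name: logicalNames.get(doc.logical_document_id) ?? "",
  };
}
```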

A list_documents(knowledgeBaseId) MCP tool that returns the version tree:

list_documents({ knowledgeBaseId }) => {
  logical_documents: Array<{
    id: string,
    name: string,
    versions: Array<{ document_id: string, label: string, index: number, is_current: boolean }>
  }>,
  unversioned_documents: Array<{ id: string, title: string }>,
}

Pull-based — the agent calls when the query smells version-scoped ("what changed", "in last year's regs", "compare versions"). Don't pollute the system prompt with version trees by default; it scales badly past a few logical_documents and most queries don't need it.

Why both label (text) and index (int): label preserves how the org names versions (FIA uses "2026 + Update 1 + Update 2", not v1/v2/v3); index gives the agent a normalized chronological position for "is this newer than that" logic without forcing semver onto asymmetric versioning.

Ingestion path with versioning:

When a customer uploads a new version of an existing logical document:

  1. Ingest as a new documents row with its own chunks (full pipeline — render, parse, extract, embed).
  2. Set documents.logical_document_id to the existing logical document.
  3. Atomically: set the previous current version's superseded_at = now(), update logical_documents.current_version_document_id to the new row.
  4. The previous version's chunks stay in the index — semantic search still finds them when explicitly scoped.
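
Steps 2 and 3 can be sketched as a pure function over in-memory rows. The row shapes and the function name supersede are assumptions; in Postgres the two writes would presumably share one transaction so that readers never observe a logical document without a current version.

```typescript
interface Doc {
  id: string;
  logical_document_id: string | null;
  superseded_at: Date | null;
}

interface LogicalDoc {
  id: string;
  current_version_document_id: string | null;
}

// Returns updated copies rather than mutating, to make the "atomic" intent explicit:
// both changes are produced together or not at all.
function supersede(
  logical: LogicalDoc,
  docs: Doc[],
  newDocId: string,
  now: Date,
): { logical: LogicalDoc; docs: Doc[] } {
  const previousId = logical.current_version_document_id;
  const updatedDocs = docs.map((d) => {
    if (d.id === newDocId) {
      // Step 2: attach the new row to the existing logical document.
      return { ...d, logical_document_id: logical.id, superseded_at: null };
    }
    if (d.id === previousId) {
      // Step 3a: mark the previous current version as superseded.
      return { ...d, superseded_at: now };
    }
    return d;
  });
  // Step 3b: move the current-version pointer.
  return {
    logical: { ...logical, current_version_document_id: newDocId },
    docs: updatedDocs,
  };
}
```

The previous version's row is only stamped, never removed, which is what keeps its chunks searchable when explicitly scoped.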

Idempotency on source_hash still applies — uploading the exact same file again is a no-op (matches the existing documents row, no new version created).

Detecting that an upload is a new version.

Step (2) above assumes we know which logical_document to attach to. Inferring it silently is a high-stakes failure: a false positive buries a new doc under an unrelated lineage; a false negative creates two parallel logical_documents that should have been one.

The pattern: score, suggest, never silently group. On upload, score the new file against every logical_document already in the destination KB. Surface the top match in the inspector before finalize. The user confirms or rejects.

Three signals stacked into the score:

  1. Filename similarity after normalizing year/version tokens. FIA-tech-regs-2026.pdf and FIA-tech-regs-2025.pdf read as the same lineage once 2026 / 2025 is collapsed to a placeholder. Cheapest signal, surprisingly strong in practice. Year tokens (19\d\d, 20\d\d), version tokens (vN / V\d), ISO week dates (2026-W42), and explicit "draft" / "final" / "update" markers all normalize to a single <VER> placeholder before string comparison (Jaro-Winkler or token-set ratio).
  2. PDF title metadata — the /Title entry in the PDF info dictionary. Often filled with the doc's canonical name independent of filename. (Per source type the analog is: xlsx workbook properties, email Subject normalized, HTML <title> tag.)
  3. Structural overlap — once first-pass extraction has run, the section_id set intersection (C7, C7.1, …) is a strong signal that two docs are versions of the same lineage. Compute Jaccard over section_id sets. For non-PDF source types: header-row overlap (xlsx), thread-id (email), URL path stem (HTML).

If the top score crosses a threshold (tunable; start ~0.7 weighted average — filename 0.5, title 0.2, structure 0.3) the inspector shows: "Looks like a new version of FIA Technical Regulations. Link as version, or treat as new doc?" Below threshold: defaults to a new logical_document; user can manually link later in the inspector.
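
A rough sketch of the filename normalizer and the weighted score. Two things here are assumptions rather than settled design: the token regex is one possible reading of the normalization rules above, and Jaccard over token sets is swapped in as a simple stand-in for Jaro-Winkler or token-set ratio.

```typescript
// Tokens that collapse to the <VER> placeholder: years, vN, week numbers, status markers.
const VERSION_TOKEN = /^((19|20)\d\d|v\d+|w\d{1,2}|draft|final|update)$/;

function normalizeFilename(name: string): string[] {
  const stem = name.toLowerCase().replace(/\.[a-z0-9]+$/, ""); // drop the extension
  const tokens = stem.split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  const out: string[] = [];
  for (const t of tokens) {
    const mapped = VERSION_TOKEN.test(t) ? "<VER>" : t;
    // Collapse runs such as "2026", "w42", "update" into a single placeholder.
    if (mapped === "<VER>" && out[out.length - 1] === "<VER>") continue;
    out.push(mapped);
  }
  return out;
}

// Jaccard similarity over two token lists (stand-in for a string-distance metric).
function jaccard(a: string[], b: string[]): number {
  const sa = new Set(a);
  const sb = new Set(b);
  if (sa.size === 0 && sb.size === 0) return 0;
  let inter = 0;
  sa.forEach((x) => {
    if (sb.has(x)) inter++;
  });
  return inter / (sa.size + sb.size - inter);
}

function filenameSimilarity(a: string, b: string): number {
  return jaccard(normalizeFilename(a), normalizeFilename(b));
}

// Weights follow the starting values in the text: filename 0.5, title 0.2, structure 0.3.
function lineageScore(s: { filename: number; title: number; structure: number }): number {
  return 0.5 * s.filename + 0.2 * s.title + 0.3 * s.structure;
}
```

With these weights a perfect filename match alone scores 0.5, which sits below a 0.7 threshold, so a filename-only V0 would need its own hand-tuned threshold.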

Subtle vs big content changes don't matter for the grouping decision. Whether the diff is one paragraph or a complete rewrite, it's still "version 3 of the FIA regs" if it's the same lineage. The change-magnitude question is downstream — it informs whether to copy unchanged chunks forward or re-extract everything (a v1 optimization on top of chunks.content_hash).

V0 build: filename-only signal with the year/version normalizer, threshold tuning by hand. Title-metadata and structural-overlap signals layer in once we have a few corpora to tune against. The score → suggestion UX shape is settled now so we don't paint ourselves into a corner.

Why not delete superseded chunks:

  • Diff workflows need the historical content live in the index.
  • pgvector ivfflat is append-friendly; deletes don't reclaim space efficiently.
  • Storage cost is tiny relative to the value of "what changed" queries.
  • If retention becomes a concern (compliance, GDPR), add a documents.purge_after column and a scheduled job — not v0.

Cost & timing:

| Component | When |
| --- | --- |
| Schema migration (add logical_documents + 3 columns on documents) | NOW, as part of the multi-tenancy migration. Zero marginal cost while we're already touching documents. All new fields nullable; existing single-version docs work unchanged. |
| Default superseded_at IS NULL filter in retrieval primitives | NOW. Trivial WHERE-clause addition, immediately useful even without UI. |
| Per-chunk version envelope on retrieval results | NOW. Same join pass, additive contract. |
| list_documents MCP tool | NOW. Tiny tool, one query. |
| Filename-only version-detection on upload | When upload UI lands (v1). Defer until we have a real upload flow to attach to. |
| Title-metadata + structural-overlap detection signals | v1+; tune once we have multiple real corpora. |
| Version-picker UI on doc cards | v1, when a customer (probably the FIA case) demands it. |
| Diff view (section-level + semantic) | v1, same trigger. |
| Auto-supersede on upload | v1; needs upload UI anyway. |

Schema-first, features-when-needed. The schema additions are zero-cost now; the UX features wait for real demand.

Rejected alternatives:

  • Single documents row with version chain via previous_version_id. Rejected — awkward to query (recursive CTE for "find the latest"), no clear "logical document" handle for the user-facing concept ("the FIA Regs" as opposed to "the FIA Regs v3"). The two-table model maps cleanly to user mental model.
  • Delete superseded chunks. Rejected — kills diff workflows, doesn't save much storage, can't be undone if a customer asks "what did v2 say in section 7?"
  • Treat each version as a totally separate document. Rejected — that's where we are today, and the FIA case is the explicit demonstration that it's wrong. Users want "the doc" with a version history, not "N separate docs that happen to share a name."
  • Force versioning on all docs. Rejected — adds bureaucracy for source types that don't need it (each quote in QuoteAI is its own thing, not a version of a parent quote). Versioning is opt-in via logical_document_id.
  • Silently auto-link uploads to the closest existing logical_document above some threshold. Rejected — false-positive cost (a new doc gets buried as a version of an unrelated lineage) is worse than the friction of one confirmation click. Always suggest, always confirm.
  • Bake the full version tree into the system prompt for every chat. Rejected — scales badly past a handful of logical_documents, costs tokens on every turn even when the query has nothing to do with versioning. Prefer per-chunk version envelopes (annotated where it matters) plus the pull-based list_documents tool.

Scope passing contract

Every retrieval request needs a scope — the set of knowledge bases the requesting user is allowed to see. The scope is derived from auth at the edge, propagated through every layer as a typed contract, and re-enforced at the SQL layer as defense-in-depth. The contract is allowedKnowledgeBaseIds: string[] — flowing from auth through to the agent's MCP tools.

flowchart LR
    AUTH["Auth provider<br/><i>user_id, organization_id</i>"]
    RESOLVER["KB scope resolver<br/><i>SELECT id FROM knowledge_bases<br/>WHERE organization_id = $1</i>"]
    ACTION["Server action<br/><i>receives allowedKnowledgeBaseIds<br/>from middleware</i>"]
    PROMPT["System prompt template<br/><i>KB list interpolated</i>"]
    ROUTER["claude CLI router<br/><i>invokes MCP tools</i>"]
    MCP["MCP doc-search tools<br/><i>SQL filter: kb_id IN (allowed_set)</i>"]
    DB[(Postgres + RLS)]

    AUTH --> RESOLVER --> ACTION --> PROMPT --> ROUTER --> MCP --> DB

Layer responsibilities:

| Layer | What it does | What goes wrong if it fails |
| --- | --- | --- |
| Auth provider | Identifies the user + their org | Fails closed: no auth means no scope means empty result |
| KB scope resolver | Translates user → allowed KB IDs (incl. future per-KB ACLs) | Bug here = user gets wrong KBs (the dangerous case) |
| Server action | Receives scope from auth context, never trusts client-supplied IDs | Client-side scope spoofing |
| System prompt template | Tells the agent what it can search | Agent asks for a KB it can't access and gets an empty result; looks confusing but isn't a leak |
| MCP tools | SQL filtering on kb_id IN (allowed_set) | The backstop: even if every layer above is buggy, no chunk leaks unless RLS also fails |
| Postgres RLS | DB-layer enforcement of organization_id = current_user.organization_id | Last line of defense |

The system prompt template has KB scope baked in at request time:

You are a retrieval router for the Autri corpus.

You have access to these knowledge bases:
- {kb_1.name} ({kb_1.id}): {kb_1.description}
- {kb_2.name} ({kb_2.id}): {kb_2.description}
- ...

You can ONLY search within these KBs. If the user's query implies they
want a different one, suggest expanding scope via list_knowledge_bases.

[rest of the existing router instructions]

The agent sees a static list and picks. No round-trip to discover allowed KBs unless the user query explicitly implies "search elsewhere" (handled by the list_knowledge_bases tool — see § Agent + KB Selection).
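
A small sketch of the per-request interpolation, assuming the server already holds the auth-resolved KB list. renderRouterPrompt and the KbSummary shape are hypothetical names; the template wording is copied from the block above.

```typescript
interface KbSummary {
  id: string;
  name: string;
  description: string;
}

// Builds the system prompt for one request from the auth-resolved KB list.
function renderRouterPrompt(kbs: KbSummary[], routerInstructions: string): string {
  const kbLines = kbs
    .map((kb) => `- ${kb.name} (${kb.id}): ${kb.description}`)
    .join("\n");
  return [
    "You are a retrieval router for the Autri corpus.",
    "",
    "You have access to these knowledge bases:",
    kbLines,
    "",
    "You can ONLY search within these KBs. If the user's query implies they",
    "want a different one, suggest expanding scope via list_knowledge_bases.",
    "",
    routerInstructions,
  ].join("\n");
}
```

Because the list is rebuilt on every request, a KB created a minute ago shows up in the next chat turn without any prompt-file change.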

Why client never sets allowedKnowledgeBaseIds:

Server actions and API routes derive scope from auth context, ignore any client-supplied scope, and pass the auth-derived scope to the router. The agent sees only what auth says it can see. A malicious client could send allowedKnowledgeBaseIds: [some-other-orgs-kb] but the server discards that field and reconstructs from auth.

Server-action contract (typed):

// Wrong (DO NOT DO THIS):
async function runQuery(query: string, allowedKnowledgeBaseIds: string[]) { ... }

// Right:
async function runQuery(query: string, scope?: { documentIds?: string[] }) {
  const session = await getServerSession();              // from auth provider
  const allowedKnowledgeBaseIds =
    await resolveAllowedKBs(session);                    // derived, NOT passed
  const scopeFiltered =
    filterScopeByAllowed(scope, allowedKnowledgeBaseIds);
  return route(pool, {
    query,
    allowedKnowledgeBaseIds,
    scope: scopeFiltered,
  });
}

The scope parameter is the user's intent (which KBs they want to search within their allowed set, optionally narrowed to specific docs). The allowedKnowledgeBaseIds is the permission (what auth says they can see). The agent gets the intersection.
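
One way filterScopeByAllowed could compute that intersection is sketched below. The knowledgeBaseIds field on the scope and the docToKb lookup are assumptions added for illustration (the contract above only shows documentIds); in practice the document-to-KB mapping would come from the database.

```typescript
interface Scope {
  knowledgeBaseIds?: string[];
  documentIds?: string[];
}

interface EffectiveScope {
  knowledgeBaseIds: string[];
  documentIds: string[];
}

// Intersects the user's requested scope (intent) with the auth-derived set (permission).
// `docToKb` is a hypothetical lookup from document id to its knowledge base id.
function filterScopeByAllowed(
  scope: Scope | undefined,
  allowedKnowledgeBaseIds: string[],
  docToKb: Map<string, string>,
): EffectiveScope {
  const allowed = new Set(allowedKnowledgeBaseIds);
  // No explicit KB selection means "everything I am allowed to see".
  const requested = scope?.knowledgeBaseIds ?? allowedKnowledgeBaseIds;
  const knowledgeBaseIds = requested.filter((id) => allowed.has(id));
  const effective = new Set(knowledgeBaseIds);
  // A document survives only if its KB is inside the effective scope.
  const documentIds = (scope?.documentIds ?? []).filter((id) => {
    const kb = docToKb.get(id);
    return kb !== undefined && effective.has(kb);
  });
  return { knowledgeBaseIds, documentIds };
}
```

Unknown or foreign ids are dropped silently rather than raising, so a spoofed id yields an empty result instead of an error that confirms the id exists.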

Rejected alternatives:

  • Trust the client to send scope. Rejected — that's how data leaks in B2B SaaS. The client says what it wants; auth says what it can have.
  • Skip the MCP-tool-level validation; rely on app filtering. Rejected — defense-in-depth principle. App layer is the primary gate; MCP-tool layer is the backstop catching the inevitable bug.
  • No system-prompt KB list; agent always calls list_knowledge_bases first. Rejected — extra round-trip for every query, and the agent's choices are noisier without explicit context. The hybrid approach (list in prompt + tool for expansion) is faster and more legible.

Agent + KB selection

The agent operates within a scope of knowledge bases. The hybrid model: user sets explicit scope via the UI; agent can suggest expanding scope via the list_knowledge_bases MCP tool when the query implies cross-KB reasoning. User-set scope is the default; agent expansion is opt-in per query and always confirmed with the user before crossing.

flowchart TB
    UI["User in chat UI<br/><i>scope picker: single KB / multiple / all my KBs</i>"]
    AUTH["Server: auth resolves allowed KBs"]
    INTER["Intersect (user-selected ∩ auth-allowed)"]
    PROMPT["System prompt template<br/><i>KB list interpolated</i>"]
    AGENT["Agent — picks tool(s) within scope<br/><i>lookup_section · fts_search · vector_search</i>"]
    LIST["list_knowledge_bases tool<br/><i>opt-in — agent calls only when query implies cross-KB</i>"]

    UI --> INTER
    AUTH --> INTER
    INTER --> PROMPT --> AGENT
    AGENT -.expansion path.-> LIST
    LIST --> AGENT

Why hybrid (vs alternatives):

| Model | Pros | Cons |
| --- | --- | --- |
| Pure agent-routes (every query starts with list_knowledge_bases) | Most flexible | Round-trip cost on every query; user can't see what's being searched without inspection; mis-routing risk |
| Per-KB chat (one chat per KB) | Explicit, simple | Forces context-switching; agent can't combine across KBs even when useful |
| Hybrid (chosen) | Explicit user intent + agent intelligence; fast common case; cross-KB still possible | Slightly more UI complexity (scope picker) |

UI shape:

The chat interface has a scope picker — a sidebar control that lets the user select:

  • A single KB ("rules KB" only)
  • Multiple KBs ("rules KB" + "team comms KB")
  • All my KBs (everything I have access to)

The scope picker shows a count next to each KB name and description (e.g., rules KB · 2 docs · 1,272 chunks) so users see the size of what they're querying. Default selection is whatever they had last for that workspace; first-time users get "All my KBs."

The list_knowledge_bases tool:

Purpose: let the agent suggest expanding scope when the user's query implies cross-KB reasoning. Returns only KBs the user has permission to see.

list_knowledge_bases() => Array<{
  id: string;
  name: string;
  description: string;
  doc_count: number;
  current_scope: boolean;  // true if currently in user's selected scope
}>

System-prompt instruction: "If the user's query suggests they want to search a KB outside the current scope, call list_knowledge_bases to see what's available, then ask the user if they want to expand."

Example interaction:

User: "Compare what the rules say about pit displays with what our team discussed in the planning emails."

Agent (scope: rules KB only): Calls list_knowledge_bases, sees "team-emails KB" available.

"I can search the rules KB for pit display content. I notice you also have a team-emails KB — would you like me to search both?"

The agent never silently expands scope; it always asks. This keeps users in control and avoids surprise cross-KB queries that mix sensitive corpora.

Permission boundary:

list_knowledge_bases returns only KBs the user has access to (filtered by auth at the SQL layer). The agent can't suggest a KB the user can't access; that path simply doesn't exist in its world. This is enforced as part of the scope-passing contract (§ Scope passing contract).
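
As an illustration of that boundary, here is an in-memory stand-in for the tool's query. The KbRow shape and the function signature are assumptions; the real filter would run in SQL, with the organization id taken from the auth context rather than passed by the agent.

```typescript
interface KbRow {
  id: string;
  name: string;
  description: string;
  doc_count: number;
  organization_id: string;
}

interface KbListing {
  id: string;
  name: string;
  description: string;
  doc_count: number;
  current_scope: boolean;
}

// Returns only KBs in the caller's organization, flagging those already in scope.
function listKnowledgeBases(
  rows: KbRow[],
  userOrganizationId: string, // derived from auth, never from the agent or client
  selectedScope: string[],
): KbListing[] {
  const selected = new Set(selectedScope);
  return rows
    .filter((kb) => kb.organization_id === userOrganizationId)
    .map((kb) => ({
      id: kb.id,
      name: kb.name,
      description: kb.description,
      doc_count: kb.doc_count,
      current_scope: selected.has(kb.id),
    }));
}
```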

KB descriptions matter. The agent's tool-selection accuracy depends on KB descriptions being clear about what's in each KB. "Internal team comms" vs "World Finals 2026 regulations" gives the agent enough signal to suggest expansion intelligently. We surface this in the KB-creation UI as required-on-create with a placeholder "what kind of content is in this KB? (e.g., 'rules and regulations', 'customer quote history', 'engineering specs')."

Rejected alternatives:

  • Bake KB list into a static system prompt file (.md). Rejected — goes stale the moment a KB is created. The system prompt is a template the server interpolates per request, not a static file.
  • Make every primitive call list_knowledge_bases first. Rejected — extra round-trip for every query in the common case, no benefit when the user has explicit intent.
  • Show the agent ALL KBs in the org regardless of UI scope. Rejected — defeats the user's explicit scope choice. The scope picker exists because users want to know what's being searched.
  • Auto-expand without asking when query confidence is high. Rejected — surprises users, mixes sensitive corpora, and gives no obvious revert. Always ask.

Source-type-specific ingestion

The chunks model is universal. The ingestion path forks by source type. documents.source_type is the dispatch key — already in the schema, used today to distinguish (e.g.) 'pdf' from future types. Each source type has its own ingestion module producing the same shape: documents row + chunks (+ optional sections / figures / pages).

flowchart TB
    PDF["PDF source"]
    XLSX["Excel / spreadsheet (xlsx)"]
    EMAIL["Email (MIME)"]
    HTML["Web page"]
    QUOTE["Quote (structured record from QuoteAI)"]
    TEXT["Plain text / markdown"]

    PDF --> P_PDF["render → parse → extract → embed<br/><i>existing pipeline (D10/D11)</i>"]
    XLSX --> P_XLSX["sheetjs/exceljs parse → table-block detect →<br/>row-as-chunk w/ header context → embed"]
    EMAIL --> P_EMAIL["MIME parse → header extraction →<br/>body chunking → embed"]
    HTML --> P_HTML["scrape → readability → chunk → embed"]
    QUOTE --> P_QUOTE["structured field map → chunk → embed<br/><i>(synchronous, ~1s)</i>"]
    TEXT --> P_TEXT["heading split → chunk → embed"]

    P_PDF --> CHUNKS[chunks table<br/><i>universal model</i>]
    P_XLSX --> CHUNKS
    P_EMAIL --> CHUNKS
    P_HTML --> CHUNKS
    P_QUOTE --> CHUNKS
    P_TEXT --> CHUNKS

Per-source-type paths:

| source_type | Ingestion shape | When it runs | Chunks shape |
| --- | --- | --- | --- |
| pdf | render → parse text-layer → vision-extract w/ ops → commit (D10/D11) | Batch CLI: pnpm ingest extract | text / table / figure / heading |
| xlsx | sheetjs parse → per-sheet section → table-block detection → row-as-chunk with header row baked into embedding text | Inline (upload) or batch (folder sync) | text / table; bbox = sheet name + cell range (e.g., Sheet1!A3:F12) |
| email | MIME parse → split headers vs body → optional thread-aware chunking | Inline (when received) or batch (mailbox sync) | text chunks per paragraph; metadata chunk for headers |
| html | fetch → readability extraction → DOM-aware chunk split | Inline (when URL added) | text chunks; figures from <img> |
| quote (QuoteAI) | structured field → flatten to text → chunk | Inline at quote-creation time (~1s) | structured chunk type per field group |
| text / markdown | heading split → paragraph chunk → embed | Inline or batch | text / heading |

Excel ingestion specifically (big one for QuoteAI):

Quote history, customer DBs, pricing matrices, and engineering specs all routinely live in spreadsheets. The xlsx ingestion module:

  1. Reads with sheetjs (xlsx package) or exceljs — both Node-native, no shell-out needed.
  2. Each sheet → sections row (section_id = sheet name).
  3. Detects "table blocks" (contiguous header row + data rows). Most spreadsheets have a clear header row at row 1 or 2; detect by looking for the first row whose cell values are all non-numeric / consistent type.
  4. Each data row → one chunks row, with header row context baked into the embedding text. Critical for recall: searching "find quotes for industrial robot arm controllers" should match a row that has Customer: ABC, Item: PLC controller for robotic arm even though the literal phrase doesn't appear. Headers as context = much better embedding alignment.
  5. bbox for each chunk = {sheet: "Sheet1", range: "A3:F3"} — same shape concept as PDF bbox, different coordinate space.
  6. Inspector renders xlsx chunks with a sheet-region view: render the spreadsheet with a highlighted cell range when a chunk is hovered.
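
Steps 4 and 5 can be sketched as follows, operating on rows that have already been parsed into arrays. This assumes the table block starts in column A and that the header row index is already known from step 3; the function names and the "Header: value" text format are illustrative choices, not the implemented ones.

```typescript
type Cell = string | number | null;

// 0-based column index to spreadsheet letters (0 → A, 25 → Z, 26 → AA).
function columnLetter(index: number): string {
  let n = index + 1;
  let s = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    s = String.fromCharCode(65 + rem) + s;
    n = Math.floor((n - 1) / 26);
  }
  return s;
}

interface RowChunk {
  content: string; // text sent to the embedder, with header context baked in
  bbox: { sheet: string; range: string };
}

function rowsToChunks(sheet: string, rows: Cell[][], headerRowIndex: number): RowChunk[] {
  const header = rows[headerRowIndex].map((h) => String(h ?? ""));
  const lastCol = columnLetter(header.length - 1);
  const chunks: RowChunk[] = [];
  for (let r = headerRowIndex + 1; r < rows.length; r++) {
    const row = rows[r];
    if (row.every((c) => c === null || c === "")) continue; // skip blank rows
    const content = header
      .map((h, i) => ({ h, v: row[i] }))
      .filter((p) => p.v !== null && p.v !== undefined && p.v !== "")
      .map((p) => `${p.h}: ${p.v}`)
      .join(", ");
    // Spreadsheet rows are 1-indexed, array rows are 0-indexed.
    chunks.push({ content, bbox: { sheet, range: `A${r + 1}:${lastCol}${r + 1}` } });
  }
  return chunks;
}
```

Pairing each value with its header is what lets a query about "robot arm controllers" land on a row whose cells never contain the word "quote" or "item".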

Anthropic ships an xlsx skill we can lean on for the parsing layer if we don't want to write it from scratch — saves a few hours.

Common contract — what every source-type ingestion produces:

  1. A documents row (with source_type set, knowledge_base_id set, logical_document_id set if versioning applies).
  2. Optional pages rows (only for PDFs and other paginated sources).
  3. sections rows where structure exists (PDF section IDs, email subject + thread, HTML headings, structured-record field groups, xlsx sheets).
  4. chunks rows with content, chunk_type, bbox (where applicable, NULL for non-spatial sources), content_hash.
  5. Optional figures rows for image-bearing sources.

The retrieval primitives don't care which path produced the chunks — they query against the same chunks table. Whoever wires up a new source type writes the ingestion module; no retrieval changes are needed.

Inline vs batch:

  • Inline ingestion for small, real-time sources: a new email arrives, a quote is created, a URL is added, a single xlsx is uploaded. The user expects "ingested + searchable" within a couple seconds. Synchronous in the request handler is fine for ~1MB of input.
  • Batch ingestion for large, scheduled sources: the existing PDF CLI pipeline, multi-sheet xlsx with thousands of rows, full mailbox sync. Run via pnpm ingest, takes minutes to hours for big corpora.

Both produce the same chunks shape; only the orchestration differs.

Idempotency and incremental updates:

Every ingestion path computes documents.source_hash from the canonical content (file bytes for PDF, normalized MIME for email, sheet-and-row content for xlsx, etc.). Re-ingesting the same source is a no-op (matches existing row). Updates create a new documents row + new chunks; if versioning applies (see § Document versioning & supersession), the old version is superseded.

Per-chunk content_hash keeps the embed step idempotent: if we re-extract a doc but most chunks are unchanged, only the changed chunks need re-embedding. Saves cost on large corpora with small edits — especially relevant for xlsx where customers update one cell and re-upload.
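
The re-embed decision reduces to a set difference on content_hash. A minimal sketch, assuming hashes are already computed upstream; the names are hypothetical.

```typescript
interface HashedChunk {
  content_hash: string;
  content: string;
}

// Splits freshly extracted chunks into those that need embedding and those
// whose embedding can be reused from a previous extraction.
function planEmbedding(
  previousHashes: Set<string>,
  next: HashedChunk[],
): { toEmbed: HashedChunk[]; reusable: HashedChunk[] } {
  const toEmbed: HashedChunk[] = [];
  const reusable: HashedChunk[] = [];
  for (const chunk of next) {
    if (previousHashes.has(chunk.content_hash)) {
      reusable.push(chunk);
    } else {
      toEmbed.push(chunk);
    }
  }
  return { toEmbed, reusable };
}
```

For the one-cell xlsx edit described above, toEmbed would hold the single changed row and everything else would land in reusable.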

Bbox handling — non-spatial sources:

chunks.bbox is NULL for sources without spatial layout (emails, plain text, structured records). For xlsx, bbox is the sheet name + cell range — different coordinate space than PDF (which uses normalized 0..1 page coords) but the inspector handles both. The retrieval primitives don't care.

The QuoteAI ingestion path specifically (because it's the second pilot):

Quotes arrive as structured records from QuoteAI's existing flow — fields like customer info, line items, pricing, terms, internal notes. The ingestion module:

  1. Receives the structured record at quote-creation time (synchronous in the create-quote handler).
  2. Maps each field group (customer info, line items, terms, notes) to a sections row — section_id is the field-group name.
  3. Flattens each field group's content into a chunks row (chunk_type = structured).
  4. Embeds inline. Total time: ~1s for a typical quote.
  5. Indexed and searchable before the create-quote response returns to the user.
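
Steps 2 and 3 can be illustrated with a simplified record shape. This is a sketch under assumptions: real quotes would likely have nested line items, and the field-group names, the QuoteRecord type, and the "key: value" flattening format are all hypothetical.

```typescript
type FieldGroup = Record<string, string | number | boolean | null>;

// e.g. { customer_info: { name: "ABC" }, terms: { net_days: 30 } }
type QuoteRecord = Record<string, FieldGroup>;

interface StructuredChunk {
  section_id: string;       // the field-group name
  chunk_type: "structured";
  content: string;          // flattened text that gets embedded
}

function quoteToChunks(quote: QuoteRecord): StructuredChunk[] {
  return Object.keys(quote).map((group) => ({
    section_id: group,
    chunk_type: "structured" as const,
    content: Object.keys(quote[group])
      .filter((k) => quote[group][k] !== null && quote[group][k] !== "")
      .map((k) => `${k}: ${quote[group][k]}`)
      .join("\n"),
  }));
}
```

One chunk per field group keeps customer info, terms, and notes separately retrievable while still tying each back to a section_id the inspector can display.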

This pattern unblocks "search past quotes for similar customer asks" — the retrieval primitives Just Work against the QuoteAI quote-history KB the same way they work against the STEM Racing rules KB. Different source type, same chunks model, same retrieval surface.

Rejected alternatives:

  • Universal ingestion pipeline that takes all source types. Rejected — different sources need fundamentally different processing (PDFs need vision; emails need MIME parsing; xlsx needs cell-aware structure detection; quotes need field mapping). One pipeline-with-many-branches becomes a mess. Better: many small focused modules with a common output contract.
  • Ingest everything as text only (drop structure). Rejected — kills the inspector wedge. We want section structure, page anchoring, figure detection, sheet/cell awareness. Source-specific paths preserve as much structure as the source has.
  • Always run via batch jobs. Rejected — quote-history KBs and single-file uploads need real-time updates. Batch-only forces a worker queue + polling for "is my quote searchable yet" UX.
  • Always run inline. Rejected — large PDF corpora and multi-sheet xlsx with thousands of rows take minutes; can't synchronous-block a user upload for that.
  • Render xlsx → PDF → use existing PDF pipeline. Rejected — destroys cell structure (rows become "lines of text," columns become "spaces"), kills row-aware retrieval. Worth it only if the xlsx pipeline becomes too expensive to maintain; not yet.

Y1 Launch Plan

Captured 2026-05-07. The path from working spike to chargeable SaaS, with the unit economics and tier shape that go with it. The architecture is settled (D13–D17, § Multi-tenancy & Knowledge Bases above); this section is the go-to-market plan that those design decisions enable.

MVP threshold — five hard blockers

In dependency order. None are skippable; each gates the next.

| # | Blocker | What it is | Effort |
| --- | --- | --- | --- |
| 1 | Auth | Cognito (per QuoteAI INFRA-D3) with email + Google OAuth federation, session middleware that derives allowedKnowledgeBaseIds from the request. Today any URL grants any KB; must close before charging. | ~1 wk |
| 2 | Multi-tenancy enforcement | RLS policies on Postgres + the scope-passing contract from § Scope passing contract above. Schema is in (D13); the gates aren't. | ~3-4 days |
| 3 | Self-serve ingestion | File upload UI → background job (Inngest or Trigger.dev) → progress reporting. Today docs only land via pnpm ingest extract CLI. | ~1-1.5 wk |
| 4 | AWS-native deploy (D16) | ECS Fargate + RDS + S3 + Bedrock + Cognito + CloudFront + budget alarms. Coordinated with QuoteAI's deploy so Terraform modules are shared. | ~2-3 wk |
| 5 | Billing + tier gating | Stripe subscriptions, subscriptions table on organizations, middleware that blocks ingestion past quota. | ~1 wk |

Total: ~8-10 weeks focused, sequenced. #1 and #2 land first because everything else assumes them.

Soft blockers — needed before charging gracefully

  • Source-type expansion. PDF-only is a TAM cap. xlsx (huge for QuoteAI overlap, half-built in QuoteAI's stack) + plain text/markdown + URL/HTML. ~1-2 wk for the three cheap formats. See § Source-type-specific ingestion.
  • Brand identity. Resolved 2026-05-15: D1 settled on Autri (it previously treated "Lorebase" as a placeholder). Domain + landing-page polish remains pre-launch work.

Pricing tiers and unit economics

What we actually meter

Doc count alone is too coarse — a 1-page memo and a 500-page regulation cost wildly different things to ingest and store. Three axes, each tied to a real cost driver:

| Axis | What it caps | Cost driver | Why this axis |
| --- | --- | --- | --- |
| KB count | Number of knowledge_bases per organization | None directly; pure UX | Encourages organization; not a cost lever |
| Total chunks stored | Sum of chunks rows across all KBs in the org | Storage (recurring, low) + index size | Long-tail recurring cost; chunks ≈ 5 per page typically |
| Monthly pages ingested | Sum of pages rows created in calendar month | Vision extraction via Bedrock (~$0.001/page with prompt caching) | This is where money burns: the vision API is the expensive call |

Web chat query volume is also metered (router cost ≈ $0.005/query) but only on lower tiers; MCP-driven queries are effectively free (see § The MCP economic shift below) so we don't meter them.

Tier ladder

| Tier | Price | KBs | Total chunks | Monthly pages | Web queries | MCP | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Free | $0 | 1 | 5,000 | 50 | 50/mo | | Web-only. The funnel. |
| Personal | $20/mo | 3 | 50,000 | 500 | 500/mo | ✅ (1 client) | Lone-pro tier. |
| Pro | $50/mo (or $39/mo annual) | 10 | 500,000 | 5,000 | 2,000/mo | ✅ (multi-client) | Power user; multi-KB; version history (D15). |
| Enterprise | $3.5k+/mo + $12.5k setup | unlimited | unlimited | custom | unlimited | | Dedicated Aurora (D13 Tier 2), Cognito + Entra federation, per-tenant KMS. Brehob shape. |
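
The ladder maps onto the gate that the billing middleware (hard blocker #5) would apply before accepting an upload. A minimal sketch, with limits as read from the ladder above and Enterprise left out because its limits are custom. The names (TierName, OrgUsage, checkIngestion) are hypothetical, and projecting chunks at 5 per page is an assumption taken from the metering table's rough average:

```typescript
// Hypothetical names; numbers are the self-serve tier limits from the ladder.
type TierName = "free" | "personal" | "pro";

interface TierLimits {
  kbs: number;
  totalChunks: number;
  monthlyPages: number;
}

const TIER_LIMITS: Record<TierName, TierLimits> = {
  free: { kbs: 1, totalChunks: 5000, monthlyPages: 50 },
  personal: { kbs: 3, totalChunks: 50000, monthlyPages: 500 },
  pro: { kbs: 10, totalChunks: 500000, monthlyPages: 5000 },
};

interface OrgUsage {
  totalChunks: number; // sum of chunks rows across the org's KBs
  pagesThisMonth: number; // pages rows created this calendar month
}

type QuotaDecision =
  | { allowed: true }
  | { allowed: false; reason: "monthly_pages" | "total_chunks" };

// Assumed average (about 5 chunks per page); real counts are known only after extraction.
const CHUNKS_PER_PAGE_ESTIMATE = 5;

function checkIngestion(tier: TierName, usage: OrgUsage, incomingPages: number): QuotaDecision {
  const limits = TIER_LIMITS[tier];
  // Monthly pages is the burst axis: it caps the expensive vision calls.
  if (usage.pagesThisMonth + incomingPages > limits.monthlyPages) {
    return { allowed: false, reason: "monthly_pages" };
  }
  // Total chunks is the recurring axis: it caps storage and index size.
  const projectedChunks = usage.totalChunks + incomingPages * CHUNKS_PER_PAGE_ESTIMATE;
  if (projectedChunks > limits.totalChunks) {
    return { allowed: false, reason: "total_chunks" };
  }
  return { allowed: true };
}
```

Checking pages first means a rejected upload never reaches the vision call, which is the cost the monthly axis exists to protect.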

Cost & margin per tier

Numbers below assume Bedrock prompt caching shipped (Y1 must-ship per QuoteAI cost validation), Haiku for vision extraction, OpenAI text-embedding-3-small at $0.02/1M tokens. All gross margins — net needs Stripe fees (~3%), CAC, support, overhead removed.

| Tier | Revenue | First-month cost | Steady-state cost | Steady margin | Margin % |
| --- | --- | --- | --- | --- | --- |
| Free | $0 | ~$1 ingest + $5 infra ≈ $6 | ~$5/mo | -$5/mo | n/a (loss leader) |
| Personal | $20/mo | ~$10 + $1 q + $5 infra ≈ $16 | ~$5-7/mo | $13-15/mo | ~70% |
| Pro | $50/mo | ~$100 + $5 q + $10 infra ≈ $115 | ~$15-20/mo | $30-35/mo | ~65% |
| Enterprise | $3.5k/mo | $771/mo (per QuoteAI cost validation) | $771/mo | ~$2.7k/mo | ~78% |

Cost-shape risks

  1. Pro first-month is a loss. ~$115 cost vs $50 revenue. A Pro user who churns at 30 days costs us ~$65. Mitigations: (a) push annual plan with discount that locks the recoup window, (b) raise Pro to $79-99/mo monthly with $39/mo annual, or (c) gate bulk-import behind a step that prevents casual Pro sign-ups dumping a 1000-doc historical archive. Lean (a)+(c).
  2. Vision-extraction cost is the biggest uncertainty. $0.10/100-page-doc assumes Haiku + aggressive prompt caching. If reality is closer to $0.50-1 (vision is hard to cache page-by-page), Personal tier inverts and Pro thins. Y1 must-ship items per cost validation: prompt caching, sliding window, daily Bedrock budget alarm.
  3. Free-tier loss is acceptable if conversion >5%. $5/mo × 100 free users = $500/mo loss. That is covered by ~25 paying Personal users on revenue ($20/mo each), or by ~34-39 on the $13-15/mo steady margin. The MCP gate is the conversion lever: free can't plug Autri into Claude Code, and that's the upgrade trigger.
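
The arithmetic behind risks 1 and 3 is small enough to keep as a checkable sketch. The function names are hypothetical, and the inputs are the figures quoted in the cost table and the risk list:

```typescript
// Exposure if a user churns right after the first (expensive) month.
function firstMonthLoss(firstMonthCost: number, monthlyPrice: number): number {
  return Math.max(0, firstMonthCost - monthlyPrice);
}

// Whole months of steady-state margin needed to recoup that loss.
function monthsToRecoup(loss: number, steadyMargin: number): number {
  return Math.ceil(loss / steadyMargin);
}

// Paying users needed to offset free-tier burn, given each payer's monthly contribution.
function payingUsersToCoverFree(
  freeUsers: number,
  costPerFreeUser: number,
  contributionPerPayer: number,
): number {
  return Math.ceil((freeUsers * costPerFreeUser) / contributionPerPayer);
}

const proLoss = firstMonthLoss(115, 50); // 65
const recoupAtLowMargin = monthsToRecoup(proLoss, 30); // 3 months at $30/mo margin
const recoupAtHighMargin = monthsToRecoup(proLoss, 35); // 2 months at $35/mo margin
const payersOnRevenue = payingUsersToCoverFree(100, 5, 20); // 25, counting $20/mo revenue
const payersOnMargin = payingUsersToCoverFree(100, 5, 13); // 39, counting $13/mo margin
```

The recoup window of two to three months is what an annual plan would need to lock in for a bulk-importing Pro customer.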

One-time overage fees

Tied to monthly pages ingested (the actual cost driver) — pay-as-you-go for the resource we're paying for. Avoids forcing tier upgrades for one-off spikes (e.g., "I just need to load this archive of 2000 pages once") and reduces refund pressure when someone over-ingests.

| Add-on | Price | Resets |
| --- | --- | --- |
| +500 pages this month | $10 | One-time, calendar month |
| +5000 pages this month | $50 | One-time, calendar month |
| +500 pages permanent capacity | $25 | Persists; raises monthly base |

Web chat query overages and chunk-storage overages are not sold as one-offs — those grow proportionally with ingestion, so the ingestion add-on covers them implicitly.
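
One way the add-ons could fold into the monthly-pages check is sketched here, assuming one-time add-ons are stored with the calendar month they apply to. PageAddOn and monthlyPageAllowance are hypothetical names, not an existing schema:

```typescript
interface PageAddOn {
  pages: number;
  kind: "one_time" | "permanent";
  // For one-time add-ons only: the calendar month they apply to, as "YYYY-MM".
  month?: string;
}

// Effective page allowance for a given month: tier base, plus permanent
// capacity, plus any one-time add-ons bought for that same month.
function monthlyPageAllowance(basePages: number, addOns: PageAddOn[], month: string): number {
  let allowance = basePages;
  for (const addOn of addOns) {
    if (addOn.kind === "permanent" || addOn.month === month) {
      allowance += addOn.pages;
    }
  }
  return allowance;
}
```

For a Personal org (base 500) holding one "+500 this month" add-on for June and one permanent +500, this gives 1,500 pages in June and 1,000 in July.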

The MCP economic shift

The biggest economic insight from this design pass: when a user is querying via their own Claude (Claude Code, Claude Desktop, Cursor, etc.) over MCP, our cost per query collapses by 10-50×.

| Path | Components per query | Cost |
| --- | --- | --- |
| In-app web chat | Router LLM (Haiku, 2-3 turns) + embedding + Postgres + bandwidth | ~$0.005/query |
| MCP via user's Claude | Embedding (only if vector_search called) + Postgres + bandwidth | ~$0.0001-0.0005/query |

The user's Claude subscription bears the LLM cost; we're just serving the typed retrieval primitives. This is the economic argument for MCP being a paid-tier-only feature: it's the highest-value workflow for the user and the lowest-marginal-cost workflow for us.

For the Pro tier in particular, a customer who uses MCP heavily is essentially pure margin past the steady-state infra cost. A Pro user costing us ~$15-20/mo at the in-app-chat-only path drops to ~$5-10/mo when they're MCP-dominant. That's the segment to optimize for.
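
To make the shift concrete, here is a small sketch of blended query cost as a function of how much of a user's traffic goes through MCP. The per-query figures are the estimates from the table above; the function name and the mcpShare parameter are assumptions for illustration:

```typescript
// Estimated cost per query on the in-app web chat path, in dollars.
const WEB_CHAT_COST = 0.005;

// Blended monthly query cost for a user.
// mcpShare is the fraction of queries arriving over MCP (0 to 1);
// mcpCostPerQuery is somewhere in the estimated 0.0001-0.0005 range.
function monthlyQueryCost(queries: number, mcpShare: number, mcpCostPerQuery: number): number {
  const mcpQueries = queries * mcpShare;
  const webQueries = queries - mcpQueries;
  return webQueries * WEB_CHAT_COST + mcpQueries * mcpCostPerQuery;
}

// 1,000 queries/month, all web chat: about $5.
const allWeb = monthlyQueryCost(1000, 0, 0.0005);
// Same volume, all MCP at the high estimate: about $0.50 (10x lower).
const allMcpHigh = monthlyQueryCost(1000, 1, 0.0005);
// Same volume, all MCP at the low estimate: about $0.10 (50x lower).
const allMcpLow = monthlyQueryCost(1000, 1, 0.0001);
```

Infra cost is deliberately excluded here; it is the floor that remains once query cost approaches zero.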

Implementation work: hosted MCP server (HTTPS+SSE, not stdio), OAuth 2.0 between MCP client and Autri, per-token scope enforcement. Foundry's E12 (MCP Authorization) solved this exact problem — we lift its design wholesale rather than re-deriving. ~2 wk implementation if we lean on E12 patterns.

KB-size scaling notes

pgvector index sizing

The chunks.embedding index uses ivfflat with a lists parameter. The rule of thumb used here is lists ≈ sqrt(N), where N is the number of indexed rows:

  • 656 chunks (today, single-tenant spike) → lists=20 — fine
  • 50k chunks (Personal tier max) → lists=224 — needs index rebuild
  • 500k chunks (Pro tier max per KB) → lists=707 — needs index rebuild

Mitigation: schedule periodic index rebuilds when a KB crosses ~10×, ~100×, etc. its initial size. Or move to HNSW once we're on Postgres 16 + pgvector 0.7+ (better recall at scale, no lists tuning).
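
The sizing rule and the rebuild trigger can be expressed as a small helper. This is a sketch under assumptions: the index name chunks_embedding_idx, the cosine operator class, and the 2× drift tolerance are all illustrative choices, not settled configuration:

```typescript
// Rule of thumb from the text: lists is roughly the square root of the row count.
function recommendedLists(rowCount: number): number {
  return Math.max(1, Math.round(Math.sqrt(rowCount)));
}

// Flag a rebuild once the recommended value has drifted beyond a tolerance
// factor (assumed 2x) from the value the index was built with.
function needsRebuild(currentLists: number, rowCount: number, factor = 2): boolean {
  const target = recommendedLists(rowCount);
  return target > currentLists * factor || target * factor < currentLists;
}

// ivfflat fixes lists at build time, so changing it means recreating the index.
// Index name and operator class are assumptions.
function rebuildSql(lists: number): string {
  return [
    "DROP INDEX IF EXISTS chunks_embedding_idx;",
    `CREATE INDEX chunks_embedding_idx ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = ${lists});`,
  ].join("\n");
}
```

With these definitions, recommendedLists gives 26, 224, and 707 for 656, 50k, and 500k rows, and an index built with lists=20 is still within tolerance at 656 rows but flagged for rebuild at 50k.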

Pro/Enterprise inflection at ~500k chunks/KB

Shared-DB performance (Pattern A from § Tenant model) starts to degrade past ~500k chunks per KB on commodity RDS sizes. That's the natural pressure point pushing customers from Pro to Enterprise:

  • Below 500k chunks/KB: Pattern A shared DB is fine, sub-100ms queries, $50/mo Pro tier covers it.
  • Above 500k chunks/KB: Pattern B dedicated Aurora per tenant is the answer — better tail latency, bigger index, customer-isolated load. Enterprise pricing covers the dedicated infra (per QuoteAI cost validation: $771/mo mid).

This means the Pro tier's chunk cap is a real ceiling, not a marketing line. A user trying to load a 1M-chunk corpus into Pro will hit slow queries; we point them to Enterprise as the answer. Tier boundaries align with actual technical inflection points.

Strategic sequencing — open questions

  1. Brehob first or self-serve first? If Brehob signs ($3.5k/mo recurring + $12.5k setup), that's the income that funds Autri MVP. Argues for: Brehob first (QuoteAI ships), Autri MVP self-serve second, with the QuoteAI-rebase-onto-Autri happening when there's paying-customer pressure on the foundation. Don't parallelize — context-switching tax is huge for a solo founder.
  2. GMPPU IP separation. Already flagged HIGH priority in next.md. Lawyer consult ($500-1500) before any paying customer; the FIA-docs overlap risk between GMPPU and Autri is real if the latter's first vertical is FIA-style technical regs.
  3. Naming commitment. Resolved 2026-05-15: D1 locked on Autri, replacing the "Lorebase" placeholder.
  4. Free-tier abuse vs lead generation. The MCP gate above is the cleanest answer — free users can't get the agentic-IDE workflow, that's the upgrade trigger. If conversion stays >5%, the model works.
  5. Vision-extraction cost at production scale. The biggest model unknown. Recommend: stand up the prompt caching + budget alarm before opening sign-ups, run a 100-doc ingestion on real test data, validate the $0.10/100-page-doc assumption before committing the tier prices in copy.

Once Brehob outcome is known (sign or not, by ~late May 2026):

  1. If Brehob signs: focused 8-10 week MVP push on Autri. Sequencing per the five hard blockers above. Self-serve open by ~end of Q3 2026.
  2. If Brehob doesn't sign: re-evaluate go-to-market. Autri still ships, but on a slower cadence funded out-of-pocket. This may invert the order: open the free tier earlier as a lead-gen funnel for direct enterprise sales rather than for self-serve scale.

Either way, the technical MVP plan stays the same; the go-to-market posture changes.

Rejected alternatives

  • Per-doc pricing instead of per-page-ingested-monthly. Rejected: doc count is too coarse a proxy for cost. At ~$0.001/page, a user who uploads 100 docs of 1,000 pages each costs us ~$100 in vision extraction; the same 100-doc limit filled with single-page memos costs ~$0.10. Pages catches both. (Adopted: pages-monthly + total-chunks-stored, as per § Pricing axes.)
  • Single chunk-count cap, no monthly axis. Rejected — chunks are recurring (storage) not burst (ingestion). Without a monthly-pages cap, a user could ingest a million pages on day one and exhaust the cost-driving resource before churning. Monthly axis matches the cost shape.
  • Free tier with MCP access. Rejected — MCP is the highest-value workflow and the cheapest one to serve. Giving it free makes the upgrade case for every paid tier weaker. Free is web-chat-only; MCP unlocks at Personal.
  • No first-month-loss mitigation on Pro. Rejected — $115 first-month cost vs $50/mo means churn at day 31 is a $65 loss. Annual plan with discount + bulk-import gating brings the break-even forward.

Open questions

The architecture above leaves a few productization decisions deliberately open. Listed here so they're tracked. Several originally-open questions have been resolved by the Deployment & isolation tiers decision (AWS-native + Bedrock + Terraform from day 0) and by QuoteAI's locked decisions (auth, pricing, cost validation). Resolved items are at the bottom under "Resolved."

Still open:

1. Self-serve onboarding flow (v1)

Operator-managed today (we create orgs and KBs on customer's behalf). The v1 self-serve flow needs:

  • Sign-up / email verification
  • Org creation (auto-create on first login? or explicit?)
  • First-KB onboarding wizard ("upload a doc, see it parsed, ask a question")
  • Pricing model + billing integration

None of this affects the schema — it's pure UI + auth-flow work. Open question: do we build self-serve when QuoteAI's first paying customer (post-Brehob #2+) demands it, or proactively for inbound interest?

2. Reciprocal rank fusion in the router

The agent currently picks one tool (occasionally two). If we observe consistent patterns where multi-index ensembling outperforms single-tool picks, RRF becomes worth implementing. Need data first — the existing retrieval_log is the data source. Open question: what's the threshold for "ensembling helps"? Probably look at chunks that scored well in two indexes vs only one and see if those are higher-quality answers.
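
For reference, reciprocal rank fusion itself is only a few lines. A minimal sketch, assuming each index returns chunk IDs ordered best-first; the function name is hypothetical and k = 60 is the constant commonly used with RRF, not a value tuned for Autri:

```typescript
// Each inner array is one index's result list (for example lookup_section,
// fts_search, vector_search), ordered best-first.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; chunks found by several indexes float up.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

A chunk that ranks reasonably in two indexes outscores one that ranks first in only one, which is the behavior the retrieval_log analysis would need to show is actually helpful.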

3. Inline citation markers in agent answers

Today the playground shows the agent's answer separately from the chunks below. v0.2 lifts citation markers inline ([chunk_abc]-style refs the UI parses and renders as hover-tooltips with source bbox). QuoteAI's chat spike already has inline citation pills with hover popovers — that pattern is the migration target for Autri's playground. Open question: is this v1 or v2? Big UX win, moderate implementation effort.
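
Parsing the markers on the UI side could look like the sketch below. The exact marker grammar is an assumption (the text only says "[chunk_abc]-style"), as are the AnswerSegment and splitCitations names:

```typescript
type AnswerSegment =
  | { kind: "text"; text: string }
  | { kind: "citation"; chunkId: string };

// Splits an agent answer into plain-text runs and citation markers.
// Assumed marker shape: [chunk_<id>] where <id> is letters, digits, "_" or "-".
function splitCitations(answer: string): AnswerSegment[] {
  const pattern = /\[(chunk_[A-Za-z0-9_-]+)\]/g; // new regex per call, so no shared lastIndex
  const segments: AnswerSegment[] = [];
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: "text", text: answer.slice(cursor, match.index) });
    }
    segments.push({ kind: "citation", chunkId: match[1] });
    cursor = match.index + match[0].length;
  }
  if (cursor < answer.length) {
    segments.push({ kind: "text", text: answer.slice(cursor) });
  }
  return segments;
}
```

Each citation segment can then render as a pill whose hover state looks up the chunk's bbox, the same interaction the inspector already provides.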

4. Haiku-for-retrieval validation (Y1.5 per QuoteAI cost model)

QuoteAI's AWS cost validation has a Y1.5 plan to A/B Haiku-rerank vs Sonnet on a 15-20 query panel post-Brehob trip. If Haiku passes quality bar, swap retrieval/rerank to Haiku — unlocks margin headroom for high-usage chat scenarios. Same opportunity applies to Autri. Open question: what's the query panel for Autri's STEM Racing corpus? Likely overlap with QuoteAI's panel methodology, different content.

5. Multi-org users (contractors / consultants)

Current schema has users.organization_id as a single FK — each user belongs to exactly one org. Sufficient for v0 + most B2B SaaS shapes. Open question: if we hit a use case where one user works across multiple orgs (consultant working with multiple customers, contractor on multiple teams), this becomes a user_organizations join table. Cheap migration when needed, but worth flagging.

6. KB-level ACLs (per-user permissions within an org)

v0 model: all org members see all org KBs. Open question: when a customer asks for "the legal team can see the legal KB but the engineering team can't," we add a knowledge_base_acls table. Triggered by demand, not proactive.

7. Bedrock model version pinning + upgrade cadence

Bedrock lags Anthropic's direct API for model updates. Open question: do we always pin to a specific Claude version in prod, or follow Bedrock's default? Pinning gives predictability but requires manual upgrades. Following gets new features automatically but risks unexpected behavior shifts. Lean toward pinning with quarterly review — same pattern most enterprise customers will expect.

8. RLS policy authoring

We still need to write the actual RLS policies. The shape is "every row visible only if its organization_id (resolved via FK chain) matches the current session's current_user_organization_id." The Postgres mechanism: set a session variable in the server action, write policies that read it. Open question: do we manage policies via Drizzle migrations, or a separate migrations/policies/ directory? Drizzle doesn't have first-class RLS support yet.
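
Whichever directory the policies end up in, their shape is mechanical enough to generate. A minimal sketch, assuming the session variable is a custom Postgres setting named app.current_user_organization_id (custom settings need a dotted prefix), that organization_id is a uuid, and that the table carries organization_id directly. The function name and policy naming scheme are hypothetical:

```typescript
// Assumed setting name; read by the policies, written by the server action.
const ORG_SETTING = "app.current_user_organization_id";

// Statement a server action would run inside its transaction before querying.
// The third argument (true) scopes the value to the current transaction.
const SET_ORG_SQL = `SELECT set_config('${ORG_SETTING}', $1, true)`;

// Generates the policy for a table with a direct organization_id column.
// Tables that reach the org through an FK chain would need a subquery instead.
function orgPolicySql(table: string): string {
  // Identifiers are interpolated, so accept only plain lowercase names.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY;`,
    `CREATE POLICY ${table}_org_isolation ON ${table}`,
    `  USING (organization_id = current_setting('${ORG_SETTING}')::uuid);`,
  ].join("\n");
}
```

Generated SQL like this can be checked into either a Drizzle custom migration or a separate policies directory, so the sketch does not prejudge the open question. Note that table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set, so the application role should not own the tables.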


Resolved:

By the Deployment & isolation tiers decision:

  • DB host (Neon vs Supabase vs Aurora). Aurora (RDS Postgres + pgvector for shared multi-tenant; Aurora Serverless v2 for dedicated tiers). AWS-native from day 0.
  • MCP-via-CLI vs MCP-via-SDK switch. CLI for dev (Max-billed, D12), Bedrock SDK for prod. Same router code, swap behind cli-client.ts abstraction.
  • Storage location for source files. S3. Per-tenant prefix in the bucket; KMS encryption with per-tenant keys for Tier 2/3.
  • Per-tenant database: when? Tier 2 onward (mid-market and up). Tier 1 stays shared with RLS.

By QuoteAI's locked decisions (carried forward — Autri + QuoteAI converge on the same stack):

  • Auth provider (Cognito vs Clerk). Cognito + Entra federation (per QuoteAI INFRA-D3). Hannah Labs runs the Cognito user pool; enterprise customers federate their identity provider (Brehob = Microsoft Entra / Azure AD) into Cognito. ~$0/mo at expected scale.
  • First-customer pilot pricing. $3,500/mo + $12,500 setup (per QuoteAI PRICING-D2). Multi-phase growth via scope additions ($5K Phase 1.5, $3K re-ingestion, $10-25K Phase 2 corpus).
  • Cost-multiple validation. ×4.5 multiple at first-customer pilot pricing, against ~$771/mo mid-scenario AWS run cost (Bedrock + RDS + ECS + supporting services). Andy's "5-7× markup" rule of thumb landed at 4.5× after AWS validation. Headroom available via Y1.5 Haiku-retrieval swap if usage spikes.
  • Setup fee structure. $5K base (AWS env + auth/SSO + customer config + PM) + $7.5K Phase 1 corpus ingestion (~5K docs at ~$1.50/doc) = $12.5K entry. Add-ons priced separately.
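
The pilot numbers in the list above reduce to three small formulas. A worked sketch using only the quoted figures; the function names are hypothetical:

```typescript
// Price divided by run cost: the "cost multiple" in the list above.
function costMultiple(monthlyPrice: number, monthlyRunCost: number): number {
  return monthlyPrice / monthlyRunCost;
}

// Gross margin as a percentage of price.
function grossMarginPct(monthlyPrice: number, monthlyRunCost: number): number {
  return ((monthlyPrice - monthlyRunCost) / monthlyPrice) * 100;
}

// Base setup plus per-document ingestion.
function setupFee(base: number, docs: number, pricePerDoc: number): number {
  return base + docs * pricePerDoc;
}

const pilotMultiple = costMultiple(3500, 771); // about 4.54
const pilotMargin = Math.round(grossMarginPct(3500, 771)); // 78
const pilotSetup = setupFee(5000, 5000, 1.5); // 12500
```

Re-running these with a different run-cost scenario shows how quickly the multiple moves if Bedrock usage lands above the mid case.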

Competitive Landscape

Captured 2026-05-08 during strategic refinement. The competitive set is wider than "RAG tools" because Autri touches several adjacent markets — general AI knowledge bases, vertical chat-with-docs tools, author-specific writing tools, dev/infra, and AI-platform-native KBs. Each ring has different competitors and a different attack/defense posture. This section captures the positioning rationale and the moat claims, so future product decisions can justify against a coherent competitive picture rather than re-litigating it ad hoc.

The five competitive rings

The set of products users might choose instead of Autri falls into five buckets. Each ring competes on a different dimension; Autri's positioning is to be defensibly differentiated in every ring, not best-on-features in any single one.

Ring 1 — General-purpose AI knowledge bases (closest)

| Competitor | Position | Where they win | Where they lose |
| --- | --- | --- | --- |
| NotebookLM (Google) | Free, multi-modal, audio summaries | Distribution + free + Gemini quality | No API/MCP, source caps, no enterprise tier, black-box extraction |
| Notion AI | "AI inside the doc tool you use" | Massive distribution, embedded workflow | Mediocre RAG; not a real KB; no inspection |
| Mem.ai | "Self-organizing AI workspace" | Slick UX, fast capture | Pricing churned; positioning unstable; not source-of-truth |
| Obsidian + AI plugins | Power-user choice | Local-first, plugin ecosystem, free | Setup tax; no managed RAG; no MCP |
| Reflect / RemNote / Fabric | Personal AI notes | Niche fit | Don't scale to large corpora; black-box |

The structural threat: NotebookLM. Free, Google-distributed, audio overviews are genuinely novel. But NotebookLM has architectural ceilings it won't break through without disrupting its own positioning: no MCP/API (would cannibalize Workspace integration narrative), source caps (would hurt cost margins on free tier), no enterprise tier (Google has Vertex/Workspace for that), and structurally cannot show extraction quality because their parsing is internal. That last gap is the inspector wedge.

Ring 2 — Vertical "chat with docs" tools

| Competitor | Position | Notes |
| --- | --- | --- |
| ChatPDF / AskYourPDF / Humata | Per-doc chat, low-friction | Toy-tier; commodity pricing race; no KB across docs; no MCP |
| Hebbia | Enterprise document analysis (finance/legal) | Strong incumbents; high-touch sales; ~$50–500k/yr |
| Glean | Enterprise search/RAG over corp data | Massive funding; SSO/connectors; not self-serve |
| Sana AI | Enterprise learning + KB | Newer, well-funded, training/onboarding focus |

Autri's enterprise tier (~$3.5k+/mo) sits between toy-tier and Glean/Hebbia. That's the gap for companies needing real RAG but not ready for $200k contracts. QuoteAI/Brehob-style customers live here.

Ring 3 — Author-specific tools

| Competitor | Position | What they do | The gap they leave |
| --- | --- | --- | --- |
| Sudowrite | "AI for fiction writers", $19–59/mo | Generative assist, plot help, "Story Bible" | Story Bible is shallow; no MCP; no cross-book queries; locked to their UI |
| NovelAI | Generative fiction AI | Strong generation | No KB / RAG over user's prior work |
| Plottr / Campfire / World Anvil | Story planning / wikis | Manual structure | No semantic retrieval |

The key insight in this ring: none of these products treat the author's prior body of work as a retrieval substrate. They're generation-first, reference-second. Autri is reference-first and integrates with whatever the author writes in (via MCP), rather than being yet another writing UI to learn.

Ring 4 — Developer/infra (framing competitors)

LlamaIndex, LangChain, Pinecone, Weaviate, Qdrant, Vectara. These are frameworks/infra, not products. Autri competes by being the hosted, opinionated, inspector-equipped product someone could otherwise stitch together themselves with these tools — for orders of magnitude more effort and without the inspector or operations-based extraction.

The pitch: "You could build this in 6 months with LlamaIndex + Pinecone + a custom inspector UI; or you could pay us $35/mo."

Ring 5 — AI-platform native KBs (existential watch)

| Competitor | Position | Threat level |
| --- | --- | --- |
| Claude Projects (Anthropic) | KB inside claude.ai | High: same vendor, same model, free with Claude Pro |
| ChatGPT Custom GPTs / Files | KB inside ChatGPT | High: massive distribution |
| Cursor knowledge bases | KB inside the IDE | Medium: different audience but converging |
| Gemini-in-Workspace KBs | KB inside Google Drive/Docs | High for Google-native customers |

This is the scariest ring because the AI platforms can absorb our wedge if they decide to. Mitigation: position Autri as the MCP server they consume, not the alternative they replace. If Claude Projects deepens, Autri is what plugs into it for users with corpora that exceed Claude Project's source limits or who need the inspector for trust.

Autri's differentiators (ranked by defensibility)

  1. Visual extraction inspector — unique. No competitor lets you see how each chunk maps back to the source page. This requires per-page-as-image source modeling end-to-end, which is hard to retrofit. The inspector is the wedge from § Overview > What Is This?, and competitive analysis confirms it's defensible.

  2. Color-coded retrieval traces — near-unique. Showing which index returned which chunk (lookup_section / FTS / vector) is rare. NotebookLM hides it. Glean shows source docs but not retrieval method. This makes Autri's outputs legibly trustworthy (§ Hybrid Retrieval).

  3. Operations-based extraction — technically hard to copy. D11 invariant: LLM emits operations against fragment IDs; code does mechanical work. Verbatim text and exact bboxes guaranteed by construction. Most competitors do token-window chunking with hallucination risk on metadata. Replicating this requires re-architecting their pipeline, not adding a feature.

  4. Hosted MCP with KB-scope — strategic + economic. Most KB tools are web-only. None of NotebookLM, Notion AI, Sudowrite have MCP. Per § Y1 Launch Plan > The MCP economic shift, MCP is both the highest-value workflow and the cheapest to serve (10–50× less inference cost than web router). MCP-as-paid-tier (D19) aligns user value with marginal cost — a structural advantage no competitor has framed yet.

  5. Coherent self-serve → enterprise ladder — positioning. Most competitors pick one segment. NotebookLM = consumer. Glean = enterprise. Sudowrite = author. Autri has Free → Author → Pro → Enterprise on the same backend, with the inspector and MCP being the through-line value at every tier.

Vulnerabilities (and mitigations)

  1. NotebookLM is free + Google-distributed. Hard to compete on price.

    • Mitigation: MCP, vet-extraction, and enterprise tier — features they structurally won't add.
  2. Claude Projects could absorb the inspector wedge. Anthropic could ship "show how each chunk was extracted" as a Claude Projects feature.

    • Mitigation: be the MCP server they call, not the alternative they replace. Autri is stronger as platform-agnostic infra than as Claude-platform-locked competitor.
  3. No mobile/iOS app. Authors use iPads; researchers travel.

    • Watch: this becomes a real complaint by month 6 of self-serve. Plan a thin mobile read-only app for v1.5.
  4. No collaboration features yet. Sudowrite, Notion, World Anvil all do multi-user.

    • Mitigation: planned Studio tier is the answer — ship when collab features are real, not before.
  5. Single LLM provider lock-in (Anthropic via Bedrock). Some users will want OpenAI / Gemini / local.

    • Mitigation: D12/D16 abstraction layer keeps this swappable. Not a pre-launch blocker.
  6. Cold-start brand discoverability. "Autri" is unknown vs. SEO incumbents.

    • Mitigation: the inspector demo is highly visual and shareable. Lean into "show, don't tell" content marketing — STEM Racing pilot, mom's author-KB testimonial, Brehob enterprise case study all generate evocative content.
  7. Solo founder vs. funded teams. Sudowrite has 20+; Glean has hundreds.

    • Mitigation: depth not breadth. Don't try to outship them on features — outship them on the one thing each segment cares about (inspection for skeptical pros; MCP for power users; per-KB ACLs for enterprise).

Strategic positioning

One-line pitch:

Autri is the inspectable knowledge base — see how every chunk was extracted, see which index retrieved it, and plug your knowledge into Claude / Cursor / your editor via MCP.

Three-segment marketing (same SKU, segment-specific copy):

| Segment | Headline | Hero feature |
| --- | --- | --- |
| Prosumer / authors | "Your library, your reference. Per-book KBs. Plugs into your editor." | MCP + inspector |
| Research / regulatory | "Trust your retrieval. Inspect every chunk. See what the agent saw." | Inspector + retrieval traces |
| Business / enterprise | "Self-host or managed. Per-KB ACLs. Bedrock-backed. SOC2 path." | Multi-tenancy + Bedrock + RLS |

The thing nobody else can credibly say: "You can see exactly what we extracted, and exactly how the agent retrieved it." Inspection is the moat; MCP is the distribution + economic shift; the Free→Enterprise ladder is the GTM coverage.

Author market: starting segment, not the wedge

The author market (mom's profile and similar) is one validated starting segment among several, not the primary GTM thesis. The reason to start there is opportunistic — direct customer access via personal network — not strategic primacy. The broader self-serve market includes researchers, regulatory writers, technical documentation owners, and consultants; all share the same need (trustworthy KB over a personal corpus) and the same MCP-shaped workflow.

Why this distinction matters for product decisions:

  • Don't over-tune the inspector or extraction pipeline to fiction-shaped sources. The corpus generality (PDFs, regulations, structured records) is the architectural bet (§ Overview > Who Is It For?).
  • Don't over-invest in author-specific features (Continuity Check templates, series mode) before validating they're the highest-leverage feature work across the segment portfolio.
  • Use mom-as-design-partner for early UX feedback on the generic product, not as a request queue for author-specific features.

Build decision rationale

This section answers the gut-check question: given the competitive landscape, is Autri worth building?

Yes, with caveats. The reasoning:

Reasons to build:

  1. The inspector is a genuinely differentiated philosophy of trust. Not a feature — a different stance on the human/AI relationship for knowledge work. That's defensible because competitors would have to re-architect, not add a checkbox.

  2. The MCP economic insight is rare and right. Pushing power users to MCP is both cheaper to serve and more valuable to them. We'd be early to that pattern, and the architecture (D5/D14/D19) is already aligned with it.

  3. Customers exist before the product is finished. STEM Racing (pilot), mom (author segment validation), Brehob (enterprise prospect via QuoteAI). That's not always true at this stage.

  4. Marginal cost to ship is low. Foundation extracted from QuoteAI, not greenfield. Per D2, architectural decisions converged across both products — work on Autri strengthens QuoteAI and vice versa.

  5. The decision is reversible. Build MVP → validate → if interest is weak, pivot or pause. The cost of not shipping is missing the window.

Caveats / honest risks:

  1. Author market is a starting point, not the destination. Don't bet the company on it alone. Multi-segment from day one.

  2. NotebookLM is the existential threat. Watch their roadmap. If they add MCP or expose extraction inspection, reassess immediately.

  3. Vision-extraction cost validation is the most important pre-launch check (§ Y1 Launch Plan > Cost-shape risks). If real costs are 10× our assumptions, the unit economics break and the whole pricing model needs rethinking.

  4. Don't ship until the moat is real. Inspection working perfectly + MCP integration that genuinely provides value. A weak v1 of either kills the differentiation story.

  5. The product needs to be demoable to convince anyone. Lean hard into video/screenshots showing the inspector. The story doesn't tell itself in text.

Rejected positioning alternatives

  • "Autri as Sudowrite competitor." Rejected. Sudowrite is generation-first; Autri is reference-first. Pivoting to writing-UI would abandon the inspector wedge and the corpus-generality bet, and put us in direct feature competition with a funded incumbent.

  • "Autri as Glean competitor." Rejected. Enterprise search has long sales cycles, requires SSO/SOC2 from day one, and demands a sales motion solo founders can't run. Enterprise tier is a future path enabled by self-serve traction, not the launch motion.

  • "Autri as ChatPDF commodity." Rejected. Race-to-the-bottom pricing on per-doc chat. The inspector is wasted in this positioning because users uploading one PDF for one chat session don't need to vet extraction quality.

  • "Autri as developer infra (LlamaIndex competitor)." Rejected. Different audience (developers vs. end users); different revenue shape (open source + cloud vs. subscription); different moats. The hosted opinionated product is the right shape; framework competition is not.


Cross-Cutting Concerns

| Concern | Summary | Affected | Dedicated Doc? |
| --- | --- | --- | --- |
| Versioning | extractor_version column on pages and chunks is <prompt-version>/<model> (e.g., page-extract-v3/claude-haiku-4-5-20251001); lets us A/B prompts and models | extractor | N/A (in-code) |
| Idempotency | Render and load are idempotent on file content / source_hash; extract is destructive but reset preserves doc + pages rows so iteration doesn't break URLs | ingestion | N/A |
| Cache lifecycle | ingestion/cache/ is gitignored and fully reproducible from source PDFs (small, committed). Render + parse cache outputs can be regenerated freely | ingestion | N/A |
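
Because extractor_version packs two values into one string, A/B comparisons need to split it back apart. A minimal sketch, assuming the prompt version never contains a slash; the ExtractorVersion type and parseExtractorVersion name are hypothetical:

```typescript
interface ExtractorVersion {
  promptVersion: string;
  model: string;
}

// Splits "<prompt-version>/<model>" on the first slash.
function parseExtractorVersion(value: string): ExtractorVersion {
  const slash = value.indexOf("/");
  if (slash <= 0 || slash === value.length - 1) {
    throw new Error(`malformed extractor_version: ${value}`);
  }
  return {
    promptVersion: value.slice(0, slash),
    model: value.slice(slash + 1),
  };
}

const parsedVersion = parseExtractorVersion("page-extract-v3/claude-haiku-4-5-20251001");
// parsedVersion.promptVersion is "page-extract-v3"
// parsedVersion.model is "claude-haiku-4-5-20251001"
```

Grouping pages by promptVersion while holding model fixed (or the reverse) is then a plain filter over the parsed values.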

Risks & Constraints

Tech Debt

  • documents.status and confidence_tier not auto-populated after extraction — one-line fix pending in writeExtraction
  • sections.parent_section_id is NULL for all 446 rows — derivable deterministically from section_id strings, not yet computed
  • chunks.embedding is NULL for all 1,272 rows — embedding pass not done
  • chunks.confidence and chunks.flagged unset — only per-page confidence today
  • Section title artifacts preserved verbatim ("C 7.6 Pit D isplay setup and parameters" — PDF kerning quirks). This IS the desired behavior; if we ever want canonical titles, do it as a deterministic post-process
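
The parent_section_id backfill mentioned in the list above could be as small as the sketch below. It assumes section IDs are dot-separated (C7.6.1 under C7.6 under C7) and that a missing intermediate level should link to the nearest existing ancestor; both are assumptions about the corpus, and deriveParentId is a hypothetical name:

```typescript
// Walks up the dotted ID until it finds a section that actually exists.
// Returns null for top-level sections or when no ancestor is present.
function deriveParentId(sectionId: string, knownIds: Set<string>): string | null {
  let candidate = sectionId;
  let cut = candidate.lastIndexOf(".");
  while (cut > 0) {
    candidate = candidate.slice(0, cut);
    if (knownIds.has(candidate)) {
      return candidate;
    }
    cut = candidate.lastIndexOf(".");
  }
  return null;
}
```

Because the function only reads section_id strings, it can run as a one-off SQL-plus-script backfill without touching the extractor.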

Known Limitations

  • Image-only PDFs (scanned legacy with no text layer) currently extract poorly. Need a vision-only branch in the classifier. None in current corpus.
  • Agentic router for retrieval doesn't exist yet; the corpus is queryable today only via direct SQL.
  • No formal test suite — extractor self-validates via Zod + the operations validator (every fragment_id must resolve), but there's no end-to-end test.
  • pgvector ivfflat lists=20 is tuned for small datasets (~thousands of rows). At scale we'll need to revisit; HNSW is the likely upgrade.
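
The validator rule from the list above (every fragment_id must resolve) and the verbatim guarantee can be illustrated together. This is a sketch, not the pipeline's code: the TextFragment and RegisterChunkOp shapes, the field names, and joining fragments with a newline are all assumptions:

```typescript
interface TextFragment {
  id: string;
  text: string;
}

interface RegisterChunkOp {
  op: "register_chunk";
  fragmentIds: string[];
}

// Composes chunk content mechanically from the referenced fragments.
// The LLM supplies only IDs; any unknown ID fails the whole operation.
function commitChunk(operation: RegisterChunkOp, fragments: Map<string, TextFragment>): string {
  const parts: string[] = [];
  for (const id of operation.fragmentIds) {
    const fragment = fragments.get(id);
    if (!fragment) {
      throw new Error(`unresolved fragment_id: ${id}`);
    }
    parts.push(fragment.text); // copied as-is, typesetting artifacts included
  }
  return parts.join("\n");
}
```

An end-to-end test could start from exactly this seam: feed a fixed fragment set and a recorded operation list, then assert on the committed text.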

Technical Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Claude CLI subprocess hits Max-plan rate limits at scale | Low at current usage | Stops ingestion mid-run | Add retry/backoff in cli-client.ts; fall back to API key for prod |
| Anthropic Max-plan policy changes (e.g., disallows subprocess use) | Low | Forces SDK + API path | LLM calls already isolated behind cli-client.ts; swap is contained |
| pgvector ivfflat performance on large corpora (>100k chunks) | Medium | Slow vector search | Switch to HNSW or tune lists parameter; defer until measured |
| PDF text layer absent on a real corpus PDF | Medium (legacy scans) | Falls back to vision (more expensive, less accurate) | Per-page classifier picks routing; vision path is already proven |
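
The retry/backoff mitigation for the first risk might look like the following generic wrapper. It is a sketch with hypothetical names (withRetry, RetryOptions); it retries on any error, whereas a real cli-client.ts would likely retry only on rate-limit failures:

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first
  baseDelayMs: number; // delay before the second try; doubles each time
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

async function withRetry<T>(call: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let lastError: unknown;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // Exponential backoff: base, 2x base, 4x base, ... No wait after the final try.
      if (attempt < options.attempts - 1) {
        await sleep(options.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

Wrapping the subprocess call at the cli-client.ts boundary keeps the backoff policy in one place, so the later swap to the SDK path inherits it unchanged.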

Epic Index

Pre-S0 spike phase; epics will be defined when scope solidifies past prototype.

Likely first epics (when defined):

  • E1 — Retrieval layer: lookup_section / fts_search / vector_search + agentic router + query playground UI
  • E2 — Status + tier UI fixes (the small one-line follow-ups + classifier wiring)
  • E3 — Section parent linking + recursive queries
  • E4 — Embedding pass + content hash management

Decisions Log

The active decisions list lives in ~/Documents/Code/autri/decisions.md (in the project repo, per workspace convention). Highlights graduate here when the rationale is settled-but-non-obvious.

| Date | Decision | Rationale | Alternatives Considered |
| --- | --- | --- | --- |
| 2026-05-06 | D10: PDF text-layer first; vision fallback | Verbatim accuracy + exact bboxes guaranteed by construction. Cost down 5x at higher quality on the same corpus. | Pure vision (rejected: cost, drift, OCR errors). Hybrid heuristic (deferred: agentic classifier reads density score). |
| 2026-05-06 | D11: Operations-based extractor (LLM does semantics, code does mechanics) | LLM jobs scoped to irreducibly-semantic tasks; mechanical work is deterministic. Same Haiku now matches Sonnet quality. | Vision-only with full-content output (rejected: proven worse on every axis). Pure heuristic chunker (rejected: misses semantic boundaries). |
| 2026-05-06 | D12: claude CLI subprocess (not Anthropic SDK) | Bills against user's Max plan; the SDK doesn't support Max billing per Anthropic policy. | Anthropic SDK with API key (deferred to production deploy). Files API workaround (rejected: still requires API key). |
| 2026-05-06 | D5: Agentic router via MCP tools (for retrieval) | Same retrieval functions exposed via MCP for dev + native tool-use for prod. Agentic routing handles edge cases the heuristic can't, and the trace is itself a UX feature for the inspector. | Hardcoded heuristic router (deferred: agentic-first while iterating). |
| 2026-05-06 | D7: Vetting > tuning as primary user job | Trust is the wedge for both the STEM team and the enterprise pitch. Tuning is power-user territory, v0.3+. | Combined "everything UI" (rejected: overwhelming). Tuning-first (rejected: no users without trust). |
| 2026-05-06 | D2: Foundation extracted from QuoteAI, not greenfield | QuoteAI's KB layer is what Autri IS. Mirror QuoteAI's stack so the eventual rebase is friction-free. | Greenfield with different stack (rejected: duplicates work, blocks rebase). |

Glossary

| Term | Definition |
| --- | --- |
| Fragment | A line-level text unit from the PDF text layer, with id, text, and normalized bbox. The atom the agent groups into chunks. Produced by the parse stage. |
| Operation | A register_section / register_chunk / register_figure directive emitted by the extractor agent. References fragment IDs; never regenerates content. |
| Chunk | A semantic content unit (text paragraph, table, figure, heading) stored in chunks. Content is deterministically composed from fragments by the commit step. |
| Section | A formal numbered ID heading (e.g., C7.6.1). Stored in sections with hierarchy via parent_section_id. |
| Inspector | The /docs/[id] UI that overlays chunk bboxes on rendered pages with bidirectional hover. The product wedge. |
| Tier | Per-doc confidence classification: green (≥0.85) / yellow (≥0.70) / red. Computed from avg page extraction confidence. |
| Verbatim | Text copied character-for-character from the PDF text layer, including any typesetting artifacts (kerning gaps like "C 7.6"). Guaranteed by the operations architecture: the LLM never types content. |
| Operations contract | The Zod-validated set of register_* directives the extractor agent emits. Stable across model swaps. |
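
The Tier definition above is simple enough to state as code. A minimal sketch using the glossary's thresholds; the function name is hypothetical, and treating a document with no pages as red is an assumption:

```typescript
type ConfidenceTier = "green" | "yellow" | "red";

// Classifies a document from its per-page extraction confidences (0 to 1).
function classifyTier(pageConfidences: number[]): ConfidenceTier {
  if (pageConfidences.length === 0) {
    return "red"; // assumption: nothing extracted means nothing to trust
  }
  const average =
    pageConfidences.reduce((sum, confidence) => sum + confidence, 0) / pageConfidences.length;
  if (average >= 0.85) return "green";
  if (average >= 0.7) return "yellow";
  return "red";
}
```

This is the value the documents.confidence_tier column would hold once the pending writeExtraction fix populates it.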

This design doc is the source of truth for project-level architecture. Active decisions live in the project repo's decisions.md. Session handoffs live in the project repo's next.md. Code is the source of truth for implementation; this doc captures the WHY.
