Autri — North Star

Gold source of truth. This is the doc everything else hangs off. Every epic, feature, story, experiment, and "should we build X?" gets checked against the objectives here. The mission and business objectives are stable; technical objectives and the dependency map evolve as we learn. Referenced as the north star from next.md and decisions.md.

Status: first draft (2026-06-06). Brainstormed in session; to be refined. Items marked [proposed] or [open] are not yet locked.


Mission

The world's best bring-your-own-data knowledge base — retrieval you can trust, from one person's bookshelf to a Fortune 500's document vault.

Trust is the angle. Most retrieval tools are black boxes: when an answer looks wrong you can't tell why, and you can't tell when it's right either. Autri's bet is legibility — you can see exactly how your documents were parsed, chunked, embedded, and retrieved, and trace any answer back to the exact spot in the source it came from. That's the thing the well-funded incumbents structurally can't copy without rebuilding their whole product.


North-Star Metric

Maximize retrieval quality per dollar the customer spends, while we stay margin-positive.

Plainly: how many docs can a customer upload while retrieval stays good and we still make money? This is the ratio every other objective serves. Quality (the numerator) and cost-to-serve (the denominator) are optimized together; neither is improved by blowing up the other.

What "margin-positive" means (gross margin = revenue minus cost-to-serve: inference, embeddings, hosting, storage):

  • Hard floor (binding constraint): positive gross margin on every individual customer. No one ever costs more than they pay. The risk is the tail, not the average (e.g. a PDF-heavy power user racking up ingest cost on a low tier), so the floor is enforced via tier caps + fair-use.
  • Target: 70% blended gross margin at scale — credible and achievable for an AI-native product today (LLM inference is real cost-of-goods, unlike pure SaaS at ~80%).
  • Stretch: 80% as models get cheaper and routing tightens — SaaS-grade.

These become real once cost-observability gives beta numbers; ratchet the target up as the data lands.
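
A minimal sketch of why the floor is checked per customer and not on the blend. The 70% target, 80% stretch, and never-negative floor come from the bullets above; the customer names, revenue, and cost figures are invented for illustration, and the type and function names are hypothetical.

```python
from dataclasses import dataclass

TARGET_MARGIN = 0.70   # blended target stated above
STRETCH_MARGIN = 0.80  # stretch goal stated above


@dataclass(frozen=True)
class CustomerMonth:
    """One customer's month (all figures below are made up)."""
    name: str
    revenue: float        # what the customer paid
    cost_to_serve: float  # inference + embeddings + hosting + storage


def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - cost) / revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive to compute a margin")
    return (revenue - cost) / revenue


def blended_margin(customers: list[CustomerMonth]) -> float:
    """Margin over the whole book: total revenue against total cost."""
    total_revenue = sum(c.revenue for c in customers)
    total_cost = sum(c.cost_to_serve for c in customers)
    return gross_margin(total_revenue, total_cost)


def floor_violations(customers: list[CustomerMonth]) -> list[str]:
    """Names of customers whose margin is not strictly positive."""
    return [c.name for c in customers if c.cost_to_serve >= c.revenue]


# Hypothetical month: the blend looks healthy while one low-tier account loses money.
EXAMPLE = [
    CustomerMonth("light", revenue=100.0, cost_to_serve=15.0),
    CustomerMonth("typical", revenue=100.0, cost_to_serve=25.0),
    CustomerMonth("pdf_heavy", revenue=20.0, cost_to_serve=26.0),
]
```

With these made-up numbers the blended margin reaches the 70% target while `floor_violations(EXAMPLE)` still names `pdf_heavy`. That is the tail risk the hard floor exists to catch.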

How we'll know: cost observability (per-doc + per-query accounting, shipped) measures the denominator; the eval harness (per-index + hybrid recall) measures the numerator.


Business Objectives

Format: each objective states the customer outcome and why in the headline; mechanisms (how we deliver it) live in the description or drop down to Technical Objectives. Rule: a value the customer gets is a business objective; a thing we build is a technical objective.

B1 — Trustable retrieval: verify quality before you trust an answer

Customers can see and verify how their documents were understood, and trace every answer back to the exact source it came from — rich source views, direct links back to the original document, and pixel-precise highlighting of where a quote lives. Quality alone is table-stakes and a losing arms race against well-funded RAG vendors; verifiable quality is our moat. The same legibility that builds end-user trust is also our debugging surface and our sales demo.

  • Where we draw the answer-correctness line. We make every answer verifiable (click a citation and the source document opens to the exact highlighted spot), and the user judges whether it's right; we don't auto-grade the answer's correctness. What we do own: that the citation → source → bbox loop is trustworthy, and an in-app corrective-action path for when an answer or citation is clearly wrong (flag → re-retrieve / re-answer, surface a mismatch) so the user isn't left stuck.
  • Dimension — speed-to-trust. How fast a new user gets from "I uploaded my documents" to "I believe the answers this thing gives me." Two halves: how fast (ingest latency — batch ingestion's 2× speedup serves this) and how legible (the inspector convincing them on the very first doc). For self-serve this is the conversion lever — trust is the whole pitch, and a slow or confusing first run kills it before trust ever forms.
  • How we'll know: retrieval recall/quality from the eval harness; time-to-first-trusted-answer for the speed-to-trust dimension.
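
One way to picture the citation → source → bbox loop is as a record that must resolve against the stored page before it is shown. This is a sketch under assumptions, not Autri's schema: every field name, the character-offset approach, and the mismatch messages are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    """Rectangle in page coordinates (hypothetical layout: origin at top-left)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def within(self, width: float, height: float) -> bool:
        return 0 <= self.x0 < self.x1 <= width and 0 <= self.y0 < self.y1 <= height


@dataclass(frozen=True)
class Page:
    text: str
    width: float
    height: float


@dataclass(frozen=True)
class Citation:
    doc_id: str
    page: int
    char_start: int
    char_end: int
    quote: str
    bbox: BBox


def citation_mismatch(citation: Citation, pages: dict[int, Page]) -> Optional[str]:
    """Return a reason if the citation does not resolve to its source, else None."""
    page = pages.get(citation.page)
    if page is None:
        return "page not found"
    if page.text[citation.char_start:citation.char_end] != citation.quote:
        return "quote does not match source text"
    if not citation.bbox.within(page.width, page.height):
        return "bbox outside page"
    return None
```

A non-`None` result is the kind of signal that could drive the "surface a mismatch" step; whether the answer itself is correct stays with the user.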

B2 — Know cost cold, so price always tracks cost

We operate with full visibility into what every document and every query costs us, so we can stay margin-positive at any usage pattern and price correctly instead of guessing. Minimizing cost (batch ingestion, cheaper models, code-does-mechanics routing) is a tactic underneath this; the objective is knowing — you can't hit a margin target you can't measure, and you can't price a product whose cost you don't understand.

  • How we'll know: real per-doc and per-query cost from beta usage, tracked against the margin floor and target above.

B3 — Be the data layer for the AI tools people already use

Autri isn't another chatbot to adopt — it's the knowledge layer that plugs into the assistants people already work in (Claude, Copilot, Cursor, ChatGPT) via MCP and APIs. The strategic payoff: enterprise procurement friction collapses, because the customer doesn't have to approve a new AI tool, just a new data source for one they've already approved. "MCP as infrastructure" — the web app is the demo and the inspector; the integration is where the product gets used day-to-day. (Distinct from B6: B3 is plugging into assistants that already exist; B6 is being the foundation for products that don't exist yet.)

B4 — One substrate, individual → Fortune 500

The same product self-serves an individual novelist and hosts a Fortune 500 — without forking the codebase. Shared-database, row-level isolation for self-serve; a dedicated, isolated deployment (own VPC / database / storage) for enterprise, where a silo is a config-and-deploy, not a fork. Enterprise-grade data governance (tenant isolation, data residency, compliance, IP allowlisting) lives here — institutional trust ("your data is safe and isolated") is the org-level complement to B1's answer-level trust, and it's what makes "scales to Fortune 500" real rather than a slogan.

B5 — Monetization that expands with the customer

A revenue model that grows as the customer gets more value: free inspector → paid tier → integration unlock → enterprise. Price tracks both value delivered and cost-to-serve (B2 feeds this). This objective gates on B1 + B2 being known first — we can't price tiers correctly until we know what quality we deliver and what it costs us.

B6 — Be the substrate others build on (and we expand on)

Autri is an inspectable knowledge + retrieval substrate, exposed via API/MCP, so (a) third parties can build their own apps and agents on top of it (Autri is the infrastructure; they own the vertical experience), and (b) we can grow up the stack ourselves by adding agent/vertical layers on our own substrate — QuoteAI is the first (a quoting agent on top of the knowledge base). The substrate stays horizontal and trustable; agents/verticals capture specific vertical value. "Make any AI better without retraining." Furthest-out objective — laid out now so we build toward it, not because it's near-term.

The B3/B6 line: B3 = plug into assistants that already exist; B6 = be the foundation for products that don't exist yet.


B7 — Dependable: it's up, and your data and answers are always there

The product is reliable for everyone, not just enterprise: it stays up, ingestion doesn't silently fail, and your data and answers are available when you need them. "World's best" implies dependability as a baseline, not a premium add-on; enterprise SLAs (already in the pricing) are the formal, contractual version of the same promise. Reliability is the operational complement to B1's trust — an answer you can't get to, or an ingest that quietly dropped half your docs, breaks trust as surely as a wrong citation.


Technical Objectives

Each technical objective serves one or more business objectives (tagged →), with a coverage status: [have] built/validated · [partial] exists but incomplete · [missing] not started · [deferred] intentionally later. These break down into epics/features/stories. High-level for now; sharpened by the 2026-06-06 coverage red-team. Deploy/auth statuses corrected 2026-06-06 after verifying true prod state.

Serving B1 (trustable retrieval + verifiable answers):

  • Inspectable ingestion pipeline → B1. [have] Visual extraction inspector: source view, bbox overlays, chunk-boundary highlights, grouping rationale, color-coded retrieval traces. Hole: source pane is dead for non-PDF docs — see next.
  • Source view for all doc types → B1. [partial] Today only PDFs render a source pane; prose/docx/text show nothing. Render derived text with chunk-boundary highlights so "see where in the source" works for every doc type.
  • Measurable retrieval quality → B1, B2. [have] Eval harness: per-index + hybrid recall scorecard, golden query corpus, regression gates.
  • High-recall hybrid retrieval across doc types → B1. [partial] Vector + full-text + section/ID lookup via an agentic router. Validated on prose/verse/structured; unproven on tables, figures, spreadsheets, scanned/OCR PDFs.
  • Verifiable answers + corrective actions → B1. [partial] The trust line: every answer carries citations resolving to the exact source + bbox so the user judges correctness (we make it verifiable; we don't auto-grade). Have: citations → source. Missing: (a) the click-citation → source+bbox verify loop wired end-to-end on the answer path; (b) a corrective-action path when an answer/citation is clearly wrong (flag → re-retrieve / re-answer, surface a mismatch). Lighter: signal low-confidence / abstain when retrieval is weak.
  • Confidence-tiered review → B1. [have] Docs land green/yellow/red; surface what to verify, bulk-approve the rest.
  • Broad source-format coverage → B1. [partial → roadmap] PDF + docx + markdown today. A prioritized roadmap, not an "any format" promise: spreadsheets, slides, HTML, images/OCR, transcripts.
  • Speed-to-trust instrumentation → B1. [missing] Own onboarding latency + first-run legibility; measure time-to-first-trusted-answer. (Ingest-latency lever = batch, under B2.)
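
For the eval-harness item above, a recall scorecard with a regression gate can be sketched as follows. The recall@k definition is the standard one; the golden-corpus shape, the function names, and the 0.01 tolerance are assumptions for illustration, not the harness's actual interface.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        raise ValueError("a golden query needs at least one relevant chunk")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(golden: dict[str, set[str]], results: dict[str, list[str]], k: int) -> float:
    """Average recall@k over the golden query corpus."""
    scores = [recall_at_k(results.get(query, []), relevant, k)
              for query, relevant in golden.items()]
    return sum(scores) / len(scores)


def passes_gate(current: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Regression gate: fail when recall drops more than `tolerance` below baseline."""
    return current >= baseline - tolerance


# Hypothetical golden corpus: query -> chunk IDs a correct retrieval must return.
GOLDEN = {"q1": {"c1", "c2"}, "q2": {"c7"}}
# Hypothetical ranked results from one index.
RESULTS = {"q1": ["c1", "c9", "c2"], "q2": ["c3", "c4", "c7"]}
```

Running the same scorecard once per index and once for the hybrid path would give the per-index + hybrid comparison the harness is described as producing.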

Serving B2 (cost / margin):

  • Cost observability → B2. [have] Per-doc + per-query cost accounting with stage breakdown. Owed: real worker-write verification (deploy-gated).
  • Full COGS accounting → B2. [missing] Extend the cost model beyond LLM + embeddings to storage (page PNGs, DB/pgvector) and worker compute — a true margin number needs them.
  • Margin monitoring + floor enforcement → B2. [missing] Aggregate cost into margin-per-customer/tier; watch and alert on the never-negative-per-customer floor (the binding constraint), enforced via caps/fair-use.
  • Cost-efficient ingestion → B2. [partial] Batch (~50% cheaper, validated, build pending), cheaper models where quality holds (Bedrock Nova), code-does-mechanics routing, image-gating.
  • Cost-efficient retrieval → B2. [partial] Keep the query path cheap; integration path is cheapest-to-serve. Add: caching/dedup for repeated queries at scale.
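
Per-doc and per-query accounting with a stage breakdown reduces to one event shape rolled up along different keys. The sketch below assumes a hypothetical `CostEvent` record and invented dollar amounts; adding storage and worker-compute stages is how the full-COGS item would slot into the same rollup.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass(frozen=True)
class CostEvent:
    """One metered cost line (hypothetical shape, invented amounts)."""
    customer_id: str
    unit: str      # "doc" or "query"
    unit_id: str
    stage: str     # e.g. "parse", "embed", "llm", "storage"
    usd: float


def rollup(events: list[CostEvent], key: Callable[[CostEvent], Hashable]) -> dict:
    """Sum cost over whatever grouping `key` extracts from each event."""
    totals: dict = defaultdict(float)
    for event in events:
        totals[key(event)] += event.usd
    return dict(totals)


EVENTS = [
    CostEvent("acme", "doc", "d1", "parse", 0.020),
    CostEvent("acme", "doc", "d1", "embed", 0.010),
    CostEvent("acme", "doc", "d1", "storage", 0.005),
    CostEvent("acme", "query", "q1", "llm", 0.003),
    CostEvent("solo", "doc", "d2", "parse", 0.040),
]

per_unit = rollup(EVENTS, key=lambda e: e.unit_id)                  # per-doc / per-query
per_stage = rollup(EVENTS, key=lambda e: (e.customer_id, e.stage))  # stage breakdown
per_customer = rollup(EVENTS, key=lambda e: e.customer_id)          # feeds margin monitoring
```

Keeping one event table and deriving every view from it means the margin-per-customer number and the per-doc number cannot drift apart.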

Serving B3 (be the data layer):

  • Authenticated REST API + API-key auth → B3, B6. [missing — decided] D56 amendment: clean API + static API-key auth, sidestepping the OAuth/DCR/resource-indicator pain. The MCP retrieval server already works; only the OAuth flow burned us.
  • Provider-agnostic retrieval interface → B3. [partial] Same retrieval functions as MCP tools + native tool-use. Reality check: "across providers" may be three integration models — MCP (Claude/Cursor), Copilot extensions, GPT actions — not one.
  • Live data connectors + freshness → B3, B1. [missing — epic-cluster] Connect to where data lives (SharePoint, OneDrive, Google Drive, Foundry for dogfood); change detection (webhook vs poll); auto re-ingest; deletion propagation. Stale data erodes trust.
  • Hosted MCP OAuth (WorkOS Connect) → B3. [deferred → enterprise-SSO demand] The full OAuth path; superseded for beta+1 by API-key auth.
  • API rate limiting / quotas / abuse protection → B3, B6. [missing] For public-facing API.
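
Static API-key auth, as decided in the D56 amendment, can be sketched with the standard library alone. The "ak_" prefix, the digest-to-org mapping, and the function names are assumptions; the source only states that auth is a static API key.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def new_api_key() -> str:
    """Generate a random key; the "ak_" prefix is an illustrative convention."""
    return "ak_" + secrets.token_urlsafe(32)


def hash_key(api_key: str) -> str:
    """Store only a SHA-256 digest so a leaked table does not expose usable keys."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def authenticate(presented_key: str, stored: dict[str, str]) -> Optional[str]:
    """Return the org ID that owns the key, or None.

    `stored` maps key digest -> org ID (hypothetical storage shape).
    """
    presented = hash_key(presented_key)
    for digest, org_id in stored.items():
        # Constant-time comparison avoids leaking how many characters matched.
        if hmac.compare_digest(presented, digest):
            return org_id
    return None
```

The linear scan keeps the sketch short; a real store would look the digest up by index. The returned org ID is what every downstream query would be scoped by.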

Serving B4 (one substrate, individual → F500):

  • Production deploy + auth/identity → B4. [have] Deployed at app.autri.ai; real Cognito + Google auth with an email allowlist (curated beta gating); post-confirm Lambda provisions the user/org row. Remaining: SSO/SAML for enterprise (later). (Corrects the 2026-06-06 red-team "nothing deployed" finding — that was based on a stale CLAUDE.md.)
  • Multi-tenancy: pool + silo → B4. [have-in-code / partial] Row-level isolation enforced in app code across read/mutation/query surfaces — but never exercised with real multiple users in prod (verify before the beta cohort). Parameterized dedicated-deploy silo not yet built (pattern decided).
  • Baseline security (all tiers) → B4, B1. [partial] Encryption likely via AWS/RDS + TLS defaults (verify); tenant isolation in code. Missing: data export / portability / deletion (BYO-data implies take-it-back; GDPR right-to-be-forgotten).
  • Enterprise compliance (F500) → B4. [missing] SOC2 posture, audit logs, data-residency enforcement, IP allowlisting, pen-testing / vuln management. The escalation tier within B4.
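
Row-level isolation enforced in app code usually means no query runs without a tenant filter. A minimal sketch, using in-memory SQLite as a stand-in for the production database; the table, column, and class names are hypothetical.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """In-memory stand-in for the shared (pool) database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, org_id TEXT NOT NULL, title TEXT NOT NULL)"
    )
    return conn


class TenantScope:
    """Every read and mutation goes through here, so org_id is always bound."""

    def __init__(self, conn: sqlite3.Connection, org_id: str) -> None:
        if not org_id:
            raise ValueError("org_id is required")
        self._conn = conn
        self._org_id = org_id

    def add_document(self, title: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO documents (org_id, title) VALUES (?, ?)",
            (self._org_id, title),
        )
        return cur.lastrowid

    def list_titles(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT title FROM documents WHERE org_id = ? ORDER BY id",
            (self._org_id,),
        ).fetchall()
        return [row[0] for row in rows]

    def delete_document(self, doc_id: int) -> bool:
        # The org_id predicate makes another tenant's row a no-op, not a leak.
        cur = self._conn.execute(
            "DELETE FROM documents WHERE id = ? AND org_id = ?",
            (doc_id, self._org_id),
        )
        return cur.rowcount == 1
```

A two-tenant check of this shape (tenant B lists nothing and cannot delete tenant A's row) is the kind of test worth running against prod before the beta cohort.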

Serving B5 (monetization):

  • Metering, tier enforcement, billing → B5. [missing — Commerce MVP] Usage metering (chunk caps, query limits, fair-use), tier enforcement, payment (Stripe), overage/dunning.
  • Customer-facing usage visibility → B5, B1. [missing] Show customers their consumption vs caps — reduces bill surprise, reinforces trust.

Serving B6 (substrate others build on): [furthest-out; named so the gap is visible]

  • Public, versioned API + SDK + developer keys → B6. [missing] (Builds on the B3 API + key-auth above.)
  • Agent-layer framework → B6. [missing] How QuoteAI-style agents sit on top of the substrate.
  • API usage metering for third-party consumers → B6. [missing]

Serving B7 (dependable):

  • Observability + monitoring + alerting → B7. [partial/missing] Including surfacing ingestion failures (continue-on-error is the convention, but failures must be visible, not silent). Verify what exists in the deployed app.
  • Autoscaling + scalable ingestion workers → B7, B4. [partial] Lambda-based; validate under real load.
  • Backup / disaster recovery → B7. [partial/missing] Verify RDS backups/snapshots exist for real user data.
  • Incident response + SLA tracking → B7. [missing] Formalizes the enterprise SLA the pricing already promises.
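
"Continue-on-error, but never silent" can be sketched as a batch loop that records every failure instead of swallowing it. The report shape and the `ingest_one` callback are hypothetical; only the continue-on-error convention comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class IngestReport:
    succeeded: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # doc ID -> error message

    @property
    def ok(self) -> bool:
        return not self.failed


def ingest_all(doc_ids: Iterable[str], ingest_one: Callable[[str], None]) -> IngestReport:
    """Ingest every doc, keep going on errors, and report each failure."""
    report = IngestReport()
    for doc_id in doc_ids:
        try:
            ingest_one(doc_id)
        except Exception as exc:  # one bad doc must not stop the batch
            report.failed[doc_id] = f"{type(exc).__name__}: {exc}"
        else:
            report.succeeded.append(doc_id)
    return report
```

Anything in `report.failed` is what monitoring and the user-facing upload view would need to surface, so a half-dropped batch is visible rather than quiet.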

Dependency Map (high level)

How the objectives ladder and gate each other:

  • Deploy + auth/identity already exist (live at app.autri.ai, Cognito + Google + email allowlist, tenancy enforced in code) — so B3/B5/B6/B7 build on top of a live platform, not behind an unbuilt foundation. (The earlier "deploy is the chokepoint" framing was based on a stale CLAUDE.md; corrected 2026-06-06.) The real remaining-foundation items are narrow: verify tenancy in prod with real users, reliability/monitoring, and — for enterprise, later — SSO + silo.
  • B5 (monetization) depends on B1 + B2 — can't set tiers until we know the quality we deliver and the cost to serve it. ("Measure first" is the beta stance.)
  • B3 (integration) now uses API-key auth, not MCP OAuth (D56 amendment) — collapsing the old "MCP auth is a multi-day keystone" blocker. Full MCP OAuth (WorkOS) is deferred to enterprise-SSO demand.
  • B4 (enterprise) depends on silo deploy + compliance; its SLA promise depends on B7 (dependability).
  • B6 (substrate) depends on B3's API + auth being public and stable.
  • B1 + B2 are the live frontier — quality validated across doc types, cost made observable (shipped) and being driven down (batch validated). They feed the north-star metric directly and unblock B5.
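
The objective-to-objective gates listed above can be written down as a small graph and ordered mechanically. This sketch encodes only the edges stated in the bullets (B5 on B1 + B2, B4's SLA promise on B7, B6 on B3); dependencies on technical items such as silo deploy are left out, and the helper names are hypothetical.

```python
from graphlib import TopologicalSorter

# objective -> objectives it depends on, as stated in the bullets above
DEPENDS_ON = {
    "B5": {"B1", "B2"},  # can't set tiers before quality and cost are known
    "B4": {"B7"},        # the enterprise SLA promise rests on dependability
    "B6": {"B3"},        # substrate needs B3's API + auth public and stable
}


def build_order(depends_on: dict[str, set[str]]) -> list[str]:
    """One valid sequencing in which every prerequisite comes first."""
    return list(TopologicalSorter(depends_on).static_order())


def prerequisites(objective: str, depends_on: dict[str, set[str]]) -> set[str]:
    """All direct and transitive prerequisites of an objective."""
    found: set[str] = set()
    stack = list(depends_on.get(objective, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(depends_on.get(current, ()))
    return found
```

`TopologicalSorter` raises `CycleError` if two objectives ever gate each other, which makes the graph a cheap consistency check as the map evolves.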

(Next refinement: expand into a full objective → epic → feature → story breakdown with explicit arrows. The Roadmap doc holds the sequenced version.)


Open Notes / To Decide

  • Correction (2026-06-06): the red-team's "nothing is deployed" finding was wrong — based on a stale CLAUDE.md. The web app IS deployed (app.autri.ai) with real Cognito + Google auth + email allowlist + tenancy enforced in code, and is feature-complete (upload → inspect → chat → cost). The customer beta is a deploy-current-main + verify + invite + measure effort, not a build. Biggest real risk: multi-tenant isolation has never been exercised with real multiple users in prod.
  • Coverage red-team applied (2026-06-06): answer-trust scoped to verifiable answers + corrective actions; security/compliance kept in B4 (split baseline / enterprise); reliability promoted to B7 (dependable); data-format breadth = roadmap technical objective. Tech objectives tagged [have]/[partial]/[missing]/[deferred].
  • Still open / to design: corrective-action mechanics; connectors + freshness scoping; the full objective → epic → feature → story breakdown (see the Roadmap doc).
  • Earlier-locked (2026-06-06): margin target 70% blended / 80% stretch + never-negative-per-customer floor; platform/substrate = B6; speed-to-trust = a named dimension of B1; beta+1 auth = API-key, not MCP OAuth (D56 amendment).
