Foundry Foundry

Epic: Gate-0 Corpus Spike

Roadmap item 0 of brehob-launch · blocks scope-lock of everything downstream · 0a due by kickoff (Jun 15), 0b by kickoff +1 week · runs locally (no Brehob data touches a deployed environment; local pipeline carries cost instrumentation via migrations 013/014; the slice's LLM calls may use the Anthropic API under its no-training terms, per DB-7).

Objective

The go/no-go on delivering Brehob on the Autri substrate: prove (or disprove) that the Brehob corpus — 76% legacy formats, table/line-item-heavy — converts and ingests at retrievable fidelity and viable cost, with pass bars defined before running. Plan B if no-go: conversion-approach rework + heavier curation + spend the +3-week extension. A no-go does not kill the signed deal; it redirects the first three weeks.

Pass bars (define in S1, BEFORE any scoring)

  • Per-regime recall targets on authored query gold — reported as the regime mix (structured / prose / table), never a single aggregate. Table/line-item queries get an explicit minimum: that's the quote-quality risk.
  • Conversion-fidelity rubric: table rows preserved, headings preserved, junk rate per converted doc.
  • $/doc ceiling consistent with $3,000/mo at the 70% margin target (per-doc cost columns are the instrument).
  • Authoring discipline per the eval-pipeline skill (per-index query gold; drive vector vs FTS vs lookup deliberately). Small-n caveat: with ~8–12 docs the verdict carries a noise floor — report it, and don't over-read single-doc deltas.

Work breakdown

0a — go/no-go core (by kickoff):

  • S1 — Pass bars + query gold. Dan authors the quote-scenario gold (~half a day, budgeted): what a Brehob salesperson actually asks. AI authors the per-index gold under the eval-pipeline discipline. Numbers for the three bars above get written down here, first.
  • S2 — Slice selection. ~8–12 docs from quoteai/ingestion/cache/QuoteAI - Brehob Quote History/ spanning format × doc-type × complexity: the Powerex oilless spec (golden scenario), a hospital/NFPA medical system, sent quotes (… - Final.doc), an .xls/.xlsx pricing sheet, and 1–2 deliberately-old docs (tests whether age = junk for the curation rules).
  • S3 — Conversion spike. (a) Convert the slice locally — textutil for .doc is fine on macOS for the spike; SheetJS Excel parser (port from quoteai/ingestion/parsers/excel.ts) for .xls/.xlsx → markdown pipe-tables. (b) Run the same .doc files through a LibreOffice headless container and compare output parity → this comparison IS the server-side recommendation (output ①: container-image Lambda vs Fargate task — not a zip Lambda; size/filesystem traps).
  • S4 — Ingest into a local Brehob library with per-type KBs; record which regime each doc lands in (structured / prose-Haiku-grouping / table).
  • S5 — Score with the eval-pipeline scorecard → per-regime recall + MRR vs the S1 bars. The sharpest read: do pipe-table rows surface as retrievable chunks (output ②).
  • S6 — Cost read. Per-doc cost columns → $/doc per format/regime (output ③) + the carried check that figure-vision cost is non-zero on a real deployed ingest, using a non-Brehob doc (output ④).

0b — analyses (kickoff +1wk; reuses 0a's slice + scorecard):

  • S7 — Curation rules (output ⑤): from the slice + corpus file-listing stats, recommend recency windows per doc-type, final-vs-draft dedup, value tiers, product-line coverage check. Consumed by M2 triage.
  • S8 — Drafter sufficiency (output ⑥): run the Powerex golden scenario against plain Autri generic retrieval; report where (if anywhere) it falls short of QuoteAI's specialized tools — especially the numeric-range equipment case (HP/CFM/PSI). Decides whether the generic structured-metadata filter primitive enters epics/quoteai-vertical.
  • S9 — Verdict + report → update brehob-launch with go/no-go, the regime mix, and the real effort estimate for epics/ingestion-foundation.

Acceptance

0a: all four outputs reported against pre-defined bars, with the noise floor stated. 0b: curation rules concrete enough for M2 to execute; drafter-sufficiency verdict concrete enough to size item 4.

References

Pull back archive/sub-systems/eval-corpus-and-doe, archive/sub-systems/pipeline-eval-harness, archive/sub-systems/unified-chunking-markdown. Pipeline facts verified 6/10: .md → STRUCTURED hard-coded (parse-markdown.ts); deployed prose path = Haiku "chunk-grouping-v2" extractor; chunker has no table-specific handling (chunk_type: 'table' is a label only).

Review

🔒

Enter your access token to view annotations