Epic: Gate-0 Corpus Spike
Roadmap item 0 of brehob-launch · blocks scope-lock of everything downstream · 0a due by kickoff (Jun 15), 0b by kickoff +1 week · runs locally (no Brehob data touches a deployed environment; local pipeline carries cost instrumentation via migrations 013/014; the slice's LLM calls may use the Anthropic API under its no-training terms, per DB-7).
Objective
The go/no-go on delivering Brehob on the Autri substrate: prove (or disprove) that the Brehob corpus — 76% legacy formats, table/line-item-heavy — converts and ingests at retrievable fidelity and viable cost, with pass bars defined before running. Plan B if no-go: conversion-approach rework + heavier curation + spend the +3-week extension. A no-go does not kill the signed deal; it redirects the first three weeks.
Pass bars (define in S1, BEFORE any scoring)
- Per-regime recall targets on authored query gold — reported as the regime mix (structured / prose / table), never a single aggregate. Table/line-item queries get an explicit minimum: that's the quote-quality risk.
- Conversion-fidelity rubric: table rows preserved, headings preserved, junk rate per converted doc.
- $/doc ceiling consistent with $3,000/mo at the 70% margin target (per-doc cost columns are the instrument).
- Authoring discipline per the eval-pipeline skill (per-index query gold; drive vector vs FTS vs lookup deliberately). Small-n caveat: with ~8–12 docs the verdict carries a noise floor — report it, and don't over-read single-doc deltas.
Work breakdown
0a — go/no-go core (by kickoff):
- S1 — Pass bars + query gold. Dan authors the quote-scenario gold (~half a day, budgeted): what a Brehob salesperson actually asks. AI authors the per-index gold under the eval-pipeline discipline. Numbers for the three bars above get written down here, first.
- S2 — Slice selection. ~8–12 docs from
quoteai/ingestion/cache/QuoteAI - Brehob Quote History/spanning format × doc-type × complexity: the Powerex oilless spec (golden scenario), a hospital/NFPA medical system, sent quotes (… - Final.doc), an.xls/.xlsxpricing sheet, and 1–2 deliberately-old docs (tests whether age = junk for the curation rules). - S3 — Conversion spike. (a) Convert the slice locally —
textutilfor.docis fine on macOS for the spike; SheetJS Excel parser (port fromquoteai/ingestion/parsers/excel.ts) for.xls/.xlsx→ markdown pipe-tables. (b) Run the same.docfiles through a LibreOffice headless container and compare output parity → this comparison IS the server-side recommendation (output ①: container-image Lambda vs Fargate task — not a zip Lambda; size/filesystem traps). - S4 — Ingest into a local Brehob library with per-type KBs; record which regime each doc lands in (structured / prose-Haiku-grouping / table).
- S5 — Score with the eval-pipeline scorecard → per-regime recall + MRR vs the S1 bars. The sharpest read: do pipe-table rows surface as retrievable chunks (output ②).
- S6 — Cost read. Per-doc cost columns → $/doc per format/regime (output ③) + the carried check that figure-vision cost is non-zero on a real deployed ingest, using a non-Brehob doc (output ④).
0b — analyses (kickoff +1wk; reuses 0a's slice + scorecard):
- S7 — Curation rules (output ⑤): from the slice + corpus file-listing stats, recommend recency windows per doc-type, final-vs-draft dedup, value tiers, product-line coverage check. Consumed by M2 triage.
- S8 — Drafter sufficiency (output ⑥): run the Powerex golden scenario against plain Autri generic retrieval; report where (if anywhere) it falls short of QuoteAI's specialized tools — especially the numeric-range equipment case (HP/CFM/PSI). Decides whether the generic structured-metadata filter primitive enters
epics/quoteai-vertical. - S9 — Verdict + report → update brehob-launch with go/no-go, the regime mix, and the real effort estimate for
epics/ingestion-foundation.
Acceptance
0a: all four outputs reported against pre-defined bars, with the noise floor stated. 0b: curation rules concrete enough for M2 to execute; drafter-sufficiency verdict concrete enough to size item 4.
References
Pull back archive/sub-systems/eval-corpus-and-doe, archive/sub-systems/pipeline-eval-harness, archive/sub-systems/unified-chunking-markdown. Pipeline facts verified 6/10: .md → STRUCTURED hard-coded (parse-markdown.ts); deployed prose path = Haiku "chunk-grouping-v2" extractor; chunker has no table-specific handling (chunk_type: 'table' is a label only).