Foundry

Anvil — Project Design Doc

Status: Draft — Step 0 Refinement
Created: March 28, 2026
Authors: Dan Hannah & Clay


Overview

What Is This?

Anvil is an open-source MCP server that makes any document collection queryable by AI agents. Point it at a directory of markdown files (or any supported format), and it automatically chunks your content by structure, generates embeddings using a local model, and stores them in a local vector database. Agents connect via MCP and semantically search your docs on demand — no copy-pasting context, no manual curation, no external services required.

Point it at your docs → agents can query them. That's the entire promise.

While the name evokes mkdocs (and it works beautifully alongside mkdocs projects), Anvil is not coupled to mkdocs. It's a standalone tool that works with any directory of documents. Technical docs, manuscripts, knowledge bases, legal libraries — if it's text in files, Anvil can index it.

Why This Exists — The Dual-Audience Problem

Written content has always served one audience: humans. Documentation sites, manuscripts, knowledge bases — they're all optimized for human consumption. Beautiful formatting, searchable web pages, nice navigation. And they're great at it.

But now there's a second audience: AI agents. And their needs are fundamentally different from those of humans.

| Need | Human | AI Agent |
| --- | --- | --- |
| Format | Beautiful HTML, nice typography, nav sidebar | Raw text with metadata — headings, paths, structure |
| Access pattern | Browse, scan, read linearly | Query semantically — "find me the section about X" |
| Volume | Read one page at a time, skim the rest | Needs targeted chunks — 500 tokens, not 50,000 |
| Freshness | Checks docs when they remember to | Needs docs current as of the last build, every session |
| Context | Carries knowledge between reading sessions | Starts from zero every session — the cold start problem |

Anvil bridges this gap. The human gets mkdocs — the same beautiful, browsable site they've always had. The agent gets an MCP server — semantic search over the same content, returning exactly the chunks it needs. Both audiences consume the same source of truth, in their optimal format.

The Collaboration Unlock

This isn't just about agents reading docs. It's about human-AI collaboration at the documentation layer.

When a human writes or updates documentation, the agent's knowledge updates automatically on the next build. When an agent needs context for a task, it queries the docs instead of the human copy-pasting 15,000 tokens into a prompt. The documentation becomes the shared memory between human and AI — bridging the gap between sessions and solving the cold start problem that plagues every AI workflow.

In the context of CSDLC, this transforms Step 2 (Agent Prompt Crafting) from manual context curation to agent self-service. Instead of the AI Lead extracting and pasting relevant doc sections into each sub-agent's prompt, sub-agents query the docs themselves and pull exactly what they need. Estimated reduction: 15-20k tokens of pasted context → 500-1,000 tokens of targeted retrieval per query.

Who Is It For?

v1 — Developers & technical teams: Teams working with AI agents who need their documentation queryable. Pain point: agents need documentation context but current options are either "paste the whole doc" (token-expensive, noisy) or "hope the agent figures it out" (unreliable). These users are comfortable with CLI tools, understand MCP, and have markdown docs.

v2+ — Anyone with a document corpus and an AI workflow:

| Audience | Use Case | Example |
| --- | --- | --- |
| Authors & writers | AI assistant that knows your entire body of work — characters, plot threads, continuity, world-building | A mystery author with 10+ books needs an AI that can answer "what color eyes did I give this character in book 3?" |
| Legal teams | Semantic search across contract libraries, case law, policy documents | "Find all precedents related to force majeure clauses" |
| Enterprise | Internal knowledge bases, SOPs, runbooks queryable by AI assistants | New hire's AI assistant can query the entire company knowledge base |
| Research | Paper libraries, lab notebooks, literature reviews | "What methods have been used for X in the last 5 papers?" |
| Education | Course materials and textbooks as AI-queryable resources | Students' AI tutors can pull relevant course content on demand |

The broader vision informs architecture decisions (format-agnostic, not hardcoded to markdown) but v1 scope is developers with markdown docs.

Bootstrapping with Existing Projects

Anvil is designed to drop into any existing project with zero migration:

  1. npm install -g anvil (or npx anvil)
  2. anvil --docs ./docs/
  3. MCP server starts, indexes your docs on first run, and is ready for agent queries.

No restructuring, no special frontmatter, no format changes. If it's a directory of text files, Anvil can index it. This applies to existing projects like our Routr CSDLC docs, internal documentation at work, or any markdown-based project.

First-run experience: The initial indexing (30-60 seconds for a large corpus) happens once on first startup. The MCP server shows progress and becomes available for queries as soon as indexing completes. Every subsequent startup is near-instant (loads existing DB, checks for changes).

Business Model

Open-source core (MIT license). The CLI, local embeddings, local vector DB, and MCP server are free forever. This is the developer wedge — how people discover Anvil and prove it works.

Monetization tiers (not v1 — informs architecture, not scope):

| Tier | What | Who Pays | Why They Pay |
| --- | --- | --- | --- |
| Open-source CLI | npx anvil --docs ./path — local everything, stdio MCP | Free forever | Adoption, community, developer trust |
| Cloud sync | DB auto-syncs to cloud. MCP accessible from anywhere. Team sharing. | Teams, small companies | Collaboration — multiple people/agents query the same docs |
| Managed service | Upload docs via web UI, get an MCP endpoint. No CLI needed. | Non-technical users (authors, legal, enterprise) | They want the value without the terminal |
| Analytics | "Which docs do agents query most?" "Which sections have low retrieval quality?" | Anyone optimizing docs for AI | Data-driven doc improvement |
| Custom embeddings | Fine-tuned models for specific domains (legal, medical, etc.) | Enterprise, specialized verticals | Domain-specific accuracy |

v1 is free, open-source, local-only. The goal is adoption and validation. Revenue comes after product-market fit.


Competitive Landscape

The core idea of "MCP server that indexes markdown for AI agents" is validated, not novel. Multiple tools exist in this space. We're building Anvil anyway — here's what exists and why our approach is different.

Existing Tools

| Tool | What It Does | Language | Key Strengths | Key Weaknesses |
| --- | --- | --- | --- | --- |
| markdown-vault-mcp | Generic markdown MCP server, FTS5 + semantic + hybrid search, read/write | Python | Most mature. Write tools, Docker, systemd, frontmatter-aware. 13 MCP tools. | Python-only. Generic "search your files" — no awareness of doc structure or project relationships. |
| mkdocs-mcp-plugin | mkdocs-specific MCP with keyword + vector + hybrid search | Python | Tight mkdocs integration, auto-detects mkdocs.yml | Coupled to mkdocs — requires dev server running. Not standalone. |
| MCP-Markdown-RAG | Markdown RAG via Milvus vector DB | Python | Solid incremental indexing | Requires Milvus (heavy infrastructure). Not zero-config. |
| document-mcp | Multi-format local doc indexer with LanceDB + Ollama | Python | Supports PDF, Word, RTF, not just markdown | Requires Ollama running. Heavy dependencies. Personal tool, not team-ready. |

Why We're Still Building This

1. Supply chain ownership. Anvil is foundational to CSDLC — our entire sub-agent workflow depends on agents querying docs. We can't have that dependency on an external PyPI package with uncertain maintenance. Owning the tool means we control our own process.

2. TypeScript / zero-config. Every existing tool is Python. The MCP ecosystem is TypeScript-first. npx anvil --docs ./path with zero Python dependency is a meaningful DX gap.

3. CSDLC-native intelligence (v2+). Existing tools are generic document search — "find stuff in my files." We're building toward project-aware context retrieval:

  • Understanding design doc → epic → story hierarchy
  • Serving different context depth for different agent roles (architecture overview vs. implementation detail)
  • Knowing that a sub-agent on E2 needs E1 context but not E3
  • Integration with standup rituals, cross-cutting concerns, session bootstrapping

These features are only possible because we own the tool and built it for our workflow.

4. Heading-based chunking. Most tools use fixed token windows or paragraph splitting. Our heading-hierarchy chunking with breadcrumb metadata is a genuine quality differentiator for structured documentation.

5. Self-managing server. File watching + staleness checks + auto-reindexing. Most tools require manual index triggers or separate rebuild steps.

Our Position

We're not inventing the mousetrap — we're building a better one, purpose-built for our workflow. The existing tools prove market demand. Our differentiation is DX (TypeScript, zero-config, npx), quality (heading-based chunking), and vision (CSDLC-aware documentation intelligence layer, not generic file search).

If the open-source community gets value from it, that's a bonus. But the primary consumer is us.


Tech Stack

| Layer | Technology | Rationale |
| --- | --- | --- |
| Runtime | Node.js (TypeScript) | Single language for the entire product. MCP ecosystem is TS-first. |
| Markdown Parsing | unified / remark | Mature markdown AST parser — heading extraction, structure awareness |
| Chunking | Custom (TypeScript) | Heading-hierarchy-aware splitting — no off-the-shelf chunker does this well |
| Embeddings (default) | all-MiniLM-L6-v2 via @huggingface/transformers (ONNX) | Zero API keys required. Local ONNX model, ~80MB, runs anywhere Node.js runs. |
| Embeddings (optional) | OpenAI text-embedding-3-small | Higher quality, requires API key. Configurable upgrade path. |
| Vector DB | sqlite-vss via better-sqlite3 | Zero infrastructure — SQLite extension. DB is a single file. |
| MCP Protocol | @modelcontextprotocol/sdk | Standard MCP, stdio transport for v1. |
| File Watching | chokidar (or Node.js fs.watch) | Detects doc changes for auto-re-indexing |

Key Libraries & Dependencies

| Library | Purpose | Notes |
| --- | --- | --- |
| @modelcontextprotocol/sdk | MCP server implementation | stdio transport, tool registration |
| @huggingface/transformers | Local ONNX embedding inference | Runs all-MiniLM-L6-v2 without Python |
| better-sqlite3 | SQLite driver with extension loading | Reads/writes sqlite-vss DB |
| sqlite-vss | Vector similarity search | Native SQLite extension |
| unified / remark | Markdown parsing | AST-level heading extraction and content splitting |
| chokidar | File system watching | Triggers re-indexing on doc changes |

Architecture Decision: All TypeScript, Single Process

The entire product — file watching, chunking, embedding, vector storage, and MCP serving — runs in a single Node.js process. No Python dependency, no multi-process coordination, no shared file contracts.

Why this works now: @huggingface/transformers runs ONNX-optimized models directly in Node.js. The same all-MiniLM-L6-v2 model that previously required Python + PyTorch now runs natively in JavaScript with comparable performance. This eliminates the two-language split entirely.

Previous approach (rejected): Python mkdocs plugin for chunking/embedding + TypeScript MCP server. Rejected because: two languages, two install steps, shared state via DB file was fragile, and decoupling from mkdocs made the plugin unnecessary.

Architecture Decision: Local-First Embeddings

The default embedding model runs locally — no API key, no network calls, no cost per query. This is a deliberate choice:

  • Zero friction adoption: npx anvil --docs ./path and you're done. No OpenAI account, no API key management, no billing surprises.
  • Good enough for docs: You're searching within a bounded corpus (your own docs), not the entire internet. The quality difference between MiniLM and OpenAI embeddings matters less when the search space is small and well-structured.
  • Upgrade path exists: Users who want higher quality can switch to OpenAI (or Bedrock, or any provider) with one config line.
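
A minimal sketch of what that provider abstraction could look like. The interface shape, class, and factory names here are illustrative assumptions, not Anvil's actual API; the stand-in local provider just returns zero vectors of the right dimensionality.

```typescript
// Provider-agnostic embedder interface (a sketch; Anvil's real interface may differ).
interface Embedder {
  readonly model: string;
  readonly dimensions: number;
  embed(texts: string[]): Promise<number[][]>;
}

// Stand-in provider for illustration only. The real default would wrap
// all-MiniLM-L6-v2 via @huggingface/transformers.
class FakeLocalEmbedder implements Embedder {
  readonly model = "all-MiniLM-L6-v2";
  readonly dimensions = 384;
  async embed(texts: string[]): Promise<number[][]> {
    // Return a correctly shaped (but meaningless) vector per input text.
    return texts.map(() => new Array(this.dimensions).fill(0));
  }
}

// Hypothetical factory: the "one config line" would select the provider here.
function makeEmbedder(provider: "local" | "openai"): Embedder {
  if (provider === "local") return new FakeLocalEmbedder();
  throw new Error("openai provider requires an API key (not sketched here)");
}
```

Swapping providers then only changes which class the factory constructs; the rest of the pipeline sees the same interface.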

Architecture Decision: Self-Managing MCP Server

The MCP server is not a passive query layer — it owns the entire indexing pipeline. On startup, it indexes the docs directory. While running, it watches for file changes and re-indexes automatically. Agents always get fresh results without manual rebuilds or external tooling.

This means:

  • No separate build step (no mkdocs build dependency for the DB)
  • No CI pipeline needed for the vector DB
  • No git hooks or manual rebuild rituals
  • The MCP server IS the product — one process does everything

System Architecture

Architecture Diagram

graph TB
    subgraph "Anvil MCP Server (single Node.js process)"
        A[File Watcher] -->|detect changes| B[Chunker]
        B -->|heading-based splits| C[Embedder — local ONNX model]
        C -->|upsert| D[(sqlite-vss DB)]
        E[MCP Tool Handler] -->|staleness check| A
        E -->|vector search| D
    end

    F[AI Agent] -->|MCP protocol / stdio| E
    G[Docs Directory] --> A

    subgraph "Existing (unchanged, optional)"
        H[mkdocs build] -->|static HTML| I[GitHub Pages]
        G --> H
    end

Layer Descriptions

File Watcher: Monitors the docs directory for changes (new files, edits, deletions). On startup, performs a full scan to detect any changes since the last index. While running, uses filesystem events for near-instant change detection.

Chunker: Parses markdown into semantically meaningful chunks based on heading hierarchy using remark (markdown AST parser). Each chunk carries metadata: source file path, heading breadcrumb (e.g., Architecture > Data Flow > Event System), heading level, position in document, and last-modified timestamp. Chunks are the atomic unit — one chunk = one retrievable piece of context.
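
In the real pipeline remark does the AST walk; as a rough sketch of the idea, a plain line scan shows how the breadcrumb accumulates as headings are encountered. Names are illustrative, and a line scan like this would misfire on headings inside fenced code — the AST approach avoids that.

```typescript
interface RawChunk {
  headingPath: string; // breadcrumb, e.g. "Architecture > Data Flow"
  headingLevel: number;
  content: string;
}

// Sketch only: split markdown at heading boundaries, carrying the
// heading trail so every chunk knows where it sits in the hierarchy.
function chunkByHeadings(markdown: string): RawChunk[] {
  const chunks: RawChunk[] = [];
  const trail: string[] = []; // heading text seen at each level
  let current: RawChunk | null = null;
  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.*)$/.exec(line);
    if (m) {
      if (current) chunks.push(current);
      const level = m[1].length;
      trail.length = level - 1;      // drop deeper headings from the trail
      trail[level - 1] = m[2].trim();
      current = {
        headingPath: trail.filter(Boolean).join(" > "),
        headingLevel: level,
        content: "",
      };
    } else if (current) {
      current.content += line + "\n";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```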

Embedder: Takes chunks and generates vector embeddings. Default: all-MiniLM-L6-v2 via ONNX runtime (local, no API key, ~80MB model). Abstracted behind an interface — OpenAI, Bedrock, or any provider can be swapped in via config.

Vector DB: A sqlite-vss database file. Contains the chunks table (text, metadata, embedding vector) and the VSS index. Fully managed by the MCP server — no external process reads or writes it.

MCP Tool Handler: Receives agent queries via MCP protocol (stdio transport). Before each query, checks if the docs source has changed. If stale, triggers a targeted re-index (only changed files) before returning results. Agents always get fresh data without knowing or caring about the indexing layer.
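
A sketch of the staleness comparison that handler could run: diff the mtimes recorded at index time against a fresh scan of the docs directory. The shapes and names here are assumptions for illustration, not the actual implementation.

```typescript
type MtimeMap = Record<string, number>; // file path -> mtime in ms

interface StalenessReport {
  changed: string[]; // present in both, mtime advanced
  added: string[];   // on disk but not in the index
  deleted: string[]; // in the index but gone from disk
}

// Compare the snapshot taken at last index time with the current scan.
function diffMtimes(indexed: MtimeMap, onDisk: MtimeMap): StalenessReport {
  const changed: string[] = [];
  const added: string[] = [];
  for (const [path, mtime] of Object.entries(onDisk)) {
    if (!(path in indexed)) added.push(path);
    else if (mtime > indexed[path]) changed.push(path);
  }
  const deleted = Object.keys(indexed).filter((p) => !(p in onDisk));
  return { changed, added, deleted };
}
```

An empty report in all three lists means the DB is fresh and the query can proceed immediately; anything else triggers the targeted re-index first.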

Data Flow

Startup: First-Run Indexing

User runs: npx anvil --docs ./docs/
    → MCP server starts
    → Scans docs directory — no existing DB or DB is empty
    → Chunks ALL files by heading hierarchy
    → Generates embeddings for all chunks (local ONNX model)
    → Writes sqlite-vss DB
    → MCP server is ready for queries
    → ~30-60 seconds for a large corpus (1,000 chunks), near-instant for small projects

Steady State: Incremental Re-Indexing

Author edits docs/architecture.md
    → File watcher detects the change (or staleness check on next query)
    → Re-chunks only the changed file
    → Compares chunk content_hash against existing DB entries
    → Only re-embeds chunks whose content actually changed
    → sqlite-vss DB updated via upsert (changed chunks updated, deleted chunks pruned)
    → Next agent query gets fresh results
    → ~200-400ms for a typical single-page edit
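
The hash-diff step above can be sketched as follows: a hypothetical helper re-hashes freshly produced chunks and keeps only those whose stored hash is missing or different, so only genuinely changed content is re-embedded.

```typescript
import { createHash } from "node:crypto";

interface StoredChunk {
  chunkId: string;
  contentHash: string;
}

// Illustrative names. New chunks (no stored entry) and changed chunks
// (hash mismatch) are returned; unchanged chunks are skipped entirely.
function chunksNeedingReEmbed(
  fresh: { chunkId: string; content: string }[],
  stored: Map<string, StoredChunk>, // keyed by chunkId
): { chunkId: string; content: string; contentHash: string }[] {
  return fresh
    .map((c) => ({
      ...c,
      contentHash: createHash("sha256").update(c.content).digest("hex"),
    }))
    .filter((c) => stored.get(c.chunkId)?.contentHash !== c.contentHash);
}
```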

Agent Query

Agent needs context about "data flow"
    → Agent calls MCP tool: search_docs("data flow architecture")
    → MCP server checks for doc changes (fast hash/mtime check, ~5ms)
    → If stale: triggers incremental re-index first (~200-400ms one time)
    → Runs vector similarity search against sqlite-vss
    → Returns top-k chunks with metadata and relevance scores
    → Agent gets exactly the context it needs (~500-1000 tokens)
    → vs. old way: human pastes entire doc (~15,000-20,000 tokens)
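
Under the hood, sqlite-vss performs the similarity search natively; conceptually, the ranking is cosine similarity over embeddings, which can be sketched as:

```typescript
// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Score every stored chunk against the query embedding and keep the top k.
// (sqlite-vss does this in C with an index; this is the naive equivalent.)
function topK<T>(
  query: number[],
  items: { embedding: number[]; item: T }[],
  k = 5,
): { item: T; score: number }[] {
  return items
    .map(({ embedding, item }) => ({ item, score: cosine(query, embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```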

Latency Profile

| Scenario | Added Latency | Frequency |
| --- | --- | --- |
| Normal query (DB is fresh) | ~5ms (staleness check only) | 95%+ of queries |
| Small edit (1 page, ~5 chunks) | ~200-400ms (re-index + query) | Occasional, once per edit |
| New doc added (~10 chunks) | ~400-800ms | Rare |
| Major restructure (50+ chunks) | ~2-4 seconds | Very rare |
| First-ever full index (1,000 chunks) | ~30-60 seconds | Once, on first startup |

Data Model

Core Entities

from dataclasses import dataclass

@dataclass
class Chunk:
    """The atomic unit of retrievable documentation."""
    chunk_id: str          # Deterministic hash of file_path + heading_path
    file_path: str         # Relative path within docs/ (e.g., "architecture/data-flow.md")
    heading_path: str      # Breadcrumb (e.g., "Architecture > Data Flow > Event System")
    heading_level: int     # 1-6 (h1-h6)
    content: str           # Raw markdown text of this section
    content_hash: str      # Hash of content — used for diff-based re-embedding
    embedding: list[float] # Vector embedding (384 dims for MiniLM, 1536 for OpenAI)
    nav_path: str          # mkdocs nav position (e.g., "Getting Started > Installation")
    last_modified: str     # ISO timestamp of source file last modification
    char_count: int        # Length of content — useful for token estimation
    
@dataclass
class ChunkMetadata:
    """Returned to agents alongside search results."""
    file_path: str
    heading_path: str
    nav_path: str
    last_modified: str
    relevance_score: float  # Cosine similarity from vector search
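
The deterministic chunk_id can be sketched as a hash over file_path + heading_path. The helper below is illustrative (the real derivation may differ), but it shows the point: re-indexing the same section always maps to the same row, so updates can be upserts.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper. Hashing path + breadcrumb (not content) keeps the
// id stable when a section's text changes; content changes are tracked
// separately via content_hash.
function chunkId(filePath: string, headingPath: string): string {
  return createHash("sha256")
    .update(`${filePath}\u0000${headingPath}`) // separator avoids ambiguous concatenation
    .digest("hex")
    .slice(0, 16); // truncated for readability; length is an assumption
}
```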

SQLite Schema

CREATE TABLE chunks (
    chunk_id TEXT PRIMARY KEY,
    file_path TEXT NOT NULL,
    heading_path TEXT NOT NULL,
    heading_level INTEGER NOT NULL,
    content TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    nav_path TEXT,
    last_modified TEXT,
    char_count INTEGER
);

-- Metadata table for DB self-description
CREATE TABLE anvil_meta (
    key TEXT PRIMARY KEY,
    value TEXT
);
-- Stores: embedding_model, embedding_dimensions, last_index_timestamp, anvil_version, docs_root_path

-- sqlite-vss virtual table for vector search
CREATE VIRTUAL TABLE chunks_vss USING vss0(
    embedding(384)  -- dimension matches embedding model (384 for MiniLM, 1536 for OpenAI)
);

Chunking Strategy

Split on headings, not token count. Most RAG systems chunk by fixed token windows (500 tokens, 1000 tokens). This is wrong for documentation because it splits mid-section, losing semantic coherence. Anvil chunks at heading boundaries — each section under a heading becomes one chunk.

Problem: Some sections are very long (2000+ tokens). Solution: If a chunk exceeds a configurable max (default: 1500 tokens), split at paragraph boundaries within that section. The heading breadcrumb is preserved on all sub-chunks, with a part indicator (e.g., Architecture > Data Flow [part 2/3]).

Problem: Some sections are very short (a single sentence under an h4). Solution: Optionally merge short chunks upward into their parent heading's chunk. Configurable — some users want granular, some want consolidated.
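
The long-section fallback can be sketched like this. Token counts are approximated as chars / 4, and all names are illustrative; the real chunker would use a proper tokenizer estimate.

```typescript
// Split a section that exceeds maxTokens at paragraph boundaries,
// preserving the heading breadcrumb with a part indicator on each piece.
function splitOversized(
  headingPath: string,
  content: string,
  maxTokens = 1500,
): { headingPath: string; content: string }[] {
  const approxTokens = (s: string) => Math.ceil(s.length / 4); // rough heuristic
  if (approxTokens(content) <= maxTokens) return [{ headingPath, content }];

  const parts: string[] = [];
  let current = "";
  for (const para of content.split(/\n\s*\n/)) {
    if (current && approxTokens(current + para) > maxTokens) {
      parts.push(current);
      current = para;
    } else {
      current = current ? current + "\n\n" + para : para;
    }
  }
  if (current) parts.push(current);
  return parts.map((p, i) => ({
    headingPath: `${headingPath} [part ${i + 1}/${parts.length}]`,
    content: p,
  }));
}
```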


MCP Tool Surface

v1 Tools (MVP)

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| search_docs | Semantic search across all documentation | query: string, top_k?: number (default 5), file_filter?: string (glob pattern) | Array of { content, metadata, score } |
| get_page | Retrieve full page content by file path | file_path: string | { content, metadata, chunks[] } |
| get_section | Retrieve a specific section by heading path | file_path: string, heading_path: string | { content, metadata } |
| list_pages | List all pages with nav structure | prefix?: string (filter by path prefix) | Array of { file_path, nav_path, title, chunk_count } |
| get_status | Server health, index state, and version info | (none) | { docs_root, total_pages, total_chunks, embedding_model, last_indexed, db_size_bytes, git_info } |
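
As one illustrative detail, the file_filter glob in search_docs could be compiled to a regular expression roughly like this. A real implementation would more likely lean on an existing matcher such as picomatch; this naive version just shows the intended semantics (single star stays within a path segment, double star crosses segments).

```typescript
// Naive glob-to-regex compiler, for illustration only.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metachars
  const pattern = escaped
    .replace(/\*\*/g, "\u0000")  // placeholder so ** survives the next step
    .replace(/\*/g, "[^/]*")      // *  matches within one path segment
    .replace(/\u0000/g, ".*");    // ** matches across directories
  return new RegExp(`^${pattern}$`);
}
```

Note this sketch requires ** patterns like docs/**/*.md to have at least one intermediate directory; library matchers handle that edge case properly.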

Explicitly Out of Scope: Write Tools

v1 is read-only. There is no create_page or update_section MCP tool. Rationale:

  • Write tools would require the MCP server to have filesystem access to the markdown source — breaking the clean read-only security model
  • Agents can already write markdown files through their normal filesystem access (they don't need MCP for that)
  • The MCP server's job is retrieval, not authoring
  • A write layer would effectively be a docs CMS — a much larger product scope

If write tools are added in the future, they would write to markdown source and trigger a rebuild, not directly modify the vector DB.

v2 Tools (Future)

| Tool | Description | Notes |
| --- | --- | --- |
| get_related | Find pages related to a given page | Based on embedding similarity between page-level vectors |
| get_changelog | What changed since a given date | Git-backed, shows which sections were modified |
| search_by_tag | Filter by frontmatter tags/categories | Requires metadata extraction from frontmatter |
| query_metadata | Search/filter by structured frontmatter fields | Enables QuoteAI-style use cases: "quotes over $3000 from 2024". Moves Anvil toward structured document query engine. |

Tool Design Principles

  • search_docs is the workhorse. 80% of agent queries will use this. It must be fast and relevant.
  • get_page is the fallback. When an agent knows exactly what file it needs, don't make it search.
  • get_section is surgical precision. When an agent knows exactly which heading it wants, it can fetch just that section.
  • list_pages is discovery. An agent can browse the doc structure before querying.

llms.txt & llms-full.txt

What Are These?

llms.txt is an emerging convention for making website content accessible to LLMs. Anvil generates two files:

  • llms.txt — A structured site map listing every page with its title, path, and a one-line description. Think robots.txt but for LLMs. An agent reads this to understand what documentation exists and where it lives.
  • llms-full.txt — The full content of every page concatenated into one text file, with page boundaries marked by headers.

Why Include This?

Not all LLM clients support MCP. Some can only read files or URLs. llms.txt gives them something — basic access to your docs without semantic search.

The Limitation (Why MCP Is Better)

llms-full.txt for a medium docs site might be 50,000+ tokens. An agent consuming it gets everything whether it needs it or not. That's expensive, noisy, and often exceeds context windows.

MCP search returns 500-1,000 targeted tokens per query. That's the difference — and why MCP is the primary interface, with llms.txt as the fallback.

MVP Scope Decision

Deferred to v2. llms.txt generation is not included in the v1 MVP. The core value prop is MCP semantic search — llms.txt serves a different audience (non-MCP clients) that we're not targeting yet. When we're ready to share Anvil with the broader community, llms.txt becomes a valuable adoption tool. For now, it's overhead.


Configuration

CLI Usage

# Zero-config start — all defaults, just point at your docs
npx anvil --docs ./docs/

# With options
npx anvil \
  --docs ./docs/ \
  --db ./anvil.db \
  --embedding-provider openai \
  --max-chunk-tokens 1500

Config File (optional)

For projects that want persistent config, create anvil.config.json in the docs root:

{
  "docs": "./docs",
  "db": "./anvil.db",
  "embedding": {
    "provider": "local",
    "model": "all-MiniLM-L6-v2"
  },
  "chunking": {
    "maxTokens": 1500,
    "mergeShort": true,
    "minTokens": 50
  },
  "mcp": {
    "transport": "stdio",
    "defaultTopK": 5
  }
}
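
Conceptually, CLI flags layer over the config file, which layers over built-in defaults. A sketch of that resolution follows — the field names mirror the config file above, but the merge helper and its behavior are illustrative assumptions:

```typescript
interface AnvilConfig {
  docs: string;
  db: string;
  embedding: { provider: "local" | "openai"; model: string };
  chunking: { maxTokens: number; mergeShort: boolean; minTokens: number };
  mcp: { transport: "stdio"; defaultTopK: number };
}

// Built-in defaults, matching the zero-config behavior described above.
const DEFAULTS: AnvilConfig = {
  docs: "./docs",
  db: "./anvil.db",
  embedding: { provider: "local", model: "all-MiniLM-L6-v2" },
  chunking: { maxTokens: 1500, mergeShort: true, minTokens: 50 },
  mcp: { transport: "stdio", defaultTopK: 5 },
};

type DeepPartial<T> = { [K in keyof T]?: T[K] extends object ? DeepPartial<T[K]> : T[K] };

// Later layers (config file, then CLI flags) win over earlier ones.
function resolveConfig(...layers: DeepPartial<AnvilConfig>[]): AnvilConfig {
  const merge = (base: any, over: any): any => {
    const out = { ...base };
    for (const [k, v] of Object.entries(over ?? {})) {
      out[k] = v && typeof v === "object" && !Array.isArray(v) ? merge(base[k], v) : v;
    }
    return out;
  };
  return layers.reduce(merge, DEFAULTS);
}
```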

Minimal Start (Zero Config)

npx anvil --docs ./docs/

That's it. All defaults apply: local ONNX embeddings, 1500 max tokens, DB written alongside the docs directory. No API key, no config file, no Python required.

MCP Client Config (for Cursor, Claude Desktop, OpenClaw, etc.)

{
  "mcpServers": {
    "anvil": {
      "command": "npx",
      "args": ["anvil", "--docs", "./path/to/docs"],
      "transport": "stdio"
    }
  }
}

Deployment & Access Model

How It Works: Self-Managing MCP Server (v1)

The v1 deployment model is local, self-managing, zero-config. The MCP server runs on the same machine as your agents, watches your docs directory, and handles all indexing automatically. There is no separate build step, no CI pipeline for the DB, and no manual sync.

graph LR
    subgraph "Your Machine"
        A[Docs Directory] --> B[Anvil MCP Server]
        B --> C[(sqlite-vss DB)]
        B --> D[AI Agent]
    end

    subgraph "Existing (unchanged, independent)"
        A --> E[mkdocs build]
        E --> F[GitHub Pages]
    end

The MCP server and mkdocs are completely independent. mkdocs builds the human-facing site. Anvil indexes the same source files for agents. Neither depends on the other. You can use Anvil without mkdocs, or mkdocs without Anvil.

Keeping the DB Fresh — It's Automatic

The MCP server owns the entire indexing pipeline. Freshness is built in, not bolted on:

  1. File watcher detects changes to the docs directory in real-time
  2. Staleness check on every query — even if the watcher misses something, the server verifies freshness before returning results
  3. Incremental re-indexing — only changed files are re-chunked and re-embedded. Typical latency: 200-400ms, transparent to the agent.

You never manually rebuild. Edit a doc, save it, the next agent query gets fresh results. The MCP server handles everything.

Team Workflows

| Scenario | How It Works |
| --- | --- |
| Solo dev | MCP server watches your local docs directory. Edit and save — done. |
| Team, shared repo | Each team member runs their own MCP server pointed at their local clone. git pull updates the source files; MCP server detects changes and re-indexes automatically. |
| Hosted (v2) | Cloud-hosted MCP server with SSE transport. DB syncs on push. No local setup needed. |

"Do I need to pull main when docs change?" Yes — the MCP server watches local files, so you need git pull to get remote changes onto your machine. But you do NOT need to rebuild anything after pulling. The MCP server detects the file changes and re-indexes automatically.

Branch Handling

v1: The MCP server indexes whatever files are on disk. If you're on main, it indexes main's docs. If you switch to a feature branch, it detects the file changes and re-indexes. No manual intervention.

v2 (future): Versioned DBs — maintain separate vector stores per branch/tag. MCP server accepts a version parameter. Enables agents to query docs from a specific release or compare branches.

Infrastructure Dependencies

| Dependency | Required? | What Breaks Without It |
| --- | --- | --- |
| Node.js (v18+) | Yes | Runtime for the MCP server |
| sqlite-vss native extension | Yes | Vector similarity search. Ships as a pre-built binary via npm for most platforms. |
| @huggingface/transformers | Yes (default embeddings) | Local ONNX model inference. ~80MB model download on first run, cached after. |
| OpenAI API key | Only if embedding_provider: openai | Optional upgrade for higher-quality embeddings. |
| Python / mkdocs | No | Anvil is fully independent. mkdocs is only needed if you want the human-facing site. |

Security Model

API Key Management

  • Default config requires zero API keys. Local ONNX embeddings model runs entirely offline.
  • If a cloud provider is configured (OpenAI, Bedrock), API keys are referenced by environment variable name — never stored in config files.

Data Sensitivity

  • The vector DB contains your documentation content in plain text (it has to — that's what gets returned to agents).
  • If your docs are private/internal, the DB file should be treated with the same access controls as the docs themselves.
  • Don't commit the DB to a public repo if your docs are private.

Public vs. Private Documentation

Anvil doesn't manage access control — it inherits whatever access model your docs repo uses. Some considerations:

  • Public repo, public docs: No concerns. The DB contains the same content that's already public.
  • Private repo, private docs: DB should also be private. Don't publish it as a public artifact.
  • Mixed (some public, some private): Not supported in v1. Either all docs are indexed or none. v2 could support per-page or per-directory inclusion/exclusion via config.

Splitting Public and Private

If you want some docs public (e.g., methodology, process) and others private (e.g., project implementations), the recommended approach is separate repos — a public repo for shareable content and a private repo for project-specific docs. Each gets its own Anvil instance and vector DB.

Trust Boundaries

  • The MCP server is read-only. It cannot modify the DB, the docs, or anything else.
  • stdio transport means the MCP server only communicates with its parent process (the agent/client). No network exposure.
  • Future SSE transport would require authentication — deferred to v2.

Cross-Cutting Concerns

| Concern | Summary | Affected Areas |
| --- | --- | --- |
| Embedding model portability | DB stores model name + dimensions in metadata. Switching models requires full re-embed (detected automatically). | Embedder, DB schema |
| Chunk quality | Bad chunks = bad retrieval. This is the single biggest quality lever. | Chunker, all MCP tools |
| sqlite-vss portability | C extension — needs pre-built binaries for target platform. May be friction on exotic systems. | Installation, npm packaging |
| First-run model download | Default local model is ~80MB. First startup takes longer (download + cache + full index). Subsequent startups are near-instant. | DX, offline scenarios |
| DB freshness | Handled automatically by file watcher + staleness checks. No manual intervention needed. | MCP server core loop |

Risks & Constraints

Technical Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| sqlite-vss installation friction (native extension) | Medium | High (blocks adoption) | Ship pre-built binaries via npm for major platforms (macOS, Linux, Windows). Document manual build fallback. |
| ONNX model inference performance on low-end hardware | Low | Medium | MiniLM is lightweight (~80MB). If too slow, offer OpenAI fallback. Benchmark on CI runners. |
| Embedding model changes break existing DBs | Low | Medium | Store model name + dimensions in DB metadata. Detect mismatch and prompt full re-embed. |
| Chunk quality is poor for unusual doc structures | Medium | High (core value prop) | Extensive testing with real-world docs (our CSDLC docs, QuoteAI). Configurable chunking params. |
| File watcher misses changes (OS-level edge cases) | Low | Low | Staleness check on every query as backup. File watcher is optimization, not sole mechanism. |

Supported Formats & Hosting (v1)

v1 supports markdown files (.md) on your local filesystem. That's it.

Anvil watches a directory path. It doesn't care where the files came from — a git repo, Dropbox, a USB drive, or hand-typed on your desktop. If the directory contains .md files, Anvil indexes them. There is no "hosting" requirement and no git dependency (git info in get_status is optional/opportunistic).

What's NOT supported in v1:

  • Word docs (.docx), PDFs, RST, HTML, or any non-markdown format
  • Remote file sources (S3, Google Drive, URLs)
  • Archives or compressed files

v2 (E5: Format Adapters) will add pluggable format support. The architecture supports this — the chunker and embedder operate on text + structure, not on markdown specifically. A Word adapter would extract text/headings from .docx, a PDF adapter would extract from .pdf, etc. The pipeline downstream is format-agnostic; only the parser layer is markdown-specific in v1.
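
That parser seam could be expressed as an adapter interface along these lines. The names are hypothetical (v1 ships only the markdown path), and the markdown-flavored adapter here is deliberately trivial, purely to illustrate the shape:

```typescript
interface ParsedSection {
  headingPath: string;
  headingLevel: number;
  content: string;
}

// Hypothetical v2 seam: each format contributes a parser; everything
// downstream (chunking, embedding, storage) consumes text + structure.
interface FormatAdapter {
  extensions: string[];                 // e.g. [".md"], [".docx"]
  parse(raw: string): ParsedSection[];
}

// Toy markdown adapter for illustration only — splits at top-level
// heading markers without building a real breadcrumb.
const markdownAdapter: FormatAdapter = {
  extensions: [".md"],
  parse(raw) {
    const sections: ParsedSection[] = [];
    for (const block of raw.split(/\n(?=#)/)) {
      const m = /^(#{1,6})\s+(.+)/.exec(block);
      if (m) {
        sections.push({
          headingPath: m[2].trim(),
          headingLevel: m[1].length,
          content: block,
        });
      }
    }
    return sections;
  },
};
```

A Word or PDF adapter would implement the same interface with its own extraction logic, and the registry would dispatch on file extension.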

Known Limitations (v1)

  • Markdown only — no Word docs, PDFs, or other formats (v2 adds format adapters via E5)
  • Local MCP only — stdio transport, no remote/hosted access
  • Read-only — no write tools, agents can't create/update docs via MCP
  • No branch awareness — DB reflects whatever files are on disk, no version switching
  • No per-file access control — all files in the docs directory are indexed, or none
  • English-optimized — embedding models work best with English. Multilingual is possible but untested.

Scaling Characteristics

Anvil v1 is designed for project-scale documentation — tens to hundreds of files. Here's where the architecture hits walls at larger scales:

| Scale | Files | Chunks | First Index | Re-index (1 file) | Query | Status |
| --- | --- | --- | --- | --- | --- | --- |
| Small (personal project) | 10-50 | 100-500 | ~5-15s | ~200ms | ~10ms | ✅ Sweet spot |
| Medium (team docs) | 50-500 | 500-5,000 | ~30-90s | ~200ms | ~15ms | ✅ Comfortable |
| Large (enterprise docs) | 500-5,000 | 5,000-50,000 | ~5-15 min | ~300ms | ~50ms | ⚠️ First-run is slow, queries still fast |
| Massive (100K+ files) | 100,000+ | 1M+ | Hours | ~500ms | ~200ms+ | 🔴 Needs architectural changes |

Where it breaks at 100K+ files:

  1. First-run indexing — embedding 1M+ chunks with local ONNX on a single machine takes hours. This is a serial CPU bottleneck.
  2. sqlite-vss search quality — vector search accuracy degrades at high dimensionality/volume. May need approximate nearest neighbor (ANN) indices or a purpose-built vector DB (pgvector, Pinecone).
  3. Memory — loading 1M embeddings (384 dims × 4 bytes × 1M = ~1.5GB) strains a single process.
  4. File watcher — chokidar watching 100K files generates significant OS-level overhead. Some OSes have inotify/FSEvents limits.
  5. Staleness check — mtime scan of 100K files on every query is no longer ~5ms.
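The memory figure in point 3 is simple arithmetic. A quick sketch (illustrative only, using the 384-dimension float32 vectors this doc assumes):

```typescript
// Back-of-envelope memory estimate for holding embeddings in-process.
// Assumes 384-dim float32 vectors, as described in this doc.
function embeddingMemoryBytes(chunks: number, dims = 384, bytesPerFloat = 4): number {
  return chunks * dims * bytesPerFloat;
}

const gib = embeddingMemoryBytes(1_000_000) / 1024 ** 3;
console.log(gib.toFixed(2)); // ~1.43 GiB for 1M chunks, before any overhead
```

Real usage lands higher once you add per-chunk metadata, JS object overhead, and sqlite-vss's own index structures.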

v2 mitigations (not v1 scope):

  • Batch/parallel embedding (worker threads or external GPU inference)
  • Sharded sqlite DBs or migration to a scalable vector store
  • Smarter file watching (git-based change detection instead of filesystem events)
  • Incremental staleness checks (bloom filter or checksum file instead of full scan)
  • Cloud-hosted mode (E6) where indexing happens server-side with better hardware
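The "incremental staleness checks" bullet could look roughly like this: persist a content-hash manifest so only files whose hashes differ are re-indexed. This is a hypothetical sketch of the v2 idea, not committed design; `Manifest` and `changedFiles` are invented names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: a persisted path -> sha256 manifest lets a
// staleness check skip files whose content has not changed.
type Manifest = Record<string, string>;

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

function changedFiles(
  manifest: Manifest,
  files: Record<string, string> // path -> current file content
): string[] {
  return Object.entries(files)
    .filter(([path, content]) => manifest[path] !== sha256(content))
    .map(([path]) => path);
}
```

The same hashing is what makes diff-based re-embedding cheap: unchanged hash means the existing embedding is reused.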

Bottom line: v1 is honest about its scale. It's a local, single-process tool for project documentation. If you're indexing a Fortune 500 company's entire knowledge base, you need the hosted tier (v2+). That's actually a good monetization forcing function — free works great for small/medium, paid kicks in when you need scale.

Tech Debt (Accepted for MVP)

  • No caching layer for query results (fine for local, problem at scale)
  • Only two embedding providers implemented (local ONNX + OpenAI). Interface exists for more.
  • Markdown-only format support. Format adapter interface not yet abstracted.

Pre-Build Checklist

| Task | Owner | Status | Notes |
|---|---|---|---|
| Create @claymore npm org | Dan | Not started | Required before first publish. npx @claymore/anvil depends on this. |
| Create standalone public repo | Dan | Not started | GitHub repo for Anvil source code |

Epic Index

| Epic | Doc | Status | Summary |
|---|---|---|---|
| E1: Core Server, Chunker & Embedder | link | Not started | MCP server skeleton, markdown chunker, ONNX embedder, sqlite-vss storage, file watcher, auto-reindexing |
| E2: MCP Tools & Query Layer | link | Not started | 4 MCP tools (search_docs, get_page, get_section, list_pages), relevance scoring, metadata enrichment |
| E3: Developer Experience | link | Not started | CLI interface, config file support, status output, error messages, README, quickstart guide |

Deferred to v2

| Epic | Summary |
|---|---|
| E4: llms.txt Generation | Auto-generate llms.txt and llms-full.txt for non-MCP clients |
| E5: Format Adapters | Support for Word docs, PDFs, RST, and other non-markdown formats |
| E6: Cloud Sync & Hosted MCP | Remote DB hosting, SSE transport, team sharing |

Dependency Graph

```mermaid
graph LR
    E1[E1: Core Server, Chunker & Embedder] --> E2[E2: MCP Tools & Query Layer]
    E2 --> E3[E3: Developer Experience]
```
  • E1 → E2 is serial (need the indexing pipeline before you can query it)
  • E2 → E3 is serial (need working tools before you polish the DX)
  • Clean serial pipeline — 3 epics for MVP

The simplification: Since the MCP server now owns everything (no separate plugin, no separate build step), E1 and E2 from the old architecture merge conceptually. The chunker, embedder, and DB writer are all internal to the server. E2 is the query/tool layer on top.


Decisions Log

| Date | Decision | Rationale | Alternatives Considered |
|---|---|---|---|
| 2026-03-28 | mkdocs plugin, not fork | Maintain less code, leverage existing ecosystem, easier adoption | Full fork (rejected: maintenance burden), standalone tool (rejected: misses mkdocs integration) |
| 2026-03-28 | sqlite-vss for vector DB | Zero infrastructure, file-based, portable | ChromaDB (heavier, requires server), Pinecone (cloud-only, paid), pgvector (requires Postgres) |
| 2026-03-28 | Two languages (Python + TypeScript) | Play to each ecosystem's strength | All-Python (immature MCP SDK), All-TypeScript (fighting mkdocs from outside) |
| 2026-03-28 | Open-source core, MIT license | Maximize adoption, most recognized license in JS/TS ecosystem, simplest terms. Monetize hosted/enterprise tiers later. | BSD-3 (rejected: less common in JS ecosystem, "no endorsement" clause adds no value), AGPL (rejected: scares enterprise), Apache 2.0 (rejected: patent grants overkill for doc indexing CLI) |
| 2026-03-28 | Heading-based chunking, not token-window | Preserves semantic coherence of doc sections | Fixed token windows (rejected: splits mid-section), page-level (rejected: too coarse) |
| 2026-03-28 | Diff-based re-embedding | Only re-embed changed content — saves cost and time | Full re-embed on every build (rejected: wasteful for large sites) |
| 2026-03-28 | Docs-first, code support deferred to v2 | Docs are structured for humans, easier to chunk well. Code needs AST-aware chunking — different problem. | Code + docs in v1 (rejected: scope creep, different chunking strategies) |
| 2026-03-28 | Develop alongside QuoteAI | Dogfooding from day one — QuoteAI docs are the first content, QuoteAI agents are the first consumers | Build in isolation (rejected: no feedback loop) |
| 2026-03-28 | Local embedding model as default | Zero API keys, zero cost, runs in CI, good enough for bounded-corpus retrieval | OpenAI as default (rejected: friction — requires account, API key, costs money) |
| 2026-03-28 | Read-only MCP server, no write tools | Clean security model, agents already have filesystem access for writing | Read-write MCP (rejected: scope creep, different product — a docs CMS) |
| 2026-03-28 | Local-first deployment for v1 | Simplest model — build locally, DB is a file, MCP server reads it | Cloud-hosted DB (rejected for v1: premature infrastructure) |
| 2026-03-28 | v1 has no branch awareness | DB reflects whatever branch is built. Versioned DBs deferred to v2. | Multi-branch support (rejected: complexity not justified for solo/small team use) |
| 2026-03-28 | Defer llms.txt to v2 | v1 targets MCP-capable clients (OpenClaw, Cursor, Claude Desktop). llms.txt serves non-MCP clients we're not targeting yet. | Include in v1 (rejected: extra code path, no immediate consumer) |
| 2026-03-28 | No MCP write tools | Agents already have filesystem access for writing docs. MCP write adds indirection without capability. Read-only is cleaner. | Read-write MCP (rejected: moot for local agents, scope creep toward docs CMS) |
| 2026-03-28 | Self-managing MCP server (auto-index, file watcher) | Eliminates manual rebuild step entirely. File watcher + staleness check on query = always-fresh DB. | Manual rebuild (rejected: humans forget), git hooks (rejected: unnecessary if server handles it), CI-built DB (rejected: adds sync complexity) |
| 2026-03-28 | Decouple from mkdocs entirely | MCP server reads markdown files directly — no mkdocs dependency. Works with any docs directory. Massively expands addressable market. | mkdocs plugin (rejected: couples to one ecosystem, limits to Python users, requires separate MCP server process) |
| 2026-03-28 | All-TypeScript, single process | ONNX runtime in Node.js eliminates Python dependency. One language, one install, one process. | Python + TypeScript split (rejected: two languages, two installs, shared state via file was fragile) |
| 2026-03-28 | Build despite existing tools | Supply chain ownership (CSDLC depends on this), TypeScript gap in market, CSDLC-native features planned for v2+, heading-based chunking differentiator | Use markdown-vault-mcp (rejected: Python, generic, no CSDLC awareness, supply chain risk), contribute upstream (rejected: different vision, different language) |

Glossary

| Term | Definition |
|---|---|
| Chunk | A semantically meaningful section of documentation, typically defined by heading boundaries. The atomic unit of retrieval. |
| Embedding | A vector representation of text that captures semantic meaning. Enables similarity search — "find docs about X" without keyword matching. |
| Vector DB | A database optimized for storing and searching vector embeddings. sqlite-vss is the local/file-based option. |
| MCP | Model Context Protocol — a standard for LLM clients to connect to external tools and data sources. |
| llms.txt | An emerging convention for making website content LLM-accessible. A site map with descriptions (llms.txt) and a full content dump (llms-full.txt). |
| RAG | Retrieval-Augmented Generation — the pattern of searching for relevant context before generating a response. Anvil enables RAG over documentation. |
| Diff-based re-embedding | Only regenerating embeddings for chunks whose content has changed since the last build. Saves API cost and build time. |
| stdio transport | MCP communication over standard input/output pipes. The simplest transport — the agent spawns the MCP server as a child process. |
| Cold start problem | AI agents start every session with zero context. Without documentation access, they either get context copy-pasted (expensive) or guess (unreliable). |
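For concreteness: the Embedding and RAG entries above boil down to ranking chunks by cosine similarity between the query vector and each chunk vector. A minimal sketch of that math (illustrative only, not Anvil's actual scoring code):

```typescript
// Cosine similarity: 1.0 means same direction (semantically similar),
// 0.0 means orthogonal (unrelated). Vectors here stand in for the
// 384-dim embeddings this doc describes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

search_docs conceptually embeds the query once, scores every chunk this way (via the sqlite-vss index rather than a linear scan), and returns the top-scoring chunks with their metadata.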

This design doc is the source of truth for Anvil project architecture. Epic-level details will live in epics/. Update this doc when architecture changes.
