Session Log — 2026-04-10 Foundry Bug Blitz + Agent QA

AI Lead: Claude Code session in main Foundry workspace. Human: Dan. Duration: single evening session. Outcome: 9 GitHub issues closed, 4 new fixes from agent QA, 1 deferred issue filed.

TL;DR

Started as a triage/execute pass on 5 open Foundry bugs (#98, #115, #116, #117, #119) and 2 freshly filed ones (#121, #122). Ended up running a novel agent-based QA loop: hand a test prompt to a sub-agent with access to all the MCP tools, have it exercise every tool in realistic workflows, and have it report friction points. Round 1 found 10 friction points. Round 2 (after fixes) found 7, of which 3 were false positives and 1 was by-design behavior. Net: ~6 real UX fixes shipped beyond the original bug list, consistent file-based reads for get_section, and new automated tests for every feature added this session.

What Shipped

All commits are on main and pushed to danhannah94/foundry.

Original bug batch (Step 1 execution)

Commit	Issue	Description
`016a362`	#98	`reopen_annotation` now sets status to `draft` instead of `submitted` — reopened annotations land in the editable working set
`1a16dbb`	#117	`create_doc` accepts optional `content` parameter — single-call doc creation with full markdown body
`30dc31e`	#119	`insert_section` uses `subtreeEnd` for insertion point — sequential inserts now produce FIFO order instead of LIFO stack semantics
`9ea36d0`	#115	`findSection()` resolves unambiguous short-form heading paths — `### Tech Stack` works when full path would be `# Title > ## Architecture > ### Tech Stack`
`142e912`	#116	New `move_section` MCP tool — atomically moves a section and all descendants to a new position; guards against self-moves and descendant-of-self moves
`068e0a6`	#122	`update_section` now replaces the entire subtree (uses `subtreeEnd` instead of `bodyEnd`) — updating an H1 replaces everything until the next H1. Also added 14 new vitest tests covering: subtree replacement, content param, FIFO ordering, move_section, short-form resolution
—	#121	Closed as already-fixed — `delete_doc` was added in E11 batch
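
A minimal sketch of the `bodyEnd` vs `subtreeEnd` distinction behind #119 and #122, plus the #116 move guard, assuming a section is a heading line plus everything up to the next heading of the same or higher level. The `Heading` type and the helpers `sectionRange`, `insertUnder`, `replaceSection`, and `assertMovable` are hypothetical illustrations, not Foundry's `section-parser.ts` API, and headings are matched by bare text rather than the `#`-prefixed paths the real tools use.

```typescript
// Hypothetical heading model; Foundry's section-parser.ts types may differ.
interface Heading {
  level: number; // 1 for "#", 2 for "##", ...
  text: string; // heading text without the leading hashes
  line: number; // zero-based line index of the heading
}

type EndMode = "body" | "subtree";

function parseHeadings(lines: string[]): Heading[] {
  const out: Heading[] = [];
  lines.forEach((raw, line) => {
    const m = /^(#{1,6})\s+(.+)$/.exec(raw);
    if (m) out.push({ level: m[1].length, text: m[2].trim(), line });
  });
  return out;
}

// Returns the [start, end) line range of a section.
// "body" ends at the next heading of ANY level (old bodyEnd behaviour).
// "subtree" ends at the next heading of the same or higher level, so it
// also covers every descendant (subtreeEnd behaviour).
function sectionRange(lines: string[], heading: string, mode: EndMode): [number, number] {
  const hs = parseHeadings(lines);
  const i = hs.findIndex((h) => h.text === heading);
  if (i < 0) throw new Error(`section not found: ${heading}`);
  const target = hs[i];
  const next = hs.slice(i + 1).find((h) => mode === "body" || h.level <= target.level);
  return [target.line, next ? next.line : lines.length];
}

// insert_section-style append: new lines go at the end of `parent`.
function insertUnder(lines: string[], parent: string, newLines: string[], mode: EndMode): string[] {
  const [, end] = sectionRange(lines, parent, mode);
  return [...lines.slice(0, end), ...newLines, ...lines.slice(end)];
}

// update_section-style replace: with "subtree" the children are replaced too.
function replaceSection(lines: string[], heading: string, newLines: string[], mode: EndMode): string[] {
  const [start, end] = sectionRange(lines, heading, mode);
  return [...lines.slice(0, start), ...newLines, ...lines.slice(end)];
}

// move_section guard: the destination must not lie inside the source subtree,
// which rejects both self-moves and moves under one's own descendant.
function assertMovable(lines: string[], source: string, dest: string): void {
  const [start, end] = sectionRange(lines, source, "subtree");
  const [destStart] = sectionRange(lines, dest, "subtree");
  if (destStart >= start && destStart < end) {
    throw new Error(`cannot move "${source}" into itself or its descendant "${dest}"`);
  }
}
```

Appending "First", "Second", "Third" under `# Notes` with mode `"subtree"` keeps them in that order; the same calls with `"body"` produce Third, Second, First, which is the LIFO stacking #119 removed.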

Agent QA follow-up fixes

After Round 1 QA found 10 friction points, the following fixes landed:

Commit	Friction Point	Description
`3e9a96d`	FP-2	`get_section` routed through `section-parser.ts` (file-based) for consistent `#`-prefixed heading format across all section tools. Old Anvil-backed route removed from `docs.ts`
`71b4d1e`	FP-9, FP-10	`create_annotation` validates the referenced doc exists (no more orphaned annotations on typo paths); `delete_section` blocks H1 deletion with helpful error pointing to `delete_doc`; stale tool descriptions updated
`00fe0d0`	FP-8	`search_docs` filters results below 0.5 relevance score — nonsense queries no longer return 10 plausibly-scored garbage results
`2ebe200`	FP-1, FP-7	Critical routing fix: `createDocCrudRouter()` now mounts BEFORE `createDocsRouter()` in `index.ts`. The greedy `GET /docs/:path(*)` wildcard in docs.ts was swallowing `GET /docs/:path/sections/:heading` requests before they reached doc-crud — caused `get_section` to always return 404 "Document not found". Also simplified search "no results" warning copy
`2c18724`	FP-6	`reopen_annotation` tool description now explicitly says "returns to draft status for editing before re-submission" — agent QA flagged the old one-line description as insufficient context
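
The FP-8 fix amounts to a score cutoff applied before results are returned. A sketch, assuming each hit carries a relevance score in [0, 1]; the `SearchHit` shape, `filterSearchResults`, and the warning text are hypothetical, and only the 0.5 threshold comes from this log.

```typescript
// Hypothetical hit shape; Foundry's search_docs response may differ.
interface SearchHit {
  path: string;
  heading: string;
  score: number; // relevance, assumed to be in [0, 1]
}

interface SearchResponse {
  results: SearchHit[];
  warning?: string;
}

// Threshold from the FP-8 fix: hits scoring below 0.5 are dropped.
const MIN_RELEVANCE = 0.5;

function filterSearchResults(hits: SearchHit[], minScore: number = MIN_RELEVANCE): SearchResponse {
  const results = hits.filter((h) => h.score >= minScore).sort((a, b) => b.score - a.score);
  if (results.length === 0) {
    // Placeholder copy; the actual simplified warning text is not in the log.
    return { results, warning: "No sufficiently relevant results. Try different search terms." };
  }
  return { results };
}
```

A query like "xyzzy plugh zorkmid" whose hits all score under the cutoff now yields an empty list plus a warning instead of ten plausible-looking matches.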

Test coverage added

doc-crud.test.ts gained 9 new tests (14 new tests in total once the section-parser additions below are counted):

update_section subtree replacement — parent + H1 cases
create_doc content parameter — custom content + template fallback
insert_section FIFO ordering verification
move_section — basic move, move-with-descendants, self-move rejection, missing-source 404

section-parser.test.ts gained a findSection short-form resolution block with 5 tests.

Total: 41 tests, all passing. TypeScript compiles clean.

Agent QA Methodology

Novel pattern introduced this session. Worth documenting in methodology/ later.

The idea

Hand off a structured prompt to a sub-agent with MCP tool access. The prompt tells it to work through realistic scenarios (create → edit → move → delete → annotate → review → cleanup) using every tool in the surface. Crucial rule: the agent must try the intuitive approach first, without reading source code or documentation. Every failure gets logged as a "friction point" with: what was tried, what happened, what was expected, severity, and suggested improvement.
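
One way to keep friction points machine-comparable is to give them a fixed shape. This is a hypothetical sketch: the five core fields come from the rule above, while the `tool` field, the four severity levels, and the helper names are assumptions rather than the actual report format in agent-qa-prompt.md.

```typescript
// Severity scale is an assumption; the prompt's actual scale may differ.
type Severity = "critical" | "high" | "medium" | "low";

interface FrictionPoint {
  id: string; // e.g. "FP-1"
  tool: string; // MCP tool being exercised (assumed extra field)
  tried: string; // what the agent attempted
  happened: string; // what actually came back
  expected: string; // what the agent expected instead
  severity: Severity;
  suggestion: string; // suggested improvement
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Most severe first, so triage into fix now / defer buckets starts at the top.
function sortForTriage(points: FrictionPoint[]): FrictionPoint[] {
  return [...points].sort((a, b) => SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity]);
}

// Renders one entry in a fixed layout so reports from different rounds diff cleanly.
function formatFrictionPoint(fp: FrictionPoint): string {
  return [
    `${fp.id} [${fp.severity}] ${fp.tool}`,
    `  Tried:    ${fp.tried}`,
    `  Got:      ${fp.happened}`,
    `  Expected: ${fp.expected}`,
    `  Suggest:  ${fp.suggestion}`,
  ].join("\n");
}
```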

This catches a class of bugs that unit tests never touch — UX friction, confusing error messages, stale documentation, parameter names that don't match mental models, missing affordances.

What it found that unit tests wouldn't have

FP-1 (get_section routing) — Unit tests mount the doc-crud router in isolation, so they never hit the docs.ts wildcard precedence bug. The agent found it on the first call.
FP-2 (heading format inconsistency) — Unit tests pass canonical paths by convention. An agent trying the intuitive path discovered write tools and read tools disagreed about the format.
FP-8 (search noise) — The threshold was never tested because nonsense queries weren't in the test fixtures. The agent searched for "xyzzy plugh zorkmid" and got 10 plausibly-scored results.
Tool descriptions that had gone stale — agent called out delete_section's "use update_section with empty content" advice which was no longer accurate after the subtreeEnd change.

The loop

Round 1: Agent tests against deployed instance → 10 friction points
Fix the real ones (4 already resolved by earlier session commits, 4 new fixes)
Rebuild, redeploy
Round 2: Same prompt, same instance → 7 friction points
Analyze: 3 were real regressions / edge cases, 3 were false positives (stale cache on fly.dev), 1 was by-design but poorly documented

The round-to-round comparison is the valuable part — you can directly measure whether your fixes reduced friction.
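
The log does not say how rounds were compared. A minimal sketch, assuming each finding has a stable key (for example tool plus symptom) and a triage verdict; `RoundFinding` and `compareRounds` are hypothetical. Only findings triaged as real are counted, so stale-cache false positives do not inflate the later round.

```typescript
// Hypothetical: each finding has a stable key and a triage verdict.
type Verdict = "real" | "false_positive" | "by_design";

interface RoundFinding {
  key: string; // stable identity across rounds, e.g. "get_section:404-on-valid-heading"
  verdict: Verdict;
}

interface RoundDiff {
  resolved: string[]; // real in the earlier round, gone in the later one
  persisting: string[]; // real in both rounds
  introduced: string[]; // real only in the later round (regressions or new edge cases)
  realBefore: number;
  realAfter: number;
}

function realKeys(findings: RoundFinding[]): Set<string> {
  return new Set(findings.filter((f) => f.verdict === "real").map((f) => f.key));
}

function compareRounds(before: RoundFinding[], after: RoundFinding[]): RoundDiff {
  const prev = realKeys(before);
  const next = realKeys(after);
  const prevList = Array.from(prev);
  const nextList = Array.from(next);
  return {
    resolved: prevList.filter((k) => !next.has(k)),
    persisting: prevList.filter((k) => next.has(k)),
    introduced: nextList.filter((k) => !prev.has(k)),
    realBefore: prev.size,
    realAfter: next.size,
  };
}
```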

Prompt template

Lives at packages/api/src/mcp/__tests__/agent-qa-prompt.md in the Foundry repo. The scenarios cover: document lifecycle, section CRUD, content replacement, annotations + reviews, search, edge cases, cleanup. The report format is structured (Friction Report → Friction Points → Tool Coverage Matrix → Positive Notes) so rounds are comparable.

Cost

Each round is ~60-80 tool calls and ~50K tokens in the sub-agent. Cheap. Worth running before every deploy that touches MCP tools.

Remaining Work

Filed as #123: get_page section structure via Anvil chunking

Covers both agent QA findings FP-3 (inconsistent section structure) and FP-4 (create_doc content order scrambled on read). Same root cause: get_page materializes sections from Anvil chunks, which can aggregate siblings into content blobs and reorder based on chunk ingestion order.

Suggested fix (in the issue): add a file-based get_page route in doc-crud.ts that reads the markdown directly and uses parseSections() — same pattern already applied to get_section in commit 3e9a96d. Anvil stays as the semantic search backend; structural reads go through the file.

This is a bigger change because the current response includes lastModified and title metadata from Anvil, plus the frontend may depend on the Anvil-aggregated chunks for certain views. Needs its own refinement pass to scope the rewrite without breaking consumers.
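
#123 proposes a file-based get_page built on parseSections(), whose signature is not shown in this log. As a hypothetical stand-in, `parseSectionTree` below builds the nested section structure straight from the markdown, so siblings stay separate and order always matches the file, the two properties FP-3 and FP-4 report losing with Anvil chunks. It assumes ATX (`#`) headings only and does not skip `#` lines inside fenced code blocks.

```typescript
// Hypothetical node shape for a file-based get_page response.
interface SectionNode {
  heading: string; // full heading line, e.g. "## Architecture"
  level: number;
  content: string; // body text up to the next heading of any level
  children: SectionNode[];
}

// Builds a heading tree from the markdown, preserving file order.
// Text before the first heading is ignored in this sketch.
function parseSectionTree(markdown: string): SectionNode[] {
  const roots: SectionNode[] = [];
  const stack: SectionNode[] = []; // open ancestors, outermost first
  const body: string[] = [];
  let current: SectionNode | undefined;

  // Store the buffered body lines on the section that owns them.
  const flush = (): void => {
    if (current) current.content = body.join("\n").trim();
    body.length = 0;
  };

  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.+)$/.exec(line);
    if (!m) {
      if (current) body.push(line);
      continue;
    }
    flush();
    const node: SectionNode = { heading: line.trim(), level: m[1].length, content: "", children: [] };
    // Pop until the top of the stack is a strict ancestor of this heading.
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) stack.pop();
    const parent = stack[stack.length - 1];
    if (parent) parent.children.push(node);
    else roots.push(node);
    stack.push(node);
    current = node;
  }
  flush();
  return roots;
}
```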

Not filed

Tool description staleness generally — delete_section description was hand-updated this session. No systemic check catches drift between tool descriptions and actual behavior. Could be a future lint / test: parse the tool's JSDoc on the handler, diff against the MCP description field, fail CI on mismatch. Low priority but would have saved a round of QA friction this session.
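
A rough sketch of the proposed drift lint, assuming each MCP tool records its handler name and that the handler's JSDoc summary is the source of truth. `ToolSpec`, `extractJsDocSummary`, and `findDescriptionDrift` are hypothetical; a production version would read JSDoc through the TypeScript compiler API instead of a regex. Summaries are compared after collapsing whitespace and case.

```typescript
interface ToolSpec {
  name: string; // MCP tool name, e.g. "delete_section"
  handler: string; // name of the handler function in the source file
  description: string; // description string registered with MCP
}

interface Drift {
  tool: string;
  jsdoc?: string;
  description: string;
}

const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const normalize = (s: string): string => s.replace(/\s+/g, " ").trim().toLowerCase();

// Returns the prose of the JSDoc block directly above `function <name>` or
// `const <name> =`, dropping leading "*" markers and @tag lines.
function extractJsDocSummary(source: string, handler: string): string | undefined {
  const name = escapeRegExp(handler);
  // (?:(?!\*/)[\s\S])* stops the capture at the first "*/", so one match
  // can never span two neighbouring JSDoc blocks.
  const re = new RegExp(
    String.raw`/\*\*((?:(?!\*/)[\s\S])*)\*/\s*(?:export\s+)?(?:async\s+)?(?:function\s+${name}\b|const\s+${name}\s*=)`,
  );
  const m = re.exec(source);
  if (!m) return undefined;
  return m[1]
    .split("\n")
    .map((line) => line.replace(/^\s*\*\s?/, "").trim())
    .filter((line) => line.length > 0 && !line.startsWith("@"))
    .join(" ");
}

// Flags every tool whose registered description differs from its handler's JSDoc
// (or whose handler has no JSDoc at all). A CI test would fail when this is non-empty.
function findDescriptionDrift(source: string, tools: ToolSpec[]): Drift[] {
  const drift: Drift[] = [];
  for (const t of tools) {
    const jsdoc = extractJsDocSummary(source, t.handler);
    if (jsdoc === undefined || normalize(jsdoc) !== normalize(t.description)) {
      drift.push({ tool: t.name, jsdoc, description: t.description });
    }
  }
  return drift;
}
```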

Deferred from NEXT.md (not touched this session)

PAT rotation reminder (still unrotated — mentioned in status/next.md)
Corrupted projects/foundry-cc-bridge/design page cleanup
foundry-cc-bridge design doc refinement (the original session target)
Foundry-as-chat shared queue mini-design

Lessons Learned

Agent QA is high-leverage

The first round took ~8 minutes of agent work and surfaced issues that would have taken days to hit organically (the routing precedence bug on get_section would have bitten the first real user on read). The structured report format made it trivial to triage into "fix now / defer / already fixed" buckets. Recommend adding this to the Foundry release checklist for any MCP tool changes.

Express route precedence is a trap for wildcard routes

The GET /docs/:path(*) wildcard in docs.ts silently swallowed GET /docs/:path/sections/:heading because of mount order. This bug was invisible to unit tests (they mount routers in isolation) and invisible to the tool description (which documented the correct API). Only a full-stack integration test (agent QA or similar) catches it.

General rule: any time you see a :path(*) wildcard route, mount specific routes that share that prefix BEFORE the wildcard. Add a comment near the wildcard explaining the constraint.
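
The failure mode can be reproduced without Express. The toy first-match dispatcher below models Express's in-order route matching with simplified patterns (`:name` matches one path segment, `:name(*)` matches the rest), so `:path` here cannot contain slashes the way real doc paths can. `Route`, `compile`, and `dispatch` are hypothetical and are not Express APIs.

```typescript
// Toy model of in-order route dispatch; not Express itself.
interface Route {
  name: string;
  pattern: string; // e.g. "/docs/:path(*)" or "/docs/:path/sections/:heading"
}

// Converts a simplified pattern into an anchored RegExp.
function compile(pattern: string): RegExp {
  const src = pattern
    .split("/")
    .map((seg) => {
      if (/^:\w+\(\*\)$/.test(seg)) return ".+"; // greedy wildcard: rest of the URL
      if (seg.startsWith(":")) return "[^/]+"; // named param: exactly one segment
      return seg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal segment
    })
    .join("/");
  return new RegExp(`^${src}$`);
}

// First registered route that matches wins, which is why mount order matters.
function dispatch(routes: Route[], url: string): string | undefined {
  return routes.find((r) => compile(r.pattern).test(url))?.name;
}
```

With the wildcard registered first, `/docs/guide/sections/intro` is claimed by the wildcard handler. Registering the specific route first sends that URL to doc-crud, while `/docs/guide` still falls through to the wildcard; that is the ordering change made in `index.ts`.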

Worktree cleanup inconsistency

Sub-agents launched with isolation: "worktree" sometimes committed to loose branches (cleaned up on success) and sometimes to their own branches that needed rebasing against main. The #117 fix required a rebase because it was based on old main while #98 and #119 committed directly. Not a blocker but worth documenting in the agent orchestration docs — prefer fast-forward-able branches and check parent commits before merging.

File-based reads > Anvil-backed reads for structural operations

The get_section fix (3e9a96d) established a principle: if you need the exact document structure, read the file directly and parse it. Anvil is for semantic search; it chunks and embeds, which is lossy for structure. Issue #123 applies the same principle to get_page.

Consistent error patterns are best-in-class

Both QA rounds praised the available_headings in 404 errors as a standout UX pattern. When a tool can't find what you asked for, telling you what exists is dramatically better than a bare "not found." Apply this pattern everywhere you have a discoverable namespace (section addresses, annotation IDs, review IDs, doc paths). The delete_doc response's annotations_deleted count is the same idea — tell the caller what cascaded so they understand the side effects.
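
A sketch of the available_headings pattern combined with #115's short-form resolution, assuming headings are addressed by full paths such as `# Title > ## Architecture > ### Tech Stack`. `resolveSection` and its response shapes are hypothetical; in particular, the 409 with `candidates` for an ambiguous short form is this sketch's choice, not documented Foundry behavior.

```typescript
// Hypothetical response shapes; Foundry's real payloads may differ.
type Resolved = { status: 200; path: string };
type Ambiguous = { status: 409; error: string; candidates: string[] };
type NotFound = { status: 404; error: string; available_headings: string[] };

// `paths` are full heading paths, e.g. "# Title > ## Architecture > ### Tech Stack".
function resolveSection(paths: string[], query: string): Resolved | Ambiguous | NotFound {
  // 1. An exact full-path match always wins.
  if (paths.includes(query)) return { status: 200, path: query };

  // 2. Short form: the query matches the trailing segment(s) of a full path.
  const matches = paths.filter((p) => p.endsWith(` > ${query}`));
  if (matches.length === 1) return { status: 200, path: matches[0] };
  if (matches.length > 1) {
    return { status: 409, error: `Heading "${query}" is ambiguous`, candidates: matches };
  }

  // 3. Miss: report what does exist instead of a bare "not found".
  return { status: 404, error: `Section "${query}" not found`, available_headings: paths };
}
```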

One-shot doc creation eliminates a huge source of friction

The content parameter on create_doc (#117) turned 8 MCP calls into 1 for a typical NEXT.md-style document. The old pattern (create + N inserts) was also where most of the "duplicate heading" and "section ordering" bugs originated. Single-call creation sidesteps all of that. Consider the same pattern elsewhere: a replace_doc_content(path, content) tool would give you atomic full-document rewrites without the section-by-section addressing dance.
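
A sketch of the suggested replace_doc_content(path, content) tool, which does not exist yet. It assumes documents are plain markdown files on a local filesystem and uses the common write-to-temp-then-rename pattern for atomicity; `replaceDocContent` and its return shape are hypothetical, and Anvil re-indexing and annotation handling are left out.

```typescript
import { mkdtempSync, readFileSync, renameSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Hypothetical replace_doc_content handler (not an existing Foundry tool).
// Writes the new markdown to a temp file in the SAME directory, then renames it
// over the target. On POSIX filesystems rename() swaps the file in one step, so a
// concurrent reader sees either the old document or the new one, never a mix.
function replaceDocContent(filePath: string, content: string): { path: string; bytes: number } {
  const suffix = Math.random().toString(36).slice(2);
  const tmp = join(dirname(filePath), `.${basename(filePath)}.${suffix}.tmp`);
  try {
    writeFileSync(tmp, content, "utf8");
    renameSync(tmp, filePath);
  } catch (err) {
    rmSync(tmp, { force: true }); // don't leave a stray temp file behind
    throw err;
  }
  return { path: filePath, bytes: Buffer.byteLength(content, "utf8") };
}

// Usage against a throwaway directory.
const dir = mkdtempSync(join(tmpdir(), "foundry-doc-"));
const target = join(dir, "NEXT.md");
writeFileSync(target, "# Old title\n");
replaceDocContent(target, "# NEXT\n\n## Plan\n\n## Deferred\n");
console.log(readFileSync(target, "utf8"));
rmSync(dir, { recursive: true, force: true });
```

Keeping the temp file in the target's own directory matters: a rename across filesystems is not atomic and can fail outright.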
