Session Log — 2026-04-10 Foundry Bug Blitz + Agent QA
AI Lead: Claude Code session in main Foundry workspace. Human: Dan. Duration: single evening session. Outcome: 9 GitHub issues closed, 4 new fixes from agent QA, 1 deferred issue filed.
TL;DR
Started as a triage/execute pass on 5 open Foundry bugs (#98, #115, #116, #117, #119) and 2 freshly-filed ones (#121, #122). Ended up running a novel agent-based QA loop — hand off a test prompt to a sub-agent with all the MCP tools, have it exercise every tool in realistic workflows, and report friction points. Round 1 found 10 friction points. Round 2 (after fixes) found 7, with 3 of those being false positives or by-design behavior. Net: ~6 real UX fixes shipped beyond the original bug list, plus consistent file-based reads for get_section, plus new automated tests for every feature added this session.
What Shipped
All commits are on main and pushed to danhannah94/foundry.
Original bug batch (Step 1 execution)
| Commit | Issue | Description |
|---|---|---|
| 016a362 | #98 | reopen_annotation now sets status to draft instead of submitted, so reopened annotations land in the editable working set |
| 1a16dbb | #117 | create_doc accepts optional content parameter: single-call doc creation with full markdown body |
| 30dc31e | #119 | insert_section uses subtreeEnd for insertion point, so sequential inserts now produce FIFO order instead of LIFO stack semantics |
| 9ea36d0 | #115 | findSection() resolves unambiguous short-form heading paths: ### Tech Stack works when full path would be # Title > ## Architecture > ### Tech Stack |
| 142e912 | #116 | New move_section MCP tool: atomically moves a section and all descendants to a new position; guards against self-moves and descendant-of-self moves |
| 068e0a6 | #122 | update_section now replaces the entire subtree (uses subtreeEnd instead of bodyEnd), so updating an H1 replaces everything until the next H1. Also added 14 new vitest tests covering: subtree replacement, content param, FIFO ordering, move_section, short-form resolution |
| n/a | #121 | Closed as already-fixed: delete_doc was added in E11 batch |
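The bodyEnd / subtreeEnd distinction behind #119 and #122 can be sketched as follows. This is a minimal illustration, assuming bodyEnd is the next heading of any level and subtreeEnd is the next heading of the same or shallower level; SectionSpan and computeSpans are hypothetical names, not the real section-parser.ts code.

```typescript
// Hypothetical sketch; not the actual section-parser.ts implementation.
interface SectionSpan {
  level: number;      // 1 for "#", 2 for "##", and so on
  line: number;       // index of the heading line
  bodyEnd: number;    // exclusive end of the section's own body
  subtreeEnd: number; // exclusive end of the section plus all descendants
}

function computeSpans(lines: string[]): SectionSpan[] {
  const spans: SectionSpan[] = [];
  lines.forEach((text, line) => {
    const hashes = /^(#{1,6})\s+\S/.exec(text)?.[1];
    if (hashes !== undefined) {
      // Default both ends to end-of-document; tightened in the second pass.
      spans.push({ level: hashes.length, line, bodyEnd: lines.length, subtreeEnd: lines.length });
    }
  });
  spans.forEach((span, i) => {
    const later = spans.slice(i + 1);
    // Body stops at the next heading of any level.
    const nextHeading = later[0];
    if (nextHeading !== undefined) span.bodyEnd = nextHeading.line;
    // Subtree stops at the next heading that is not a descendant.
    const nextPeer = later.find((s) => s.level <= span.level);
    if (nextPeer !== undefined) span.subtreeEnd = nextPeer.line;
  });
  return spans;
}

const sampleDoc = [
  "# Title",         // line 0
  "intro",           // line 1
  "## Architecture", // line 2
  "body",            // line 3
  "### Tech Stack",  // line 4
  "stack notes",     // line 5
  "## Roadmap",      // line 6
  "plans",           // line 7
];
const sampleSpans = computeSpans(sampleDoc);
```

Under these assumptions, Architecture has bodyEnd 4 and subtreeEnd 6. Inserting a new child at bodyEnd would land ahead of Tech Stack, which is consistent with the LIFO ordering reported in #119; inserting at subtreeEnd appends after the existing children.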
Agent QA follow-up fixes
After Round 1 QA found 10 friction points, the following fixes landed:
| Commit | Friction Point | Description |
|---|---|---|
| 3e9a96d | FP-2 | get_section routed through section-parser.ts (file-based) for consistent #-prefixed heading format across all section tools. Old Anvil-backed route removed from docs.ts |
| 71b4d1e | FP-9, FP-10 | create_annotation validates the referenced doc exists (no more orphaned annotations on typo paths); delete_section blocks H1 deletion with helpful error pointing to delete_doc; stale tool descriptions updated |
| 00fe0d0 | FP-8 | search_docs filters results below 0.5 relevance score, so nonsense queries no longer return 10 plausibly-scored garbage results |
| 2ebe200 | FP-1, FP-7 | Critical routing fix: createDocCrudRouter() now mounts BEFORE createDocsRouter() in index.ts. The greedy GET /docs/:path(*) wildcard in docs.ts was swallowing GET /docs/:path/sections/:heading requests before they reached doc-crud, which caused get_section to always return 404 "Document not found". Also simplified search "no results" warning copy |
| 2c18724 | FP-6 | reopen_annotation tool description now explicitly says "returns to draft status for editing before re-submission"; agent QA flagged the old one-line description as insufficient context |
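As a rough illustration of the FP-8 fix, a relevance cutoff could look like the sketch below. Only the 0.5 threshold comes from this log; ScoredHit, FilteredSearch, filterByRelevance, and the warning wording are assumptions made for the example.

```typescript
// Hypothetical shapes; only the 0.5 cutoff is taken from the session notes.
interface ScoredHit {
  path: string;
  score: number; // relevance score, assumed to lie in [0, 1]
}

interface FilteredSearch {
  results: ScoredHit[];
  warning?: string; // present only when nothing clears the cutoff
}

const MIN_RELEVANCE = 0.5;

function filterByRelevance(hits: ScoredHit[], minScore: number = MIN_RELEVANCE): FilteredSearch {
  // Keep hits at or above the cutoff; drop everything below it.
  const results = hits.filter((hit) => hit.score >= minScore);
  if (results.length > 0) {
    return { results };
  }
  return { results, warning: "No results above the relevance threshold." };
}

const mixedHits: ScoredHit[] = [
  { path: "docs/a", score: 0.82 },
  { path: "docs/b", score: 0.5 },
  { path: "docs/c", score: 0.49 },
];
const keptHits = filterByRelevance(mixedHits);
const noiseOnly = filterByRelevance([{ path: "docs/d", score: 0.31 }]);
```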
Test coverage added
doc-crud.test.ts gained 9 new test blocks (14 tests total):
- update_section subtree replacement: parent + H1 cases
- create_doc content parameter: custom content + template fallback
- insert_section FIFO ordering verification
- move_section: basic move, move-with-descendants, self-move rejection, missing-source 404
section-parser.test.ts gained findSection — short-form resolution block with 5 tests.
Total: 41 tests, all passing. TypeScript compiles clean.
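For context on what the short-form resolution tests exercise, here is a minimal sketch of the behavior described for #115, assuming a short form resolves only when exactly one full path ends with it. resolveHeadingPath and HeadingLookup are hypothetical names; the real findSection() may differ.

```typescript
// Hypothetical sketch of short-form resolution; not the real findSection().
type HeadingLookup =
  | { kind: "found"; path: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "missing" };

function resolveHeadingPath(query: string, fullPaths: string[]): HeadingLookup {
  // An exact full-path match always wins.
  if (fullPaths.includes(query)) {
    return { kind: "found", path: query };
  }
  // Otherwise treat the query as a trailing fragment of a full path.
  const suffix = " > " + query;
  const [first, ...rest] = fullPaths.filter((p) => p.endsWith(suffix));
  if (first === undefined) {
    return { kind: "missing" };
  }
  if (rest.length === 0) {
    return { kind: "found", path: first };
  }
  return { kind: "ambiguous", candidates: [first, ...rest] };
}

const knownPaths = [
  "# Title",
  "# Title > ## Architecture",
  "# Title > ## Architecture > ### Tech Stack",
  "# Title > ## Architecture > ### Notes",
  "# Title > ## Ops",
  "# Title > ## Ops > ### Notes",
];
const techStack = resolveHeadingPath("### Tech Stack", knownPaths);
const notes = resolveHeadingPath("### Notes", knownPaths);
const scopedNotes = resolveHeadingPath("## Ops > ### Notes", knownPaths);
```

In this sketch "### Notes" is rejected as ambiguous because two sections share that heading, while "## Ops > ### Notes" adds enough context to resolve.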
Agent QA Methodology
Novel pattern introduced this session. Worth documenting in methodology/ later.
The idea
Hand off a structured prompt to a sub-agent with MCP tool access. The prompt tells it to work through realistic scenarios (create → edit → move → delete → annotate → review → cleanup) using every tool in the surface. Crucial rule: the agent must try the intuitive approach first, without reading source code or documentation. Every failure gets logged as a "friction point" with: what was tried, what happened, what was expected, severity, and suggested improvement.
This catches a class of bugs that unit tests never touch — UX friction, confusing error messages, stale documentation, parameter names that don't match mental models, missing affordances.
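The friction-point record described above maps naturally onto a small data structure. The sketch below is one possible shape: the field names, the three severity levels, and the triageOrder helper are assumptions, and the two sample entries are paraphrased from this log with illustrative severities.

```typescript
// Hypothetical record shape for one friction point in an agent QA report.
type FrictionSeverity = "low" | "medium" | "high";

interface FrictionPoint {
  id: string;         // e.g. "FP-1"
  tool: string;       // MCP tool that was exercised
  tried: string;      // what the agent attempted
  happened: string;   // what actually occurred
  expected: string;   // what the agent expected instead
  severity: FrictionSeverity;
  suggestion: string; // proposed improvement
}

const severityRank: Record<FrictionSeverity, number> = { high: 0, medium: 1, low: 2 };

// Returns a new array with the most severe points first; input is not mutated.
function triageOrder(points: FrictionPoint[]): FrictionPoint[] {
  return [...points].sort((a, b) => severityRank[a.severity] - severityRank[b.severity]);
}

const samplePoints: FrictionPoint[] = [
  {
    id: "FP-8",
    tool: "search_docs",
    tried: "Searched for a nonsense query",
    happened: "Got 10 plausibly-scored results",
    expected: "No results, or a clear low-relevance signal",
    severity: "medium", // illustrative value
    suggestion: "Filter results below a relevance threshold",
  },
  {
    id: "FP-1",
    tool: "get_section",
    tried: "Read a section of an existing doc",
    happened: "404 Document not found",
    expected: "The section content",
    severity: "high", // illustrative value
    suggestion: "Check route handling for section reads",
  },
];
const triaged = triageOrder(samplePoints);
```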
What it found that unit tests wouldn't have
- FP-1 (get_section routing): Unit tests mount the doc-crud router in isolation, so they never hit the docs.ts wildcard precedence bug. The agent found it on the first call.
- FP-2 (heading format inconsistency): Unit tests pass canonical paths by convention. An agent trying the intuitive path discovered write tools and read tools disagreed about the format.
- FP-8 (search noise): The threshold was never tested because nonsense queries weren't in the test fixtures. The agent searched for "xyzzy plugh zorkmid" and got 10 plausibly-scored results.
- Tool descriptions that had gone stale: the agent called out delete_section's "use update_section with empty content" advice, which was no longer accurate after the subtreeEnd change.
The loop
- Round 1: Agent tests against deployed instance → 10 friction points
- Fix the real ones (4 already resolved by earlier session commits, 4 new fixes)
- Rebuild, redeploy
- Round 2: Same prompt, same instance → 7 friction points
- Analyze: 3 were real regressions / edge cases, 3 were false positives (stale cache on fly.dev), 1 was by-design but poorly documented
The round-to-round comparison is the valuable part — you can directly measure whether your fixes reduced friction.
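One way to make that comparison mechanical is to give each friction point a stable key and diff the key sets between rounds. This is a sketch under that assumption; compareRounds, RoundDiff, and the keys below are made up for illustration and are not this session's actual data.

```typescript
// Hypothetical round-to-round diff over stable friction-point keys.
interface RoundDiff {
  resolved: string[];   // in the previous round, gone in the current one
  persisting: string[]; // reported in both rounds
  introduced: string[]; // new in the current round
}

function compareRounds(previous: string[], current: string[]): RoundDiff {
  const prevKeys = new Set(previous);
  const currKeys = new Set(current);
  return {
    resolved: previous.filter((key) => !currKeys.has(key)),
    persisting: previous.filter((key) => currKeys.has(key)),
    introduced: current.filter((key) => !prevKeys.has(key)),
  };
}

// Made-up keys, purely for illustration.
const roundOneKeys = ["tool-a:bad-error", "tool-b:wrong-order", "tool-c:stale-description"];
const roundTwoKeys = ["tool-b:wrong-order", "tool-d:missing-param"];
const roundDiff = compareRounds(roundOneKeys, roundTwoKeys);
```

The diff only works if keys stay stable across rounds, so a key derived from tool name plus symptom is likely more robust than the per-report FP numbering.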
Prompt template
Lives at packages/api/src/mcp/__tests__/agent-qa-prompt.md in the Foundry repo. The scenarios cover: document lifecycle, section CRUD, content replacement, annotations + reviews, search, edge cases, cleanup. The report format is structured (Friction Report → Friction Points → Tool Coverage Matrix → Positive Notes) so rounds are comparable.
Cost
Each round is ~60-80 tool calls and ~50K tokens in the sub-agent. Cheap. Worth running before every deploy that touches MCP tools.
Remaining Work
Filed as #123: get_page section structure via Anvil chunking
Covers both agent QA findings FP-3 (inconsistent section structure) and FP-4 (create_doc content order scrambled on read). Same root cause: get_page materializes sections from Anvil chunks, which can aggregate siblings into content blobs and reorder based on chunk ingestion order.
Suggested fix (in the issue): add a file-based get_page route in doc-crud.ts that reads the markdown directly and uses parseSections() — same pattern already applied to get_section in commit 3e9a96d. Anvil stays as the semantic search backend; structural reads go through the file.
This is a bigger change because the current response includes lastModified and title metadata from Anvil, plus the frontend may depend on the Anvil-aggregated chunks for certain views. Needs its own refinement pass to scope the rewrite without breaking consumers.
Not filed
- Tool description staleness generally: the delete_section description was hand-updated this session. No systemic check catches drift between tool descriptions and actual behavior. Could be a future lint / test: parse the tool's JSDoc on the handler, diff against the MCP description field, fail CI on mismatch. Low priority but would have saved a round of QA friction this session.
Deferred from NEXT.md (not touched this session)
- PAT rotation reminder (still unrotated; mentioned in status/next.md)
- Corrupted projects/foundry-cc-bridge/design page cleanup
- foundry-cc-bridge design doc refinement (the original session target)
- Foundry-as-chat shared queue mini-design
Lessons Learned
Agent QA is high-leverage
The first round took ~8 minutes of agent work and surfaced issues that would have taken days to hit organically (the routing precedence bug on get_section would have bitten the first real user on read). The structured report format made it trivial to triage into "fix now / defer / already fixed" buckets. Recommend adding this to the Foundry release checklist for any MCP tool changes.
Express route precedence is a trap for wildcard routes
The GET /docs/:path(*) wildcard in docs.ts silently swallowed GET /docs/:path/sections/:heading because of mount order. This bug was invisible to unit tests (they mount routers in isolation) and invisible to the tool description (which documented the correct API). Only a full-stack integration test (agent QA or similar) catches it.
General rule: any time you see a :path(*) wildcard route, mount specific routes that share that prefix BEFORE the wildcard. Add a comment near the wildcard explaining the constraint.
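The precedence problem can be reproduced without Express at all. The sketch below is a simplified first-match model of mounted routes; the regular expressions are hand-written approximations of the two route paths, and MountedRoute, firstMatch, and the example document path are hypothetical.

```typescript
// Simplified first-match model of router mounting. The patterns are
// hand-written approximations, not Express's real path compiler.
interface MountedRoute {
  owner: string;   // file that registers the route
  pattern: RegExp; // approximation of the route path
}

// Approximates GET /docs/:path(*), which matches anything under /docs/.
const docsWildcard: MountedRoute = { owner: "docs.ts", pattern: /^\/docs\/(.+)$/ };
// Approximates GET /docs/:path/sections/:heading.
const sectionRead: MountedRoute = {
  owner: "doc-crud.ts",
  pattern: /^\/docs\/(.+)\/sections\/([^/]+)$/,
};

// Returns the owner of the first route whose pattern matches, in mount order.
function firstMatch(mounted: MountedRoute[], requestPath: string): string | undefined {
  return mounted.find((route) => route.pattern.test(requestPath))?.owner;
}

const sectionRequest = "/docs/projects/example/design/sections/Overview";
// Wildcard mounted first: it swallows the section request.
const buggyOrder = firstMatch([docsWildcard, sectionRead], sectionRequest);
// Specific route mounted first: the section request reaches doc-crud.
const fixedOrder = firstMatch([sectionRead, docsWildcard], sectionRequest);
// Plain document reads still fall through to the wildcard.
const plainRead = firstMatch([sectionRead, docsWildcard], "/docs/projects/example/design");
```

A small table-driven test over mount order like this could run alongside the isolated router tests, since it exercises exactly the interaction those tests miss.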
Worktree cleanup inconsistency
Sub-agents launched with isolation: "worktree" sometimes committed to loose branches (cleaned up on success) and sometimes to their own branches that needed rebasing against main. The #117 fix required a rebase because it was based on old main while #98 and #119 committed directly. Not a blocker but worth documenting in the agent orchestration docs — prefer fast-forward-able branches and check parent commits before merging.
File-based reads > Anvil-backed reads for structural operations
The get_section fix (3e9a96d) established a principle: if you need the exact document structure, read the file directly and parse it. Anvil is for semantic search; it chunks and embeds, which is lossy for structure. Issue #123 applies the same principle to get_page.
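To show why a file-based read preserves structure, here is a minimal sketch that walks markdown text in order and builds full heading paths. It is not the real parseSections(); parseHeadingPaths and ParsedHeading are hypothetical, and the sketch assumes ATX-style headings with no heading-like lines inside code fences.

```typescript
// Hypothetical sketch of file-based structural parsing; not parseSections().
interface ParsedHeading {
  level: number; // 1 for "#", 2 for "##", and so on
  title: string; // heading text without the hashes
  path: string;  // full path, e.g. "# Title > ## Architecture"
}

function parseHeadingPaths(markdown: string): ParsedHeading[] {
  const ancestors: ParsedHeading[] = []; // current chain from H1 downward
  const ordered: ParsedHeading[] = [];   // headings in file order
  for (const line of markdown.split("\n")) {
    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(line);
    const hashes = match?.[1];
    const title = match?.[2];
    if (hashes === undefined || title === undefined) continue;
    const level = hashes.length;
    // Pop until the top of the stack is a strict ancestor of this heading.
    for (;;) {
      const top = ancestors[ancestors.length - 1];
      if (top === undefined || top.level < level) break;
      ancestors.pop();
    }
    const label = `${hashes} ${title}`;
    const parent = ancestors[ancestors.length - 1];
    const heading: ParsedHeading = {
      level,
      title,
      path: parent === undefined ? label : `${parent.path} > ${label}`,
    };
    ancestors.push(heading);
    ordered.push(heading);
  }
  return ordered;
}

const sampleMarkdown = [
  "# Title",
  "",
  "intro",
  "",
  "## Architecture",
  "",
  "### Tech Stack",
  "",
  "## Roadmap",
].join("\n");
const headingPaths = parseHeadingPaths(sampleMarkdown).map((h) => h.path);
```

Because the output follows file order line by line, there is no chunk ingestion step that could reorder or merge siblings.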
Consistent error patterns are best-in-class
Both QA rounds praised the available_headings in 404 errors as a standout UX pattern. When a tool can't find what you asked for, telling you what exists is dramatically better than a bare "not found." Apply this pattern everywhere you have a discoverable namespace (section addresses, annotation IDs, review IDs, doc paths). The delete_doc response's annotations_deleted count is the same idea — tell the caller what cascaded so they understand the side effects.
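A sketch of the pattern, assuming a JSON error body: only the available_headings field name comes from this log, while buildSectionNotFound, the other fields, and the message wording are hypothetical.

```typescript
// Hypothetical 404 payload; only `available_headings` is named in the log.
interface SectionNotFoundBody {
  error: string;
  requested: string;
  available_headings: string[];
}

interface NotFoundResponse {
  status: number;
  body: SectionNotFoundBody;
}

function buildSectionNotFound(requested: string, headings: string[]): NotFoundResponse {
  return {
    status: 404,
    body: {
      error: `Section not found: ${requested}`,
      requested,
      // Copy the list so later mutation of the source cannot change the response.
      available_headings: [...headings],
    },
  };
}

const notFound = buildSectionNotFound("## Archtecture", [
  "# Title",
  "# Title > ## Architecture",
]);
```

The same shape would carry over to other namespaces by swapping the list field, for example a list of known annotation IDs or doc paths.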
One-shot doc creation eliminates a huge source of friction
The content parameter on create_doc (#117) turned 8 MCP calls into 1 for a typical NEXT.md-style document. The old pattern (create + N inserts) was also where most of the "duplicate heading" and "section ordering" bugs originated. Single-call creation sidesteps all of that. Consider the same pattern elsewhere: a replace_doc_content(path, content) tool would give you atomic full-document rewrites without the section-by-section addressing dance.