
Session Log — 2026-04-10 Foundry Bug Blitz + Agent QA

AI Lead: Claude Code session in main Foundry workspace. Human: Dan. Duration: single evening session. Outcome: 9 GitHub issues closed, 4 new fixes from agent QA, 1 deferred issue filed.


TL;DR

Started as a triage/execute pass on 5 open Foundry bugs (#98, #115, #116, #117, #119) and 2 freshly-filed ones (#121, #122). Ended up running a novel agent-based QA loop: hand off a test prompt to a sub-agent with all the MCP tools, have it exercise every tool in realistic workflows, and report friction points. Round 1 found 10 friction points. Round 2 (after fixes) found 7, with 4 of those being false positives or by-design behavior (3 false positives, 1 by-design). Net: ~6 real UX fixes shipped beyond the original bug list, plus consistent file-based reads for get_section, plus new automated tests for every feature added this session.


What Shipped

All commits are on main and pushed to danhannah94/foundry.

Original bug batch (Step 1 execution)

| Commit | Issue | Description |
| --- | --- | --- |
| 016a362 | #98 | reopen_annotation now sets status to draft instead of submitted, so reopened annotations land in the editable working set |
| 1a16dbb | #117 | create_doc accepts optional content parameter: single-call doc creation with full markdown body |
| 30dc31e | #119 | insert_section uses subtreeEnd for insertion point; sequential inserts now produce FIFO order instead of LIFO stack semantics |
| 9ea36d0 | #115 | findSection() resolves unambiguous short-form heading paths: ### Tech Stack works when full path would be # Title > ## Architecture > ### Tech Stack |
| 142e912 | #116 | New move_section MCP tool: atomically moves a section and all descendants to a new position; guards against self-moves and descendant-of-self moves |
| 068e0a6 | #122 | update_section now replaces the entire subtree (uses subtreeEnd instead of bodyEnd); updating an H1 replaces everything until the next H1. Also added 14 new vitest tests covering subtree replacement, content param, FIFO ordering, move_section, and short-form resolution |
| (none) | #121 | Closed as already-fixed; delete_doc was added in the E11 batch |
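The bodyEnd versus subtreeEnd distinction behind #119 (and the replacement boundary in #122) can be illustrated with a small sketch. It is not Foundry's section-parser: `SectionSpan`, `computeSpans`, and `insertAt` are hypothetical names, and only the two boundary names come from the log. It assumes a document held as an array of lines with `#`-prefixed headings.

```typescript
// Hypothetical model of a parsed section. Only bodyEnd and subtreeEnd are
// names taken from the log; everything else is illustrative.
interface SectionSpan {
  heading: string;    // heading line, e.g. "## Architecture"
  level: number;      // number of leading '#' characters
  start: number;      // index of the heading line
  bodyEnd: number;    // exclusive end of the section's own body (next heading of any level)
  subtreeEnd: number; // exclusive end including descendants (next heading at same or shallower level)
}

// Naive heading scan; '#' lines inside fenced code blocks are not handled.
function computeSpans(lines: string[]): SectionSpan[] {
  const heads: { heading: string; level: number; start: number }[] = [];
  lines.forEach((line, i) => {
    const m = /^(#{1,6})\s+\S/.exec(line);
    if (m) heads.push({ heading: line.trim(), level: m[1].length, start: i });
  });
  return heads.map((h, idx) => {
    const bodyEnd = idx + 1 < heads.length ? heads[idx + 1].start : lines.length;
    let subtreeEnd = lines.length;
    for (let j = idx + 1; j < heads.length; j++) {
      if (heads[j].level <= h.level) {
        subtreeEnd = heads[j].start;
        break;
      }
    }
    return { heading: h.heading, level: h.level, start: h.start, bodyEnd, subtreeEnd };
  });
}

// Inserts newLines at the chosen boundary of the anchor section.
function insertAt(
  lines: string[],
  anchorHeading: string,
  newLines: string[],
  at: "bodyEnd" | "subtreeEnd"
): string[] {
  const span = computeSpans(lines).find((s) => s.heading === anchorHeading);
  if (!span) throw new Error(`Section not found: ${anchorHeading}`);
  const i = span[at];
  return lines.slice(0, i).concat(newLines, lines.slice(i));
}
```

Inserting "## A" and then "## B" under "# Title" with `"bodyEnd"` puts B ahead of A (the LIFO behavior described in #119), because the parent's own body ends before its first child. With `"subtreeEnd"` each insert lands after the previous one.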

Agent QA follow-up fixes

After Round 1 QA found 10 friction points, the following fixes landed:

| Commit | Friction Point | Description |
| --- | --- | --- |
| 3e9a96d | FP-2 | get_section routed through section-parser.ts (file-based) for consistent #-prefixed heading format across all section tools. Old Anvil-backed route removed from docs.ts |
| 71b4d1e | FP-9, FP-10 | create_annotation validates the referenced doc exists (no more orphaned annotations on typo paths); delete_section blocks H1 deletion with helpful error pointing to delete_doc; stale tool descriptions updated |
| 00fe0d0 | FP-8 | search_docs filters results below 0.5 relevance score, so nonsense queries no longer return 10 plausibly-scored garbage results |
| 2ebe200 | FP-1, FP-7 | Critical routing fix: createDocCrudRouter() now mounts BEFORE createDocsRouter() in index.ts. The greedy GET /docs/:path(*) wildcard in docs.ts was swallowing GET /docs/:path/sections/:heading requests before they reached doc-crud, which caused get_section to always return 404 "Document not found". Also simplified search "no results" warning copy |
| 2c18724 | FP-6 | reopen_annotation tool description now explicitly says "returns to draft status for editing before re-submission"; agent QA flagged the old one-line description as insufficient context |
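The FP-8 change amounts to a score cutoff on search results. A minimal sketch, assuming each hit carries a numeric relevance score; `SearchHit` and `dropLowRelevance` are hypothetical names, and only the 0.5 threshold comes from the log:

```typescript
// Hypothetical shape of a search result; only the 0.5 cutoff is from the log.
interface SearchHit {
  path: string;
  score: number; // relevance score, assumed to be higher-is-better
}

const MIN_RELEVANCE = 0.5;

// Keeps hits at or above the threshold; results below it are dropped.
function dropLowRelevance(hits: SearchHit[], minScore: number = MIN_RELEVANCE): SearchHit[] {
  return hits.filter((hit) => hit.score >= minScore);
}
```

With a cutoff like this, a nonsense query whose best hits all score below 0.5 returns an empty list instead of ten weak matches.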

Test coverage added

doc-crud.test.ts gained 4 new test blocks with 9 tests (14 new tests across both files):

  • update_section subtree replacement — parent + H1 cases
  • create_doc content parameter — custom content + template fallback
  • insert_section FIFO ordering verification
  • move_section — basic move, move-with-descendants, self-move rejection, missing-source 404
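The self-move and descendant-of-self guards exercised by the move_section tests can be sketched as a path check. This is an illustration, not the shipped guard: `validateMove` is a hypothetical name, and it assumes sections are addressed by full heading paths joined with " > ", as in the #115 example.

```typescript
const PATH_SEPARATOR = " > ";

// Returns an error message when the move is invalid, or null when it is allowed.
function validateMove(sourcePath: string, targetPath: string): string | null {
  if (sourcePath === targetPath) {
    return "Cannot move a section relative to itself";
  }
  // Appending the separator avoids treating "## AB" as a child of "## A".
  if (targetPath.startsWith(sourcePath + PATH_SEPARATOR)) {
    return "Cannot move a section into its own subtree";
  }
  return null;
}
```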

section-parser.test.ts gained findSection — short-form resolution block with 5 tests.

Total: 41 tests, all passing. TypeScript compiles clean.
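The short-form resolution covered by the findSection tests can be sketched as suffix matching over full heading paths. This is an assumption about the approach, not the actual findSection() code; `resolveHeading` and `HeadingLookup` are hypothetical names.

```typescript
type HeadingLookup =
  | { kind: "found"; path: string }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "missing" };

// Resolves a full or short-form heading path against the document's full paths.
function resolveHeading(fullPaths: string[], query: string): HeadingLookup {
  // An exact full-path match always wins over suffix matching.
  if (fullPaths.indexOf(query) !== -1) return { kind: "found", path: query };
  // Otherwise accept the query only if exactly one full path ends with it.
  const candidates = fullPaths.filter((p) => p.endsWith(" > " + query));
  if (candidates.length === 1) return { kind: "found", path: candidates[0] };
  if (candidates.length > 1) return { kind: "ambiguous", candidates };
  return { kind: "missing" };
}
```

Returning the candidate list in the ambiguous case lets the caller report which full paths would disambiguate the request.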

Agent QA Methodology

Novel pattern introduced this session. Worth documenting in methodology/ later.

The idea

Hand off a structured prompt to a sub-agent with MCP tool access. The prompt tells it to work through realistic scenarios (create → edit → move → delete → annotate → review → cleanup) using every tool in the surface. Crucial rule: the agent must try the intuitive approach first, without reading source code or documentation. Every failure gets logged as a "friction point" with: what was tried, what happened, what was expected, severity, and suggested improvement.

This catches a class of bugs that unit tests never touch — UX friction, confusing error messages, stale documentation, parameter names that don't match mental models, missing affordances.
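The friction-point record described above could be modeled as a typed structure so that reports stay comparable across rounds. This is a sketch under assumptions: the field names, the three-level severity scale, and `missingFields` are hypothetical, and the example values paraphrase FP-8 rather than quote the actual report.

```typescript
type FrictionSeverity = "low" | "medium" | "high"; // scale is an assumption

interface FrictionPoint {
  id: string;         // e.g. "FP-8"
  tool: string;       // MCP tool under test
  tried: string;      // what the agent attempted
  happened: string;   // what actually occurred
  expected: string;   // what the agent expected instead
  severity: FrictionSeverity;
  suggestion: string; // proposed improvement
}

// Lists required text fields that are blank so an incomplete entry can be rejected.
function missingFields(fp: FrictionPoint): string[] {
  const required: (keyof FrictionPoint)[] = [
    "id", "tool", "tried", "happened", "expected", "suggestion",
  ];
  return required.filter((key) => fp[key].trim() === "");
}

const fp8: FrictionPoint = {
  id: "FP-8",
  tool: "search_docs",
  tried: 'Searched for "xyzzy plugh zorkmid"',
  happened: "10 plausibly-scored results came back",
  expected: "No results for a nonsense query",
  severity: "medium", // illustrative; the log does not record severities
  suggestion: "Filter results below a relevance threshold",
};
```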

What it found that unit tests wouldn't have

  • FP-1 (get_section routing) — Unit tests mount the doc-crud router in isolation, so they never hit the docs.ts wildcard precedence bug. The agent found it on the first call.
  • FP-2 (heading format inconsistency) — Unit tests pass canonical paths by convention. An agent trying the intuitive path discovered write tools and read tools disagreed about the format.
  • FP-8 (search noise) — The threshold was never tested because nonsense queries weren't in the test fixtures. The agent searched for "xyzzy plugh zorkmid" and got 10 plausibly-scored results.
  • Tool descriptions that had gone stale — agent called out delete_section's "use update_section with empty content" advice which was no longer accurate after the subtreeEnd change.

The loop

  1. Round 1: Agent tests against deployed instance → 10 friction points
  2. Fix the real ones (4 already resolved by earlier session commits, 4 new fixes)
  3. Rebuild, redeploy
  4. Round 2: Same prompt, same instance → 7 friction points
  5. Analyze: 3 were real regressions / edge cases, 3 were false positives (stale cache on fly.dev), 1 was by-design but poorly documented

The round-to-round comparison is the valuable part — you can directly measure whether your fixes reduced friction.

Prompt template

Lives at packages/api/src/mcp/__tests__/agent-qa-prompt.md in the Foundry repo. The scenarios cover: document lifecycle, section CRUD, content replacement, annotations + reviews, search, edge cases, cleanup. The report format is structured (Friction Report → Friction Points → Tool Coverage Matrix → Positive Notes) so rounds are comparable.

Cost

Each round is ~60-80 tool calls and ~50K tokens in the sub-agent. Cheap. Worth running before every deploy that touches MCP tools.

Remaining Work

Filed as #123: get_page section structure via Anvil chunking

Covers both agent QA findings FP-3 (inconsistent section structure) and FP-4 (create_doc content order scrambled on read). Same root cause: get_page materializes sections from Anvil chunks, which can aggregate siblings into content blobs and reorder based on chunk ingestion order.

Suggested fix (in the issue): add a file-based get_page route in doc-crud.ts that reads the markdown directly and uses parseSections() — same pattern already applied to get_section in commit 3e9a96d. Anvil stays as the semantic search backend; structural reads go through the file.
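A rough sketch of what a file-based structural read could look like, assuming the markdown text has already been loaded from disk. It does not use the real parseSections() (its signature is not shown in this log); `PageSection` and `readPageSections` are hypothetical names.

```typescript
interface PageSection {
  heading: string; // heading line including its '#' prefix
  level: number;   // number of leading '#' characters
  content: string; // body text up to the next heading of any level
}

// Splits markdown into sections in file order. Text before the first heading
// is ignored, and '#' lines inside fenced code blocks are not handled.
function readPageSections(markdown: string): PageSection[] {
  const sections: PageSection[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (m) {
      sections.push({ heading: `${m[1]} ${m[2]}`, level: m[1].length, content: "" });
    } else if (sections.length > 0) {
      const last = sections[sections.length - 1];
      last.content = last.content === "" ? line : last.content + "\n" + line;
    }
  }
  return sections.map((s) => ({ ...s, content: s.content.trim() }));
}
```

Because the result follows line order in the file, siblings are never merged and sections cannot be reordered by ingestion order, which are the two symptoms reported in FP-3 and FP-4.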

This is a bigger change because the current response includes lastModified and title metadata from Anvil, plus the frontend may depend on the Anvil-aggregated chunks for certain views. Needs its own refinement pass to scope the rewrite without breaking consumers.

Not filed

  • Tool description staleness generally: the delete_section description was hand-updated this session, but no systemic check catches drift between tool descriptions and actual behavior. A future lint or test could parse the tool's JSDoc on the handler, diff it against the MCP description field, and fail CI on mismatch. Low priority, but it would have saved a round of QA friction this session.
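One possible shape for such a check, assuming the handler JSDoc and the MCP description have already been extracted as strings (the extraction step is out of scope here). `ToolDocPair`, `normalizeDocText`, and `findDescriptionDrift` are hypothetical names, and exact-match comparison after normalization is the crudest option that could work.

```typescript
interface ToolDocPair {
  tool: string;
  handlerJsDoc: string;   // raw JSDoc comment text taken from the handler
  mcpDescription: string; // description string registered with the MCP tool
}

// Strips JSDoc comment markers, collapses whitespace, and lowercases.
function normalizeDocText(text: string): string {
  return text
    .replace(/\*+\/\s*$/, " ")        // trailing comment terminator
    .replace(/^\s*\/?\*+\/?/gm, " ")  // leading markers on each line
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// Returns the names of tools whose two descriptions no longer match.
function findDescriptionDrift(pairs: ToolDocPair[]): string[] {
  return pairs
    .filter((p) => normalizeDocText(p.handlerJsDoc) !== normalizeDocText(p.mcpDescription))
    .map((p) => p.tool);
}
```

A CI step could fail whenever `findDescriptionDrift` returns a non-empty list.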

Deferred from NEXT.md (not touched this session)

  • PAT rotation reminder (still unrotated — mentioned in status/next.md)
  • Corrupted projects/foundry-cc-bridge/design page cleanup
  • foundry-cc-bridge design doc refinement (the original session target)
  • Foundry-as-chat shared queue mini-design

Lessons Learned

Agent QA is high-leverage

The first round took ~8 minutes of agent work and surfaced issues that would have taken days to hit organically (the routing precedence bug on get_section would have bitten the first real user on read). The structured report format made it trivial to triage into "fix now / defer / already fixed" buckets. Recommend adding this to the Foundry release checklist for any MCP tool changes.

Express route precedence is a trap for wildcard routes

The GET /docs/:path(*) wildcard in docs.ts silently swallowed GET /docs/:path/sections/:heading because of mount order. This bug was invisible to unit tests (they mount routers in isolation) and invisible to the tool description (which documented the correct API). Only a full-stack integration test (agent QA or similar) catches it.

General rule: any time you see a :path(*) wildcard route, mount specific routes that share that prefix BEFORE the wildcard. Add a comment near the wildcard explaining the constraint.
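The precedence problem can be reproduced with a simplified first-match-wins model. This is not Express itself: `RouteEntry` and `firstMatch` are hypothetical, the regular expressions only approximate the two route patterns, and the example URL is made up.

```typescript
interface RouteEntry {
  label: string;
  pattern: RegExp;
}

// Returns the label of the first route whose pattern matches the URL,
// mimicking handlers being tried in mount order.
function firstMatch(routes: RouteEntry[], url: string): string | null {
  for (const route of routes) {
    if (route.pattern.test(url)) return route.label;
  }
  return null;
}

// Approximation of GET /docs/:path(*): matches anything under /docs/.
const docsWildcard: RouteEntry = { label: "docs wildcard", pattern: /^\/docs\/(.+)$/ };
// Approximation of GET /docs/:path/sections/:heading.
const sectionRoute: RouteEntry = {
  label: "doc-crud section",
  pattern: /^\/docs\/(.+)\/sections\/([^/]+)$/,
};

const buggyOrder: RouteEntry[] = [docsWildcard, sectionRoute]; // wildcard mounted first
const fixedOrder: RouteEntry[] = [sectionRoute, docsWildcard]; // specific route mounted first

const sectionUrl = "/docs/projects/example/design/sections/Tech%20Stack"; // hypothetical URL
```

In this model `firstMatch(buggyOrder, sectionUrl)` resolves to the wildcard, while `firstMatch(fixedOrder, sectionUrl)` reaches the section route. Plain document URLs still fall through to the wildcard in either order.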

Worktree cleanup inconsistency

Sub-agents launched with isolation: "worktree" sometimes committed to loose branches (cleaned up on success) and sometimes to their own branches that needed rebasing against main. The #117 fix required a rebase because it was based on old main while #98 and #119 committed directly. Not a blocker but worth documenting in the agent orchestration docs — prefer fast-forward-able branches and check parent commits before merging.

File-based reads > Anvil-backed reads for structural operations

The get_section fix (3e9a96d) established a principle: if you need the exact document structure, read the file directly and parse it. Anvil is for semantic search; it chunks and embeds, which is lossy for structure. Issue #123 applies the same principle to get_page.

Consistent error patterns are best-in-class

Both QA rounds praised the available_headings in 404 errors as a standout UX pattern. When a tool can't find what you asked for, telling you what exists is dramatically better than a bare "not found." Apply this pattern everywhere you have a discoverable namespace (section addresses, annotation IDs, review IDs, doc paths). The delete_doc response's annotations_deleted count is the same idea — tell the caller what cascaded so they understand the side effects.
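A minimal sketch of the pattern, assuming a JSON error body. Only the `available_headings` field name comes from this log; the `error` field, the message wording, and `sectionNotFound` are hypothetical.

```typescript
interface SectionNotFoundBody {
  error: string;
  available_headings: string[]; // what the caller could have asked for
}

// Builds a 404 payload that lists the headings that do exist.
function sectionNotFound(
  requested: string,
  headings: string[]
): { status: 404; body: SectionNotFoundBody } {
  return {
    status: 404,
    body: {
      error: `Section not found: ${requested}`,
      available_headings: headings,
    },
  };
}
```

The same shape extends to other discoverable namespaces by swapping the list field, for example returning known annotation IDs or doc paths.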

One-shot doc creation eliminates a huge source of friction

The content parameter on create_doc (#117) turned 8 MCP calls into 1 for a typical NEXT.md-style document. The old pattern (create + N inserts) was also where most of the "duplicate heading" and "section ordering" bugs originated. Single-call creation sidesteps all of that. Consider the same pattern elsewhere: a replace_doc_content(path, content) tool would give you atomic full-document rewrites without the section-by-section addressing dance.
