Foundry Foundry

The CSDLC Thesis — A New Framework for Human-AI Collaboration

By Dan Hannah & Clay March 2026


The Core Claim

AI collaboration is not a tooling problem. It's a communication design problem.

The models are powerful. They can write code, generate architectures, analyze data, draft legal documents, and reason through complex multi-step problems. The capability is there and it keeps getting better.

And yet, most people using AI to build things are producing mediocre work — unreviewed, untested, architecturally incoherent. The industry has a name for it: vibe coding.

The gap isn't capability. The gap is communication.

The AI can't read your mind. It doesn't know what you built yesterday. It doesn't know why you made the decisions you made. It doesn't know what "good" looks like for your specific project. Every session starts from what we call the cold start problem — a brilliant collaborator with total amnesia, beginning from zero with no memory of your project, your decisions, or your history.

Some tools are beginning to chip away at this. OpenClaw's persistent memory files, for example, let an AI agent remember personal preferences and facts across sessions — what you might call personal memory. But there's a meaningful difference between personal memory ("Dan prefers TypeScript and uses Cloudflare for hosting") and institutional memory ("here's why we chose sqlite-vss over ChromaDB, here are the three approaches we rejected and why, and here's how the chunking strategy interacts with the embedding model's token window").

Personal memory helps the AI know you. Institutional memory helps the AI know your work.

CSDLC solves for institutional memory — the kind that makes complex, multi-session collaboration possible.

CSDLC — the Claymore Software Development Lifecycle — is a methodology built on a simple premise: if you solve the communication problem, the capability problem solves itself. Structure the information flow between human and AI correctly, and the AI's existing capabilities produce dramatically better outcomes. No new models required. No magic prompts. Just disciplined communication design.


What's Broken

Most people working with AI today fall into one of three patterns:

The Conversationalist. They sit in an IDE or chat window and describe what they want in natural language. The AI generates something. They iterate back and forth — "no, not like that, more like this" — until it looks close enough. There's no documentation, no quality gates, no separation between planning and execution. It works for small tasks. It falls apart at project scale.

The Prompter. They've learned that better prompts produce better outputs. They study prompt engineering, craft detailed system prompts, use chain-of-thought reasoning. This is better — but it's still optimizing at the wrong layer. A perfect prompt can't compensate for missing architectural context, unresolved ambiguity, or the cold start problem.

The Automator. They build agentic pipelines — orchestrators, worker agents, tool chains. The architecture is sophisticated. But the methodology is often absent. Agents execute tasks without clear acceptance criteria. Quality gates are afterthoughts. The human's role devolves to "approve or retry." The system is automated but not disciplined.

None of these patterns are wrong — each captures something essential. The Conversationalist's instinct for iterative dialogue is the seed of refinement. The Prompter's craft in structuring information for AI consumption is the seed of context design. The Automator's architecture for agent orchestration is the seed of a scalable pipeline.

CSDLC doesn't reject these patterns. It subsumes them.

The Conversationalist's intuitive back-and-forth becomes formalized refinement — collaborative sessions with structure, explicit ambiguity resolution, and documented outcomes. The Prompter's carefully crafted prompts become agent templates backed by a documentation intelligence layer that provides context automatically rather than through manual curation. The Automator's pipeline gets the quality gates, role definitions, and methodology it needs to produce reliable results rather than impressive demos.

What all three patterns share in their raw form is a blind spot: they treat AI as a tool to be operated, not a collaborator to be communicated with. And that distinction changes everything.


Six Insights

CSDLC is built on six insights discovered through practice — building real products, shipping real features, and learning what actually works when humans and AI collaborate on complex work.

1. It's a Communication Design Problem

The single most important realization: the quality ceiling of AI-assisted work is determined by the quality of communication between human and AI, not by the capability of the model.

A frontier model with bad context produces bad work. A slightly less capable model with excellent context, clear scope, and resolved ambiguity produces excellent work. This isn't theoretical — it's observable in every session. The same model, the same day, will produce wildly different quality depending on how well the human structured the communication.

This means the highest-leverage investment isn't better models or better tools. It's better process — specifically, process designed around the communication channel between human and AI.

Every element of CSDLC exists to reduce friction, eliminate ambiguity, and structure that channel so the AI's existing capabilities actually land:

  • Design docs pre-resolve ambiguity so the AI never has to guess
  • Refinement sessions surface edge cases before they become bugs
  • Context loading protocols ensure the AI starts every session informed
  • Explicit scope boundaries prevent well-intentioned but wrong assumptions
  • Acceptance criteria make "done" objective, not subjective

The question isn't "how do I make the AI smarter?" It's "how do I structure communication so the AI's existing capabilities actually land?"

2. Empathy as Methodology

The biggest unlock in human-AI collaboration: the human must see the world from the AI's perspective.

What does the AI know at session start? Nothing. What context is it missing? Everything from yesterday. What does "clear instructions" look like from the AI's side? Not what most humans think. Where does it struggle? Ambiguity, missing context, conflicting signals, unstated assumptions.

Most people write prompts and specs from their own perspective — what they want to happen. Empathy means writing them from the AI's perspective — what does the AI need to know to make it happen?

There's an important distinction here: specificity is the answer. Empathy is knowing which questions to answer.

Consider the difference:

Without empathy (specificity alone): "I need a function that validates G-code output. It should check for negative depths, out-of-bounds coordinates, and missing tool changes. Here are the exact validation rules…"

This is specific and detailed. But it doesn't consider what the AI doesn't know.

With empathy: "I need a G-code validation function — but what does the AI actually understand here? It probably knows G-code syntax generically, but it doesn't know that our pipeline uses a Y-flip transform, so 'out of bounds' means something different than the AI would assume. It also doesn't know that we have two modes that produce G-code differently. I need to explain the Y-flip, point it at the coordinate systems doc, and explicitly state which mode this validator covers."

Both are being specific. Only one is being empathetic. The first person gives detailed instructions. The second person first models what the AI knows and doesn't know, then structures the instructions to fill those specific gaps.

This extends to understanding the AI's failure modes. AI agents will happily iterate forever on a bad approach — they don't feel the "this is wrong" instinct that experienced humans develop. They'll follow a bad assumption to its logical conclusion without questioning the premise. Knowing this changes how you structure work: you add checkpoints, you separate planning from execution, and you treat the human's gut feeling that "something is off" as a legitimate quality signal.

Empathy as a core human skill. The industry narrative right now is that AI makes human skills less important — "anyone can code now." CSDLC argues the opposite: AI makes a specific human skill more important than ever. Not coding, not prompt crafting, not technical knowledge — but the ability to model another intelligence's knowledge state in real-time and adapt your communication accordingly. The ability to think, with every interaction, "what does the AI not know that I'm taking for granted?"

This is the core human competency for the age of AI collaboration. It's not a technique you apply — it's a way of thinking you develop. And it gets better with practice.

3. Refinement as the Core Practice

If empathy is the mindset, refinement is the practice that operationalizes it.

Refinement is the collaborative process where human and AI work through scope, architecture, risks, and edge cases together — iteratively, adversarially, until there's zero ambiguity. It's not a meeting or a phase. It's the fundamental activity that powers the entire methodology.

What refinement looks like in practice:

The AI Lead reads a requirement and immediately starts poking holes: "What happens when X? How does this interact with Y? What's the failure mode here?" The human pushes back: "That's overthinking it" or "Good catch, let's handle that." Both sides sketch approaches. The AI Lead might propose an architecture; the human might reject it based on domain knowledge the AI doesn't have. Sections get written, reviewed, challenged, rewritten. Not in one pass — iteratively.

It ends when both sides say: "I have no more questions. This is clear."

Why refinement is an insight, not just a step:

Most collaboration frameworks treat planning as a prerequisite you complete and move past. CSDLC treats refinement as the highest-leverage activity in the entire pipeline. A 30-minute refinement session doesn't just prevent misunderstandings — it builds the shared understanding that makes everything downstream work.

During refinement, the human develops empathy for the AI's perspective by watching what questions it asks, where it gets confused, and what assumptions it makes. The AI develops institutional context by having the human explain why, not just what. Both sides get smarter about the problem through the friction of disagreement.

This is where the real gap-closing happens. The human enters refinement thinking they know what they want. The AI enters refinement with generic knowledge and zero project context. Through structured back-and-forth, those two starting positions converge into a shared, precise, unambiguous understanding of the work.

The quality gates are refinement checkpoints. They aren't just process overhead — they're structured moments where human and AI verify that they understood each other correctly. Did the design doc capture the actual intent? Did the sub-agent's output match the ticket scope? Does the final deliverable meet the spirit of the spec, not just the letter?

Critically, quality gates are also how the human stays in the loop without micromanaging. CSDLC assumes a delegation-oriented management philosophy: define clear scope, provide the right context, trust the execution, verify the outcome. This mirrors how effective engineering managers work with human teams: you don't watch your senior engineer type, but you do review their PR before it merges.

4. Documentation as an Intelligence Layer

AI agents start every session from zero. There is no institutional memory without documentation. Every session is a new hire's first day.

Most teams treat documentation as a chore — something you write after shipping, if you write it at all. CSDLC inverts this: documentation is the primary mechanism by which AI agents acquire institutional knowledge. It's not an artifact of the development process. It is the development process.

This reframing has profound implications:

Documentation becomes the shared memory between human and AI. When a human writes or updates a design doc, the AI's knowledge updates on the next session. When the AI needs context for a task, it queries the docs instead of the human copy-pasting thousands of tokens into a prompt. The documentation is the bridge that spans the gap between sessions and solves the cold start problem.

Documentation quality directly determines the quality ceiling of AI output. A well-structured design doc with architecture decisions pre-made, ambiguity pre-resolved, and trade-offs pre-documented enables the AI to produce work that's architecturally coherent from the first pass. A missing or stale doc forces the AI to guess — and guesses compound into rework.

Documentation is not just text — it's a queryable intelligence layer. This insight directly sparked the creation of Anvil, an MCP server that indexes documentation and makes it semantically searchable by AI agents. Instead of pasting 15,000 tokens of context into every session, agents query the docs themselves and pull exactly what they need — 500 to 1,000 tokens of targeted retrieval per query. The documentation becomes infrastructure, not just prose.

The compound effect: Every design doc written makes every future session more productive. Every architectural decision documented is a decision that never needs to be re-debated. Every lesson learned that's captured is a mistake that's never repeated. Documentation compounds in a way that no other investment in the process does — and with AI agents that start from zero every session, this compounding is the only thing that creates continuity.

The Routr project proved this in reverse. Months of development without design docs produced working software — but also accumulated inconsistencies, undocumented assumptions, and architectural drift that made AI collaboration increasingly friction-heavy. Creating retroactive design docs for Routr was one of the first applications of CSDLC, and the immediate improvement in session quality was the strongest evidence that documentation isn't optional — it's foundational.

5. MCP as First-Class Architecture

Most teams encounter the Model Context Protocol (MCP) as a feature to expose to users — "let agents talk to your API." CSDLC treats MCP as a first-class architectural concern that should be designed into systems from day one, not bolted on later.

The standard framing: expose tools and data to AI agents so they can perform actions on behalf of users.

The CSDLC framing: MCP is the universal protocol surface that serves both end-user functionality and the development process itself.

MCP closes the testing loop. When your system exposes its capabilities via MCP, you can point an adversarial AI agent at those same interfaces and have it try to break what you just built. This isn't traditional fuzz testing — it's a reasoning agent that understands your system's semantics, can craft creative edge cases, and can find holes in your assumptions in ways that scripted tests never would. The same protocol surface your users will consume becomes the surface your QA agents attack. Your test coverage maps directly to your real-world attack surface with zero divergence.

MCP enables agent self-service. When documentation is served via MCP (as Anvil does), agents don't need humans to curate and paste context. They query what they need, when they need it. This transforms the human's role from manual context curator to architect of information access patterns.

MCP forces interface discipline. Designing MCP tools requires thinking about your system's capabilities as discrete, well-scoped operations with clear inputs and outputs. This is just good API design — but MCP makes it a natural part of the development process rather than a retrofit. Systems designed with MCP in mind tend to have cleaner abstractions because the protocol demands it.

The practical implication: if you're building a system that AI agents will interact with — and increasingly, that's every system — design the MCP surface as part of your architecture, not as an integration add-on.

6. Beyond Code — This Is a Collaboration Framework

CSDLC was born from software development, but the insights aren't code-specific. They're about how humans and AI collaborate on complex, multi-session work — regardless of domain.

The core problems are universal:

  • The AI starts every session from zero (cold start problem)
  • Ambiguity in instructions produces unpredictable output
  • Complex work requires decomposition into manageable units
  • Quality requires separation between creation and review
  • Institutional knowledge must be captured or it's lost

These problems apply equally to writing a novel, building a legal case, designing a curriculum, managing a research project, or running a marketing campaign. Any domain where the work spans multiple sessions, has enough complexity to require decomposition, quality matters enough to warrant process, and a human and AI are collaborating rather than the human just querying — is a domain where CSDLC's principles apply.

The specific rituals (standups, sprint tracking, git worktrees) are software-flavored, but the underlying methodology — refinement-driven planning, documentation as shared memory, quality gates, empathy as methodology — is domain-agnostic.

An author using AI to write a novel series faces the same cold start problem: how does the AI remember character details, plot threads, and world-building rules from book one when starting book three? CSDLC's answer — structured documentation as queryable intelligence — works regardless of whether the "project" is a codebase or a manuscript.

A legal team using AI to analyze contracts faces the same ambiguity problem: "find problematic clauses" means different things depending on jurisdiction, contract type, and client risk tolerance. CSDLC's answer — collaborative refinement until ambiguity is zero — works regardless of whether the deliverable is code or legal analysis.

The process is the product. Individual projects come and go. The methodology compounds.


How CSDLC Works (Overview)

CSDLC is a pipeline with clear roles, explicit quality gates, and continuous refinement. The full operational detail lives in PROCESS.md. Here's the high-level structure:

Three roles:

  • Human — Technical Product Owner & QA Director. Defines what to build and why. Final approval on everything. The judgment layer.
  • AI Lead — Architect, Scrum Master & Tech Lead. Drives refinement, crafts sub-agent prompts, reviews all output before it reaches the human. The orchestration layer.
  • Sub-agents — Development team. Execute well-scoped tickets with curated context. Disposable. The execution layer.

The pipeline:

Refinement (continuous — the practice that powers everything)
    ↓
Step 0: Design Doc (Human ↔ AI Lead)
Step 1: Story Breakdown (Human ↔ AI Lead)
Step 2: Agent Prompt Crafting (AI Lead)
Step 3: Sub-agent Execution
Step 4: AI Lead Review
Step 5: QA (Separate from Implementation)
Step 6: Human Review → Ship

The key principle: every step is mandatory. Skipping QA or review is how quality dies. The 30-minute investment in Step 0 refinement eliminates hours of rework downstream. The process is the discipline.


Evidence — Projects Built with CSDLC

Routr — A Patent-Pending CNC Application

Routr is a browser-based CNC workshop planning app that translates woodworking designs into machine-ready G-code. It includes a constraint solver for multi-board assembly, a G-code generation pipeline with coordinate system transforms, SVG import with pocket detection, heightmap-based toolpath simulation, and a complete payment and authentication stack.

This isn't a todo app. It's a complex engineering project with a patent filed — built by one person collaborating with AI agents using CSDLC.

What Routr proved about the methodology:

  • Complex domain knowledge transfers through documentation. CNC machining, coordinate systems, toolpath generation — domain-specific knowledge that the AI doesn't have built-in was successfully communicated through design docs and consumed by sub-agents to produce working code.

  • Retroactive documentation has value. Routr was built for months before CSDLC was formalized. Creating retroactive design docs surfaced inconsistencies, undocumented assumptions, and architectural drift — proving that the documentation layer adds value even when applied after the fact.

  • The methodology survives real pivots. Assembly Mode's combinatorial complexity (every combination of board shapes × joint types × edge connections) was one factor driving the Workshop Mode pivot — but equally important was the strategic decision to bring a simpler, more focused product to market faster. The design doc captured both the technical reasoning and the business reasoning, and every subsequent session started with that context intact. Without the documentation layer, the rationale for the pivot would have been lost — and likely re-debated in every future session.

  • Physical-world validation adds a quality dimension AI can't handle alone. G-code ultimately runs on a CNC machine cutting real wood. The layered validation strategy — unit tests → snapshot tests → simulator → physical cuts — demonstrates how CSDLC accommodates quality requirements that can't be fully automated.

Anvil — From Insight to Shipped Product

Anvil is an open-source MCP server that makes documentation semantically searchable by AI agents. Point it at a directory of markdown files, and it chunks, embeds, and serves them over MCP — zero config, zero API keys, TypeScript-first.

What Anvil proved about the methodology:

  • CSDLC produces real products fast. Three epics, fourteen stories, 149 tests — from design doc to published npm package. The core server and MCP tools were built in a single "Lightning Strike" session: six PRs merged in roughly 21 minutes of agent execution time. The DX polish, CLI, and publish prep followed in subsequent sessions.

  • The methodology eats its own tail. Anvil was born from a pain point that CSDLC itself surfaced: context injection through design docs was working but was expensive and manual. The methodology's own feedback loops identified the problem, and the methodology's own process produced the solution. CSDLC built the tool that makes CSDLC better.

  • Validated the documentation intelligence layer. Anvil's existence proves that documentation-as-infrastructure isn't just philosophy — it's buildable, shippable product. The reduction from 15,000+ tokens of pasted context to 500–1,000 tokens of targeted retrieval per query is a measurable efficiency gain produced by treating docs as a queryable intelligence layer.


The Landscape — What Exists Today

The AI-assisted development space is evolving rapidly. Understanding where CSDLC fits requires understanding what the current tools and approaches do well — and where they stop.

Note

This landscape reflects the state of the market as of March 2026. These tools are evolving quickly — the specific capabilities described here may change, but the structural gap CSDLC addresses is unlikely to be closed by tooling alone.

IDE-Integrated Assistants

Tools: Cursor, GitHub Copilot, Windsurf, Cline

These tools operate at the implementation layer. They read your codebase, understand your context within a session, generate code, run tests, and iterate. They're powerful and getting more capable by the month.

What they get right: Low friction. You're already in your IDE, the AI sees your code, you describe what you want, it happens. For individual tasks — implement this function, fix this bug, write this test — they're excellent.

What they don't provide: A methodology. These tools don't tell you how to decompose a project, how to manage context across sessions, how to structure quality gates, or how to maintain architectural coherence over time. They're instruments without a score.

Agentic CLIs

Tools: Claude Code, Codex CLI, Aider

Terminal-based tools that give AI agents direct access to your filesystem, shell, and development environment. More autonomous than IDE assistants — they can run multi-step workflows, execute commands, and iterate on their own output.

What they get right: Autonomy and power. Claude Code in particular can tackle complex, multi-file changes with minimal hand-holding.

What they don't provide: Same gap — no methodology layer. They're more powerful instruments, but still instruments. The human is responsible for all the orchestration decisions: what to work on, in what order, with what context, and how to verify quality.

Agentic Platforms

Tools: OpenClaw, Claude Cowork

Platforms that go beyond coding into general-purpose AI agent orchestration. OpenClaw connects AI to your messaging apps, file system, and services for autonomous workflow execution. Cowork operates on your local files for knowledge work.

What they get right: The vision of AI as an always-on collaborator, not just a tool you invoke. OpenClaw's skill system and persistent memory across sessions is a meaningful step toward solving the cold start problem at the personal memory level.

What they don't provide: A development methodology. They're platforms — they provide the capability for agent orchestration but not the discipline.

Industry Frameworks

Reports: Anthropic's 2026 Agentic Coding Trends Report, Karpathy's "Agentic Engineering" framing, various enterprise adoption guides

The industry is converging on high-level principles: engineers shift from writing code to orchestrating agents, you need quality gates, you need human-in-the-loop checkpoints, multi-agent architectures need coordination patterns.

What they get right: The directional vision. The shift from "writing code" to "orchestrating agents" is real and well-articulated.

What they don't provide: Operational specificity. They describe what the future looks like, not how to work that way today. "You'll need quality gates" is correct but insufficient — which gates, when, owned by whom, triggered by what? The gap between "here's the trend" and "here's how to run your Tuesday" is where CSDLC lives.

Where CSDLC Fits

CSDLC is not a tool and not a platform. It's the operational methodology that sits on top of whatever tools you use. You can practice CSDLC with Cursor, with Claude Code, with OpenClaw, or with a plain text editor and API calls.

The tools provide capability. CSDLC provides discipline.

The closest analog in traditional software development: Agile is a methodology, not a tool. You can practice Agile with Jira, with sticky notes, or with a spreadsheet. The methodology is independent of the tooling. CSDLC is the Agile equivalent for human-AI collaboration — a structured way of working that produces predictable, high-quality outcomes regardless of which AI tools you're using.


Where This Goes

CSDLC is a living methodology. It improves with every project, every sprint, every lesson learned. The core insights — communication design, empathy as methodology, refinement as practice, documentation as intelligence, MCP as architecture, applicability beyond code — are stable. The specific practices evolve as tools improve and patterns emerge.

The compounding effect is the point. Every design doc written makes every future session more productive. Every lesson captured is a mistake never repeated. Every refinement of the process itself makes the next refinement faster. Individual projects ship and are done. The methodology accumulates.

Models will keep getting more capable. Tools will keep getting more powerful. But the communication problem between human and AI isn't going away — it's getting more important as AI takes on more complex, higher-stakes work. The teams and individuals who invest in solving that problem systematically will outperform those who rely on better models alone.

The process is the product. The methodology compounds. Everything else is tooling.


This document is the thesis statement for CSDLC. For the full operational methodology, see PROCESS.md. For core values, see Core Values. For project-specific evidence, see Routr and Anvil.

The Claymore way: ship fast, ship right, get better every time. ⚔️

Review

🔒

Enter your access token to view annotations