Embrace the specification

Posted: 2025-11-01

This is part of a series I’m writing on generative AI.

Preamble: What this article isn’t about

Rather than discuss the very long and complex topic of ideal formats for specifications (and the many questions listed at the end), this text deliberately takes it for granted that we can produce a “specification” (a description of a program’s requirements) that is both:

Succinct –or, more accurately, easier to maintain than working directly (whether with or without AI) on the full source code.
Robust enough for the AI to reliably produce an implementation of the software.

This is not as tall an order as it may seem. We don’t need a perfect, formal system, merely something marginally better than maintaining the full implementation —for example, source code with placeholder markers that AI implements.
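
To make “source code with placeholder markers” slightly more tangible, here is a minimal sketch of what such a template might look like and how a tool could locate the parts left for the AI. The marker syntax (`# AI-IMPLEMENT:`), the template contents, and the helper `find_placeholders` are all invented for illustration; the article does not prescribe any of them.

```python
import re

# Hypothetical marker syntax, invented for this sketch.
MARKER = re.compile(r"^\s*#\s*AI-IMPLEMENT:\s*(?P<note>.*)$")

# A toy "specification": ordinary source code in which one body is left
# as a placeholder for the AI to implement.
TEMPLATE = '''\
def slugify(title: str) -> str:
    """Lowercase `title` and join its words with hyphens."""
    # AI-IMPLEMENT: strip punctuation, collapse whitespace.

def word_count(text: str) -> int:
    """Return the number of whitespace-separated words."""
    return len(text.split())
'''


def find_placeholders(template: str) -> list[tuple[int, str]]:
    """Return (line_number, note) for every placeholder marker found."""
    found = []
    for number, line in enumerate(template.splitlines(), start=1):
        match = MARKER.match(line)
        if match:
            found.append((number, match.group("note")))
    return found


placeholders = find_placeholders(TEMPLATE)
print(placeholders)
```

In this toy template the developer owns the signatures, the docstrings, and any code they chose to write by hand; only the marked body is delegated.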

The form of this specification –whether a folder with structured markdown files; Behavior-Driven Development-style tests; a new formal language; code templates, perhaps, in a format similar to Knuth’s literate programming– is a critical question, but is fairly tangential to the point of this article. All that’s necessary is that it is a single source-of-truth artifact from which the AI can implement the software.

In a follow-up article, I intend to explore this, describing what I’ve been able to accomplish.

A concrete example

To make this tangible, I’ll share a concrete example (more details in Module specifications in Duende). While this description is a simplification that leaves out significant details, it illustrates the core idea.

I use this template as the specification for a tests-generation module. This is turned into the full implementation and 13 unit tests.

The result: 178 lines of specification (a third of which is just declaring two strings) produce 1476 lines of code and tests.

The old world: Source code is the source of truth

Pre-AI, as a developer, you translated an implicit specification in your head into a source code artifact, from which binaries were generated¹:

╭──────────────╮
│ 🧑 developer <──────╮
╰──┬───────────╯      │
   │              ┏━━━┷━━━━━━━━━┓
   ╰────edits─────> Source code ┃
                  ┗━━━━━━━━━━┯━━┛
╭─────────────╮              │
│ ⚙️ compiler <───input for──╯
╰──┬──────────╯
   │              ┏━━━━━━━━┓
   ╰──generates───> Binary ┃
                  ┗━━━━━━━━┛

Binaries are essential: there is no execution without them. We use them every day as we run our software. However, source code is the sine qua non. Apart from executing or distributing them, you very rarely look at the binaries themselves. They have a secondary place.

To extend the software you run your text editor and modify the source code until it meets your new requirements (and then you rebuild binaries from scratch, modulo caching).

AI as a code modifier

A common AI-assisted software development pattern is creating “requirement-oriented contexts”:

Can you look at src/select_python.py and improve the implementation of FindPythonDefinition to support identifiers like Foo.Bar? This should match the method Bar in the class Foo.

Make sure to support multiple (arbitrary) levels of nesting.²

Source code remains the primary artifact; AI is simply used to adjust it. For each new requirement or feature, you write a specific prompt (or, more generally, a context) and use it to adjust the source code:

╭──────────────╮
│ 🧑 developer │
╰──┬───────────╯  ┏━━━━━━━━━━━━━━━━━━━┓
   │              ┃    Requirement    ┃
   ╰───creates────>  oriented context ┃
                  ┗━━┯━━━━━━━━━━━━━━━━┛
                     │
╭────────────╮       │   ┏━━━━━━━━━━━━━━━┓
│ 🤖 Agentic <───────┴───┨  Source code  ┃
│  workflow  │           ┃ (old version) ┃
╰───┬────────╯           ┗━━━━━━━━━━━━━━━┛
    │
    │                    ┏━━━━━━━━━━━━━━━┓
    ╰──────produces──────>  Source code  ┃
                         ┃ (new version) ┃
                         ┗━━━━━┯━━━━━━━━━┛
                               ┊

In this approach there is usually a lot of manual guidance: if the agent conversation veers off, you step in and steer it in the right direction (sometimes with a lot of swearing!):

“Why are you still reading src/foo.cc? That is entirely irrelevant to the task at hand! Please read src/bar.cc before you do anything else!”

Once the feature is implemented, you review and commit the change; discard all the AI context; and start crafting the context for the next requirement. Why would you keep the AI context? What for? At best, you keep it around as secondary documentation³.

Spec-driven development

Spec-driven development is a complementary alternative. This model proposes a fundamental shift: a high-level specification becomes the canonical representation of your software. The AI generates the source code from it, making source code a secondary generated artifact:

╭──────────────╮
│ 🧑 developer │
╰──┬───────────╯
   │              ┏━━━━━━━━━━━━━━━┓
   ╰───adjusts────> Specification ┃
                  ┗━━┯━━━━━━━━━━━━┛
╭────────────╮       │
│ 🤖 Agentic <───────╯
│  workflow  │
╰──┬─────────╯
   │              ┏━━━━━━━━━━━━━━━┓
   ╰───produces───>  Source code  ┃
                  ┗━━━━━┯━━━━━━━━━┛
                        ┊

This is a simplified view, which we refine below.

The goal shifts from “maintain source code implementing (implicit) requirements” to “maintain a specification that describes those requirements”. The specification takes center stage.

The AI outputs source code from the specification, just as a compiler outputs binaries from the source code. Source code becomes an intermediate representation.

The specification becomes the single source of truth, but there’s a critical question: what about all the unspecified but observable details? As we’ll see below, the (old) source code will still play a crucial role, just not as the primary artifact.

With requirement-oriented contexts, the order in which requirements are implemented directly shapes the contexts: each is based on the current state of the code it is extending.

Rather than manage a series of delta-oriented contexts, in spec-driven development you adjust the specification directly (i.e., document the new behavior in the most logical place). The order in which you implement requirements has no bearing on the specification (only on its history).

In Dumb AI and the software revolution I propose that developers continue to own the high-level skeleton of the software product. The specification is the embodiment of this skeleton, in a format that enables the AI to implement the details –the leaves of the “trees of ideas”.

The two approaches can be combined. For example, as the specifications themselves become complex (though, naturally, simpler than the full source code), we may use requirement-oriented contexts to extend them.

The new “debugging” loop

Occasionally, AI still struggles to deliver new requirements; or it produces outputs that miss our expectations. Perhaps you …

… didn’t decompose the problem well enough for the AI to implement the specifications.
… left something underspecified, a bit vague, and the AI made questionable assumptions.
… have a contradiction. Or, if not exactly a contradiction, you didn’t fully reason about unintended consequences of parts of your specification.

When this happens, you intervene. But how you intervene changes completely:

Requirement-oriented contexts: Direct guidance in the agentic loop (“why are you looking at foo.cc instead of…”).
Spec-driven development: You fix the specification and restart the process.

We still need the old source code

In practice, the AI doesn’t (and shouldn’t) generate the new code in a vacuum. When generating a new version (likely after the specification has changed), we consult the old version. The spec-driven workflow actually looks like this:

╭──────────────╮
│ 🧑 developer │
╰──┬───────────╯
   │              ┏━━━━━━━━━━━━━━━┓
   ╰───creates────> Specification ┃
                  ┗━━┯━━━━━━━━━━━━┛
                     │
╭────────────╮       │   ┏━━━━━━━━━━━━━━━┓
│ 🤖 Agentic <───────┴───┨  Source code  ┃
│  workflow  │           ┃ (old version) ┃
╰───┬────────╯           ┗━━━━━━━━━━━━━━━┛
    │
    │                    ┏━━━━━━━━━━━━━━━┓
    ╰──────produces──────>  Source code  ┃
                         ┃ (new version) ┃
                         ┗━━━━━┯━━━━━━━━━┛
                               ┊

Reading the previous version serves various purposes:

Preserve observable behaviors (that aren’t covered in the specification). I describe this below.
Make it easier to review changes. The agentic workflow can avoid changing existing source code spuriously. How important this is depends on the reliability of the generation. I hope that some day we’ll be able to trust the AI outputs, in which case this will not be that relevant.
Incremental adoption. With this approach, you can gradually opt in existing software function by function. At the start, all code is “unmanaged”, but you can both (1) start adopting specifications for new modules, and (2) gradually convert old modules to be based on specifications (which may not be worth it in all cases).
Efficient generation (analogous to how compilers cache previously compiled modules).
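
The compiler-cache analogy in the last item can be sketched as follows. This is only an illustration under stated assumptions: the function names (`spec_digest`, `regenerate`), the use of a SHA-256 digest of the specification text as the cache key, and the stand-in `fake_generate` are all hypothetical, not a description of any actual workflow.

```python
import hashlib
from typing import Callable, Optional


def spec_digest(spec_text: str) -> str:
    """Fingerprint of the specification, used as a cache key."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def regenerate(
    spec_text: str,
    old_source: Optional[str],
    old_digest: Optional[str],
    generate: Callable[[str, Optional[str]], str],
) -> tuple[str, str]:
    """Return (source, digest); call `generate` only if the spec changed."""
    digest = spec_digest(spec_text)
    if old_source is not None and digest == old_digest:
        # Specification unchanged: reuse the previous output untouched.
        return old_source, digest
    # The old version is passed along so the generator can consult it.
    return generate(spec_text, old_source), digest


# Stand-in for the agentic workflow (hypothetical); it records each call.
calls = []


def fake_generate(spec_text: str, old_source: Optional[str]) -> str:
    calls.append(spec_text)
    return f"# generated from {len(spec_text)} characters of specification\n"


source_v1, digest_v1 = regenerate("spec v1", None, None, fake_generate)
source_same, digest_same = regenerate("spec v1", source_v1, digest_v1, fake_generate)
source_v2, digest_v2 = regenerate("spec v2", source_v1, digest_v1, fake_generate)
print(len(calls))  # the unchanged specification did not trigger a generation
```

A real workflow would likely key the cache per module rather than per project, much as compilers cache per translation unit.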

Preserving observable unspecified behaviors

As mentioned above, source code still matters (beyond the fact that, just like binaries, without it there is no execution): it locks in observable behaviors that shouldn’t be part of the specification but should be preserved across versions.

Intentional instability: AI-generated code vs Hyrum’s Law gives the example of a function that raises an exception where we don’t care about the exact error text –any (reasonable) format works. The specification shouldn’t mandate it. However, once a format is exposed to users, it should only change deliberately.
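
The difference between a specified behavior and a merely observable one can be shown with a small sketch. The function `parse_port` and its messages are hypothetical, chosen only to illustrate the point: the exception type is what a specification would require, while the message text is an accidental detail that callers can nevertheless observe.

```python
from typing import Optional


def parse_port(text: str) -> int:
    """Hypothetical function. Specified: return the port, or raise ValueError
    if `text` is not an integer in 1..65535. The error text is NOT specified."""
    try:
        port = int(text)
    except ValueError:
        raise ValueError(f"not a number: {text!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def rejects(text: str) -> bool:
    """Spec-level check: depends only on the specified exception type."""
    try:
        parse_port(text)
    except ValueError:
        return True
    return False


def error_text(text: str) -> Optional[str]:
    """Observable detail that a caller could (unwisely) come to depend on."""
    try:
        parse_port(text)
    except ValueError as error:
        return str(error)
    return None


print(rejects("70000"), error_text("70000"))
```

A regeneration that rewrote the message would still satisfy `rejects`, yet it would break any user who parses the text returned by `error_text`; that is the kind of change the old source code helps avoid.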

The vast majority of these minor observable details don’t belong in the specification; they would make it as verbose as the source code itself, rendering it nearly useless. Instead, we maintain a lean specification covering only requirements we care about.

Though the specification is sufficient to generate a valid implementation of the entire project, the previous version of the source code implicitly augments the specification. The AI-generation workflow uses it to avoid accidentally modifying observable behaviors.

Obviously, if the specification (explicit requirements) and the old version (implicit requirements) conflict, the specification wins. In this case, we are deliberately changing some observable behavior (such as a bug we are fixing, or the lack of some feature).
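
As a toy model of this precedence rule, behaviors can be pictured as named properties, with the specification overriding whatever the old version happens to do. Real behaviors are of course not key-value pairs; the dictionaries, keys, and function names below are assumptions made purely for illustration.

```python
def effective_requirements(
    specified: dict[str, str], observed: dict[str, str]
) -> dict[str, str]:
    """Observed behaviors fill the gaps; specified ones win any conflict."""
    return {**observed, **specified}


def deliberate_changes(
    specified: dict[str, str], observed: dict[str, str]
) -> list[str]:
    """Properties where the specification overrides the old behavior."""
    return sorted(
        key
        for key in specified
        if key in observed and specified[key] != observed[key]
    )


# Implicit requirements, read off the old version (hypothetical values).
observed = {"error_text": "port out of range: N", "max_port": "65536"}
# Explicit requirements: the specification fixes an off-by-one bug.
specified = {"max_port": "65535"}

merged = effective_requirements(specified, observed)
changed = deliberate_changes(specified, observed)
print(merged, changed)
```

Everything outside `changed` is expected to survive the regeneration unmodified.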

The workflow can choose to prioritize preserving observable behaviors or, more strongly, preserving existing implementation. An excerpt from one of my prompts:

In your implementation, try to reuse the old implementation (from the $path file). Only change implementation if this is strictly necessary: if the old version has bugs (e.g., does not honor some documented property), new requirements have been added, or requirements have been removed/relaxed (in ways that allow code simplifications).
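
The `$path` placeholder in this excerpt happens to match the syntax of Python’s standard `string.Template`, so one plausible way to fill it in is sketched below. How the actual workflow performs the substitution is not stated in the article; the helper `build_prompt` and the example path are assumptions.

```python
from string import Template

# Prompt excerpt quoted from the article; `$path` is its only placeholder.
PROMPT_TEMPLATE = Template(
    "In your implementation, try to reuse the old implementation "
    "(from the $path file). Only change implementation if this is strictly "
    "necessary: if the old version has bugs (e.g., does not honor some "
    "documented property), new requirements have been added, or requirements "
    "have been removed/relaxed (in ways that allow code simplifications)."
)


def build_prompt(old_source_path: str) -> str:
    """Fill in the path of the previous implementation."""
    # substitute() raises KeyError if a placeholder is left unfilled.
    return PROMPT_TEMPLATE.substitute(path=old_source_path)


# Example path only; any file holding the old version would do.
prompt = build_prompt("src/select_python.py")
print(prompt)
```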

Advantages

The advantages of spec-driven development are significant. The main one is that the specification is a more useful embodiment of our software than its source code: it represents explicitly the properties that we care about, rather than leaving them mixed with accidental properties (Hyrum’s “observable aspects”).

The specifications are often significantly shorter than the source code. In the concrete example above, a 178-line specification generates 1476 lines of code and tests –a reduction to roughly ⅛ the size. While the exact ratio depends on many factors, this conciseness makes the specification far easier to maintain, modify, and understand.
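
For reference, the ratio works out as follows, using only the two line counts quoted above:

```python
spec_lines = 178
generated_lines = 1476

fraction = spec_lines / generated_lines   # specification size relative to output
expansion = generated_lines / spec_lines  # generated lines per specification line

print(f"{fraction:.3f}")   # 0.121, i.e. close to 1/8 = 0.125
print(f"{expansion:.2f}")  # 8.29
```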

A second, crucial advantage: the specification retains the guidance we give the AI, making it reusable. If we steered an AI as it worked on some new requirements (e.g., “don’t write unit tests this way because…”), that knowledge is lost in a requirement-oriented context. In a spec-driven approach, that guidance is directly encoded in the specification itself. This is like the difference between testing a function interactively in a REPL (ephemeral, must be repeated manually) versus writing repeatable unit tests (durable, automated).

I haven’t used spec-driven development for very long, but I expect that, over time, implementing my programs mainly as high-quality specifications will make them significantly more malleable and robust.

What about user interfaces?

We use debuggers today; they can be tremendously useful when something goes wrong. I expect that we’ll come to see the “chat” interfaces used by many of today’s agentic systems similarly. We’ll continue to occasionally need to steer AI interactively, but we’ll accept that this is a sign that something went wrong somewhere (as I wrote in Agentic AI: Recipes for reliable generative workflows).

But they will not be the default interface to develop software with AI. Instead, the ideal user interface we’ll use to develop software as we embrace AI-assisted workflows will not be too far from the file-oriented views most developers use today: you’ll modify your input files and kick off the generation process. Compilers give your editor reports about faulty lines; I expect spec-driven workflows will do something similar.
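
What such a report might look like is open; one possibility is to borrow the familiar `file:line: severity: message` shape that many compilers emit and that editors already know how to parse. The class below, its fields, and the example path and message are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecDiagnostic:
    """A problem found while generating code from a specification."""

    path: str      # specification file the problem refers to
    line: int      # 1-based line number within that file
    severity: str  # e.g. "error" or "warning"
    message: str

    def render(self) -> str:
        """Format in the compiler-style `file:line: severity: message` shape."""
        return f"{self.path}:{self.line}: {self.severity}: {self.message}"


# Hypothetical example: the workflow reports a contradiction in the spec.
diagnostic = SpecDiagnostic(
    path="spec/parser.md",
    line=42,
    severity="error",
    message="requirement contradicts the one on line 17",
)
print(diagnostic.render())
```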

Challenges and follow-up questions

This approach is not a silver bullet and leaves many open questions, some of which I intend to explore in future essays:

Specification format: Per the preamble, what are good formats for specifications?
Testing and validation: Does the AI generate tests from the specification? Are the tests embedded directly in the specification? How do we test the tests? How do we trust-the-trust?
Reviewing changes: How do we review changes? Does it suffice to review the deltas in the specification? Or do we also need to review the deltas to the generated code? Ideally, we’d find ways to avoid the latter.
Cost of new abstraction: What’s the cost of introducing the new abstraction of the specification? This is a new abstraction layer which adds some overhead. How do we minimize it?
Software engineering skills: How will this shift affect software engineering? I have written a few related ideas in Dumb AI and the software revolution, but the question deserves deeper exploration.

Dumb AI and the software revolution: Generative AI models are “frustratingly dumb,” yet “astonishingly powerful”, and poised to impact fields like software engineering. The key is to stop waiting for a “genius” and instead harness these fast, flawed collaborators with proper structure. This approach will lead to an explosion of custom software.
Module specifications in Duende: A report of what I’ve been able to accomplish so far with AI-assisted spec-driven development.

¹ For brevity, this text describes compiled languages. However, the underlying ideas apply as much to interpreted languages: where I say “binaries” think of something like “in-memory representation of the interpreted program being executed”.
² This is an excerpt from an actual prompt I used on 2025-06-28.
³ I saved a few of these prompts. I don’t expect to get much out of them.