Intentional instability: AI-generated code vs Hyrum’s Law

Posted: 2025-10-24

This is part of a series I’m writing on generative AI.

Introduction

If you own an API, you know the pain of Hyrum’s Law: “With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.”

Here I explore the ramifications of using AI code generation to implement “intentional instability”: cheaply and constantly regenerating code to modify implicit behaviors on purpose, forcing users to rely only on explicit contracts.

Preamble

Consider a specification for an unimplemented get_markers function, given as a type signature and a set of properties [1]:

import pathlib

async def get_markers(path: pathlib.Path) -> list[Marker]:
  """Returns all markers in `path` in appearance order.

  Properties:
  …
  * A RepeatedMarkersError exception helpfully and explicitly mentions all
    repeated markers in `path`.

  Raises:
    RepeatedMarkersError: If `path` contains repeated (identical) markers.
  """
  raise NotImplementedError()  # {{🍄 list markers}}

The properties are requirements. We write tests against them, and they tell our users what to expect. This text focuses on the specific property shown: RepeatedMarkersError must mention all repeated markers.

Based on the spec, many different messages would be correct, and many others would be incorrect.
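For instance (illustrative messages, assuming the repeated markers in `path` are foo and bar):

# Correct per the spec: every repeated marker is mentioned.
"Repeated markers: foo, bar"
"Found duplicates of 'bar' and of 'foo'."

# Incorrect: at least one repeated marker is missing from the message.
"Repeated markers found."
"Repeated markers: foo"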

AI’s arbitrary choice

How would an AI workflow write get_markers?

  1. Test-driven: The AI could start by generating a unit test that codifies an assumption, asserting a specific format (e.g., "Repeated markers: foo, bar"). It would then generate an implementation to satisfy the test.

  2. Implementation-driven: The AI could start with the implementation, arbitrarily picking a specific format. The test generation step would then peek into the implementation and write a test that locks in the specific choice.

  3. It could fail: The AI could warn the user that the specification is too vague: the tests and implementation can’t be generated independently; one must peek at the other. This is the worst option. If we must fully specify every minor detail, our specifications become hopelessly verbose. We might as well just write the function directly and let the implementation be the specification.

In options 1 and 2 the result is the same: an unspecified detail becomes a concrete observable behavior [2].
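To make this concrete, here is a sketch of the test an AI following option 1 might emit (hypothetical code: it assumes pytest with the pytest-asyncio plugin, the get_markers spec above, and a hypothetical write_markers_file helper that creates a file in which foo and bar each appear twice):

import pytest

@pytest.mark.asyncio
async def test_get_markers_reports_all_repeated_markers(tmp_path):
  path = write_markers_file(tmp_path, ["foo", "bar", "foo", "bar"])  # hypothetical helper
  with pytest.raises(RepeatedMarkersError) as excinfo:
    await get_markers(path)
  # This is all the spec requires: both repeated markers are mentioned.
  assert "foo" in str(excinfo.value)
  assert "bar" in str(excinfo.value)
  # But the generated test may also pin the exact wording, turning an
  # arbitrary, unspecified format into a concrete observable behavior:
  assert str(excinfo.value) == "Repeated markers: bar, foo"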

Pushing back on Hyrum’s Law

This brings us back to Hyrum’s Law. As soon as we ship get_markers, someone will start parsing the AI’s arbitrary exception message. The arbitrary choice of an error message, explicitly left out of the spec, effectively becomes part of the contract.

But here is the catch: AI-assisted workflows can cheaply produce brand-new implementations on demand [3].

We could extend our AI-assisted workflow to preserve old behaviors. It could inspect the old implementation, extract observable behaviors (including the exact exception format), and treat them as implicit properties for the new implementation.
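For example (a sketch, assuming the old message format has already been captured from a previous run):

# Observable behaviors extracted from the old implementation (hypothetical values).
old_behaviors = {
  "RepeatedMarkersError message": "Repeated markers: {comma-separated, sorted}",
}

# Each one is appended to the spec as an implicit property before regeneration.
implicit_properties = [
  f"* {name} keeps its old format: {fmt}" for name, fmt in old_behaviors.items()
]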

But with AI, we can just as easily choose not to. We can effectively tell the AI: “any correct implementation (of the explicit specs) is fine.”

Better yet, we can instruct the AI to deliberately change unspecified, observable behaviors with every regeneration. This is intentional instability.
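In the get_markers template, that instruction could itself be expressed as one more property (a sketch; this wording is not in the original spec):

  Properties:
  …
  * Intentional instability: observable behaviors not covered by the
    properties above (e.g., the exact wording of exception messages) are
    deliberately changed on every regeneration.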

This isn’t a brand-new idea; it’s just been expensive to implement manually.

Consider C++’s std::hash:

“Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.”

With std::hash this is easy: just pick a new random salt at process start.
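For illustration, the same trick in Python (a sketch; Python’s built-in str hashing is already randomized per process in the same spirit):

import hashlib
import os

_SALT = os.urandom(16)  # fresh random salt, chosen once at process start

def salted_hash(data: bytes) -> int:
  # Stable within a single run of the program, intentionally different across
  # runs, so nobody can come to depend on specific hash values.
  return int.from_bytes(
    hashlib.blake2b(data, key=_SALT, digest_size=8).digest(), "big"
  )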

What AI offers is the ability to apply this same principle (intentional instability) cheaply and anywhere.

Applicability

Deliberately changing all unspecified aspects can be very hostile. Users often have good reasons for depending on things, even if you wish they didn’t. And it is no longer enough for them to learn how the software behaves; they must also learn which of its observable behaviors are part of the contract.

So where is this technique actually useful?

  1. Spec-aware fuzzers: you can generate variant implementations from the same spec. This helps you find contract-violating assumptions in your own customer code (see the sketch after this list). This is invaluable for high-security, high-reliability environments and for fighting implicit coupling between microservices.

  2. Identify what should be specified: By intentionally varying a behavior, you’ll quickly hear from users what really matters. This feedback helps you understand which “observable” behaviors are so critical they should be “frozen”: promoted to the official, specified contract.
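A minimal sketch of the fuzzer idea from item 1 (hypothetical module names and selection mechanism; it assumes the customer test suite can be pointed at an implementation via an environment variable):

import os
import subprocess

# Hypothetical: several implementations independently regenerated from the same spec.
VARIANTS = ["get_markers_variant_a", "get_markers_variant_b", "get_markers_variant_c"]

for variant in VARIANTS:
  result = subprocess.run(
    ["pytest", "customer_tests/"],                    # hypothetical test suite
    env={**os.environ, "GET_MARKERS_IMPL": variant},  # hypothetical switch
    check=False,
  )
  if result.returncode != 0:
    print(f"Customer code depends on unspecified behavior of {variant}")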

In our RepeatedMarkersError example, the feedback from users parsing the string might lead us to a stable, machine-readable property (e.g., RepeatedMarkersError.repeated_markers: set[str]). We can choose to maintain intentional instability for the message string while providing a robust new mechanism for the users who need it.
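One possible shape for that frozen property (a sketch; only the repeated_markers attribute comes from the example above):

class RepeatedMarkersError(Exception):
  """Raised when `path` contains repeated (identical) markers."""

  def __init__(self, repeated_markers: set[str], message: str) -> None:
    super().__init__(message)  # wording: unspecified, free to change
    # Specified, stable, machine-readable part of the contract:
    self.repeated_markers = repeated_markers

Users who need the data read repeated_markers instead of parsing str(exc).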

Using AI generation to implement intentional instability can help us manage the boundary between “observable” and “specified”, and so maintain control of our API contracts.


  1. This isn’t just a theoretical example. I’m already using AI to generate this function from a template with the properties and the 🍄 placeholder. I’ve omitted other properties for brevity; this one suffices for our purpose.

  2. I suspect option 1 is generally better than option 2 (it feels safer to have the implementation conform to the tests than the reverse), but this question is a distraction from the main point.

  3. Why regenerate? Perhaps we’re adding a new parameter with its own complex properties, and it’s easier to just regenerate the function.