Concrete types yield better maintainability
Posted: 2025-10-25
One of my most astute colleagues at Google always explains things through examples, rather than in abstract terms. He has improved many of my docs by modifying them to start with concrete examples before any general formulas or principles.
He often quotes Feynman, for whom we both have a great deal of respect:
“You can know the name of a bird in all the languages of the world, but when you’re finished, you’ll know absolutely nothing whatever about the bird… So let’s look at the bird and see what it’s doing —that’s what counts. I learned very early the difference between knowing the name of something and knowing something.”
☙
Guido van Rossum has argued strongly against heavy use of
higher-order functions for iteration. He even argued for the removal of
map and filter in favor of list
comprehensions:
I think dropping filter() and map() is pretty uncontroversial; filter(P, S) is almost always written clearer as [x for x in S if P(x)], and this has the huge advantage that the most common usages involve predicates that are comparisons, e.g. x == 42, and defining a lambda for that just requires much more effort for the reader (plus the lambda is slower than the list comprehension). Even more so for map(F, S) which becomes [F(x) for x in S].
Python has reluctantly kept map and filter, but they take a secondary role (mainly because of the choice of having them return lazy iterators), at least when compared with their Lisp variants¹.
Python’s “explicit is better than implicit” mantra (from the Zen of Python) is interpreted as “show explicitly how each individual value is manipulated”. Python programmers will write:
[x * x for x in range(100) if is_prime(x)]
This philosophy for expressing iteration stands in sharp contrast with the approach taken by both array languages and Lisp/Scheme. In Scheme we would write:
(map square (filter is-prime? (iota 100)))
Which approach is better? Explaining the iteration by showing how an individual value is manipulated, or focusing on the logical structure of the computation?
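To make the comparison concrete in Python itself, here is a minimal sketch (the is_prime and square helpers are throwaway definitions, written here only for illustration) showing that both styles compute the same thing:

```python
def is_prime(n):
    """Naive primality test, for illustration only."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def square(x):
    return x * x

# "Show how each individual value is manipulated":
comprehension = [square(x) for x in range(100) if is_prime(x)]

# "Focus on the logical structure of the computation":
higher_order = list(map(square, filter(is_prime, range(100))))

assert comprehension == higher_order
```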
☙
This tension between concrete and abstract doesn’t just apply to the logical structure of our computations; it also manifests at the heart of our type systems, when deciding between concrete types and generic abstractions.
The vast majority of functions in dynamically typed languages don’t specify the type of their arguments:
def average(elements):
return sum(elements) / len(elements)
These functions could check types at runtime (e.g., assert isinstance(x, Duck)), but this is rare. Instead, they simply state loose requirements on their values (“must quack like a duck”) and happily accept values of any type (as long as it implements those requirements).
What’s the concrete set of types that average is being
evaluated on? You can’t tell! It can include any type that:
- Has a defined size (implements the __len__ method).
- Is iterable (implements __iter__) and contains elements supporting +.
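A quick sketch of what that flexibility buys you; the Measurements class below is a hypothetical custom type that merely implements those two requirements:

```python
class Measurements:
    """A hypothetical custom type: sized and iterable, hence average-able."""
    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __iter__(self):
        return iter(self._values)

def average(elements):
    return sum(elements) / len(elements)

# Lists, tuples, and our custom type all quack well enough:
assert average([1, 2, 3]) == 2.0
assert average((0.5, 1.5)) == 1.0
assert average(Measurements(10, 20, 30)) == 20.0
```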
☙
In C++ you can also abstract concrete types away:
#include <iterator>  // std::begin, std::end, std::size
#include <numeric>   // std::accumulate

template <typename ContainerType>
auto average(const ContainerType& elements) {
    using ElementType = typename ContainerType::value_type;
    ElementType sum = std::accumulate(std::begin(elements),
                                      std::end(elements), ElementType{});
    return sum / std::size(elements);
}
In Java, you could introduce an interface and express your algorithm in terms of its methods.
Why would you do this? The common wisdom is that one should make functions as general as possible: instead of committing average to just one specific type, allow it to be applied to any type that implements its requirements.
“Write your code against interfaces, not implementations,” we’re told. “Minimize coupling.”
There’s an intriguing
article from Herb Sutter suggesting exactly this approach (in the
context of this quote, “the caller” would be our average
function):
the caller does not, and should not, commit to a single concrete type, which would make the caller’s code less general and less able to be reused with new types.
I expect most programmers would agree that, ceteris paribus, the
generic implementation of average is much more useful than
one bound to a specific type, say
std::vector<int>.
☙
Unlike Python, Scheme or JavaScript, if someone calls the C++ (or Java, or statically-annotated Python, or TypeScript) implementation of average with incorrect types (passing something that doesn’t quite quack like a duck), the compiler detects the mistake.
Detecting the mistake at compile time is great! Or, rather, not detecting the mistake statically absolutely sucks. Life is too short to waste it writing code without static type checks.
However, compile-time error detection of interface errors does little to address the problem of Hyrum’s Law: that callers will inevitably depend on implementation details. Unless you have the means (e.g., a mono-repo) and time to adjust all your callers, you’ll still have a problem: what do you do when you need to adjust your implementation to rely on something that you had only implicitly assumed?
Suppose that, in addition to explicitly assuming that the input quacks, you had also implicitly assumed that the input has legs, and now you need to change your implementation to make the input stand up.
- Suppose you establish that average is a bottleneck and want to rewrite it using a SIMD-optimized loop, or to accumulate concurrently through a thread pool.
- Maybe you want to switch from std::accumulate to a different algorithm (such as Kahan summation).
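As a sketch of the kind of rewrite involved, here is a compensated (Kahan) summation variant of average in Python. Note how a seemingly internal change quietly adds requirements: the elements must now also support subtraction and behave like floating-point numbers, not just support +:

```python
def kahan_average(elements):
    """Average via Kahan (compensated) summation, which reduces
    floating-point rounding error.

    New implicit requirements on the elements: they must support
    both + and - (a plain sum() only needed +).
    """
    total = 0.0
    compensation = 0.0  # running estimate of the lost low-order bits
    for x in elements:
        y = x - compensation
        t = total + y
        compensation = (t - total) - y
        total = t
    return total / len(elements)

assert kahan_average([1.0, 2.0, 3.0, 4.0]) == 2.5
```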
These situations would likely bring additional requirements for
ContainerType and/or ElementType. If you had
used auto average(const std::vector<int>&), this
change would be much simpler. Now you are forced to do something more
complex (e.g., use C++20 concepts, SFINAE, or specialized
overloads).
☙
The main advantage of explicit, concrete, static types isn’t just to enable compile-time validation; that can also be achieved with generic/templated functions. The main advantage is that code that is explicitly coupled to concrete underlying types (that locks them in) is easier to reason about. Knowing the actual type you’re manipulating is simpler than reasoning through the extra layer of indirection of an interface or generic type. And code that’s easier to reason about is easier to maintain.
For the vast majority of cases, explicitly coupling your functions to specific types can make your code more maintainable.
Obviously, there are great reasons to use templates and to generalize some functions; I have many examples.
But this generalization should be an intentional decision, reserved for the cases where the function truly needs to be general, not the default. YAGNI: only build the abstract function when you have a concrete need for it.
☙
Herb Sutter’s article didn’t age very well. The C++ community quickly
became enamoured with auto. I think we initially thought:
“Great, we can have the advantages of dynamically typed languages, while
detecting mistakes at compile time!”
… and then we quickly fell out-of-love with auto
(besides the few very justifiable cases, where it greatly simplifies
things). I think we quickly started experiencing the pain of overly
generic code.
☙
If you must generalize, generalize only to the extent that is
strictly necessary. Maybe it’s enough to make average take
a const std::vector<ElementType>& (rather than a
fully generic const ContainerType&).
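The same “generalize only as far as strictly necessary” idea can be sketched in statically-annotated Python (which the essay mentions alongside C++). This is an illustrative signature, not code from the essay: annotating average with Sequence[float] generalizes over sized, iterable sequences while still ruling out arbitrary iterables such as generators:

```python
from typing import Sequence

def average(elements: Sequence[float]) -> float:
    """Accepts any sized sequence of floats (list, tuple, ...),
    but deliberately not an arbitrary iterable such as a generator."""
    return sum(elements) / len(elements)

assert average([1.0, 2.0, 3.0]) == 2.0
assert average((2.0, 4.0)) == 3.0
```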
Bear in mind this refactoring asymmetry:
- From concrete to generic: easy. Turning a concrete function like auto average(const std::vector<int>&) into a generic (templated) version is straightforward.
- From generic to concrete: painful. Taking a generic template <typename T> auto average(const T&) function used by hundreds of types and trying to specialize or constrain it (because of a new requirement) can be a nightmare.
A similar asymmetry applies to your testing surface. If someone
somewhere calls average on a container that updates itself
asynchronously, their code will compile, but it will be incorrect. The
burden of testing the generic average implementation is now
on its callers. In other words:
- Concrete functions: The set of tests is very clear: the specific type involved.
- Generic functions: You have a conceptually infinite testing surface. If tests pass with a concrete type, can you conclude that they pass with all types that meet the requirements? Or are you, instead, accidentally relying on some implicit property of the concrete type you tested with? You should test with vectors, lists, sets, custom containers, containers of ints, floats, custom classes, ….
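A small Python illustration of that asymmetric testing burden, reusing the dynamically-typed average from earlier: tests against a handful of concrete types all pass, yet a caller holding a perfectly iterable generator still fails, and only at runtime:

```python
def average(elements):
    return sum(elements) / len(elements)

# Tests against a few concrete types all pass...
assert average([1, 2, 3]) == 2.0
assert average((1, 2, 3)) == 2.0
assert average({1, 2, 3}) == 2.0

# ...but a generator is iterable without implementing __len__,
# so the mistake only surfaces when this call runs:
try:
    average(x for x in range(3))
except TypeError:
    pass  # len() of a generator raises TypeError
```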
Related
- C++23: From imperative loops to declarative ranges: Explaining why std::views and std::ranges are often preferable over low-level iteration.
- Source code: Minimizing merge conflicts: Teams making fast progress on any shared codebase inevitably face (source code) merge conflicts. This text describes general strategies to improve the situation.
¹ Consider that this is an error (and there are good reasons why this should be an error, as map returns a lazy iterator): len(map(square, range(100)))