The Second Day Problem¶

2026-07-02

Even now, coding with the strongest models available, I run into the same pattern on every project.

On day one, in my discussions with the agent, the agent has the upper hand. It has simply seen more than I have. On technical detail after technical detail it outclasses me; the designs it proposes are more production-ready than mine, closer to best practice.

Then, starting on day two, the relationship flips. Sometimes it doesn't even take a day — half a day can do it. (If you're coding fourteen hours a day, the first eight hours it's teaching me.) From that point on, almost every discussion is me teaching it. On knowledge of this particular project's code, it can no longer compete with me at all.

Two curves¶

If you drew this as curves:

The human curve is monotonically increasing. On a given project, the knowledge and ability we hold keeps growing. Sure, it can flatten — if you don't take the project seriously, or the project itself is mediocre and meandering — and forgetting can even pull it down. But on a serious software engineering project, human memory and intuition grow very fast, and the ceiling is high.

The agent curve, under the ordinary, non-spec-driven way of working, is a horizontal line. Its knowledge and understanding of the codebase doesn't improve at all.

So by day two or day three, my ability on this project is far beyond the agent's. And the agent starts to look "dumber" and dumber — because the information it's handling has moved outside its distribution. Ask it to eat the raw codebase, or a pile of stacked-up code files, and its perplexity is far higher than reading a well-kept document, or a mix of 50% docs and 50% code. (That's an experiment worth actually running: measure the perplexity.)

What this comes down to, I think, is that the beings best suited to long-range work are still humans. Physically and intellectually, we are formidable endurance runners; today's agents can hardly do this at all.

Our spec-driven approach exists, at its core, to make up for exactly this — the agent's missing capability on genuinely long-range tasks. And by long-range we don't mean a few hours. We mean work that runs a week, even a month.

Hidden knowledge and collapse¶

Intent matters here. Original intent is a precious asset, and it never fully shows up in the code.

So much of what makes a codebase work is hidden knowledge. Sometimes we reject one approach and choose another, and that choice is anything but trivial. But when an agent takes over the codebase, it doesn't know the design decisions that came before — so it keeps making the same mistake, one I've come to think of as collapse. It collapses back onto the method it's used to, the one that looks most natural at first glance.

This is enormously frustrating for a developer. We chose this technical route for very good reasons — and then the agent, while fixing some bug, follows its habits and deliberately changes the architecture back. Bound by its training corpus, accustomed to the generic method, it can't correct for it, can't adapt to the codebase in front of it. It doesn't know how many twists and turns sit behind the code, let alone how much deliberation went into choosing this design in the first place.

Climbing with protection¶

I've already felt what the alternative is like, and honestly, it's a kind of pleasure: programming with specs is deeply comfortable.

Once your codebase gets complex, you no longer have to worry that the agent will completely misunderstand your original intent. It feels like climbing a rock face with full protection. Developing without specs is free solo. You're afraid that one mistake of yours will break a feature that used to work — and even more afraid that the agent, in one seemingly careless design choice, will paralyze a whole series of precise designs that took real effort to get working.

One more thing has become obvious along the way: the fresh-context agent we launch every time is much less prone to overfitting than the long-lived agent you've used forever and can't bear to close. A fresh agent starts from the specs — not from the sediment of an old conversation.

Why we're building SpexCode¶

SpexCode is this idea built into a tool. Every change lands as a versioned spec node, so original intent and the design decisions behind the code stay readable next to it. Each task runs in a fresh-context agent that starts from the spec tree rather than a stale chat. The point is to bend the agent's flat line upward — so that on day two, and on day thirty, it's still climbing the project's curve with you.