3 min read · Daniel Kosbab

What I changed my mind about in 2025

End-of-year posts are usually a list of accomplishments. This one is the opposite. Three things I was wrong about this year, and what changed.

1. I underestimated how fast coding agents would become useful

A year ago I was working with a coding agent the way a senior engineer lets a junior one draft something, then rewrites most of it. Productivity win: marginal. Risk of subtle bugs: real. Net: I used it for boilerplate and avoided it for anything load-bearing.

I think I was half-right about the bugs and fully wrong about the productivity.

What changed was not model capability in some abstract sense. The underlying models are not dramatically better than a year ago. What changed was the loop.

A year ago the loop was: prompt, wait, copy, paste, fix. Cycle time a minute or two per iteration. Today the loop is: agent running in the background, edits applied in place, tests run automatically, broken states caught before I see them. Cycle time in seconds.
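The loop above can be sketched in a few lines. This is a toy illustration, not a real agent framework: `agent_step`, the dict-as-codebase, and the lambda tests are all hypothetical names invented for the example. The point it shows is the one in the text: edits are applied in place, tests run immediately, and a broken state is rolled back before anyone sees it.

```python
# Minimal sketch of the background edit-test loop: apply an edit,
# run the tests right away, and roll back automatically on failure.
# All names here are illustrative, not a real agent API.

def agent_step(codebase, edit, tests):
    """Apply `edit` to `codebase`; keep it only if every test passes."""
    snapshot = dict(codebase)      # cheap rollback point
    codebase.update(edit)          # apply the proposed change in place
    if all(test(codebase) for test in tests):
        return True                # edit survives; no manual copy-paste-fix
    codebase.clear()
    codebase.update(snapshot)      # broken state caught before review
    return False

# Toy example: one "file", one invariant the tests enforce.
code = {"config.py": "TIMEOUT = 30"}
tests = [lambda cb: "TIMEOUT" in cb.get("config.py", "")]

agent_step(code, {"config.py": "TIMEOUT = 60"}, tests)  # kept
agent_step(code, {"config.py": "TIMEUOT = 60"}, tests)  # rolled back
```

The cycle-time difference is exactly this: the failing second edit never reaches a human, so the human only ever reviews states that already pass.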

The shell around the model is what turns a clever demo into a tool you actually use. I was reasoning about capability when I should have been reasoning about integration.

2. I got the open-source AI question wrong in both directions

A year ago I thought open-weight models were obviously the right default. I still think they're essential. The case is just no longer as clean as I was making it.

The weakening: frontier-scale open-weight models are actual dual-use artifacts. The same weights that let a researcher fine-tune for a disability aid can be fine-tuned to remove safety training in a weekend. Both are real. Both have been demonstrated this year.

The strengthening: centralizing the most powerful models in three companies is a worse failure mode than the one open-weights creates, and I was underweighting that. A world where every consumer task routes through one of three APIs is a different kind of capture than a fine-tuneable weapon in an amateur's hands.

I used to have a clean answer. I now have a position: open-weight for everything below the frontier, ambiguous at the frontier. I don't love the ambiguity, but the confident version was wrong.

3. I was wrong about automated accessibility tools

A year ago I put automated accessibility tools roughly on par with linting. Useful for the obvious stuff, dangerous when treated as the whole program, not a substitute for real testing.

I still think the compliance-as-ceiling trap is the trap. The 2025 update is that the tools got meaningfully better at the obvious stuff, and the obvious stuff is a larger fraction of real problems than I was crediting.

The common stat for automated tools has been "roughly a third of WCAG issues." That stat is out of date. On the sites I audit, 2025 tools catch closer to half. That doesn't make the problem "solved," but it changes what manual auditors should spend their time on. If a tool catches half, the auditor is freed to work on the harder half, not the whole.

The failure mode I was defending against, teams treating "axe passes" as the job done, is still the failure mode. Better tools don't fix the org problem. They just mean the tool is now more useful to the teams that already understand it is not the whole job.

I was right to be skeptical. I was slightly wrong about where the skepticism belonged.

What to take from this

Nothing grand. Three specific cases where holding the position less tightly would have gotten me to the better answer faster.

I expect more of these next year. Writing them down is part of the mechanism that makes the next round shorter.

© 2026 Daniel Kosbab
