Annie Vella did some research into how 158 professional software engineers used
AI. Her first question was:
Are AI tools shifting where engineers actually spend their time and effort? Because if they are, they’re implicitly shifting what skills we practice and, ultimately, the definition of the role itself.
She found that participants saw a shift from creation-oriented tasks to verification-oriented tasks, but it was a different form of verification from reviewing and testing.
In my thesis, I propose a name for it: supervisory engineering work – the effort required to direct AI, evaluate its output, and correct it when it’s wrong.
Many software folks think of inner and outer loops. The inner loop is writing code, testing, debugging. The outer loop is commit, review, CI/CD, deploy, observe.
What if supervisory engineering work lives in a new loop between these two loops? AI is increasingly automating the inner loop – the code generation, the build-test cycle, the debugging. But someone still has to direct that work, evaluate the output, and correct what’s wrong. That feels like a new loop, the middle loop, a layer where engineers supervise AI doing what they used to do by hand.
A potential issue with this research is that it finished in April 2025, before the latest batch of models greatly improved their software development capabilities. But my sense is that this improvement in models has only accelerated a shift to supervisory engineering. This shift is a traumatic change to what we do and the skills we need. It doesn’t mean “the end of programming”, rather a change of what it means to be programming.
A lot of software engineers right now are feeling genuine uncertainty about the future of their careers. What they trained to do, what they spent years upskilling in, is shifting – and in many ways, being commoditised. The narratives don’t help: either AI is coming for your job, or you should just “move upstream” into architecture and “higher value” work. Neither tells you what to actually do on Monday morning.
That’s why this matters. There is still plenty of engineering work in software engineering, even if it looks different from what most of us trained for. Supervisory engineering work and the middle loop are one way of describing what that difference looks like, grounded in what engineers are actually reporting.
❄ ❄ ❄ ❄ ❄
Bassim Eledath lays out 8 levels of Agentic Engineering.
AI’s coding ability is outpacing our ability to wield it effectively. That’s why all the SWE-bench score maxxing isn’t syncing with the productivity metrics engineering leadership actually cares about. When Anthropic’s team ships a product like Cowork in 10 days and another team can’t move past a broken POC using the same models, the difference is that one team has closed the gap between capability and practice and the other hasn’t.
That gap doesn’t close overnight. It closes in levels. 8 of them.
His levels are:
- Tab Complete
- Agent IDE
- Context Engineering
- Compounding Engineering
- MCP & Skills
- Harness Engineering
- Background Agents
- Autonomous Agent Teams
Eight seems to be the number thou shalt have for levels. Earlier this year Steve Yegge proposed eight levels in Welcome to Gas Town. His levels were:
- Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions
- Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools.
- Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider.
- In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs.
- CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them.
- CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast.
- 10+ agents, hand-managed. You are starting to push the limits of hand-management.
- Building your own orchestrator. You are on the frontier, automating your workflow.
I’m sure neither of these maturity models is entirely accurate, but both resonate as reasonable frameworks for thinking about LLM usage, and in particular for highlighting how differently people are using these tools.
❄ ❄ ❄ ❄ ❄
Chad Fowler thinks we have to rethink what our target is when generating code.
…in a world where code can be generated quickly and cheaply, the real constraint has shifted. The problem is no longer producing code. The problem is replacing it safely.
Regenerative software does not work if the unit of generation is an application. Regeneration only works if the unit of generation is a component that compiles into a system architecture.
He outlines several architectural constraints that make it easier to replace components:
- a small number of communication patterns
- clear ownership of data (“exclusive mutation authority for each dataset to a single component”)
- clear evaluation surfaces, allowing behavior to be verified independently of implementation
- the right size of components (natural grain). That size is based on data ownership boundaries and evaluation surfaces
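These constraints could be sketched roughly as follows (a minimal Python illustration of my own; the inventory domain, names, and interfaces are all invented, not Fowler’s): one component holds exclusive mutation authority over its dataset, and an evaluation surface verifies behavior independently of the implementation, so the implementation can be regenerated wholesale.

```python
from typing import Protocol

# Evaluation surface: the behavioral contract any replacement must satisfy,
# independent of a particular implementation.
class InventoryService(Protocol):
    def reserve(self, sku: str, qty: int) -> bool: ...
    def available(self, sku: str) -> int: ...

# One component with exclusive mutation authority over the stock dataset.
# No other component touches this data directly.
class SimpleInventory:
    def __init__(self, stock: dict[str, int]):
        self._stock = dict(stock)

    def reserve(self, sku: str, qty: int) -> bool:
        if self._stock.get(sku, 0) >= qty:
            self._stock[sku] -= qty
            return True
        return False

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

# Contract check: runs against any InventoryService, so an agent can
# regenerate the implementation as long as this surface still passes.
def check_contract(svc: InventoryService) -> None:
    assert svc.available("widget") == 3
    assert svc.reserve("widget", 2) is True
    assert svc.available("widget") == 1
    assert svc.reserve("widget", 5) is False

check_contract(SimpleInventory({"widget": 3}))
```

The point of the sketch is that the component’s “natural grain” falls out of the data it owns and the surface it is evaluated against, not out of its line count.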
Dividing complex systems into networks of replaceable components has long been a goal of software architecture. So far, this is still important in the world of agentic engineering.
❄ ❄ ❄ ❄ ❄
Mike Masnick summarized troubling experiences of using AI detection systems on student writing. (He’s summarizing an article by Dadland Maye, which is behind a registration wall that I’m too lazy to form-fill.) Maye’s institution used tools to detect and flag AI writing.
We are teaching an entire generation of students that the goal of writing is to sound sufficiently unremarkable! Not to express an original thought, develop an argument, find your voice, or communicate with clarity and power—but to produce text bland enough that a statistical model doesn’t flag it.
The hopeful outcome was that Maye stopped requiring students to disclose their AI usage, which changed the conversation to a discussion about how to use the tools effectively.
Students approached me after class to ask how to use these tools well. One wanted to know how to prompt for research without copying output. Another asked how to tell when a summary drifted too far from its source. These conversations were pedagogical in nature. They became possible only after AI use stopped functioning as a disclosure problem and began functioning as a subject of instruction.
We need to teach people how to use AI tools to improve their work. The tricky thing with that aim is that they are so new, there aren’t yet any people experienced in how to use them properly. For one of the gray-haired brigade, it’s a fascinating time to watch our society react to the technology, but that’s little comfort for those trying to plot out their future.
❄ ❄ ❄ ❄ ❄
Ankit Jain thinks that not only should humans not write code, they also shouldn’t review it.
Humans already couldn’t keep up with code review when humans wrote code at human speed. Every engineering org I’ve talked to has the same dirty secret: PRs sitting for days, rubber-stamp approvals, and reviewers skimming 500-line diffs because they have their own work to do.
He posits a shift to layers of evaluation filters:
- Compare Multiple Options
- Deterministic Guardrails
- Humans define acceptance criteria
- Permission Systems as Architecture
- Adversarial Verification
Like Birgitta, I’m uneasy about the notion that “the code doesn’t matter”. I find that when I’m working at my best, the code clearly and precisely captures my intent. It’s easier for me to just change the code than to figure out how to explain to a chatbot what to change. Now, I’m not always at my best, and many changes are much more awkward than that. But I do think that a precise, understandable representation is a useful direction to aim for, and that agentic AI may be best used to help us get there.
In particular, I’m not convinced by his suggestion for #3 that natural-language BDD specs are the way to go. They are wordy and ambiguous. Tests are a valuable way to understand what a system does, and it may be that our agentic future has us thinking more about tests than implementation. But such tests need a different representation.
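To illustrate the contrast (the billing domain and every name here are invented for the example): a natural-language BDD spec leaves terms like “promptly notified” ambiguous, while a compact executable test pins each term to a concrete, checkable behavior.

```python
# BDD-style natural language (wordy, and "promptly" / "notified" are vague):
#   Given a customer with an overdue invoice
#   When the nightly billing job runs
#   Then the customer should be promptly notified of the overdue amount
#
# The same acceptance criterion as an executable test, against a toy
# billing function that stands in for the real system.

from dataclasses import dataclass, field

@dataclass
class Customer:
    email: str
    overdue: float
    notifications: list[str] = field(default_factory=list)

def run_billing_job(customers: list[Customer]) -> None:
    # Toy implementation: notify every customer with a positive overdue balance.
    for c in customers:
        if c.overdue > 0:
            c.notifications.append(f"Overdue: ${c.overdue:.2f}")

def test_overdue_customer_is_notified() -> None:
    alice = Customer("alice@example.com", overdue=42.50)
    bob = Customer("bob@example.com", overdue=0.0)
    run_billing_job([alice, bob])
    assert alice.notifications == ["Overdue: $42.50"]  # exact message, once
    assert bob.notifications == []                     # nothing owed, no noise

test_overdue_customer_is_notified()
```

The test is shorter than the prose spec and leaves no room to argue about what “notified” means.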
❄ ❄ ❄ ❄ ❄
The new servant leadership: we serve the agents by telling them what to do 9/9/6