BIRKEY CONSULTING

ABOUT  RSS  ARCHIVE


Posts tagged "engineering":

16 May 2026

The Agent Is Not the Point

I recently finished building a coding agent1. Not a wrapper around someone else's. A small one, from scratch, in Rust. Its core is four tools: read a file, write a file, edit part of a file, and run a shell command. One agent loop. A session that appends linearly to a log. That is roughly it.

The experience taught me something that I think the current conversation about AI in software engineering is mostly missing. So I want to say it directly:

The agent is not the point.

People are the point. Engineering rigor is the point. Being able to gain clarity on the actual problem you are trying to solve is the point. The agent is a tool that, if harnessed well, can help with those things. But the tool itself is not what matters. What matters is what it enables the human to do.

Two camps, both wrong

I see two extreme positions dominating the current AI conversation, and I think both are mostly emotional reactions to something that deserves clearer thinking.

The first camp says that coding agents will do everything. No more coders. No more software engineers. Massive layoffs are coming. Big names in the industry are declaring the end of programming as we know it. Within this camp, there is a further impulse: stop reading code, just let the agent do it. Accept the output. Move on. Everyone is suddenly talking about "agentic this" and "agentic that" without, in many cases, actually understanding what an agent is or how it works under the hood. The word has become a branding exercise more than a technical description.

The second camp says that AI is fundamentally bad. It steals work. It produces garbage. It makes people dumb. It offloads thinking. It is a threat to the profession and to the craft of engineering. This camp includes engineers I have learned a great deal from over the years, people whose judgment I ordinarily trust. The animosity is real and honest, but I think the conclusion is wrong.

Both camps are having an emotional reaction to a genuine change in the landscape. I understand why. The change is real and the pace is fast. But the framing on both sides is shortsighted. The question was never whether AI will replace you or whether AI makes people stupid. That is the wrong question. It was always the wrong question. The question is: what is this thing actually good for, what does it need to be useful, and how do we harness it in a way that produces verifiable, reproducible, deterministic outcomes?

Having built one

I did not set out to build a coding agent to prove a point. I built one because I wanted to understand what was actually happening when I used one. I had been using Pi2 and Claude Code in my daily work, and I was impressed but also unsatisfied with how much of the process stayed opaque. So I built OneLoop1.

Here is what OneLoop does: it reads files, it writes files, it edits files, and it runs shell commands. There is an agent loop that assembles a prompt with the conversation history, sends it to the LLM, parses the response, and executes whatever tool the model asks for. The session is a JSONL file that grows linearly. That is the whole thing.

What surprised me was not how complex the agent needed to be. It was how capable the LLM is with just those few primitives. Given the ability to read files and run commands, the model does genuine detective work. It gathers evidence. It recognizes patterns across a codebase. It follows leads from one file to another, from one log line to a stack trace to a root cause. It can narrow a bug from a vague symptom to a specific line of code by systematically reading, searching, and cross-referencing.

That is not magic. It is a small number of durable primitives plus a very powerful pattern recognition engine. But seeing it up close, from the inside, made something click for me. The value is not in the agent as a thing. The value is in what those primitives enable the LLM to do gather evidence so the human can see more clearly.

What the LLM is actually good for

The LLM is good at pattern recognition across large surfaces. It is good at gathering and synthesizing information. It is good at following a trail of evidence when you give it the tools to look around. It is good at generating plausible code, yes, but that is almost a side effect of a deeper capability: it is good at helping the human gain clarity on the problem.

That is what I keep coming back to. The real value of an LLM in a coding workflow is not that it writes code for you. It is that it helps you see the problem more clearly. It gathers context you might not have the patience to gather. It spots patterns you might miss because you are too close. It follows leads you might not have thought to chase. The code generation is real and useful, but it flows from the clarity, not the other way around.

This is also why the "just let the agent do everything" camp is wrong. If you accept the output without inspecting it, without understanding what happened, without building your own mental model of the problem, you have gained nothing durable. You have a patch that works right now and no understanding of why. That is not engineering. An LLM will happily write and rewrite code to fix one thing while breaking another, over and over, and you should not just accept that cycle without rigor. If you do, there is a term for it: faith-based engineering.

And it is why the "AI is fundamentally bad" camp is wrong too. The pattern recognition is real. The evidence gathering is real. The clarity it can produce is real. Throwing that away because the current implementations are imperfect, or because some of the surrounding hype is ridiculous, is like refusing to use a compiler because you once saw it generate a wrong optimization. The tool does not have to be perfect to be genuinely useful.

The opportunity

I do not want this to read as a cautious "be careful with AI" post. I am genuinely excited about what this enables.

There is so much accumulated waste in our industry. Clunky systems that evolved through years of locally reasonable decisions into globally unreasonable messes. Wrong abstractions that nobody has time to fix. Half-baked integrations, scattered validation logic, unclear ownership, duplicated effort. We all know these systems. We have all worked in them. The problem was never that we lacked the intelligence to fix them. The problem was that we lacked the time and the patience to gather the evidence needed to see clearly what was actually wrong.

That is where the LLM changes the economics. It can gather that evidence. It can read every file, run every test, trace every dependency, and lay it out in front of you. Not so you can blindly accept its conclusions, but so you can think more clearly about what to do. The human judgment is still the scarce resource. The LLM just makes that judgment cheaper to exercise well.

I see a tremendous opportunity to disrupt waste that has been sitting around for years, not because we did not know it was there, but because the cost of understanding it was too high relative to everything else on the backlog. That cost is dropping fast. The question is whether we use that drop to generate more noise or to finally clean things up.

A concrete example. You inherit a codebase and you know there is cruft. It is hazy though. You cannot quite see where the patterns of waste begin and end. There is duplicated logic scattered across handlers, half of it slightly different from the other half, and you are not sure which version is the source of truth. It would take you a full day just to trace all the variations, map the dependencies, and build a mental model of what the code is actually doing versus what the domain needs.

Instead, you ask the agent to read through the relevant files and identify the pattern. It does. It lays out every variation side by side, shows you where they diverge, and highlights which ones match the domain intent and which ones are just drift. Now you can see it. What was hazy is concrete. You did not outsource the thinking — you outsourced the gathering. The judgment about what to keep, what to collapse, and how to restructure is still yours. But the cost of getting to that judgment just dropped from a day to ten minutes.

That is the economics I am talking about. You still have to think. You still have to decide. But you get to do it from a position of clarity instead of a position of exhaustion. And once you can see the pattern clearly, you can guide the agent to clean it up in a way that stays close to the domain — with tests to verify, diffs to inspect, and a clear before and after.

The same applies to building new things. If you can describe what you want with enough clarity — invariants, contracts, tests, acceptance criteria — the LLM can help you get there faster. Not by replacing your thinking, but by amplifying it. The hard part was never the typing. The hard part was always the thinking. Nothing about that has changed.

Where this is heading

I think the future of coding agents — and agents in general — looks less like a product and more like a library. Small composable building blocks. APIs, maybe even ABIs3, that expose just enough surface for people to build on top of. Every major platform will roll out their own agent APIs. The winners will be the ones that treat the agent as something you compose and extend, not something you adopt wholesale.

That is the spirit behind OneLoop. It meets my needs because it is tailored to the way I work — my tools, my environment, my workflow. It is not designed for everybody, and I am not pretending it is. When I believe I have hardened the core pieces enough, with clean interfaces so that someone else can pick, choose, and compose their own workflow on top of it, I will open source it. Until then, I might just release it as-is — warts and all — so people can take inspiration, steal what is useful, and build something that fits their own brain.

Because that, I think, is the right shape for an agent. Not a monolithic product that tells you how to work. A small set of composable primitives that you shape around how you already think.

The agent is not the point

I keep coming back to the same realization. The agent is not the point. It never was.

The point is people thinking more clearly about the problems they are trying to solve. The point is engineering rigor producing verifiable, reproducible outcomes. The point is being able to gain clarity on the actual problem, make sound judgments, and create real value. The point is disrupting the waste and the clunk that has built up over years of shortcutting.

The coding agent is a tool that, built on a handful of small primitives and harnessed with discipline, can help with all of that. But it is still just a tool. It is a very powerful one, and I am excited about what it makes possible, but it is not the thing that matters. What matters is what the human does with the clarity the tool helps produce.

The question was never whether AI will replace you or whether AI will make you dumb. The question is: will you use this tool to think more clearly, build more intentionally, and create more value? That question has always been the right one. The tool just changed.

Footnotes:

1

OneLoop is a tiny coding agent I wrote in Rust. Its core is four tools (read, write, edit, bash), one agent loop, and a session model that appends linearly to a JSONL file. It is a private repo for now. I will open-source it at some point when I believe it is ready. I highly recommend every engineer who cares about their craft to build one from scratch. I guarantee you will have some aha moments.

2

Pi is a minimal, extensible coding agent by Mario Zechner: https://pi.dev/. I wrote about why I love it here: https://www.birkey.co/2026-04-19-why-i-love-pi.html

3

ABI stands for Application Binary Interface — a lower-level contract than an API that defines how software components interact at the machine level. I use it here to make the point that agent interfaces might eventually need to be as stable and well-specified as the contracts that operating systems and compilers have provided for decades.

Tags: AI engineering
05 Apr 2026

Oneness is All You Need

Tony Hoare put the problem well when he wrote, "I conclude that there are two ways of constructing a software design."1 One path is simplicity. The other is complexity, whose deficiencies are harder to see.

That line has stayed with me for years because it names a real danger in our industry. We often mistake the absence of visible flaws for actual clarity. We add layers, libraries, frameworks, helper services, configuration systems, and alternative paths until the whole thing looks sophisticated enough that nobody can easily challenge it. Then we call that maturity.

In the current era of bloated, fast-generated code, that danger feels even more immediate. We are producing more software than ever, often faster than we can understand, verify, or justify. That makes simplicity less of a preference and more of a survival strategy.

Most projects do not fail because engineers lacked yet another abstraction. They fail because complexity compounds faster than the team can reason about it. The system becomes harder to inspect, harder to change, harder to verify, and eventually harder to trust.

That is why I keep returning to one design pressure that has become more important to me over time: oneness.

By oneness, I do not mean anything mystical. I mean something very operational:

The point is not ideological purity. The point is reducing avoidable complexity so the system stays legible, easily testable, and verifiable by the people building it.

Why this feels harder than it should

Even before the current LLM era, it was difficult to stay simple. There were always reasons not to.

An engineer wants to move fast, so a new library gets introduced before its trade-offs are understood. A team wants flexibility, so it creates multiple ways to achieve the same outcome. A system outgrows its original design, so validation rules get copied into controllers, jobs, database constraints, frontend checks, and downstream consumers. Another team arrives and adds a second workflow rather than cleaning up the first. Then a third team adds a wrapper around both.

Nothing in that sequence sounds absurd in isolation. That is exactly why complexity is dangerous. It rarely arrives as one obviously wrong decision. It arrives as a long series of locally reasonable decisions that collectively destroy clarity.

The result is familiar:

  • New engineers cannot tell where to start.
  • Existing engineers cannot tell which layer owns what.
  • Bugs become archaeology.
  • Data integrity becomes probabilistic.
  • Every change carries too much fear.

This is why I have long been drawn to ideas like Easy To Change2, declarative systems3, self-documenting tools4, and a programmable workbench5 that keeps the whole loop visible. They all pull in the same direction. They reduce multiplicity. They reduce drift. They give you one place to think from.

Why the LLM era changes the economics

The LLM era does not remove this problem. It sharpens it.

LLMs lower the cost of producing code. They do not lower the cost of ambiguity. If they are not harnessed well, they can compound it so quickly that the resulting system becomes almost impossible to reason about. That is the danger. The opportunity is that the same tools can also help us reduce complexity, but only if we use them with discipline.

In fact, ambiguous systems are exactly where generated code becomes most dangerous. If a repository has three ways to configure a service, two half-trusted test setups, duplicated validation logic, unclear module ownership, and no obvious path through the codebase, an agent will happily generate more material inside that ambiguity. It can amplify existing confusion faster than a human ever could.

But I also think the LLM era gives us an opportunity that did not exist in quite the same way before. We can now spend less human energy on producing boilerplate and more human energy on collapsing unnecessary complexity. An agent can help standardize interfaces, remove duplicated code paths, migrate scattered logic into one owned layer, and push a codebase toward a more coherent shape.

The principle did not change. The economics did.

That is why I do not see the current moment as a reason to compromise on simplicity. It is one of the first times in my career when I feel I can insist on it more aggressively.

Code is abundant, understanding is not

SICP says that programs must be written for people to read, and later adds that readers should know what not to read.6 I still agree with both points, but I think their practical implication changes in the agentic era.

If generated code becomes abundant, it becomes impossible for a human to read all of it with equal depth. That is not a moral failure. It is just arithmetic. The surface area grows too quickly.

So we need to move one level higher.

Instead of assuming the human must read every line with equal care, we should design systems so the core behavior can be reviewed through a much smaller surface: tests, invariants, contracts, and executable examples.

I am not claiming tests replace reading code. They do not. Bad tests can hide bad systems, just as bad abstractions can. But the essence of a system is often much smaller than its total implementation volume.

A team may generate or write a thousand lines of code, but what the system fundamentally does may fit in:

  • ten invariants
  • twenty meaningful examples
  • a handful of properties
  • a short list of input/output contracts

That smaller surface is something a human can actually hold in their head. It is something another engineer can review, an agent can execute repeatedly, and CI can verify without asking everybody to reread the entire implementation every time.

In other words, the human review surface should get smaller as code generation gets cheaper.

That is not a retreat from engineering rigor. It is an attempt to put rigor where it gives us the most leverage.

In a healthy system, a human reviewer should not need to reread the entire generated implementation to regain confidence. They should be able to look at a smaller behavioral surface and ask: Are the invariants still true? Do the core examples still hold? Did this change widen the contract or violate it? That is a far more realistic way to supervise generated code than pretending abundance did not change the review problem.

Oneness inside layers

When I say oneness, I am not arguing against layered systems. I am arguing that each layer should have one clear responsibility and one obvious place where certain truths become real.

For example:

  • There should be one place where a piece of data becomes valid.
  • One place where a business invariant is enforced.
  • One default command that gets a human or an agent into the project.
  • One declared artifact that owns environment setup where practical.
  • One obvious module that owns a transformation.

If the honest answer to those questions is often "it depends," the system is usually paying a complexity tax already.

This is also why I care so much about one source of truth. A scattered system forces every engineer to rebuild the same mental model from fragments. A coherent system lets them ask a smaller set of questions, because the data model and ownership model are clearer. That matters for humans and for agents too.

Sometimes the benefit is almost embarrassingly concrete. One declared artifact for environment setup is better than shell scripts, wiki instructions, and CI fragments all telling slightly different stories. One default project command is better than three nearly equivalent ways to run tests. One layer owning a business invariant is better than duplicating partial validation in the UI, the handler, the job runner, and the database and hoping they never drift apart.

Take something as ordinary as order creation. In a messy system, the shape of an order gets partially validated in the frontend, partially checked again in the HTTP handler, partially normalized in a background job, and partially constrained in the database. The tests mirror that fragmentation, so no single test surface tells you what an order is supposed to mean. A more coherent design gives one layer ownership of turning input into a valid order, one place where the invariants become real, and one test suite that expresses those rules directly. The total lines of code may not shrink much, but the number of places you need to look in order to trust the system absolutely does.

This is part of why I like tools and conventions that collapse scattered state into one place. A flake.nix7 can become the declared truth for environment setup. A self-documenting Makefile8 can become the obvious entry point into a project. A well-owned test suite can become the smallest trustworthy surface for reviewing behavioral changes. None of these ideas are glamorous. That is precisely why they age well.

An LLM works much better when there is one obvious command to run, one obvious directory to modify, one obvious test suite to extend, one obvious place where integrity checks belong, and one obvious owner for the relevant behavior. Ambiguity is expensive for humans, and it is even more expensive when delegated.

What oneness is not

Oneness is not:

  • one giant service
  • one giant function
  • one person making all decisions
  • refusal to use libraries
  • denial of layering
  • minimalism for its own sake

It is a bias. It is a design pressure. It says that unnecessary multiplicity should have to justify itself.

Sometimes reality will justify it. There are cases where multiple paths, multiple representations, or multiple deployment forms are the right answer. But the burden of proof should be on complexity, not on simplicity.

That is the part our industry often gets backwards.

What I mean in practice

When I look at a system now, especially in the presence of coding agents, I increasingly want to ask very plain questions:

  • What is the one thing this layer owns?
  • What is the one mental model for the data in this layer?
  • Where is the one place I check whether data is valid?
  • What is the one command I should run first?
  • What is the one test file or suite that best expresses intended behavior?
  • What is the smallest reviewable surface that captures the essence of this change?

Those questions do not solve every design problem. But they keep me oriented toward legibility and verification.

And that, to me, is the real opportunity of the LLM era. We can use these tools to generate more code, yes. But a much better use is to generate less confusion. We can use them to push systems toward one obvious path, one owned responsibility, one executable behavioral surface, and one source of truth where possible.

I do not think software becomes better by becoming more clever. I think it becomes better when it becomes more legible, more deterministic, and easier to verify.

In an era of abundant code, that is what I mean by oneness.

Footnotes:

1

C. A. R. Hoare, The Emperor's Old Clothes, the 1980 ACM Turing Award Lecture, published in Communications of the ACM 24(2), 1981: https://www.labouseur.com/projects/codeReckon/papers/The-Emperors-Old-Clothes.pdf The "two ways" passage appears on PDF p. 13.

3

My post on why declarative systems matter to me: https://www.birkey.co/2026-03-22-why-i-love-nixos.html

4

The GNU Emacs manual describes Emacs as a "self-documenting" editor and explains that this means you can use help commands at any time to find out what your options are and what commands do: https://www.gnu.org/software/emacs/manual/html_node/emacs/Intro.html

5

My earlier post on Emacs as a programmable workbench: https://www.birkey.co/2026-03-28-emacs-as-a-programmable-workbench.html

6

Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, Preface to the first edition: https://sicp.sourceacademy.org/sicpjs.pdf The readability passage appears on PDF p. 24, including the line about programs being written for people to read and the follow-on point that readers should know what not to read.

7

Official Nix documentation on flakes: https://nix.dev/concepts/flakes.html

8

My post on self-documenting project entry points: https://www.birkey.co/2020-03-05-self-documenting-makefile.html

Tags: engineering ai
26 Dec 2020

ETC - The principle to guide engineering

If you are just curious about the ETC principle to ground all other principles of software engineering, you can scroll all the way down of this post to find out what I meant by ETC. If you know what ETC stands for and are already grounding your engineering efforts, you can just skip this post and go on with your MIT (most important task). However, If you are skeptical, which you always should be, you might want to read on to learn why.

As an Engineer, our primary function on daily basis is to come up with a working solution to a specific problem that arose out of need from our users. I highlight the word user here since the user could be our end user, could be our fellow engineers, or could just be ourselves. If you have been in the industry for a while, you might have been exposed to plethora of must follow principles and practices from existing literature, which I categorize as `transmittable` knowledge. Apart from that, there is another type of knowledge that I would like to classify as `untransmittable` where you have to experience it to make it your own. Those two types of knowledge corresponds to how we learn new things: 1. We read transmittable knowledge from various sources 2. We bring them into our practice to form our deep understanding of it. Only then, we are able to utilize our newly learned knowledge effectively to achieve our end result, which is by the way to meet the needs of our fellow users as apposed to the need of certain systems. Now, let us stack ETC principle against three of the most well known engineering paradigms so we are always grounded in our approach to utilizing them.

Monolith vs Microservice

We have come a long way since the 1950s in our approach to different design and architectural paradigms. Over the last 10 years or so, we all have been preached to about how great Microservice is and have drunk its Kool-Aid. It has penetrated engineering organizations to such a degree that it has become our new hammer. Now all of a sudden, we start waking up to its trade offs and even going back to old monolith architecture. What happened? We just did not ground ourselves when we made a decision to adopt microservice style. The question we should have asked ourselves before committing to microservice or monolith is: How it enables us to make whatever we are doing easy to change? Does monolith style make our system easy to change? Maybe. Does microservice approach enables us easy to change our system? Maybe. So the definite answer to both questions is "it depends". So what should be the basis of our decision to go with one way or the other? Maybe an example in order to drive my point here. Let's say you're designing an e-commerce system. It has catalog, checkout and shipping components. Does putting all of the components into one service make the system easy to change for you? May be it will if you are the only person or your team are the only team working on it. Does putting all of the components into separate services make the system easy to change? May be it will if you have three separate teams working on them.

Top down vs bottom up design

I have seen teams swear by one way or the other on this matter. From time to time, someone comes along to preach one over the other declaring the other approach is dead or should just be avoided. Every time when I face such a design issue, I always ask myself this question over and over again: Does top down or bottom up design enables me or my team to respond to change? Most of the time, I end up choosing both approaches because it helps me to focus on making the system easier to change. I tend to use following rule of thumb. When I have clear mental model, I use the bottom up approach so I can create layers of abstractions to compose better, which in turn enables me to make changes easier. When I have a high level of domain clarity, I tend to start with top down design because it helps me to focus on not that clear pieces in isolation, which in turn makes changes a lot more manageable.

OO vs FP paradigm

I was preached and have been preaching OO style of programming paradigm from since my college days and well into the early years of my profession. Then FP became the new style (old became new?) and all OO programming languages started to add more FP style constructs. Now we see debates over why FP is superior over OO and should be used over all cost claiming that OO is responsible for all the mess that we created over the last 20 years or so. Now, I came to realize that it is not this or that paradigm that is responsible for all the issues we created. Rather, it is the blind adoption of them by us as practitioners. Let us think about a minute what OO have given us when we started to adopt it: The ability to structure, reuse and share code across systems. Essentially, we were able to build/change information systems faster and easier than ever before. However, it did not hold up well against ETC principle with its humongous frameworks and so called best practices. Now, FP is in its renaissance and being preached as the savior at the cost of relegating OO to its oblivion. OO is a great tool set in our arsenal for certain types of problem domain and SmallTalk is an example of how it should be practiced. FP is another excellent way of approaching to engineering problem in that it encourages to treat systems as referential data transformation pipeline. Does it inherently make the system easy to change? Not really. We are quite capable of making a spaghetti mess out of FP as much as we have done with OO. See a pattern here? ETC principle grounds your choice of paradigm into its proper place: Is x helping you to make changes easier? If not, then x most likely not best route for you.

I can go on and on with arguing ETC be the test to pass for all of the paradigms and even development approaches such as TDD, BDD and DDD. For example, If your code is easy to change, it will most likely be easy to test, maps most likely well with your domain and most likely models use cases better. You can adopt any approach or principles such as SOLID principle if you consistently ask yourself: Does this really helps me to make changes easier? If it passes this test, adopt it, if not avoid it.

It is not my intention to make you dogmatic about Easy To Change (ETC) principle but rather to convince yourself to have one principle to ground all other aspects of your engineering endeavors. Happy engineering and never forget to have and share fun coding!

Tags: engineering
Other posts