BIRKEY CONSULTING

ABOUT  RSS  ARCHIVE


05 Apr 2026

Oneness is All You Need

Tony Hoare put the problem well when he wrote, "I conclude that there are two ways of constructing a software design."1 One path is simplicity. The other is complexity, whose deficiencies are harder to see.

That line has stayed with me for years because it names a real danger in our industry. We often mistake the absence of visible flaws for actual clarity. We add layers, libraries, frameworks, helper services, configuration systems, and alternative paths until the whole thing looks sophisticated enough that nobody can easily challenge it. Then we call that maturity.

In the current era of bloated, fast-generated code, that danger feels even more immediate. We are producing more software than ever, often faster than we can understand, verify, or justify. That makes simplicity less of a preference and more of a survival strategy.

Most projects do not fail because engineers lacked yet another abstraction. They fail because complexity compounds faster than the team can reason about it. The system becomes harder to inspect, harder to change, harder to verify, and eventually harder to trust.

That is why I keep returning to one design pressure that has become more important to me over time: oneness.

By oneness, I do not mean anything mystical. I mean something very operational: one obvious path, one owned responsibility, one executable behavioral surface, and one source of truth wherever possible.

The point is not ideological purity. The point is reducing avoidable complexity so the system stays legible, easily testable, and verifiable by the people building it.

Why this feels harder than it should

Even before the current LLM era, it was difficult to stay simple. There were always reasons not to.

An engineer wants to move fast, so a new library gets introduced before its trade-offs are understood. A team wants flexibility, so it creates multiple ways to achieve the same outcome. A system outgrows its original design, so validation rules get copied into controllers, jobs, database constraints, frontend checks, and downstream consumers. Another team arrives and adds a second workflow rather than cleaning up the first. Then a third team adds a wrapper around both.

Nothing in that sequence sounds absurd in isolation. That is exactly why complexity is dangerous. It rarely arrives as one obviously wrong decision. It arrives as a long series of locally reasonable decisions that collectively destroy clarity.

The result is familiar:

  • New engineers cannot tell where to start.
  • Existing engineers cannot tell which layer owns what.
  • Bugs become archaeology.
  • Data integrity becomes probabilistic.
  • Every change carries too much fear.

This is why I have long been drawn to ideas like Easy To Change2, declarative systems3, self-documenting tools4, and a programmable workbench5 that keeps the whole loop visible. They all pull in the same direction. They reduce multiplicity. They reduce drift. They give you one place to think from.

Why the LLM era changes the economics

The LLM era does not remove this problem. It sharpens it.

LLMs lower the cost of producing code. They do not lower the cost of ambiguity. If they are not harnessed well, they can compound it so quickly that the resulting system becomes almost impossible to reason about. That is the danger. The opportunity is that the same tools can also help us reduce complexity, but only if we use them with discipline.

In fact, ambiguous systems are exactly where generated code becomes most dangerous. If a repository has three ways to configure a service, two half-trusted test setups, duplicated validation logic, unclear module ownership, and no obvious path through the codebase, an agent will happily generate more material inside that ambiguity. It can amplify existing confusion faster than a human ever could.

But I also think the LLM era gives us an opportunity that did not exist in quite the same way before. We can now spend less human energy on producing boilerplate and more human energy on collapsing unnecessary complexity. An agent can help standardize interfaces, remove duplicated code paths, migrate scattered logic into one owned layer, and push a codebase toward a more coherent shape.

The principle did not change. The economics did.

That is why I do not see the current moment as a reason to compromise on simplicity. It is one of the first times in my career when I feel I can insist on it more aggressively.

Code is abundant, understanding is not

SICP says that programs must be written for people to read, and later adds that readers should know what not to read.6 I still agree with both points, but I think their practical implication changes in the agentic era.

If generated code becomes abundant, it becomes impossible for a human to read all of it with equal depth. That is not a moral failure. It is just arithmetic. The surface area grows too quickly.

So we need to move one level higher.

Instead of assuming the human must read every line with equal care, we should design systems so the core behavior can be reviewed through a much smaller surface: tests, invariants, contracts, and executable examples.

I am not claiming tests replace reading code. They do not. Bad tests can hide bad systems, just as bad abstractions can. But the essence of a system is often much smaller than its total implementation volume.

A team may generate or write a thousand lines of code, but what the system fundamentally does may fit in:

  • ten invariants
  • twenty meaningful examples
  • a handful of properties
  • a short list of input/output contracts

That smaller surface is something a human can actually hold in their head. It is something another engineer can review, an agent can execute repeatedly, and CI can verify without asking everybody to reread the entire implementation every time.
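
As a hypothetical sketch (the module and function names here are invented for illustration, not taken from any real codebase), the reviewable surface of a small pricing rule might look like this: a few invariants and concrete examples that stay small even if the implementation behind them grows or is regenerated.

```python
# Hypothetical sketch: the reviewable surface of a small discount module.
# The implementation behind apply_discount may be large or generated;
# these few executable statements are what a human actually reviews.

def apply_discount(price_cents: int, percent: int) -> int:
    """Stand-in implementation; in practice this could be generated code."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be in [0, 100]")
    return price_cents - (price_cents * percent) // 100

# Invariants: meant to hold for every input, not just these samples.
for price in (0, 1, 999, 10_000):
    for pct in (0, 10, 50, 100):
        result = apply_discount(price, pct)
        assert 0 <= result <= price              # never negative, never a markup
    assert apply_discount(price, 0) == price     # identity at 0%
    assert apply_discount(price, 100) == 0       # full discount empties it

# Meaningful examples: the input/output contract in concrete form.
assert apply_discount(10_000, 25) == 7_500
```

The point is not that these particular checks are the right ones; it is that a surface this size can be reread, rerun, and argued about without opening the whole implementation.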

In other words, the human review surface should get smaller as code generation gets cheaper.

That is not a retreat from engineering rigor. It is an attempt to put rigor where it gives us the most leverage.

In a healthy system, a human reviewer should not need to reread the entire generated implementation to regain confidence. They should be able to look at a smaller behavioral surface and ask: Are the invariants still true? Do the core examples still hold? Did this change widen the contract or violate it? That is a far more realistic way to supervise generated code than pretending abundance did not change the review problem.

Oneness inside layers

When I say oneness, I am not arguing against layered systems. I am arguing that each layer should have one clear responsibility and one obvious place where certain truths become real.

For example:

  • There should be one place where a piece of data becomes valid.
  • One place where a business invariant is enforced.
  • One default command that gets a human or an agent into the project.
  • One declared artifact that owns environment setup where practical.
  • One obvious module that owns a transformation.

If the honest answer to those questions is often "it depends," the system is usually paying a complexity tax already.

This is also why I care so much about one source of truth. A scattered system forces every engineer to rebuild the same mental model from fragments. A coherent system lets them ask a smaller set of questions, because the data model and ownership model are clearer. That matters for humans and for agents too.

Sometimes the benefit is almost embarrassingly concrete. One declared artifact for environment setup is better than shell scripts, wiki instructions, and CI fragments all telling slightly different stories. One default project command is better than three nearly equivalent ways to run tests. One layer owning a business invariant is better than duplicating partial validation in the UI, the handler, the job runner, and the database and hoping they never drift apart.

Take something as ordinary as order creation. In a messy system, the shape of an order gets partially validated in the frontend, partially checked again in the HTTP handler, partially normalized in a background job, and partially constrained in the database. The tests mirror that fragmentation, so no single test surface tells you what an order is supposed to mean. A more coherent design gives one layer ownership of turning input into a valid order, one place where the invariants become real, and one test suite that expresses those rules directly. The total lines of code may not shrink much, but the number of places you need to look in order to trust the system absolutely does.
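
A minimal sketch of that single-owner shape, with all names (`Order`, `parse_order`, the field names) invented for illustration:

```python
# Hypothetical sketch: one layer owns what "a valid order" means.
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    sku: str
    quantity: int
    unit_price_cents: int

def parse_order(raw: dict) -> Order:
    """The single place where raw input becomes a valid Order.

    Every other layer (HTTP handler, job runner, UI) calls this
    instead of re-implementing partial checks."""
    sku = str(raw.get("sku", "")).strip()
    if not sku:
        raise ValueError("order must have a sku")
    quantity = int(raw.get("quantity", 0))
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    price = int(raw.get("unit_price_cents", -1))
    if price < 0:
        raise ValueError("unit_price_cents must be non-negative")
    return Order(sku=sku, quantity=quantity, unit_price_cents=price)

# The test surface mirrors the single owner, not the scattered layers.
order = parse_order({"sku": " ABC-1 ", "quantity": 2, "unit_price_cents": 499})
assert order == Order("ABC-1", 2, 499)
```

Once a function like this exists, the other layers can stay thin: they forward raw input to the owner and handle its errors, rather than each maintaining a private, drifting copy of the rules.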

This is part of why I like tools and conventions that collapse scattered state into one place. A flake.nix7 can become the declared truth for environment setup. A self-documenting Makefile8 can become the obvious entry point into a project. A well-owned test suite can become the smallest trustworthy surface for reviewing behavioral changes. None of these ideas are glamorous. That is precisely why they age well.

An LLM works much better when there is one obvious command to run, one obvious directory to modify, one obvious test suite to extend, one obvious place where integrity checks belong, and one obvious owner for the relevant behavior. Ambiguity is expensive for humans, and it is even more expensive when delegated.

What oneness is not

Oneness is not:

  • one giant service
  • one giant function
  • one person making all decisions
  • refusal to use libraries
  • denial of layering
  • minimalism for its own sake

It is a bias. It is a design pressure. It says that unnecessary multiplicity should have to justify itself.

Sometimes reality will justify it. There are cases where multiple paths, multiple representations, or multiple deployment forms are the right answer. But the burden of proof should be on complexity, not on simplicity.

That is the part our industry often gets backwards.

What I mean in practice

When I look at a system now, especially in the presence of coding agents, I increasingly want to ask very plain questions:

  • What is the one thing this layer owns?
  • What is the one mental model for the data in this layer?
  • Where is the one place I check whether data is valid?
  • What is the one command I should run first?
  • What is the one test file or suite that best expresses intended behavior?
  • What is the smallest reviewable surface that captures the essence of this change?

Those questions do not solve every design problem. But they keep me oriented toward legibility and verification.

And that, to me, is the real opportunity of the LLM era. We can use these tools to generate more code, yes. But a much better use is to generate less confusion. We can use them to push systems toward one obvious path, one owned responsibility, one executable behavioral surface, and one source of truth where possible.

I do not think software becomes better by becoming more clever. I think it becomes better when it becomes more legible, more deterministic, and easier to verify.

In an era of abundant code, that is what I mean by oneness.

Footnotes:

1

C. A. R. Hoare, The Emperor's Old Clothes, the 1980 ACM Turing Award Lecture, published in Communications of the ACM 24(2), 1981: https://www.labouseur.com/projects/codeReckon/papers/The-Emperors-Old-Clothes.pdf The "two ways" passage appears on PDF p. 13.

3

My post on why declarative systems matter to me: https://www.birkey.co/2026-03-22-why-i-love-nixos.html

4

The GNU Emacs manual describes Emacs as a "self-documenting" editor and explains that this means you can use help commands at any time to find out what your options are and what commands do: https://www.gnu.org/software/emacs/manual/html_node/emacs/Intro.html

5

My earlier post on Emacs as a programmable workbench: https://www.birkey.co/2026-03-28-emacs-as-a-programmable-workbench.html

6

Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, Preface to the first edition: https://sicp.sourceacademy.org/sicpjs.pdf The readability passage appears on PDF p. 24, including the line about programs being written for people to read and the follow-on point that readers should know what not to read.

7

Official Nix documentation on flakes: https://nix.dev/concepts/flakes.html

8

My post on self-documenting project entry points: https://www.birkey.co/2020-03-05-self-documenting-makefile.html

Tags: engineering ai
28 Mar 2026

Emacs as a programmable workbench

What I care about in Emacs has less to do with editing text and more to do with one idea: a stable workbench becomes more valuable as the surrounding tool landscape becomes more volatile.

Software engineering has never been only about typing code into files, but that is especially obvious now. More of the job is spent coordinating services, processes, tests, logs, prompts, REPLs1, diffs, generated artifacts, and feedback loops. LLMs amplify that change, but they do not alter the core of the work. If anything, they expose it more clearly. Generated code is abundant. Engineering judgment is not. The hard part is still turning tentative output into something inspectable, reproducible, composable, and reliable.

That is why the environment I work in matters. I want one place where that whole loop stays visible, and Emacs is best understood as exactly that: not a text editor in the narrow sense but a system you can shape around how you inspect, compose, verify, and iterate. The key abstraction is the buffer.

A file can live in a buffer. A terminal can live in a buffer. A REPL can live in a buffer. A compilation can live in a buffer. The output of a command can live in a buffer. A conversation can live in a buffer. Notes, prompts, logs, diffs, and half-formed ideas can live there too. Not every kind of computing reduces cleanly to a buffer, of course, but enough of software work does that the abstraction becomes powerful.

Buffers matter because they are working surfaces. They are inspectable, editable, searchable, programmable places where rough work can stay rough long enough to become better work.

That point gets clearer in a real workflow. Suppose I am using a coding agent to add a feature to a service. The agent runs in one terminal buffer. It proposes a patch. I inspect the diff in another buffer, jump to the changed source, and notice that the edge case handling is wrong. I run the failing test and keep the output open in a compilation buffer. I poke at the behavior in a REPL. I write a short note to myself about the invariant that actually matters. I refine the prompt, rerun the agent, and compare the new diff against the old one. When a command sequence turns out to be useful, I keep it. When the note proves durable, I turn it into documentation or code. What began as a loose collection of prompts, commands, output, guesses, and generated code becomes a more reliable artifact.

That is the real value of the workbench. The whole path from generation to verification to reuse stays visible and inspectable.

One principle has stayed constant across twenty years of using Emacs: I want the tools I depend on to feel native inside my workbench, not bolted on from the outside.

Many code-oriented tools, including coding agents, still do their best work through commands, files, patches, tests, and process output. If the terminal lives inside Emacs in an eterm buffer2, the agent is not working off to the side; it is operating in the same workbench where I am reading code, reviewing diffs, checking logs, and deciding what to do next.

Emacs does not magically make the work easy. What it does is keep the work inspectable, repeatable, and easier to automate well.

At this point the obvious objection is: any editor with plugin support or a configuration language does most of this. And it does, up to a point. But what I value in Emacs is not mere coexistence of tools. It is unified manipulation. The same editing model, navigation model, search model, history model, and programmability apply across many kinds of work. The successful one-off can be promoted into a habit, a function, a command, or a workflow without crossing several conceptual boundaries first. What Emacs buys me is that the integration is not a plugin I configure; it is part of the same environment in which I inspect code, review output, capture notes, and shape reusable workflows. And because it is Emacs Lisp throughout, nothing is opaque: every behavior I depend on can be read, modified, and extended in the same language.

That same pattern matters even when no agent is involved. A REPL is not some foreign object. Compilation output is not a separate world. Version control is not exiled to another app. If a tool exposes text, process interaction, or a surface that can be inspected and shaped, Emacs can often bring it into the same workspace and let you build stable habits around it.

This is also why Emacs has aged unusually well. New tools keep appearing: agent shells, chat interfaces, MCP servers and clients3, debugging helpers, deployment wrappers, one-off scripts, and whatever comes next. The tools change, but the need does not. You still need a place to inspect what happened, adjust it, connect it to the rest of your workflow, and promote successful patterns into reusable ones.

Emacs is good at that because its core abstractions are durable. Text, buffers, processes, commands, functions, windows, and programmable transformation through Emacs Lisp are not fashion-driven ideas. They have held up because they map well to real work. When a new tool can speak through those abstractions, I do not have to start over mentally just because the industry has a new wrapper or a new brand name for the same basic activity.

For me, Emacs is still the best answer I know to that problem. It gives me one programmable place to think, stage, inspect, verify, compose, and gradually solidify work that often begins in a messy state.

That, to me, is the enduring value of Emacs.

Footnotes:

1

REPL stands for Read-Eval-Print Loop: an interactive environment where you enter an expression, the language evaluates it, and the result is printed immediately. The idea originated in the Lisp family of dynamic languages, but most modern languages now offer one in some form.

2

eterm is my fork of EAT, a pure Emacs Lisp terminal emulator, modified to fit my workflow.

3

By MCP I mean Model Context Protocol, a standard way for tools and applications to expose capabilities to LLM-based clients. I wrote a more practical post about it here: MCP explained with code.

Tags: emacs ai
22 Mar 2026

Why I love NixOS

What I love about NixOS has less to do with Linux and more to do with the Nix package manager1.

To me, NixOS is the operating system artifact of a much more important idea: a deterministic and reproducible functional package manager. That is the core of why I love NixOS. It is not distro branding that I care about. It is the fact that I can construct a whole operating system as a deterministic result of feeding Nix DSL to Nix and then rebuild it, change it bit by bit, and roll it back if I do not like the result.

I love NixOS because most operating systems slowly turn into a pile of state. You install packages, tweak settings, try random tools, remove some of them, upgrade over time and after a while you have a machine that works but not in a way that you can confidently explain from first principles. NixOS felt very different to me. I do not have to trust a pile of state. I can define a system and build it.

I love NixOS because I can specify the whole OS including the packages I need and the configuration in one declarative setup. That one place aspect matters to me more than it might sound at first. I do not have to chase package choices in one place, desktop settings in another place and keyboard behavior somewhere else. Below are a couple of small Nix DSL examples.

environment.systemPackages = with pkgs; [
  gnomeExtensions.dash-to-dock
  gnomeExtensions.unite
  gnomeExtensions.appindicator
  libappindicator
];

services.desktopManager.gnome.extraGSettingsOverrides = ''
  [org.gnome.shell]
  enabled-extensions=['dash-to-dock@gnome-shell-extensions.gcampax.github.com', 'unite@hardpixel.eu', 'appindicatorsupport@rgcjonas.gmail.com']

  [org.gnome.shell.extensions.dash-to-dock]
  dock-position='BOTTOM'
  autohide=true
  dock-fixed=false
  extend-height=false
  transparency-mode='FIX'
'';
services.keyd = {
  enable = true;

  keyboards = {
    usb_keyboard = {
      ids = [ "usb:kb" ];
      settings.main = {
        leftcontrol = "leftmeta";
        leftmeta = "leftcontrol";
        rightalt = "rightmeta";
        rightmeta = "rightalt";
      };
    };

    laptop_keyboard = {
      ids = [ "laptop:kb" ];
      settings.main = swapLeftAltLeftControl;
    };
  };
};

Those are ordinary details of a working machine, but that is exactly the point. I can describe them declaratively, rebuild the system and keep moving. If I buy a new computer, I do not have to remember a long chain of manual setup steps or half-baked scripts scattered all over. I can rebuild the system from a single source of truth.

I love NixOS because it has been around for a long time. In my experience, it has been very stable. It has a predictable release cadence every six months. I can set it up to update automatically and upgrade it without the usual fear that tends to come with operating system upgrades. I do not have to think much about upgrade prompts, desktop notifications or random system drift in the background. It mostly stays out of my way. And if I want to be more adventurous, it also has an unstable channel2 that I can enable to experiment and get newer software.

I love NixOS because it lets my laptop be boring in the best possible sense. I recently bought an HP laptop3 and NixOS worked beautifully on it out of the box. I did not have to fight the hardware to get to a reasonable baseline. That gave me exactly what I want from a personal computer: a stable system that I can configure declaratively and then mostly ignore while I focus on actual work.

I love NixOS because it makes experimentation cheap and safe. I can try packages without mutating the base system. I can construct a completely isolated package shell4 for anything from a one-off script to a full-blown project. If I want to harden it further, I can use the Nix DSL to specify the dependencies, build steps and resulting artifacts declaratively. That is a much better way to work than slowly polluting my daily driver and hoping I can reconstruct what I did later.

I love NixOS because I can use the same package manager across macOS and Linux. There is also community-maintained support for FreeBSD, though I have not used it personally. That is a huge practical benefit because my development tooling and dependency management can stay mostly uniform across those systems. It means the value of Nix is not tied only to NixOS. NixOS happens to be the most complete expression of it, but the underlying model is useful to me across platforms.

I love NixOS because it fits especially well with the way I work in the current LLM coding era.

Tools are changing very quickly. Coding agents often need very specific versions of utilities, compilers and runtimes. They need to install something, use it, throw it away, try another version and keep going without turning my PC into a garbage dump of conflicting state. Nix fits that model naturally. If I tell a coding agent that I use Nix, it is usually clever enough to reach for nix shell or nix develop to bring the needed tool into an isolated environment and execute it there. That is especially handy because Nix treats tooling as a declared input instead of an accidental side effect on the system.

A concrete example: I recently built a voice-to-text agent in Rust5. I did not have the Rust toolchain installed on my system. I simply told the coding agent that I use Nix, and it figured out how to pull in the entire Rust toolchain through Nix, compile the project inside an isolated shell and produce a working binary. My base system was never touched. No ~/.cargo, no ~/.rustup, no mutated PATH entries left behind. Without Nix, the agent would have reached for curl | sh to install rustup, quietly mutated my environment and left my system slightly different forever. With Nix, none of that happened.

This pattern generalizes. Every time an agent needs Python 3.11 vs 3.12, a specific version of ffmpeg, an obscure CLI tool or a particular compiler, Nix gives it a clean and reversible way to get exactly what it needs. The agent does not have to guess whether a tool is already installed or in the wrong version. It just declares what it needs and Nix takes care of the rest in a sandboxed way.

The other thing I appreciate is that Nix turns an agent's experiment into something you can actually commit and reproduce. Once the agent has a working setup, you can capture the exact dependencies in a flake.nix and run nix flake check to verify it builds cleanly from scratch. That transforms an ad hoc agent session into a reproducible, verifiable artifact. That is a much stronger foundation for delivering something that works reliably in production than hoping the environment happens to be in the right shape on the next machine.
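
As a sketch, that captured artifact can be quite small. The following flake is illustrative only: the channel branch, target system, and package names are assumptions you would adjust for your own setup.

```nix
# Illustrative sketch: one declared artifact owns the toolchain the agent used.
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05";

  outputs = { self, nixpkgs }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      # `nix develop` drops you into a shell with exactly these tools.
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.cargo pkgs.rustc pkgs.rust-analyzer ];
      };
    };
}
```

Committing something like this turns "the agent happened to find a working toolchain" into "the repository declares its toolchain," which is the difference between an anecdote and an artifact.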

I love NixOS because I like what Nix gives me in deployment too. I have never been a big fan of Docker as the final answer to the "works on my machine" problem. It solved important problems for the industry, no doubt about that, but I always found the overall model less satisfying than a truly deterministic one. Nix gives me a much better story. I can use dockerTools.buildLayeredImage to build smaller Docker images in a deterministic, layered way. If I can build it on one computer with the proper configuration, I can build the same artifact on another one as long as Nix supports the architecture, which in my experience has been very reliable.

That coherence is one of the things I value most about NixOS. The same underlying model helps me with my laptop, my shell, my project dependencies, my CI pipeline and my deployment artifact. It is one way of thinking about software instead of a loose collection of unrelated tools and habits.

So when I say I love NixOS, what I really mean is that I love what it represents. I love a system that is declarative, reproducible, reversible and stable. I love being able to experiment without fear and upgrade without drama. I love that it helps me focus on building and experimenting with fast-moving tools, including LLM coding agents, without worrying about messing up my system in the process.

I love NixOS because it is the most complete everyday expression of what I think software systems should be.

Footnotes:

1

If you are new to Nix, I wrote a more practical getting-started guide here: Nix: Better way for fun and profit.

2

By unstable channel I mean the official `nixos-unstable` or `nixpkgs-unstable` channels. See Channel branches and channels.nixos.org.

3

HP EliteBook X G1a 14 inch Notebook with 64 GiB RAM and AMD Ryzen AI 9 HX PRO 375.

4

For example, nix develop drops you into an interactive shell environment that is very close to what Nix would use to build the current package or project.

5

A voice-to-text agent I built in Rust that replaced Whisper and Willow Voice in my personal workflow. I wrote it first for macOS and then ported it to Linux. I have been using it as a daily driver for a couple of months now. I am considering open sourcing it or releasing it as a standalone app.

Tags: nix ai
28 Dec 2025

One csv parser to rule them all

One would think that parsing CSV files is pretty straightforward, until you get bitten by the many kinds of CSV files that exist in the wild. Many years ago, I wrote a small CSV reader with the following requirements in mind:

The result is csvx. I have updated it to work across Clojure and ClojureScript, in both Node.js and browser environments. The entire codebase is fewer than 200 lines, including comments and blank lines. If you find yourself in need of a CSV reader with the above requirements, you are welcome to steal the code. Enjoy!

Tags: Clojure ClojureScript
02 Aug 2025

Hacker news AI coding experience analysis

I have been using and experimenting with AI coding tools heavily for the last three months or so, since joining Legion Health as a Founding Engineer. I had been somewhat skeptical of AI and had approached it with suspicion since ChatGPT came out. I use Emacs as my workbench and have optimized my workflow around using it as a terminal multiplexer, which fits naturally with Claude Code, the tool I use as my main programming assistant. Below is my simple setup, which might benefit fellow minimalist Emacs users.

(use-package eat
  :ensure t
  :config
  (setq eat-term-name "xterm-256color"
        eat-kill-buffer-on-exit t
        process-adaptive-read-buffering nil
        eat-term-scrollback-size 500000)
  (define-key eat-semi-char-mode-map [?\s-v] #'eat-yank)
  (define-key eat-semi-char-mode-map [?\C-c ?\C-r] #'k/eat-redisplay))

(defun k/eat-redisplay ()
  "Fix eat flicker/flash and display funkiness."
  (interactive)
  (unless (derived-mode-p 'eat-mode)
    (error "Not in an eat-mode buffer"))
  (when (and (boundp 'eat-mode) eat-mode (boundp 'eat-terminal) eat-terminal)
    (let* ((process (eat-term-parameter eat-terminal 'eat--process))
           (window (get-buffer-window (current-buffer))))
      (if (and process (process-live-p process) window)
          (eat--adjust-process-window-size process (list window)))))
  (setq-local window-adjust-process-window-size-function
              'window-adjust-process-window-size-smallest)
  (goto-char (point-min))
  (redisplay)
  (goto-char (point-max))
  (redisplay)
  (setq-local window-adjust-process-window-size-function 'ignore))

I start an eat shell and run:

cd ~/repos/project-x && claude

This is a fast-moving landscape, and I find the following few points extremely helpful in my workflow:

On top of that, you need to make sure Claude has access to tools that enhance its ability to look up relevant information. To provide a more balanced overview of the AI coding experience, I used a great data analysis tool for Hacker News called CamelAI. Below are the results, which more or less resonate with my personal experience.

๐Ÿ† Top Stories by Engagement

  • Claude 3.7 Sonnet and Claude Code (2,127 points) 🟢
    • Overwhelmingly positive reception for AI coding capabilities
    • Demonstrates Claude's dominance in the space
  • Cursor IDE lockout policy problems (1,511 points) 🔴
    • Major backlash against policy changes causing user cancellations
    • Shows fragility of user trust in AI tools
  • AlphaEvolve: Gemini coding agent (1,036 points) 🚀
    • Google's advanced algorithm design agent
    • High interest in autonomous coding capabilities
  • "Enough AI copilots, we need AI HUDs" (964 points) ๐ŸŽ›๏ธ
    • Forward-thinking discussion about UI evolution
    • Community wants more integrated experiences
  • Void: Open-source Cursor alternative (948 points) 🔓
    • Strong demand for open-source alternatives
    • Privacy and control concerns driving adoption

📊 Key Trends & Patterns

  • 🎯 Claude Dominance in AI Coding
    • Evidence: Claude 3.7 Sonnet (2,127 pts), consistent praise in experience stories
    • Insight: Anthropic's Claude has emerged as the clear leader for serious coding work, with developers consistently praising its code quality and reasoning capabilities over competitors
  • ⚡ Tool Fragmentation & User Frustration
    • Evidence: Cursor problems (1,511 pts), multiple "stopped using AI" stories (365, 109 pts)
    • Insight: Users are jumping between tools due to policy changes, reliability issues, and unmet expectations. No single tool has achieved universal satisfaction, leading to "tool fatigue"
  • 🔄 The Productivity Paradox
    • Evidence: "Anyone struggling to get value out of coding LLMs?" (345 pts), productivity studies showing mixed results
    • Insight: Despite massive hype, many developers struggle to see concrete productivity gains. The "almost right" code problem creates hidden productivity taxes that offset benefits
  • ๐Ÿง  Cognitive Dependency Concerns
    • Evidence: "After months of coding with LLMs, I'm going back to using my brain" (365 pts)
    • Insight: Growing concern about over-reliance on AI leading to skill atrophy and reduced problem-solving capabilities among developers
  • ๐Ÿข Enterprise vs Individual Experience Gap
    • Evidence: Microsoft 365 Copilot disaster (602 pts) vs individual success stories
    • Insight: Stark divide between enterprise rollout failures and individual developer successes. Enterprise context adds complexity that current tools struggle with
  • ๐Ÿ”“ Open Source Alternative Movement
    • Evidence: Void alternative (948 pts), Tabby self-hosted (366 pts)
    • Insight: Strong demand for open-source, self-hosted alternatives driven by privacy concerns, cost considerations, and desire for control

🎯 Engineer Experience Patterns

  • 🟢 Positive Experiences
    • Who: Experienced developers using AI as an enhancement tool
    • Patterns: Claude-based tools getting consistent praise, terminal-based tools popular with power users
    • Benefits: Code generation, debugging assistance, learning new patterns
    • Key Success Factor: Using AI to amplify existing skills, not replace them
  • 🔴 Negative Experiences
    • Who: Beginners over-relying on AI, enterprise users with complex requirements
    • Patterns: Policy changes causing churn, productivity promises not materializing, "almost right" code creating more work
    • Problems: Skill degradation, tool reliability issues, hidden productivity costs
    • Key Failure Factor: Expecting AI to replace fundamental programming knowledge
  • 🟡 Mixed Experiences
    • Who: Pragmatic developers experimenting with different approaches
    • Patterns: Tools working well for specific use cases, steeper learning curve than expected, context-dependent effectiveness
    • Insight: Success heavily depends on matching use case, experience level, and realistic expectations

📈 Temporal Evolution (2024-2025)

  • Early 2024: Initial hype phase - GitHub Copilot going free, new tool launches
  • Mid 2024: Reality check phase - limitations becoming apparent, user frustrations mounting
  • Late 2024: Maturation phase - Claude emerges as leader, tool fragmentation increases
  • Early 2025: Sophistication phase - Claude 3.7/Code dominance, better understanding of limitations
  • Mid 2025: Pragmatic phase - Focus on specific use cases, open-source alternatives, realistic expectations

🎯 Critical Insights for Engineers

  • 💪 Skill Foundation is Critical
    • AI tools amplify existing programming skills rather than replace them. Developers who understand fundamentals see the most benefit.
  • 🎯 Context Matters Enormously
    • Success depends heavily on use case, project complexity, and domain. There's no universal "AI coding works" or "doesn't work."
  • 🔧 Tool Landscape is Rapidly Changing
    • Claude-based tools currently leading, but the landscape shifts quickly. Expect to try multiple approaches.
  • ⚠️ Cognitive Risks are Real
    • Over-reliance can lead to skill degradation. Many successful developers use AI selectively while maintaining core problem-solving abilities.
  • 📊 Productivity Benefits are Mixed
    • Benefits exist but are often not as dramatic as promised. The "almost right" problem creates hidden costs that offset gains.
  • 🏢 Enterprise Success ≠ Individual Success
    • Individual developer success doesn't guarantee organizational success. Enterprise complexity creates additional challenges.

🔮 Future Outlook

  • 🎯 Specialization: Tools becoming more domain-specific and context-aware
  • 🤝 Hybrid Workflows: Combination of AI assistance and traditional coding becoming the norm
  • 🔬 Better Metrics: More sophisticated ways to measure actual productivity impact
  • 🎓 Education Evolution: Teaching AI-assisted development as a core skill
  • 🔓 Democratization: More open-source and self-hosted options emerging
  • 🎛️ UI Innovation: Moving beyond copilots to more integrated experiences (AI HUDs)

๐Ÿ” Specific Tool Performance

  • ๐Ÿฅ‡ Claude: Clear winner for code quality, reasoning, and complex tasks
  • ๐Ÿฅˆ Cursor: Popular but plagued by policy and reliability issues
  • ๐Ÿฅ‰ GitHub Copilot: Solid mainstream choice, good accessibility for beginners
  • ๐Ÿ”“ Open Source (Void/Tabby): Rising alternatives for privacy/control-conscious developers
  • โš ๏ธ Enterprise Tools: Microsoft 365 Copilot struggled badly in enterprise rollouts

💡 Bottom Line

  • The AI coding experience is highly polarized - it works exceptionally well for some developers in specific contexts, but fails to deliver promised productivity gains for many others. Success requires:
    • Matching the right tool to the right use case
    • Maintaining realistic expectations
    • Preserving core programming skills
    • Understanding tool limitations
    • Being prepared to adapt as the landscape evolves
Tags: AI