Emacs as a programmable workbench
What I care about in Emacs has less to do with editing text and more to do with one idea: a stable workbench becomes more valuable as the surrounding tool landscape becomes more volatile.
Software engineering has never been only about typing code into files, but that is especially obvious now. More of the job is spent coordinating services, processes, tests, logs, prompts, REPLs1, diffs, generated artifacts, and feedback loops. LLMs amplify that change, but they do not alter the core of the work. If anything, they expose it more clearly. Generated code is abundant. Engineering judgment is not. The hard part is still turning tentative output into something inspectable, reproducible, composable, and reliable.
That is why the environment I work in matters. I want one place where that whole loop stays visible, and Emacs is best understood as exactly that: not a text editor in the narrow sense but a system you can shape around how you inspect, compose, verify, and iterate. The key abstraction is the buffer.
A file can live in a buffer. A terminal can live in a buffer. A REPL can live in a buffer. A compilation can live in a buffer. The output of a command can live in a buffer. A conversation can live in a buffer. Notes, prompts, logs, diffs, and half-formed ideas can live there too. Not every kind of computing reduces cleanly to a buffer, of course, but enough of software work does that the abstraction becomes powerful.
Buffers matter because they are working surfaces. They are inspectable, editable, searchable, programmable places where rough work can stay rough long enough to become better work.
That point gets clearer in a real workflow. Suppose I am using a coding agent to add a feature to a service. The agent runs in one terminal buffer. It proposes a patch. I inspect the diff in another buffer, jump to the changed source, and notice that the edge case handling is wrong. I run the failing test and keep the output open in a compilation buffer. I poke at the behavior in a REPL. I write a short note to myself about the invariant that actually matters. I refine the prompt, rerun the agent, and compare the new diff against the old one. When a command sequence turns out to be useful, I keep it. When the note proves durable, I turn it into documentation or code. What began as a loose collection of prompts, commands, output, guesses, and generated code becomes a more reliable artifact.
That is the real value of the workbench. The whole path from generation to verification to reuse stays visible and inspectable.
One principle has stayed constant across twenty years of using Emacs: I want the tools I depend on to feel native inside my workbench, not bolted on from the outside.
Many code-oriented tools, including coding agents, still do their best work through commands, files, patches, tests, and process output. If the terminal lives inside Emacs in an eterm buffer2, the agent is not working off to the side; it is operating in the same workbench where I am reading code, reviewing diffs, checking logs, and deciding what to do next.
Emacs does not magically make the work easy. What it does is keep the work inspectable, repeatable, and easier to automate well.
At this point the obvious objection is: any editor with plugin support or a configuration language does most of this. And it does, up to a point. But what I value in Emacs is not mere coexistence of tools. It is unified manipulation. The same editing model, navigation model, search model, history model, and programmability apply across many kinds of work. The successful one-off can be promoted into a habit, a function, a command, or a workflow without crossing several conceptual boundaries first. What Emacs buys me is that the integration is not a plugin I configure; it is part of the same environment in which I inspect code, review output, capture notes, and shape reusable workflows. And because it is Emacs Lisp throughout, nothing is opaque: every behavior I depend on can be read, modified, and extended in the same language.
That same pattern matters even when no agent is involved. A REPL is not some foreign object. Compilation output is not a separate world. Version control is not exiled to another app. If a tool exposes text, process interaction, or a surface that can be inspected and shaped, Emacs can often bring it into the same workspace and let you build stable habits around it.
This is also why Emacs has aged unusually well. New tools keep appearing: agent shells, chat interfaces, MCP servers and clients3, debugging helpers, deployment wrappers, one-off scripts, and whatever comes next. The tools change, but the need does not. You still need a place to inspect what happened, adjust it, connect it to the rest of your workflow, and promote successful patterns into reusable ones.
Emacs is good at that because its core abstractions are durable. Text, buffers, processes, commands, functions, windows, and programmable transformation through Emacs Lisp are not fashion-driven ideas. They have held up because they map well to real work. When a new tool can speak through those abstractions, I do not have to start over mentally just because the industry has a new wrapper or a new brand name for the same basic activity.
For me, Emacs is still the best answer I know to that problem. It gives me one programmable place to think, stage, inspect, verify, compose, and gradually solidify work that often begins in a messy state.
That, to me, is the enduring value of Emacs.
Footnotes:
REPL stands for Read-Eval-Print Loop: an interactive environment where you enter an expression, the language evaluates it, and the result is printed immediately. The idea originated in Lisp and the dynamic Lisp family of languages, but most modern languages now offer one in some form.
By MCP I mean Model Context Protocol, a standard way for tools and applications to expose capabilities to LLM-based clients. I wrote a more practical post about it here: MCP explained with code.
Why I love NixOS
What I love about NixOS has less to do with Linux and more to do with the Nix package manager1.
To me, NixOS is the operating system artifact of a much more important idea: a deterministic and reproducible functional package manager. That is the core of why I love NixOS. It is not distro branding that I care about. It is the fact that I can construct a whole operating system as a deterministic result of feeding Nix DSL to Nix and then rebuild it, change it bit by bit, and roll it back if I do not like the result.
I love NixOS because most operating systems slowly turn into a pile of state. You install packages, tweak settings, try random tools, remove some of them, and upgrade over time, and after a while you have a machine that works, but not in a way that you can confidently explain from first principles. NixOS felt very different to me. I do not have to trust a pile of state. I can define a system and build it.
I love NixOS because I can specify the whole OS, including the packages I need and their configuration, in one declarative setup. That "one place" aspect matters to me more than it might sound at first. I do not have to chase package choices in one place, desktop settings in another, and keyboard behavior somewhere else. Below are a couple of small Nix DSL examples.
- GNOME extensions:
environment.systemPackages = with pkgs; [
  gnomeExtensions.dash-to-dock
  gnomeExtensions.unite
  gnomeExtensions.appindicator
  libappindicator
];

services.desktopManager.gnome.extraGSettingsOverrides = ''
  [org.gnome.shell]
  enabled-extensions=['dash-to-dock@gnome-shell-extensions.gcampax.github.com', 'unite@hardpixel.eu', 'appindicatorsupport@rgcjonas.gmail.com']

  [org.gnome.shell.extensions.dash-to-dock]
  dock-position='BOTTOM'
  autohide=true
  dock-fixed=false
  extend-height=false
  transparency-mode='FIX'
'';
- Key mapping per keyboard:
services.keyd = {
enable = true;
keyboards = {
usb_keyboard = {
ids = [ "usb:kb" ];
settings.main = {
leftcontrol = "leftmeta";
leftmeta = "leftcontrol";
rightalt = "rightmeta";
rightmeta = "rightalt";
};
};
laptop_keyboard = {
ids = [ "laptop:kb" ];
settings.main = swapLeftAltLeftControl;
};
};
};
Those are ordinary details of a working machine, but that is exactly the point. I can describe them declaratively, rebuild the system and keep moving. If I buy a new computer, I do not have to remember a long chain of manual setup steps or half-baked scripts scattered all over. I can rebuild the system from a single source of truth.
I love NixOS because it has been around for a long time. In my experience, it has been very stable. It has a predictable release cadence every six months. I can set it up to update automatically and upgrade it without the usual fear that tends to come with operating system upgrades. I do not have to think much about upgrade prompts, desktop notifications or random system drift in the background. It mostly stays out of my way. And if I want to be more adventurous, it also has an unstable channel2 that I can enable to experiment and get newer software.
I love NixOS because it lets my laptop be boring in the best possible sense. I recently bought an HP laptop3 and NixOS worked beautifully on it out of the box. I did not have to fight the hardware to get to a reasonable baseline. That gave me exactly what I want from a personal computer: a stable system that I can configure declaratively and then mostly ignore while I focus on actual work.
I love NixOS because it makes experimentation cheap and safe. I can try packages without mutating the base system. I can construct a completely isolated package shell4 for anything from a one-off script to a full-blown project. If I want to harden it further, I can use the Nix DSL to specify the dependencies, build steps and resulting artifacts declaratively. That is a much better way to work than slowly polluting my daily driver and hoping I can reconstruct what I did later.
I love NixOS because I can use the same package manager across macOS and Linux. There is also community-maintained support for FreeBSD, though I have not used it personally. That is a huge practical benefit because my development tooling and dependency management can stay mostly uniform across those systems. It means the value of Nix is not tied only to NixOS. NixOS happens to be the most complete expression of it, but the underlying model is useful to me across platforms.
I love NixOS because it fits especially well with the way I work in the current LLM coding era.
Tools are changing very quickly. Coding agents often need very specific versions of utilities, compilers and runtimes. They need to install something, use it, throw it away, try another version and keep going without turning my PC into a garbage dump of conflicting state. Nix fits that model naturally. If I tell a coding agent that I use Nix, it is usually clever enough to reach for nix shell or nix develop to bring the needed tool into an isolated environment and execute it there. That is especially handy because Nix treats tooling as a declared input instead of an accidental side effect on the system.
A concrete example: I recently built a voice-to-text agent in Rust5. I did not have the Rust toolchain installed on my system. I simply told the coding agent that I use Nix, and it figured out how to pull in the entire Rust toolchain through Nix, compile the project inside an isolated shell and produce a working binary. My base system was never touched. No ~/.cargo, no ~/.rustup, no mutated PATH entries left behind. Without Nix, the agent would have reached for curl | sh to install rustup, quietly mutated my environment and left my system slightly different forever. With Nix, none of that happened.
This pattern generalizes. Every time an agent needs Python 3.11 vs 3.12, a specific version of ffmpeg, an obscure CLI tool or a particular compiler, Nix gives it a clean and reversible way to get exactly what it needs. The agent does not have to guess whether a tool is already installed or in the wrong version. It just declares what it needs and Nix takes care of the rest in a sandboxed way.
The other thing I appreciate is that Nix turns an agent's experiment into something you can actually commit and reproduce. Once the agent has a working setup, you can capture the exact dependencies in a flake.nix and run nix flake check to verify it builds cleanly from scratch. That transforms an ad hoc agent session into a reproducible, verifiable artifact. That is a much stronger foundation for delivering something that works reliably in production than hoping the environment happens to be in the right shape on the next machine.
I love NixOS because I like what Nix gives me in deployment too. I have never been a big fan of Docker as the final answer to the "works on my machine" problem. It solved important problems for the industry, no doubt about that, but I always found the overall model less satisfying than a truly deterministic one. Nix gives me a much better story. I can use dockerTools.buildLayeredImage to build smaller Docker images in a deterministic and layered approach. If I can build it on one computer with the proper configuration, I can build the same artifact on another one as long as Nix supports the architecture, which in my experience has been very reliable.
That coherence is one of the things I value most about NixOS. The same underlying model helps me with my laptop, my shell, my project dependencies, my CI pipeline and my deployment artifact. It is one way of thinking about software instead of a loose collection of unrelated tools and habits.
So when I say I love NixOS, what I really mean is that I love what it represents. I love a system that is declarative, reproducible, reversible and stable. I love being able to experiment without fear and upgrade without drama. I love that it helps me focus on building and experimenting with fast-moving tools, including LLM coding agents, without worrying about messing up my system in the process.
I love NixOS because it is the most complete everyday expression of what I think software systems should be.
Footnotes:
If you are new to Nix, I wrote a more practical getting-started guide here: Nix: Better way for fun and profit.
By unstable channel I mean the official `nixos-unstable` or `nixpkgs-unstable` channels. See Channel branches and channels.nixos.org.
HP EliteBook X G1a 14 inch Notebook with 64 GiB RAM and AMD Ryzen AI 9 HX PRO 375.
For example, nix develop drops you into an interactive shell environment that is very close to what Nix would use to build the current package or project.
A voice-to-text agent I built in Rust that replaced Whisper and Willow Voice in my personal workflow. I wrote it first for macOS and then ported it to Linux. I have been using it as a daily driver for a couple of months now. I am considering open sourcing it or releasing it as a standalone app.
One CSV parser to rule them all
One would think that parsing CSV files is pretty straightforward, until you get bitten by the many kinds of CSV files that exist in the wild. Many years ago, I wrote a small CSV reader with the following requirements in mind:
- Should not depend on anything other than Clojure
- Should allow me to control how I tokenize and transform lines
- Should allow me to have complete control over the delimiting character or characters, the file encoding, the number of lines to read, and error handling
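To make the "bitten in the wild" point concrete, here is a minimal sketch (in TypeScript for illustration; csvx itself is Clojure) of why a naive split on commas is not enough. The function below is a hypothetical, simplified tokenizer in the spirit of RFC 4180 quoting rules, not code from csvx:

```typescript
// A naive line.split(",") breaks on quoted fields. This minimal tokenizer
// handles quoted fields, embedded delimiters, and doubled quotes ("").
function parseCsvLine(line: string, delimiter: string = ","): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { current += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                          // closing quote
      else current += ch;
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(current); // field boundary outside quotes
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// A field with an embedded comma and an escaped quote:
console.log(parseCsvLine('1,"Doe, John","said ""hi"""'));
// → [ '1', 'Doe, John', 'said "hi"' ]
```

Even this sketch ignores multi-line quoted fields and encoding issues, which is exactly why controlling tokenization, encoding, and error handling matters.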
The result is csvx. I updated it to work across Clojure and ClojureScript, in both Node.js and browser environments. The entire code is less than 200 lines including comments and blank lines. If you find yourself in need of a CSV reader with the above requirements, you are welcome to steal the code. Enjoy!
Hacker news AI coding experience analysis
I have been using and experimenting with AI coding tools heavily for the last three months or so, since joining Legion Health as a Founding Engineer. I had been somewhat skeptical of AI and approached it with suspicion since ChatGPT came out. I use Emacs as my workbench and have optimized my workflow around using it as a terminal multiplexer, which fits naturally with Claude Code, my main programming assistant. Below is my simple setup, which might benefit other fellow minimalist Emacs users.
(use-package eat
  :ensure t
  :config
  (setq eat-term-name "xterm-256color"
        eat-kill-buffer-on-exit t
        process-adaptive-read-buffering nil
        eat-term-scrollback-size 500000)
  (define-key eat-semi-char-mode-map [?\s-v] #'eat-yank)
  (define-key eat-semi-char-mode-map [?\C-c ?\C-r] #'k/eat-redisplay))
(defun k/eat-redisplay ()
"Fix eat flicker/flash and display funkiness"
(interactive)
(unless (derived-mode-p 'eat-mode)
(error "Not in an eat-mode buffer"))
(when (and (boundp 'eat-mode) eat-mode (boundp 'eat-terminal) eat-terminal)
(let* ((process (eat-term-parameter eat-terminal 'eat--process))
(window (get-buffer-window (current-buffer))))
(if (and process (process-live-p process) window)
(eat--adjust-process-window-size process (list window)))))
(setq-local window-adjust-process-window-size-function
'window-adjust-process-window-size-smallest)
(goto-char (point-min))
(redisplay)
(goto-char (point-max))
(redisplay)
(setq-local window-adjust-process-window-size-function 'ignore))
I start an eat shell and run:
cd ~/repos/project-x && claude
This is a fast-moving landscape, and I find the following points extremely helpful in my workflow:
- Spend a few minutes at the start of a session gathering up and feeding Claude the context related to what needs to be done.
- Always ask Claude to show a plan, and give it guidance if there are multiple options for a solution.
- Provide a skeleton such as directory structure, file names, function names and signatures.
- Provide use cases and acceptance criteria as testing instructions.
On top of that, you need to make sure Claude has access to tools that enhance its ability to look up relevant information. To provide a more balanced overview of the AI coding experience, I used a great data analysis tool for Hacker News called CamelAI. Below are the results, which more or less resonate with my personal experience.
Top Stories by Engagement
- Claude 3.7 Sonnet and Claude Code (2,127 points)
  - Overwhelmingly positive reception for AI coding capabilities
  - Demonstrates Claude's dominance in the space
- Cursor IDE lockout policy problems (1,511 points)
  - Major backlash against policy changes causing user cancellations
  - Shows fragility of user trust in AI tools
- AlphaEvolve: Gemini coding agent (1,036 points)
  - Google's advanced algorithm design agent
  - High interest in autonomous coding capabilities
- "Enough AI copilots, we need AI HUDs" (964 points)
  - Forward-thinking discussion about UI evolution
  - Community wants more integrated experiences
- Void: Open-source Cursor alternative (948 points)
  - Strong demand for open-source alternatives
  - Privacy and control concerns driving adoption
Key Trends & Patterns
- Claude Dominance in AI Coding
  - Evidence: Claude 3.7 Sonnet (2,127 pts), consistent praise in experience stories
  - Insight: Anthropic's Claude has emerged as the clear leader for serious coding work, with developers consistently praising its code quality and reasoning capabilities over competitors
- Tool Fragmentation & User Frustration
  - Evidence: Cursor problems (1,511 pts), multiple "stopped using AI" stories (365, 109 pts)
  - Insight: Users are jumping between tools due to policy changes, reliability issues, and unmet expectations. No single tool has achieved universal satisfaction, leading to "tool fatigue"
- The Productivity Paradox
  - Evidence: "Anyone struggling to get value out of coding LLMs?" (345 pts), productivity studies showing mixed results
  - Insight: Despite massive hype, many developers struggle to see concrete productivity gains. The "almost right" code problem creates hidden productivity taxes that offset benefits
- Cognitive Dependency Concerns
  - Evidence: "After months of coding with LLMs, I'm going back to using my brain" (365 pts)
  - Insight: Growing concern about over-reliance on AI leading to skill atrophy and reduced problem-solving capabilities among developers
- Enterprise vs Individual Experience Gap
  - Evidence: Microsoft 365 Copilot disaster (602 pts) vs individual success stories
  - Insight: Stark divide between enterprise rollout failures and individual developer successes. Enterprise context adds complexity that current tools struggle with
- Open Source Alternative Movement
  - Evidence: Void alternative (948 pts), Tabby self-hosted (366 pts)
  - Insight: Strong demand for open-source, self-hosted alternatives driven by privacy concerns, cost considerations, and desire for control
Engineer Experience Patterns
- Positive Experiences
  - Who: Experienced developers using AI as an enhancement tool
  - Patterns: Claude-based tools getting consistent praise, terminal-based tools popular with power users
  - Benefits: Code generation, debugging assistance, learning new patterns
  - Key Success Factor: Using AI to amplify existing skills, not replace them
- Negative Experiences
  - Who: Beginners over-relying on AI, enterprise users with complex requirements
  - Patterns: Policy changes causing churn, productivity promises not materializing, "almost right" code creating more work
  - Problems: Skill degradation, tool reliability issues, hidden productivity costs
  - Key Failure Factor: Expecting AI to replace fundamental programming knowledge
- Mixed Experiences
  - Who: Pragmatic developers experimenting with different approaches
  - Patterns: Tools working well for specific use cases, steeper learning curve than expected, context-dependent effectiveness
  - Insight: Success heavily depends on matching use case, experience level, and realistic expectations
Temporal Evolution (2024-2025)
- Early 2024: Initial hype phase - GitHub Copilot going free, new tool launches
- Mid 2024: Reality check phase - limitations becoming apparent, user frustrations mounting
- Late 2024: Maturation phase - Claude emerges as leader, tool fragmentation increases
- Early 2025: Sophistication phase - Claude 3.7/Code dominance, better understanding of limitations
- Mid 2025: Pragmatic phase - Focus on specific use cases, open-source alternatives, realistic expectations
Critical Insights for Engineers
- Skill Foundation is Critical
  - AI tools amplify existing programming skills rather than replace them. Developers who understand fundamentals see the most benefit.
- Context Matters Enormously
  - Success depends heavily on use case, project complexity, and domain. There's no universal "AI coding works" or "doesn't work."
- Tool Landscape is Rapidly Changing
  - Claude-based tools currently leading, but the landscape shifts quickly. Expect to try multiple approaches.
- Cognitive Risks are Real
  - Over-reliance can lead to skill degradation. Many successful developers use AI selectively while maintaining core problem-solving abilities.
- Productivity Benefits are Mixed
  - Benefits exist but are often not as dramatic as promised. The "almost right" problem creates hidden costs that offset gains.
- Enterprise Success ≠ Individual Success
  - Individual developer success doesn't guarantee organizational success. Enterprise complexity creates additional challenges.
Future Outlook
- Specialization: Tools becoming more domain-specific and context-aware
- Hybrid Workflows: Combination of AI assistance and traditional coding becoming the norm
- Better Metrics: More sophisticated ways to measure actual productivity impact
- Education Evolution: Teaching AI-assisted development as a core skill
- Democratization: More open-source and self-hosted options emerging
- UI Innovation: Moving beyond copilots to more integrated experiences (AI HUDs)
Specific Tool Performance
- Claude: Clear winner for code quality, reasoning, and complex tasks
- Cursor: Popular but plagued by policy and reliability issues
- GitHub Copilot: Solid mainstream choice, good accessibility for beginners
- Open Source (Void/Tabby): Rising alternatives for privacy/control-conscious developers
- Enterprise Tools: Microsoft 365 Copilot struggled badly in enterprise rollouts
Bottom Line
- The AI coding experience is highly polarized: it works exceptionally well for some developers in specific contexts, but fails to deliver promised productivity gains for many others. Success requires:
- Matching the right tool to the right use case
- Maintaining realistic expectations
- Preserving core programming skills
- Understanding tool limitations
- Being prepared to adapt as the landscape evolves
MCP explained with code
So you are curious about how the Model Context Protocol works as a stand-alone client and server, and as a hub that provides context for LLMs to do their magic? Then read on.
In this post, I am going to cover the following points:
- Why MCP?
- How does it work?
- What makes it worth learning and getting excited about?
Disclaimer: This post covers MCP's merits without weighing its trade-offs, which are a blog post for another day and another time. So take everything you read here with a grain of salt.
Why MCP?
LLMs need context to narrow their pattern recognition so they can provide more relevant help for the domain you are engaging them with. MCP provides a plug-and-play way to do just that. I am not going to repeat the excellent documentation from its web site here, but I'd like to drive the point further by expanding on their USB-C analogy. USB-C is a universal standard (an interface, to be exact) that device manufacturers use so their devices can be plugged into a USB port of a computer. Once a device (think of it as an MCP server) connects to a USB-C port, it exchanges information about its capabilities with the host through the USB-C subsystem of your PC (think of it as an MCP client) so you can communicate with the device. An MCP client/server pair behaves just like that, and now we are ready for the next section to see actual code you can run to convince yourself.
How does it work?
To understand any concept, I usually take it apart into its smallest logical units and see them in action. Below is all you need to run a stand-alone MCP client and server in TypeScript:
// Core bits of Server here. See full code:
// https://github.com/oneness/ts-mcp-client-server/blob/main/src/server.ts
class MCPServer {
... // omitted boilerplate code
private setupToolHandlers() {
// Handle list_tools requests
this.server.setRequestHandler(ListToolsRequestSchema, async (): Promise<ListToolsResult> => {
return {
tools: [
{
name: "say_hello",
description: "Says hello to a person",
inputSchema: {
type: "object",
properties: {
name: {
type: "string",
description: "The name of the person to greet",
},
},
required: ["name"],
},
} as Tool,
{
name: "get_time",
description: "Gets the current time",
inputSchema: {
type: "object",
properties: {},
},
} as Tool,
],
};
});
// Handle call_tool requests
this.server.setRequestHandler(CallToolRequestSchema, async (request): Promise<CallToolResult> => {
const { name, arguments: args } = request.params;
switch (name) {
case "say_hello":
const personName = args?.name || "World";
return {
content: [
{
type: "text",
text: `Hello, ${personName}! This is a greeting from the MCP server.`,
},
],
};
case "get_time":
return {
content: [
{
type: "text",
text: `Current time: ${new Date().toISOString()}`,
},
],
};
default:
throw new Error(`Unknown tool: ${name}`);
}
});
}
... // omitted
}
// Client
// https://github.com/oneness/ts-mcp-client-server/blob/main/src/client.ts
class MCPClient {
... // omitted
async listTools() {
try {
const response = await this.client.listTools() as ListToolsResult;
console.log("Available tools:");
response.tools.forEach((tool) => {
console.log(`- ${tool.name}: ${tool.description}`);
});
return response.tools;
} catch (error) {
console.error("Error listing tools:", error);
return [];
}
}
async callTool(name: string, args: any = {}) {
try {
const response = await this.client.callTool({
name,
arguments: args,
}) as CallToolResult;
console.log(`Tool '${name}' response:`);
response.content.forEach((content) => {
if (content.type === "text") {
console.log(content.text);
}
});
return response;
} catch (error) {
console.error(`Error calling tool '${name}':`, error);
}
}
... //omitted
}
As you may have noticed, there are a few interfaces (functions with argument and return schemas) that you need to implement so the client and server can speak to each other using a Request/Response format they both understand (JSON-RPC 2.0, if you are curious).
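To make the wire format tangible, here is a sketch of what a tool-listing round trip looks like as JSON-RPC 2.0 messages. The `jsonrpc`/`id`/`method`/`result` envelope is standard JSON-RPC 2.0; the `tools/list` method name and the shape of `result` reflect my reading of the MCP spec, so treat the details as illustrative rather than authoritative:

```typescript
// What the MCP client sends over stdio (one JSON object per message).
const request = {
  jsonrpc: "2.0",
  id: 1,                  // correlates request and response
  method: "tools/list",   // MCP's tool-discovery method
  params: {},
};

// What the server's ListToolsRequestSchema handler produces, wrapped
// in the JSON-RPC response envelope.
const response = {
  jsonrpc: "2.0",
  id: 1,                  // must match the request id
  result: {
    tools: [
      { name: "say_hello", description: "Says hello to a person" },
      { name: "get_time", description: "Gets the current time" },
    ],
  },
};

console.log(JSON.stringify(request));
console.log(JSON.stringify(response));
```

The SDK hides this envelope behind `setRequestHandler` and `client.listTools()`, which is why none of it appears in the code above.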
Now that you understand how a stand-alone MCP client and server communicate with each other (I highly recommend cloning the repo and running `npm run mcp` to see it in action), let us look at an actual LLM chat that uses the MCP client to talk to the LLM, providing the MCP server's capabilities to it as context. The LLM can use that structured data (the MCP capabilities) to determine which MCP server tool can answer the user's request, and the MCP client then executes that tool. The tool's response is fed back to the LLM so it can formulate a final response to the user. Here is a simple ASCII diagram that visualizes the flow:
+----------+ +------------+ +-----------------+ +-----+
| User | | MCP Client | | MCP Server Tool | | LLM |
+----------+ +------------+ +-----------------+ +-----+
| | | |
| | 0. Connect & Get | |
| | Capabilities | |
| +----------------->| |
| |<-----------------+ |
| | (Capabilities now known by Client) |
| | | |
| | 1. Setup System | |
| | Prompt w/ MCP | |
| | Capabilities | |
| +------------------------------------>|
| | | |
| | | (LLM is now |
| | | aware of tools) |
| | | |
| 2. Chat Req. | | |
+---------------->| | |
| | | |
| | 3. User Query | |
| | (subsequent) | |
| +------------------------------------>|
| | | |
| | | 4. Process Query |
| | | & Context |
| | | |
| | |<-----------------| (Tool identified?)
| | | 5. Tool Exec. |
| | | Request |
| |<------------------------------------|
| | | |
| | 6. Execute Tool | |
| +----------------->| |
| | | |
| |<-----------------+ |
| | 7. Tool Output | |
| | | |
| | 8. Tool Output | |
| +------------------------------------>|
| | | |
| | | |
| | | |
| |<------------------------------------|
| | | 9. Final Response|
|<----------------+ | |
| 10. Chat Resp. | | |
Here is the code that shows the above flow in action:
// Omitted for brevity. See below link for details
// https://github.com/oneness/ts-mcp-client-server/blob/main/src/llm.ts#L69
async processMessage(userMessage: string): Promise<string> {
console.log(`\n๐ค Processing: "${userMessage}"`);
// Add user message to conversation
this.conversationHistory.push({
role: "user",
content: userMessage
});
try {
// Prepare tools for Claude
const tools = this.convertMCPToolsToAnthropicFormat();
// Get Claude's response
const response = await this.anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1000,
system: this.systemPrompt,
messages: this.conversationHistory,
tools: tools.length > 0 ? tools : undefined,
});
console.log(`๐ง Claude response:`, JSON.stringify(response, null, 2));
// Process the response
let finalResponse = "";
const toolResults: MCPToolResult[] = [];
// Handle different content types
for (const content of response.content) {
if (content.type === 'text') {
finalResponse += content.text;
} else if (content.type === 'tool_use') {
console.log(`๐ง Claude wants to use tool: ${content.name} with args:`, content.input);
// Call the MCP tool
const mcpResult = await this.mcpClient.callTool(content.name, content.input);
let toolResultText = "No result";
if (mcpResult && mcpResult.content) {
toolResultText = mcpResult.content
.filter(c => c.type === 'text')
.map(c => c.text)
.join('\n');
}
toolResults.push({
tool: content.name,
result: toolResultText
});
// Add tool result to conversation for Claude's next response
this.conversationHistory.push({
role: "assistant",
content: response.content
});
this.conversationHistory.push({
role: "user",
content: [
{
type: "tool_result",
tool_use_id: content.id,
content: toolResultText
}
]
});
}
}
// If we used tools, get Claude's final response incorporating the results
if (toolResults.length > 0) {
console.log(`๐ Tool results:`, toolResults);
const finalCompletion = await this.anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 1000,
system: this.systemPrompt,
messages: this.conversationHistory,
tools: tools.length > 0 ? tools : undefined,
});
// Extract text from final response
finalResponse = "";
for (const content of finalCompletion.content) {
if (content.type === 'text') {
finalResponse += content.text;
}
}
this.conversationHistory.push({
role: "assistant",
content: finalCompletion.content
});
} else {
// No tools were used, add the response to history
this.conversationHistory.push({
role: "assistant",
content: response.content
});
}
return finalResponse;
} catch (error) {
console.error('Error calling Claude:', error);
return "I'm sorry, I encountered an error processing your request.";
}
}
What makes it worth learning and getting excited about?
If you have been following the LLM landscape, you might have realized that most of us (unless you are an AI researcher) are in the business of providing the most accurate and up-to-date context to the foundational LLM models. MCP unifies the way LLM models, consumer applications (clients) and resource providers (servers) communicate, reducing M×N integrations to M+N, which makes it worth learning, implementing and being excited about. Even outside the context of LLMs, I hope more data and service providers expose their system capabilities by implementing the MCP server contracts. That would dramatically reduce the time wasted on ad hoc integration glue code.
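The M×N versus M+N claim is simple arithmetic, and the counts below (with made-up example numbers) show why it matters as both sides grow:

```typescript
// Without a shared protocol, every client application needs a bespoke adapter
// for every resource provider: one integration per (client, provider) pair.
// With MCP, each client and each provider implements the protocol once.
// The numbers here are hypothetical, purely to illustrate the scaling.
const clients = 5;   // M: consumer applications
const providers = 8; // N: resource providers

const pointToPoint = clients * providers; // M*N bespoke adapters
const withMcp = clients + providers;      // M+N protocol implementations

console.log(`point-to-point: ${pointToPoint}, via MCP: ${withMcp}`);
// → point-to-point: 40, via MCP: 13
```

The gap widens multiplicatively: doubling both sides quadruples the point-to-point count but only doubles the MCP count.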
Hope you learned a thing or two about MCP. Keep learning and have fun!