Agent Harness

Google Search AI definition

An AI agent is the model that reasons and decides ("what/why"), while the harness is the surrounding software infrastructure ("how/where") that enables it to act. The agent (LLM) runs in a loop, taking actions and observing results, whereas the harness provides the tools, memory, and persistence needed for that loop to function, essentially acting as the agent's operating system.
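The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: `Action`, `scripted_model`, and `Harness` are hypothetical names, and the "model" is a hard-coded stand-in so the example runs without an LLM API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str      # name of the tool to call, or "finish" to stop
    argument: str  # a single string argument keeps the sketch simple


def scripted_model(history: list[str]) -> Action:
    """Stand-in for the LLM: picks the next action from what it has seen."""
    if not any(line.startswith("observation:") for line in history):
        return Action("read_file", "notes.txt")
    return Action("finish", "done after reading notes.txt")


@dataclass
class Harness:
    """Everything around the model: tools, memory, and the loop itself."""
    tools: dict[str, Callable[[str], str]]
    history: list[str] = field(default_factory=list)
    max_steps: int = 5  # hard stop so a confused model cannot loop forever

    def run(self, model: Callable[[list[str]], Action], task: str) -> str:
        self.history.append(f"task: {task}")
        for _ in range(self.max_steps):
            action = model(self.history)  # the agent decides
            if action.tool == "finish":
                return action.argument
            tool = self.tools.get(action.tool)
            result = tool(action.argument) if tool else f"unknown tool {action.tool}"
            self.history.append(f"action: {action.tool}({action.argument})")
            self.history.append(f"observation: {result}")  # the agent observes
        return "stopped: step limit reached"


# An in-memory "filesystem" stands in for real file access.
fake_files = {"notes.txt": "remember to run the tests"}
harness = Harness(tools={"read_file": lambda path: fake_files.get(path, "no such file")})
answer = harness.run(scripted_model, "summarize notes.txt")
```

Swapping `scripted_model` for a real model call would leave the harness untouched, which is the separation the definition is pointing at.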

Key Differences & Examples:

Definition: An Agent is the AI model + Prompting. A Harness is everything else (filesystems, bash, memory).
Role: The agent handles logic and decision-making; the harness manages tool execution, state saving, and reliability.
Usage Examples:
- Agent (The "Brain"): A GPT-4 model told to "fix this bug."
- Harness (The "Wrapper"): A framework like OpenClaw, LangChain (in ReAct mode), or a Bash script that gives the agent terminal access, file reading, and a way to save its own progress, so the agent can actually run git commit and test.sh.
Synonyms/Related Terms:
- Agent: LLM agent, Agent loop, Model+Prompt.
- Harness: Runtime environment, Scaffold, Agent infrastructure, Wrapper, Toolkit.

Analogy: If the agent is the engine, the harness is the entire car (steering, fuel system, dashboard) that makes the engine useful. A harness makes the agent autonomous by giving it a "computer" (bash) rather than just a limited set of tools.
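As a rough sketch of what "giving the agent a computer" plus saved progress might look like, the snippet below exposes one general command-running tool and persists each observation to a JSON file. All names (`run_command`, `save_progress`, `load_progress`, `progress.json`) are hypothetical. A real harness would typically hand the command to bash and add sandboxing; the demo calls the Python interpreter instead so it runs on any machine.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def run_command(argv: list[str], timeout_s: float = 10.0) -> dict:
    """Run one command on the agent's behalf and return a structured observation."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return {
        "argv": argv,
        "exit_code": proc.returncode,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
    }


def save_progress(path: Path, steps: list[dict]) -> None:
    """Persist the steps taken so far so a later session can resume."""
    path.write_text(json.dumps(steps, indent=2))


def load_progress(path: Path) -> list[dict]:
    """Return previously saved steps, or an empty list on a first run."""
    return json.loads(path.read_text()) if path.exists() else []


with tempfile.TemporaryDirectory() as workdir:
    progress_file = Path(workdir) / "progress.json"
    steps = load_progress(progress_file)  # empty on the first run
    steps.append(run_command([sys.executable, "-c", "print('tests passed')"]))
    save_progress(progress_file, steps)
    restored = load_progress(progress_file)  # what a resumed session would see
```

One general tool like this is far more flexible than a fixed menu of narrow tools, which is also why real harnesses wrap it in permissions and timeouts.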

Reading

Articles, Essays, Blogs

The Anatomy of an Agent Harness
Fowler Bliki: Harness Engineering: "The term harness has emerged as a shorthand to mean everything in an AI agent except the model itself - Agent = Model + Harness. That is a very wide definition, and therefore worth narrowing down for common categories of agents. I want to take the liberty here of defining its meaning in the bounded context of using a coding agent. In coding agents, part of the harness is already built in (e.g. via the system prompt, or the chosen code retrieval mechanism, or even a sophisticated orchestration system). But coding agents also provide us, their users, with many features to build an outer harness specifically for our use case and system."
What is an agent harness?
Agent Frameworks vs Agent Harnesses:
- Agent framework: Agent frameworks sit in the middle of the agent development line with raw API-calling code on the far left. They give you structure and abstractions, but you still make a lot of decisions. You pick the memory system, you configure the tools, you define the orchestration logic. The framework has opinions about how things should connect, but it's modular. You can swap components. A framework gives you abstractions for building agents. You define roles, tasks, tools. You specify how agents coordinate, whether they work sequentially or hierarchically. The framework handles the plumbing. Calling the LLM, routing tool outputs, managing the execution loop. But you're still making architectural decisions. The framework is opinionated about what the building blocks look like. It has a memory abstraction, a tool interface, a task structure. But those pieces are swappable. If you don't like the default memory implementation, you can plug in your own. If you want to use a different LLM provider, you configure it. The framework gives you a standard interface, but you're still composing the system. That modularity is the point. Frameworks are built for people who want to build agents, not just use them. You're expected to understand how the pieces fit together, because you're the one deciding which pieces to use.
- Agent harness: Agent harnesses sit on the far right. They're maximally opinionated. Everything is baked in. You add your API keys, maybe point it at a few tools, and it runs. Memory, context management, the agent loop, safety checks. All of that is decided for you. A harness doesn't give you building blocks. It gives you a complete system. The best recent example is OpenClaw, which went viral a few weeks ago. It's a harness. You download it, add your API keys, and suddenly you have an agent you can chat with on WhatsApp, Telegram, and other platforms. Memory is handled. Context management is handled. The agent loop is handled. Tool calling, permissions, state persistence. All of it is built in. You're not configuring a memory system. You're not deciding how tools get registered or how the agent recovers from errors. Those decisions were made by whoever built the harness. Your job is to point it at a task and let it run. That's the tradeoff. You get something that works immediately, but you don't get to change how it works under the hood. The harness has an opinion about everything, and you're accepting that opinion when you use it.
- The spectrum matters because it maps to different problems. If you're prototyping, experimenting, or building something custom, you want a framework. You need the flexibility to swap components, test different approaches, and control the details. The framework gives you structure without locking you in. If you need something that works now, reliably, for a specific use case, you want a harness. You're trading control for speed. The harness has already solved the hard problems. Context management, durable execution, error recovery. You're just using the solution.
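One way to picture the two ends of this spectrum in code, as a hedged sketch only: in the framework style the caller chooses and wires the memory and tools, while in the harness style those choices are made inside and the caller supplies little more than a key and a task. Every class name here (`Memory`, `ListMemory`, `LastNMemory`, `FrameworkAgent`, `HarnessAgent`) is hypothetical and not taken from any real framework or from OpenClaw.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Protocol


class Memory(Protocol):
    """The framework's standard interface; implementations are swappable."""
    def remember(self, item: str) -> None: ...
    def recall(self) -> list[str]: ...


class ListMemory:
    """Default memory: keeps everything."""
    def __init__(self) -> None:
        self._items: list[str] = []

    def remember(self, item: str) -> None:
        self._items.append(item)

    def recall(self) -> list[str]:
        return list(self._items)


class LastNMemory:
    """Alternative memory: keeps only the most recent n items."""
    def __init__(self, n: int) -> None:
        self._items: deque[str] = deque(maxlen=n)

    def remember(self, item: str) -> None:
        self._items.append(item)

    def recall(self) -> list[str]:
        return list(self._items)


@dataclass
class FrameworkAgent:
    """Framework style: the caller composes memory and tools."""
    memory: Memory
    tools: dict[str, Callable[[str], str]]

    def step(self, tool: str, arg: str) -> str:
        result = self.tools[tool](arg)
        self.memory.remember(f"{tool}({arg}) -> {result}")
        return result


class HarnessAgent:
    """Harness style: memory and tools are decided for you."""
    def __init__(self, api_key: str) -> None:
        self._api_key = api_key  # unused here; a real harness would call a model
        self._agent = FrameworkAgent(memory=ListMemory(), tools={"echo": lambda s: s})

    def run(self, task: str) -> str:
        return self._agent.step("echo", task)


# Framework: swap in a different memory and your own tool.
custom = FrameworkAgent(memory=LastNMemory(n=1), tools={"upper": str.upper})
custom.step("upper", "a")
custom.step("upper", "b")
kept = custom.memory.recall()

# Harness: add a key, point it at a task, and let it run.
ready_made = HarnessAgent(api_key="placeholder-key")
reply = ready_made.run("hello")
```

Note that the harness here is built from the same pieces; the difference is who makes the choices, not what the pieces are.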
From amateur to mature: setting up Cursor for real development: 7 Phases
1. Phase 0: the amateur start. You open the chat, paste a problem, and get back something long and confident and partly wrong. The agent has no idea what your service does or how your team writes code, so it guesses. Sometimes the guess is good. Often it isn't, and you're left auditing. There's nothing wrong with starting here. Everyone does. The mistake is staying here and concluding the tool is the problem. It isn't. The context is.
2. Phase 1: learn the modes. The first real step is noticing that the agent has different modes, and each one is good at a different job. Ask mode is read-only, for poking around a codebase and understanding how things flow. Plan mode is for designing an approach before anyone touches code, which is what you want for anything large or fuzzy. Agent mode executes once the plan is clear. Debug mode is for systematic troubleshooting with actual runtime evidence rather than vibes. Picking the right mode changes how the agent behaves. It won't fix your output quality on its own though. That comes from context, which is the rest of this list.
3. Phase 2: give the project a memory. This is the step that moved me from amateur to something resembling competent, and it's the one I'd do first if I were starting over. You give the agent persistent knowledge of the project so it stops guessing. Two pieces do most of the work. An AGENTS.md file at the repo root holds the project overview: how to build, test and run it, the code standards, the architecture, and the gotchas a new engineer would trip over in their first week. Then .cursor/rules/*.mdc files carry the targeted stuff. I keep one always-on overview rule, deliberately minimal so it barely costs any context. The heavier rules (language and style scoped by glob, API conventions, database and migration patterns, testing rules pointed at the tests/ folder) only activate when they're relevant. That last point matters more than it looks. If you mark every rule "always apply," you burn context on every single prompt. Keep the overview always-on and let the rest load on demand. The payoff shows up fast. The agent starts respecting your architecture, including the rule I care about most: never commit without me asking.
4. Phase 3: encode your expertise as skills. Once the project context is in place, you start catching yourself explaining the same thing for the fifth time. That repetition is the signal to build a skill, a SKILL.md file the agent reaches for on its own when the situation calls for it. Mine grew slowly and accidentally. A code-review skill came out of our engineering guidelines, and I later folded in quality-gate thresholds and our Definition of Done. There's a summarize-chat skill for capturing decisions, a code-flow skill for tracing how a change ripples across services, and a couple of PR-review skills wired into our tooling. None of these existed on day one. I used the agent, watched where it fell short, and wrote down whatever I kept having to repeat. The strange result is that your prompts get shorter while the output gets better, because the expertise now lives in the setup instead of in your typing.
5. Phase 4: connect it to your real tools with MCP. Now you go past the codebase. MCP (Model Context Protocol) servers connect the agent to the tools you actually work in. I run a memory server for knowledge that survives across sessions, and a wiki server so the agent can pull from our internal docs. The one I'd single out is the Azure DevOps (ADO) MCP, and not just for developers. The ADO MCP earns its keep for the whole team. If your scrum masters write user stories properly (real descriptions, clear acceptance criteria, sensible scope), then everyone downstream benefits. Developers get enough context to make changes in the right services. Reviewers can hold the actual code change up against the expected outcome, with the agent helping. Planning, building and reviewing all point at the same source of truth instead of three different ones.
6. Phase 5: make knowledge persistent with hooks and Memory Palace. A mature setup doesn't throw away a good session. Hooks let you trigger behavior around the agent's lifecycle: session start, before a shell command, after a file edit, before context compaction. I use one to auto-summarize a conversation right before the context gets compacted, so nothing useful evaporates. I also gate beforeShellExecution, which keeps automated commands from doing anything reckless on a corporate network. Then there's Memory Palace. It's a technique, with Python libraries available, that saves your chat history together with its surrounding context. The useful part is that it notices when several separate discussions are actually about the same thing and links them up. So the next time I start an analysis, the relevant past context is already sitting there waiting. Every time I have a session worth keeping, I save it, and the whole thing compounds into a knowledge base that makes the next task faster.
7. Phase 6: build small automations and runbooks. This is the phase that separates "I use AI to write code" from "I use AI to ship and verify it." Once your context is solid, you point the agent at the boring-but-critical work: local testing, smoke tests, the quick checks you run before a deploy.
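Phase 5 mentions gating beforeShellExecution so automated commands cannot do anything reckless. The core of such a gate could look roughly like the sketch below. The source does not show the actual hook script, so everything here is an assumption: the pattern lists, the `gate_command` and `handle_payload` names, and especially the JSON shapes (`{"command": ...}` in, `{"decision": ..., "reason": ...}` out), which are placeholders and not Cursor's documented hook schema.

```python
import json
import re

# Hypothetical policy: tune both lists to your own environment.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+-[a-z]*r"), "recursive delete"),
    (re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"), "piping a download into a shell"),
]
ASK_PATTERNS = [
    (re.compile(r"\bgit\s+(commit|push)\b"), "commits and pushes need explicit approval"),
]


def gate_command(command: str) -> dict:
    """Classify a shell command as deny, ask, or allow."""
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return {"decision": "deny", "reason": reason}
    for pattern, reason in ASK_PATTERNS:
        if pattern.search(command):
            return {"decision": "ask", "reason": reason}
    return {"decision": "allow", "reason": ""}


def handle_payload(raw: str) -> str:
    """Assumed wrapper: read a JSON payload with a 'command' field, reply in JSON."""
    payload = json.loads(raw)
    return json.dumps(gate_command(payload.get("command", "")))


decisions = [
    gate_command(cmd)["decision"]
    for cmd in ("pytest -q tests/", "git commit -m 'wip'", "rm -rf build/")
]
```

Pattern matching like this is only a first line of defense, since a command can be obfuscated past a regex, but it does catch the common accidents, including the "never commit without me asking" rule from Phase 2.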

Tutorials

build-your-claude-code-from-scratch: Covers fundamental concepts, including tool calling patterns, the ReAct (Reasoning and Acting) methodology, real-time streaming responses, conversation history management, smart context cropping, structured task management, sub-agent delegation, and Model Context Protocol (MCP) integration. (DeepWiki)
learn-harness-engineering: Harness engineering beginner tutorial, from 0 to 1.

Tags: ai agent harness

Last modified 21 June 2026
