
Stop Treating Your AI Agent Like a Search Engine: Why You Need an "AI Harness"

Sunil Kashyap
May 7, 2026

Unstructured AI agents give great answers for an hour and bad answers for a week. A harness — files, conventions, and feedback loops — is what turns them into something that actually ships.

Most of us have had the same experience with AI coding agents. You give one a prompt, you get back a clean block of code in seconds. Feels like magic.

Then you try to ship a real feature with it over the course of a week.

The two-line bug fix turns into a 1,400-line refactor. It writes tests that pass against nothing. It hits a small snag and just... waits, halfway through, until you come back and tell it what to do. The honest way to describe an unstructured agent isn't "senior engineer." It's a brilliant intern with no memory, no desk, no calendar, and no idea what they did yesterday.

You get good answers for an hour, and bad answers for a week.

If you want an agent that actually ships features overnight, instead of just typing faster than you can, you have to stop bolting it onto the workflows you already had. You need a harness.

What a harness actually is

A harness isn't software you install. It's a set of files, conventions, and workflows that the agent reads, writes, and consults every time it works. The point is to make the agent's effort compound across sessions.

Without one, every new chat starts at zero. With one, the agent wakes up, reads the relevant domain notes, checks the most recent handoff, and gets going.

It does three things:

  • Keeps context around. The agent reads architectural constraints and prior decisions before touching code, so it re-orients in minutes instead of guessing.
  • Forces external evaluation. The agent never grades its own work. Asking a language model whether its code works is like asking a suspect if they're guilty. Tests, linters, and explicit acceptance gates produce signals it can't fake.
  • Closes the loop. Every change runs end-to-end: a trigger kicks off the work, code gets produced, something evaluates it (tests, metrics, a checker), and the result goes back into the main branch. If your system isn't learning from what the agent did, you're running open-loop (see the sketch after this list).
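
A minimal sketch of that loop, as it might run overnight. The commands and file roles here are illustrative, not prescriptions:

```
trigger    → nightly cron, or the next task marked ready in the backlog
produce    → the agent edits code on a work branch
evaluate   → make test && make lint (signals the agent can't fake)
integrate  → green: merge the branch, append the outcome to the journal
             red:  log the failure and requeue the task
```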

The failure modes you'll recognize

If you've watched an agent run for more than an afternoon, you've seen these.

Context anxiety. The agent finishes early, or sloppily, because its context window is filling up and it knows it.

Scope drift. Without explicit acceptance criteria, every nearby file looks like a refactoring opportunity.

The mid-loop block. You set the agent to run overnight. At 11pm it stops to ask, "are you sure you want me to proceed?" and sits there until morning. That's eight hours of warmed-up context wasted, plus the API spend.

A working harness prevents all three.

What's actually in one

You don't need exotic tooling. You need discipline and a few mundane pieces, all sketched after the list:

  • Persistent context files. CLAUDE.md or AGENTS.md sitting in the directories they describe. Architecture, build commands, local conventions, the stuff the agent shouldn't have to guess.
  • A structured backlog. When the agent finishes a task, it shouldn't ask what's next. It should know to check a Strategy.md or task frontmatter and pick up the next priority on its own.
  • In-code annotations. TODO(wave-2), NOTE(invariant), that kind of thing. The best place to tell an agent about a constraint is inside the file it's about to edit.
  • Dossiers for the big work. Anything bigger than a single PR gets a folder: problem statement, decisions made, and an append-only JOURNAL.md. The journal becomes the agent's long-term memory for that project.
  • A decision cascade for ambiguity. When the agent isn't sure what to do, it can't just stop. It writes down the ambiguity, considers the options, checks prior decisions, picks one, logs the choice, and keeps going. This is the part that makes overnight runs survive contact with reality.
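
Concretely, a small harness is a handful of files. CLAUDE.md, AGENTS.md, Strategy.md, and JOURNAL.md are the names used above; the project, the dossier, and the other file names are invented for illustration:

```
repo/
├── CLAUDE.md                # root context: architecture, never-do list
├── Strategy.md              # prioritized backlog the agent consults
├── src/
│   └── uploads/
│       └── AGENTS.md        # local conventions for this directory
└── dossiers/
    └── upload-pipeline/
        ├── PROBLEM.md       # problem statement
        ├── DECISIONS.md     # decisions made so far
        └── JOURNAL.md       # append-only long-term memory
```

And a directory-level AGENTS.md can carry both the annotation conventions and the decision cascade. A sketch, not a spec:

```
# AGENTS.md for src/uploads/

## Conventions
- Defer work in place: TODO(wave-2): <reason>.
- Protect constraints in place: NOTE(invariant): <reason>.

## When you hit ambiguity
1. Write the ambiguity down in the dossier's JOURNAL.md.
2. List the options you considered.
3. Check DECISIONS.md for a prior ruling.
4. Pick the option most consistent with prior decisions.
5. Log the choice and keep going. Do not stop to ask.
```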

Where humans fit

There's an obvious worry that automating this much makes the engineer redundant. It's the opposite.

A harness pulls humans out of the role of information router. You stop being the person who copies status from Jira to the IDE, or the one who has to remember why a camera threshold was set to 3 meters and not 10. The agent handles execution, logging, and cross-referencing. You handle direction, strategy, and the patterns the agent works within.

That's a better job.

Where to start

This sounds like a project. It isn't. The first time you wake up to a finished overnight run, the upfront cost gets paid back.

Tomorrow: write one root file describing your project's architecture and three things you never want the agent to do.
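
That file can be a page or less. A sketch with an invented project, since the architecture notes and the never-list are the whole point:

```
# CLAUDE.md

## Architecture
Monorepo: api/ is a Go service, web/ is the TypeScript front end.
All persistence goes through api/store/; nothing else touches the database.
Build and test from the repo root with make build and make test.

## Never
1. Never edit shipped files under migrations/.
2. Never widen a bug fix into a refactor of neighboring code.
3. Never mark a task done without a green make test.
```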

The day after: drop an AGENTS.md into your most active directory with a clear definition of "done."
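
The definition of "done" is what stops scope drift and self-grading at the same time. Something like this, adapted to your stack; every specific here is illustrative:

```
# AGENTS.md for src/billing/

## Definition of done
- make test passes from a clean checkout.
- No new lint warnings.
- Anything deferred is a TODO(...) with a reason, not a silent omission.
- The change is summarized in one paragraph in the handoff note.
```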

Next week: require every deferred idea to land as an in-code TODO with a reason attached.
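
The annotation itself is language-agnostic; what matters is the name, the reason, and a pointer to where the decision lives. For example, with invented details:

```
# TODO(wave-2): retries are single-attempt for now.
# Reason: the backoff policy depends on the queue consumer, which hasn't shipped.
# See dossiers/upload-pipeline/JOURNAL.md for the decision log.
```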

Pretty quickly, you stop chatting with an AI and start running a system that compounds.

#AI #Agents #Workflow #Engineering