Transposing the Agent Loop
Posted: 2026-04-15
Introduction
Instead of a few long-running agent loops, aim for many short agent loops, each moving one step closer to the solution of your complex problem. This massive decomposition avoids many of the problems of context decay.
The goal is to make each loop as short as possible; running a fleet (many loops) is just the means to achieve that.
I call this “transposing the agent loop”: you run a loop of (very simple) agent loops.
Said differently, each time we append a new message to a long conversation, we feed the entire conversation back into the LLM. Surely we can do better!
An example
Suppose you start with a prompt that consists of some instructions and a sequence of four input files (contents).
If the AI signals that it needs the contents of a fifth file, your workflow throws away the conversation and starts a new one. The new prompt consists of the instructions and a sequence of five files.
If the AI decides to overwrite one of the files, it provides the new contents and a summary of the changes. Your workflow starts a new conversation with:
- The original instructions.
- The five files (including the updated one).
- If small enough, a diff for the modified file.
- The AI-provided summary of changes for the modified file.
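The rebuild step above can be sketched in a few lines. This is a minimal illustration, not a real API; the function and field names are assumptions:

```python
# Hypothetical sketch: instead of appending to a conversation, rebuild the
# whole prompt from explicit state before each new, short agent loop.

def build_prompt(instructions: str, files: dict[str, str],
                 change_summaries: dict[str, str]) -> str:
    """Assemble a fresh prompt from the current project state."""
    parts = [instructions]
    for path, contents in files.items():
        parts.append(f"--- {path} ---\n{contents}")
    for path, summary in change_summaries.items():
        parts.append(f"Summary of changes to {path}:\n{summary}")
    return "\n\n".join(parts)

state_files = {"a.py": "print('a')", "b.py": "print('b')"}
summaries = {"b.py": "Replaced placeholder with real output."}
prompt = build_prompt("Refactor the project.", state_files, summaries)
```

When the AI requests a fifth file or modifies an existing one, you update `state_files` and `summaries` and call `build_prompt` again, rather than growing a conversation.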
Managing project structure and validation
This requires explicit management of project structure. Rather than a long AI conversation formulating a plan and trying to stick to it, you explicitly capture the plan and use it to control the whole workflow (the sequence of agent loops). You generate prompts dynamically, modeling the semantics and provenance of each part.
An example of what the model might look like: you have software products, within which you execute projects. Each project consists of a sequence of tasks of different task types, for which isolated attempts are issued. The attempts generate artifacts. You put all of this in a database, and an orchestrator program coordinates the execution, invoking appropriate agents at each step. This is a simplified view; the actual model may be significantly more complex.
Another important requirement for transposing the loop is a robust validation step for each task. This helps prevent compounding failures (see the section on probability of success below).
You’ll also want to adjust the semantics of your tools accordingly. Rather than just appending output to the conversation, they should trigger an update to the original prompt (or underlying state) and the execution of a new agent. You need to think about how to collect a meaningful summary of the changes for future loops.
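As a sketch of these adjusted tool semantics, a “write file” tool might update shared state and record a summary, instead of returning output into a conversation. The `State` class and tool signature here are assumptions for illustration:

```python
# Hypothetical tool with "transposed" semantics: it mutates state and records
# a change summary; the orchestrator then launches a fresh agent loop from
# that state instead of continuing the old conversation.

class State:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.summaries: list[str] = []

def write_file_tool(state: State, path: str, contents: str, summary: str) -> None:
    """Apply the edit to shared state and keep a summary for future loops."""
    state.files[path] = contents
    state.summaries.append(f"{path}: {summary}")
```

The key design choice is that the tool's result lives in the state machine, not in any one conversation's history.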
You gain control
This gives you a lot of control.
Restrict the set of tools for each agent. You can specify the exact set of tools available to each agent, depending (among other things) on their type of task. This lets you drastically increase focus.
- “You can only read files and search.”
- “You can’t read any files; your prompt has all you need.”
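The per-task-type restriction can be as simple as a lookup table. The task types and tool names below are illustrative assumptions:

```python
# Map each task type to the exact set of tools its agents may use.
TOOLS_BY_TASK_TYPE = {
    "locate_files": ["read_file", "search"],   # "You can only read files and search."
    "summarize":    [],                        # "Your prompt has all you need."
    "code":         ["write_file", "run_tests"],
}

def tools_for(task_type: str) -> list[str]:
    """Return the allowed tools for a task type; unknown types get none."""
    return TOOLS_BY_TASK_TYPE.get(task_type, [])
```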
Uplevel your tools. You can generate tools targeted for a specific task, with semantics that match the domain naturally.
Generate better prompts. You generate very specific prompts for each task. Your structure informs exactly what each agent sees.
For example, for a coding task:
- Run agent loops with the simple goal of detecting the set of files (paths) that are relevant to a coding task;
- Run loops that receive the overall task goal and a single input file (path). They output a summary of the relevant portions of the file, removing all extraneous information;
- Finally, attempt the coding task. The difficult loop starts on the shoulders of the others, rather than drowning in a sea of details.
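The three stages above can be sketched as a pipeline of short loops. Here `run_agent` is a stand-in for whatever model invocation you use; the prompt wording is illustrative:

```python
# Sketch of the three-stage pipeline: locate files, summarize each in
# isolation, then attempt the coding task from distilled context.

def coding_pipeline(goal: str, repo_paths: list[str], run_agent) -> str:
    # Stage 1: a short loop whose only job is picking relevant file paths.
    relevant = run_agent(f"Goal: {goal}\nList relevant paths from: {repo_paths}")
    # Stage 2: one loop per file, each producing a focused summary.
    summaries = [run_agent(f"Goal: {goal}\nSummarize the relevant parts of {path}")
                 for path in relevant.splitlines()]
    # Stage 3: the hard loop starts from distilled context, not raw files.
    return run_agent(f"Goal: {goal}\nContext:\n" + "\n".join(summaries))
```

Note that no stage ever sees another stage's raw conversation, only its outputs.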
An agent focused on a specific task won’t see any information about all other tasks in the project, including information about previous failed attempts (unless you deliberately choose to surface it).
Do your generated prompts for a task exceed some size threshold? This may happen when each prompt is generated from various sources, likely produced by previous agents. Compress! Run agents that summarize or drop less relevant information. You can do this on each part of the prompt in isolation.
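A sketch of that per-part compression, with an assumed size threshold and a crude token heuristic (both numbers are illustrative, and `summarize_agent` is a stub for a summarization loop):

```python
MAX_PROMPT_TOKENS = 50_000  # illustrative threshold

def rough_token_count(text: str) -> int:
    return len(text) // 4  # crude heuristic: roughly 4 characters per token

def compress_prompt(parts: list[str], summarize_agent) -> list[str]:
    """If the assembled prompt is too large, compress each part in isolation."""
    if sum(map(rough_token_count, parts)) <= MAX_PROMPT_TOKENS:
        return parts
    # Each part gets its own short summarization loop; no agent sees everything.
    return [summarize_agent(part) for part in parts]
```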
Use the right models. Use expensive models only for the parts of your project that need them. Switch dynamically: a task has two failed attempts? Try with an expensive model.
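The escalation rule can be a one-liner in the orchestrator. The model names and the two-failure threshold are assumptions for illustration:

```python
CHEAP_MODEL, EXPENSIVE_MODEL = "small-model", "large-model"

def pick_model(failed_attempts: int) -> str:
    """Escalate to the expensive model after two failed attempts at a task."""
    return EXPENSIVE_MODEL if failed_attempts >= 2 else CHEAP_MODEL
```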
Risks & applicability
There are a few risks from this technique that are worth keeping in mind.
The first is loss of synthesis. Reducing the amount of information given to each agent increases focus, but can prevent insights in one loop from reaching others. By holding everything in a single “working memory”, long-running conversations can see the big picture, which might be somewhat lost when the loop is transposed.
You also face significant complexity in the orchestration and supporting model. You’re taking some of the “intelligence” out of the LLM into your state machine. For tasks where the entire problem fits into a single context window without quality decay, this technique only adds unnecessary overhead.
Cost reductions
There’s also an interesting observation on costs. Costs grow quadratically as conversations get longer. Partitioning the complex project into semi-isolated tasks may drastically reduce the total cost.
This happens because LLMs are “stateless”. In a conversation, each time you send a new message, the entire conversation history must be sent back to the model as part of the new input.
Consider a conversation of 10 turns, each like this:
- User: … 1000 input tokens …
- Gemini: … 1000 output tokens …
Each turn is billed for the full history sent as input plus the new output. In thousands of tokens:
| Turn | Tokens This Turn | Cumulative |
|---|---|---|
| 1 | 2 | 2 |
| 2 | 4 | 6 |
| 3 | 6 | 12 |
| 4 | 8 | 20 |
| 5 | 10 | 30 |
| 6 | 12 | 42 |
| 7 | 14 | 56 |
| 8 | 16 | 72 |
| 9 | 18 | 90 |
| 10 | 20 | 110 |
Only 20K tokens worth of requests and responses were actually generated, but the total cost is 110K. At 32 turns, you are already paying for over 1M.
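The table's quadratic growth follows directly from the turn structure above: turn n resends the (2n − 1)K-token history as input and adds 1K of output, so it bills 2nK tokens. A few lines reproduce the numbers:

```python
def cumulative_cost_k(turns: int) -> int:
    """Total tokens billed over `turns` turns, in thousands of tokens."""
    # Turn n bills 2nK tokens; the sum is turns * (turns + 1).
    return sum(2 * n for n in range(1, turns + 1))

print(cumulative_cost_k(10))  # → 110
print(cumulative_cost_k(32))  # → 1056, i.e. over 1M tokens
```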
Context caching may be used to mitigate this, but transposing the agent loop may be significantly more effective. You pay a “setup tax” because you’re sending more information in the initial prompts, but because you isolate information (you don’t include the whole context), the compounding effect is significantly limited.
Probability of success
Given a complex task, one way to think about the effect of transposing the loop is:
Single long-running agent: You have a low probability of success, say 20%.
Transposing the agent loop: Each agent has a high probability of success, say 98%. With these numbers you can afford up to 80 separate agents; at that point your compound probability of success drops below 20%.
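The 98% and 20% figures check out: assuming independent steps, the compound probability is 0.98 raised to the number of agents.

```python
p_step = 0.98          # per-agent probability of success (from the example)
n_agents = 80
compound = p_step ** n_agents
print(round(compound, 3))  # → 0.199, just below the 20% single-agent baseline
```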
Obviously, this is a very simplistic model, ignoring additional effects. The long-running agent may be able to self-correct, but, on the other hand, the transposed workflow can use deterministic validation steps and self-correction strategies (e.g., multiple attempts per task, dynamically decide to break down failed tasks, etc.).
Conclusion
There are many good reasons to run shorter agent loops. Transposing the loop helps you keep agents focused, which can improve the quality of your AI-assisted workflows.