The longer version: how we got here
You don't have to listen to me
Whether you're new to agents or an ornery old veteran like me, I'm going to try to convince you to throw out most of what you think about AI Agents, take a step back, and rethink them from first principles. (spoiler alert, if you didn't catch the OpenAI Responses launch a few weeks back: pushing MORE agent logic behind an API ain't it)
Agents are software, and a brief history thereof
let's talk about how we got here
60 years ago
We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
20 years ago
Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like Airflow and Prefect, some predecessors, and some newer ones like Dagster, Inngest, and Windmill. These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
10-15 years ago
When ML models started to get good enough to be useful, we started to see DAGs with ML models sprinkled in. You might imagine steps like "summarize the text in this column into a new column" or "classify the support issues by severity or sentiment".
But at the end of the day, it's still mostly the same good old deterministic software.
The promise of agents
I'm not the first person to say this, but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions:
And let the LLM make decisions in real time to figure out the path
The promise here is that you write less software: you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
Agents as loops
Put another way, you've got this loop consisting of 3 steps:
- LLM determines the next step in the workflow, outputting structured json ("tool calling")
- Deterministic code executes the tool call
- The result is appended to the context window
- repeat until the next step is determined to be "done"
async def agent_loop(initial_event: dict):
    # e.g. initial_event = {"message": "..."}
    context = [initial_event]
    while True:
        next_step = await llm.determine_next_step(context)  # LLM picks the next step as structured JSON
        context.append(next_step)

        if next_step.intent == "done":
            return next_step.final_answer

        result = await execute_step(next_step)  # deterministic code executes the tool call
        context.append(result)
Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc.), and we ask the LLM to choose the next step (tool) or to determine that we're done.
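To make that concrete, the structured JSON the LLM emits each turn takes one of two shapes. This is just a sketch, not any particular provider's API; the field names (intent, parameters, final_answer) and the tool name are illustrative:

# one turn: the LLM picks a tool and its arguments
next_step = {
    "intent": "deploy_backend_to_prod",   # hypothetical tool name
    "parameters": {"sha": "4af9ec0"},
}

# another turn: the LLM decides the workflow is finished
next_step = {
    "intent": "done",
    "final_answer": "backend 4af9ec0 is deployed to production",
}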
Here's a multi-step example:
And the "materialized" DAG that was generated would look something like:
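In code terms, that materialized path is just whatever ended up in the accumulated context window. A rough sketch of what it might contain after a run like that (the event shapes and tool names are made up):

context = [
    {"type": "initial_event", "message": "deploy 4af9ec0 to production"},
    {"type": "tool_call", "intent": "run_e2e_tests", "parameters": {"env": "staging"}},
    {"type": "tool_result", "tool": "run_e2e_tests", "result": "passed"},
    {"type": "tool_call", "intent": "deploy_backend_to_prod", "parameters": {"sha": "4af9ec0"}},
    {"type": "tool_result", "tool": "deploy_backend_to_prod", "result": "success"},
    {"type": "tool_call", "intent": "done", "final_answer": "4af9ec0 is live in production"},
]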
The problem with this "loop until you solve it" pattern
The biggest problems with this pattern:
- Agents get lost when the context window gets too long - they spin out trying the same broken approach over and over again
- literally, that's it, but that's enough to kneecap the approach
Even if you haven't hand-rolled an agent, you've probably seen this long-context problem in working with agentic coding tools. They just get lost after a while and you need to start a new chat.
I'll even posit something I've heard in passing quite a bit, and that YOU have probably developed your own intuition around:
Even as models support longer and longer context windows, you'll ALWAYS get better results with a small, focused prompt and context
Most builders I've talked to pushed the "tool calling loop" idea to the side when they realized that anything more than 10-20 turns becomes a big mess that the LLM can't recover from. Even if the agent gets it right 90% of the time, that's miles away from "good enough to put in customer hands". Can you imagine a web app that crashed on 10% of page loads?
Update 2025-06-09 - I really like how @swyx put this:
What actually works - micro agents
One thing that I have seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader, more deterministic DAG.
You might be asking - "why use agents at all in this case?" - we'll get into that shortly, but basically, having language models manage well-scoped sets of tasks makes it easy to incorporate live human feedback, translating it into workflow steps without spinning out into context error loops (factor 1, factor 3, factor 7).
having language models manage well-scoped sets of tasks makes it easy to incorporate live human feedback...without spinning out into context error loops
A real life micro agent
Here's an example of how deterministic code might run one micro agent responsible for handling the human-in-the-loop steps for deployment.
- Human Merges PR to GitHub main branch
- Deterministic Code Deploys to staging env
- Deterministic Code Runs end-to-end (e2e) tests against staging
- Deterministic Code Hands to agent for prod deployment, with initial context: "deploy SHA 4af9ec0 to production"
- Agent calls deploy_frontend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human Rejects the action with feedback "can you deploy the backend first?"
- Agent calls deploy_backend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human approves the action
- Deterministic code executes the backend deployment
- Agent calls deploy_frontend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human approves the action
- Deterministic code executes the frontend deployment
- Agent determines that the task was completed successfully, we're done!
- Deterministic code runs the end-to-end tests against production
- Deterministic code task completed, OR pass to rollback agent to review failures and potentially roll back
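In code, the hand-off above might look roughly like the sketch below. Everything in it is an assumption for illustration: deploy_to_staging, run_e2e_tests, and rollback_agent are hypothetical helpers, and agent_loop is imagined as a variant of the earlier loop that also accepts the narrow tool list the agent is allowed to use:

async def on_merge_to_main(sha: str):
    # deterministic steps, no LLM involved
    await deploy_to_staging(sha)
    await run_e2e_tests(env="staging")

    # hand off to a micro agent with a tiny initial context and a narrow tool set
    await agent_loop(
        initial_event={"message": f"deploy {sha} to production"},
        tools=["deploy_frontend_to_prod", "deploy_backend_to_prod"],
    )

    # back to deterministic code once the agent reports it's done
    if not await run_e2e_tests(env="production"):
        await rollback_agent(sha)  # another small, focused agent reviews failures and may roll back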
This example is based on a real life OSS agent we've shipped to manage our deployments at Humanlayer - here is a real conversation I had with it last week:
We haven't given this agent a huge pile of tools or tasks. The primary value in the LLM is parsing the human's plaintext feedback and proposing an updated course of action. We isolate tasks and contexts as much as possible to keep the LLM focused on a small, 5-10 step workflow.
Here's another more classic support / chatbot demo.
So what's an agent really?
- prompt - tell an LLM how to behave, and what "tools" it has available. The output of the prompt is a JSON object that describes the next step in the workflow (the "tool call" or "function call"). (factor 2)
- switch statement - based on the JSON that the LLM returns, decide what to do with it (sketched after this list). (part of factor 8)
- accumulated context - store the list of steps that have happened and their results (factor 3)
- for loop - until the LLM emits some sort of "Terminal" tool call (or plaintext response), add the result of the switch statement to the context window and ask the LLM to choose the next step. (factor 8)
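That switch statement really is that mundane. A minimal sketch of what the execute_step from the loop earlier could dispatch to, reusing the hypothetical deploy tools from the example above:

async def execute_step(next_step):
    # plain old dispatch on the structured JSON the LLM returned
    match next_step.intent:
        case "deploy_frontend_to_prod":
            return await deploy_frontend_to_prod(next_step.parameters["sha"])
        case "deploy_backend_to_prod":
            return await deploy_backend_to_prod(next_step.parameters["sha"])
        case _:
            return {"error": f"unknown intent: {next_step.intent}"}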
In the "deploybot" example, we gain a couple benefits from owning the control flow and context accumulation:
- In our switch statement and for loop, we can hijack control flow to pause for human input or to wait for completion of long-running tasks (see the sketch after this list)
- We can trivially serialize the context window for pause+resume
- In our prompt, we can optimize the heck out of how we pass instructions and "what happened so far" to the LLM
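A rough sketch of what those first two bullets can look like when you own the loop. The store, the run_id plumbing, and requires_human_approval are all assumptions here, not a specific library's API, and llm / execute_step are the same stand-ins as before:

import json

async def run_until_blocked(run_id: str, context: list, store) -> str | None:
    # assumes the steps in `context` are plain JSON-serializable dicts
    while True:
        next_step = await llm.determine_next_step(context)
        context.append(next_step)

        if next_step["intent"] == "done":
            return next_step["final_answer"]

        if requires_human_approval(next_step):
            # we own the loop, so we can simply stop here: serialize the context
            # window and pick it back up later, in a different process, whenever
            # the human actually responds (webhook, slack reply, etc.)
            store.save(run_id, json.dumps(context))
            return None  # paused awaiting approval, not done

        context.append(await execute_step(next_step))


def resume_context(run_id: str, store, human_response: dict) -> list:
    # e.g. human_response = {"type": "human_response", "approved": False,
    #                        "message": "can you deploy the backend first?"}
    context = json.loads(store.load(run_id))
    context.append(human_response)
    return context  # hand this back to run_until_blocked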
Part II will formalize these patterns so they can be applied to add impressive AI features to any software project, without needing to go all in on conventional implementations/definitions of "AI agent".