The longer version: how we got here
You don't have to listen to me
Whether you're new to agents or an ornery old veteran like me, I'm going to try to convince you to throw out most of what you think about AI Agents, take a step back, and rethink them from first principles. (spoiler alert, if you didn't catch the OpenAI Responses launch a few weeks back: pushing MORE agent logic behind an API ain't it)
Agents are software, and a brief history thereof
let's talk about how we got here
60 years ago
We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
20 years ago
Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like Airflow and Prefect, some predecessors, and some newer ones like Dagster, Inngest, and Windmill. These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
10-15 years ago
When ML models started to get good enough to be useful, we started to see DAGs with ML models sprinkled in. You might imagine steps like "summarize the text in this column into a new column" or "classify the support issues by severity or sentiment".
But at the end of the day, it's still mostly the same good old deterministic software.
The promise of agents
I'm not the first person to say this, but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions:
And let the LLM make decisions in real time to figure out the path
The promise here is that you write less software: you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
Agents as loops
Put another way, you've got this loop consisting of 3 steps:
- LLM determines the next step in the workflow, outputting structured json ("tool calling")
- Deterministic code executes the tool call
- The result is appended to the context window
- repeat until the next step is determined to be "done"
async def agent_loop(initial_event: dict):
    # e.g. initial_event = {"message": "..."}
    context = [initial_event]
    while True:
        next_step = await llm.determine_next_step(context)  # LLM picks the next step as structured JSON
        context.append(next_step)

        if next_step.intent == "done":
            return next_step.final_answer

        result = await execute_step(next_step)  # deterministic code executes the tool call
        context.append(result)
Our initial context is just the starting event (maybe a user message, maybe a cron fired, maybe a webhook, etc.), and we ask the LLM to choose the next step (tool) or to determine that we're done.
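To make that concrete, the structured JSON the LLM emits each turn takes one of two shapes. This is just a sketch, not any particular provider's API; the field names (intent, parameters, final_answer) and the tool name are illustrative:

# one turn: the LLM picks a tool and its arguments
next_step = {
    "intent": "deploy_backend_to_prod",   # hypothetical tool name
    "parameters": {"sha": "4af9ec0"},
}

# another turn: the LLM decides the workflow is finished
next_step = {
    "intent": "done",
    "final_answer": "backend 4af9ec0 is deployed to production",
}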
Here's a multi-step example:
And the "materialized" DAG that was generated would look something like:
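In code terms, that materialized path is just whatever ended up in the accumulated context window. A rough sketch of what it might contain after a run like that (the event shapes and tool names are made up):

context = [
    {"type": "initial_event", "message": "deploy 4af9ec0 to production"},
    {"type": "tool_call", "intent": "run_e2e_tests", "parameters": {"env": "staging"}},
    {"type": "tool_result", "tool": "run_e2e_tests", "result": "passed"},
    {"type": "tool_call", "intent": "deploy_backend_to_prod", "parameters": {"sha": "4af9ec0"}},
    {"type": "tool_result", "tool": "deploy_backend_to_prod", "result": "success"},
    {"type": "tool_call", "intent": "done", "final_answer": "4af9ec0 is live in production"},
]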
The problem with this "loop until you solve it" pattern
The biggest problems with this pattern:
- Agents get lost when the context window gets too long - they spin out trying the same broken approach over and over again
- literally, that's it, but that's enough to kneecap the approach
Even if you haven't hand-rolled an agent, you've probably seen this long-context problem in working with agentic coding tools. They just get lost after a while and you need to start a new chat.
I'll even posit something I've heard in passing quite a bit, and that YOU have probably developed your own intuition around:
Even as models support longer and longer context windows, you'll ALWAYS get better results with a small, focused prompt and context
Most builders I've talked to pushed the "tool calling loop" idea to the side when they realized that anything more than 10-20 turns becomes a big mess that the LLM can't recover from. Even if the agent gets it right 90% of the time, that's miles away from "good enough to put in customer hands". Can you imagine a web app that crashed on 10% of page loads?
Update 2025-06-09 - I really like how @swyx put this:
What actually works - micro agents
One thing that I have seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader, more deterministic DAG.
You might be asking - "why use agents at all in this case?" - we'll get into that shortly, but basically, having language models manage well-scoped sets of tasks makes it easy to incorporate live human feedback, translating it into workflow steps without spinning out into context error loops (factor 1, factor 3, factor 7).
having language models manage well-scoped sets of tasks makes it easy to incorporate live human feedback...without spinning out into context error loops
A real life micro agent
Here's an example of how deterministic code might run one micro agent responsible for handling the human-in-the-loop steps for deployment.
- Human Merges PR to GitHub main branch
- Deterministic Code Deploys to staging env
- Deterministic Code Runs end-to-end (e2e) tests against staging
- Deterministic Code Hands to agent for prod deployment, with initial context: "deploy SHA 4af9ec0 to production"
- Agent calls deploy_frontend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human Rejects the action with feedback "can you deploy the backend first?"
- Agent calls deploy_backend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human approves the action
- Deterministic code executes the backend deployment
- Agent calls deploy_frontend_to_prod(4af9ec0)
- Deterministic code requests human approval on this action
- Human approves the action
- Deterministic code executes the frontend deployment
- Agent determines that the task was completed successfully, we're done!
- Deterministic code runs the end-to-end tests against production
- Deterministic code task completed, OR pass to rollback agent to review failures and potentially roll back
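In code, the hand-off above might look roughly like the sketch below. Everything in it is an assumption for illustration: deploy_to_staging, run_e2e_tests, and rollback_agent are hypothetical helpers, and agent_loop is imagined as a variant of the earlier loop that also accepts the narrow tool list the agent is allowed to use:

async def on_merge_to_main(sha: str):
    # deterministic steps, no LLM involved
    await deploy_to_staging(sha)
    await run_e2e_tests(env="staging")

    # hand off to a micro agent with a tiny initial context and a narrow tool set
    await agent_loop(
        initial_event={"message": f"deploy {sha} to production"},
        tools=["deploy_frontend_to_prod", "deploy_backend_to_prod"],
    )

    # back to deterministic code once the agent reports it's done
    if not await run_e2e_tests(env="production"):
        await rollback_agent(sha)  # another small, focused agent reviews failures and may roll back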
This example is based on a real life OSS agent we've shipped to manage our deployments at Humanlayer - here is a real conversation I had with it last week:
We haven't given this agent a huge pile of tools or tasks. The primary value in the LLM is parsing the human's plaintext feedback and proposing an updated course of action. We isolate tasks and contexts as much as possible to keep the LLM focused on a small, 5-10 step workflow.
Here's another more classic support / chatbot demo.
So what's an agent really?
- prompt - tell an LLM how to behave, and what "tools" it has available. The output of the prompt is a JSON object that describes the next step in the workflow (the "tool call" or "function call"). (factor 2)
- switch statement - based on the JSON that the LLM returns, decide what to do with it (sketched after this list). (part of factor 8)
- accumulated context - store the list of steps that have happened and their results (factor 3)
- for loop - until the LLM emits some sort of "Terminal" tool call (or plaintext response), add the result of the switch statement to the context window and ask the LLM to choose the next step. (factor 8)
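That switch statement really is that mundane. A minimal sketch of what the execute_step from the loop earlier could dispatch to, reusing the hypothetical deploy tools from the example above:

async def execute_step(next_step):
    # plain old dispatch on the structured JSON the LLM returned
    match next_step.intent:
        case "deploy_frontend_to_prod":
            return await deploy_frontend_to_prod(next_step.parameters["sha"])
        case "deploy_backend_to_prod":
            return await deploy_backend_to_prod(next_step.parameters["sha"])
        case _:
            return {"error": f"unknown intent: {next_step.intent}"}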
In the "deploybot" example, we gain a couple benefits from owning the control flow and context accumulation:
- In our switch statement and for loop, we can hijack control flow to pause for human input or to wait for completion of long-running tasks (see the sketch after this list)
- We can trivially serialize the context window for pause+resume
- In our prompt, we can optimize the heck out of how we pass instructions and "what happened so far" to the LLM
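A rough sketch of what those first two bullets can look like when you own the loop. The store, the run_id plumbing, and requires_human_approval are all assumptions here, not a specific library's API, and llm / execute_step are the same stand-ins as before:

import json

async def run_until_blocked(run_id: str, context: list, store) -> str | None:
    # assumes the steps in `context` are plain JSON-serializable dicts
    while True:
        next_step = await llm.determine_next_step(context)
        context.append(next_step)

        if next_step["intent"] == "done":
            return next_step["final_answer"]

        if requires_human_approval(next_step):
            # we own the loop, so we can simply stop here: serialize the context
            # window and pick it back up later, in a different process, whenever
            # the human actually responds (webhook, slack reply, etc.)
            store.save(run_id, json.dumps(context))
            return None  # paused awaiting approval, not done

        context.append(await execute_step(next_step))


def resume_context(run_id: str, store, human_response: dict) -> list:
    # e.g. human_response = {"type": "human_response", "approved": False,
    #                        "message": "can you deploy the backend first?"}
    context = json.loads(store.load(run_id))
    context.append(human_response)
    return context  # hand this back to run_until_blocked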
Part II will formalize these patterns so they can be applied to add impressive AI features to any software project, without needing to go all in on conventional implementations/definitions of "AI agent".