dexhorthy
2025-07-16 14:19:11 -07:00
parent 79cb11d1ad
commit e646386ba8
17 changed files with 4083 additions and 2045 deletions

View File

@@ -0,0 +1,69 @@
# Workshop 2025-07-16: Python/Jupyter Notebook Implementation
- **Main Tool**: `hack/walkthroughgen_py.py` - converts the TypeScript walkthrough to Jupyter notebooks
- **Config**: `hack/walkthrough_python.yaml` - defines notebook structure and content
- **Output**: `hack/workshop_final.ipynb` - generated notebook with Chapters 0-7
- **Testing**: `hack/test_notebook_colab_sim.sh` - simulates the Google Colab environment

## Key Implementation Learnings
- **No async/await in notebooks** - All BAML calls must be synchronous; remove async patterns entirely
- **No sys.argv** - Main functions accept parameters directly (`main("hello")`), not command-line args
- **Global namespace** - Functions defined in cells persist globally; no module imports between cells
- **BAML setup is optional** - Use the `baml_setup: true` step only when introducing BAML (Chapter 1+)
- **get_baml_client() pattern** - Required workaround for Google Colab's import cache (see the sketch after this list)
- **BAML files from GitHub** - Fetch with curl, since Colab can't display local BAML files
- **Regenerate BAML** - Use `regenerate_baml: true` in run_main steps whenever BAML files change
- **Import removal** - Strip `from baml_client import get_baml_client` imports from the Python files
- **IN_COLAB detection** - Try/except around the `google.colab` import detects the environment
- **Human input handling** - `get_human_input()` uses real `input()` in Colab, auto-responses locally
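
The IN_COLAB detection and the client regenerate/reload come straight from the notebooks' setup cell; condensed, the pattern looks like this:

```python
import importlib
import os
import subprocess

# Detect Colab by attempting the import; userdata only exists inside Colab.
try:
    from google.colab import userdata
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

def get_baml_client():
    """Regenerate the client, then reload it to defeat Colab's import cache."""
    if IN_COLAB:
        os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
    subprocess.run(["baml-cli", "generate"], check=True)
    import baml_client
    importlib.reload(baml_client)  # pick up the freshly generated code
    return baml_client.sync_client.b
```
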
## Implementation Patterns
- **walkthroughgen_py.py enhancements** - Added kwargs support for run_main steps
- **Test simulation** - test_notebook_colab_sim.sh creates a clean venv with all dependencies
- **Debug artifacts** - Test runs are preserved in ./tmp/test_TIMESTAMP/ directories
- **BAML test support** - baml-cli test works fine in notebooks, contrary to our initial assumption
- **Tool execution** - All calculator operations (add/subtract/multiply/divide) run in the agent loop
- **Clarification flow** - A ClarificationRequest tool handles ambiguous inputs
- **Serialization formats** - JSON vs. XML for thread history (XML is more token-efficient)
- **Progressive complexity** - Start with hello world, then gradually add BAML, tools, loops, and tests
## Chapter Implementation Status
- **Chapter 0**: Hello World - simple Python program, no BAML ✅
- **Chapter 1**: CLI and Agent - BAML introduction, basic agent ✅
- **Chapter 2**: Calculator Tools - tool definitions without execution ✅
- **Chapter 3**: Tool Loop - full agent loop with tool execution ✅
- **Chapter 4**: BAML Tests - test cases with assertions ✅
- **Chapter 5**: Human Tools - clarification requests with input handling ✅
- **Chapter 6**: Improved Prompting - reasoning steps in prompts ✅
- **Chapter 7**: Context Serialization - JSON/XML thread formats ✅
- **Chapters 8-12**: Skipped - server-based features are not suitable for notebooks ⚠️
## Common Pitfalls Avoided
- **Import errors** - baml_client imports fail in notebooks; use the global get_baml_client instead
- **Async patterns** - Notebooks can't handle async/await; everything must be sync
- **File paths** - Use absolute paths from the notebook directory, and handle ./ prefixes
- **BAML file conflicts** - Each chapter updates the same files (agent.baml), not chapter-specific copies
- **Tool registration** - Ensure every tool type is handled in the agent loop's dispatch (see the sketch below)
- **Test expectations** - BAML test outputs can vary; assertions verify key properties instead of exact matches
- **Environment differences** - Code must work in both Colab and local testing environments
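
A minimal sketch of that dispatch, assuming the BAML-generated tool classes carry an `intent` field and numeric `a`/`b` operands as in the calculator chapters (names are illustrative, not the exact walkthrough code):

```python
def handle_tool_call(next_step, thread):
    # Every intent defined in agent.baml must have a branch here;
    # a missing branch means the tool call is silently dropped.
    if next_step.intent == "add":
        result = next_step.a + next_step.b
    elif next_step.intent == "subtract":
        result = next_step.a - next_step.b
    elif next_step.intent == "multiply":
        result = next_step.a * next_step.b
    elif next_step.intent == "divide":
        result = next_step.a / next_step.b
    else:
        raise ValueError(f"unhandled intent: {next_step.intent}")
    # Record the result so the next LLM turn sees it in the thread history.
    thread.events.append({"type": "tool_response", "data": result})
    return thread
```
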
## Testing Commands
- Generate notebook: `uv run python hack/walkthroughgen_py.py hack/walkthrough_python.yaml -o hack/test.ipynb`
- Test locally: `uv run python hack/test_notebook.py hack/test.ipynb`
- Full Colab sim: `cd hack && ./test_notebook_colab_sim.sh`
- Run BAML tests: `baml-cli test` (run from a directory containing baml_src)
## File Structure
- `walkthrough/*.py` - Python implementations of each chapter's code
- `walkthrough/*.baml` - BAML files fetched from GitHub during notebook execution
- `hack/walkthroughgen_py.py` - main conversion tool
- `hack/walkthrough_python.yaml` - notebook definition with all chapters
- `hack/test_notebook.py` - local testing script (skips pip/baml-cli init)
- `hack/test_notebook_colab_sim.sh` - full Colab environment simulation
- `hack/workshop_final.ipynb` - final generated notebook, ready for the workshop

View File

@@ -1,331 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "be04e2a1",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "bd951167",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "8c8bd795",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "65edc632",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "349851d1",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "81c54be5",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0ce40eda",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "0a5adaf8",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5068756a",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "f3f75943",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "3a41e811",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "f390af8c",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "3d3cf8ff",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6787b0e",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a87bf65",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d59b175f",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "15a0e941",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70570b76",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "ed9ef001",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87d1ffc3",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/01-agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/01-agent.baml"
]
},
{
"cell_type": "markdown",
"id": "cf84ac22",
"metadata": {},
"source": [
"Now let's create our main function that simulates command-line arguments:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "430d840b",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"import sys\n",
"\n",
"def main():\n",
" # Set default args if none provided\n",
" if len(sys.argv) < 2:\n",
" sys.argv = [\"notebook\", \"hello from the notebook!\"]\n",
" \n",
" # Get command line arguments, skipping the first one (script name)\n",
" args = sys.argv[1:]\n",
" \n",
" if len(args) == 0:\n",
" print(\"Error: Please provide a message as a command line argument\", file=sys.stderr)\n",
" return\n",
" \n",
" # Join all arguments into a single message\n",
" message = \" \".join(args)\n",
" \n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "938ca8b7",
"metadata": {},
"source": [
"Let's test our agent! You can modify the sys.argv line in the cell above to send different messages.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ea2980e",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a10500c",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,318 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "88f860b5",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "7ee779c8",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "ee1814f3",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "bf8c2eed",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "bd3f0e2d",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "6596bb81",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4f0678b",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "ca56296d",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b94b2080",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "2e89c0c8",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "ced8aec5",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "eb350b00",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "cc8413c7",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3daef39",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f28859f",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bacf7469",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "10d6c7b0",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acbfe988",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "3e0876e7",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc8da38e",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/01-agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/01-agent.baml"
]
},
{
"cell_type": "markdown",
"id": "0740f7aa",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93fc3916",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "f0d4bf23",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a6685c6",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f28951c",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,477 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b102542f",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "102afe9c",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "22eeba5a",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "f1cdd18a",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "dddbde46",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "eec93380",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "915a5235",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "cc7fa89a",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "737420ce",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "0e2b54ff",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "65b83189",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "e1ad9ff1",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "5686c992",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "325b7b32",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0517855",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a104bd2",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "c2e54d6f",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "44538d6c",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "36a153a9",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d4969f9",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "3922544b",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f9cdec5",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "66cecd2f",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8cf2af4",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c75216a",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "f3c2722b",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "4a192efe",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "4f39630b",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "683816a3",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "36417465",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a7db686",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "9dc26e7d",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abc2341f",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "78a6f953",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fdd6c71",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06373364",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,92 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "889aab5a",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "154f7699",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "18d91ea3",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "0f435832",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "a1a09ab1",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "6311a11e",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41d5d158",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "a925428d",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e38f57f",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,122 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "fedb829e",
"metadata": {},
"source": [
"# Test Walkthrough"
]
},
{
"cell_type": "markdown",
"id": "3c45031e",
"metadata": {},
"source": [
"This is a test walkthrough to verify the script works."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6ce41a36",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec52f527",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50969900",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "296fdf4b",
"metadata": {},
"source": [
"## Test Section"
]
},
{
"cell_type": "markdown",
"id": "54ee6e14",
"metadata": {},
"source": [
"This is a test section."
]
},
{
"cell_type": "markdown",
"id": "87503005",
"metadata": {},
"source": [
"This is a test markdown cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddb13aa9",
"metadata": {},
"outputs": [],
"source": [
"!echo 'Hello from command!'"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -152,4 +152,184 @@ sections:
- **Tool Execution**: Processing different tool types and returning results
- **Agent Loop**: Continuing until the agent has a final answer
From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.
- name: baml-tests
title: "Chapter 4 - Add Tests to agent.baml"
text: "Let's add some tests to our BAML agent."
steps:
- text: |
In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.
## Why Test BAML Functions?
- **Catch regressions**: Ensure changes don't break existing behavior
- **Document behavior**: Tests serve as living documentation
- **Validate edge cases**: Test complex scenarios and conversation flows
- **CI/CD integration**: Run tests automatically in your pipeline
Let's start with a simple test that checks the agent's ability to handle basic interactions:
- fetch_file: {src: ./walkthrough/04-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests to see them in action:
- command: "!baml-cli test"
- text: |
Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.
## BAML Assertion Syntax
Assertions use the `@@assert` directive:
```
@@assert(name, {{condition}})
```
- `name`: A descriptive name for the assertion
- `condition`: A boolean expression using `this` to access the output
- fetch_file: {src: ./walkthrough/04b-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests again to see assertions in action:
- command: "!baml-cli test"
- text: |
Finally, let's add more complex test cases that test multi-step conversations.
These tests simulate an entire conversation flow, including:
- User input
- Tool calls made by the agent
- Tool responses
- Final agent response
- fetch_file: {src: ./walkthrough/04c-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the comprehensive test suite:
- command: "!baml-cli test"
- text: |
## Key Testing Concepts
1. **Test Structure**: Each test specifies functions, arguments, and assertions
2. **Progressive Testing**: Start simple, then test complex scenarios
3. **Conversation History**: Test how the agent handles multi-turn conversations
4. **Tool Integration**: Verify the agent correctly uses tools in sequence
With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!
- name: human-tools
title: "Chapter 5 - Multiple Human Tools"
text: |
In this section, we'll add support for multiple tools that serve to contact humans.
steps:
- text: |
So far, our agent only returns a final answer with "done_for_now". But what if the agent needs clarification?
Let's add a new tool that allows the agent to request more information from the user.
## Why Human-in-the-Loop?
- **Handle ambiguous inputs**: When user input is unclear or contains typos
- **Request missing information**: When the agent needs more context
- **Confirm sensitive operations**: Before performing important actions
- **Interactive workflows**: Build conversational agents that engage users
First, let's update our BAML file to include a ClarificationRequest tool:
- fetch_file: {src: ./walkthrough/05-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's update our agent to handle clarification requests:
- file: {src: ./walkthrough/05-agent.py}
- text: |
Finally, let's create a main function that handles human interaction:
- file: {src: ./walkthrough/05-main.py}
- text: |
Let's test with an ambiguous input that should trigger a clarification request:
- run_main: {regenerate_baml: true, args: "can you multiply 3 and FD*(#F&&"}
- text: |
You should see:
1. The agent recognizes the input is unclear
2. It asks for clarification
3. In Colab, you'll be prompted to type a response
4. In local testing, an auto-response is provided
5. The agent continues with the clarified input
## Interactive Testing in Colab
When running in Google Colab, the `input()` function will create an interactive text box where you can type your response. Try different clarifications to see how the agent adapts!
## Key Concepts
- **Human Tools**: Special tool types that return control to the human
- **Conversation Flow**: The agent can pause execution to get human input
- **Context Preservation**: The full conversation history is maintained
- **Flexible Handling**: Different behaviors for different environments
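A minimal sketch of that flexible handling, assuming the auto-response approach described above (the real walkthrough/05-main.py may differ in its details):
```python
def get_human_input(prompt: str) -> str:
    if IN_COLAB:
        # Interactive text box in Colab
        return input(prompt)
    # Canned response so local/CI test runs don't block on stdin
    print(prompt)
    return "I meant: multiply 3 and 4"
```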
- name: customize-prompt
title: "Chapter 6 - Customize Your Prompt with Reasoning"
text: |
In this section, we'll explore how to customize the prompt of the agent with reasoning steps.
This is core to [factor 2 - own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
steps:
- text: |
## Why Add Reasoning to Prompts?
Adding explicit reasoning steps to your prompts can significantly improve agent performance:
- **Better decisions**: The model thinks through problems step-by-step
- **Transparency**: You can see the model's thought process
- **Fewer errors**: Structured thinking reduces mistakes
- **Debugging**: Easier to identify where reasoning went wrong
Let's update our agent prompt to include a reasoning step:
- fetch_file: {src: ./walkthrough/06-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's test it with a simple calculation to see the reasoning in action:
- run_main: {regenerate_baml: true, args: "can you multiply 3 and 4"}
- text: |
You should notice in the BAML logs (if enabled) that the model now includes reasoning steps before deciding what to do.
## Advanced Prompt Engineering
You can enhance your prompts further by:
- Adding specific reasoning templates for different tasks
- Including examples of good reasoning
- Structuring the reasoning with numbered steps
- Adding checks for common mistakes
The key is to guide the model's thinking process while still allowing flexibility.
- name: context-window
title: "Chapter 7 - Customize Your Context Window"
text: |
In this section, we'll explore how to customize the context window of the agent.
This is core to [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
steps:
- text: |
## Context Window Serialization
How you format your conversation history can significantly impact:
- **Token usage**: Some formats are more efficient
- **Model understanding**: Clear structure helps the model
- **Debugging**: Readable formats help development
Let's implement two serialization formats: pretty-printed JSON and XML.
- file: {src: ./walkthrough/07-agent.py}
- text: |
Now let's create a main function that can switch between formats:
- file: {src: ./walkthrough/07-main.py}
- text: |
Let's test with JSON format first:
- run_main: {regenerate_baml: true, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: false}}
- text: |
Now let's try the same with XML format:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: true}}
- text: |
## XML vs JSON Trade-offs
**XML Benefits**:
- More token-efficient for nested data
- Clear hierarchy with opening/closing tags
- Better for long conversations
**JSON Benefits**:
- Familiar to most developers
- Easy to parse and debug
- Native to JavaScript/Python
Choose based on your specific needs and token constraints!
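To make the comparison concrete, here's a minimal sketch of the two serializers, assuming the `use_xml` flag passed via the run_main kwargs above (the real walkthrough/07-agent.py may differ):
```python
import json

def serialize_for_llm(events, use_xml: bool = False):
    if not use_xml:
        return json.dumps(events, indent=2)  # pretty-printed JSON
    # Flat XML-style tags: no quotes, braces, or repeated key names,
    # which is what makes this format cheaper on long threads
    return "\n".join(
        f"<{e['type']}>{e['data']}</{e['type']}>" for e in events
    )
```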

View File

@@ -141,11 +141,25 @@ def process_step(nb, step, base_path, current_functions):
if regenerate:
nb.cells.append(new_code_cell("baml_generate()"))
# Build the main() call
call_parts = []
# Check if args are provided
args = step['run_main'].get('args', '')
if args:
call_parts.append(f'"{args}"')
# Check if kwargs are provided
kwargs = step['run_main'].get('kwargs', {})
for key, value in kwargs.items():
if isinstance(value, str):
call_parts.append(f'{key}="{value}"')
else:
call_parts.append(f'{key}={value}')
# Generate the function call
if call_parts:
nb.cells.append(new_code_cell(f'main({", ".join(call_parts)})'))
else:
nb.cells.append(new_code_cell("main()"))

View File

@@ -1,412 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "37c4763c",
"metadata": {
"id": "37c4763c"
},
"source": [
"# Workshop Notebook - July 16, 2025\n",
"\n",
"Welcome to today's workshop! This notebook contains some basic examples to get started.\n",
"\n",
"## Overview\n",
"- Basic Python operations\n",
"- Simple calculations\n",
"- String manipulation"
]
},
{
"cell_type": "code",
"source": [
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
"\n",
" from google.colab import userdata\n",
" import os\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
"\n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n",
"\n"
],
"metadata": {
"id": "Wu1JAPuEv4AH"
},
"id": "Wu1JAPuEv4AH",
"execution_count": 8,
"outputs": []
},
{
"cell_type": "code",
"execution_count": 1,
"id": "57d39979",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "57d39979",
"outputId": "5d8ed700-6f7f-42b0-bc09-49b84f1fca9f"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: baml-py in /usr/local/lib/python3.11/dist-packages (0.201.0)\n"
]
}
],
"source": [
"!pip install baml-py\n"
]
},
{
"cell_type": "code",
"source": [
"!baml-cli init\n"
],
"metadata": {
"id": "RKpwjadRtQ4E"
},
"id": "RKpwjadRtQ4E",
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"source": [
"!baml-cli generate"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uKU4rfG0tVuQ",
"outputId": "478ce0f8-41b5-4f69-e88a-491a769a3ce2"
},
"id": "uKU4rfG0tVuQ",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2025-07-16T01:14:07.315 [BAML \u001b[92mINFO\u001b[0m] Wrote 13 files to baml_client\n",
"2025-07-16T01:14:07.315 [BAML \u001b[92mINFO\u001b[0m] Generated 1 baml_client: ../baml_client\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!baml-cli test"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "rsLwuS_cuUyO",
"outputId": "de29b395-b0a8-4e37-bd2e-76ab34f522eb"
},
"id": "rsLwuS_cuUyO",
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Summary: 0 failures, 0 passes, 0 running, 0 pending, 0 done \r\u001b[2KSummary: 0/2 tests \r\u001b[2KSummary: 0/2 tests \r\u001b[2KSummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld \u001b[1A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1ASummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld \u001b[1A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1ASummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld \u001b[1A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1ASummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume \u001b[2A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[2ASummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume \u001b[2A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[2ASummary: 0/2 tests \n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume 2025-07-16T01:14:25.520 [BAML \u001b[33mWARN\u001b[0m] \u001b[35mFunction DetermineNextStep\u001b[0m:\n",
" \u001b[33mClient: Qwen3 (<unknown>) - 0ms\u001b[0m\n",
" \u001b[34m---PROMPT---\u001b[0m\n",
" \u001b[2m\u001b[43msystem: \u001b[0m\u001b[2m/nothink \n",
" \n",
" You are a helpful assistant that can help with tasks.\n",
" \u001b[43muser: \u001b[0m\u001b[2mYou are working on the following thread:\n",
" \n",
" {\n",
" \"type\": \"user_input\",\n",
" \"data\": \"hello!\"\n",
" }\n",
" \n",
" What should the next step be?\n",
" \n",
" Answer in JSON using this schema:\n",
" {\n",
" intent: \"done_for_now\",\n",
" message: string,\n",
" }\n",
" \u001b[0m\n",
" \u001b[34m---REQUEST OPTIONS---\u001b[0m\n",
" \u001b[31m---ERROR (Unspecified error code: 2)---\u001b[0m\n",
" \u001b[31mFailed to build request: reqwest::Error {\n",
" kind: Builder,\n",
" source: RelativeUrlWithoutBase,\n",
" }\u001b[0m\n",
"\u001b[2A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[2ASummary: 1/2 tests - 1 🛑\n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume \u001b[2A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[2ASummary: 1/2 tests - 1 🛑\n",
"\u001b[32m⠁\u001b[0m Running DetermineNextStep::HelloWorld\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume \u001b[2A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[2ASummary: 1/2 tests - 1 🛑\n",
"\u001b[32m⠁\u001b[0m Running ExtractResume::vaibhav_resume \u001b[1A\r\u001b[2K\u001b[1B\r\u001b[2K\u001b[1A0.04s \u001b[91mERROR \u001b[0m DetermineNextStep::HelloWorld\n",
" \u001b[2;31mUnspecified error code: 2 Failed to build request: reqwest::Error {\u001b[0m\n",
" \u001b[2;31m kind: Builder,\u001b[0m\n",
" \u001b[2;31m source: RelativeUrlWithoutBase,\u001b[0m\n",
" \u001b[2;31m}\u001b[0m\n",
" \u001b[2;31m\u001b[0m\n",
" \u001b[2;31mRequest options: {}\u001b[0m\n",
"Summary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[2K\u001b[1ASummary: 1/2 tests - 1 🛑\n",
"\u001b[32m⠤\u001b[0m Running ExtractResume::vaibhav_resume 2025-07-16T01:14:26.555 [BAML \u001b[92mINFO\u001b[0m] \u001b[35mFunction ExtractResume\u001b[0m:\n",
" \u001b[33mClient: openai/gpt-4o (gpt-4o-2024-08-06) - 1038ms. StopReason: stop. Tokens(in/out): 81/68\u001b[0m\n",
" \u001b[34m---PROMPT---\u001b[0m\n",
" \u001b[2m\u001b[43msystem: \u001b[0m\u001b[2mExtract from this content:\n",
" Vaibhav Gupta\n",
" vbv@boundaryml.com\n",
" \n",
" Experience:\n",
" - Founder at BoundaryML\n",
" - CV Engineer at Google\n",
" - CV Engineer at Microsoft\n",
" \n",
" Skills:\n",
" - Rust\n",
" - C++\n",
" \n",
" Answer in JSON using this schema:\n",
" {\n",
" name: string,\n",
" email: string,\n",
" experience: string[],\n",
" skills: string[],\n",
" }\n",
" \u001b[0m\n",
" \u001b[34m---LLM REPLY---\u001b[0m\n",
" \u001b[2m{\n",
" \"name\": \"Vaibhav Gupta\",\n",
" \"email\": \"vbv@boundaryml.com\",\n",
" \"experience\": [\n",
" \"Founder at BoundaryML\",\n",
" \"CV Engineer at Google\",\n",
" \"CV Engineer at Microsoft\"\n",
" ],\n",
" \"skills\": [\n",
" \"Rust\",\n",
" \"C++\"\n",
" ]\n",
" }\u001b[0m\n",
" \u001b[34m---Parsed Response (class Resume)---\u001b[0m\n",
" {\n",
" \"name\": \"Vaibhav Gupta\",\n",
" \"email\": \"vbv@boundaryml.com\",\n",
" \"experience\": [\n",
" \"Founder at BoundaryML\",\n",
" \"CV Engineer at Google\",\n",
" \"CV Engineer at Microsoft\"\n",
" ],\n",
" \"skills\": [\n",
" \"Rust\",\n",
" \"C++\"\n",
" ]\n",
" }\n",
"\u001b[2K\u001b[1ASummary: 2/2 tests - 1 ✅, 1 🛑\n",
"\u001b[2K\n",
"INFO: Test results:\n",
"---------------------------------------------------------\n",
"\u001b[1;34mfunction\u001b[0m \u001b[1;34mDetermineNextStep\u001b[0m\n",
"1 tests (1 🛑)\n",
" 0.04s \u001b[91mERROR \u001b[0m DetermineNextStep::HelloWorld\n",
" \u001b[2m ./baml_src/agent.baml:40\u001b[0m\n",
" \u001b[2;31mUnspecified error code: 2 Failed to build request: reqwest::Error {\u001b[0m\n",
" \u001b[2;31m kind: Builder,\u001b[0m\n",
" \u001b[2;31m source: RelativeUrlWithoutBase,\u001b[0m\n",
" \u001b[2;31m}\u001b[0m\n",
" \u001b[2;31m\u001b[0m\n",
" \u001b[2;31mRequest options: {}\u001b[0m\n",
"\u001b[1;34mfunction\u001b[0m \u001b[1;34mExtractResume\u001b[0m\n",
"1 tests (1 ✅)\n",
" 1.08s \u001b[32mPASSED \u001b[0m ExtractResume::vaibhav_resume\n",
" \u001b[2m ./baml_src/resume.baml:25\u001b[0m\n",
"---------------------------------------------------------\n",
"INFO: Test run completed, 2 tests (1 ✅, 1 🛑)\n",
"\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!rm baml_src/resume.baml"
],
"metadata": {
"id": "eJgErOv8zCR1"
},
"id": "eJgErOv8zCR1",
"execution_count": 7,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"now lets download the agent"
],
"metadata": {
"id": "MVqkHHOFzFNz"
},
"id": "MVqkHHOFzFNz"
},
{
"cell_type": "code",
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/walkthrough/01-agent.baml && cat baml_src/agent.baml"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wV0WipZytZNt",
"outputId": "1adecefd-2c8f-441f-da9a-84e975eebef6"
},
"id": "wV0WipZytZNt",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"class DoneForNow {\n",
" intent \"done_for_now\"\n",
" message string \n",
"}\n",
"\n",
"function DetermineNextStep(\n",
" thread: string \n",
") -> DoneForNow {\n",
" client \"openai/gpt-4o\"\n",
"\n",
" // use /nothink for now because the thinking tokens (or streaming thereof) screw with baml (i think (no pun intended))\n",
" prompt #\"\n",
" {{ _.role(\"system\") }}\n",
"\n",
" You are a helpful assistant that can help with tasks.\n",
"\n",
" {{ _.role(\"user\") }}\n",
"\n",
" You are working on the following thread:\n",
"\n",
" {{ thread }}\n",
"\n",
" What should the next step be?\n",
"\n",
" {{ ctx.output_format }}\n",
" \"#\n",
"}\n",
"\n",
"test HelloWorld {\n",
" functions [DetermineNextStep]\n",
" args {\n",
" thread #\"\n",
" {\n",
" \"type\": \"user_input\",\n",
" \"data\": \"hello!\"\n",
" }\n",
" \"#\n",
" }\n",
"}"
]
}
]
},
{
"cell_type": "code",
"source": [
"!baml-cli generate"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JAfVRgy3wK4v",
"outputId": "d87fc300-cc8f-43ad-fa95-ab194db365a0"
},
"id": "JAfVRgy3wK4v",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2025-07-16T01:22:21.660 [BAML \u001b[92mINFO\u001b[0m] Wrote 13 files to baml_client\n",
"2025-07-16T01:22:21.660 [BAML \u001b[92mINFO\u001b[0m] Generated 1 baml_client: ../baml_client\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"b = get_baml_client()\n",
"\n",
"step = b.DetermineNextStep(\"hi\")\n",
"print(step)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gyIrbt-ZuXrK",
"outputId": "28a83364-248f-4c36-afb3-faaf271aa485"
},
"id": "gyIrbt-ZuXrK",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"intent='done_for_now' message='Hello! Please let me know if there is anything specific you need assistance with.'\n"
]
}
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"language_info": {
"name": "python"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,935 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "173dc42f",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "add0a779",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "6a8df8d6",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "15b19657",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "251134f3",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "bf3fd22d",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da14ddcf",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "9cf83cf4",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "080218d1",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "e7dc6b44",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "87a82a6f",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "fd5af290",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "b1dd0665",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6df0dc4a",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01121d4c",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b\n"
]
},
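{
"cell_type": "markdown",
"id": "f2a41c7e",
"metadata": {},
"source": [
"Once this is in place, every call site grabs a fresh client - for example `b = get_baml_client()` followed by `step = b.DetermineNextStep(\"hi\")` - so Colab never serves a stale cached `baml_client` module."
]
},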
{
"cell_type": "code",
"execution_count": null,
"id": "e1c79b87",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "e4bd63c3",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e0617d2",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
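{
"cell_type": "markdown",
"id": "3b9d02aa",
"metadata": {},
"source": [
"As the comment in `serialize_for_llm` notes, you can swap in any serialization format. As a rough illustration (not part of the walkthrough files - the tag layout here is a hypothetical choice), an XML-style serializer might look like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5e71f40",
"metadata": {},
"outputs": [],
"source": [
"# Rough sketch (hypothetical, not from the walkthrough files): serialize\n",
"# thread events as XML-style blocks, which tends to be more token-efficient\n",
"# than pretty-printed JSON.\n",
"def serialize_as_xml(events):\n",
"    lines = []\n",
"    for event in events:\n",
"        lines.append(f\"<{event['type']}>\")\n",
"        lines.append(str(event['data']))\n",
"        lines.append(f\"</{event['type']}>\")\n",
"    return \"\\n\".join(lines)\n",
"\n",
"print(serialize_as_xml([{\"type\": \"user_input\", \"data\": \"hi\"}]))"
]
},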
{
"cell_type": "markdown",
"id": "6aa5e4fd",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "441ee4dc",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "a6d985dc",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0c715dc1",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "407bcd47",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "451d4f8f",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a99ef71",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "e46ec89d",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "7861d1a8",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "16f65463",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
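{
"cell_type": "markdown",
"id": "7a68d93b",
"metadata": {},
"source": [
"Each tool is just a BAML class with a fixed `intent` plus its parameters. For example, the add tool in the file we're about to fetch has this shape:\n",
"\n",
"```\n",
"class AddTool {\n",
"  intent \"add\"\n",
"  a int | float\n",
"  b int | float\n",
"}\n",
"```"
]
},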
{
"cell_type": "code",
"execution_count": null,
"id": "9dc2301b",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "a0289131",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf1893ce",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "a062bc68",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4368aa4",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "251c9ec9",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "255fcb36",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2b8da6f",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "35a95a6f",
"metadata": {},
"source": [
"## Chapter 3 - Process Tool Calls in a Loop"
]
},
{
"cell_type": "markdown",
"id": "7950fb6c",
"metadata": {},
"source": [
"Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
]
},
{
"cell_type": "markdown",
"id": "353a9a2c",
"metadata": {},
"source": [
"In this chapter, we'll enhance our agent to process tool calls in a loop. This means:\n",
"- The agent can call multiple tools in sequence\n",
"- Each tool result is fed back to the agent\n",
"- The agent continues until it has a final answer\n",
"\n",
"Let's update our agent to handle tool calls properly:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3d7643e",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent == \"add\":\n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" continue\n",
" else:\n",
" raise ValueError(f\"Unknown intent: {next_step.intent}\")"
]
},
{
"cell_type": "markdown",
"id": "a88ac604",
"metadata": {},
"source": [
"Now let's update our main function to use the new agent loop:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a6ca94b",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with full tool handling\n",
" result = agent_loop(thread)\n",
" \n",
" # Print the final response\n",
" print(f\"\\nFinal response: {result}\")"
]
},
{
"cell_type": "markdown",
"id": "296ad48e",
"metadata": {},
"source": [
"Let's try it out! The agent should now call the tool and return the calculated result:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1491750",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db0ead36",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "a98ecceb",
"metadata": {},
"source": [
"You should see the agent:\n",
"1. Recognize it needs to use the add tool\n",
"2. Call the tool with the correct parameters\n",
"3. Get the result (7)\n",
"4. Generate a final response incorporating the result\n",
"\n",
"For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1c84079",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03b-agent.py\n",
"import json\n",
"from typing import Dict, Any, List, Union\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"def handle_next_step(next_step, thread: Thread) -> Thread:\n",
" result: float\n",
" \n",
" if next_step.intent == \"add\":\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"subtract\":\n",
" result = next_step.a - next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"multiply\":\n",
" result = next_step.a * next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"divide\":\n",
" result = next_step.a / next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent in [\"add\", \"subtract\", \"multiply\", \"divide\"]:\n",
" thread = handle_next_step(next_step, thread)"
]
},
{
"cell_type": "markdown",
"id": "97d6432d",
"metadata": {},
"source": [
"Now let's test subtraction:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bd66f9f",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you subtract 3 from 4\")"
]
},
{
"cell_type": "markdown",
"id": "bf2fe3b5",
"metadata": {},
"source": [
"Test multiplication:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6dc9442",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "cf4b333c",
"metadata": {},
"source": [
"Finally, let's test a complex multi-step calculation:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "669e7673",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result\")"
]
},
{
"cell_type": "markdown",
"id": "7d942e63",
"metadata": {},
"source": [
"Congratulations! You've taken your first step into hand-rolling an agent loop.\n",
"\n",
"Key concepts you've learned:\n",
"- **Thread Management**: Tracking conversation history and tool calls\n",
"- **Tool Execution**: Processing different tool types and returning results\n",
"- **Agent Loop**: Continuing until the agent has a final answer\n",
"\n",
"From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.\n"
]
},
{
"cell_type": "markdown",
"id": "c97a02d7",
"metadata": {},
"source": [
"## Chapter 4 - Add Tests to agent.baml"
]
},
{
"cell_type": "markdown",
"id": "8a02c2e8",
"metadata": {},
"source": [
"Let's add some tests to our BAML agent."
]
},
{
"cell_type": "markdown",
"id": "d7e31cd0",
"metadata": {},
"source": [
"In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.\n",
"\n",
"## Why Test BAML Functions?\n",
"\n",
"- **Catch regressions**: Ensure changes don't break existing behavior\n",
"- **Document behavior**: Tests serve as living documentation\n",
"- **Validate edge cases**: Test complex scenarios and conversation flows\n",
"- **CI/CD integration**: Run tests automatically in your pipeline\n",
"\n",
"Let's start with a simple test that checks the agent's ability to handle basic interactions:\n"
]
},
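{
"cell_type": "markdown",
"id": "0d4f21be",
"metadata": {},
"source": [
"A BAML test names the functions it exercises and the arguments to pass. The simplest test in the file we're about to fetch looks like this:\n",
"\n",
"```\n",
"test SimpleMath {\n",
"  functions [DetermineNextStep]\n",
"  args {\n",
"    thread #\"\n",
"      [{\"type\": \"user_input\", \"data\": \"can you multiply 3 and 4\"}]\n",
"    \"#\n",
"  }\n",
"}\n",
"```"
]
},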
{
"cell_type": "code",
"execution_count": null,
"id": "234b026c",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "3247eb5a",
"metadata": {},
"source": [
"Run the tests to see them in action:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0c2f3d1",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "90aedbc1",
"metadata": {},
"source": [
"Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.\n",
"\n",
"## BAML Assertion Syntax\n",
"\n",
"Assertions use the `@@assert` directive:\n",
"```\n",
"@@assert(name, {{condition}})\n",
"```\n",
"\n",
"- `name`: A descriptive name for the assertion\n",
"- `condition`: A boolean expression using `this` to access the output\n"
]
},
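{
"cell_type": "markdown",
"id": "6e19c4d2",
"metadata": {},
"source": [
"For example, the updated math test asserts both the chosen tool and its operands:\n",
"\n",
"```\n",
"@@assert(intent_check, {{this.intent == \"multiply\"}})\n",
"@@assert(a_check, {{this.a == 3}})\n",
"@@assert(b_check, {{this.b == 4}})\n",
"```"
]
},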
{
"cell_type": "code",
"execution_count": null,
"id": "f1342588",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04b-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "1b377a6c",
"metadata": {},
"source": [
"Run the tests again to see assertions in action:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edbcb564",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "11c2e493",
"metadata": {},
"source": [
"Finally, let's add more complex test cases that test multi-step conversations.\n",
"\n",
"These tests simulate an entire conversation flow, including:\n",
"- User input\n",
"- Tool calls made by the agent\n",
"- Tool responses\n",
"- Final agent response\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9c86e5e",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04c-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "836f106b",
"metadata": {},
"source": [
"Run the comprehensive test suite:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b9d12cb",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "5ec0b03f",
"metadata": {},
"source": [
"## Key Testing Concepts\n",
"\n",
"1. **Test Structure**: Each test specifies functions, arguments, and assertions\n",
"2. **Progressive Testing**: Start simple, then test complex scenarios\n",
"3. **Conversation History**: Test how the agent handles multi-turn conversations\n",
"4. **Tool Integration**: Verify the agent correctly uses tools in sequence\n",
"\n",
"With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -3,9 +3,33 @@ class DoneForNow {
message string
}
class AddTool {
intent "add"
a int | float
b int | float
}
class SubtractTool {
intent "subtract"
a int | float
b int | float
}
class MultiplyTool {
intent "multiply"
a int | float
b int | float
}
class DivideTool {
intent "divide"
a int | float
b int | float
}
function DetermineNextStep(
thread: string
) -> CalculatorTools | DoneForNow {
) -> DoneForNow | AddTool | SubtractTool | MultiplyTool | DivideTool {
client "openai/gpt-4o"
prompt #"
@@ -37,15 +61,11 @@ test HelloWorld {
}
}
test MathOperation {
test SimpleMath {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "can you multiply 3 and 4?"
}
[{"type": "user_input", "data": "can you multiply 3 and 4"}]
"#
}
}
}

View File

@@ -3,10 +3,34 @@ class DoneForNow {
message string
}
class AddTool {
intent "add"
a int | float
b int | float
}
class SubtractTool {
intent "subtract"
a int | float
b int | float
}
class MultiplyTool {
intent "multiply"
a int | float
b int | float
}
class DivideTool {
intent "divide"
a int | float
b int | float
}
function DetermineNextStep(
thread: string
) -> CalculatorTools | DoneForNow {
client "openai/gpt-4o"
) -> DoneForNow | AddTool | SubtractTool | MultiplyTool | DivideTool {
client "openai/gpt-4o"
prompt #"
{{ _.role("system") }}
@@ -35,19 +59,17 @@ test HelloWorld {
}
"#
}
@@assert(hello, {{this.intent == "done_for_now"}})
@@assert(intent_check, {{this.intent == "done_for_now"}})
}
test MathOperation {
test SimpleMath {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "can you multiply 3 and 4?"
}
[{"type": "user_input", "data": "can you multiply 3 and 4"}]
"#
}
@@assert(math_operation, {{this.intent == "multiply"}})
}
@@assert(intent_check, {{this.intent == "multiply"}})
@@assert(a_check, {{this.a == 3}})
@@assert(b_check, {{this.b == 4}})
}

View File

@@ -3,9 +3,33 @@ class DoneForNow {
message string
}
class AddTool {
intent "add"
a int | float
b int | float
}
class SubtractTool {
intent "subtract"
a int | float
b int | float
}
class MultiplyTool {
intent "multiply"
a int | float
b int | float
}
class DivideTool {
intent "divide"
a int | float
b int | float
}
function DetermineNextStep(
thread: string
) -> CalculatorTools | DoneForNow {
) -> DoneForNow | AddTool | SubtractTool | MultiplyTool | DivideTool {
client "openai/gpt-4o"
prompt #"
@@ -35,20 +59,19 @@ test HelloWorld {
}
"#
}
@@assert(intent, {{this.intent == "done_for_now"}})
@@assert(intent_check, {{this.intent == "done_for_now"}})
}
test MathOperation {
test SimpleMath {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "can you multiply 3 and 4?"
}
[{"type": "user_input", "data": "can you multiply 3 and 4"}]
"#
}
@@assert(intent, {{this.intent == "multiply"}})
@@assert(intent_check, {{this.intent == "multiply"}})
@@assert(a_check, {{this.a == 3}})
@@assert(b_check, {{this.b == 4}})
}
test LongMath {
@@ -56,50 +79,34 @@ test LongMath {
args {
thread #"
[
{
"type": "user_input",
"data": "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result?"
},
{
"type": "tool_call",
"data": {
"intent": "multiply",
"a": 3,
"b": 4
}
},
{
"type": "tool_response",
"data": 12
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 12,
"b": 2
}
},
{
"type": "tool_response",
"data": 6
},
{
"type": "tool_call",
"data": {
"intent": "add",
"a": 6,
"b": 12
}
},
{
"type": "tool_response",
"data": 18
}
{"type": "user_input", "data": "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result"},
{"type": "tool_call", "data": {"intent": "multiply", "a": 3, "b": 4}},
{"type": "tool_response", "data": 12},
{"type": "tool_call", "data": {"intent": "divide", "a": 12, "b": 2}},
{"type": "tool_response", "data": 6}
]
"#
}
@@assert(intent, {{this.intent == "done_for_now"}})
@@assert(answer, {{"18" in this.message}})
@@assert(intent_check, {{this.intent == "add"}})
@@assert(a_check, {{this.a == 6}})
@@assert(b_check, {{this.b == 12}})
}
test CompleteConversation {
functions [DetermineNextStep]
args {
thread #"
[
{"type": "user_input", "data": "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result"},
{"type": "tool_call", "data": {"intent": "multiply", "a": 3, "b": 4}},
{"type": "tool_response", "data": 12},
{"type": "tool_call", "data": {"intent": "divide", "a": 12, "b": 2}},
{"type": "tool_response", "data": 6},
{"type": "tool_call", "data": {"intent": "add", "a": 6, "b": 12}},
{"type": "tool_response", "data": 18}
]
"#
}
@@assert(intent_check, {{this.intent == "done_for_now"}})
@@assert(answer_check, {{"18" in this.message}})
}

View File

@@ -1,22 +1,40 @@
// human tools are async requests to a human
type HumanTools = ClarificationRequest | DoneForNow
class ClarificationRequest {
intent "request_more_information" @description("you can request more information from me")
message string
}
class DoneForNow {
intent "done_for_now"
message string
message string @description(#"
message to send to the user about the work that was done.
"#)
}
class AddTool {
intent "add"
a int | float
b int | float
}
class SubtractTool {
intent "subtract"
a int | float
b int | float
}
class MultiplyTool {
intent "multiply"
a int | float
b int | float
}
class DivideTool {
intent "divide"
a int | float
b int | float
}
class ClarificationRequest {
intent "request_more_information"
message string @description("you can request more information from the user")
}
function DetermineNextStep(
thread: string
) -> HumanTools | CalculatorTools {
) -> DoneForNow | AddTool | SubtractTool | MultiplyTool | DivideTool | ClarificationRequest {
client "openai/gpt-4o"
prompt #"
@@ -34,84 +52,4 @@ function DetermineNextStep(
{{ ctx.output_format }}
"#
}
test HelloWorld {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "hello!"
}
"#
}
@@assert(intent, {{this.intent == "done_for_now"}})
}
test MathOperation {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "can you multiply 3 and 4?"
}
"#
}
@@assert(intent, {{this.intent == "multiply"}})
}
test LongMath {
functions [DetermineNextStep]
args {
thread #"
[
{
"type": "user_input",
"data": "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result?"
},
{
"type": "tool_call",
"data": {
"intent": "multiply",
"a": 3,
"b": 4
}
},
{
"type": "tool_response",
"data": 12
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 12,
"b": 2
}
},
{
"type": "tool_response",
"data": 6
},
{
"type": "tool_call",
"data": {
"intent": "add",
"a": 6,
"b": 12
}
},
{
"type": "tool_response",
"data": 18
}
]
"#
}
@@assert(intent, {{this.intent == "done_for_now"}})
@@assert(answer, {{"18" in this.message}})
}
}

View File

@@ -1,22 +1,40 @@
// human tools are async requests to a human
type HumanTools = ClarificationRequest | DoneForNow
class ClarificationRequest {
intent "request_more_information" @description("you can request more information from me")
message string
}
class DoneForNow {
intent "done_for_now"
message string
message string @description(#"
message to send to the user about the work that was done.
"#)
}
class AddTool {
intent "add"
a int | float
b int | float
}
class SubtractTool {
intent "subtract"
a int | float
b int | float
}
class MultiplyTool {
intent "multiply"
a int | float
b int | float
}
class DivideTool {
intent "divide"
a int | float
b int | float
}
class ClarificationRequest {
intent "request_more_information"
message string @description("you can request more information from the user")
}
function DetermineNextStep(
thread: string
) -> HumanTools | CalculatorTools {
) -> DoneForNow | AddTool | SubtractTool | MultiplyTool | DivideTool | ClarificationRequest {
client "openai/gpt-4o"
prompt #"
@@ -30,123 +48,18 @@ function DetermineNextStep(
{{ thread }}
Before deciding on the next step, think through the situation:
1. What has been asked?
2. What information do I have?
3. What tools are available to me?
4. What is the most logical next step?
<reasoning>
Think step by step about what needs to be done next.
</reasoning>
What should the next step be?
{{ ctx.output_format }}
First, always plan out what to do next, for example:
- ...
- ...
- ...
{...} // schema
"#
}
test HelloWorld {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "hello!"
}
"#
}
@@assert(intent, {{this.intent == "request_more_information"}})
}
test MathOperation {
functions [DetermineNextStep]
args {
thread #"
{
"type": "user_input",
"data": "can you multiply 3 and 4?"
}
"#
}
@@assert(intent, {{this.intent == "multiply"}})
}
test LongMath {
functions [DetermineNextStep]
args {
thread #"
[
{
"type": "user_input",
"data": "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result?"
},
{
"type": "tool_call",
"data": {
"intent": "multiply",
"a": 3,
"b": 4
}
},
{
"type": "tool_response",
"data": 12
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 12,
"b": 2
}
},
{
"type": "tool_response",
"data": 6
},
{
"type": "tool_call",
"data": {
"intent": "add",
"a": 6,
"b": 12
}
},
{
"type": "tool_response",
"data": 18
}
]
"#
}
@@assert(intent, {{this.intent == "done_for_now"}})
@@assert(answer, {{"18" in this.message}})
}
test MathOperationWithClarification {
functions [DetermineNextStep]
args {
thread #"
[{"type":"user_input","data":"can you multiply 3 and feee9ff10"}]
"#
}
@@assert(intent, {{this.intent == "request_more_information"}})
}
test MathOperationPostClarification {
functions [DetermineNextStep]
args {
thread #"
[
{"type":"user_input","data":"can you multiply 3 and FD*(#F&& ?"},
{"type":"tool_call","data":{"intent":"request_more_information","message":"It seems like there was a typo or mistake in your request. Could you please clarify or provide the correct numbers you would like to multiply?"}},
{"type":"human_response","data":"lets try 12 instead"},
]
"#
}
@@assert(intent, {{this.intent == "multiply"}})
@@assert(a, {{this.a == 3}})
@@assert(b, {{this.b == 12}})
}
}