dexhorthy
2025-07-16 18:16:40 -07:00
parent 4e8f3c3953
commit 2900ce9b50
34 changed files with 348 additions and 10577 deletions

View File

@@ -1,152 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "a1f8090b",
"metadata": {},
"source": [
"# Workshop Notebook - July 16, 2025\n",
"\n",
"Welcome to today's workshop! This notebook contains some basic examples to get started.\n",
"\n",
"## Overview\n",
"- Basic Python operations\n",
"- Simple calculations\n",
"- String manipulation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91427574",
"metadata": {},
"outputs": [],
"source": [
"# Classic Hello World\n",
"print(\"Hello, World!\")\n",
"print(\"Welcome to the workshop!\")"
]
},
{
"cell_type": "markdown",
"id": "65d3884e",
"metadata": {},
"source": [
"## Basic Mathematics\n",
"\n",
"Let's perform some simple calculations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75db09d8",
"metadata": {},
"outputs": [],
"source": [
"# Basic arithmetic operations\n",
"a = 42\n",
"b = 17\n",
"c = a + b\n",
"\n",
"print(f\"{a} + {b} = {c}\")\n",
"print(f\"{a} - {b} = {a - b}\")\n",
"print(f\"{a} * {b} = {a * b}\")\n",
"print(f\"{a} / {b} = {a / b:.2f}\")"
]
},
{
"cell_type": "markdown",
"id": "59fef134",
"metadata": {},
"source": [
"## Working with Lists\n",
"\n",
"Python lists are versatile data structures:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74c3aa8e",
"metadata": {},
"outputs": [],
"source": [
"# Working with lists\n",
"numbers = [1, 2, 3, 4, 5]\n",
"print(\"Original list:\", numbers)\n",
"\n",
"# Add more numbers\n",
"numbers.extend([6, 7, 8, 9, 10])\n",
"print(\"Extended list:\", numbers)\n",
"\n",
"# Calculate sum and average\n",
"total = sum(numbers)\n",
"average = total / len(numbers)\n",
"\n",
"print(f\"Sum: {total}\")\n",
"print(f\"Average: {average}\")\n",
"print(f\"Max: {max(numbers)}\")\n",
"print(f\"Min: {min(numbers)}\")"
]
},
{
"cell_type": "markdown",
"id": "fb1c8321",
"metadata": {},
"source": [
"## Creating Functions\n",
"\n",
"Let's define some simple functions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5dbca9d2",
"metadata": {},
"outputs": [],
"source": [
"# Define a simple function\n",
"def greet(name):\n",
" return f\"Hello, {name}! Welcome to the workshop.\"\n",
"\n",
"def calculate_area(length, width):\n",
" return length * width\n",
"\n",
"# Use the functions\n",
"print(greet(\"Python Developer\"))\n",
"print(f\"Area of a 5x3 rectangle: {calculate_area(5, 3)} square units\")"
]
},
{
"cell_type": "markdown",
"id": "85159b95",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"Feel free to add your own cells below and experiment with Python!\n",
"\n",
"Some ideas to try:\n",
"- Create a function that calculates fibonacci numbers\n",
"- Work with dictionaries\n",
"- Try list comprehensions\n",
"- Import and use external libraries"
]
},
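{
"cell_type": "code",
"execution_count": null,
"id": "f1b2c3d4",
"metadata": {},
"outputs": [],
"source": [
"# One possible starting point for the ideas above (an illustrative sketch):\n",
"def fibonacci(n):\n",
"    \"\"\"Return the first n Fibonacci numbers as a list.\"\"\"\n",
"    seq = []\n",
"    a, b = 0, 1\n",
"    for _ in range(n):\n",
"        seq.append(a)\n",
"        a, b = b, a + b\n",
"    return seq\n",
"\n",
"print(fibonacci(10))"
]
},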
{
"cell_type": "code",
"execution_count": null,
"id": "e303725f",
"metadata": {},
"outputs": [],
"source": [
"# Your code here\n"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,132 +0,0 @@
# BAML Logging in Jupyter Notebooks
## Overview
BAML uses the `BAML_LOG` environment variable to control logging output. However, in Jupyter notebooks, these logs are sent to stderr and aren't automatically captured in the cell output. This guide explains how to capture and display BAML logs in Jupyter notebooks.
## The Problem
When you set `os.environ["BAML_LOG"] = "info"` and run BAML functions in a Jupyter notebook, the logs are written to stderr but don't appear in the notebook cell output. This is because:
1. BAML logs to stderr at the system level
2. Jupyter notebooks don't automatically capture subprocess stderr
3. The logs bypass Python's standard logging module
## The Solution
The solution is to use IPython's `capture_output` context manager to capture both stdout and stderr when running BAML functions.
### Basic Usage
```python
from IPython.utils.capture import capture_output
import os
# Set BAML logging level
os.environ['BAML_LOG'] = 'info'
# Helper function to run code with BAML log capture
def run_with_baml_logs(func, *args, **kwargs):
"""Run a function and display BAML logs in the notebook."""
with capture_output() as captured:
result = func(*args, **kwargs)
# Display the result
if result is not None:
print("=== Result ===")
print(result)
# Display BAML logs from stderr
if captured.stderr:
print("\n=== BAML Logs ===")
print(captured.stderr)
return result
# Use it like this:
run_with_baml_logs(main, "can you multiply 3 and 4")
```
## BAML Log Levels
Set `BAML_LOG` to one of these levels:
- `error`: Fatal errors only
- `warn`: Function failures (default)
- `info`: All function calls, prompts, and responses
- `debug`: Includes detailed parsing errors
- `trace`: Most comprehensive logging
- `off`: No logging
## Enhanced Reasoning Visualization
For sections that use reasoning prompts, you can extract and highlight the reasoning steps:
```python
import re
from IPython.display import display, HTML
def run_and_show_reasoning(func, *args, **kwargs):
"""Run a function and highlight reasoning steps."""
with capture_output() as captured:
result = func(*args, **kwargs)
if captured.stderr:
# Extract reasoning blocks
reasoning_pattern = r'<reasoning>(.*?)</reasoning>'
reasoning_matches = re.findall(reasoning_pattern, captured.stderr, re.DOTALL)
if reasoning_matches:
display(HTML("<h3>🧠 Model Reasoning:</h3>"))
for reasoning in reasoning_matches:
display(HTML(f'''
<div style='background-color: #f0f8ff;
border-left: 4px solid #4169e1;
padding: 10px; margin: 10px 0;'>
{reasoning.strip().replace(chr(10), '<br>')}
</div>
'''))
return result
```
## Implementation in Notebook Generator
The updated `walkthroughgen_py.py` automatically includes:
1. A logging helper cell after BAML setup
2. Automatic wrapping of `main()` calls with `run_with_baml_logs()` (sketched below)
3. Enhanced reasoning visualization for the reasoning chapter
4. Proper handling of different log levels with icons
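For illustration, the wrapped call that the generator emits for item 2 looks like this - a minimal sketch reusing the `run_with_baml_logs` helper defined earlier:
```python
# A generated cell: instead of emitting a bare main("...") call, the
# generator wraps it so the BAML logs render in the cell output
run_with_baml_logs(main, "can you multiply 3 and 4")
```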
## What You'll See
With logging enabled, you'll see:
- **Prompt sent to the model**: The full prompt including system and user messages
- **Raw model response**: The complete response from the LLM
- **Parsed output**: How BAML parsed the response into structured data
- **Reasoning steps**: If using reasoning prompts, the model's thought process
- **Timing information**: How long each call took
- **Token usage**: Number of tokens used (if available)
## Troubleshooting
If logs aren't appearing:
1. Verify `BAML_LOG` is set: `print(os.environ.get('BAML_LOG'))`
2. Ensure you're using the capture wrapper functions
3. Check that BAML is properly initialized
4. Try setting `BAML_LOG='debug'` for more verbose output
## Environment Variables
- `BAML_LOG`: Controls logging level (info, debug, trace, etc.)
- `BOUNDARY_MAX_LOG_CHUNK_CHARS`: Truncate long log entries (e.g., 3000; see the example below)
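For example, you might set both before generating the client (the truncation value is just the example above):
```python
import os

os.environ["BAML_LOG"] = "info"                      # show prompts and responses
os.environ["BOUNDARY_MAX_LOG_CHUNK_CHARS"] = "3000"  # truncate very long log entries
```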
## Notes
- Logs are captured per cell execution
- Full logs can be quite verbose - start with 'info' level
- The reasoning visualization works best with prompts that include `<reasoning>` tags
- In Google Colab, the capture functions work the same way as in local Jupyter

View File

@@ -1,82 +0,0 @@
#!/usr/bin/env python3
"""Helper utilities for capturing BAML logs in Jupyter notebooks."""
import os
import sys
import logging
import contextlib
from io import StringIO
# Configure Python logging to display in Jupyter
def setup_jupyter_logging():
"""Configure logging to work properly in Jupyter notebooks."""
# Remove any existing handlers
root_logger = logging.getLogger()
for handler in root_logger.handlers[:]:
root_logger.removeHandler(handler)
# Create a new handler that outputs to stdout
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
# Set up the root logger
root_logger.addHandler(handler)
root_logger.setLevel(logging.INFO)
# Also set up BAML-specific logger
baml_logger = logging.getLogger('baml')
baml_logger.setLevel(logging.DEBUG)
return root_logger
@contextlib.contextmanager
def capture_baml_output():
"""Context manager to capture BAML output in Jupyter notebooks."""
# Capture stdout and stderr
old_stdout = sys.stdout
old_stderr = sys.stderr
stdout_capture = StringIO()
stderr_capture = StringIO()
try:
# Redirect stdout and stderr
sys.stdout = stdout_capture
sys.stderr = stderr_capture
yield stdout_capture, stderr_capture
finally:
# Restore original stdout/stderr
sys.stdout = old_stdout
sys.stderr = old_stderr
# Print captured output
stdout_content = stdout_capture.getvalue()
stderr_content = stderr_capture.getvalue()
if stdout_content:
print("=== BAML Output ===")
print(stdout_content)
if stderr_content:
print("=== BAML Logs ===")
print(stderr_content)
def run_with_baml_logging(func, *args, **kwargs):
"""Run a function and capture its BAML output."""
# Ensure BAML_LOG is set
if 'BAML_LOG' not in os.environ:
os.environ['BAML_LOG'] = 'info'
print(f"BAML_LOG is set to: {os.environ.get('BAML_LOG')}")
with capture_baml_output() as (stdout_cap, stderr_cap):
result = func(*args, **kwargs)
return result
# Example usage in notebook:
# from baml_logging_notebook import run_with_baml_logging, setup_jupyter_logging
# setup_jupyter_logging()
# result = run_with_baml_logging(main, "can you multiply 3 and 4")

View File

@@ -1,352 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd43ecb0",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "51248df6",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "2b3f7aa8",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "6cfe848d",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "55dc9f35",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "488986b7",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "401582ef",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "8971bfd3",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "afec6e79",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "c4eb5aa5",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "3856fb4c",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "3aa2700d",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "44c77bbd",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e719c4a",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b8fb003",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"from google.colab import userdata\n",
"import os\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2de498d9",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "41cba04b",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71ca1e48",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "774aaa2c",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
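{
"cell_type": "markdown",
"id": "0baa15ec",
"metadata": {},
"source": [
"For orientation, a minimal BAML output class looks roughly like this (an illustrative sketch only - the real file is fetched below):\n",
"\n",
"```\n",
"class DoneForNow {\n",
"  intent \"done_for_now\"\n",
"  message string\n",
"}\n",
"```"
]
},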
{
"cell_type": "code",
"execution_count": null,
"id": "e29d0763",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/01-agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/01-agent.baml"
]
},
{
"cell_type": "markdown",
"id": "9ffae736",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cd3057c",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "4ac99573",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "967e398c",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b128932",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,477 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "acc41186",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "8fe9763f",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "4123e288",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "94bee0c3",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "48330d87",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "8ed91b3c",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e407a39f",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "1308eecd",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4f8da08",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "e3d31d30",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "bed806b8",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "d4609f1f",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "a1667330",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4bc29c7",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c2328cd",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" import importlib\n",
" import baml_client\n",
" importlib.reload(baml_client)\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "236e47e5",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "283a46ca",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "537ac878",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "1682e8b7",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5c5f245",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "4bc7b6e8",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d6092ec",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "9cbb5999",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1943e86f",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3813fcea",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "6efa881e",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "8ee92c1a",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "b20ebd98",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
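{
"cell_type": "markdown",
"id": "ad41c0de",
"metadata": {},
"source": [
"For a feel of the shape, a calculator tool is just another structured output class, along these lines (an illustrative sketch - the real definitions are fetched below):\n",
"\n",
"```\n",
"class AddTool {\n",
"  intent \"add\"\n",
"  a int\n",
"  b int\n",
"}\n",
"```"
]
},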
{
"cell_type": "code",
"execution_count": null,
"id": "c3e2f2bb",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "9bdddec7",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8f027eae",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "485b6900",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10fc6d7e",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "130656ac",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e3e0e86",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30bc78cc",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,488 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b89b4f8e",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "37c97708",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "759d9b2e",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "2c4d3b42",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "e72ea142",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "48b8dece",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2abb7ddd",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "24e048e9",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f15d231e",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "6907babb",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "7499056d",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "c3bc3c6f",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "c7cb4ae3",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a70ca2c",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cda7ecb",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7c6cf8b",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "de84a5ef",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "67a8acf5",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "094f2b2a",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4aa5b7e",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "10b15666",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74f5c039",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "39192733",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c64f5b4c",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dbe20354",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "4715c874",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "91aefaf2",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "faabf4e9",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c51257cb",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "6149ffa8",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3257406",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "3f643b40",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "efdab914",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "1ff754f0",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5af0f57b",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "067cfbac",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because it is too large

View File

@@ -1,3 +0,0 @@
# Hack Directory Notes
This is a uv project - use `uv add` for dependencies and `uv run` to execute scripts.
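Typical commands look like this (the dependency name is illustrative; the script invocation mirrors the documented testing workflow):
```bash
uv add nbformat
uv run python walkthroughgen_py.py walkthrough_python.yaml -o test.ipynb
```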

View File

@@ -1,104 +0,0 @@
#!/usr/bin/env python3
"""Create a Jupyter notebook with example cells."""
import nbformat
from nbformat.v4 import new_notebook, new_markdown_cell, new_code_cell
def create_sample_notebook():
nb = new_notebook()
# Cell 1: Markdown header
nb.cells.append(new_markdown_cell("""# Workshop Notebook - July 16, 2025
Welcome to today's workshop! This notebook contains some basic examples to get started.
## Overview
- Basic Python operations
- Simple calculations
- String manipulation"""))
# Cell 2: Hello World
nb.cells.append(new_code_cell("""# Classic Hello World
print("Hello, World!")
print("Welcome to the workshop!")"""))
# Cell 3: Markdown for math section
nb.cells.append(new_markdown_cell("""## Basic Mathematics
Let's perform some simple calculations:"""))
# Cell 4: Basic addition
nb.cells.append(new_code_cell("""# Basic arithmetic operations
a = 42
b = 17
c = a + b
print(f"{a} + {b} = {c}")
print(f"{a} - {b} = {a - b}")
print(f"{a} * {b} = {a * b}")
print(f"{a} / {b} = {a / b:.2f}")"""))
# Cell 5: Markdown for list operations
nb.cells.append(new_markdown_cell("""## Working with Lists
Python lists are versatile data structures:"""))
# Cell 6: List operations
nb.cells.append(new_code_cell("""# Working with lists
numbers = [1, 2, 3, 4, 5]
print("Original list:", numbers)
# Add more numbers
numbers.extend([6, 7, 8, 9, 10])
print("Extended list:", numbers)
# Calculate sum and average
total = sum(numbers)
average = total / len(numbers)
print(f"Sum: {total}")
print(f"Average: {average}")
print(f"Max: {max(numbers)}")
print(f"Min: {min(numbers)}")"""))
# Cell 7: Markdown for functions
nb.cells.append(new_markdown_cell("""## Creating Functions
Let's define some simple functions:"""))
# Cell 8: Function definition
nb.cells.append(new_code_cell("""# Define a simple function
def greet(name):
return f"Hello, {name}! Welcome to the workshop."
def calculate_area(length, width):
return length * width
# Use the functions
print(greet("Python Developer"))
print(f"Area of a 5x3 rectangle: {calculate_area(5, 3)} square units")"""))
# Cell 9: Markdown conclusion
nb.cells.append(new_markdown_cell("""## Next Steps
Feel free to add your own cells below and experiment with Python!
Some ideas to try:
- Create a function that calculates fibonacci numbers
- Work with dictionaries
- Try list comprehensions
- Import and use external libraries"""))
# Cell 10: Empty code cell for user
nb.cells.append(new_code_cell("# Your code here\n"))
return nb
if __name__ == "__main__":
notebook = create_sample_notebook()
# Write the notebook
with open("2025-07-16-workshop.ipynb", "w") as f:
nbformat.write(notebook, f)
print("Notebook created successfully!")

View File

@@ -1,114 +0,0 @@
# BAML Output Capture in Notebooks - Debug Report
## Summary
The current implementation successfully captures BAML output in notebooks: based on my investigation, the logs are captured correctly by the helper functions generated by `walkthroughgen_py.py`.
## Key Findings
### 1. BAML Logs Output to stderr
- BAML sends all logs (prompts, responses, reasoning) to stderr by default
- The log level is controlled by the `BAML_LOG` environment variable
- Options: error, warn, info, debug, trace
### 2. Current Capture Methods
The workshop notebooks use two primary methods:
#### Method A: IPython capture_output (Recommended)
```python
from IPython.utils.capture import capture_output
def run_with_baml_logs(func, *args, **kwargs):
"""Run a function and capture BAML logs in the notebook output."""
with capture_output() as captured:
result = func(*args, **kwargs)
# Display result
if result is not None:
print("=== Result ===")
print(result)
# Display BAML logs from stderr
if captured.stderr:
print("\n=== BAML Logs ===")
# Format logs for readability
log_lines = captured.stderr.strip().split('\n')
for line in log_lines:
if 'reasoning' in line.lower():
print(f"🤔 {line}")
else:
print(f" {line}")
return result
```
#### Method B: stderr Redirection (Real-time)
```python
@contextlib.contextmanager
def redirect_stderr_to_stdout():
"""Context manager to redirect stderr to stdout."""
old_stderr = sys.stderr
sys.stderr = sys.stdout
try:
yield
finally:
sys.stderr = old_stderr
def run_with_baml_logs_redirect(func, *args, **kwargs):
"""Run with stderr redirected to stdout for immediate display."""
with redirect_stderr_to_stdout():
result = func(*args, **kwargs)
return result
```
### 3. Test Results
From running `test_notebook_colab_sim.sh`:
- ✅ BAML logs are successfully captured and displayed
- ✅ Python BAML client is generated correctly
- ✅ All notebook cells execute without errors
- ✅ The logging helpers work in both local and Colab environments
### 4. Usage Pattern
The notebooks selectively enable logging for specific calls:
```python
# Normal execution (no logs)
main("Hello world")
# With log capture (when you want to see prompts/reasoning)
run_with_baml_logs(main, "Hello world")
```
### 5. Configuration in walkthrough_python.yaml
The YAML config uses `show_logs: true` to enable logging:
```yaml
steps:
- run_main:
args: "Hello"
show_logs: true # This triggers use of run_with_baml_logs()
```
## Recommendations
1. **The current implementation is working correctly** - BAML logs are being captured
2. **Use `run_with_baml_logs()` when you need to see prompts/reasoning** in notebooks
3. **Set `BAML_LOG=info` for optimal verbosity** (shows prompts without too much noise)
4. **For Colab testing, always validate with the sim script** before uploading
## Common Issues
1. **baml-cli generate failures**: Ensure baml_src directory exists and has valid BAML files
2. **Missing logs**: Check that `BAML_LOG` environment variable is set
3. **Import errors**: Use the `get_baml_client()` pattern to handle Colab's import cache (sketched below)
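The import-cache pattern referenced above boils down to this (condensed from the notebook setup cell; the function name here is my own):
```python
import sys

def fresh_baml_client():
    """Drop cached baml_client modules so the regenerated client is imported fresh."""
    for name in [m for m in sys.modules if m.startswith("baml_client")]:
        del sys.modules[name]
    import baml_client
    return baml_client.sync_client.b
```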
## Testing Workflow
1. Generate notebook: `uv run python hack/walkthroughgen_py.py hack/walkthrough_python.yaml -o hack/test.ipynb`
2. Test locally: `cd hack && ./test_notebook_colab_sim.sh test.ipynb`
3. Check preserved test directory in `./tmp/test_TIMESTAMP/` for debugging
4. Upload to Colab for final validation

View File

@@ -1,6 +0,0 @@
def main():
print("Hello from workshops!")
if __name__ == "__main__":
main()

View File

@@ -1,112 +0,0 @@
"""
Snippet to add to notebooks for capturing BAML logs.
Add this code cell after the BAML setup cells in the notebook:
"""
notebook_logging_cell = '''# Enable BAML logging capture in Jupyter
import os
import sys
from IPython.utils.capture import capture_output
# Set BAML logging level
os.environ['BAML_LOG'] = 'info'
# Helper function to run code with BAML log capture
def run_with_baml_logs(func, *args, **kwargs):
"""Run a function and display BAML logs in the notebook."""
print(f"Running with BAML_LOG={os.environ.get('BAML_LOG')}...")
# Capture all output
with capture_output() as captured:
result = func(*args, **kwargs)
# Display the result first
if result is not None:
print("=== Result ===")
print(result)
# Display captured stdout
if captured.stdout:
print("\\n=== Output ===")
print(captured.stdout)
# Display BAML logs from stderr
if captured.stderr:
print("\\n=== BAML Logs ===")
# Format the logs for better readability
log_lines = captured.stderr.strip().split('\\n')
for line in log_lines:
if 'reasoning' in line.lower() or '<reasoning>' in line:
print(f"🤔 {line}")
elif 'error' in line.lower():
print(f"{line}")
elif 'warn' in line.lower():
print(f"⚠️ {line}")
else:
print(f" {line}")
return result
# Alternative: Monkey-patch the main function to always capture logs
def with_baml_logging(original_func):
"""Decorator to add BAML logging to any function."""
def wrapper(*args, **kwargs):
return run_with_baml_logs(original_func, *args, **kwargs)
return wrapper
print("BAML logging helper functions loaded! Use run_with_baml_logs(main, 'your message') to see logs.")
'''
# For section 6 (reasoning), add this special cell
reasoning_logging_cell = '''# Special logging setup for reasoning visualization
import os
import re
from IPython.utils.capture import capture_output
from IPython.display import display, HTML
os.environ['BAML_LOG'] = 'info'
def run_and_show_reasoning(func, *args, **kwargs):
"""Run a function and highlight the reasoning steps from BAML logs."""
with capture_output() as captured:
result = func(*args, **kwargs)
# Extract and format reasoning from logs
if captured.stderr:
# Look for reasoning sections
log_text = captured.stderr
# Find reasoning blocks
reasoning_pattern = r'<reasoning>(.*?)</reasoning>'
reasoning_matches = re.findall(reasoning_pattern, log_text, re.DOTALL)
if reasoning_matches:
display(HTML("<h3>🧠 Model Reasoning:</h3>"))
for reasoning in reasoning_matches:
display(HTML(f"""
<div style='background-color: #f0f8ff; border-left: 4px solid #4169e1;
padding: 10px; margin: 10px 0; font-family: monospace;'>
{reasoning.strip().replace(chr(10), '<br>')}
</div>
"""))
# Show the full response
display(HTML("<h3>📤 Response:</h3>"))
display(HTML(f"<pre>{str(result)}</pre>"))
# Optionally show full logs
if os.environ.get('SHOW_FULL_LOGS', 'false').lower() == 'true':
display(HTML("<details><summary>View Full BAML Logs</summary><pre style='font-size: 0.8em;'>" +
log_text + "</pre></details>"))
return result
print("Enhanced reasoning visualization loaded! Use run_and_show_reasoning(main, 'your message') to see reasoning steps.")
'''
print("Notebook logging snippets created. Add these to the notebook generator.")
print("\nUsage in notebook:")
print("1. Add notebook_logging_cell after BAML setup")
print("2. Use: run_with_baml_logs(main, 'can you multiply 3 and 4')")
print("3. For reasoning section, use reasoning_logging_cell")

View File

@@ -1,33 +0,0 @@
okay but we have complications - colab can't show baml files, i have hacked a few workarounds to make this work, i need you to distill out all the changes that would need to happen in the logic flow from walkthrough.yaml to translate the steps so far into a baml notebook - off the top of my head:
1) for each ts file, translate to python and drop in the walkthrough folder (claude will do this)
1b) update walkthrough.yaml to point to the python files instead of the ts files, leave the baml files unchanged
2) we won't build section-by-section, we'll build just one big notebook file (similar to the one-big-walkthrough-md target provided by the typescript library)
3) for each text section, add a markdown cell
4) for each code cell, add the full python file from the walkthrough, so that running the code cell will refresh and update any function definitions
5) rather than separate python files, we'll just update the function definitions in the notebook as we go - you might have to get creative / clever in just redefining what we used, and rather than commands to run each cell, you'll want two cells: 1 to update the function, and 1 to re-run the main() function (note: THIS is the part i'm the most unsure of and we might need to adjust the approach as we go!)
6) for each baml file, follow the example in the notebook, fetching the file from the public github url and printing it with cat (see the example cell sketched after this list)
6b) if you need to update a baml file for some reason, i will need to push it to the public github repo
7) note that there may be some hackery where we need to re-import the baml_client after changing the baml sources!
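here's roughly what such a cell looks like, lifted verbatim from the working example:
```
!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml
```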
other information - i have an example in the hack/ folder of a notebook that has everything working end to end for a subset of chapter 1, including setting secrets, installing/generating baml in a clean reusable way, fetching and displaying baml files
note that the implementation plan will need ways to run/verify the notebook after every implementation change, so the implementation flow will be
1) make a change to the walkthrough.yaml OR make a change to hack/walkthroughgen_py.py
2) run the walkthroughgen_py.py script to generate the notebook
3) run the notebook to test that it works
4) read the notebook file and check for errors and that the outputs are expected
you will evolve each thing in parallel, targeting finishing a complete chapter in both the walkthrough.yaml and the walkthroughgen_py.py script before proceeding to the next chapter
### important notes
- There is a reference walkthrough from the typescript version in walkthrough-reference.yaml, which you can use to convert one chapter at a time
- `file: ` objects in the python / ipynb target will not have a dest, just a src
### before you start researching
first - review that plan, does it make sense? as you are researching, know that there will be things that I missed and that need to be adjusted in the plan
Ask any questions you have now before you start please.

View File

@@ -1,45 +0,0 @@
#!/usr/bin/env python3
"""Test script to verify BAML logging capture."""
import os
import sys
from contextlib import redirect_stderr
from io import StringIO
# Set BAML log level
os.environ['BAML_LOG'] = 'info'
print("Testing BAML logging capture methods...")
print("=" * 60)
# Method 1: Using sys.stderr redirection
print("\nMethod 1: Direct stderr redirection")
old_stderr = sys.stderr
sys.stderr = sys.stdout
# Simulate BAML logging to stderr
print("This would be a BAML log message", file=old_stderr)
print("BAML logs should appear here if redirected properly")
# Restore stderr
sys.stderr = old_stderr
# Method 2: Using context manager
print("\nMethod 2: Context manager with StringIO")
stderr_capture = StringIO()
with redirect_stderr(stderr_capture):
# Simulate BAML logging
print("This is a BAML log to stderr", file=sys.stderr)
print("Another BAML log message", file=sys.stderr)
# Get captured content
captured = stderr_capture.getvalue()
if captured:
print("Captured stderr content:")
print(captured)
else:
print("No stderr content captured")
print("\n" + "=" * 60)
print("Test complete!")

View File

@@ -1,215 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# BAML Logging Demo - Testing Log Capture in Notebooks\n",
"\n",
"This notebook demonstrates how BAML output is captured in Jupyter notebooks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"import sys\n",
"from IPython.utils.capture import capture_output\n",
"\n",
"# Set up environment\n",
"if 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set\")\n",
"\n",
"# Set BAML logging\n",
"os.environ['BAML_LOG'] = 'info'\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate stderr]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" print(f\"baml-cli generate failed: {e}\")\n",
" raise\n",
"\n",
"def get_baml_client():\n",
" baml_generate()\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" import baml_client\n",
" return baml_client.sync_client.b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Initialize BAML\n",
"!baml-cli init"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a simple BAML file\n",
"baml_content = '''class DoneForNow {\n",
" intent \"done_for_now\"\n",
" message string\n",
"}\n",
"\n",
"function DetermineNextStep(thread string) -> DoneForNow {\n",
" client OpenAI/gpt-4o-mini\n",
" prompt #\"\n",
" Given the conversation thread, determine the next step.\n",
" \n",
" Thread:\n",
" {{ thread }}\n",
" \n",
" Respond with a message.\n",
" \"#\n",
"}\n",
"'''\n",
"\n",
"with open('baml_src/agent.baml', 'w') as f:\n",
" f.write(baml_content)\n",
" \n",
"print(\"Created agent.baml\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Helper function to capture BAML logs\n",
"def run_with_baml_logs(func, *args, **kwargs):\n",
" \"\"\"Run a function and capture BAML logs in the notebook output.\"\"\"\n",
" print(f\"Running with BAML_LOG={os.environ.get('BAML_LOG')}...\")\n",
" \n",
" # Capture both stdout and stderr\n",
" with capture_output() as captured:\n",
" result = func(*args, **kwargs)\n",
" \n",
" # Display the result first\n",
" if result is not None:\n",
" print(\"=== Result ===\")\n",
" print(result)\n",
" \n",
" # Display captured stdout if any\n",
" if captured.stdout:\n",
" print(\"\\n=== Stdout ===\")\n",
" print(captured.stdout)\n",
" \n",
" # Display BAML logs from stderr\n",
" if captured.stderr:\n",
" print(\"\\n=== BAML Logs (from stderr) ===\")\n",
" print(captured.stderr)\n",
" \n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test function that uses BAML\n",
"def test_baml_call():\n",
" b = get_baml_client()\n",
" thread = '[{\"type\": \"user_input\", \"data\": \"Hello, how are you?\"}]'\n",
" result = b.DetermineNextStep(thread)\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run without log capture\n",
"print(\"=== Running WITHOUT log capture ===\")\n",
"result1 = test_baml_call()\n",
"print(f\"Result: {result1}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Run WITH log capture\n",
"print(\"=== Running WITH log capture ===\")\n",
"result2 = run_with_baml_logs(test_baml_call)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test with different log levels\n",
"print(\"\\n=== Testing with DEBUG log level ===\")\n",
"os.environ['BAML_LOG'] = 'debug'\n",
"result3 = run_with_baml_logs(test_baml_call)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"This notebook demonstrates:\n",
"1. BAML logs are written to stderr by default\n",
"2. Using `capture_output()` from IPython can capture these logs\n",
"3. The `run_with_baml_logs()` helper function makes it easy to see BAML logs in notebooks\n",
"4. Different log levels (info, debug) show different amounts of detail"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,281 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Minimal BAML Output Test\n",
"\n",
"This notebook tests different methods of capturing BAML output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"import sys\n",
"\n",
"def baml_generate():\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
"\n",
"def get_baml_client():\n",
" # Set API key\n",
" if 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a simple BAML function\n",
"baml_content = '''function DemoFunction {\n",
" input: string\n",
" output: string\n",
"}\n",
"\n",
"impl<llm, DemoFunction> DemoFunctionImpl {\n",
" prompt #\"\n",
" Say hello to {{input}}\n",
" \"#\n",
"}\n",
"\n",
"client<llm> MyClient {\n",
" provider openai\n",
" options {\n",
" model \"gpt-4o-mini\"\n",
" }\n",
"}\n",
"'''\n",
"\n",
"with open('baml_src/demo.baml', 'w') as f:\n",
" f.write(baml_content)\n",
"\n",
"print(\"Created demo.baml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Method 1: Direct Execution (Default)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Direct execution - BAML logs go to stderr\n",
"os.environ['BAML_LOG'] = 'info'\n",
"b = get_baml_client()\n",
"result = b.DemoFunction(\"World\")\n",
"print(f\"Result: {result}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Method 2: Capture with IPython"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.utils.capture import capture_output\n",
"\n",
"os.environ['BAML_LOG'] = 'info'\n",
"print(\"Capturing output with IPython...\")\n",
"\n",
"with capture_output() as captured:\n",
" b = get_baml_client()\n",
" result = b.DemoFunction(\"IPython\")\n",
"\n",
"print(f\"Result: {result}\")\n",
"print(\"\\n=== Captured stdout ===\")\n",
"print(captured.stdout)\n",
"print(\"\\n=== Captured stderr (BAML logs) ===\")\n",
"print(captured.stderr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Method 3: Redirect stderr to stdout"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import contextlib\n",
"\n",
"@contextlib.contextmanager\n",
"def redirect_stderr_to_stdout():\n",
" old_stderr = sys.stderr\n",
" sys.stderr = sys.stdout\n",
" try:\n",
" yield\n",
" finally:\n",
" sys.stderr = old_stderr\n",
"\n",
"os.environ['BAML_LOG'] = 'info'\n",
"print(\"Redirecting stderr to stdout...\")\n",
"\n",
"with redirect_stderr_to_stdout():\n",
" b = get_baml_client()\n",
" result = b.DemoFunction(\"Redirect\")\n",
"\n",
"print(f\"\\nResult: {result}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Method 4: Cell Magic %%capture"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%capture captured_output\n",
"os.environ['BAML_LOG'] = 'info'\n",
"b = get_baml_client()\n",
"result = b.DemoFunction(\"Cell Magic\")\n",
"print(f\"Result: {result}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Display captured output\n",
"print(\"=== Captured stdout ===\")\n",
"print(captured_output.stdout)\n",
"print(\"\\n=== Captured stderr (BAML logs) ===\")\n",
"print(captured_output.stderr)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Method 5: Subprocess with combined output"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a Python script that runs BAML\n",
"script_content = '''import os\n",
"os.environ['BAML_LOG'] = 'info'\n",
"os.environ['OPENAI_API_KEY'] = os.environ.get('OPENAI_API_KEY', '')\n",
"\n",
"import baml_client\n",
"b = baml_client.sync_client.b\n",
"result = b.DemoFunction(\"Subprocess\")\n",
"print(f\"Result: {result}\")\n",
"'''\n",
"\n",
"with open('test_baml_script.py', 'w') as f:\n",
" f.write(script_content)\n",
"\n",
"# Run as subprocess with combined output\n",
"result = subprocess.run(\n",
" [sys.executable, 'test_baml_script.py'],\n",
" capture_output=True,\n",
" text=True,\n",
" stderr=subprocess.STDOUT # Combine stderr into stdout\n",
")\n",
"\n",
"print(\"=== Combined output ===\")\n",
"print(result.stdout)\n",
"\n",
"# Clean up\n",
"os.remove('test_baml_script.py')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -1,23 +0,0 @@
title: "Test ls commands"
text: "Quick test of ls baml_src commands"
targets:
- ipynb: "./test_ls_simple.ipynb"
sections:
- name: test-ls
title: "Test ls Commands"
text: "Testing ls after baml_setup and fetch_file"
steps:
- text: "Setting up BAML"
- baml_setup: true
- command: "!ls baml_src"
- text: "After setup, we should see the default BAML files"
- text: "Now fetching agent.baml"
- fetch_file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- command: "!ls baml_src"
- text: "Now we should see agent.baml added"
- text: "Fetching calculator tools"
- fetch_file: {src: ./walkthrough/02-tool_calculator.baml, dest: baml_src/tool_calculator.baml}
- command: "!ls baml_src"
- text: "Now we should see both agent.baml and tool_calculator.baml"

View File

@@ -1,81 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Minimal BAML Logging Test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import sys\n",
"from IPython.utils.capture import capture_output\n",
"\n",
"# Set BAML log level\n",
"os.environ['BAML_LOG'] = 'info'\n",
"\n",
"# Test direct stderr write\n",
"print(\"Testing stderr capture...\")\n",
"with capture_output() as captured:\n",
" print(\"This goes to stdout\")\n",
" print(\"This goes to stderr\", file=sys.stderr)\n",
"\n",
"print(\"Captured stdout:\", captured.stdout)\n",
"print(\"Captured stderr:\", captured.stderr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test with real BAML if available\n",
"try:\n",
" # Simple function that might generate BAML logs\n",
" def test_function():\n",
" print(\"Function output\")\n",
" # Simulate BAML log to stderr\n",
" print(\"[BAML INFO] Test log message\", file=sys.stderr)\n",
" return \"Result\"\n",
" \n",
" # Capture with helper\n",
" def run_with_capture(func, *args, **kwargs):\n",
" with capture_output() as captured:\n",
" result = func(*args, **kwargs)\n",
" \n",
" if result:\n",
" print(f\"Result: {result}\")\n",
" if captured.stdout:\n",
" print(f\"\\nStdout:\\n{captured.stdout}\")\n",
" if captured.stderr:\n",
" print(f\"\\nStderr (logs):\\n{captured.stderr}\")\n",
" \n",
" return result\n",
" \n",
" run_with_capture(test_function)\n",
"except Exception as e:\n",
" print(f\"Error: {e}\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,13 +0,0 @@
title: "Test Walkthrough"
text: "This is a test walkthrough to verify the script works."
targets:
- ipynb: "./test_output.ipynb"
sections:
- name: test-section
title: "Test Section"
text: "This is a test section."
steps:
- text: "This is a test markdown cell"
- command: "echo 'Hello from command!'"

View File

@@ -1,244 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "28d943f3",
"metadata": {},
"source": [
"# Test Tree Command After Fetch"
]
},
{
"cell_type": "markdown",
"id": "2ceb6af5",
"metadata": {},
"source": [
"Simple test to verify tree command works after fetch_file"
]
},
{
"cell_type": "markdown",
"id": "c29aaef8",
"metadata": {},
"source": [
"## Test Tree Command"
]
},
{
"cell_type": "markdown",
"id": "6004361d",
"metadata": {},
"source": [
"Testing tree command after fetch_file"
]
},
{
"cell_type": "markdown",
"id": "f608feb7",
"metadata": {},
"source": [
"Setting up BAML"
]
},
{
"cell_type": "markdown",
"id": "949493c7",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3d2a355e",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "50e508d6",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e29841b6",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a923e734",
"metadata": {},
"outputs": [],
"source": [
"# Helper function to capture BAML logs in notebook output\n",
"import os\n",
"from IPython.utils.capture import capture_output\n",
"\n",
"def run_with_baml_logs(func, *args, **kwargs):\n",
" \"\"\"Run a function and capture BAML logs in the notebook output.\"\"\"\n",
" # Capture both stdout and stderr\n",
" with capture_output() as captured:\n",
" result = func(*args, **kwargs)\n",
" \n",
" # Display the captured output\n",
" if captured.stdout:\n",
" print(captured.stdout)\n",
" if captured.stderr:\n",
" # BAML logs go to stderr - format them nicely\n",
" print(\"\\n=== BAML Logs ===\")\n",
" print(captured.stderr)\n",
" print(\"=================\\n\")\n",
" \n",
" return result\n",
"\n",
"# Set BAML log level (options: error, warn, info, debug, trace)\n",
"os.environ['BAML_LOG'] = 'info'\n"
]
},
{
"cell_type": "markdown",
"id": "10ef9f0e",
"metadata": {},
"source": [
"Fetching a BAML file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20691263",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "149932f4",
"metadata": {},
"outputs": [],
"source": [
"!tree -I baml_client"
]
},
{
"cell_type": "markdown",
"id": "38cb3a47",
"metadata": {},
"source": [
"The tree command above should show our file structure"
]
},
{
"cell_type": "markdown",
"id": "62eadcdf",
"metadata": {},
"source": [
"Let's fetch another file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e404be1",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e132a07",
"metadata": {},
"outputs": [],
"source": [
"!tree -I baml_client"
]
},
{
"cell_type": "markdown",
"id": "5209bcf6",
"metadata": {},
"source": [
"Now we should see both BAML files in the tree"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,21 +0,0 @@
title: "Test Tree Command After Fetch"
text: "Simple test to verify tree command works after fetch_file"
targets:
- ipynb: "./tmp_test.ipynb"
sections:
- name: test-tree
title: "Test Tree Command"
text: "Testing tree command after fetch_file"
steps:
- text: "Setting up BAML"
- baml_setup: true
- text: "Fetching a BAML file"
- fetch_file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- command: "!tree -I baml_client"
- text: "The tree command above should show our file structure"
- text: "Let's fetch another file"
- fetch_file: {src: ./walkthrough/02-tool_calculator.baml, dest: baml_src/tool_calculator.baml}
- command: "!tree -I baml_client"
- text: "Now we should see both BAML files in the tree"

View File

@@ -1,362 +0,0 @@
title: "Building the 12-factor agent template from scratch in Python"
text: "Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
targets:
- ipynb: "./build/workshop-2025-07-16.ipynb"
sections:
- name: hello-world
title: "Chapter 0 - Hello World"
text: "Let's start with a basic Python setup and a hello world program."
steps:
- text: |
This guide will walk you through building agents in Python with BAML.
We'll start simple with a hello world program and gradually build up to a full agent.
For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.
- text: "Here's our simple hello world program:"
- file: {src: ./walkthrough/00-main.py}
- text: "Let's run it to verify it works:"
- run_main: {regenerate_baml: false}
- name: cli-and-agent
title: "Chapter 1 - CLI and Agent Loop"
text: "Now let's add BAML and create our first agent with a CLI interface."
steps:
- text: |
In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.
## What is BAML?
BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.
### Why BAML?
- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming
- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more
- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)
- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling
- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground
### Learn More
- 📚 [Official Documentation](https://docs.boundaryml.com/home)
- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)
- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)
- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)
- 🏢 [Company Website](https://www.boundaryml.com/)
- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)
BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.
### Note on Developer Experience
BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.
First, let's set up BAML support in our notebook.
- baml_setup: true
- command: "!ls baml_src"
- text: |
Now let's create our agent that will use BAML to process user input.
First, we'll define the core agent logic:
- file: {src: ./walkthrough/01-agent.py}
- text: |
Next, we need to define the BAML function that our agent will use.
### Understanding BAML Syntax
BAML files define:
- **Classes**: Structured output schemas (like `DoneForNow` below)
- **Functions**: AI-powered functions that take inputs and return structured outputs
- **Tests**: Example inputs/outputs to validate your prompts
This BAML file defines what our agent can do:
- fetch_file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- command: "!ls baml_src"
- text: |
Now let's create our main function that accepts a message parameter:
- file: {src: ./walkthrough/01-main.py}
- text: |
Let's test our agent! Try calling main() with different messages:
- `main("What's the weather like?")`
- `main("Tell me a joke")`
- `main("How are you doing today?")`
In this case, we'll use the baml_generate function to
generate the Pydantic and Python bindings from our
BAML source, but in the future we'll skip this step, as it
is done automatically by the get_baml_client() function.
- run_main: {regenerate_baml: true, args: "Hello from the Python notebook!"}
- text: |
In a few cases, we'll enable the baml debug logs to see the inputs/outputs to and from the model.
- run_main: {regenerate_baml: false, args: "Hello from the Python notebook!", show_logs: true}
- text: |
What's most important there is that you can see the prompt and how the output_format is injected
to tell the model what kind of JSON we want it to return.
- name: calculator-tools
title: "Chapter 2 - Add Calculator Tools"
text: "Let's add some calculator tools to our agent."
steps:
- text: |
Let's start by adding a tool definition for the calculator.
These are simple structured outputs that we'll ask the model to
return as a "next step" in the agentic loop.
- fetch_file: {src: ./walkthrough/02-tool_calculator.baml, dest: baml_src/tool_calculator.baml}
- command: "!ls baml_src"
- text: |
Now, let's update the agent's DetermineNextStep method to
expose the calculator tools as potential next steps.
- fetch_file: {src: ./walkthrough/02-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's update our main function to show the tool call:
- file: {src: ./walkthrough/02-main.py}
- text: |
Let's try out the calculator! The agent should recognize that you want to perform a calculation
and return the appropriate tool call instead of just a message.
- run_main: {regenerate_baml: false, args: "can you add 3 and 4"}
- name: tool-loop
title: "Chapter 3 - Process Tool Calls in a Loop"
text: "Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
steps:
- text: |
In this chapter, we'll enhance our agent to process tool calls in a loop. This means:
- The agent can call multiple tools in sequence
- Each tool result is fed back to the agent
- The agent continues until it has a final answer
Let's update our agent to handle tool calls properly:
- file: {src: ./walkthrough/03-agent.py}
- text: |
Now let's update our main function to use the new agent loop:
- file: {src: ./walkthrough/03-main.py}
- text: |
Let's try it out! The agent should now call the tool and return the calculated result:
- run_main: {regenerate_baml: false, args: "can you add 3 and 4"}
- text: |
You can run with BAML logs enabled to see how the prompt changed when we added the new
tool types to our union of response types.
- run_main: {regenerate_baml: false, args: "can you add 3 and 4", show_logs: true}
- text: |
You should see the agent:
1. Recognize it needs to use the add tool
2. Call the tool with the correct parameters
3. Get the result (7)
4. Generate a final response incorporating the result
For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:
- file: {src: ./walkthrough/03b-agent.py}
- text: |
Now let's test subtraction:
- run_main: {regenerate_baml: false, args: "can you subtract 3 from 4"}
- text: |
Test multiplication:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4"}
- text: |
Finally, let's test a complex multi-step calculation:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result"}
- text: |
Congratulations! You've taken your first step into hand-rolling an agent loop.
Key concepts you've learned:
- **Thread Management**: Tracking conversation history and tool calls
- **Tool Execution**: Processing different tool types and returning results
- **Agent Loop**: Continuing until the agent has a final answer
From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.
- name: baml-tests
title: "Chapter 4 - Add Tests to agent.baml"
text: "Let's add some tests to our BAML agent."
steps:
- text: |
In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.
## Why Test BAML Functions?
- **Catch regressions**: Ensure changes don't break existing behavior
- **Document behavior**: Tests serve as living documentation
- **Validate edge cases**: Test complex scenarios and conversation flows
- **CI/CD integration**: Run tests automatically in your pipeline
Let's start with a simple test that checks the agent's ability to handle basic interactions:
- fetch_file: {src: ./walkthrough/04-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests to see them in action:
- command: "!baml-cli test"
- text: |
Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.
## BAML Assertion Syntax
Assertions use the `@@assert` directive:
```
@@assert(name, {{condition}})
```
- `name`: A descriptive name for the assertion
- `condition`: A boolean expression using `this` to access the output
- fetch_file: {src: ./walkthrough/04b-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests again to see assertions in action:
- command: "!baml-cli test"
- text: |
Finally, let's add more complex test cases that test multi-step conversations.
These tests simulate an entire conversation flow, including:
- User input
- Tool calls made by the agent
- Tool responses
- Final agent response
- fetch_file: {src: ./walkthrough/04c-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the comprehensive test suite:
- command: "!baml-cli test"
- text: |
## Key Testing Concepts
1. **Test Structure**: Each test specifies functions, arguments, and assertions
2. **Progressive Testing**: Start simple, then test complex scenarios
3. **Conversation History**: Test how the agent handles multi-turn conversations
4. **Tool Integration**: Verify the agent correctly uses tools in sequence
With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!
- name: human-tools
title: "Chapter 5 - Multiple Human Tools"
text: |
In this section, we'll add support for multiple tools that contact humans.
steps:
- text: |
So far, our agent only returns a final answer with "done_for_now". But what if the agent needs clarification?
Let's add a new tool that allows the agent to request more information from the user.
## Why Human-in-the-Loop?
- **Handle ambiguous inputs**: When user input is unclear or contains typos
- **Request missing information**: When the agent needs more context
- **Confirm sensitive operations**: Before performing important actions
- **Interactive workflows**: Build conversational agents that engage users
First, let's update our BAML file to include a ClarificationRequest tool:
- fetch_file: {src: ./walkthrough/05-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's update our agent to handle clarification requests:
- file: {src: ./walkthrough/05-agent.py}
- text: |
Finally, let's create a main function that handles human interaction:
- file: {src: ./walkthrough/05-main.py}
- text: |
Let's test with an ambiguous input that should trigger a clarification request:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and FD*(#F&&"}
- text: |
You should see:
1. The agent recognizes the input is unclear
2. It asks for clarification
3. In Colab, you'll be prompted to type a response
4. In local testing, an auto-response is provided
5. The agent continues with the clarified input
## Interactive Testing in Colab
When running in Google Colab, the `input()` function will create an interactive text box where you can type your response. Try different clarifications to see how the agent adapts!
## Key Concepts
- **Human Tools**: Special tool types that return control to the human
- **Conversation Flow**: The agent can pause execution to get human input
- **Context Preservation**: The full conversation history is maintained
- **Flexible Handling**: Different behaviors for different environments
- name: customize-prompt
title: "Chapter 6 - Customize Your Prompt with Reasoning"
text: |
In this section, we'll explore how to customize the prompt of the agent with reasoning steps.
This is core to [factor 2 - own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
steps:
- text: |
## Why Add Reasoning to Prompts?
Adding explicit reasoning steps to your prompts can significantly improve agent performance:
- **Better decisions**: The model thinks through problems step-by-step
- **Transparency**: You can see the model's thought process
- **Fewer errors**: Structured thinking reduces mistakes
- **Debugging**: Easier to identify where reasoning went wrong
Let's update our agent prompt to include a reasoning step:
- fetch_file: {src: ./walkthrough/06-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's test it with a simple calculation to see the reasoning in action:
**Note:** The BAML logs below will show the model's reasoning steps. Look for the `<reasoning>` tags in the logs to see how the model thinks through the problem before deciding what to do.
- run_main: {args: "can you multiply 3 and 4", show_logs: true}
- text: |
You should see the reasoning steps in the BAML logs above. The model explicitly thinks through what it needs to do before making a decision.
💡 **Tip:** If you want to see BAML logs for any other calls in this notebook, you can use the `run_with_baml_logs` helper function:
```python
# Instead of: main("your message")
# Use: run_with_baml_logs(main, "your message")
```
## Advanced Prompt Engineering
You can enhance your prompts further by:
- Adding specific reasoning templates for different tasks
- Including examples of good reasoning
- Structuring the reasoning with numbered steps
- Adding checks for common mistakes
The key is to guide the model's thinking process while still allowing flexibility.
- name: context-window
title: "Chapter 7 - Customize Your Context Window"
text: |
In this section, we'll explore how to customize the context window of the agent.
This is core to [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
steps:
- text: |
## Context Window Serialization
How you format your conversation history can significantly impact:
- **Token usage**: Some formats are more efficient
- **Model understanding**: Clear structure helps the model
- **Debugging**: Readable formats help development
Let's implement two serialization formats: pretty-printed JSON and XML.
- file: {src: ./walkthrough/07-agent.py}
- text: |
Now let's create a main function that can switch between formats:
- file: {src: ./walkthrough/07-main.py}
- text: |
Let's test with JSON format first:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: false}}
- text: |
Now let's try the same with XML format:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: true}}
- text: |
## XML vs JSON Trade-offs
**XML Benefits**:
- More token-efficient for nested data
- Clear hierarchy with opening/closing tags
- Better for long conversations
**JSON Benefits**:
- Familiar to most developers
- Easy to parse and debug
- Native to JavaScript/Python
Choose based on your specific needs and token constraints!
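The 07-agent.py referenced above isn't reproduced in this commit, but an XML-vs-JSON serialize_for_llm along the lines this chapter describes might look like the following sketch (an assumption, not the actual file; the JSON branch matches the json.dumps approach used in earlier chapters):

```python
import json
from typing import Any, Dict, List

class Thread:
    def __init__(self, events: List[Dict[str, Any]]):
        self.events = events

    def serialize_for_llm(self, use_xml: bool = False) -> str:
        if not use_xml:
            # JSON: familiar and easy to parse/debug
            return json.dumps(self.events, indent=2)
        # XML: explicit open/close tags per event; often fewer tokens
        # for deeply nested conversation histories
        return "\n".join(
            f"<{e['type']}>{e['data']}</{e['type']}>" for e in self.events
        )
```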

View File

@@ -1,771 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9a5274e1",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "6a6efe20",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "f8c0592e",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "d0e804de",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "083841c5",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "627ee046",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eaeb2d23",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "fb5293e9",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4fedee3c",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "974d58b8",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "6fdcf8bb",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "7aad128d",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "9f17a460",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa2481ab",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6dbd418",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b25f5c0c",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "64eb1d9b",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2488f695",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "6c0c0588",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5462c6b",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "9ff08812",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7c49c77",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "ad905bd3",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6211ff4e",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f41dba2f",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "81a15b53",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "414016d0",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "758fbf1b",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21a94991",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "85436def",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75b9c2f0",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "78ba8da1",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26aa645f",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "a0e9730a",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19902e4d",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01bf191e",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "c2485c13",
"metadata": {},
"source": [
"## Chapter 3 - Process Tool Calls in a Loop"
]
},
{
"cell_type": "markdown",
"id": "dc99f949",
"metadata": {},
"source": [
"Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
]
},
{
"cell_type": "markdown",
"id": "fe9a3718",
"metadata": {},
"source": [
"In this chapter, we'll enhance our agent to process tool calls in a loop. This means:\n",
"- The agent can call multiple tools in sequence\n",
"- Each tool result is fed back to the agent\n",
"- The agent continues until it has a final answer\n",
"\n",
"Let's update our agent to handle tool calls properly:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f279921",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent == \"add\":\n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" continue\n",
" else:\n",
" raise ValueError(f\"Unknown intent: {next_step.intent}\")"
]
},
{
"cell_type": "markdown",
"id": "3457e09f",
"metadata": {},
"source": [
"Now let's update our main function to use the new agent loop:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "92cc1194",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with full tool handling\n",
" result = agent_loop(thread)\n",
" \n",
" # Print the final response\n",
" print(f\"\\nFinal response: {result}\")"
]
},
{
"cell_type": "markdown",
"id": "8f4f81e1",
"metadata": {},
"source": [
"Let's try it out! The agent should now call the tool and return the calculated result:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "71596095",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "192f038e",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "c26d30b2",
"metadata": {},
"source": [
"You should see the agent:\n",
"1. Recognize it needs to use the add tool\n",
"2. Call the tool with the correct parameters\n",
"3. Get the result (7)\n",
"4. Generate a final response incorporating the result\n",
"\n",
"For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c612395e",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03b-agent.py\n",
"import json\n",
"from typing import Dict, Any, List, Union\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"def handle_next_step(next_step, thread: Thread) -> Thread:\n",
" result: float\n",
" \n",
" if next_step.intent == \"add\":\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"subtract\":\n",
" result = next_step.a - next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"multiply\":\n",
" result = next_step.a * next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"divide\":\n",
" result = next_step.a / next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent in [\"add\", \"subtract\", \"multiply\", \"divide\"]:\n",
" thread = handle_next_step(next_step, thread)"
]
},
{
"cell_type": "markdown",
"id": "f2c066cb",
"metadata": {},
"source": [
"Now let's test subtraction:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d21dad8c",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you subtract 3 from 4\")"
]
},
{
"cell_type": "markdown",
"id": "69fc9590",
"metadata": {},
"source": [
"Test multiplication:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ae1bf622",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "dada4d98",
"metadata": {},
"source": [
"Finally, let's test a complex multi-step calculation:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f29ef37",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result\")"
]
},
{
"cell_type": "markdown",
"id": "38a60e47",
"metadata": {},
"source": [
"Congratulations! You've taken your first step into hand-rolling an agent loop.\n",
"\n",
"Key concepts you've learned:\n",
"- **Thread Management**: Tracking conversation history and tool calls\n",
"- **Tool Execution**: Processing different tool types and returning results\n",
"- **Agent Loop**: Continuing until the agent has a final answer\n",
"\n",
"From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents."
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,935 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "173dc42f",
"metadata": {},
"source": [
"# Building the 12-factor agent template from scratch in Python"
]
},
{
"cell_type": "markdown",
"id": "add0a779",
"metadata": {},
"source": [
"Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
]
},
{
"cell_type": "markdown",
"id": "6a8df8d6",
"metadata": {},
"source": [
"## Chapter 0 - Hello World"
]
},
{
"cell_type": "markdown",
"id": "15b19657",
"metadata": {},
"source": [
"Let's start with a basic Python setup and a hello world program."
]
},
{
"cell_type": "markdown",
"id": "251134f3",
"metadata": {},
"source": [
"This guide will walk you through building agents in Python with BAML.\n",
"\n",
"We'll start simple with a hello world program and gradually build up to a full agent.\n",
"\n",
"For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.\n"
]
},
{
"cell_type": "markdown",
"id": "bf3fd22d",
"metadata": {},
"source": [
"Here's our simple hello world program:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da14ddcf",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/00-main.py\n",
"def hello():\n",
" print('hello, world!')\n",
"\n",
"def main():\n",
" hello()"
]
},
{
"cell_type": "markdown",
"id": "9cf83cf4",
"metadata": {},
"source": [
"Let's run it to verify it works:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "080218d1",
"metadata": {},
"outputs": [],
"source": [
"main()"
]
},
{
"cell_type": "markdown",
"id": "e7dc6b44",
"metadata": {},
"source": [
"## Chapter 1 - CLI and Agent Loop"
]
},
{
"cell_type": "markdown",
"id": "87a82a6f",
"metadata": {},
"source": [
"Now let's add BAML and create our first agent with a CLI interface."
]
},
{
"cell_type": "markdown",
"id": "fd5af290",
"metadata": {},
"source": [
"In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.\n",
"\n",
"## What is BAML?\n",
"\n",
"BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.\n",
"\n",
"### Why BAML?\n",
"\n",
"- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming\n",
"- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more\n",
"- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)\n",
"- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling\n",
"- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground\n",
"\n",
"### Learn More\n",
"\n",
"- 📚 [Official Documentation](https://docs.boundaryml.com/home)\n",
"- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)\n",
"- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)\n",
"- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)\n",
"- 🏢 [Company Website](https://www.boundaryml.com/)\n",
"- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)\n",
"\n",
"BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.\n",
"\n",
"### Note on Developer Experience\n",
"\n",
"BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.\n",
"\n",
"First, let's set up BAML support in our notebook.\n"
]
},
{
"cell_type": "markdown",
"id": "b1dd0665",
"metadata": {},
"source": [
"### BAML Setup\n",
"\n",
"Don't worry too much about this setup code - it will make sense later! For now, just know that:\n",
"- BAML is a tool for working with language models\n",
"- We need some special setup code to make it work nicely in Google Colab\n",
"- The `get_baml_client()` function will be used to interact with AI models"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6df0dc4a",
"metadata": {},
"outputs": [],
"source": [
"!pip install baml-py==0.202.0 pydantic"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "01121d4c",
"metadata": {},
"outputs": [],
"source": [
"import subprocess\n",
"import os\n",
"\n",
"# Try to import Google Colab userdata, but don't fail if not in Colab\n",
"try:\n",
" from google.colab import userdata\n",
" IN_COLAB = True\n",
"except ImportError:\n",
" IN_COLAB = False\n",
"\n",
"def baml_generate():\n",
" try:\n",
" result = subprocess.run(\n",
" [\"baml-cli\", \"generate\"],\n",
" check=True,\n",
" capture_output=True,\n",
" text=True\n",
" )\n",
" if result.stdout:\n",
" print(\"[baml-cli generate]\\n\", result.stdout)\n",
" if result.stderr:\n",
" print(\"[baml-cli generate]\\n\", result.stderr)\n",
" except subprocess.CalledProcessError as e:\n",
" msg = (\n",
" f\"`baml-cli generate` failed with exit code {e.returncode}\\n\"\n",
" f\"--- STDOUT ---\\n{e.stdout}\\n\"\n",
" f\"--- STDERR ---\\n{e.stderr}\"\n",
" )\n",
" raise RuntimeError(msg) from None\n",
"\n",
"def get_baml_client():\n",
" \"\"\"\n",
" a bunch of fun jank to work around the google colab import cache\n",
" \"\"\"\n",
" # Set API key from Colab secrets or environment\n",
" if IN_COLAB:\n",
" os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')\n",
" elif 'OPENAI_API_KEY' not in os.environ:\n",
" print(\"Warning: OPENAI_API_KEY not set. Please set it in your environment.\")\n",
" \n",
" baml_generate()\n",
" \n",
" # Force delete all baml_client modules from sys.modules\n",
" import sys\n",
" modules_to_delete = [key for key in sys.modules.keys() if key.startswith('baml_client')]\n",
" for module in modules_to_delete:\n",
" del sys.modules[module]\n",
" \n",
" # Now import fresh\n",
" import baml_client\n",
" return baml_client.sync_client.b\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1c79b87",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli init"
]
},
{
"cell_type": "markdown",
"id": "e4bd63c3",
"metadata": {},
"source": [
"Now let's create our agent that will use BAML to process user input.\n",
"\n",
"First, we'll define the core agent logic:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0e0617d2",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"# tool call or a respond to human tool\n",
"AgentResponse = Any # This will be the return type from b.DetermineNextStep\n",
"\n",
"class Event:\n",
" def __init__(self, type: str, data: Any):\n",
" self.type = type\n",
" self.data = data\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"# right now this just runs one turn with the LLM, but\n",
"# we'll update this function to handle all the agent logic\n",
"def agent_loop(thread: Thread) -> AgentResponse:\n",
" b = get_baml_client() # This will be defined by the BAML setup\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" return next_step"
]
},
{
"cell_type": "markdown",
"id": "6aa5e4fd",
"metadata": {},
"source": [
"Next, we need to define the BAML function that our agent will use.\n",
"\n",
"### Understanding BAML Syntax\n",
"\n",
"BAML files define:\n",
"- **Classes**: Structured output schemas (like `DoneForNow` below)\n",
"- **Functions**: AI-powered functions that take inputs and return structured outputs\n",
"- **Tests**: Example inputs/outputs to validate your prompts\n",
"\n",
"This BAML file defines what our agent can do:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "441ee4dc",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/01-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "a6d985dc",
"metadata": {},
"source": [
"Now let's create our main function that accepts a message parameter:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0c715dc1",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/01-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message as the initial event\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with the thread\n",
" result = agent_loop(thread)\n",
" print(result)"
]
},
{
"cell_type": "markdown",
"id": "407bcd47",
"metadata": {},
"source": [
"Let's test our agent! Try calling main() with different messages:\n",
"- `main(\"What's the weather like?\")`\n",
"- `main(\"Tell me a joke\")`\n",
"- `main(\"How are you doing today?\")`\n"
]
},
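{
"cell_type": "markdown",
"id": "b7a4c1e9",
"metadata": {},
"source": [
"Here we call `baml_generate()` once to build the Pydantic/Python bindings from our BAML source. In later chapters we can skip this explicit step, since `get_baml_client()` regenerates the client automatically."
]
},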
{
"cell_type": "code",
"execution_count": null,
"id": "451d4f8f",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a99ef71",
"metadata": {},
"outputs": [],
"source": [
"main(\"Hello from the Python notebook!\")"
]
},
{
"cell_type": "markdown",
"id": "e46ec89d",
"metadata": {},
"source": [
"## Chapter 2 - Add Calculator Tools"
]
},
{
"cell_type": "markdown",
"id": "7861d1a8",
"metadata": {},
"source": [
"Let's add some calculator tools to our agent."
]
},
{
"cell_type": "markdown",
"id": "16f65463",
"metadata": {},
"source": [
"Let's start by adding a tool definition for the calculator.\n",
"\n",
"These are simple structured outputs that we'll ask the model to\n",
"return as a \"next step\" in the agentic loop.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dc2301b",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/tool_calculator.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-tool_calculator.baml && cat baml_src/tool_calculator.baml"
]
},
{
"cell_type": "markdown",
"id": "a0289131",
"metadata": {},
"source": [
"Now, let's update the agent's DetermineNextStep method to\n",
"expose the calculator tools as potential next steps.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf1893ce",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/02-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "a062bc68",
"metadata": {},
"source": [
"Now let's update our main function to show the tool call:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e4368aa4",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/02-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Get BAML client\n",
" b = get_baml_client()\n",
" \n",
" # Get the next step from the agent - just show the tool call\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" \n",
" # Print the raw response to show the tool call\n",
" print(next_step)"
]
},
{
"cell_type": "markdown",
"id": "251c9ec9",
"metadata": {},
"source": [
"Let's try out the calculator! The agent should recognize that you want to perform a calculation\n",
"and return the appropriate tool call instead of just a message.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "255fcb36",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2b8da6f",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
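{
"cell_type": "markdown",
"id": "c2f9d8a1",
"metadata": {},
"source": [
"If everything is wired up correctly, the printed next step should be a tool call rather than a plain message, for example something like `intent='add' a=3 b=4` (the exact formatting depends on the generated Pydantic models)."
]
},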
{
"cell_type": "markdown",
"id": "35a95a6f",
"metadata": {},
"source": [
"## Chapter 3 - Process Tool Calls in a Loop"
]
},
{
"cell_type": "markdown",
"id": "7950fb6c",
"metadata": {},
"source": [
"Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
]
},
{
"cell_type": "markdown",
"id": "353a9a2c",
"metadata": {},
"source": [
"In this chapter, we'll enhance our agent to process tool calls in a loop. This means:\n",
"- The agent can call multiple tools in sequence\n",
"- Each tool result is fed back to the agent\n",
"- The agent continues until it has a final answer\n",
"\n",
"Let's update our agent to handle tool calls properly:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3d7643e",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-agent.py\n",
"import json\n",
"from typing import Dict, Any, List\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent == \"add\":\n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" continue\n",
" else:\n",
" raise ValueError(f\"Unknown intent: {next_step.intent}\")"
]
},
{
"cell_type": "markdown",
"id": "a88ac604",
"metadata": {},
"source": [
"Now let's update our main function to use the new agent loop:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a6ca94b",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03-main.py\n",
"def main(message=\"hello from the notebook!\"):\n",
" # Create a new thread with the user's message\n",
" thread = Thread([{\"type\": \"user_input\", \"data\": message}])\n",
" \n",
" # Run the agent loop with full tool handling\n",
" result = agent_loop(thread)\n",
" \n",
" # Print the final response\n",
" print(f\"\\nFinal response: {result}\")"
]
},
{
"cell_type": "markdown",
"id": "296ad48e",
"metadata": {},
"source": [
"Let's try it out! The agent should now call the tool and return the calculated result:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1491750",
"metadata": {},
"outputs": [],
"source": [
"baml_generate()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db0ead36",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you add 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "a98ecceb",
"metadata": {},
"source": [
"You should see the agent:\n",
"1. Recognize it needs to use the add tool\n",
"2. Call the tool with the correct parameters\n",
"3. Get the result (7)\n",
"4. Generate a final response incorporating the result\n",
"\n",
"For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1c84079",
"metadata": {},
"outputs": [],
"source": [
"# ./walkthrough/03b-agent.py\n",
"import json\n",
"from typing import Dict, Any, List, Union\n",
"\n",
"class Thread:\n",
" def __init__(self, events: List[Dict[str, Any]]):\n",
" self.events = events\n",
" \n",
" def serialize_for_llm(self):\n",
" # can change this to whatever custom serialization you want to do, XML, etc\n",
" # e.g. https://github.com/got-agents/agents/blob/59ebbfa236fc376618f16ee08eb0f3bf7b698892/linear-assistant-ts/src/agent.ts#L66-L105\n",
" return json.dumps(self.events)\n",
"\n",
"def handle_next_step(next_step, thread: Thread) -> Thread:\n",
" result: float\n",
" \n",
" if next_step.intent == \"add\":\n",
" result = next_step.a + next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"subtract\":\n",
" result = next_step.a - next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"multiply\":\n",
" result = next_step.a * next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
" elif next_step.intent == \"divide\":\n",
" result = next_step.a / next_step.b\n",
" print(\"tool_response\", result)\n",
" thread.events.append({\n",
" \"type\": \"tool_response\",\n",
" \"data\": result\n",
" })\n",
" return thread\n",
"\n",
"def agent_loop(thread: Thread) -> str:\n",
" b = get_baml_client()\n",
" \n",
" while True:\n",
" next_step = b.DetermineNextStep(thread.serialize_for_llm())\n",
" print(\"nextStep\", next_step)\n",
" \n",
" thread.events.append({\n",
" \"type\": \"tool_call\",\n",
" \"data\": next_step.__dict__\n",
" })\n",
" \n",
" if next_step.intent == \"done_for_now\":\n",
" # response to human, return the next step object\n",
" return next_step.message\n",
" elif next_step.intent in [\"add\", \"subtract\", \"multiply\", \"divide\"]:\n",
" thread = handle_next_step(next_step, thread)"
]
},
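{
"cell_type": "markdown",
"id": "d8e7f6a5",
"metadata": {},
"source": [
"As an optional aside: the four near-identical branches in `handle_next_step` could be collapsed with a small dispatch table. This is just a hypothetical refactor sketch (the names are ours, not part of the walkthrough files):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9f8a7b6",
"metadata": {},
"outputs": [],
"source": [
"import operator\n",
"\n",
"# map each calculator intent to the corresponding binary operator\n",
"OPERATIONS = {\n",
"    \"add\": operator.add,\n",
"    \"subtract\": operator.sub,\n",
"    \"multiply\": operator.mul,\n",
"    \"divide\": operator.truediv,\n",
"}\n",
"\n",
"def handle_next_step_compact(next_step, thread):\n",
"    result = OPERATIONS[next_step.intent](next_step.a, next_step.b)\n",
"    print(\"tool_response\", result)\n",
"    thread.events.append({\"type\": \"tool_response\", \"data\": result})\n",
"    return thread"
]
},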
{
"cell_type": "markdown",
"id": "97d6432d",
"metadata": {},
"source": [
"Now let's test subtraction:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6bd66f9f",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you subtract 3 from 4\")"
]
},
{
"cell_type": "markdown",
"id": "bf2fe3b5",
"metadata": {},
"source": [
"Test multiplication:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6dc9442",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4\")"
]
},
{
"cell_type": "markdown",
"id": "cf4b333c",
"metadata": {},
"source": [
"Finally, let's test a complex multi-step calculation:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "669e7673",
"metadata": {},
"outputs": [],
"source": [
"main(\"can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result\")"
]
},
{
"cell_type": "markdown",
"id": "7d942e63",
"metadata": {},
"source": [
"Congratulations! You've taken your first step into hand-rolling an agent loop.\n",
"\n",
"Key concepts you've learned:\n",
"- **Thread Management**: Tracking conversation history and tool calls\n",
"- **Tool Execution**: Processing different tool types and returning results\n",
"- **Agent Loop**: Continuing until the agent has a final answer\n",
"\n",
"From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.\n"
]
},
{
"cell_type": "markdown",
"id": "c97a02d7",
"metadata": {},
"source": [
"## Chapter 4 - Add Tests to agent.baml"
]
},
{
"cell_type": "markdown",
"id": "8a02c2e8",
"metadata": {},
"source": [
"Let's add some tests to our BAML agent."
]
},
{
"cell_type": "markdown",
"id": "d7e31cd0",
"metadata": {},
"source": [
"In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.\n",
"\n",
"## Why Test BAML Functions?\n",
"\n",
"- **Catch regressions**: Ensure changes don't break existing behavior\n",
"- **Document behavior**: Tests serve as living documentation\n",
"- **Validate edge cases**: Test complex scenarios and conversation flows\n",
"- **CI/CD integration**: Run tests automatically in your pipeline\n",
"\n",
"Let's start with a simple test that checks the agent's ability to handle basic interactions:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "234b026c",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "3247eb5a",
"metadata": {},
"source": [
"Run the tests to see them in action:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0c2f3d1",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "90aedbc1",
"metadata": {},
"source": [
"Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.\n",
"\n",
"## BAML Assertion Syntax\n",
"\n",
"Assertions use the `@@assert` directive:\n",
"```\n",
"@@assert(name, {{condition}})\n",
"```\n",
"\n",
"- `name`: A descriptive name for the assertion\n",
"- `condition`: A boolean expression using `this` to access the output\n"
]
},
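{
"cell_type": "markdown",
"id": "f1a2b3c4",
"metadata": {},
"source": [
"For example, an assertion that the agent picked the calculator's add tool could look like `@@assert(chose_add, {{ this.intent == \"add\" }})`, assuming the output schema exposes an `intent` field as ours does."
]
},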
{
"cell_type": "code",
"execution_count": null,
"id": "f1342588",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04b-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "1b377a6c",
"metadata": {},
"source": [
"Run the tests again to see assertions in action:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edbcb564",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "11c2e493",
"metadata": {},
"source": [
"Finally, let's add more complex test cases that test multi-step conversations.\n",
"\n",
"These tests simulate an entire conversation flow, including:\n",
"- User input\n",
"- Tool calls made by the agent\n",
"- Tool responses\n",
"- Final agent response\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9c86e5e",
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL -o baml_src/agent.baml https://raw.githubusercontent.com/humanlayer/12-factor-agents/refs/heads/main/workshops/2025-07-16/./walkthrough/04c-agent.baml && cat baml_src/agent.baml"
]
},
{
"cell_type": "markdown",
"id": "836f106b",
"metadata": {},
"source": [
"Run the comprehensive test suite:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b9d12cb",
"metadata": {},
"outputs": [],
"source": [
"!baml-cli test"
]
},
{
"cell_type": "markdown",
"id": "5ec0b03f",
"metadata": {},
"source": [
"## Key Testing Concepts\n",
"\n",
"1. **Test Structure**: Each test specifies functions, arguments, and assertions\n",
"2. **Progressive Testing**: Start simple, then test complex scenarios\n",
"3. **Conversation History**: Test how the agent handles multi-turn conversations\n",
"4. **Tool Integration**: Verify the agent correctly uses tools in sequence\n",
"\n",
"With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!"
]
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

File diff suppressed because it is too large

View File

@@ -1,664 +0,0 @@
title: "Building the 12-factor agent template from scratch in python"
text: "Steps to start from a bare python repo and build up a 12-factor agent. This walkthrough will guide you through creating a python agent that follows the 12-factor methodology with baml."
targets:
- ipynb: "./build/workshop-2025-07-16.ipynb"
sections:
- name: hello-world
title: "Chapter 0 - Hello World"
text: "Let's start with a basic TypeScript setup and a hello world program."
steps:
- text: |
This guide is written in TypeScript (yes, a python version is coming soon)
There are many checkpoints between every file edit in the workshop steps,
so even if you aren't super familiar with typescript,
you should be able to keep up and run each example.
To run this guide, you'll need a relatively recent version of nodejs and npm installed
You can use whatever nodejs version manager you want, [homebrew](https://formulae.brew.sh/formula/node) is fine
command:
brew install node@20
results:
- text: "You should see the node version"
code: |
node --version
- text: "Copy initial package.json"
file: {src: ./walkthrough/00-package.json, dest: package.json}
- text: "Install dependencies"
command: |
npm install
incremental: true
- text: "Copy tsconfig.json"
file: {src: ./walkthrough/00-tsconfig.json, dest: tsconfig.json}
- text: "add .gitignore"
file: {src: ./walkthrough/00-.gitignore, dest: .gitignore}
- text: "Create src folder"
dir: {create: true, path: src}
- text: "Add a simple hello world index.ts"
file: {src: ./walkthrough/00-index.ts, dest: src/index.ts}
- text: "Run it to verify"
command: |
npx tsx src/index.ts
results:
- text: "You should see:"
code: |
hello, world!
- name: cli-and-agent
title: "Chapter 1 - CLI and Agent Loop"
text: "Now let's add BAML and create our first agent with a CLI interface."
steps:
- text: |
First, we'll need to install [BAML](https://github.com/boundaryml/baml)
which is a tool for prompting and structured outputs.
command: |
npm install @boundaryml/baml
incremental: true
- text: "Initialize BAML"
command: |
npx baml-cli init
incremental: true
- text: "Remove default resume.baml"
command: |
rm baml_src/resume.baml
incremental: true
- text: "Add our starter agent, a single baml prompt that we'll build on"
file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- text: "Generate BAML client code"
command: |
npx baml-cli generate
incremental: true
- text: "Enable BAML logging for this section"
command: |
export BAML_LOG=debug
- text: "Add the CLI interface"
file: {src: ./walkthrough/01-cli.ts, dest: src/cli.ts}
- text: "Update index.ts to use the CLI"
file: {src: ./walkthrough/01-index.ts, dest: src/index.ts}
- text: "Add the agent implementation"
file: {src: ./walkthrough/01-agent.ts, dest: src/agent.ts}
- text: |
The BAML code is configured to use BASETEN_API_KEY by default
To get a Baseten API key and URL, create an account at [baseten.co](https://baseten.co),
and then deploy [Qwen3 32B from the model library](https://www.baseten.co/library/qwen-3-32b/).
```rust
function DetermineNextStep(thread: string) -> DoneForNow {
client Qwen3
// ...
```
If you want to run the example with no changes, you can set the BASETEN_API_KEY env var to any valid baseten key.
If you want to try swapping out the model, you can change the `client` line.
[Docs on baml clients can be found here](https://docs.boundaryml.com/guide/baml-basics/switching-llms)
For example, you can configure [gemini](https://docs.boundaryml.com/ref/llm-client-providers/google-ai-gemini)
or [anthropic](https://docs.boundaryml.com/ref/llm-client-providers/anthropic) as your model provider.
For example, to use openai with an OPENAI_API_KEY, you can do:
client "openai/gpt-4o"
- text: Set your env vars
command: |
export BASETEN_API_KEY=...
export BASETEN_BASE_URL=...
- text: "Try it out"
command: |
npx tsx src/index.ts hello
results:
- text: you should see a familiar response from the model
code: |
{
intent: 'done_for_now',
message: 'Hello! How can I assist you today?'
}
- name: calculator-tools
title: "Chapter 2 - Add Calculator Tools"
text: "Let's add some calculator tools to our agent."
steps:
- text: |
Let's start by adding a tool definition for the calculator
These are simple structured outputs that we'll ask the model to
return as a "next step" in the agentic loop.
file: {src: ./walkthrough/02-tool_calculator.baml, dest: baml_src/tool_calculator.baml}
- text: |
Now, let's update the agent's DetermineNextStep method to
expose the calculator tools as potential next steps
file: {src: ./walkthrough/02-agent.baml, dest: baml_src/agent.baml}
- text: "Generate updated BAML client"
command: |
npx baml-cli generate
incremental: true
- text: "Try out the calculator"
command: |
npx tsx src/index.ts 'can you add 3 and 4'
results:
- text: "You should see a tool call to the calculator"
code: |
{
intent: 'add',
a: 3,
b: 4
}
- name: tool-loop
title: "Chapter 3 - Process Tool Calls in a Loop"
text: "Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
steps:
- text: |
First, let's update the agent to handle the tool call
file: {src: ./walkthrough/03-agent.ts, dest: src/agent.ts}
- text: |
Now, let's try it out
command: |
npx tsx src/index.ts 'can you add 3 and 4'
results:
- text: you should see the agent call the tool and then return the result
code: |
{
intent: 'done_for_now',
message: 'The sum of 3 and 4 is 7.'
}
- text: "For the next step, we'll do a more complex calculation, let's turn off the baml logs for more concise output"
command: |
export BAML_LOG=off
- text: "Try a multi-step calculation"
command: |
npx tsx src/index.ts 'can you add 3 and 4, then add 6 to that result'
- text: "you'll notice that tools like multiply and divide are not available"
command: |
npx tsx src/index.ts 'can you multiply 3 and 4'
- text: |
next, let's add handlers for the rest of the calculator tools
file: {src: ./walkthrough/03b-agent.ts, dest: src/agent.ts}
- text: "Test subtraction"
command: |
npx tsx src/index.ts 'can you subtract 3 from 4'
- text: |
now, let's test the multiplication tool
command: |
npx tsx src/index.ts 'can you multiply 3 and 4'
- text: |
finally, let's test a more complex calculation with multiple operations
command: |
npx tsx src/index.ts 'can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result'
- text: |
congratulations, you've taken your first step into hand-rolling an agent loop.
from here, we're going to start incorporating some more intermediate and advanced
concepts for 12-factor agents.
- name: baml-tests
title: "Chapter 4 - Add Tests to agent.baml"
text: "Let's add some tests to our BAML agent."
steps:
- text: to start, leave the baml logs enabled
command: |
export BAML_LOG=debug
- text: |
next, let's add some tests to the agent
We'll start with a simple test that checks the agent's ability to handle
a basic calculation.
file: {src: ./walkthrough/04-agent.baml, dest: baml_src/agent.baml}
- text: "Run the tests"
command: |
npx baml-cli test
- text: |
now, let's improve the test with assertions!
Assertions are a great way to make sure the agent is working as expected,
and can easily be extended to check for more complex behavior.
file: {src: ./walkthrough/04b-agent.baml, dest: baml_src/agent.baml}
- text: "Run the tests"
command: |
npx baml-cli test
- text: |
as you add more tests, you can disable the logs to keep the output clean.
You may want to turn them on as you iterate on specific tests.
command: |
export BAML_LOG=off
- text: |
now, let's add some more complex test cases,
where we resume from the middle of an in-progress
agentic context window
file: {src: ./walkthrough/04c-agent.baml, dest: baml_src/agent.baml}
- text: |
let's try to run it
command: |
npx baml-cli test
- name: human-tools
title: "Chapter 5 - Multiple Human Tools"
text: |
In this section, we'll add support for multiple tools that serve to
contact humans.
steps:
- text: "for this section, we'll disable the baml logs. You can optionally enable them if you want to see more details."
command: |
export BAML_LOG=off
- text: |
first, let's add a tool that can request clarification from a human
this will be different from the "done_for_now" tool,
and can be used to more flexibly handle different types of human interactions
in your agent.
file: {src: ./walkthrough/05-agent.baml, dest: baml_src/agent.baml}
- text: |
next, let's re-generate the client code
NOTE - if you're using the VSCode extension for BAML,
the client will be regenerated automatically when you save the file
in your editor.
command: |
npx baml-cli generate
incremental: true
- text: |
now, let's update the agent to use the new tool
file: {src: ./walkthrough/05-agent.ts, dest: src/agent.ts}
- text: |
next, let's update the CLI to handle clarification requests
by requesting input from the user on the CLI
file: {src: ./walkthrough/05-cli.ts, dest: src/cli.ts}
- text: |
let's try it out
command: |
npx tsx src/index.ts 'can you multiply 3 and FD*(#F&& '
- text: |
next, let's add a test that checks the agent's ability to handle
a clarification request
file: {src: ./walkthrough/05b-agent.baml, dest: baml_src/agent.baml}
- text: |
and now we can run the tests again
command: |
npx baml-cli test
- text: |
you'll notice the new test passes, but the hello world test fails
This is because the agent's default behavior is to return "done_for_now"
file: {src: ./walkthrough/05c-agent.baml, dest: baml_src/agent.baml}
- text: "Verify tests pass"
command: |
npx baml-cli test
- name: customize-prompt
title: "Chapter 6 - Customize Your Prompt with Reasoning"
text: |
In this section, we'll explore how to customize the prompt of the agent
with reasoning steps.
this is core to [factor 2 - own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
there's a deep dive on reasoning on AI That Works [reasoning models versus reasoning steps](https://github.com/hellovai/ai-that-works/tree/main/2025-04-07-reasoning-models-vs-prompts)
steps:
- text: "for this section, it will be helpful to leave the baml logs enabled"
command: |
export BAML_LOG=debug
- text: |
update the agent prompt to include a reasoning step
file: {src: ./walkthrough/06-agent.baml, dest: baml_src/agent.baml}
- text: generate the updated client
command: |
npx baml-cli generate
incremental: true
- text: |
now, you can try it out with a simple prompt
command: |
npx tsx src/index.ts 'can you multiply 3 and 4'
results:
- text: you should see output from the baml logs showing the reasoning steps
- text: |
#### optional challenge
add a field to your tool output format that includes the reasoning steps in the output!
- name: context-window
title: "Chapter 7 - Customize Your Context Window"
text: |
In this section, we'll explore how to customize the context window
of the agent.
this is core to [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
steps:
- text: |
update the agent to pretty-print the Context window for the model
file: {src: ./walkthrough/07-agent.ts, dest: src/agent.ts}
- text: "Test the formatting"
command: |
BAML_LOG=info npx tsx src/index.ts 'can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result'
- text: |
next, let's update the agent to use XML formatting instead
this is a very popular format for passing data to a model,
among other things, because of the token efficiency of XML.
file: {src: ./walkthrough/07b-agent.ts, dest: src/agent.ts}
- text: |
let's try it out
command: |
BAML_LOG=info npx tsx src/index.ts 'can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result'
- text: |
let's update our tests to match the new output format
file: {src: ./walkthrough/07c-agent.baml, dest: baml_src/agent.baml}
- text: |
check out the updated tests
command: |
npx baml-cli test
- name: api-endpoints
title: "Chapter 8 - Adding API Endpoints"
text: "Add an Express server to expose the agent via HTTP."
steps:
- text: "for this section, we'll disable the baml logs. You can optionally enable them if you want to see more details."
command: |
export BAML_LOG=off
- text: "Install Express and types"
command: |
npm install express && npm install --save-dev @types/express supertest
incremental: true
- text: "Add the server implementation"
file: {src: ./walkthrough/08-server.ts, dest: src/server.ts}
- text: "Start the server"
command: |
npx tsx src/server.ts
- text: "Test with curl (in another terminal)"
command: |
curl -X POST http://localhost:3000/thread \
-H "Content-Type: application/json" \
-d '{"message":"can you add 3 and 4"}'
results:
- text: |
You should get an answer from the agent which includes the
agentic trace, ending in a message like:
code: |
{"intent":"done_for_now","message":"The sum of 3 and 4 is 7."}
- name: state-management
title: "Chapter 9 - In-Memory State and Async Clarification"
text: "Add state management and async clarification support."
steps:
- text: "for this section, we'll disable the baml logs. You can optionally enable them if you want to see more details."
command: |
export BAML_LOG=off
- text: "Add some simple in-memory state management for threads"
file: {src: ./walkthrough/09-state.ts, dest: src/state.ts}
- text: |
update the server to use the state management
* Add thread state management using `ThreadStore`
* return thread IDs and response URLs from the /thread endpoint
* implement GET /thread/:id
* implement POST /thread/:id/response
file: {src: ./walkthrough/09-server.ts, dest: src/server.ts}
- text: "Start the server"
command: |
npx tsx src/server.ts
- text: "Test clarification flow"
command: |
curl -X POST http://localhost:3000/thread \
-H "Content-Type: application/json" \
-d '{"message":"can you multiply 3 and xyz"}'
- name: human-approval
title: "Chapter 10 - Adding Human Approval"
text: "Add support for human approval of operations."
steps:
- text: "for this section, we'll disable the baml logs. You can optionally enable them if you want to see more details."
command: |
export BAML_LOG=off
- text: |
update the server to handle human approvals
* Import `handleNextStep` to execute approved actions
* Add two payload types to distinguish approvals from responses
* Handle responses and approvals differently in the endpoint
* Show better error messages when things go wrong
file: {src: ./walkthrough/10-server.ts, dest: src/server.ts}
- text: "Add a few methods to the agent to handle approvals and responses"
file: {src: ./walkthrough/10-agent.ts, dest: src/agent.ts}
- text: "Start the server"
command: |
npx tsx src/server.ts
- text: "Test division with approval"
command: |
curl -X POST http://localhost:3000/thread \
-H "Content-Type: application/json" \
-d '{"message":"can you divide 3 by 4"}'
results:
- text: "You should see:"
code: |
{
"thread_id": "2b243b66-215a-4f37-8bc6-9ace3849043b",
"events": [
{
"type": "user_input",
"data": "can you divide 3 by 4"
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 3,
"b": 4,
"response_url": "/thread/2b243b66-215a-4f37-8bc6-9ace3849043b/response"
}
}
]
}
- text: "reject the request with another curl call, changing the thread ID"
command: |
curl -X POST 'http://localhost:3000/thread/{thread_id}/response' \
-H "Content-Type: application/json" \
-d '{"type": "approval", "approved": false, "comment": "I dont think thats right, use 5 instead of 4"}'
results:
- text: 'You should see: the last tool call is now `"intent":"divide","a":3,"b":5`'
code: |
{
"events": [
{
"type": "user_input",
"data": "can you divide 3 by 4"
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 3,
"b": 4,
"response_url": "/thread/2b243b66-215a-4f37-8bc6-9ace3849043b/response"
}
},
{
"type": "tool_response",
"data": "user denied the operation with feedback: \"I dont think thats right, use 5 instead of 4\""
},
{
"type": "tool_call",
"data": {
"intent": "divide",
"a": 3,
"b": 5,
"response_url": "/thread/1f1f5ff5-20d7-4114-97b4-3fc52d5e0816/response"
}
}
]
}
- text: "now you can approve the operation"
command: |
curl -X POST 'http://localhost:3000/thread/{thread_id}/response' \
-H "Content-Type: application/json" \
-d '{"type": "approval", "approved": true}'
results:
- text: "you should see the final message includes the tool response and final result!"
code: |
...
{
"type": "tool_response",
"data": 0.5
},
{
"type": "done_for_now",
"message": "I divided 3 by 6 and the result is 0.5. If you have any more operations or queries, feel free to ask!",
"response_url": "/thread/2b469403-c497-4797-b253-043aae830209/response"
}
- name: humanlayer-approval
title: "Chapter 11 - Human Approvals over email"
text: |
in this section, we'll add support for human approvals over email.
This will start a little bit contrived, just to get the concepts down -
We'll start by invoking the workflow from the CLI but approvals for `divide`
and `request_more_information` will be handled over email,
then the final `done_for_now` answer will be printed back to the CLI
While contrived, this is a great example of the flexibility you get from
[factor 7 - contact humans with tools](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-7-contact-humans-with-tools.md)
steps:
- text: "for this section, we'll disable the baml logs. You can optionally enable them if you want to see more details."
command: |
export BAML_LOG=off
- text: "Install HumanLayer"
command: |
npm install humanlayer
incremental: true
- text: "Update CLI to send `divide` and `request_more_information` to a human via email"
file: {src: ./walkthrough/11-cli.ts, dest: src/cli.ts}
- text: "Run the CLI"
command: |
npx tsx src/index.ts 'can you divide 4 by 5'
results:
- text: "The last line of your program should mention human review step"
code: |
nextStep { intent: 'divide', a: 4, b: 5 }
HumanLayer: Requested human approval from HumanLayer cloud
- text: |
go ahead and respond to the email with some feedback:
![reject-email](https://github.com/humanlayer/12-factor-agents/blob/main/workshops/2025-05/walkthrough/11-email-reject.png?raw=true)
- text: |
you should get another email with an updated attempt based on your feedback!
You can go ahead and approve this one:
![approve-email](https://github.com/humanlayer/12-factor-agents/blob/main/workshops/2025-05/walkthrough/11-email-approve.png?raw=true)
results:
- text: and your final output will look like
code: |
nextStep {
intent: 'done_for_now',
message: 'The division of 4 by 5 is 0.8. If you have any other calculations or questions, feel free to ask!'
}
The division of 4 by 5 is 0.8. If you have any other calculations or questions, feel free to ask!
- text: |
let's implement the `request_more_information` flow as well
file: {src: ./walkthrough/11b-cli.ts, dest: src/cli.ts}
- text: |
let's test the require_approval flow by asking for a calculation
with garbled input:
command: |
npx tsx src/index.ts 'can you multiply 4 and xyz'
- text: "You should get an email with a request for clarification"
command: |
Can you clarify what 'xyz' represents in this context? Is it a specific number, variable, or something else?
- text: you can respond with something like
command: |
use 8 instead of xyz
results:
- text: you should see a final result on the CLI like
code: |
I have multiplied 4 and xyz, using the value 8 for xyz, resulting in 32.
- text: |
as a final step, let's explore using a custom HTML template for the email
file: {src: ./walkthrough/11c-cli.ts, dest: src/cli.ts}
- text: |
first try with divide:
command: |
npx tsx src/index.ts 'can you divide 4 by 5'
results:
- text: |
you should see a slightly different email with the custom template
![custom-template-email](https://github.com/humanlayer/12-factor-agents/blob/main/workshops/2025-05/walkthrough/11-email-custom.png?raw=true)
feel free to run with the flow and then you can try updating the template to your liking
(if you're using cursor, something as simple as highlighting the template and asking to "make it better"
should do the trick)
try triggering "request_more_information" as well!
- text: |
thats it - in the next chapter, we'll build a fully email-driven
workflow agent that uses webhooks for human approval
- name: humanlayer-webhook
title: "Chapter XX - HumanLayer Webhook Integration"
text: |
the previous sections used the humanlayer SDK in "synchronous mode" - that
means every time we wait for human approval, we sit in a loop
polling until the human response is received.
That's obviously not ideal, especially for production workloads,
so in this section we'll implement [factor 6 - launch / pause / resume with simple APIs](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-6-launch-pause-resume.md)
by updating the server to end processing after contacting a human, and use webhooks to receive the results.
steps:
- text: |
add code to initialize humanlayer in the server
file: {src: ./walkthrough/12-1-server-init.ts, dest: src/server.ts}
- text: |
next, let's update the /thread endpoint to
1. handle requests asynchronously, returning immediately
2. create a human contact on request_more_information and done_for_now calls
# file: {src: }
- text: |
Update the server to be able to handle request_clarification responses
- remove the old /response endpoint and types
- update the /thread endpoint to run processing asynchronously, return immediately
- send a state.threadId when requesting human responses
- add a handleHumanResponse function to process the human response
- add a /webhook endpoint to handle the webhook response
file: {src: ./walkthrough/12a-server.ts, dest: src/server.ts}
- text: "Start the server in another terminal"
command: |
npx tsx src/server.ts
- text: |
now that the server is running, send a payload to the '/thread' endpoint
- text: __ do the response step
- text: __ now handle approvals for divide
- text: __ now also handle done_for_now

View File

@@ -1,5 +1,5 @@
title: "Building the 12-factor agent template from scratch in python"
text: "Steps to start from a bare python repo and build up a 12-factor agent. This walkthrough will guide you through creating a python agent that follows the 12-factor methodology with baml."
title: "Building the 12-factor agent template from scratch in Python"
text: "Steps to start from a bare Python repo and build up a 12-factor agent. This walkthrough will guide you through creating a Python agent that follows the 12-factor methodology with BAML."
targets:
- ipynb: "./build/workshop-2025-07-16.ipynb"
@@ -7,41 +7,356 @@ targets:
sections:
- name: hello-world
title: "Chapter 0 - Hello World"
text: "Let's start with a basic python setup and a hello world program."
text: "Let's start with a basic Python setup and a hello world program."
steps:
- text: "Let's start with a basic python setup and a hello world program."
file: {src: ./walkthrough/00-main.py}
- text: "lets run it"
command: <something to tell our script to run the main function>
- text: |
This guide will walk you through building agents in Python with BAML.
We'll start simple with a hello world program and gradually build up to a full agent.
For this notebook, you'll need to have your OpenAI API key saved in Google Colab secrets.
- text: "Here's our simple hello world program:"
- file: {src: ./walkthrough/00-main.py}
- text: "Let's run it to verify it works:"
- run_main: {regenerate_baml: false}
- name: cli-and-agent
title: "Chapter 1 - CLI and Agent Loop"
text: "Now let's add BAML and create our first agent with a CLI interface."
steps:
- text: |
First, we'll need to install [BAML](https://github.com/boundaryml/baml)
which is a tool for prompting and structured outputs.
command: |
!pip install baml-py
- text: "Initialize BAML"
command: |
!baml-cli init
- text: "Remove default resume.baml"
command: |
!rm baml_src/resume.baml
- text: "Add our starter agent, a single baml prompt that we'll build on"
file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- text: "Generate BAML client code"
command: |
!baml-cli generate
- text: "Enable BAML logging for this section"
command: |
export BAML_LOG=debug
- text: "Add the CLI interface"
file: {src: ./walkthrough/01-cli.py}
- text: "Add the agent implementation"
file: {src: ./walkthrough/01-agent.py}
- text: "update our main.py to use the CLI"
file: {src: ./walkthrough/01-main.py}
In this chapter, we'll integrate BAML to create an AI agent that can respond to user input.
## What is BAML?
BAML (Boundary Markup Language) is a domain-specific language designed to help developers build reliable AI workflows and agents. Created by [BoundaryML](https://www.boundaryml.com/) (a Y Combinator W23 company), BAML adds the engineering to prompt engineering.
### Why BAML?
- **Type-safe outputs**: Get fully type-safe outputs from LLMs, even when streaming
- **Language agnostic**: Works with Python, TypeScript, Ruby, Go, and more
- **LLM agnostic**: Works with any LLM provider (OpenAI, Anthropic, etc.)
- **Better performance**: State-of-the-art structured outputs that outperform even OpenAI's native function calling
- **Developer-friendly**: Native VSCode extension with syntax highlighting, autocomplete, and interactive playground
### Learn More
- 📚 [Official Documentation](https://docs.boundaryml.com/home)
- 💻 [GitHub Repository](https://github.com/BoundaryML/baml)
- 🎯 [What is BAML?](https://docs.boundaryml.com/guide/introduction/what-is-baml)
- 📖 [BAML Examples](https://github.com/BoundaryML/baml-examples)
- 🏢 [Company Website](https://www.boundaryml.com/)
- 📰 [Blog: AI Agents Need a New Syntax](https://www.boundaryml.com/blog/ai-agents-need-new-syntax)
BAML turns prompt engineering into schema engineering, where you focus on defining the structure of your data rather than wrestling with prompts. This approach leads to more reliable and maintainable AI applications.
### Note on Developer Experience
BAML works much better in VS Code with their official extension, which provides syntax highlighting, autocomplete, inline testing, and an interactive playground. However, for this notebook tutorial, we'll work with BAML files directly without the enhanced IDE features.
First, let's set up BAML support in our notebook.
- baml_setup: true
- command: "!ls baml_src"
- text: |
try it out
command: <something to tell our script to run the main.py>
Now let's create our agent that will use BAML to process user input.
First, we'll define the core agent logic:
- file: {src: ./walkthrough/01-agent.py}
- text: |
Next, we need to define the BAML function that our agent will use.
### Understanding BAML Syntax
BAML files define:
- **Classes**: Structured output schemas (like `DoneForNow` below)
- **Functions**: AI-powered functions that take inputs and return structured outputs
- **Tests**: Example inputs/outputs to validate your prompts
This BAML file defines what our agent can do:
- fetch_file: {src: ./walkthrough/01-agent.baml, dest: baml_src/agent.baml}
- command: "!ls baml_src"
- text: |
Now let's create our main function that accepts a message parameter:
- file: {src: ./walkthrough/01-main.py}
- text: |
Let's test our agent! Try calling main() with different messages:
- `main("What's the weather like?")`
- `main("Tell me a joke")`
- `main("How are you doing today?")`
in this case, we'll use the baml_generate function to
generate the pydantic and python bindings from our
baml source, but in the future we'll skip this step as it
is done automatically by the get_baml_client() function
- run_main: {regenerate_baml: true, args: "Hello from the Python notebook!"}
- text: |
In a few cases, we'll enable the baml debug logs to see the inputs/outputs to and from the model.
- run_main: {regenerate_baml: false, args: "Hello from the Python notebook!", show_logs: true}
- text: |
what's most important here is that you can see the prompt and how the output_format is injected
to tell the model what kind of json we want to return.
- name: calculator-tools
title: "Chapter 2 - Add Calculator Tools"
text: "Let's add some calculator tools to our agent."
steps:
- text: |
Let's start by adding a tool definition for the calculator.
These are simple structured outputs that we'll ask the model to
return as a "next step" in the agentic loop.
- fetch_file: {src: ./walkthrough/02-tool_calculator.baml, dest: baml_src/tool_calculator.baml}
- command: "!ls baml_src"
- text: |
Now, let's update the agent's DetermineNextStep method to
expose the calculator tools as potential next steps.
- fetch_file: {src: ./walkthrough/02-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's update our main function to show the tool call:
- file: {src: ./walkthrough/02-main.py}
- text: |
Let's try out the calculator! The agent should recognize that you want to perform a calculation
and return the appropriate tool call instead of just a message.
- run_main: {regenerate_baml: false, args: "can you add 3 and 4"}
- name: tool-loop
title: "Chapter 3 - Process Tool Calls in a Loop"
text: "Now let's add a real agentic loop that can run the tools and get a final answer from the LLM."
steps:
- text: |
In this chapter, we'll enhance our agent to process tool calls in a loop. This means:
- The agent can call multiple tools in sequence
- Each tool result is fed back to the agent
- The agent continues until it has a final answer
Let's update our agent to handle tool calls properly:
- file: {src: ./walkthrough/03-agent.py}
- text: |
Now let's update our main function to use the new agent loop:
- file: {src: ./walkthrough/03-main.py}
- text: |
Let's try it out! The agent should now call the tool and return the calculated result:
- run_main: {regenerate_baml: false, args: "can you add 3 and 4"}
- text: |
you can run with baml_logs enabled to see how the prompt changed when we added the new
tool types to our union of response types.
- run_main: {regenerate_baml: false, args: "can you add 3 and 4", show_logs: true}
- text: |
You should see the agent:
1. Recognize it needs to use the add tool
2. Call the tool with the correct parameters
3. Get the result (7)
4. Generate a final response incorporating the result
For more complex calculations, we need to handle all calculator operations. Let's add support for subtract, multiply, and divide:
- file: {src: ./walkthrough/03b-agent.py}
- text: |
Now let's test subtraction:
- run_main: {regenerate_baml: false, args: "can you subtract 3 from 4"}
- text: |
Test multiplication:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4"}
- text: |
Finally, let's test a complex multi-step calculation:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2 and then add 12 to that result"}
- text: |
Congratulations! You've taken your first step into hand-rolling an agent loop.
Key concepts you've learned:
- **Thread Management**: Tracking conversation history and tool calls
- **Tool Execution**: Processing different tool types and returning results
- **Agent Loop**: Continuing until the agent has a final answer
From here, we'll start incorporating more intermediate and advanced concepts for 12-factor agents.
- name: baml-tests
title: "Chapter 4 - Add Tests to agent.baml"
text: "Let's add some tests to our BAML agent."
steps:
- text: |
In this chapter, we'll learn about BAML testing - a powerful feature that helps ensure your agents behave correctly.
## Why Test BAML Functions?
- **Catch regressions**: Ensure changes don't break existing behavior
- **Document behavior**: Tests serve as living documentation
- **Validate edge cases**: Test complex scenarios and conversation flows
- **CI/CD integration**: Run tests automatically in your pipeline
Let's start with a simple test that checks the agent's ability to handle basic interactions:
- fetch_file: {src: ./walkthrough/04-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests to see them in action:
- command: "!baml-cli test"
- text: |
Now let's improve the tests with assertions! Assertions let you verify specific properties of the agent's output.
## BAML Assertion Syntax
Assertions use the `@@assert` directive:
```
@@assert(name, {{condition}})
```
- `name`: A descriptive name for the assertion
- `condition`: A boolean expression using `this` to access the output
- fetch_file: {src: ./walkthrough/04b-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the tests again to see assertions in action:
- command: "!baml-cli test"
- text: |
Finally, let's add more complex test cases that test multi-step conversations.
These tests simulate an entire conversation flow, including:
- User input
- Tool calls made by the agent
- Tool responses
- Final agent response
- fetch_file: {src: ./walkthrough/04c-agent.baml, dest: baml_src/agent.baml}
- text: |
Run the comprehensive test suite:
- command: "!baml-cli test"
- text: |
## Key Testing Concepts
1. **Test Structure**: Each test specifies functions, arguments, and assertions
2. **Progressive Testing**: Start simple, then test complex scenarios
3. **Conversation History**: Test how the agent handles multi-turn conversations
4. **Tool Integration**: Verify the agent correctly uses tools in sequence
With these tests in place, you can confidently modify your agent knowing that core functionality is protected by automated tests!
- name: human-tools
title: "Chapter 5 - Multiple Human Tools"
text: |
In this section, we'll add support for multiple tools that serve to contact humans.
steps:
- text: |
So far, our agent only returns a final answer with "done_for_now". But what if the agent needs clarification?
Let's add a new tool that allows the agent to request more information from the user.
## Why Human-in-the-Loop?
- **Handle ambiguous inputs**: When user input is unclear or contains typos
- **Request missing information**: When the agent needs more context
- **Confirm sensitive operations**: Before performing important actions
- **Interactive workflows**: Build conversational agents that engage users
First, let's update our BAML file to include a ClarificationRequest tool:
- fetch_file: {src: ./walkthrough/05-agent.baml, dest: baml_src/agent.baml}
- text: |
Now let's update our agent to handle clarification requests:
- file: {src: ./walkthrough/05-agent.py}
- text: |
Finally, let's create a main function that handles human interaction:
- file: {src: ./walkthrough/05-main.py}
- text: |
Let's test with an ambiguous input that should trigger a clarification request:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and FD*(#F&&"}
- text: |
You should see:
1. The agent recognizes the input is unclear
2. It asks for clarification
3. In Colab, you'll be prompted to type a response
4. In local testing, an auto-response is provided
5. The agent continues with the clarified input
## Interactive Testing in Colab
When running in Google Colab, the `input()` function will create an interactive text box where you can type your response. Try different clarifications to see how the agent adapts!
## Key Concepts
- **Human Tools**: Special tool types that return control to the human
- **Conversation Flow**: The agent can pause execution to get human input
- **Context Preservation**: The full conversation history is maintained
- **Flexible Handling**: Different behaviors for different environments
- name: customize-prompt
title: "Chapter 6 - Customize Your Prompt with Reasoning"
text: |
In this section, we'll explore how to customize the prompt of the agent with reasoning steps.
This is core to [factor 2 - own your prompts](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-2-own-your-prompts.md)
steps:
- text: |
## Why Add Reasoning to Prompts?
Adding explicit reasoning steps to your prompts can significantly improve agent performance:
- **Better decisions**: The model thinks through problems step-by-step
- **Transparency**: You can see the model's thought process
- **Fewer errors**: Structured thinking reduces mistakes
- **Debugging**: Easier to identify where reasoning went wrong
Let's update our agent prompt to include a reasoning step:
- fetch_file: {src: ./walkthrough/06-agent.baml, dest: baml_src/agent.baml}
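- text: |
  The change is conceptually small: the prompt now asks the model to think out loud before choosing a tool. Illustrative wording only (see the fetched `agent.baml` for the exact prompt):
  ```
  Always think about what to do next, step by step, inside <reasoning> tags.
  Then select exactly one next step.
  ```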
- text: |
Now let's test it with a simple calculation to see the reasoning in action:
**Note:** The BAML logs below will show the model's reasoning steps. Look for the `<reasoning>` tags in the logs to see how the model thinks through the problem before deciding what to do.
- run_main: {args: "can you multiply 3 and 4", show_logs: true}
- text: |
You should see the reasoning steps in the BAML logs above. The model explicitly thinks through what it needs to do before making a decision.
💡 **Tip:** If you want to see BAML logs for any other calls in this notebook, you can use the `run_with_baml_logs` helper function:
```python
# Instead of: main("your message")
# Use: run_with_baml_logs(main, "your message")
```
## Advanced Prompt Engineering
You can enhance your prompts further by:
- Adding specific reasoning templates for different tasks
- Including examples of good reasoning
- Structuring the reasoning with numbered steps
- Adding checks for common mistakes
The key is to guide the model's thinking process while still allowing flexibility.
- name: context-window
title: "Chapter 7 - Customize Your Context Window"
text: |
In this section, we'll explore how to customize the context window of the agent.
This is core to [factor 3 - own your context window](https://github.com/humanlayer/12-factor-agents/blob/main/content/factor-3-own-your-context-window.md)
steps:
- text: |
## Context Window Serialization
How you format your conversation history can significantly impact:
- **Token usage**: Some formats are more efficient
- **Model understanding**: Clear structure helps the model
- **Debugging**: Readable formats help development
Let's implement two serialization formats: pretty-printed JSON and XML.
- file: {src: ./walkthrough/07-agent.py}
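- text: |
  For intuition, the two formats look roughly like this (a simplified sketch; the fetched file defines the real event types and serializers):
  ```python
  import json

  def serialize_json(events: list[dict]) -> str:
      # Pretty-printed JSON: familiar, but quotes and braces cost tokens.
      return json.dumps(events, indent=2)

  def serialize_xml(events: list[dict]) -> str:
      # One tag per event: compact and visually easy to scan.
      parts = [f"<{e['type']}>\n{e['data']}\n</{e['type']}>" for e in events]
      return "\n\n".join(parts)
  ```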
- text: |
Now let's create a main function that can switch between formats:
- file: {src: ./walkthrough/07-main.py}
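- text: |
  Below we call the same `main` with a `use_xml` flag to flip between the two formats, i.e. something like:
  ```python
  main("can you multiply 3 and 4, then divide the result by 2", use_xml=False)  # JSON context
  main("can you multiply 3 and 4, then divide the result by 2", use_xml=True)   # XML context
  ```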
- text: |
Let's test with JSON format first:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: false}}
- text: |
Now let's try the same with XML format:
- run_main: {regenerate_baml: false, args: "can you multiply 3 and 4, then divide the result by 2", kwargs: {use_xml: true}}
- text: |
## XML vs JSON Trade-offs
**XML Benefits**:
- More token-efficient for nested data
- Clear hierarchy with opening/closing tags
- Better for long conversations
**JSON Benefits**:
- Familiar to most developers
- Easy to parse and debug
- Native to JavaScript, with first-class support in Python
Choose based on your specific needs and token constraints!