<CourseFloatingBanner chapter={2}
  classNames="absolute z-10 right-0 top-0"
  notebooks={[
    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/multiagent_notebook.ipynb"},
  ]} />

# Multi-Agent Systems

Multi-agent systems enable **specialized agents to collaborate on complex tasks**, improving modularity, scalability, and robustness. Instead of relying on a single agent, tasks are distributed among agents with distinct capabilities.

In **smolagents**, different agents can be combined to generate Python code, call external tools, perform web searches, and more. By orchestrating these agents, we can create powerful workflows.

A typical setup might include:
- A **Manager Agent** for task delegation
- A **Code Interpreter Agent** for code execution
- A **Web Search Agent** for information retrieval

The diagram below illustrates a simple multi-agent architecture where a **Manager Agent** coordinates a **Code Interpreter Tool** and a **Web Search Agent**, which in turn utilizes tools like the `DuckDuckGoSearchTool` and `VisitWebpageTool` to gather relevant information.

<img src="https://mermaid.ink/img/pako:eNp1kc1qhTAQRl9FUiQb8wIpdNO76eKubrmFks1oRg3VSYgjpYjv3lFL_2hnMWQOJwn5sqgmelRWleUSKLAtFs09jqhtoWuYUFfFAa6QA9QDTnpzamheuhxn8pt40-6l13UtS0ddhtQXj6dbR4XUGQg6zEYasTF393KjeSDGnDJKNxzj8I_7hLW5IOSmP9CH9hv_NL-d94d4DVNg84p1EnK4qlIj5hGClySWbadT-6OdsrL02MI8sFOOVkciw8zx8kaNspxnrJQE0fXKtjBMMs3JA-MpgOQwftIE9Bzj14w-cMznI_39E9Z3p0uFoA?type=png" style='background: white;'>

## Multi-Agent Systems in Action

A multi-agent system consists of multiple specialized agents working together under the coordination of an **Orchestrator Agent**. This approach enables complex workflows by distributing tasks among agents with distinct roles.

For example, a **Multi-Agent RAG system** can integrate:
- A **Web Agent** for browsing the internet.
- A **Retriever Agent** for fetching information from knowledge bases.
- An **Image Generation Agent** for producing visuals.

All of these agents operate under an orchestrator that manages task delegation and interaction.

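To make the delegation pattern concrete before bringing in any library, here is a minimal plain-Python sketch. Everything in it is hypothetical: the three worker functions stand in for real agents, and the `Orchestrator` class is only an illustration of the routing idea, not how smolagents implements it.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for specialized agents: each maps a request to a result.
def web_agent(request: str) -> str:
    return f"[web] results for: {request}"


def retriever_agent(request: str) -> str:
    return f"[retriever] documents about: {request}"


def image_agent(request: str) -> str:
    return f"[image] picture of: {request}"


class Orchestrator:
    """Routes each sub-task to the worker registered under its role."""

    def __init__(self, workers: Dict[str, Callable[[str], str]]):
        self.workers = workers

    def delegate(self, role: str, request: str) -> str:
        if role not in self.workers:
            raise KeyError(f"No agent registered for role '{role}'")
        return self.workers[role](request)

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        # A real orchestrator would let an LLM write this plan; here it is given.
        return [self.delegate(role, request) for role, request in plan]


orchestrator = Orchestrator(
    {"web": web_agent, "retriever": retriever_agent, "image": image_agent}
)
results = orchestrator.run(
    [("web", "Batman filming locations"), ("image", "a map of Gotham")]
)
print(results)
```

Each worker only ever sees the request it was handed, which is the property the rest of this page relies on.
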
## Solving a complex task with a multi-agent hierarchy

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/multiagent_notebook.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
</Tip>

The reception is approaching! With your help, Alfred is now nearly finished with the preparations.

But now there's a problem: the Batmobile has disappeared. Alfred needs to find a replacement, and find it quickly.

Fortunately, a few biopics have been made about Bruce Wayne's life, so maybe Alfred could get a car left behind on one of the movie sets and re-engineer it up to modern standards, which would certainly include a full self-driving option.

But the car could be at any of the filming locations around the world, and there could be many of them.

So Alfred wants your help. Could you build an agent able to solve this task?

> 👉 Find all Batman filming locations in the world, calculate the cargo plane transfer time to each of them, and represent them on a map, with a color varying by cargo plane transfer time. Also represent some supercar factories with the same cargo plane transfer time.

Let's build this!

This example needs some additional packages, so let's install them first:

```bash
pip install 'smolagents[litellm]' plotly matplotlib geopandas shapely kaleido -q
```

### We first make a tool to get the cargo plane transfer time.

```python
import math
from typing import Optional, Tuple

from smolagents import tool


@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0,  # Average speed for cargo planes
) -> float:
    """
    Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))
    """

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # Extract coordinates
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # Earth's radius in kilometers
    EARTH_RADIUS_KM = 6371.0

    # Calculate great-circle distance using the haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # Add 10% to account for non-direct routes and air traffic controls
    actual_distance = distance * 1.1

    # Calculate flight time
    # Add 1 hour for takeoff and landing procedures
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # Format the results
    return round(flight_time, 2)


print(calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093)))
```

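If you want to sanity-check the haversine step on its own, the sketch below recomputes the great-circle distance without the `@tool` decorator, so it runs with the standard library only. The helper name `great_circle_km` is ours, not part of smolagents, and the `min(1.0, a)` clamp is an extra safeguard the tool above does not have.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same mean radius as in the tool above


def great_circle_km(origin, destination):
    """Haversine distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # Clamp to 1.0 so floating-point rounding never pushes sqrt(a) outside asin's domain
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# A quarter of the equator should be (pi / 2) * R, and a point is 0 km from itself.
quarter_equator = great_circle_km((0.0, 0.0), (0.0, 90.0))
same_point = great_circle_km((40.7128, -74.0060), (40.7128, -74.0060))
print(round(quarter_equator, 1), same_point)
```

Checking against two cases with a known answer is a cheap way to catch a swapped latitude/longitude or a missing degree-to-radian conversion before handing the tool to an agent.
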
### Setting up the agent

For the model provider, we use Together AI, one of the new [inference providers on the Hub](https://huggingface.co/blog/inference-providers)!

The `GoogleSearchTool` searches the web through either SerpApi or the [Serper API](https://serper.dev). You need to either set the `SERPAPI_API_KEY` environment variable and pass `provider="serpapi"`, or set `SERPER_API_KEY` and pass `provider="serper"`.

If you don't have either provider set up, you can use `DuckDuckGoSearchTool` instead, but beware that it is rate-limited.

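One way to make that choice automatic is to look at which API key is present. The helper below is a hypothetical sketch (the name `pick_search_provider` is ours); it only inspects environment variables, so it runs without smolagents installed.

```python
import os


def pick_search_provider(env=os.environ):
    """Return "serpapi", "serper", or "duckduckgo" depending on which key is set."""
    if env.get("SERPAPI_API_KEY"):
        return "serpapi"
    if env.get("SERPER_API_KEY"):
        return "serper"
    return "duckduckgo"  # no key found: fall back to the rate-limited tool


provider = pick_search_provider()
print(f"Selected search provider: {provider}")

# With smolagents installed, the choice could then be mapped to a tool, e.g.:
#   search_tool = (
#       DuckDuckGoSearchTool() if provider == "duckduckgo"
#       else GoogleSearchTool(provider=provider)
#   )
```
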
```python
import os
from PIL import Image
from smolagents import CodeAgent, GoogleSearchTool, HfApiModel, VisitWebpageTool

model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
```

We can start by creating a simple agent as a baseline to give us a simple report.

```python
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
Also give me some supercar factories with the same cargo plane transfer time."""
```

```python
agent = CodeAgent(
    model=model,
    tools=[GoogleSearchTool("serper"), VisitWebpageTool(), calculate_cargo_travel_time],
    additional_authorized_imports=["pandas"],
    max_steps=20,
)
```

```python
result = agent.run(task)
```

```python
result
```

In our case, it generates this output:

```python
|    | Location                                           | Travel Time to Gotham (hours) |
|----|----------------------------------------------------|-------------------------------|
| 0  | Necropolis Cemetery, Glasgow, Scotland, UK         | 8.60                          |
| 1  | St. George's Hall, Liverpool, England, UK          | 8.81                          |
| 2  | Two Temple Place, London, England, UK              | 9.17                          |
| 3  | Wollaton Hall, Nottingham, England, UK             | 9.00                          |
| 4  | Knebworth House, Knebworth, Hertfordshire, UK      | 9.15                          |
| 5  | Acton Lane Power Station, Acton Lane, Acton, UK    | 9.16                          |
| 6  | Queensboro Bridge, New York City, USA              | 1.01                          |
| 7  | Wall Street, New York City, USA                    | 1.00                          |
| 8  | Mehrangarh Fort, Jodhpur, Rajasthan, India         | 18.34                         |
| 9  | Turda Gorge, Turda, Romania                        | 11.89                         |
| 10 | Chicago, USA                                       | 2.68                          |
| 11 | Hong Kong, China                                   | 19.99                         |
| 12 | Cardington Studios, Northamptonshire, UK           | 9.10                          |
| 13 | Warner Bros. Leavesden Studios, Hertfordshire, UK  | 9.13                          |
| 14 | Westwood, Los Angeles, CA, USA                     | 6.79                          |
| 15 | Woking, UK (McLaren)                               | 9.13                          |
```

We could already improve this a bit by throwing in some dedicated planning steps, and adding more prompting.

Planning steps allow the agent to think ahead and plan its next steps, which can be useful for more complex tasks.

```python
agent.planning_interval = 4

detailed_report = agent.run(f"""
You're an expert analyst. You make comprehensive reports after visiting many websites.
Don't hesitate to search for many queries at once in a for loop.
For each data point that you find, visit the source url to confirm numbers.

{task}
""")

print(detailed_report)
```

```python
detailed_report
```

In our case, it generates this output:

```python
|    | Location                                         | Travel Time (hours) |
|----|--------------------------------------------------|---------------------|
| 0  | Bridge of Sighs, Glasgow Necropolis, Glasgow, UK | 8.6                 |
| 1  | Wishart Street, Glasgow, Scotland, UK            | 8.6                 |
```

Thanks to these quick changes, we obtained a much more concise report simply by giving our agent a detailed prompt and planning capabilities!

However, the model's context window is filling up quickly. So **if we ask our agent to combine the results of a detailed search with another one, it will be slower and will quickly ramp up tokens and costs**.

➡️ We need to improve the structure of our system.

### ✌️ Splitting the task between two agents

Multi-agent structures let us separate memories between different sub-tasks, with two great benefits:
- Each agent is more focused on its core task, and thus performs better.
- Separating memories reduces the count of input tokens at each step, thus reducing latency and cost.

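To get a rough feel for the second benefit, here is a back-of-the-envelope sketch. It assumes a deliberately simplified cost model that is ours, not a measurement of smolagents: every step adds one fixed-size chunk to the agent's memory, and step `k` re-reads all `k` chunks accumulated so far as input. The 2,000 tokens per chunk and the 15/5 split of steps are made-up numbers.

```python
def cumulative_input_tokens(n_steps: int, tokens_per_step: int) -> int:
    """Total input tokens when step k re-reads the k chunks accumulated so far."""
    return sum(step * tokens_per_step for step in range(1, n_steps + 1))


TOKENS_PER_STEP = 2_000  # assumed average size of one step (code + observation)

# One agent doing everything in 20 steps, all in the same memory
single_agent = cumulative_input_tokens(20, TOKENS_PER_STEP)

# Same work split in two: a web agent runs 15 steps in its own memory,
# and the manager runs 5 steps, seeing only the web agent's final report.
split_agents = cumulative_input_tokens(15, TOKENS_PER_STEP) + cumulative_input_tokens(
    5, TOKENS_PER_STEP
)

print(f"single agent: {single_agent:,} input tokens")
print(f"two agents:   {split_agents:,} input tokens")
```

Under these assumptions the split setup reads 270,000 input tokens instead of 420,000, because the cost of re-reading memory grows quadratically with the number of steps kept in one context.
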
Let's create a team with a dedicated web search agent, managed by another agent.

The manager agent should have plotting capabilities to write its final report, so let us give it access to additional imports, including `plotly`, and `geopandas` + `shapely` for spatial plotting.

```python
model = HfApiModel(
    "Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=8096
)

web_agent = CodeAgent(
    model=model,
    tools=[
        GoogleSearchTool(provider="serper"),
        VisitWebpageTool(),
        calculate_cargo_travel_time,
    ],
    name="web_agent",
    description="Browses the web to find information",
    verbosity_level=0,
    max_steps=10,
)
```

The manager agent will need to do some mental heavy lifting.

So we give it the stronger model [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), and add a `planning_interval` to the mix.

```python
from smolagents.utils import encode_image_base64, make_image_url
from smolagents import OpenAIServerModel


def check_reasoning_and_plot(final_answer, agent_memory):
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens=8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    # Trailing spaces keep the concatenated sentences from running together
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer)."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    if "FAIL" in output:
        raise Exception(output)
    return True


manager_agent = CodeAgent(
    model=HfApiModel("deepseek-ai/DeepSeek-R1", provider="together", max_tokens=8096),
    tools=[calculate_cargo_travel_time],
    managed_agents=[web_agent],
    additional_authorized_imports=[
        "geopandas",
        "plotly",
        "shapely",
        "json",
        "pandas",
        "numpy",
    ],
    planning_interval=5,
    verbosity_level=2,
    final_answer_checks=[check_reasoning_and_plot],
    max_steps=15,
)
```

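Not every final-answer check needs an LLM. As a minimal sketch, the hypothetical `check_map_file_saved` below follows the same convention as `check_reasoning_and_plot` above (take `final_answer` and `agent_memory`, return `True` or raise) but only verifies that a PNG file was actually written. The function name and the extra `filepath` parameter are assumptions for illustration.

```python
import os

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def check_map_file_saved(final_answer, agent_memory=None, filepath="saved_map.png"):
    """Return True if `filepath` exists and starts with the PNG signature, else raise."""
    if not os.path.exists(filepath):
        raise FileNotFoundError(f"Make sure to save the plot under {filepath}!")
    with open(filepath, "rb") as f:
        header = f.read(len(PNG_SIGNATURE))
    if header != PNG_SIGNATURE:
        raise ValueError(f"{filepath} exists but does not look like a PNG image")
    return True
```

A cheap check like this could sit alongside the LLM-based one in `final_answer_checks`, so that an obviously missing file is reported with a precise error message.
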
Let us inspect what this team looks like:

```python
manager_agent.visualize()
```

This will generate something like this, helping us understand the structure and relationship between agents and tools used:

```python
CodeAgent | deepseek-ai/DeepSeek-R1
├── ✅ Authorized imports: ['geopandas', 'plotly', 'shapely', 'json', 'pandas', 'numpy']
├── 🛠️ Tools:
│   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
│   ┃ Name                        ┃ Description                           ┃ Arguments                             ┃
│   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   │ calculate_cargo_travel_time │ Calculate the travel time for a cargo │ origin_coords (`array`): Tuple of     │
│   │                             │ plane between two points on Earth     │ (latitude, longitude) for the         │
│   │                             │ using great-circle distance.          │ starting point                        │
│   │                             │                                       │ destination_coords (`array`): Tuple   │
│   │                             │                                       │ of (latitude, longitude) for the      │
│   │                             │                                       │ destination                           │
│   │                             │                                       │ cruising_speed_kmh (`number`):        │
│   │                             │                                       │ Optional cruising speed in km/h       │
│   │                             │                                       │ (defaults to 750 km/h for typical     │
│   │                             │                                       │ cargo planes)                         │
│   │ final_answer                │ Provides a final answer to the given  │ answer (`any`): The final answer to   │
│   │                             │ problem.                              │ the problem                           │
│   └─────────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┘
└── 🤖 Managed agents:
    └── web_agent | CodeAgent | Qwen/Qwen2.5-Coder-32B-Instruct
        ├── ✅ Authorized imports: []
        ├── 📝 Description: Browses the web to find information
        └── 🛠️ Tools:
            ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
            ┃ Name                        ┃ Description                       ┃ Arguments                         ┃
            ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
            │ web_search                  │ Performs a google web search for  │ query (`string`): The search      │
            │                             │ your query then returns a string  │ query to perform.                 │
            │                             │ of the top search results.        │ filter_year (`integer`):          │
            │                             │                                   │ Optionally restrict results to a  │
            │                             │                                   │ certain year                      │
            │ visit_webpage               │ Visits a webpage at the given url │ url (`string`): The url of the    │
            │                             │ and reads its content as a        │ webpage to visit.                 │
            │                             │ markdown string. Use this to      │                                   │
            │                             │ browse webpages.                  │                                   │
            │ calculate_cargo_travel_time │ Calculate the travel time for a   │ origin_coords (`array`): Tuple of │
            │                             │ cargo plane between two points on │ (latitude, longitude) for the     │
            │                             │ Earth using great-circle          │ starting point                    │
            │                             │ distance.                         │ destination_coords (`array`):     │
            │                             │                                   │ Tuple of (latitude, longitude)    │
            │                             │                                   │ for the destination               │
            │                             │                                   │ cruising_speed_kmh (`number`):    │
            │                             │                                   │ Optional cruising speed in km/h   │
            │                             │                                   │ (defaults to 750 km/h for typical │
            │                             │                                   │ cargo planes)                     │
            │ final_answer                │ Provides a final answer to the    │ answer (`any`): The final answer  │
            │                             │ given problem.                    │ to the problem                    │
            └─────────────────────────────┴───────────────────────────────────┴───────────────────────────────────┘
```

```python
manager_agent.run("""
Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
Represent this as a spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!

Here's an example of how to plot and return a map:
import plotly.express as px
df = px.data.carshare()
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
     color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
fig.show()
fig.write_image("saved_map.png")
final_answer(fig)

Never try to process strings using code: when you have a string to read, just print it and you'll see it.
""")
```

I don't know how that went in your run, but in mine, the manager agent skilfully split the work given to the web agent into `1. Search for Batman filming locations`, then `2. Find supercar factories`, before aggregating the lists and plotting the map.

Let's see what the map looks like by inspecting it directly from the agent state:

```python
manager_agent.python_executor.state["fig"]
```

This will output the map:

![Multiagent system example output map](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/smolagents/output_map.png)

## Resources

- [Multi-Agent Systems](https://huggingface.co/docs/smolagents/main/en/examples/multiagents) – Overview of multi-agent systems.
- [What is Agentic RAG?](https://weaviate.io/blog/what-is-agentic-rag) – Introduction to Agentic RAG.
- [Multi-Agent RAG System 🤖🤝🤖 Recipe](https://huggingface.co/learn/cookbook/multiagent_rag_system) – Step-by-step guide to building a multi-agent RAG system.