Merge branch 'main' into xsn/vi_translation
2
.github/workflows/build_documentation.yml
vendored
@@ -14,6 +14,6 @@ jobs:
 package_name: agents-course
 path_to_docs: agents-course/units/
 additional_args: --not_python_module
-languages: en
+languages: en zh-CN
 secrets:
 hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
2
.github/workflows/build_pr_documentation.yml
vendored
@@ -17,4 +17,4 @@ jobs:
 package_name: agents-course
 path_to_docs: agents-course/units/
 additional_args: --not_python_module
-languages: en vi
+languages: en zh-CN vi
5
.gitignore
vendored
@@ -172,3 +172,8 @@ cython_debug/
 
 # PyPI configuration file
 .pypirc
+
+# custom additions
+notebooks/unit2/llama-index/data
+notebooks/unit2/llama-index/alfred_chroma_db
+.DS_Store
@@ -130,7 +130,9 @@
 "!pip install -q -U peft\n",
 "!pip install -q -U trl\n",
 "!pip install -q -U tensorboardX\n",
-"!pip install -q wandb"
+"!pip install -q wandb\n",
+"!pip install -q -U torchvision\n",
+"!pip install -q -U transformers"
 ]
 },
 {
@@ -653,7 +655,7 @@
 "source": [
 "## Step 9: Let's configure the LoRA\n",
 "\n",
-"This is we are going to define the parameter of our adapter. Those a the most important parameters in LoRA as they define the size and importance of the adapters we are training."
+"This is where we are going to define the parameters of our adapter. Those are the most important parameters in LoRA as they define the size and importance of the adapters we are training."
 ]
 },
 {
@@ -1194,7 +1196,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 19,
+"execution_count": null,
 "id": "56b89825-70ac-42c1-934c-26e2d54f3b7b",
 "metadata": {
 "colab": {
@@ -1474,7 +1476,7 @@
 "device = \"auto\"\n",
 "config = PeftConfig.from_pretrained(peft_model_id)\n",
 "model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,\n",
-" device_map=\"auto\",\n",
+" device_map=device,\n",
 " )\n",
 "tokenizer = AutoTokenizer.from_pretrained(peft_model_id)\n",
 "model.resize_token_embeddings(len(tokenizer))\n",
@@ -40,7 +40,9 @@
 "\n",
 "In the Hugging Face ecosystem, there is a convenient feature called Serverless API that allows you to easily run inference on many models. There's no installation or deployment required.\n",
 "\n",
-"To run this notebook, **you need a Hugging Face token** that you can get from https://hf.co/settings/tokens. If you are running this notebook on Google Colab, you can set it up in the \"settings\" tab under \"secrets\". Make sure to call it \"HF_TOKEN\".\n",
+"To run this notebook, **you need a Hugging Face token** that you can get from https://hf.co/settings/tokens. A \"Read\" token type is sufficient. \n",
+"- If you are running this notebook on Google Colab, you can set it up in the \"settings\" tab under \"secrets\". Make sure to call it \"HF_TOKEN\" and restart the session to load the environment variable (Runtime -> Restart session).\n",
+"- If you are running this notebook locally, you can set it up as an [environment variable](https://huggingface.co/docs/huggingface_hub/en/package_reference/environment_variables). Make sure you restart the kernel after installing or updating huggingface_hub. You can update huggingface_hub by modifying the above `!pip install -q huggingface_hub -U`\n",
 "\n",
 "You also need to request access to [the Meta Llama models](https://huggingface.co/meta-llama), select [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) if you haven't done it click on Expand to review and access and fill the form. Approval usually takes up to an hour."
 ]
@@ -458,7 +460,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": null,
 "id": "9fc783f2-66ac-42cf-8a57-51788f81d436",
 "metadata": {
 "colab": {
@@ -490,7 +492,7 @@
 "# The answer was hallucinated by the model. We need to stop to actually execute the function!\n",
 "output = client.text_generation(\n",
 " prompt,\n",
-" max_new_tokens=200,\n",
+" max_new_tokens=150,\n",
 " stop=[\"Observation:\"] # Let's stop before any actual function is called\n",
 ")\n",
 "\n",
@@ -506,7 +508,7 @@
 "source": [
 "Much Better!\n",
 "\n",
-"Let's now create a **dummy get weather function**. In a real situation you could call and API."
+"Let's now create a **dummy get weather function**. In a real situation you could call an API."
 ]
 },
 {
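The hunks above rely on a stop sequence (`"Observation:"`) so that generation halts before the model invents a tool result, after which a dummy weather function supplies the real observation. The idea can be sketched with the standard library alone. The helper `truncate_at_stop`, the `get_weather` stub, and the sample text are hypothetical illustrations, not part of the notebook or of the `huggingface_hub` API; whether a server returns the stop string itself varies, and this helper simply drops it.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut text at the earliest stop sequence, leaving the sequence itself out."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def get_weather(location: str) -> str:
    """Hypothetical dummy tool: a real agent would call a weather API here."""
    return f"the weather in {location} is sunny with low temperatures."


# Simulated raw model output that hallucinates its own observation.
raw_output = (
    "Thought: I need the weather in London.\n"
    'Action: {"action": "get_weather", "action_input": {"location": "London"}}\n'
    "Observation: It is 40 degrees and raining frogs.\n"
)

# Stop before the made-up observation, then append the real tool result.
safe_prefix = truncate_at_stop(raw_output, ["Observation:"])
new_prompt = safe_prefix + "Observation: " + get_weather("London")
print(new_prompt)
```

The resulting `new_prompt` could then be sent back to the model so that its final answer is grounded in the actual tool output rather than in invented text.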
334
notebooks/unit2/llama-index/agents.ipynb
Normal file
@@ -0,0 +1,334 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "plaintext"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Agents in LlamaIndex\n",
|
||||
"\n",
|
||||
"This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free Course from beginner to expert, where you learn to build Agents.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Let's install the dependencies\n",
|
||||
"\n",
|
||||
"We will install the dependencies for this unit."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install llama-index datasets llama-index-callbacks-arize-phoenix llama-index-vector-stores-chroma llama-index-llms-huggingface-api -U -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And, let's log in to Hugging Face to use serverless Inference APIs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from huggingface_hub import login\n",
|
||||
"\n",
|
||||
"login()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"vscode": {
|
||||
"languageId": "plaintext"
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Initialising agents\n",
|
||||
"\n",
|
||||
"Let's start by initialising an agent. We will use the basic `AgentWorkflow` class to create an agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI\n",
|
||||
"from llama_index.core.agent.workflow import AgentWorkflow, ToolCallResult, AgentStream\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def add(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Add two numbers\"\"\"\n",
|
||||
" return a + b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def subtract(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Subtract two numbers\"\"\"\n",
|
||||
" return a - b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def multiply(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Multiply two numbers\"\"\"\n",
|
||||
" return a * b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def divide(a: int, b: int) -> float:\n",
|
||||
" \"\"\"Divide two numbers\"\"\"\n",
|
||||
" return a / b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"llm = HuggingFaceInferenceAPI(model_name=\"Qwen/Qwen2.5-Coder-32B-Instruct\")\n",
|
||||
"\n",
|
||||
"agent = AgentWorkflow.from_tools_or_functions(\n",
|
||||
" tools_or_functions=[subtract, multiply, divide, add],\n",
|
||||
" llm=llm,\n",
|
||||
" system_prompt=\"You are a math agent that can add, subtract, multiply, and divide numbers using provided tools.\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Then, we can run the agent and get the response and reasoning behind the tool calls."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"handler = agent.run(\"What is (2 + 2) * 2?\")\n",
|
||||
"async for ev in handler.stream_events():\n",
|
||||
" if isinstance(ev, ToolCallResult):\n",
|
||||
" print(\"\")\n",
|
||||
" print(\"Called tool: \", ev.tool_name, ev.tool_kwargs, \"=>\", ev.tool_output)\n",
|
||||
" elif isinstance(ev, AgentStream): # showing the thought process\n",
|
||||
" print(ev.delta, end=\"\", flush=True)\n",
|
||||
"\n",
|
||||
"resp = await handler\n",
|
||||
"resp"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In a similar fashion, we can pass state and context to the agent.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='Your name is Bob.')]), tool_calls=[], raw={'id': 'chatcmpl-B5sDHfGpSwsVyzvMVH8EWokYwdIKT', 'choices': [{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None}], 'created': 1740739735, 'model': 'gpt-4o-2024-08-06', 'object': 'chat.completion.chunk', 'service_tier': 'default', 'system_fingerprint': 'fp_eb9dce56a8', 'usage': None}, current_agent_name='Agent')"
|
||||
]
|
||||
},
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.core.workflow import Context\n",
|
||||
"\n",
|
||||
"ctx = Context(agent)\n",
|
||||
"\n",
|
||||
"response = await agent.run(\"My name is Bob.\", ctx=ctx)\n",
|
||||
"response = await agent.run(\"What was my name again?\", ctx=ctx)\n",
|
||||
"response"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating RAG Agents with QueryEngineTools\n",
|
||||
"\n",
|
||||
"Let's now re-use the `QueryEngine` we defined in the [previous unit on tools](/tools.ipynb) and convert it into a `QueryEngineTool`. We will pass it to the `AgentWorkflow` class to create a RAG agent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import chromadb\n",
|
||||
"\n",
|
||||
"from llama_index.core import VectorStoreIndex\n",
|
||||
"from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI\n",
|
||||
"from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding\n",
|
||||
"from llama_index.core.tools import QueryEngineTool\n",
|
||||
"from llama_index.vector_stores.chroma import ChromaVectorStore\n",
|
||||
"\n",
|
||||
"# Create a vector store\n",
|
||||
"db = chromadb.PersistentClient(path=\"./alfred_chroma_db\")\n",
|
||||
"chroma_collection = db.get_or_create_collection(\"alfred\")\n",
|
||||
"vector_store = ChromaVectorStore(chroma_collection=chroma_collection)\n",
|
||||
"\n",
|
||||
"# Create a query engine\n",
|
||||
"embed_model = HuggingFaceInferenceAPIEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")\n",
|
||||
"llm = HuggingFaceInferenceAPI(model_name=\"Qwen/Qwen2.5-Coder-32B-Instruct\")\n",
|
||||
"index = VectorStoreIndex.from_vector_store(\n",
|
||||
" vector_store=vector_store, embed_model=embed_model\n",
|
||||
")\n",
|
||||
"query_engine = index.as_query_engine(llm=llm)\n",
|
||||
"query_engine_tool = QueryEngineTool.from_defaults(\n",
|
||||
" query_engine=query_engine,\n",
|
||||
" name=\"personas\",\n",
|
||||
" description=\"descriptions for various types of personas\",\n",
|
||||
" return_direct=False,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create a RAG agent\n",
|
||||
"query_engine_agent = AgentWorkflow.from_tools_or_functions(\n",
|
||||
" tools_or_functions=[query_engine_tool],\n",
|
||||
" llm=llm,\n",
|
||||
" system_prompt=\"You are a helpful assistant that has access to a database containing persona descriptions. \",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And, we can once more get the response and reasoning behind the tool calls."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"handler = query_engine_agent.run(\n",
|
||||
" \"Search the database for 'science fiction' and return some persona descriptions.\"\n",
|
||||
")\n",
|
||||
"async for ev in handler.stream_events():\n",
|
||||
" if isinstance(ev, ToolCallResult):\n",
|
||||
" print(\"\")\n",
|
||||
" print(\"Called tool: \", ev.tool_name, ev.tool_kwargs, \"=>\", ev.tool_output)\n",
|
||||
" elif isinstance(ev, AgentStream): # showing the thought process\n",
|
||||
" print(ev.delta, end=\"\", flush=True)\n",
|
||||
"\n",
|
||||
"resp = await handler\n",
|
||||
"resp"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Creating multi-agent systems\n",
|
||||
"\n",
|
||||
"We can also create multi-agent systems by passing multiple agents to the `AgentWorkflow` class."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from llama_index.core.agent.workflow import (\n",
|
||||
" AgentWorkflow,\n",
|
||||
" ReActAgent,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Define some tools\n",
|
||||
"def add(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Add two numbers.\"\"\"\n",
|
||||
" return a + b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def subtract(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Subtract two numbers.\"\"\"\n",
|
||||
" return a - b\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# Create agent configs\n",
|
||||
"# NOTE: we can use FunctionAgent or ReActAgent here.\n",
|
||||
"# FunctionAgent works for LLMs with a function calling API.\n",
|
||||
"# ReActAgent works for any LLM.\n",
|
||||
"calculator_agent = ReActAgent(\n",
|
||||
" name=\"calculator\",\n",
|
||||
" description=\"Performs basic arithmetic operations\",\n",
|
||||
" system_prompt=\"You are a calculator assistant. Use your tools for any math operation.\",\n",
|
||||
" tools=[add, subtract],\n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"query_agent = ReActAgent(\n",
|
||||
" name=\"info_lookup\",\n",
|
||||
" description=\"Looks up information about XYZ\",\n",
|
||||
" system_prompt=\"Use your tool to query a RAG system to answer information about XYZ\",\n",
|
||||
" tools=[query_engine_tool],\n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create and run the workflow\n",
|
||||
"agent = AgentWorkflow(agents=[calculator_agent, query_agent], root_agent=\"calculator\")\n",
|
||||
"\n",
|
||||
"# Run the system\n",
|
||||
"handler = agent.run(user_msg=\"Can you add 5 and 3?\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"async for ev in handler.stream_events():\n",
|
||||
" if isinstance(ev, ToolCallResult):\n",
|
||||
" print(\"\")\n",
|
||||
" print(\"Called tool: \", ev.tool_name, ev.tool_kwargs, \"=>\", ev.tool_output)\n",
|
||||
" elif isinstance(ev, AgentStream): # showing the thought process\n",
|
||||
" print(ev.delta, end=\"\", flush=True)\n",
|
||||
"\n",
|
||||
"resp = await handler\n",
|
||||
"resp"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
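The agents notebook above passes plain Python functions to `AgentWorkflow.from_tools_or_functions`, and the function name and docstring serve as the tool's name and description. The following is a rough standard-library sketch of that general idea, not LlamaIndex's actual implementation; `ToolSpec`, `describe_tool`, and `call_tool` are hypothetical names introduced only for illustration.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Hypothetical container for the metadata an agent could show to the LLM."""
    name: str
    description: str
    parameters: dict[str, str]
    fn: Callable[..., Any]


def describe_tool(fn: Callable[..., Any]) -> ToolSpec:
    """Build a ToolSpec from a function's name, docstring, and annotations."""
    sig = inspect.signature(fn)
    params = {
        name: getattr(p.annotation, "__name__", str(p.annotation))
        for name, p in sig.parameters.items()
    }
    return ToolSpec(
        name=fn.__name__,
        description=inspect.getdoc(fn) or "",
        parameters=params,
        fn=fn,
    )


def call_tool(tools: dict[str, ToolSpec], name: str, kwargs: dict[str, Any]) -> Any:
    """Dispatch a tool call by name, as an agent loop might after parsing LLM output."""
    return tools[name].fn(**kwargs)


def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


def multiply(a: int, b: int) -> int:
    """Multiply two numbers"""
    return a * b


tools = {spec.name: spec for spec in map(describe_tool, [add, multiply])}

# (2 + 2) * 2, expressed as two sequential tool calls.
partial = call_tool(tools, "add", {"a": 2, "b": 2})
result = call_tool(tools, "multiply", {"a": partial, "b": 2})
print(tools["add"].description, tools["add"].parameters, result)
```

Keeping the docstrings short and accurate matters in this design, since under this sketch they are the only description of the tool that the model would see.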
403
notebooks/unit2/llama-index/components.ipynb
Normal file
File diff suppressed because one or more lines are too long
229
notebooks/unit2/llama-index/tools.ipynb
Normal file
File diff suppressed because one or more lines are too long
395
notebooks/unit2/llama-index/workflows.ipynb
Normal file
@@ -0,0 +1,395 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Workflows in LlamaIndex\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free Course from beginner to expert, where you learn to build Agents.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"## Let's install the dependencies\n",
|
||||
"\n",
|
||||
"We will install the dependencies for this unit."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install llama-index datasets llama-index-callbacks-arize-phoenix llama-index-vector-stores-chroma llama-index-utils-workflow llama-index-llms-huggingface-api pyvis -U -q"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And, let's log in to Hugging Face to use serverless Inference APIs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from huggingface_hub import login\n",
|
||||
"\n",
|
||||
"login()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Basic Workflow Creation\n",
|
||||
"\n",
|
||||
"We can start by creating a simple workflow. We use the `StartEvent` and `StopEvent` classes to define the start and stop of the workflow."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Hello, world!'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class MyWorkflow(Workflow):\n",
|
||||
" @step\n",
|
||||
" async def my_step(self, ev: StartEvent) -> StopEvent:\n",
|
||||
" # do something here\n",
|
||||
" return StopEvent(result=\"Hello, world!\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"w = MyWorkflow(timeout=10, verbose=False)\n",
|
||||
"result = await w.run()\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Connecting Multiple Steps\n",
|
||||
"\n",
|
||||
"We can also create multi-step workflows. Here we pass the event information between steps. Note that we can use type hinting to specify the event type and the flow of the workflow."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Finished processing: Step 1 complete'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.core.workflow import Event\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class ProcessingEvent(Event):\n",
|
||||
" intermediate_result: str\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class MultiStepWorkflow(Workflow):\n",
|
||||
" @step\n",
|
||||
" async def step_one(self, ev: StartEvent) -> ProcessingEvent:\n",
|
||||
" # Process initial data\n",
|
||||
" return ProcessingEvent(intermediate_result=\"Step 1 complete\")\n",
|
||||
"\n",
|
||||
" @step\n",
|
||||
" async def step_two(self, ev: ProcessingEvent) -> StopEvent:\n",
|
||||
" # Use the intermediate result\n",
|
||||
" final_result = f\"Finished processing: {ev.intermediate_result}\"\n",
|
||||
" return StopEvent(result=final_result)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"w = MultiStepWorkflow(timeout=10, verbose=False)\n",
|
||||
"result = await w.run()\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Loops and Branches\n",
|
||||
"\n",
|
||||
"We can also use type hinting to create branches and loops. Note that we can use the `|` operator to specify that the step can return multiple types."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Good thing happened\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Finished processing: First step complete.'"
|
||||
]
|
||||
},
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.core.workflow import Event\n",
|
||||
"import random\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class ProcessingEvent(Event):\n",
|
||||
" intermediate_result: str\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class LoopEvent(Event):\n",
|
||||
" loop_output: str\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class MultiStepWorkflow(Workflow):\n",
|
||||
" @step\n",
|
||||
" async def step_one(self, ev: StartEvent | LoopEvent) -> ProcessingEvent | LoopEvent:\n",
|
||||
" if random.randint(0, 1) == 0:\n",
|
||||
" print(\"Bad thing happened\")\n",
|
||||
" return LoopEvent(loop_output=\"Back to step one.\")\n",
|
||||
" else:\n",
|
||||
" print(\"Good thing happened\")\n",
|
||||
" return ProcessingEvent(intermediate_result=\"First step complete.\")\n",
|
||||
"\n",
|
||||
" @step\n",
|
||||
" async def step_two(self, ev: ProcessingEvent) -> StopEvent:\n",
|
||||
" # Use the intermediate result\n",
|
||||
" final_result = f\"Finished processing: {ev.intermediate_result}\"\n",
|
||||
" return StopEvent(result=final_result)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"w = MultiStepWorkflow(verbose=False)\n",
|
||||
"result = await w.run()\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Drawing Workflows\n",
|
||||
"\n",
|
||||
"We can also draw workflows using the `draw_all_possible_flows` function.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'NoneType'>\n",
|
||||
"<class '__main__.ProcessingEvent'>\n",
|
||||
"<class '__main__.LoopEvent'>\n",
|
||||
"<class 'llama_index.core.workflow.events.StopEvent'>\n",
|
||||
"workflow_all_flows.html\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.utils.workflow import draw_all_possible_flows\n",
|
||||
"\n",
|
||||
"draw_all_possible_flows(w)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### State Management\n",
|
||||
"\n",
|
||||
"Instead of passing the event information between steps, we can use the `Context` type hint to pass information between steps. \n",
|
||||
"This might be useful for long running workflows, where you want to store information between steps."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Query: What is the capital of France?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'Finished processing: Step 1 complete'"
|
||||
]
|
||||
},
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.core.workflow import Event, Context\n",
|
||||
"from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class ProcessingEvent(Event):\n",
|
||||
" intermediate_result: str\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class MultiStepWorkflow(Workflow):\n",
|
||||
" @step\n",
|
||||
" async def step_one(self, ev: StartEvent, ctx: Context) -> ProcessingEvent:\n",
|
||||
" # Process initial data\n",
|
||||
" await ctx.set(\"query\", \"What is the capital of France?\")\n",
|
||||
" return ProcessingEvent(intermediate_result=\"Step 1 complete\")\n",
|
||||
"\n",
|
||||
" @step\n",
|
||||
" async def step_two(self, ev: ProcessingEvent, ctx: Context) -> StopEvent:\n",
|
||||
" # Use the intermediate result\n",
|
||||
" query = await ctx.get(\"query\")\n",
|
||||
" print(f\"Query: {query}\")\n",
|
||||
" final_result = f\"Finished processing: {ev.intermediate_result}\"\n",
|
||||
" return StopEvent(result=final_result)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"w = MultiStepWorkflow(timeout=10, verbose=False)\n",
|
||||
"result = await w.run()\n",
|
||||
"result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Multi-Agent Workflows\n",
|
||||
"\n",
|
||||
"We can also create multi-agent workflows. Here we define two agents, one that multiplies two integers and one that adds two integers."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"AgentOutput(response=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='I have handed off the request to an agent who can help you with adding 5 and 3. Please wait for their response.')]), tool_calls=[ToolCallResult(tool_name='handoff', tool_kwargs={'to_agent': 'addition_agent', 'reason': 'Add 5 and 3'}, tool_id='call_F97vcIcsvZjfAAOBzzIifW3y', tool_output=ToolOutput(content='Agent addition_agent is now handling the request due to the following reason: Add 5 and 3.\\nPlease continue with the current request.', tool_name='handoff', raw_input={'args': (), 'kwargs': {'to_agent': 'addition_agent', 'reason': 'Add 5 and 3'}}, raw_output='Agent addition_agent is now handling the request due to the following reason: Add 5 and 3.\\nPlease continue with the current request.', is_error=False), return_direct=True), ToolCallResult(tool_name='handoff', tool_kwargs={'to_agent': 'addition_agent', 'reason': 'Add 5 and 3'}, tool_id='call_jf49ktFRs09xYdOsnApAk2zz', tool_output=ToolOutput(content='Agent addition_agent is now handling the request due to the following reason: Add 5 and 3.\\nPlease continue with the current request.', tool_name='handoff', raw_input={'args': (), 'kwargs': {'to_agent': 'addition_agent', 'reason': 'Add 5 and 3'}}, raw_output='Agent addition_agent is now handling the request due to the following reason: Add 5 and 3.\\nPlease continue with the current request.', is_error=False), return_direct=True)], raw={'id': 'chatcmpl-B6Cy54VQkvlG3VOrmdzCzgwcJmVOc', 'choices': [{'delta': {'content': None, 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None}], 'created': 1740819517, 'model': 'gpt-3.5-turbo-0125', 'object': 'chat.completion.chunk', 'service_tier': 'default', 'system_fingerprint': None, 'usage': None}, current_agent_name='addition_agent')"
|
||||
]
|
||||
},
|
||||
"execution_count": 33,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI\n",
|
||||
"\n",
|
||||
"# Define some tools\n",
|
||||
"def add(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Add two numbers.\"\"\"\n",
|
||||
" return a + b\n",
|
||||
"\n",
|
||||
"def multiply(a: int, b: int) -> int:\n",
|
||||
" \"\"\"Multiply two numbers.\"\"\"\n",
|
||||
" return a * b\n",
|
||||
"\n",
|
||||
"llm = HuggingFaceInferenceAPI(model_name=\"Qwen/Qwen2.5-Coder-32B-Instruct\")\n",
|
||||
"\n",
|
||||
"# we can pass functions directly without FunctionTool -- the fn/docstring are parsed for the name/description\n",
|
||||
"multiply_agent = ReActAgent(\n",
|
||||
" name=\"multiply_agent\",\n",
|
||||
" description=\"Is able to multiply two integers\",\n",
|
||||
" system_prompt=\"A helpful assistant that can use a tool to multiply numbers.\",\n",
|
||||
" tools=[multiply], \n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"addition_agent = ReActAgent(\n",
|
||||
" name=\"addition_agent\",\n",
|
||||
" description=\"Is able to add two integers\",\n",
|
||||
" system_prompt=\"A helpful assistant that can use a tool to add numbers.\",\n",
|
||||
" tools=[add], \n",
|
||||
" llm=llm,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Create the workflow\n",
|
||||
"workflow = AgentWorkflow(\n",
|
||||
" agents=[multiply_agent, addition_agent],\n",
|
||||
" root_agent=\"multiply_agent\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# Run the system\n",
|
||||
"response = await workflow.run(user_msg=\"Can you add 5 and 3?\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": ".venv",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.11.11"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
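The workflows notebook above uses type hints on each step to decide which events a step receives, including a branch that loops back into the first step. The toy dispatcher below sketches that routing idea using only the standard library. It is an illustration under stated assumptions, not how LlamaIndex implements `Workflow` or `@step`: the event classes are plain dataclasses, `accepted_types` and `run` are hypothetical helpers, and the random branch is replaced by a deterministic retry counter.

```python
import asyncio
from dataclasses import dataclass
from typing import Union, get_args, get_type_hints


@dataclass
class StartEvent:
    retries: int  # how many times step_one should loop before succeeding


@dataclass
class LoopEvent:
    retries_left: int


@dataclass
class ProcessingEvent:
    intermediate_result: str


@dataclass
class StopEvent:
    result: str


async def step_one(ev: Union[StartEvent, LoopEvent]) -> Union[ProcessingEvent, LoopEvent]:
    # Deterministic stand-in for the notebook's random branch.
    left = ev.retries if isinstance(ev, StartEvent) else ev.retries_left
    if left > 0:
        return LoopEvent(retries_left=left - 1)  # loop back into step_one
    return ProcessingEvent(intermediate_result="First step complete.")


async def step_two(ev: ProcessingEvent) -> StopEvent:
    return StopEvent(result=f"Finished processing: {ev.intermediate_result}")


def accepted_types(step) -> tuple:
    """Read the event types a step accepts from the type hint on `ev`."""
    hint = get_type_hints(step)["ev"]
    return get_args(hint) or (hint,)


async def run(steps, start: StartEvent) -> tuple:
    """Route each event to the step whose `ev` hint matches, until a StopEvent."""
    event, visited = start, []
    while not isinstance(event, StopEvent):
        step = next(s for s in steps if isinstance(event, accepted_types(s)))
        visited.append(step.__name__)
        event = await step(event)
    return event.result, visited


result, visited = asyncio.run(run([step_one, step_two], StartEvent(retries=2)))
print(result, visited)
```

In this sketch a `LoopEvent` is only ever consumed by `step_one`, which is what makes the loop well defined: a step that accepts an event type it cannot handle would fail on attribute access.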
10634
notebooks/unit2/smolagents/code_agents.ipynb
Normal file
File diff suppressed because it is too large
8639
notebooks/unit2/smolagents/multiagent_notebook.ipynb
Normal file
File diff suppressed because it is too large
2883
notebooks/unit2/smolagents/retrieval_agents.ipynb
Normal file
File diff suppressed because it is too large
596
notebooks/unit2/smolagents/tool_calling_agents.ipynb
Normal file
@@ -0,0 +1,596 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Pi9CF0391ARI"
|
||||
},
|
||||
"source": [
|
||||
"# Writing actions as code snippets or JSON blobs\n",
|
||||
"\n",
|
||||
"This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free Course from beginner to expert, where you learn to build Agents.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "9gsYky7F1GzT"
|
||||
},
|
||||
"source": [
|
||||
"## Let's install the dependencies and log in to our HF account to access the Inference API\n",
|
||||
"\n",
|
||||
"If you haven't installed `smolagents` yet, you can do so by running the following command:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "MoFopncp0pnJ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install smolagents -U"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "cH-4W1GhYL4T"
|
||||
},
|
||||
"source": [
|
||||
"Let's also log in to the Hugging Face Hub to get access to the Inference API."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "TFTc-ry70y1f"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from huggingface_hub import notebook_login\n",
|
||||
"\n",
|
||||
"notebook_login()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "ekKxaZrd1HlB"
|
||||
},
|
||||
"source": [
|
||||
"## Selecting a Playlist for the Party Using `smolagents` and a `ToolCallingAgent`\n",
|
||||
"\n",
|
||||
"Let's revisit the previous example where Alfred started party preparations, but this time we'll use a `ToolCallingAgent` to highlight the difference. We'll build an agent that can search the web using DuckDuckGo, just like in our Code Agent example. The only difference is the agent type; the framework handles everything else:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/",
|
||||
"height": 1000
|
||||
},
|
||||
"id": "6IInDOUN01sP",
|
||||
"outputId": "e49f2360-d377-4ed8-b7ae-8da4a3e3757b"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">╭──────────────────────────────────────────────────── </span><span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">New run</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ────────────────────────────────────────────────────╮</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"font-weight: bold\">Search for the best music recommendations for a party at the Wayne's mansion.</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">╰─ HfApiModel - Qwen/Qwen2.5-Coder-32B-Instruct ──────────────────────────────────────────────────────────────────╯</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m╭─\u001b[0m\u001b[38;2;212;183;2m───────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m \u001b[0m\u001b[1;38;2;212;183;2mNew run\u001b[0m\u001b[38;2;212;183;2m \u001b[0m\u001b[38;2;212;183;2m───────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m─╮\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[1mSearch for the best music recommendations for a party at the Wayne's mansion.\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m╰─\u001b[0m\u001b[38;2;212;183;2m HfApiModel - Qwen/Qwen2.5-Coder-32B-Instruct \u001b[0m\u001b[38;2;212;183;2m─────────────────────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m─╯\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ </span><span style=\"font-weight: bold\">Step </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \u001b[0m\u001b[1mStep \u001b[0m\u001b[1;36m1\u001b[0m\u001b[38;2;212;183;2m ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.11/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n",
|
||||
"The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
|
||||
"To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
|
||||
"You will be able to reuse this secret in all of your notebooks.\n",
|
||||
"Please note that authentication is recommended but still optional to access public models or datasets.\n",
|
||||
" warnings.warn(\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'web_search' with arguments: {'query': \"best music recommendations for a party at Wayne's │\n",
|
||||
"│ mansion\"} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'web_search' with arguments: {'query': \"best music recommendations for a party at Wayne's │\n",
|
||||
"│ mansion\"} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Observations: ## Search Results\n",
|
||||
"\n",
|
||||
"|The <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">75</span> Best Party Songs That Will Get Everyone Dancing - \n",
|
||||
"Gear4music<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.gear4music.com/blog/best-party-songs/)</span>\n",
|
||||
"The best party songs <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>. <span style=\"color: #008000; text-decoration-color: #008000\">\"September\"</span> - Earth, Wind & Fire <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1978</span><span style=\"font-weight: bold\">)</span> Quite possibly the best party song. An infectious \n",
|
||||
"mix of funk and soul, <span style=\"color: #008000; text-decoration-color: #008000\">\"September\"</span> is celebrated for its upbeat melody and <span style=\"color: #008000; text-decoration-color: #008000\">\"ba-dee-ya\"</span> chorus, making it a timeless \n",
|
||||
"dance favorite.\n",
|
||||
"\n",
|
||||
"|Wedding Party Entrance Songs to Get the Party Started - The Mansion \n",
|
||||
"<span style=\"color: #808000; text-decoration-color: #808000\">...</span><span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://mansiononmainstreet.com/wedding-party-entrance-songs-to-get-the-party-started/)</span>\n",
|
||||
"Best Wedding Party Entrance Songs. No matter what vibe you're going for, there are some wedding party entrance \n",
|
||||
"songs that are guaranteed to be a hit with people. From the latest music from Justin Timberlake to oldies but \n",
|
||||
"goodies, most of your guests will be familiar with the popular wedding party entrance songs listed below.\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">50</span> Songs on Every Event Planner's Playlist - \n",
|
||||
"Eventbrite<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.eventbrite.com/blog/event-planning-playlist-ds00/)</span>\n",
|
||||
"Music sets the mood and provides the soundtrack <span style=\"font-weight: bold\">(</span>literally<span style=\"font-weight: bold\">)</span> for a memorable and exciting time. While the right \n",
|
||||
"songs can enhance the experience, the wrong event music can throw off the vibe. For example, fast-paced songs \n",
|
||||
"probably aren't the best fit for a formal gala. And smooth jazz is likely to lull your guests at a motivational \n",
|
||||
"conference.\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">200</span> Classic House Party Songs Everyone Knows | The Best <span style=\"color: #808000; text-decoration-color: #808000\">...</span> - \n",
|
||||
"iSpyTunes<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.ispytunes.com/post/house-party-songs)</span>\n",
|
||||
"\" Branded merchandise adds flair to any occasion, just like the perfect playlist. <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">200</span> classic house party songs \n",
|
||||
"everyone knows set the mood, bringing energy to every celebration. The best popular party hits keep guests dancing,\n",
|
||||
"creating unforgettable moments. From throwback anthems to modern beats, a great selection ensures nonstop fun.\n",
|
||||
"\n",
|
||||
"|The Best Songs For Parties - The Ambient Mixer \n",
|
||||
"Blog<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://blog.ambient-mixer.com/usage/parties-2/the-best-songs-for-parties/)</span>\n",
|
||||
"The <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span> best party songs ever made. Top <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span> Best Party Songs Of All Time. Of course, these are just <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span> of the many \n",
|
||||
"available playlists to choose from. However, these two contain some of the most popular ones most people usually \n",
|
||||
"end up using. If these are not the type of songs you or your guests might enjoy then simply follow the steps in the\n",
|
||||
"<span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"\n",
|
||||
"|Passaic County Parks & Recreation: Music at the \n",
|
||||
"Mansion<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://passaiccountynj.myrec.com/info/activities/program_details.aspx?ProgramID=29909)</span>\n",
|
||||
"Thursdays from <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">7</span> to <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">9</span> PM the finest local bands will be playing music while In the Drink restaurant sells food and \n",
|
||||
"drinks on site. September 3rd: Norton Smull Band; Parking is limited at the Dey Mansion <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">209</span> Totowa Rd. Wayne, NJ. \n",
|
||||
"Overflow parking will be at the Preakness Valley Golf Course. You may drop off your guests at the Mansion first.\n",
|
||||
"\n",
|
||||
"|Grand Entrance Songs | SOUNDfonix<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://soundfonixent.com/resources/reception-song-ideas/grand-entrance-songs/)</span>\n",
|
||||
"The entrance song sets the tone for the rest of the dance and the evening. Choose your entrance song wisely.\n",
|
||||
"\n",
|
||||
"|Party Music Guide: Ultimate Tips for the Perfect \n",
|
||||
"Playlist<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://thebackstage-deezer.com/music/perfect-party-music-playlist/)</span>\n",
|
||||
"Check out the best party playlists and top party songs to ensure your next party is packed! The most popular party \n",
|
||||
"songs are here, just hit play. <span style=\"color: #808000; text-decoration-color: #808000\">...</span> to decor. But, most of all, you need to have fantastic music. We recommend you \n",
|
||||
"get at least three hours' worth of party music queued and ready — that's about <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">75</span> songs. Lucky for you, we've <span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"\n",
|
||||
"|The Top <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">100</span> Best Party Songs of All Time - \n",
|
||||
"LiveAbout<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.liveabout.com/top-best-party-songs-of-all-time-3248355)</span>\n",
|
||||
"<span style=\"color: #008000; text-decoration-color: #008000\">\"Macarena\"</span> then spent <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">14</span> weeks at No. <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span> on the U.S. pop singles chart. For more than a year this was one of the \n",
|
||||
"most popular special event songs in the United States. It still works well as a charming party song encouraging \n",
|
||||
"everyone to join in on the simple dance.\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">70</span> Best Piano Bar Songs You Should Request<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.pianoarea.com/best-piano-bar-songs/)</span>\n",
|
||||
"Best Piano Bar Songs You Should Request <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>. <span style=\"color: #008000; text-decoration-color: #008000\">\"Piano Man\"</span> by Billy Joel. One of the top recommendations for piano bar \n",
|
||||
"songs is <span style=\"color: #008000; text-decoration-color: #008000\">\"Piano Man\"</span> by Billy Joel.. This iconic track was released by Columbia Records in <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1973</span>.. As part of the \n",
|
||||
"album titled <span style=\"color: #008000; text-decoration-color: #008000\">'Piano Man,'</span> it's one of Billy Joel's most recognizable works.. The song spins a captivating narrative\n",
|
||||
"and showcases Joe's compelling <span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"Observations: ## Search Results\n",
|
||||
"\n",
|
||||
"|The \u001b[1;36m75\u001b[0m Best Party Songs That Will Get Everyone Dancing - \n",
|
||||
"Gear4music\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.gear4music.com/blog/best-party-songs/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"The best party songs \u001b[1;36m1\u001b[0m. \u001b[32m\"September\"\u001b[0m - Earth, Wind & Fire \u001b[1m(\u001b[0m\u001b[1;36m1978\u001b[0m\u001b[1m)\u001b[0m Quite possibly the best party song. An infectious \n",
|
||||
"mix of funk and soul, \u001b[32m\"September\"\u001b[0m is celebrated for its upbeat melody and \u001b[32m\"ba-dee-ya\"\u001b[0m chorus, making it a timeless \n",
|
||||
"dance favorite.\n",
|
||||
"\n",
|
||||
"|Wedding Party Entrance Songs to Get the Party Started - The Mansion \n",
|
||||
"\u001b[33m...\u001b[0m\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://mansiononmainstreet.com/wedding-party-entrance-songs-to-get-the-party-started/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Best Wedding Party Entrance Songs. No matter what vibe you're going for, there are some wedding party entrance \n",
|
||||
"songs that are guaranteed to be a hit with people. From the latest music from Justin Timberlake to oldies but \n",
|
||||
"goodies, most of your guests will be familiar with the popular wedding party entrance songs listed below.\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m50\u001b[0m Songs on Every Event Planner's Playlist - \n",
|
||||
"Eventbrite\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.eventbrite.com/blog/event-planning-playlist-ds00/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Music sets the mood and provides the soundtrack \u001b[1m(\u001b[0mliterally\u001b[1m)\u001b[0m for a memorable and exciting time. While the right \n",
|
||||
"songs can enhance the experience, the wrong event music can throw off the vibe. For example, fast-paced songs \n",
|
||||
"probably aren't the best fit for a formal gala. And smooth jazz is likely to lull your guests at a motivational \n",
|
||||
"conference.\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m200\u001b[0m Classic House Party Songs Everyone Knows | The Best \u001b[33m...\u001b[0m - \n",
|
||||
"iSpyTunes\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.ispytunes.com/post/house-party-songs\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"\" Branded merchandise adds flair to any occasion, just like the perfect playlist. \u001b[1;36m200\u001b[0m classic house party songs \n",
|
||||
"everyone knows set the mood, bringing energy to every celebration. The best popular party hits keep guests dancing,\n",
|
||||
"creating unforgettable moments. From throwback anthems to modern beats, a great selection ensures nonstop fun.\n",
|
||||
"\n",
|
||||
"|The Best Songs For Parties - The Ambient Mixer \n",
|
||||
"Blog\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://blog.ambient-mixer.com/usage/parties-2/the-best-songs-for-parties/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"The \u001b[1;36m100\u001b[0m best party songs ever made. Top \u001b[1;36m100\u001b[0m Best Party Songs Of All Time. Of course, these are just \u001b[1;36m2\u001b[0m of the many \n",
|
||||
"available playlists to choose from. However, these two contain some of the most popular ones most people usually \n",
|
||||
"end up using. If these are not the type of songs you or your guests might enjoy then simply follow the steps in the\n",
|
||||
"\u001b[33m...\u001b[0m\n",
|
||||
"\n",
|
||||
"|Passaic County Parks & Recreation: Music at the \n",
|
||||
"Mansion\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://passaiccountynj.myrec.com/info/activities/program_details.aspx?\u001b[0m\u001b[4;94mProgramID\u001b[0m\u001b[4;94m=\u001b[0m\u001b[4;94m29909\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Thursdays from \u001b[1;36m7\u001b[0m to \u001b[1;36m9\u001b[0m PM the finest local bands will be playing music while In the Drink restaurant sells food and \n",
|
||||
"drinks on site. September 3rd: Norton Smull Band; Parking is limited at the Dey Mansion \u001b[1;36m209\u001b[0m Totowa Rd. Wayne, NJ. \n",
|
||||
"Overflow parking will be at the Preakness Valley Golf Course. You may drop off your guests at the Mansion first.\n",
|
||||
"\n",
|
||||
"|Grand Entrance Songs | SOUNDfonix\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://soundfonixent.com/resources/reception-song-ideas/grand-entrance-songs/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"The entrance song sets the tone for the rest of the dance and the evening. Choose your entrance song wisely.\n",
|
||||
"\n",
|
||||
"|Party Music Guide: Ultimate Tips for the Perfect \n",
|
||||
"Playlist\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://thebackstage-deezer.com/music/perfect-party-music-playlist/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Check out the best party playlists and top party songs to ensure your next party is packed! The most popular party \n",
|
||||
"songs are here, just hit play. \u001b[33m...\u001b[0m to decor. But, most of all, you need to have fantastic music. We recommend you \n",
|
||||
"get at least three hours' worth of party music queued and ready — that's about \u001b[1;36m75\u001b[0m songs. Lucky for you, we've \u001b[33m...\u001b[0m\n",
|
||||
"\n",
|
||||
"|The Top \u001b[1;36m100\u001b[0m Best Party Songs of All Time - \n",
|
||||
"LiveAbout\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.liveabout.com/top-best-party-songs-of-all-time-3248355\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"\u001b[32m\"Macarena\"\u001b[0m then spent \u001b[1;36m14\u001b[0m weeks at No. \u001b[1;36m1\u001b[0m on the U.S. pop singles chart. For more than a year this was one of the \n",
|
||||
"most popular special event songs in the United States. It still works well as a charming party song encouraging \n",
|
||||
"everyone to join in on the simple dance.\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m70\u001b[0m Best Piano Bar Songs You Should Request\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.pianoarea.com/best-piano-bar-songs/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Best Piano Bar Songs You Should Request \u001b[1;36m1\u001b[0m. \u001b[32m\"Piano Man\"\u001b[0m by Billy Joel. One of the top recommendations for piano bar \n",
|
||||
"songs is \u001b[32m\"Piano Man\"\u001b[0m by Billy Joel.. This iconic track was released by Columbia Records in \u001b[1;36m1973\u001b[0m.. As part of the \n",
|
||||
"album titled \u001b[32m'Piano Man,'\u001b[0m it's one of Billy Joel's most recognizable works.. The song spins a captivating narrative\n",
|
||||
"and showcases Joe's compelling \u001b[33m...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">[Step 0: Duration 4.70 seconds| Input tokens: 1,174 | Output tokens: 26]</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[2m[Step 0: Duration 4.70 seconds| Input tokens: 1,174 | Output tokens: 26]\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ </span><span style=\"font-weight: bold\">Step </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \u001b[0m\u001b[1mStep \u001b[0m\u001b[1;36m2\u001b[0m\u001b[38;2;212;183;2m ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'web_search' with arguments: {'query': 'best party songs for a mansion late-night event'} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'web_search' with arguments: {'query': 'best party songs for a mansion late-night event'} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">Observations: ## Search Results\n",
|
||||
"\n",
|
||||
"|The <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">75</span> Best Party Songs That Will Get Everyone Dancing - \n",
|
||||
"Gear4music<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.gear4music.com/blog/best-party-songs/)</span>\n",
|
||||
"The best party songs <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span>. <span style=\"color: #008000; text-decoration-color: #008000\">\"September\"</span> - Earth, Wind & Fire <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1978</span><span style=\"font-weight: bold\">)</span> Quite possibly the best party song. An infectious \n",
|
||||
"mix of funk and soul, <span style=\"color: #008000; text-decoration-color: #008000\">\"September\"</span> is celebrated for its upbeat melody and <span style=\"color: #008000; text-decoration-color: #008000\">\"ba-dee-ya\"</span> chorus, making it a timeless \n",
|
||||
"dance favorite.\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">45</span> Songs That Get Your Event Guests on the Dance Floor Every \n",
|
||||
"Time<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://hub.theeventplannerexpo.com/entertainment/35-songs-that-get-your-event-guests-on-the-dance-floor-ever</span>\n",
|
||||
"<span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">y-time)</span>\n",
|
||||
"You'll know your client's event best, including music genre preferences and styles. But these songs are wildly \n",
|
||||
"popular among many generations and are always great to have on standby should your dance guests need a boost. Party\n",
|
||||
"Songs <span style=\"color: #008000; text-decoration-color: #008000\">\"Flowers\"</span> by Miley Cyrus <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2023</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"TQG\"</span> by KAROL G & Shakira <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2023</span><span style=\"font-weight: bold\">)</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"TRUSTFALL\"</span> by P!nk <span style=\"font-weight: bold\">(</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2023</span><span style=\"font-weight: bold\">)</span>\n",
|
||||
"\n",
|
||||
"|Top <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">200</span> Most Requested Songs - DJ Event Planner<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://djeventplanner.com/mostrequested.htm)</span>\n",
|
||||
"Based on over <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span> million requests using the DJ Event Planner song request system, this is a list of the most \n",
|
||||
"requested songs of the past year. <span style=\"color: #808000; text-decoration-color: #808000\">...</span> December <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1963</span> <span style=\"font-weight: bold\">(</span>Oh, What A Night<span style=\"font-weight: bold\">)</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">24</span>: Commodores: Brick House: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">25</span>: Earth, Wind\n",
|
||||
"and Fire: Boogie Wonderland: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">26</span>: Elton John: Your Song: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span>: Stevie Wonder: Isn't She Lovely: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">28</span>: <span style=\"color: #808000; text-decoration-color: #808000\">...</span> Grove St. \n",
|
||||
"Party: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">30</span> <span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">50</span> Songs on Every Event Planner's Playlist - \n",
|
||||
"Eventbrite<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.eventbrite.com/blog/event-planning-playlist-ds00/)</span>\n",
|
||||
"For example, fast-paced songs probably aren't the best fit for a formal gala. And smooth jazz is likely to lull \n",
|
||||
"your guests at a motivational conference. That's why it's crucial to think about the tone you want to set and \n",
|
||||
"choose a playlist that embodies it. We've compiled a list of possible tunes to help you pick the best event songs.\n",
|
||||
"\n",
|
||||
"|The Best Party Songs of All Time <span style=\"font-weight: bold\">(</span>Our Playlists<span style=\"font-weight: bold\">)](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.ispytunes.com/post/best-party-songs)</span>\n",
|
||||
"Discover the best party songs to make your event unforgettable! Our playlists feature the top party songs, from \n",
|
||||
"timeless classics to the latest hits. <span style=\"color: #808000; text-decoration-color: #808000\">...</span> Last Friday Night by Katy Perry. Sweet Child O' Mine by Guns N' Roses. I \n",
|
||||
"Gotta Feeling by the Black Eyed Peas. <span style=\"color: #808000; text-decoration-color: #808000\">...</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">200</span> Classic House Party Songs Everyone Knows | The Best Popular Party \n",
|
||||
"Songs <span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">15</span> Best Party Songs of All Time - Singersroom.com<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://singersroom.com/w75/best-party-songs-of-all-time/)</span>\n",
|
||||
"Whether it's a wild club night, a backyard BBQ, or a house party with friends, the best party songs bring people \n",
|
||||
"together, get them moving, and keep the good vibes flowing all night long.\n",
|
||||
"\n",
|
||||
"|Best Songs To Party: DJ's Ultimate Party Songs Playlist - \n",
|
||||
"Top40Weekly.com<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://top40weekly.com/best-songs-to-party/)</span>\n",
|
||||
"<span style=\"color: #008000; text-decoration-color: #008000\">\"Jump Around\"</span> by House of Pain is a classic party anthem that has stood the test of time, remaining a staple at \n",
|
||||
"parties and sporting events for over two decades. The song's energetic rap verses, pulsating rhythm, and catchy \n",
|
||||
"chorus create an atmosphere of pure excitement and exhilaration that never fails to ignite the dance floor.\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">50</span>+ Best Songs For Your Next Party in <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2025</span> - Aleka's \n",
|
||||
"Get-Together<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://alekasgettogether.com/top-songs-for-any-party/)</span>\n",
|
||||
"A perfect, high-energy track to keep the party vibe strong all night. Last Friday Night <span style=\"font-weight: bold\">(</span>T.G.I.F.<span style=\"font-weight: bold\">)</span> - Katy Perry \n",
|
||||
"This upbeat pop anthem is a must-play to keep the energy light and fun. Bleeding Love - Leona Lewis A heartfelt \n",
|
||||
"ballad that balances out the upbeat tracks with an emotional sing-along. Crank That <span style=\"font-weight: bold\">(</span>Soulja Boy<span style=\"font-weight: bold\">)</span> - Soulja Boy Tell \n",
|
||||
"'Em\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span> Most Influential Songs About Parties & Celebrations <span style=\"font-weight: bold\">(</span>Must Hear<span style=\"font-weight: bold\">)](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.pdmusic.org/songs-about-parties/)</span>\n",
|
||||
"Contents. <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span> <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">27</span> Most Famous Songs About Parties, Partying & Drinking With Friend <span style=\"font-weight: bold\">(</span>Ultimate Playlist<span style=\"font-weight: bold\">)</span>; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"Party in</span>\n",
|
||||
"<span style=\"color: #008000; text-decoration-color: #008000\">the U.S.A.\"</span> by Miley Cyrus; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"I Gotta Feeling\"</span> by The Black Eyed Peas; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"Party Rock Anthem\"</span> by LMFAO; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">4</span> \n",
|
||||
"<span style=\"color: #008000; text-decoration-color: #008000\">\"Last Friday Night (T.G.I.F.)\"</span> by Katy Perry; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">6</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"Dancing Queen\"</span> by ABBA; <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">7</span> #<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">6</span> <span style=\"color: #008000; text-decoration-color: #008000\">\"Turn Down for What\"</span> by DJ Snake &\n",
|
||||
"Lil Jon\n",
|
||||
"\n",
|
||||
"|<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">40</span> Best Party Songs | Songs To Dance To, Ranked By Our Editors - Time \n",
|
||||
"Out<span style=\"font-weight: bold\">](</span><span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.timeout.com/music/best-party-songs)</span>\n",
|
||||
"The best is when you go for the extended version, and find yourself in the midst of the intro for about <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">11</span> minutes.\n",
|
||||
"But whichever version you go for, this is the party song, in every way. She <span style=\"color: #808000; text-decoration-color: #808000\">...</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"Observations: ## Search Results\n",
|
||||
"\n",
|
||||
"|The \u001b[1;36m75\u001b[0m Best Party Songs That Will Get Everyone Dancing - \n",
|
||||
"Gear4music\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.gear4music.com/blog/best-party-songs/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"The best party songs \u001b[1;36m1\u001b[0m. \u001b[32m\"September\"\u001b[0m - Earth, Wind & Fire \u001b[1m(\u001b[0m\u001b[1;36m1978\u001b[0m\u001b[1m)\u001b[0m Quite possibly the best party song. An infectious \n",
|
||||
"mix of funk and soul, \u001b[32m\"September\"\u001b[0m is celebrated for its upbeat melody and \u001b[32m\"ba-dee-ya\"\u001b[0m chorus, making it a timeless \n",
|
||||
"dance favorite.\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m45\u001b[0m Songs That Get Your Event Guests on the Dance Floor Every \n",
|
||||
"Time\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://hub.theeventplannerexpo.com/entertainment/35-songs-that-get-your-event-guests-on-the-dance-floor-ever\u001b[0m\n",
|
||||
"\u001b[4;94my-time\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"You'll know your client's event best, including music genre preferences and styles. But these songs are wildly \n",
|
||||
"popular among many generations and are always great to have on standby should your dance guests need a boost. Party\n",
|
||||
"Songs \u001b[32m\"Flowers\"\u001b[0m by Miley Cyrus \u001b[1m(\u001b[0m\u001b[1;36m2023\u001b[0m\u001b[1m)\u001b[0m \u001b[32m\"TQG\"\u001b[0m by KAROL G & Shakira \u001b[1m(\u001b[0m\u001b[1;36m2023\u001b[0m\u001b[1m)\u001b[0m \u001b[32m\"TRUSTFALL\"\u001b[0m by P!nk \u001b[1m(\u001b[0m\u001b[1;36m2023\u001b[0m\u001b[1m)\u001b[0m\n",
|
||||
"\n",
|
||||
"|Top \u001b[1;36m200\u001b[0m Most Requested Songs - DJ Event Planner\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://djeventplanner.com/mostrequested.htm\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Based on over \u001b[1;36m2\u001b[0m million requests using the DJ Event Planner song request system, this is a list of the most \n",
|
||||
"requested songs of the past year. \u001b[33m...\u001b[0m December \u001b[1;36m1963\u001b[0m \u001b[1m(\u001b[0mOh, What A Night\u001b[1m)\u001b[0m \u001b[1;36m24\u001b[0m: Commodores: Brick House: \u001b[1;36m25\u001b[0m: Earth, Wind\n",
|
||||
"and Fire: Boogie Wonderland: \u001b[1;36m26\u001b[0m: Elton John: Your Song: \u001b[1;36m27\u001b[0m: Stevie Wonder: Isn't She Lovely: \u001b[1;36m28\u001b[0m: \u001b[33m...\u001b[0m Grove St. \n",
|
||||
"Party: \u001b[1;36m30\u001b[0m \u001b[33m...\u001b[0m\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m50\u001b[0m Songs on Every Event Planner's Playlist - \n",
|
||||
"Eventbrite\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.eventbrite.com/blog/event-planning-playlist-ds00/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"For example, fast-paced songs probably aren't the best fit for a formal gala. And smooth jazz is likely to lull \n",
|
||||
"your guests at a motivational conference. That's why it's crucial to think about the tone you want to set and \n",
|
||||
"choose a playlist that embodies it. We've compiled a list of possible tunes to help you pick the best event songs.\n",
|
||||
"\n",
|
||||
"|The Best Party Songs of All Time \u001b[1m(\u001b[0mOur Playlists\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.ispytunes.com/post/best-party-songs\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Discover the best party songs to make your event unforgettable! Our playlists feature the top party songs, from \n",
|
||||
"timeless classics to the latest hits. \u001b[33m...\u001b[0m Last Friday Night by Katy Perry. Sweet Child O' Mine by Guns N' Roses. I \n",
|
||||
"Gotta Feeling by the Black Eyed Peas. \u001b[33m...\u001b[0m \u001b[1;36m200\u001b[0m Classic House Party Songs Everyone Knows | The Best Popular Party \n",
|
||||
"Songs \u001b[33m...\u001b[0m\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m15\u001b[0m Best Party Songs of All Time - Singersroom.com\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://singersroom.com/w75/best-party-songs-of-all-time/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Whether it's a wild club night, a backyard BBQ, or a house party with friends, the best party songs bring people \n",
|
||||
"together, get them moving, and keep the good vibes flowing all night long.\n",
|
||||
"\n",
|
||||
"|Best Songs To Party: DJ's Ultimate Party Songs Playlist - \n",
|
||||
"Top40Weekly.com\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://top40weekly.com/best-songs-to-party/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"\u001b[32m\"Jump Around\"\u001b[0m by House of Pain is a classic party anthem that has stood the test of time, remaining a staple at \n",
|
||||
"parties and sporting events for over two decades. The song's energetic rap verses, pulsating rhythm, and catchy \n",
|
||||
"chorus create an atmosphere of pure excitement and exhilaration that never fails to ignite the dance floor.\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m50\u001b[0m+ Best Songs For Your Next Party in \u001b[1;36m2025\u001b[0m - Aleka's \n",
|
||||
"Get-Together\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://alekasgettogether.com/top-songs-for-any-party/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"A perfect, high-energy track to keep the party vibe strong all night. Last Friday Night \u001b[1m(\u001b[0mT.G.I.F.\u001b[1m)\u001b[0m - Katy Perry \n",
|
||||
"This upbeat pop anthem is a must-play to keep the energy light and fun. Bleeding Love - Leona Lewis A heartfelt \n",
|
||||
"ballad that balances out the upbeat tracks with an emotional sing-along. Crank That \u001b[1m(\u001b[0mSoulja Boy\u001b[1m)\u001b[0m - Soulja Boy Tell \n",
|
||||
"'Em\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m27\u001b[0m Most Influential Songs About Parties & Celebrations \u001b[1m(\u001b[0mMust Hear\u001b[1m)\u001b[0m\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.pdmusic.org/songs-about-parties/\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"Contents. \u001b[1;36m1\u001b[0m \u001b[1;36m27\u001b[0m Most Famous Songs About Parties, Partying & Drinking With Friend \u001b[1m(\u001b[0mUltimate Playlist\u001b[1m)\u001b[0m; \u001b[1;36m2\u001b[0m #\u001b[1;36m1\u001b[0m \u001b[32m\"Party in\u001b[0m\n",
|
||||
"\u001b[32mthe U.S.A.\"\u001b[0m by Miley Cyrus; \u001b[1;36m3\u001b[0m #\u001b[1;36m2\u001b[0m \u001b[32m\"I Gotta Feeling\"\u001b[0m by The Black Eyed Peas; \u001b[1;36m4\u001b[0m #\u001b[1;36m3\u001b[0m \u001b[32m\"Party Rock Anthem\"\u001b[0m by LMFAO; \u001b[1;36m5\u001b[0m #\u001b[1;36m4\u001b[0m \n",
|
||||
"\u001b[32m\"Last Friday Night \u001b[0m\u001b[32m(\u001b[0m\u001b[32mT.G.I.F.\u001b[0m\u001b[32m)\u001b[0m\u001b[32m\"\u001b[0m by Katy Perry; \u001b[1;36m6\u001b[0m #\u001b[1;36m5\u001b[0m \u001b[32m\"Dancing Queen\"\u001b[0m by ABBA; \u001b[1;36m7\u001b[0m #\u001b[1;36m6\u001b[0m \u001b[32m\"Turn Down for What\"\u001b[0m by DJ Snake &\n",
|
||||
"Lil Jon\n",
|
||||
"\n",
|
||||
"|\u001b[1;36m40\u001b[0m Best Party Songs | Songs To Dance To, Ranked By Our Editors - Time \n",
|
||||
"Out\u001b[1m]\u001b[0m\u001b[1m(\u001b[0m\u001b[4;94mhttps://www.timeout.com/music/best-party-songs\u001b[0m\u001b[4;94m)\u001b[0m\n",
|
||||
"The best is when you go for the extended version, and find yourself in the midst of the intro for about \u001b[1;36m11\u001b[0m minutes.\n",
|
||||
"But whichever version you go for, this is the party song, in every way. She \u001b[33m...\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">[Step 1: Duration 6.66 seconds| Input tokens: 3,435 | Output tokens: 55]</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[2m[Step 1: Duration 6.66 seconds| Input tokens: 3,435 | Output tokens: 55]\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ </span><span style=\"font-weight: bold\">Step </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \u001b[0m\u001b[1mStep \u001b[0m\u001b[1;36m3\u001b[0m\u001b[38;2;212;183;2m ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'final_answer' with arguments: {'answer': \"For a party at Wayne's mansion, consider playing a mix │\n",
|
||||
"│ of classic party hits and modern anthems to cater to various age groups. A recommended playlist might include │\n",
|
||||
"│ songs like 'September' by Earth, Wind & Fire, 'I Gotta Feeling' by The Black Eyed Peas, 'Last Friday Night │\n",
|
||||
"│ (T.G.I.F.)' by Katy Perry, 'Dancing Queen' by ABBA, 'Turn Down for What' by DJ Snake & Lil Jon, and 'Crank That │\n",
|
||||
"│ (Soulja Boy)' by Soulja Boy Tell 'Em. These songs are known to get everyone dancing and celebrating!\"} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'final_answer' with arguments: {'answer': \"For a party at Wayne's mansion, consider playing a mix │\n",
|
||||
"│ of classic party hits and modern anthems to cater to various age groups. A recommended playlist might include │\n",
|
||||
"│ songs like 'September' by Earth, Wind & Fire, 'I Gotta Feeling' by The Black Eyed Peas, 'Last Friday Night │\n",
|
||||
"│ (T.G.I.F.)' by Katy Perry, 'Dancing Queen' by ABBA, 'Turn Down for What' by DJ Snake & Lil Jon, and 'Crank That │\n",
|
||||
"│ (Soulja Boy)' by Soulja Boy Tell 'Em. These songs are known to get everyone dancing and celebrating!\"} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">Final answer: For a party at Wayne's mansion, consider playing a mix of classic party hits and modern anthems to </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">cater to various age groups. A recommended playlist might include songs like 'September' by Earth, Wind & Fire, 'I </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">Gotta Feeling' by The Black Eyed Peas, 'Last Friday Night (T.G.I.F.)' by Katy Perry, 'Dancing Queen' by ABBA, 'Turn</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">Down for What' by DJ Snake & Lil Jon, and 'Crank That (Soulja Boy)' by Soulja Boy Tell 'Em. These songs are known </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">to get everyone dancing and celebrating!</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[1;38;2;212;183;2mFinal answer: For a party at Wayne's mansion, consider playing a mix of classic party hits and modern anthems to \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mcater to various age groups. A recommended playlist might include songs like 'September' by Earth, Wind & Fire, 'I \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mGotta Feeling' by The Black Eyed Peas, 'Last Friday Night (T.G.I.F.)' by Katy Perry, 'Dancing Queen' by ABBA, 'Turn\u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mDown for What' by DJ Snake & Lil Jon, and 'Crank That (Soulja Boy)' by Soulja Boy Tell 'Em. These songs are known \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mto get everyone dancing and celebrating!\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">[Step 2: Duration 10.69 seconds| Input tokens: 6,869 | Output tokens: 199]</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[2m[Step 2: Duration 10.69 seconds| Input tokens: 6,869 | Output tokens: 199]\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"application/vnd.google.colaboratory.intrinsic+json": {
|
||||
"type": "string"
|
||||
},
|
||||
"text/plain": [
|
||||
"\"For a party at Wayne's mansion, consider playing a mix of classic party hits and modern anthems to cater to various age groups. A recommended playlist might include songs like 'September' by Earth, Wind & Fire, 'I Gotta Feeling' by The Black Eyed Peas, 'Last Friday Night (T.G.I.F.)' by Katy Perry, 'Dancing Queen' by ABBA, 'Turn Down for What' by DJ Snake & Lil Jon, and 'Crank That (Soulja Boy)' by Soulja Boy Tell 'Em. These songs are known to get everyone dancing and celebrating!\""
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel\n",
|
||||
"\n",
|
||||
"agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())\n",
|
||||
"\n",
|
||||
"agent.run(\"Search for the best music recommendations for a party at the Wayne's mansion.\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Cl19VWGRYXrr"
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"When you examine the agent's trace, instead of seeing `Executing parsed code:`, you'll see something like:\n",
|
||||
"\n",
|
||||
"```text\n",
|
||||
"╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n",
|
||||
"│ Calling tool: 'web_search' with arguments: {'query': \"best music recommendations for a party at Wayne's │\n",
|
||||
"│ mansion\"} │\n",
|
||||
"╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n",
|
||||
"``` \n",
|
||||
"\n",
|
||||
"The agent generates a structured tool call that the system processes to produce the output, rather than directly executing code like a `CodeAgent`.\n",
|
||||
"\n",
|
||||
"Now that we understand both agent types, we can choose the right one for our needs. Let's continue exploring `smolagents` to make Alfred's party a success! 🎉"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
1590
notebooks/unit2/smolagents/tools.ipynb
Normal file
File diff suppressed because one or more lines are too long
535
notebooks/unit2/smolagents/vision_agents.ipynb
Normal file
@@ -0,0 +1,535 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "O7wvDb5Xq0ZH"
|
||||
},
|
||||
"source": [
|
||||
"# Vision Agents with smolagents\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"This notebook is part of the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course), a free Course from beginner to expert, where you learn to build Agents.\n",
|
||||
"\n",
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "fqKoOdz8q6fF"
|
||||
},
|
||||
"source": [
|
||||
"## Let's install the dependencies and login to our HF account to access the Inference API\n",
|
||||
"\n",
|
||||
"If you haven't installed `smolagents` yet, you can do so by running the following command:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "m_muGXjDRhTD"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install smolagents"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Let's also login to the Hugging Face Hub to have access to the Inference API."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "WJGFjRbZbL50"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "MnLNhxDzRiKh"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from huggingface_hub import notebook_login\n",
|
||||
"\n",
|
||||
"notebook_login()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "qOp72sO9q-TD"
|
||||
},
|
||||
"source": [
|
||||
"## Providing Images at the Start of the Agent's Execution\n",
|
||||
"\n",
|
||||
"In this approach, images are passed to the agent at the start and stored as `task_images` alongside the task prompt. The agent then processes these images throughout its execution. \n",
|
||||
"\n",
|
||||
"Consider the case where Alfred wants to verify the identities of the superheroes attending the party. He already has a dataset of images from previous parties with the names of the guests. Given a new visitor's image, the agent can compare it with the existing dataset and make a decision about letting them in. \n",
|
||||
"\n",
|
||||
"In this case, a guest is trying to enter, and Alfred suspects that this visitor might be The Joker impersonating Wonder Woman. Alfred needs to verify their identity to prevent anyone unwanted from entering. \n",
|
||||
"\n",
|
||||
"Let’s build the example. First, the images are loaded. In this case, we use images from Wikipedia to keep the example minimal, but image the possible use-case!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "BI9E3okPR5wc"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from PIL import Image\n",
|
||||
"import requests\n",
|
||||
"from io import BytesIO\n",
|
||||
"\n",
|
||||
"image_urls = [\n",
|
||||
" \"https://upload.wikimedia.org/wikipedia/commons/e/e8/The_Joker_at_Wax_Museum_Plus.jpg\",\n",
|
||||
" \"https://upload.wikimedia.org/wikipedia/en/9/98/Joker_%28DC_Comics_character%29.jpg\"\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"images = []\n",
|
||||
"for url in image_urls:\n",
|
||||
" response = requests.get(url)\n",
|
||||
" image = Image.open(BytesIO(response.content)).convert(\"RGB\")\n",
|
||||
" images.append(image)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"Now that we have the images, the agent will tell us wether the guests is actually a superhero (Wonder Woman) or a villian (The Joker)."
|
||||
],
|
||||
"metadata": {
|
||||
"id": "vUBQjETkbRU6"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "6HroQ3eIT-3m"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from google.colab import userdata\n",
|
||||
"import os\n",
|
||||
"os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/",
|
||||
"height": 1000
|
||||
},
|
||||
"id": "A8qra0deRkUY",
|
||||
"outputId": "2867daa1-e84e-4d02-ef10-eeeaf3ea863d"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">╭──────────────────────────────────────────────────── </span><span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">New run</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ────────────────────────────────────────────────────╮</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"font-weight: bold\">Describe the costume and makeup that the comic character in these photos is wearing and return the description.</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"font-weight: bold\"> Tell me if the guest is The Joker or Wonder Woman.</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">│</span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702\">╰─ OpenAIServerModel - gpt-4o ────────────────────────────────────────────────────────────────────────────────────╯</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m╭─\u001b[0m\u001b[38;2;212;183;2m───────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m \u001b[0m\u001b[1;38;2;212;183;2mNew run\u001b[0m\u001b[38;2;212;183;2m \u001b[0m\u001b[38;2;212;183;2m───────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m─╮\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[1mDescribe the costume and makeup that the comic character in these photos is wearing and return the description.\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[1m Tell me if the guest is The Joker or Wonder Woman.\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m│\u001b[0m \u001b[38;2;212;183;2m│\u001b[0m\n",
|
||||
"\u001b[38;2;212;183;2m╰─\u001b[0m\u001b[38;2;212;183;2m OpenAIServerModel - gpt-4o \u001b[0m\u001b[38;2;212;183;2m───────────────────────────────────────────────────────────────────────────────────\u001b[0m\u001b[38;2;212;183;2m─╯\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ </span><span style=\"font-weight: bold\">Step </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[38;2;212;183;2m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \u001b[0m\u001b[1mStep \u001b[0m\u001b[1;36m1\u001b[0m\u001b[38;2;212;183;2m ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold; font-style: italic\">Output message of the LLM:</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">────────────────────────────────────────────────────────────────────────────────────────</span>\n",
|
||||
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">I don't have the capability to identify or recognize people in images, but I can describe what I see.</span><span style=\"background-color: #0d1117\"> </span>\n",
|
||||
"<span style=\"background-color: #0d1117\"> </span>\n",
|
||||
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">The character in the photos you provided is wearing:</span><span style=\"background-color: #0d1117\"> </span>\n",
|
||||
"<span style=\"background-color: #0d1117\"> </span>\n",
|
||||
"<span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">1.</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117; font-weight: bold\">**Costume:**</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">A purple suit with a large bow tie in one image.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">A white flower lapel and card in another image.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">The style is flamboyant and colorful, typical of a comic villain.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">2.</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117; font-weight: bold\">**Makeup:**</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">White face makeup covering the entire face.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">Red lips forming a wide, exaggerated smile.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">Dark makeup around the eyes.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117\">-</span><span style=\"color: #6e7681; text-decoration-color: #6e7681; background-color: #0d1117\"> </span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">Green hair.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">From the description, this character resembles The Joker, a well-known comic book villain.</span><span style=\"background-color: #0d1117\"> </span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;3mOutput message of the LLM:\u001b[0m \u001b[38;2;212;183;2m────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mI\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdon't\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mhave\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcapability\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mto\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23midentify\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mor\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mrecognize\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mpeople\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23min\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mimages,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mbut\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mI\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcan\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdescribe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwhat\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mI\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23msee.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mThe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcharacter\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23min\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mphotos\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23myou\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mprovided\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mis\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwearing:\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;255;123;114;48;2;13;17;23m1.\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[1;38;2;230;237;243;48;2;13;17;23m**Costume:**\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mA\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mpurple\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23msuit\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwith\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23ma\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mlarge\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mbow\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mtie\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23min\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mone\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mimage.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mA\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwhite\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mflower\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mlapel\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mand\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcard\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23min\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23manother\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mimage.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mThe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mstyle\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mis\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mflamboyant\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mand\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcolorful,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mtypical\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mof\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23ma\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcomic\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mvillain.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;255;123;114;48;2;13;17;23m2.\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[1;38;2;230;237;243;48;2;13;17;23m**Makeup:**\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mWhite\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mface\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mmakeup\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcovering\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mentire\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mface.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mRed\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mlips\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mforming\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23ma\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwide,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mexaggerated\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23msmile.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mDark\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mmakeup\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23maround\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23meyes.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;255;123;114;48;2;13;17;23m-\u001b[0m\u001b[38;2;110;118;129;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mGreen\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mhair.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mFrom\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdescription,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthis\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcharacter\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mresembles\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mThe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mJoker,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23ma\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwell-known\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcomic\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mbook\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mvillain.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Error in code parsing:</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Your code snippet is invalid, because the regex pattern ```(?:py|python)?\\n(.*?)\\n``` was not found in it.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Here is your code snippet:</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">I don't have the capability to identify or recognize people in images, but I can describe what I see.</span>\n",
"\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">The character in the photos you provided is wearing:</span>\n",
"\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">1</span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">. **Costume:**</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - A purple suit with a large bow tie in one image.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - A white flower lapel and card in another image.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - The style is flamboyant and colorful, typical of a comic villain.</span>\n",
"\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">2</span><span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">. **Makeup:**</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - White face makeup covering the entire face.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - Red lips forming a wide, exaggerated smile.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - Dark makeup around the eyes.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"> - Green hair.</span>\n",
"\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">From the description, this character resembles The Joker, a well-known comic book villain.</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Make sure to include code with the correct pattern, for instance:</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Thoughts: Your thoughts</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Code:</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">```py</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\"># Your python code here</span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">```<end_code></span>\n",
"<span style=\"color: #800000; text-decoration-color: #800000; font-weight: bold\">Make sure to provide correct code blobs.</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;31mError in code parsing:\u001b[0m\n",
"\u001b[1;31mYour code snippet is invalid, because the regex pattern ```\u001b[0m\u001b[1;31m(\u001b[0m\u001b[1;31m?:py|python\u001b[0m\u001b[1;31m)\u001b[0m\u001b[1;31m?\\\u001b[0m\u001b[1;31mn\u001b[0m\u001b[1;31m(\u001b[0m\u001b[1;31m.*?\u001b[0m\u001b[1;31m)\u001b[0m\u001b[1;31m\\n``` was not found in it.\u001b[0m\n",
"\u001b[1;31mHere is your code snippet:\u001b[0m\n",
"\u001b[1;31mI don't have the capability to identify or recognize people in images, but I can describe what I see.\u001b[0m\n",
"\n",
"\u001b[1;31mThe character in the photos you provided is wearing:\u001b[0m\n",
"\n",
"\u001b[1;31m1\u001b[0m\u001b[1;31m. **Costume:**\u001b[0m\n",
"\u001b[1;31m - A purple suit with a large bow tie in one image.\u001b[0m\n",
"\u001b[1;31m - A white flower lapel and card in another image.\u001b[0m\n",
"\u001b[1;31m - The style is flamboyant and colorful, typical of a comic villain.\u001b[0m\n",
"\n",
"\u001b[1;31m2\u001b[0m\u001b[1;31m. **Makeup:**\u001b[0m\n",
"\u001b[1;31m - White face makeup covering the entire face.\u001b[0m\n",
"\u001b[1;31m - Red lips forming a wide, exaggerated smile.\u001b[0m\n",
"\u001b[1;31m - Dark makeup around the eyes.\u001b[0m\n",
"\u001b[1;31m - Green hair.\u001b[0m\n",
"\n",
"\u001b[1;31mFrom the description, this character resembles The Joker, a well-known comic book villain.\u001b[0m\n",
"\u001b[1;31mMake sure to include code with the correct pattern, for instance:\u001b[0m\n",
"\u001b[1;31mThoughts: Your thoughts\u001b[0m\n",
"\u001b[1;31mCode:\u001b[0m\n",
"\u001b[1;31m```py\u001b[0m\n",
"\u001b[1;31m# Your python code here\u001b[0m\n",
"\u001b[1;31m```\u001b[0m\u001b[1;31m<\u001b[0m\u001b[1;31mend_code\u001b[0m\u001b[1;31m>\u001b[0m\n",
"\u001b[1;31mMake sure to provide correct code blobs.\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">[Step 0: Duration 4.30 seconds| Input tokens: 3,004 | Output tokens: 139]</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[2m[Step 0: Duration 4.30 seconds| Input tokens: 3,004 | Output tokens: 139]\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702\">━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ </span><span style=\"font-weight: bold\">Step </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"color: #d4b702; text-decoration-color: #d4b702\"> ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[38;2;212;183;2m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ \u001b[0m\u001b[1mStep \u001b[0m\u001b[1;36m2\u001b[0m\u001b[38;2;212;183;2m ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold; font-style: italic\">Output message of the LLM:</span> <span style=\"color: #d4b702; text-decoration-color: #d4b702\">────────────────────────────────────────────────────────────────────────────────────────</span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">I'm unable to identify characters in images, but I can offer a description.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">Thought: From the images, I will describe the costume and makeup.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">Code:</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">```py</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">description </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117; font-weight: bold\">=</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\"> </span><span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">\"\"\"</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">1. Costume:</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - A purple suit with a yellow shirt and a large purple bow tie.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - Features a white flower lapel and a playing card in the second image.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - The style is flamboyant, consistent with a comic villain.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">2. Makeup:</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - White face makeup covering the entire face.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - Red lips forming a wide, exaggerated smile.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - Blue eyeshadow with dark eye accents.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\"> - Slicked-back green hair.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">\"\"\"</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #8b949e; text-decoration-color: #8b949e; background-color: #0d1117; font-style: italic\"># Based on the description, this character resembles The Joker.</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">character </span><span style=\"color: #ff7b72; text-decoration-color: #ff7b72; background-color: #0d1117; font-weight: bold\">=</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\"> </span><span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">\"The Joker\"</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">final_answer({</span><span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">\"description\"</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">: description, </span><span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">\"character\"</span><span style=\"color: #e6edf3; text-decoration-color: #e6edf3; background-color: #0d1117\">: character})</span><span style=\"background-color: #0d1117\"> </span>\n",
"<span style=\"color: #a5d6ff; text-decoration-color: #a5d6ff; background-color: #0d1117\">```</span><span style=\"background-color: #0d1117\"> </span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;3mOutput message of the LLM:\u001b[0m \u001b[38;2;212;183;2m────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mI'm\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23munable\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mto\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23midentify\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcharacters\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23min\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mimages,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mbut\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mI\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcan\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23moffer\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23ma\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdescription.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mThought:\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mFrom\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mimages,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mI\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mwill\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdescribe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mthe\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcostume\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mand\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mmakeup.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mCode:\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;165;214;255;48;2;13;17;23m```\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23mpy\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;230;237;243;48;2;13;17;23mdescription\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[1;38;2;255;123;114;48;2;13;17;23m=\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\"\"\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;165;214;255;48;2;13;17;23m1. Costume:\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;165;214;255;48;2;13;17;23m - A purple suit with a yellow shirt and a large purple bow tie.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;165;214;255;48;2;13;17;23m - Features a white flower lapel and a playing card in the second image.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[38;2;165;214;255;48;2;13;17;23m - The style is flamboyant, consistent with a comic villain.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
"\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m2. Makeup:\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m - White face makeup covering the entire face.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m - Red lips forming a wide, exaggerated smile.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m - Blue eyeshadow with dark eye accents.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m - Slicked-back green hair.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m\"\"\"\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[3;38;2;139;148;158;48;2;13;17;23m# Based on the description, this character resembles The Joker.\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;230;237;243;48;2;13;17;23mcharacter\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[1;38;2;255;123;114;48;2;13;17;23m=\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23mThe Joker\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;230;237;243;48;2;13;17;23mfinal_answer\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m(\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m{\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23mdescription\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m:\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mdescription\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m,\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23mcharacter\u001b[0m\u001b[38;2;165;214;255;48;2;13;17;23m\"\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m:\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m \u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23mcharacter\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m}\u001b[0m\u001b[38;2;230;237;243;48;2;13;17;23m)\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n",
|
||||
"\u001b[38;2;165;214;255;48;2;13;17;23m```\u001b[0m\u001b[48;2;13;17;23m \u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"> ─ <span style=\"font-weight: bold\">Executing parsed code:</span> ──────────────────────────────────────────────────────────────────────────────────────── \n",
|
||||
" <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">description </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"\"\"</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">1. Costume:</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - A purple suit with a yellow shirt and a large purple bow tie.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - Features a white flower lapel and a playing card in the second image.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - The style is flamboyant, consistent with a comic villain.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">2. Makeup:</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - White face makeup covering the entire face.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - Red lips forming a wide, exaggerated smile.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - Blue eyeshadow with dark eye accents.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\"> - Slicked-back green hair.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"\"\"</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #959077; text-decoration-color: #959077; background-color: #272822\"># Based on the description, this character resembles The Joker.</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">character </span><span style=\"color: #ff4689; text-decoration-color: #ff4689; background-color: #272822\">=</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\"> </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"The Joker\"</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"background-color: #272822\"> </span> \n",
|
||||
" <span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">final_answer({</span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"description\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: description, </span><span style=\"color: #e6db74; text-decoration-color: #e6db74; background-color: #272822\">\"character\"</span><span style=\"color: #f8f8f2; text-decoration-color: #f8f8f2; background-color: #272822\">: character})</span><span style=\"background-color: #272822\"> </span> \n",
|
||||
" ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── \n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
" ─ \u001b[1mExecuting parsed code:\u001b[0m ──────────────────────────────────────────────────────────────────────────────────────── \n",
|
||||
" \u001b[38;2;248;248;242;48;2;39;40;34mdescription\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\"\"\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m1. Costume:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - A purple suit with a yellow shirt and a large purple bow tie.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - Features a white flower lapel and a playing card in the second image.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - The style is flamboyant, consistent with a comic villain.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m2. Makeup:\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - White face makeup covering the entire face.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - Red lips forming a wide, exaggerated smile.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - Blue eyeshadow with dark eye accents.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m - Slicked-back green hair.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;230;219;116;48;2;39;40;34m\"\"\"\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;149;144;119;48;2;39;40;34m# Based on the description, this character resembles The Joker.\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;248;248;242;48;2;39;40;34mcharacter\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;255;70;137;48;2;39;40;34m=\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mThe Joker\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" \u001b[38;2;248;248;242;48;2;39;40;34mfinal_answer\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m(\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m{\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mdescription\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mdescription\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m,\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34mcharacter\u001b[0m\u001b[38;2;230;219;116;48;2;39;40;34m\"\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m:\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m \u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34mcharacter\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m}\u001b[0m\u001b[38;2;248;248;242;48;2;39;40;34m)\u001b[0m\u001b[48;2;39;40;34m \u001b[0m \n",
|
||||
" ───────────────────────────────────────────────────────────────────────────────────────────────────────────────── \n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">Out - Final answer: {'description': '\\n1. Costume:\\n - A purple suit with a yellow shirt and a large purple bow </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">tie.\\n - Features a white flower lapel and a playing card in the second image.\\n - The style is flamboyant, </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">consistent with a comic villain.\\n\\n2. Makeup:\\n - White face makeup covering the entire face.\\n - Red lips </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">forming a wide, exaggerated smile.\\n - Blue eyeshadow with dark eye accents.\\n - Slicked-back green hair.\\n', </span>\n",
|
||||
"<span style=\"color: #d4b702; text-decoration-color: #d4b702; font-weight: bold\">'character': 'The Joker'}</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[1;38;2;212;183;2mOut - Final answer: {'description': '\\n1. Costume:\\n - A purple suit with a yellow shirt and a large purple bow \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mtie.\\n - Features a white flower lapel and a playing card in the second image.\\n - The style is flamboyant, \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mconsistent with a comic villain.\\n\\n2. Makeup:\\n - White face makeup covering the entire face.\\n - Red lips \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2mforming a wide, exaggerated smile.\\n - Blue eyeshadow with dark eye accents.\\n - Slicked-back green hair.\\n', \u001b[0m\n",
|
||||
"\u001b[1;38;2;212;183;2m'character': 'The Joker'}\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #7f7f7f; text-decoration-color: #7f7f7f\">[Step 1: Duration 7.36 seconds| Input tokens: 7,431 | Output tokens: 302]</span>\n",
|
||||
"</pre>\n"
|
||||
],
|
||||
"text/plain": [
|
||||
"\u001b[2m[Step 1: Duration 7.36 seconds| Input tokens: 7,431 | Output tokens: 302]\u001b[0m\n"
|
||||
]
|
||||
},
|
||||
"metadata": {},
|
||||
"output_type": "display_data"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from smolagents import CodeAgent, OpenAIServerModel\n",
|
||||
"\n",
|
||||
"model = OpenAIServerModel(model_id=\"gpt-4o\")\n",
|
||||
"\n",
|
||||
"# Instantiate the agent\n",
|
||||
"agent = CodeAgent(\n",
|
||||
" tools=[],\n",
|
||||
" model=model,\n",
|
||||
" max_steps=20,\n",
|
||||
" verbosity_level=2\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"response = agent.run(\n",
|
||||
" \"\"\"\n",
|
||||
" Describe the costume and makeup that the comic character in these photos is wearing and return the description.\n",
|
||||
" Tell me if the guest is The Joker or Wonder Woman.\n",
|
||||
" \"\"\",\n",
|
||||
" images=images\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "uvKj37AmeIu0",
|
||||
"outputId": "ed7984d4-f6a2-4062-9939-41cb2e97b3b2"
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'description': '\\n1. Costume:\\n - A purple suit with a yellow shirt and a large purple bow tie.\\n - Features a white flower lapel and a playing card in the second image.\\n - The style is flamboyant, consistent with a comic villain.\\n\\n2. Makeup:\\n - White face makeup covering the entire face.\\n - Red lips forming a wide, exaggerated smile.\\n - Blue eyeshadow with dark eye accents.\\n - Slicked-back green hair.\\n',\n",
|
||||
" 'character': 'The Joker'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"response"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"source": [
|
||||
"In this case, the output reveals that the person is impersonating someone else, so we can prevent The Joker from entering the party!"
|
||||
],
|
||||
"metadata": {
|
||||
"id": "NrV-yK5zbT9r"
|
||||
}
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "ziyfk-3ZrHw5"
|
||||
},
|
||||
"source": [
|
||||
"## Providing Images with Dynamic Retrieval\n",
|
||||
"\n",
|
||||
"This example is provided as a `.py` file because it browses the web and therefore needs to be run locally. Go to the [Hugging Face Agents Course](https://www.hf.co/learn/agents-course) for more details."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"name": "python"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
207
notebooks/unit2/smolagents/vision_web_browser.py
Normal file
@@ -0,0 +1,207 @@
|
||||
import argparse
|
||||
from io import BytesIO
|
||||
from time import sleep
|
||||
|
||||
import helium
|
||||
from dotenv import load_dotenv
|
||||
from PIL import Image
|
||||
from selenium import webdriver
|
||||
from selenium.webdriver.common.by import By
|
||||
from selenium.webdriver.common.keys import Keys
|
||||
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, tool
|
||||
from smolagents.agents import ActionStep
|
||||
from smolagents.cli import load_model
|
||||
|
||||
|
||||
alfred_guest_list_request = """
|
||||
I am Alfred, the butler of Wayne Manor, responsible for verifying the identity of guests at the party. A superhero has arrived at the entrance claiming to be Wonder Woman, but I need to confirm if she is who she says she is.
|
||||
|
||||
Please search for images of Wonder Woman and generate a detailed visual description based on those images. Additionally, navigate to Wikipedia to gather key details about her appearance. With this information, I can determine whether to grant her access to the event.
|
||||
"""
|
||||
|
||||
|
||||
def parse_arguments():
|
||||
parser = argparse.ArgumentParser(description="Run a web browser automation script with a specified model.")
|
||||
parser.add_argument(
|
||||
"prompt",
|
||||
type=str,
|
||||
nargs="?", # Makes it optional
|
||||
default=alfred_guest_list_request,
|
||||
help="The prompt to run with the agent",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--model-type",
|
||||
type=str,
|
||||
default="LiteLLMModel",
|
||||
help="The model type to use (e.g., OpenAIServerModel, LiteLLMModel, TransformersModel, HfApiModel)",
|
||||
)
|
||||
parser.add_argument(
|
||||
"--model-id",
|
||||
type=str,
|
||||
default="gpt-4o",
|
||||
help="The model ID to use for the specified model type",
|
||||
)
|
||||
return parser.parse_args()
|
||||
|
||||
|
||||
def save_screenshot(memory_step: ActionStep, agent: CodeAgent) -> None:
|
||||
sleep(1.0) # Let JavaScript animations happen before taking the screenshot
|
||||
driver = helium.get_driver()
|
||||
current_step = memory_step.step_number
|
||||
if driver is not None:
|
||||
for previous_memory_step in agent.memory.steps: # Remove previous screenshots from logs for lean processing
|
||||
if isinstance(previous_memory_step, ActionStep) and previous_memory_step.step_number <= current_step - 2:
|
||||
previous_memory_step.observations_images = None
|
||||
png_bytes = driver.get_screenshot_as_png()
|
||||
image = Image.open(BytesIO(png_bytes))
|
||||
print(f"Captured a browser screenshot: {image.size} pixels")
|
||||
memory_step.observations_images = [image.copy()] # Create a copy to ensure it persists, important!
|
||||
|
||||
# Update observations with current URL
|
||||
url_info = f"Current url: {driver.current_url}"
|
||||
memory_step.observations = (
|
||||
url_info if memory_step.observations is None else memory_step.observations + "\n" + url_info
|
||||
)
|
||||
return
|
||||
|
||||
|
||||
@tool
|
||||
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
|
||||
"""
|
||||
Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
|
||||
Args:
|
||||
text: The text to search for
|
||||
nth_result: Which occurrence to jump to (default: 1)
|
||||
"""
|
||||
elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
|
||||
if nth_result > len(elements):
|
||||
raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
|
||||
result = f"Found {len(elements)} matches for '{text}'."
|
||||
elem = elements[nth_result - 1]
|
||||
driver.execute_script("arguments[0].scrollIntoView(true);", elem)
|
||||
result += f" Focused on element {nth_result} of {len(elements)}"
|
||||
return result
|
||||
|
||||
|
||||
@tool
|
||||
def go_back() -> None:
|
||||
"""Goes back to previous page."""
|
||||
driver.back()
|
||||
|
||||
|
||||
@tool
|
||||
def close_popups() -> None:
|
||||
"""
|
||||
Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners.
|
||||
"""
|
||||
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
|
||||
|
||||
|
||||
def initialize_driver():
|
||||
"""Initialize the Selenium WebDriver."""
|
||||
chrome_options = webdriver.ChromeOptions()
|
||||
chrome_options.add_argument("--force-device-scale-factor=1")
|
||||
chrome_options.add_argument("--window-size=1000,1350")
|
||||
chrome_options.add_argument("--disable-pdf-viewer")
|
||||
chrome_options.add_argument("--window-position=0,0")
|
||||
return helium.start_chrome(headless=False, options=chrome_options)
|
||||
|
||||
|
||||
def initialize_agent(model):
|
||||
"""Initialize the CodeAgent with the specified model."""
|
||||
return CodeAgent(
|
||||
tools=[DuckDuckGoSearchTool(), go_back, close_popups, search_item_ctrl_f],
|
||||
model=model,
|
||||
additional_authorized_imports=["helium"],
|
||||
step_callbacks=[save_screenshot],
|
||||
max_steps=20,
|
||||
verbosity_level=2,
|
||||
)
|
||||
|
||||
|
||||
helium_instructions = """
|
||||
Use your web_search tool when you want to get Google search results.
|
||||
Then you can use helium to access websites. Don't use helium for Google search, only for navigating websites!
|
||||
Don't worry about the helium driver; it's already managed.
|
||||
We've already run "from helium import *"
|
||||
Then you can go to pages!
|
||||
Code:
|
||||
```py
|
||||
go_to('github.com/trending')
|
||||
```<end_code>
|
||||
|
||||
You can directly click clickable elements by inputting the text that appears on them.
|
||||
Code:
|
||||
```py
|
||||
click("Top products")
|
||||
```<end_code>
|
||||
|
||||
If it's a link:
|
||||
Code:
|
||||
```py
|
||||
click(Link("Top products"))
|
||||
```<end_code>
|
||||
|
||||
If you try to interact with an element and it's not found, you'll get a LookupError.
|
||||
In general, stop your action after each button click to see what happens on your screenshot.
|
||||
Never try to log in to a page.
|
||||
|
||||
To scroll up or down, use scroll_down or scroll_up, passing the number of pixels to scroll as the argument.
|
||||
Code:
|
||||
```py
|
||||
scroll_down(num_pixels=1200) # This will scroll one viewport down
|
||||
```<end_code>
|
||||
|
||||
When you have pop-ups with a cross icon to close, don't try to click the close icon by finding its element or targeting an 'X' element (this most often fails).
|
||||
Just use your built-in tool `close_popups` to close them:
|
||||
Code:
|
||||
```py
|
||||
close_popups()
|
||||
```<end_code>
|
||||
|
||||
You can use .exists() to check for the existence of an element. For example:
|
||||
Code:
|
||||
```py
|
||||
if Text('Accept cookies?').exists():
|
||||
click('I accept')
|
||||
```<end_code>
|
||||
|
||||
Proceed in several steps rather than trying to solve the task in one shot.
|
||||
And at the end, only when you have your answer, return your final answer.
|
||||
Code:
|
||||
```py
|
||||
final_answer("YOUR_ANSWER_HERE")
|
||||
```<end_code>
|
||||
|
||||
If pages seem stuck on loading, you might have to wait, for instance `import time` and run `time.sleep(5.0)`. But don't overuse this!
|
||||
To list elements on page, DO NOT try code-based element searches like 'contributors = find_all(S("ol > li"))': just look at the latest screenshot you have and read it visually, or use your tool search_item_ctrl_f.
|
||||
Of course, you can act on buttons like a user would do when navigating.
|
||||
After each code blob you write, you will be automatically provided with an updated screenshot of the browser and the current browser url.
|
||||
But beware that the screenshot is only taken at the end of the whole action, so it won't show intermediate states.
|
||||
Don't kill the browser.
|
||||
When you have modals or cookie banners on screen, you should get rid of them before you can click anything else.
|
||||
"""
|
||||
|
||||
|
||||
def main():
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Parse command line arguments
|
||||
args = parse_arguments()
|
||||
|
||||
# Initialize the model based on the provided arguments
|
||||
model = load_model(args.model_type, args.model_id)
|
||||
|
||||
global driver
|
||||
driver = initialize_driver()
|
||||
agent = initialize_agent(model)
|
||||
|
||||
# Run the agent with the provided prompt
|
||||
agent.python_executor("from helium import *", agent.state)
|
||||
agent.run(args.prompt + helium_instructions)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
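The screenshot-pruning step inside `save_screenshot` above can be sketched in isolation. This is a minimal illustration, not the smolagents implementation: `FakeStep` and `prune_old_screenshots` are hypothetical names introduced here, and `FakeStep` only mimics the two `ActionStep` fields the callback touches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeStep:
    """Hypothetical stand-in for ActionStep, reduced to the fields used below."""
    step_number: int
    observations_images: Optional[list] = None


def prune_old_screenshots(steps: List[FakeStep], current_step: int) -> None:
    """Clear screenshots on every step that is at least two steps behind the current one."""
    for step in steps:
        if step.step_number <= current_step - 2:
            step.observations_images = None


# Five steps, each carrying one placeholder "screenshot".
steps = [FakeStep(n, [f"screenshot-{n}"]) for n in range(1, 6)]
prune_old_screenshots(steps, current_step=5)

# Collect the step numbers that still hold a screenshot after pruning.
kept = [s.step_number for s in steps if s.observations_images is not None]
print(kept)
```

Clearing older images this way keeps the memory passed back to the model small, at the cost of the agent no longer seeing pages it visited more than one step earlier.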
@@ -37,11 +37,61 @@
|
||||
- local: unit1/dummy-agent-library
|
||||
title: Dummy Agent Library
|
||||
- local: unit1/tutorial
|
||||
title: Let’s Create Our First Agent Using Smolagents
|
||||
title: Let’s Create Our First Agent Using smolagents
|
||||
- local: unit1/final-quiz
|
||||
title: Unit 1 Final Quiz
|
||||
- local: unit1/conclusion
|
||||
title: Conclusion
|
||||
- title: Unit 2. Frameworks for AI Agents
|
||||
sections:
|
||||
- local: unit2/introduction
|
||||
title: Frameworks for AI Agents
|
||||
- title: Unit 2.1 The smolagents framework
|
||||
sections:
|
||||
- local: unit2/smolagents/introduction
|
||||
title: Introduction to smolagents
|
||||
- local: unit2/smolagents/why_use_smolagents
|
||||
title: Why use smolagents?
|
||||
- local: unit2/smolagents/quiz1
|
||||
title: Quick Quiz 1
|
||||
- local: unit2/smolagents/code_agents
|
||||
title: Building Agents That Use Code
|
||||
- local: unit2/smolagents/tool_calling_agents
|
||||
title: Writing actions as code snippets or JSON blobs
|
||||
- local: unit2/smolagents/tools
|
||||
title: Tools
|
||||
- local: unit2/smolagents/retrieval_agents
|
||||
title: Retrieval Agents
|
||||
- local: unit2/smolagents/quiz2
|
||||
title: Quick Quiz 2
|
||||
- local: unit2/smolagents/multi_agent_systems
|
||||
title: Multi-Agent Systems
|
||||
- local: unit2/smolagents/vision_agents
|
||||
title: Vision and Browser agents
|
||||
- local: unit2/smolagents/final_quiz
|
||||
title: Final Quiz
|
||||
- local: unit2/smolagents/conclusion
|
||||
title: Conclusion
|
||||
- title: Unit 2.2 The LlamaIndex framework
|
||||
sections:
|
||||
- local: unit2/llama-index/introduction
|
||||
title: Introduction to LlamaIndex
|
||||
- local: unit2/llama-index/llama-hub
|
||||
title: Introduction to LlamaHub
|
||||
- local: unit2/llama-index/components
|
||||
title: What are Components in LlamaIndex?
|
||||
- local: unit2/llama-index/tools
|
||||
title: Using Tools in LlamaIndex
|
||||
- local: unit2/llama-index/quiz1
|
||||
title: Quick Quiz 1
|
||||
- local: unit2/llama-index/agents
|
||||
title: Using Agents in LlamaIndex
|
||||
- local: unit2/llama-index/workflows
|
||||
title: Creating Agentic Workflows in LlamaIndex
|
||||
- local: unit2/llama-index/quiz2
|
||||
title: Quick Quiz 2
|
||||
- local: unit2/llama-index/conclusion
|
||||
title: Conclusion
|
||||
- title: Bonus Unit 1. Fine-tuning an LLM for Function-calling
|
||||
sections:
|
||||
- local: bonus-unit1/introduction
|
||||
@@ -56,3 +106,4 @@
|
||||
sections:
|
||||
- local: communication/next-units
|
||||
title: Next Units
|
||||
|
||||
|
||||
@@ -27,7 +27,7 @@ Let's get started!
|
||||
In this course, you will:
|
||||
|
||||
- 📖 Study AI Agents in **theory, design, and practice.**
|
||||
- 🧑💻 Learn to **use established AI Agent libraries** such as [smolagents](https://huggingface.co/docs/smolagents/en/index), [LangChain](https://www.langchain.com/), and [LlamaIndex](https://www.llamaindex.ai/).
|
||||
- 🧑💻 Learn to **use established AI Agent libraries** such as [smolagents](https://huggingface.co/docs/smolagents/en/index), [LlamaIndex](https://www.llamaindex.ai/), and [LangGraph](https://langchain-ai.github.io/langgraph/).
|
||||
- 💾 **Share your agents** on the Hugging Face Hub and explore agents created by the community.
|
||||
- 🏆 Participate in challenges where you will **evaluate your agents against other students'.**
|
||||
- 🎓 **Earn a certificate of completion** by completing assignments.
|
||||
@@ -36,7 +36,7 @@ And more!
|
||||
|
||||
At the end of this course you'll understand **how Agents work and how to build your own Agents using the latest libraries and tools**.
|
||||
|
||||
Don't forget to **<a href="https://bit.ly/hf-learn-agents">sign up to the course!</a>**
|
||||
Don't forget to **<a href="https://bit.ly/hf-learn-agents">sign up to the course!</a>**
|
||||
|
||||
(We are respectful of your privacy. We collect your email address to be able to **send you the links when each Unit is published and give you information about the challenges and updates**).
|
||||
|
||||
@@ -88,13 +88,13 @@ You only need 2 things:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/three-paths.jpg" alt="Two paths" width="100%"/>
|
||||
|
||||
You can choose to follow this course *in audit mode*, or do the activities and *get one of the two certificates we'll issue*.
|
||||
You can choose to follow this course *in audit mode*, or do the activities and *get one of the two certificates we'll issue*.
|
||||
|
||||
If you audit the course, you can participate in all the challenges and do assignments if you want, and **you don't need to notify us**.
|
||||
|
||||
The certification process is **completely free**:
|
||||
|
||||
- *To get a certification for fundamentals*: you need to complete Unit 1 of the course. This is intended for students that want to get up to date with the latest trends in Agents.
|
||||
- *To get a certification for fundamentals*: you need to complete Unit 1 of the course. This is intended for students that want to get up to date with the latest trends in Agents.
|
||||
- *To get a certificate of completion*: you need to complete Unit 1, one of the use case assignments we'll propose during the course, and the final challenge.
|
||||
|
||||
There's a deadline for the certification process: all the assignments must be finished before **May 1st 2025**.
|
||||
@@ -147,15 +147,17 @@ Thomas is a machine learning engineer at Hugging Face and delivered the successf
|
||||
- [Follow Thomas on X](https://x.com/ThomasSimonini)
|
||||
- [Follow Thomas on LinkedIn](https://www.linkedin.com/in/simoninithomas/)
|
||||
|
||||
## Acknowledgments
|
||||
## Acknowledgments
|
||||
|
||||
We would like to extend our gratitude to the following individuals for their invaluable contributions to this course:
|
||||
We would like to extend our gratitude to the following individuals for their invaluable contributions to this course:
|
||||
|
||||
- **[Pedro Cuenca](https://huggingface.co/pcuenq)** – For his guidance and expertise in reviewing the materials
|
||||
- **[Pedro Cuenca](https://huggingface.co/pcuenq)** – For his guidance and expertise in reviewing the materials.
|
||||
- **[Aymeric Roucher](https://huggingface.co/m-ric)** – For his amazing demo spaces (decoding and final agent) as well as his help on the smolagents parts.
|
||||
- **[Joshua Lochner](https://huggingface.co/Xenova)** – For his amazing demo space on tokenization.
|
||||
- **[Quentin Gallouédec](https://huggingface.co/qgallouedec)** – For his help on the course content.
|
||||
- **[David Berenstein](https://huggingface.co/davidberenstein1957)** – For his help on the course content and moderation.
|
||||
- **[XiaXiao (ShawnSiao)](https://huggingface.co/SSSSSSSiao)** – Chinese translator for the course.
|
||||
- **[Jiaming Huang](https://huggingface.co/nordicsushi)** – Chinese translator for the course.
|
||||
|
||||
## I found a bug, or I want to improve the course [[contribute]]
|
||||
|
||||
@@ -169,8 +171,7 @@ Contributions are **welcome** 🤗
|
||||
|
||||
Please ask your question in our <a href="https://discord.gg/UrrTSsSyjb">discord server #ai-agents-discussions.</a>
|
||||
|
||||
|
||||
Now that you have all the information, let's get on board ⛵
|
||||
Now that you have all the information, let's get on board ⛵
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/time-to-onboard.jpg" alt="Time to Onboard" width="100%"/>
|
||||
|
||||
|
||||
@@ -130,7 +130,7 @@ Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you
|
||||
```
|
||||
|
||||
Since we are running the "text_generation" method, we need to apply the prompt manually:
|
||||
```
|
||||
```python
|
||||
prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
||||
{SYSTEM_PROMPT}
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
@@ -140,7 +140,7 @@ What's the weather in London ?
|
||||
```
|
||||
|
||||
We can also do it like this, which is what happens inside the `chat` method :
|
||||
```
|
||||
```python
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": "What's the weather in London ?"},
|
||||
@@ -204,6 +204,7 @@ print(output)
|
||||
output:
|
||||
|
||||
````
|
||||
Thought: I will check the weather in London.
|
||||
Action:
|
||||
```
|
||||
{
|
||||
@@ -211,7 +212,6 @@ Action:
|
||||
"action_input": {"location": "London"}
|
||||
}
|
||||
```
|
||||
Thought: I will check the weather in London.
|
||||
Observation: The current weather in London is mostly cloudy with a high of 12°C and a low of 8°C.
|
||||
````
|
||||
|
||||
@@ -231,6 +231,7 @@ print(output)
|
||||
output:
|
||||
|
||||
````
|
||||
Thought: I will check the weather in London.
|
||||
Action:
|
||||
```
|
||||
{
|
||||
@@ -238,7 +239,6 @@ Action:
|
||||
"action_input": {"location": "London"}
|
||||
}
|
||||
```
|
||||
Thought: I will check the weather in London.
|
||||
Observation:
|
||||
````
|
||||
|
||||
@@ -307,7 +307,7 @@ Now begin! Reminder to ALWAYS use the exact characters `Final Answer:` when you
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
What's the weather in London ?
|
||||
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
|
||||
Thought: I will check the weather in London.
|
||||
Action:
|
||||
```
|
||||
{
|
||||
@@ -315,7 +315,6 @@ Action:
|
||||
"action_input": {"location": {"type": "string", "value": "London"}
|
||||
}
|
||||
```
|
||||
Thought: I will check the weather in London.
|
||||
Observation:the weather in London is sunny with low temperatures.
|
||||
````
|
||||
|
||||
|
||||
@@ -181,7 +181,7 @@ A chat template structures conversations between users and AI models...<|im_end|
|
||||
How do I use it ?<|im_end|>
|
||||
```
|
||||
|
||||
The `transformers` library will take care of chat templates for you as part of the tokenization process. Read more about how transformers uses chat templates <a href="https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates" target="_blank">here</a>. All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest.
|
||||
The `transformers` library will take care of chat templates for you as part of the tokenization process. Read more about how transformers uses chat templates <a href="https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates" target="_blank">here</a>. All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest.
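To make the idea of a chat template concrete, here is a minimal, stdlib-only sketch that renders a list of messages into the ChatML-style layout shown above. The `render_chatml` helper is hypothetical and written only for illustration; in practice the tokenizer applies the model's own template for you, and other models use different special tokens.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in a ChatML-style layout (illustrative helper, not a library API)."""
    text = "".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next.
        text += "<|im_start|>assistant\n"
    return text


rendered = render_chatml([{"role": "user", "content": "How do I use it ?"}])
print(rendered)
```

The point of the sketch is only that a template is a deterministic mapping from structured messages to a single prompt string.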
|
||||
|
||||
You can experiment with the following Space to see how the same conversation would be formatted for different models using their corresponding chat templates:
|
||||
|
||||
|
||||
@@ -1,19 +1,18 @@
|
||||
---
|
||||
### Q1: What is an Agent?
|
||||
Which of the following best describes an AI Agent?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "A system that solely processes static text, without any inherent mechanism to interact dynamically with its surroundings or execute meaningful actions.",
|
||||
explain: "An Agent must be able to take an action and interact with its environment.",
|
||||
},
|
||||
{
|
||||
text: "An AI model that can reason, plan, and use tools to interact with its environment to achieve a specific goal.",
|
||||
explain: "This definition captures the essential characteristics of an Agent.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "A system that solely processes static text, without any inherent mechanism to interact dynamically with its surroundings or execute meaningful actions.",
|
||||
explain: "An Agent must be able to take an action and interact with its environment.",
|
||||
},
|
||||
{
|
||||
text: "A conversational agent restricted to answering queries, lacking the ability to perform any actions or interact with external systems.",
|
||||
explain: "A chatbot like this lacks the ability to take actions, making it different from an Agent.",
|
||||
},
|
||||
@@ -63,17 +62,17 @@ text: "Tools serve no real purpose and do not contribute to the Agent’s abilit
|
||||
explain: "Tools expand an Agent's capabilities by allowing it to perform actions beyond text generation.",
|
||||
},
|
||||
{
|
||||
text: "Tools provide the Agent with the ability to execute actions a text-generation model cannot perform natively, such as making coffee or generating images.",
|
||||
explain: "Tools enable Agents to interact with the real world and complete tasks.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "Tools are solely designed for memory storage, lacking any capacity to facilitate the execution of tasks or enhance interactive performance.",
|
||||
explain: "Tools are primarily for performing actions, not just for storing data.",
|
||||
},
|
||||
{
|
||||
text: "Tools severely restrict the Agent exclusively to generating text, thereby preventing it from engaging in a broader range of interactive actions.",
|
||||
explain: "On the contrary, tools allow Agents to go beyond text-based responses.",
|
||||
},
|
||||
{
|
||||
text: "Tools provide the Agent with the ability to execute actions a text-generation model cannot perform natively, such as making coffee or generating images.",
|
||||
explain: "Tools enable Agents to interact with the real world and complete tasks.",
|
||||
correct: true
|
||||
}
|
||||
]}
|
||||
/>
|
||||
@@ -144,15 +143,15 @@ text: "A static FAQ page on a website that provides fixed information and lacks
|
||||
explain: "A static FAQ page does not interact dynamically with users or take actions.",
|
||||
},
|
||||
{
|
||||
text: "A simple calculator that performs arithmetic operations based on fixed rules, without any capability for reasoning or planning.",
|
||||
explain: "A calculator follows fixed rules without reasoning or planning, so it is not an Agent.",
|
||||
},
|
||||
{
|
||||
text: "A virtual assistant like Siri or Alexa that can understand spoken commands, reason through them, and perform tasks like setting reminders or sending messages.",
|
||||
explain: "This example includes reasoning, planning, and interaction with the environment.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "A simple calculator that performs arithmetic operations based on fixed rules, without any capability for reasoning or planning.",
|
||||
explain: "A calculator follows fixed rules without reasoning or planning, so it is not an Agent.",
|
||||
},
|
||||
{
|
||||
text: "A video game NPC that operates on a fixed script of responses, without the ability to reason, plan, or use external tools.",
|
||||
explain: "Unless the NPC can reason, plan, and use tools, it does not function as an AI Agent.",
|
||||
}
|
||||
|
||||
@@ -40,6 +40,14 @@ To start, duplicate this Space: <a href="https://huggingface.co/spaces/agents-co
|
||||
Duplicating this space means **creating a local copy on your own profile**:
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/duplicate-space.gif" alt="Duplicate"/>
|
||||
|
||||
After duplicating the Space, you'll need to add your Hugging Face API token so your agent can access the model API:
|
||||
|
||||
1. First, get your Hugging Face token from [https://hf.co/settings/tokens](https://hf.co/settings/tokens) if you don't already have one
|
||||
2. Go to your duplicated Space and click on the **Settings** tab
|
||||
3. Scroll down to the **Variables and Secrets** section and click **New Secret**
|
||||
4. Create a secret with the name `HF_TOKEN` and paste your token as the value
|
||||
5. Click **Save** to store your token securely
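Secrets added to a Space are exposed to your app as environment variables. As a small sketch of how code could read the token, assuming the secret is named `HF_TOKEN` as in the steps above (the `get_hf_token` helper is hypothetical and not part of the template):

```python
import os


def get_hf_token() -> str:
    """Return the Hugging Face token from the environment, or fail with a clear message."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Add it as a secret in your Space settings."
        )
    return token
```

Failing early with an explicit message tends to be easier to debug than an authentication error raised later by the model API.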
|
||||
|
||||
Throughout this lesson, the only file you will need to modify is the (currently incomplete) **"app.py"**. You can see here the [original one in the template](https://huggingface.co/spaces/agents-course/First_agent_template/blob/main/app.py). To find yours, go to your copy of the space, then click the `Files` tab and then on `app.py` in the directory listing.
|
||||
|
||||
Let's break down the code together:
|
||||
@@ -210,7 +218,7 @@ agent = CodeAgent(
|
||||
GradioUI(agent).launch()
|
||||
```
|
||||
|
||||
Your **Goal** is to get familiar with the Space and the Agent.
|
||||
Your **Goal** is to get familiar with the Space and the Agent.
|
||||
|
||||
Currently, the agent in the template **does not use any tools, so try to provide it with some of the pre-made ones or even make some new tools yourself!**
|
||||
|
||||
|
||||
@@ -20,7 +20,7 @@ Most LLMs nowadays are **built on the Transformer architecture**—a deep learni
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
There are 3 types of transformers :
|
||||
There are 3 types of transformers:
|
||||
|
||||
1. **Encoders**
|
||||
An encoder-based Transformer takes text (or other data) as input and outputs a dense representation (or embedding) of that text.
|
||||
@@ -73,7 +73,7 @@ Each LLM has some **special tokens** specific to the model. The LLM uses these t
|
||||
|
||||
The forms of special tokens are highly diverse across model providers.
|
||||
|
||||
The table below illustrates the diversity of special tokens.
|
||||
The table below illustrates the diversity of special tokens.
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
|
||||
37
units/en/unit2/introduction.mdx
Normal file
@@ -0,0 +1,37 @@
|
||||
# Introduction to Agentic Frameworks
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/thumbnail.jpg" alt="Thumbnail"/>
|
||||
|
||||
Welcome to this second unit, where **we'll explore different agentic frameworks** that can be used to build powerful agentic applications.
|
||||
|
||||
We will study:
|
||||
|
||||
- In Unit 2.1: [smolagents](https://huggingface.co/docs/smolagents/en/index)
|
||||
- In Unit 2.2: [LlamaIndex](https://www.llamaindex.ai/)
|
||||
- In Unit 2.3: [LangGraph](https://www.langchain.com/langgraph)
|
||||
|
||||
Let's dive in! 🕵
|
||||
|
||||
## When to Use an Agentic Framework
|
||||
|
||||
An agentic framework is **not always needed when building an application around LLMs**. Such frameworks provide flexibility in the workflow to efficiently solve a specific task, but they're not always necessary.
|
||||
|
||||
Sometimes, **predefined workflows are sufficient** to fulfill user requests, and there is no real need for an agentic framework. If the approach to building an agent is simple, like a chain of prompts, plain code may be enough. The advantage is that the developer will have **full control and understanding of their system without abstractions**.
|
||||
|
||||
However, when the workflow becomes more complex, such as letting an LLM call functions or using multiple agents, these abstractions start to become helpful.
|
||||
|
||||
Considering these ideas, we can already identify the need for some features:
|
||||
|
||||
* An *LLM engine* that powers the system.
|
||||
* A *list of tools* the agent can access.
|
||||
* A *parser* for extracting tool calls from the LLM output.
|
||||
* A *system prompt* synced with the parser.
|
||||
* A *memory system*.
|
||||
* *Error logging and retry mechanisms* to control LLM mistakes.
|
||||
We'll explore how these topics are resolved in various frameworks, including `smolagents`, `LlamaIndex`, and `LangGraph`.
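To see why these features matter, here is a rough, framework-free sketch of three of them working together: a tool list, a parser for tool calls, and a retry loop. Everything in it is an assumption made for illustration: the `get_weather` tool, the `TOOLS` registry, and the `generate` callable standing in for the LLM engine are all hypothetical, and the JSON shape simply mirrors the `action` / `action_input` format used in Unit 1.

```python
import json
from typing import Callable

# Hypothetical tool and registry, for illustration only.
def get_weather(location: str) -> str:
    return f"The weather in {location} is sunny."

TOOLS = {"get_weather": get_weather}


def parse_tool_call(llm_output: str) -> dict:
    """Extract the JSON blob between the first '{' and the last '}'."""
    start = llm_output.index("{")        # raises ValueError if there is no JSON
    end = llm_output.rindex("}") + 1
    call = json.loads(llm_output[start:end])
    if call.get("action") not in TOOLS:
        raise ValueError(f"Unknown tool: {call.get('action')!r}")
    return call


def run_tool_call(llm_output: str) -> str:
    call = parse_tool_call(llm_output)
    return TOOLS[call["action"]](**call["action_input"])


def run_with_retries(generate: Callable[[], str], max_retries: int = 2) -> str:
    """Ask the (stand-in) LLM again whenever its output cannot be parsed or executed."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return run_tool_call(generate())
        except (ValueError, KeyError, TypeError) as err:
            last_error = err
    raise RuntimeError(f"No valid tool call after retries: {last_error}")


# Simulated LLM: the first output has no tool call, the second one does.
outputs = iter([
    "I am not sure.",
    'Action:\n{"action": "get_weather", "action_input": {"location": "London"}}',
])
result = run_with_retries(lambda: next(outputs))
print(result)
```

Even this toy version needs the system prompt and the parser to agree on the output format, which is exactly the kind of bookkeeping a framework takes over.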
|
||||
|
||||
## Agentic Frameworks Units
|
||||
|
||||
| Framework | Description | Unit Author |
|
||||
|------------|----------------|----------------|
|
||||
| [smolagents](./smolagents/introduction) | Agents framework developed by Hugging Face. | Sergio Paniego - [HF](https://huggingface.co/sergiopaniego) - [X](https://x.com/sergiopaniego) - [Linkedin](https://www.linkedin.com/in/sergio-paniego-blanco) |
|
||||
15
units/en/unit2/llama-index/README.md
Normal file
@@ -0,0 +1,15 @@
|
||||
# Table of Contents
|
||||
|
||||
This LlamaIndex outline is part of Unit 2 of the course. You can access the LlamaIndex section of Unit 2 on hf.co/learn 👉 <a href="https://hf.co/learn/agents-course/unit2/llama-index/introduction">here</a>
|
||||
|
||||
| Title | Description |
|
||||
| --- | --- |
|
||||
| [Introduction](introduction.mdx) | Introduction to LlamaIndex |
|
||||
| [LlamaHub](llama-hub.mdx) | LlamaHub: a registry of integrations, agents and tools |
|
||||
| [Components](components.mdx) | Components: the building blocks of workflows |
|
||||
| [Tools](tools.mdx) | Tools: how to build tools in LlamaIndex |
|
||||
| [Quiz 1](quiz1.mdx) | Quiz 1 |
|
||||
| [Agents](agents.mdx) | Agents: how to build agents in LlamaIndex |
|
||||
| [Workflows](workflows.mdx) | Workflows: a sequence of steps and events, made of components, that are executed in order |
|
||||
| [Quiz 2](quiz2.mdx) | Quiz 2 |
|
||||
| [Conclusion](conclusion.mdx) | Conclusion |
|
||||
160
units/en/unit2/llama-index/agents.mdx
Normal file
@@ -0,0 +1,160 @@
|
||||
# Using Agents in LlamaIndex
|
||||
|
||||
Remember Alfred, our helpful butler agent from earlier? Well, he's about to get an upgrade!
|
||||
Now that we understand the tools available in LlamaIndex, we can give Alfred new capabilities to serve us better.
|
||||
|
||||
But before we continue, let's remind ourselves what makes an agent like Alfred tick.
|
||||
Back in Unit 1, we learned that:
|
||||
|
||||
> An Agent is a system that leverages an AI model to interact with its environment to achieve a user-defined objective. It combines reasoning, planning, and action execution (often via external tools) to fulfil tasks.
|
||||
|
||||
LlamaIndex supports **three main types of reasoning agents:**
|
||||
|
||||

|
||||
|
||||
1. `Function Calling Agents` - These work with AI models that can call specific functions.
|
||||
2. `ReAct Agents` - These can work with any AI model that exposes a chat or text completion endpoint, and can handle complex reasoning tasks.
|
||||
3. `Advanced Custom Agents` - These use more complex methods to deal with more complex tasks and workflows.
|
||||
|
||||
<Tip>Find more information on advanced agents on <a href="https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/agent/workflow/base_agent.py">BaseWorkflowAgent</a></Tip>
|
||||
|
||||
## Initialising Agents
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
To create an agent, we start by providing it with a **set of functions/tools that define its capabilities**.
|
||||
Let's look at how to create an agent with some basic tools. As of this writing, the agent will automatically use the function calling API (if available), or a standard ReAct agent loop.
|
||||
|
||||
LLMs that support a tools/functions API are relatively new, but they provide a powerful way to call tools by avoiding specific prompting and allowing the LLM to create tool calls based on provided schemas.
|
||||
|
||||
ReAct agents are also good at complex reasoning tasks and can work with any LLM that has chat or text completion capabilities. They are more verbose, and show the reasoning behind certain actions that they take.
|
||||
|
||||
```python
|
||||
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
|
||||
from llama_index.core.agent.workflow import AgentWorkflow
|
||||
from llama_index.core.tools import FunctionTool
|
||||
|
||||
# define sample Tool -- type annotations, function names, and docstrings, are all included in parsed schemas!
|
||||
def multiply(a: int, b: int) -> int:
|
||||
"""Multiplies two integers and returns the resulting integer"""
|
||||
return a * b
|
||||
|
||||
# initialize llm
|
||||
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
|
||||
|
||||
# initialize agent
|
||||
agent = AgentWorkflow.from_tools_or_functions(
|
||||
[FunctionTool.from_defaults(multiply)],
|
||||
llm=llm
|
||||
)
|
||||
```
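To get a feel for why type annotations, function names, and docstrings matter, here is a stdlib-only sketch that derives a simple schema from a function. The `describe_function` helper is hypothetical and is not how LlamaIndex builds its schemas internally; it only illustrates which information is available from a plain Python function.

```python
import inspect


def multiply(a: int, b: int) -> int:
    """Multiplies two integers and returns the resulting integer"""
    return a * b


def describe_function(fn) -> dict:
    """Collect the name, docstring, and annotations an LLM could use to call `fn`."""
    signature = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {
            name: param.annotation.__name__
            for name, param in signature.parameters.items()
        },
        "returns": signature.return_annotation.__name__,
    }


schema = describe_function(multiply)
print(schema)
```

A vague docstring or a missing annotation leaves the LLM with less to go on, so it pays to write both carefully.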
|
||||
|
||||
**Agents are stateless by default**; remembering past interactions is opt-in, using a `Context` object.
|
||||
This might be useful if you want to use an agent that needs to remember previous interactions, like a chatbot that maintains context across multiple messages or a task manager that needs to track progress over time.
|
||||
|
||||
```python
|
||||
# stateless
|
||||
response = await agent.run("What is 2 times 2?")
|
||||
|
||||
# remembering state
|
||||
from llama_index.core.workflow import Context
|
||||
|
||||
ctx = Context(agent)
|
||||
|
||||
response = await agent.run("My name is Bob.", ctx=ctx)
|
||||
response = await agent.run("What was my name again?", ctx=ctx)
|
||||
```
|
||||
|
||||
You'll notice that agents in `LlamaIndex` are async: `agent.run(...)` is awaited with Python's `await` operator. If you are new to async code in Python, or need a refresher, the LlamaIndex documentation has an [excellent async guide](https://docs.llamaindex.ai/en/stable/getting_started/async_python/).
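Notebooks allow `await` at the top level, but a regular Python script does not. As a minimal sketch of how awaitable calls can be driven from a script, the example below uses `asyncio.run`; the `fake_agent_run` coroutine is a hypothetical stand-in for an awaitable call such as `agent.run(...)` and is not LlamaIndex code.

```python
import asyncio


async def fake_agent_run(user_msg: str) -> str:
    # Stand-in for an awaitable agent call; it just echoes the message.
    await asyncio.sleep(0)
    return f"echo: {user_msg}"


async def main() -> str:
    # Inside a coroutine, `await` works just like in the notebook cells.
    return await fake_agent_run("What is 2 times 2?")


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
response = asyncio.run(main())
print(response)
```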
|
||||
|
||||
Now that we've covered the basics, let's take a look at how we can use more complex tools in our agents.
|
||||
|
||||
## Creating RAG Agents with QueryEngineTools
|
||||
|
||||
**Agentic RAG is a powerful way to use agents to answer questions about your data.** We can pass various tools to Alfred to help him answer questions.
|
||||
However, instead of answering the question on top of documents automatically, Alfred can decide to use any other tool or flow to answer the question.
|
||||
|
||||

|
||||
|
||||
It is easy to **wrap `QueryEngine` as a tool** for an agent.
|
||||
When doing so, we need to **define a name and description**. The LLM will use this information to correctly use the tool.
|
||||
Let's see how to load in a `QueryEngineTool` using the `QueryEngine` we created in the [component section](02_components).
|
||||
|
||||
```python
|
||||
from llama_index.core.tools import QueryEngineTool
|
||||
|
||||
query_engine = index.as_query_engine(llm=llm, similarity_top_k=3) # as shown in the previous section
|
||||
|
||||
query_engine_tool = QueryEngineTool.from_defaults(
|
||||
query_engine=query_engine,
|
||||
name="name",
|
||||
description="a specific description",
|
||||
return_direct=False,
|
||||
)
|
||||
query_engine_agent = AgentWorkflow.from_tools_or_functions(
|
||||
[query_engine_tool],
|
||||
llm=llm,
|
||||
system_prompt="You are a helpful assistant that has access to a database containing persona descriptions. "
|
||||
)
|
||||
```
|
||||
|
||||
## Creating Multi-agent systems
|
||||
|
||||
The `AgentWorkflow` class also directly supports multi-agent systems. By giving each agent a name and description, the system maintains a single active speaker, with each agent having the ability to hand off to another agent.
|
||||
|
||||
By narrowing the scope of each agent, we can help increase their general accuracy when responding to user messages.
|
||||
|
||||
**Agents in LlamaIndex can also directly be used as tools** for other agents, for more complex and custom scenarios.
|
||||
|
||||
```python
|
||||
from llama_index.core.agent.workflow import (
|
||||
AgentWorkflow,
|
||||
FunctionAgent,
|
||||
ReActAgent,
|
||||
)
|
||||
|
||||
# Define some tools
|
||||
def add(a: int, b: int) -> int:
|
||||
"""Add two numbers."""
|
||||
return a + b
|
||||
|
||||
|
||||
def subtract(a: int, b: int) -> int:
|
||||
"""Subtract two numbers."""
|
||||
return a - b
|
||||
|
||||
|
||||
# Create agent configs
|
||||
# NOTE: we can use FunctionAgent or ReActAgent here.
|
||||
# FunctionAgent works for LLMs with a function calling API.
|
||||
# ReActAgent works for any LLM.
|
||||
calculator_agent = ReActAgent(
|
||||
name="calculator",
|
||||
description="Performs basic arithmetic operations",
|
||||
system_prompt="You are a calculator assistant. Use your tools for any math operation.",
|
||||
tools=[add, subtract],
|
||||
llm=llm,
|
||||
)
|
||||
|
||||
query_agent = ReActAgent(
|
||||
name="info_lookup",
|
||||
description="Looks up information about XYZ",
|
||||
system_prompt="Use your tool to query a RAG system to answer information about XYZ",
|
||||
tools=[query_engine_tool],
|
||||
llm=llm
|
||||
)
|
||||
|
||||
# Create and run the workflow
|
||||
agent = AgentWorkflow(
|
||||
agents=[calculator_agent, query_agent], root_agent="calculator"
|
||||
)
|
||||
|
||||
# Run the system
|
||||
response = await agent.run(user_msg="Can you add 5 and 3?")
|
||||
```
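The "single active speaker with handoff" idea can be sketched without any framework. The toy below is only an illustration under stated assumptions: `ToyAgent`, the two handler functions, and the `run` loop are hypothetical, and keyword matching replaces the LLM's decision about when to hand off.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# A handler returns (reply, name of the agent to hand off to, or None).
Handler = Callable[[str], Tuple[Optional[str], Optional[str]]]


@dataclass
class ToyAgent:
    name: str
    description: str
    handle: Handler


def calculator(msg: str) -> Tuple[Optional[str], Optional[str]]:
    if "add" in msg:
        numbers = [int(tok) for tok in msg.replace("?", "").split() if tok.isdigit()]
        return str(sum(numbers)), None
    return None, "info_lookup"  # out of scope: hand off to another agent


def info_lookup(msg: str) -> Tuple[Optional[str], Optional[str]]:
    return f"(looked up) {msg}", None


def run(agents: Dict[str, ToyAgent], root: str, msg: str, max_hops: int = 5) -> Tuple[str, str]:
    active = root  # only one agent is active at any time
    for _ in range(max_hops):
        reply, handoff = agents[active].handle(msg)
        if handoff is None:
            return active, reply
        active = handoff
    raise RuntimeError("Too many handoffs")


agents = {
    "calculator": ToyAgent("calculator", "Performs basic arithmetic operations", calculator),
    "info_lookup": ToyAgent("info_lookup", "Looks up information about XYZ", info_lookup),
}
print(run(agents, "calculator", "Can you add 5 and 3?"))
print(run(agents, "calculator", "Who is Alfred?"))
```

Keeping each agent's scope narrow makes the handoff decision simpler, which matches the point above about accuracy.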
|
||||
|
||||
<Tip>Haven't learned enough yet? There is a lot more to discover about agents and tools in LlamaIndex within the <a href="https://docs.llamaindex.ai/en/stable/examples/agent/agent_workflow_basic/">AgentWorkflow Basic Introduction</a> or the <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/">Agent Learning Guide</a>, where you can read more about streaming, context serialization, and human-in-the-loop!</Tip>
|
||||
|
||||
Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can use LlamaIndex to **create configurable and manageable workflows!**
|
||||
239
units/en/unit2/llama-index/components.mdx
Normal file
@@ -0,0 +1,239 @@
|
||||
# What are components in LlamaIndex?
|
||||
|
||||
Remember Alfred, our helpful butler agent from Unit 1?
|
||||
To assist us effectively, Alfred needs to understand our requests and **prepare, find and use relevant information to help complete tasks.**
|
||||
This is where LlamaIndex's components come in.
|
||||
|
||||
While LlamaIndex has many components, **we'll focus specifically on the `QueryEngine` component.**
|
||||
Why? Because it can be used as a Retrieval-Augmented Generation (RAG) tool for an agent.
|
||||
|
||||
So, what is RAG? LLMs are trained on enormous bodies of data to learn general knowledge.
|
||||
However, they may not be trained on relevant and up-to-date data.
|
||||
RAG solves this problem by finding and retrieving relevant information from your data and giving that to the LLM.
|
||||
|
||||

|
||||
|
||||
Now, think about how Alfred works:
|
||||
|
||||
1. You ask Alfred to help plan a dinner party
|
||||
2. Alfred needs to check your calendar, dietary preferences, and past successful menus
|
||||
3. The `QueryEngine` helps Alfred find this information and use it to plan the dinner party
|
||||
|
||||
This makes the `QueryEngine` **a key component for building agentic RAG workflows** in LlamaIndex.
|
||||
Just as Alfred needs to search through your household information to be helpful, any agent needs a way to find and understand relevant data.
|
||||
The `QueryEngine` provides exactly this capability.
|
||||
|
||||
Now, let's dive a bit deeper into the components and see how you can **combine components to create a RAG pipeline.**
|
||||
|
||||
## Creating a RAG pipeline using components
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/components.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
There are five key stages within RAG, which in turn will be a part of most larger applications you build. These are:
|
||||
|
||||
1. **Loading**: this refers to getting your data from where it lives -- whether it's text files, PDFs, another website, a database, or an API -- into your workflow. LlamaHub provides hundreds of integrations to choose from.
|
||||
2. **Indexing**: this means creating a data structure that allows for querying the data. For LLMs, this nearly always means creating vector embeddings, which are numerical representations of the meaning of the data. Indexing can also refer to numerous other metadata strategies that make it easy to accurately find contextually relevant data based on properties.
|
||||
3. **Storing**: once your data is indexed you will want to store your index, as well as other metadata, to avoid having to re-index it.
|
||||
4. **Querying**: for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.
|
||||
5. **Evaluation**: a critical step in any flow is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are.
|
||||
|
||||
Next, let's see how we can reproduce these stages using components.
|
||||
|
||||
### Loading and embedding documents
|
||||
|
||||
As mentioned before, LlamaIndex can work on top of your own data. However, **before accessing data, we need to load it.**
|
||||
There are three main ways to load data into LlamaIndex:
|
||||
|
||||
1. `SimpleDirectoryReader`: A built-in loader for various file types from a local directory.
|
||||
2. `LlamaParse`: LlamaIndex's official tool for PDF parsing, available as a managed API.
|
||||
3. `LlamaHub`: A registry of hundreds of data-loading libraries to ingest data from any source.
|
||||
|
||||
<Tip>Get familiar with <a href="https://docs.llamaindex.ai/en/stable/module_guides/loading/connector/">LlamaHub</a> loaders and <a href="https://github.com/run-llama/llama_cloud_services/blob/main/parse.md">LlamaParse</a> parser for more complex data sources.</Tip>
|
||||
|
||||
**The simplest way to load data is with `SimpleDirectoryReader`.**
|
||||
This versatile component can load various file types from a folder and convert them into `Document` objects that LlamaIndex can work with.
|
||||
Let's see how we can use `SimpleDirectoryReader` to load data from a folder.
|
||||
|
||||
```python
|
||||
from llama_index.core import SimpleDirectoryReader
|
||||
|
||||
reader = SimpleDirectoryReader(input_dir="path/to/directory")
|
||||
documents = reader.load_data()
|
||||
```
|
||||
|
||||
After loading our documents, we need to break them into smaller pieces called `Node` objects.
|
||||
A `Node` is just a chunk of text from the original document that's easier for the AI to work with, while it still has references to the original `Document` object.
|
||||
|
||||
The `IngestionPipeline` helps us create these nodes through two key transformations.
|
||||
1. `SentenceSplitter` breaks down documents into manageable chunks by splitting them at natural sentence boundaries.
|
||||
2. `HuggingFaceInferenceAPIEmbedding` converts each chunk into numerical embeddings - vector representations that capture the semantic meaning in a way AI can process efficiently.
|
||||
|
||||
This process helps us organise our documents in a way that's more useful for searching and analysis.
|
||||
|
||||
```python
|
||||
from llama_index.core import Document
|
||||
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
|
||||
from llama_index.core.node_parser import SentenceSplitter
|
||||
from llama_index.core.ingestion import IngestionPipeline
|
||||
|
||||
# create the pipeline with transformations
|
||||
pipeline = IngestionPipeline(
|
||||
transformations=[
|
||||
SentenceSplitter(chunk_overlap=0),
|
||||
HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
|
||||
]
|
||||
)
|
||||
|
||||
nodes = await pipeline.arun(documents=[Document.example()])
|
||||
```
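To build intuition for what splitting at sentence boundaries means, here is a stdlib-only sketch. It is not the `SentenceSplitter` implementation: the `split_into_chunks` helper is hypothetical, it uses a naive regular expression to find sentence ends, and it measures chunk size in characters purely to keep the example short.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 80) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `max_chars` characters."""
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the current chunk
            current = sentence       # start a new one with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "Alfred is a butler. He plans dinner parties. He also checks the calendar."
chunks = split_into_chunks(text, max_chars=45)
print(chunks)
```

Because sentences are never cut in half, each chunk stays readable on its own, which helps both embedding quality and the answers generated from retrieved chunks.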
|
||||
|
||||
|
||||
### Storing and indexing documents
|
||||
|
||||
After creating our `Node` objects we need to index them to make them searchable, but before we can do that, we need a place to store our data.
|
||||
|
||||
Since we are using an ingestion pipeline, we can directly attach a vector store to the pipeline to populate it.
|
||||
In this case, we will use `Chroma` to store our documents.
|
||||
|
||||
<details>
|
||||
<summary>Install ChromaDB</summary>
|
||||
As introduced in the [section on the LlamaHub](llama-hub), we can install the ChromaDB vector store with the following command:
|
||||
|
||||
```bash
|
||||
pip install llama-index-vector-stores-chroma
|
||||
```
|
||||
</details>
|
||||
|
||||
```python
|
||||
import chromadb
|
||||
from llama_index.vector_stores.chroma import ChromaVectorStore
|
||||
|
||||
db = chromadb.PersistentClient(path="./alfred_chroma_db")
|
||||
chroma_collection = db.get_or_create_collection("alfred")
|
||||
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
|
||||
|
||||
pipeline = IngestionPipeline(
|
||||
transformations=[
|
||||
SentenceSplitter(chunk_size=25, chunk_overlap=0),
|
||||
HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5"),
|
||||
],
|
||||
vector_store=vector_store,
|
||||
)
|
||||
```
|
||||
|
||||
<Tip>An overview of the different vector stores can be found in the <a href="https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores/">LlamaIndex documentation</a>.</Tip>
|
||||
|
||||
|
||||
To search these nodes, we need a way to compare a query against them. This is where vector embeddings come in: by embedding both the query and the nodes in the same vector space, we can find relevant matches.
|
||||
The `VectorStoreIndex` handles this for us, using the same embedding model we used during ingestion to ensure consistency.
|
||||
|
||||
Let's see how to create this index from our vector store and embeddings:
|
||||
|
||||
```python
|
||||
from llama_index.core import VectorStoreIndex
|
||||
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
|
||||
|
||||
embed_model = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5")
|
||||
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
|
||||
```
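As a rough illustration of what "finding relevant matches in the same vector space" means, the sketch below ranks a few nodes by cosine similarity to a query. The three-dimensional vectors are made up for the example (real embedding models produce far more dimensions), and this is not the code the index runs internally.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings for three nodes and one query.
node_embeddings: Dict[str, List[float]] = {
    "dinner menu": [0.9, 0.1, 0.0],
    "calendar": [0.1, 0.9, 0.0],
    "garden tools": [0.0, 0.1, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

# Rank nodes from most to least similar to the query.
ranked = sorted(
    node_embeddings,
    key=lambda name: cosine_similarity(query_embedding, node_embeddings[name]),
    reverse=True,
)
print(ranked)
```

This also shows why the same embedding model has to be used for ingestion and querying: vectors from different models do not live in a comparable space.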
|
||||
|
||||
All information is automatically persisted within the `ChromaVectorStore` object and the passed directory path.
|
||||
|
||||
Great! Now that we can save and load our index easily, let's explore how to query it in different ways.
|
||||
|
||||
### Querying a VectorStoreIndex with prompts and LLMs
|
||||
|
||||
Before we can query our index, we need to convert it to a query interface. The most common conversion options are:
|
||||
|
||||
- `as_retriever`: For basic document retrieval, returning a list of `NodeWithScore` objects with similarity scores
|
||||
- `as_query_engine`: For single question-answer interactions, returning a written response
|
||||
- `as_chat_engine`: For conversational interactions that maintain memory across multiple messages, returning a written response using chat history and indexed context
|
||||
|
||||
We'll focus on the query engine since it is more common for agent-like interactions.
|
||||
We also pass in an LLM to the query engine to use for the response.
|
||||
|
||||
```python
|
||||
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
|
||||
|
||||
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
|
||||
query_engine = index.as_query_engine(
|
||||
llm=llm,
|
||||
response_mode="tree_summarize",
|
||||
)
|
||||
query_engine.query("What is the meaning of life?")
|
||||
# The meaning of life is 42
|
||||
```
|
||||
|
||||
### Response Processing
|
||||
|
||||
Under the hood, the query engine doesn't only use the LLM to answer the question but also uses a `ResponseSynthesizer` as a strategy to process the response.
|
||||
Once again, this is fully customisable but there are three main strategies that work well out of the box:
|
||||
|
||||
- `refine`: create and refine an answer by sequentially going through each retrieved text chunk. This makes a separate LLM call per Node/retrieved chunk.
|
||||
- `compact` (default): similar to refining but concatenating the chunks beforehand, resulting in fewer LLM calls.
|
||||
- `tree_summarize`: create a detailed answer by going through each retrieved text chunk and creating a tree structure of the answer.
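The difference in LLM calls between `refine` and `compact` can be illustrated with a small sketch. This is not how LlamaIndex implements these strategies: the stub LLM, the `refine` and `compact` helpers, and the character-based `max_chars` limit are all assumptions made for illustration (real implementations typically pack by token count and use prompt templates). `tree_summarize` is left out to keep the sketch short.

```python
def make_fake_llm():
    """Return a stub 'LLM' that records every prompt it receives."""
    calls = []

    def llm(prompt: str) -> str:
        calls.append(prompt)
        return f"answer after {len(calls)} call(s)"

    return llm, calls


def refine(llm, question, chunks):
    """One LLM call per chunk, each call refining the previous answer."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Q: {question}\nContext: {chunk}\nPrevious answer: {answer}")
    return answer


def compact(llm, question, chunks, max_chars=60):
    """Pack chunks into as few prompts as fit max_chars, then refine over them."""
    packed, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) + 1 > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = f"{current} {chunk}".strip()
    if current:
        packed.append(current)
    return refine(llm, question, packed)


chunks = ["Alfred is a butler.", "He lives in Gotham.", "He serves tea at five."]

llm_a, refine_calls = make_fake_llm()
refine(llm_a, "Who is Alfred?", chunks)

llm_b, compact_calls = make_fake_llm()
compact(llm_b, "Who is Alfred?", chunks)
```

With these three chunks, the `refine` helper calls the stub once per chunk, while the `compact` helper first merges the two shortest chunks and therefore needs one call fewer.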
|
||||
|
||||
<Tip>Take fine-grained control of your query workflows with the <a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/usage_pattern/#low-level-composition-api">low-level composition API</a>. This API lets you customize and fine-tune every step of the query process to match your exact needs, and it also pairs well with <a href="https://docs.llamaindex.ai/en/stable/module_guides/workflow/">Workflows</a>.</Tip>
|
||||
|
||||
The language model won't always perform in predictable ways, so we can't be sure that the answer we get is always correct. We can deal with this by **evaluating the quality of the answer**.
|
||||
|
||||
### Evaluation and observability
|
||||
|
||||
LlamaIndex provides **built-in evaluation tools to assess response quality.**
|
||||
These evaluators leverage LLMs to analyze responses across different dimensions.
|
||||
Let's look at the three main evaluators available:
|
||||
|
||||
- `FaithfulnessEvaluator`: Evaluates the faithfulness of the answer by checking if the answer is supported by the context.
|
||||
- `AnswerRelevancyEvaluator`: Evaluates the relevance of the answer by checking if the answer is relevant to the question.
|
||||
- `CorrectnessEvaluator`: Evaluates the correctness of the answer by checking if the answer is correct.
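As a rough mental model of an LLM-based faithfulness check, the sketch below asks a "judge" model whether an answer is supported by the context and parses a YES/NO verdict. Everything here is an assumption for illustration: `is_faithful`, the prompt wording, and `stub_judge` (a plain string check standing in for a real LLM) are hypothetical and do not reflect the prompts or logic LlamaIndex uses internally.

```python
from typing import Callable


def is_faithful(llm: Callable[[str], str], answer: str, contexts: list[str]) -> bool:
    """Ask a judge LLM whether the answer is supported by the contexts."""
    prompt = (
        "Is the following answer supported by the context? Reply YES or NO.\n"
        f"Context: {' '.join(contexts)}\n"
        f"Answer: {answer}"
    )
    verdict = llm(prompt)
    return verdict.strip().upper().startswith("YES")


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM: says YES only if the answer text appears in the context."""
    context_line, answer_line = prompt.splitlines()[1:3]
    answer = answer_line.removeprefix("Answer: ")
    return "YES" if answer in context_line else "NO"


contexts = ["The gala starts at eight."]
supported = is_faithful(stub_judge, "The gala starts at eight.", contexts)
unsupported = is_faithful(stub_judge, "The gala starts at noon.", contexts)
```

Swapping `stub_judge` for a call to an actual LLM is what turns this toy into an LLM-as-judge evaluator.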
|
||||
|
||||
```python
from llama_index.core.evaluation import FaithfulnessEvaluator

query_engine = ...  # from the previous section
llm = ...  # from the previous section

# query index
evaluator = FaithfulnessEvaluator(llm=llm)
response = query_engine.query(
    "What battles took place in New York City in the American Revolution?"
)
eval_result = evaluator.evaluate_response(response=response)
eval_result.passing
```
|
||||
|
||||
Even without direct evaluation, we can **gain insights into how our system is performing through observability.**
|
||||
This is especially useful when we are building more complex workflows and want to understand how each component is performing.
|
||||
|
||||
<details>
|
||||
<summary>Install LlamaTrace</summary>
|
||||
As introduced in the [section on the LlamaHub](llama-hub), we can install the LlamaTrace callback from Arize Phoenix with the following command:
|
||||
|
||||
```bash
|
||||
pip install -U llama-index-callbacks-arize-phoenix
|
||||
```
|
||||
|
||||
Additionally, we need to set the `PHOENIX_API_KEY` environment variable to our LlamaTrace API key. We can get this by:
|
||||
- Creating an account at [LlamaTrace](https://llamatrace.com/login)
|
||||
- Generating an API key in your account settings
|
||||
- Using the API key in the code below to enable tracing
|
||||
|
||||
</details>
|
||||
|
||||
```python
import os

import llama_index.core

PHOENIX_API_KEY = "<PHOENIX_API_KEY>"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
llama_index.core.set_global_handler(
    "arize_phoenix",
    endpoint="https://llamatrace.com/v1/traces"
)
```
|
||||
|
||||
<Tip>Want to learn more about components and how to use them? Continue your journey with the <a href="https://docs.llamaindex.ai/en/stable/module_guides/">Components Guides</a> or the <a href="https://docs.llamaindex.ai/en/stable/understanding/rag/">Guide on RAG</a>.</Tip>
|
||||
|
||||
We have seen how to use components to create a `QueryEngine`. Now, let's see how we can **use the `QueryEngine` as a tool for an agent!**
|
||||
13
units/en/unit2/llama-index/conclusion.mdx
Normal file
@@ -0,0 +1,13 @@
|
||||
# Conclusion
|
||||
|
||||
Congratulations on finishing the `llama-index` module of this second Unit 🥳
|
||||
|
||||
You’ve just mastered the fundamentals of `llama-index` and you’ve seen how to build your own agentic workflows!
|
||||
Now that you have skills in `llama-index`, you can start to create search engines that will solve tasks you're interested in.
|
||||
|
||||
In the next module of the unit, you're going to learn **how to build Agents with LangGraph**.
|
||||
|
||||
Finally, we would love **to hear what you think of the course and how we can improve it**.
|
||||
If you have feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
|
||||
|
||||
### Keep Learning, and stay awesome 🤗
|
||||
30
units/en/unit2/llama-index/introduction.mdx
Normal file
@@ -0,0 +1,30 @@
|
||||
# Introduction to LlamaIndex
|
||||
|
||||
Welcome to this module, where you’ll learn how to build LLM-powered agents using the [LlamaIndex](https://www.llamaindex.ai/) toolkit.
|
||||
|
||||
LlamaIndex is **a complete toolkit for creating LLM-powered agents over your data using indexes and workflows**. For this course we'll focus on three main parts that help build agents in LlamaIndex: **Components**, **Agents and Tools** and **Workflows**.
|
||||
|
||||

|
||||
|
||||
Let's look at these key parts of LlamaIndex and how they help with agents:
|
||||
|
||||
- **Components** are the basic building blocks you use in LlamaIndex. These include things like prompts, models and databases. Components often help connect LlamaIndex with other tools and libraries.
|
||||
- **Tools**: Tools are components that provide specific capabilities like searching, calculating, or accessing external services. They are the building blocks that enable agents to perform tasks.
|
||||
- **Agents**: Agents are autonomous components that can use tools and make decisions. They coordinate tool usage to accomplish complex goals.
|
||||
- **Workflows** are step-by-step processes that chain logic together. Workflows, or agentic workflows, are a way to structure agentic behaviour without the explicit use of agents.
|
||||
|
||||
|
||||
## What Makes LlamaIndex Special?
|
||||
|
||||
While LlamaIndex does some things similar to other frameworks like smolagents, it has some key benefits:
|
||||
|
||||
- **Clear Workflow System**. Workflows help break down how agents should make decisions step by step using an event-driven and async-first syntax. This helps you clearly compose and organize your logic.
|
||||
- **Advanced Document Parsing with LlamaParse**. LlamaParse was made specifically for LlamaIndex, so the integration is seamless, although it is a paid feature.
|
||||
- **Many Ready-to-Use Components**. LlamaIndex has been around for a while, so it works with lots of other frameworks. This means it has many tested and reliable components, like LLMs, retrievers, indexes, and more.
|
||||
- **LlamaHub** is a registry of hundreds of these components, agents and tools that you can use within LlamaIndex.
|
||||
|
||||
All of these concepts are required in different scenarios to create useful agents.
|
||||
In the following sections, we will go over each of these concepts in detail.
|
||||
After mastering the concepts, we will use our learnings to **create applied use cases with Alfred the agent**!
|
||||
|
||||
Getting our hands on LlamaIndex is exciting, right? So, what are we waiting for? Let's get started with **finding and installing the integrations we need using LlamaHub! 🚀**
|
||||
46
units/en/unit2/llama-index/llama-hub.mdx
Normal file
@@ -0,0 +1,46 @@
|
||||
# Introduction to the LlamaHub
|
||||
|
||||
**LlamaHub is a registry of hundreds of integrations, agents and tools that you can use within LlamaIndex.**
|
||||
|
||||

|
||||
|
||||
We will be using various integrations in this course, so let's first look at the LlamaHub and how it can help us.
|
||||
|
||||
Let's see how to find and install the dependencies for the components we need.
|
||||
|
||||
## Installation
|
||||
|
||||
LlamaIndex installation instructions are available as a well-structured **overview on [LlamaHub](https://llamahub.ai/)**.
|
||||
This might be a bit overwhelming at first, but most of the **installation commands generally follow an easy-to-remember format**:
|
||||
|
||||
```bash
|
||||
pip install llama-index-{component-type}-{framework-name}
|
||||
```
|
||||
|
||||
Let's try to install the dependencies for an LLM component using the [Hugging Face inference API integration](https://llamahub.ai/l/llms/llama-index-llms-huggingface-api?from=llms).
|
||||
|
||||
```bash
|
||||
pip install llama-index-llms-huggingface-api
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
Once installed, we can see the usage patterns. You'll notice that the import paths follow the install command!
|
||||
Underneath, we can see an example of the usage of **the Hugging Face inference API for an LLM component**.
|
||||
|
||||
```python
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

llm = HuggingFaceInferenceAPI(
    model_name="Qwen/Qwen2.5-Coder-32B-Instruct",
    temperature=0.7,
    max_tokens=100,
    token="hf_xxx",
)

llm.complete("Hello, how are you?")
# I am good, how can I help you today?
```
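The link between install commands and import paths can be written down as two tiny helpers. `package_name` and `import_path` are hypothetical names made up for this sketch, and the convention holds for the integrations used in this course, but it is an assumption that every package on LlamaHub follows it exactly.

```python
def package_name(component_type: str, framework_name: str) -> str:
    """Build the pip package name: llama-index-{component-type}-{framework-name}."""
    return f"llama-index-{component_type}-{framework_name}"


def import_path(component_type: str, framework_name: str) -> str:
    """Build the matching import path by swapping hyphens for underscores."""
    parts = [component_type.replace("-", "_"), framework_name.replace("-", "_")]
    return "llama_index." + ".".join(parts)


llm_package = package_name("llms", "huggingface-api")
llm_import = import_path("llms", "huggingface-api")
chroma_package = package_name("vector-stores", "chroma")
chroma_import = import_path("vector-stores", "chroma")
```

For the LLM integration above, this yields the `pip install` target `llama-index-llms-huggingface-api` and the import path `llama_index.llms.huggingface_api`.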
|
||||
|
||||
Wonderful, we now know how to find, install and use the integrations for the components we need.
|
||||
**Let's dive deeper into the components** and see how we can use them to build our own agents.
|
||||
117
units/en/unit2/llama-index/quiz1.mdx
Normal file
@@ -0,0 +1,117 @@
|
||||
# Small Quiz (ungraded) [[quiz1]]
|
||||
|
||||
So far we've discussed the key components and tools used in LlamaIndex.
|
||||
It's time to take a short quiz, since **testing yourself** is the best way to learn and [to avoid the illusion of competence](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf).
|
||||
This will help you find **where you need to reinforce your knowledge**.
|
||||
|
||||
This is an optional quiz and it's not graded.
|
||||
|
||||
### Q1: What is a QueryEngine?
|
||||
Which of the following best describes a QueryEngine component?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "A system that only processes static text without any retrieval capabilities.",
|
||||
explain: "A QueryEngine must be able to retrieve and process relevant information.",
|
||||
},
|
||||
{
|
||||
text: "A component that finds and retrieves relevant information as part of the RAG process.",
|
||||
explain: "This captures the core purpose of a QueryEngine component.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "A tool that only stores vector embeddings without search functionality.",
|
||||
explain: "A QueryEngine does more than just store embeddings - it actively searches and retrieves information.",
|
||||
},
|
||||
{
|
||||
text: "A component that only evaluates response quality.",
|
||||
explain: "Evaluation is separate from the QueryEngine's main retrieval purpose.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: What is the Purpose of FunctionTools?
|
||||
Why are FunctionTools important for an Agent?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "To handle large amounts of data storage.",
|
||||
explain: "FunctionTools are not primarily for data storage.",
|
||||
},
|
||||
{
|
||||
text: "To convert Python functions into tools that an agent can use.",
|
||||
explain: "FunctionTools wrap Python functions to make them accessible to agents.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "To allow agents to create random function definitions.",
|
||||
explain: "FunctionTools serve the specific purpose of making functions available to agents.",
|
||||
},
|
||||
{
|
||||
text: "To only process text data.",
|
||||
explain: "FunctionTools can work with various types of functions, not just text processing.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: What are Toolspecs in LlamaIndex?
|
||||
What is the main purpose of Toolspecs?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "They are redundant components that don't add functionality.",
|
||||
explain: "Toolspecs serve an important purpose in the LlamaIndex ecosystem.",
|
||||
},
|
||||
{
|
||||
text: "They are sets of community-created tools that extend agent capabilities.",
|
||||
explain: "Toolspecs allow the community to share and reuse tools.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "They are used solely for memory management.",
|
||||
explain: "Toolspecs are about providing tools, not managing memory.",
|
||||
},
|
||||
{
|
||||
text: "They only work with text processing.",
|
||||
explain: "Toolspecs can include various types of tools, not just text processing.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: What is Required to create a tool?
|
||||
What information must be included when creating a tool?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "A function, a name, and description must be defined.",
|
||||
explain: "While these all make up a tool, the name and description can be parsed from the function and docstring.",
|
||||
},
|
||||
{
|
||||
text: "Only the name is required.",
|
||||
explain: "A function and a description/docstring are also required for proper tool documentation.",
|
||||
},
|
||||
{
|
||||
text: "Only the description is required.",
|
||||
explain: "A function is required so that we have code to run when an agent selects a tool",
|
||||
},
|
||||
{
|
||||
text: "Only the function is required.",
|
||||
explain: "The name and description default to the name and docstring from the provided function",
|
||||
correct: true
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
Congrats on finishing this quiz 🥳! If you missed some elements, take time to reread the chapter to reinforce your knowledge. If you passed, you're ready to dive deeper into building with these components!
|
||||
112
units/en/unit2/llama-index/quiz2.mdx
Normal file
@@ -0,0 +1,112 @@
|
||||
# Quick Self-Check (ungraded) [[quiz2]]
|
||||
|
||||
What?! Another Quiz? We know, we know, ... 😅 But this short, ungraded quiz is here to **help you reinforce key concepts you've just learned**.
|
||||
|
||||
This quiz covers agent workflows and interactions - essential components for building effective AI agents.
|
||||
|
||||
### Q1: What is the purpose of AgentWorkflow in LlamaIndex?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "To run one or more agents with tools",
|
||||
explain: "Yes, the AgentWorkflow is the main way to quickly create a system with one or more agents.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "To create a single agent that can query your data without memory",
|
||||
explain: "No, the AgentWorkflow is more capable than that, the QueryEngine is for simple queries over your data.",
|
||||
},
|
||||
{
|
||||
text: "To automatically build tools for agents",
|
||||
explain: "The AgentWorkflow does not build tools, that is the job of the developer.",
|
||||
},
|
||||
{
|
||||
text: "To manage agent memory and state",
|
||||
explain: "Managing memory and state is not the primary purpose of AgentWorkflow.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: What object is used for keeping track of the state of the workflow?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "State",
|
||||
explain: "State is not the correct object for workflow state management.",
|
||||
},
|
||||
{
|
||||
text: "Context",
|
||||
explain: "Context is the correct object used for keeping track of workflow state.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "WorkflowState",
|
||||
explain: "WorkflowState is not the correct object.",
|
||||
},
|
||||
{
|
||||
text: "Management",
|
||||
explain: "Management is not a valid object for workflow state.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: Which method should be used if you want an agent to remember previous interactions?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "run(query_str)",
|
||||
explain: ".run(query_str) does not maintain conversation history.",
|
||||
},
|
||||
{
|
||||
text: "chat(query_str, ctx=ctx)",
|
||||
explain: "chat() is not a valid method on workflows.",
|
||||
},
|
||||
{
|
||||
text: "interact(query_str)",
|
||||
explain: "interact() is not a valid method for agent interactions.",
|
||||
},
|
||||
{
|
||||
text: "run(query_str, ctx=ctx)",
|
||||
explain: "By passing in and maintaining the context, we can maintain state!",
|
||||
correct: true
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: What is a key feature of Agentic RAG?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It can only use document-based tools to answer questions in a RAG workflow",
|
||||
explain: "Agentic RAG can use different tools, including document-based tools.",
|
||||
},
|
||||
{
|
||||
text: "It automatically answers questions without tools, like a chatbot",
|
||||
explain: "Agentic RAG does use tools to answer questions.",
|
||||
},
|
||||
{
|
||||
text: "It can decide to use any tool to answer questions, including RAG tools",
|
||||
explain: "Agentic RAG has the flexibility to use different tools to answer questions.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It only works with Function Calling Agents",
|
||||
explain: "Agentic RAG is not limited to Function Calling Agents.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
|
||||
Got it? Great! Now let's **do a brief recap of the unit!**
|
||||
111
units/en/unit2/llama-index/tools.mdx
Normal file
@@ -0,0 +1,111 @@
|
||||
# Using Tools in LlamaIndex
|
||||
|
||||
**Defining a clear set of Tools is crucial to performance.** As we discussed in [unit 1](../../unit1/tools), clear tool interfaces are easier for LLMs to use.
|
||||
Much like a software API for human engineers, a tool is more useful to an LLM when it is easy to understand how it works.
|
||||
|
||||
There are **four main types of tools in LlamaIndex**:
|
||||
|
||||

|
||||
|
||||
1. `FunctionTool`: Convert any Python function into a tool that an agent can use. It automatically figures out how the function works.
|
||||
2. `QueryEngineTool`: A tool that lets agents use query engines. Since agents are built on query engines, they can also use other agents as tools.
|
||||
3. `Toolspecs`: Sets of tools created by the community, which often include tools for specific services like Gmail.
|
||||
4. `Utility Tools`: Special tools that help handle large amounts of data from other tools.
|
||||
|
||||
We will go over each of them in more detail below.
|
||||
|
||||
## Creating a FunctionTool
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/tools.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
A FunctionTool provides a simple way to wrap any Python function and make it available to an agent.
|
||||
You can pass either a synchronous or asynchronous function to the tool, along with optional `name` and `description` parameters.
|
||||
The name and description are particularly important as they help the agent understand when and how to use the tool effectively.
|
||||
Let's look at how to create a FunctionTool below and then call it.
|
||||
|
||||
```python
from llama_index.core.tools import FunctionTool

def get_weather(location: str) -> str:
    """Useful for getting the weather for a given location."""
    print(f"Getting weather for {location}")
    return f"The weather in {location} is sunny"

tool = FunctionTool.from_defaults(
    get_weather,
    name="my_weather_tool",
    description="Useful for getting the weather for a given location.",
)
tool.call("New York")
```
|
||||
|
||||
<Tip>When using an agent or LLM with function calling, the tool selected (and the arguments written for that tool) rely strongly on the tool name and description of the purpose and arguments of the tool. Learn more about function calling in the <a href="https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/modules/function_calling.html">Function Calling Guide</a> and <a href="https://docs.llamaindex.ai/en/stable/understanding/agent/function_calling.html">Function Calling Learning Guide</a>.</Tip>
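Since `name` and `description` are optional, it helps to see where sensible defaults could come from. The sketch below is not the LlamaIndex implementation: `SimpleTool` and `tool_from_function` are hypothetical stand-ins that show how a function's name, signature and docstring can be read with the standard `inspect` module.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SimpleTool:
    """Hypothetical, minimal stand-in for a tool; not the LlamaIndex class."""
    fn: Callable
    name: str
    description: str

    def call(self, *args, **kwargs):
        """Run the wrapped function."""
        return self.fn(*args, **kwargs)


def tool_from_function(
    fn: Callable, name: Optional[str] = None, description: Optional[str] = None
) -> SimpleTool:
    """Fall back to the function's name, signature and docstring when no metadata is given."""
    default_description = f"{fn.__name__}{inspect.signature(fn)}\n{inspect.getdoc(fn) or ''}"
    return SimpleTool(
        fn=fn,
        name=name or fn.__name__,
        description=description or default_description,
    )


def get_weather(location: str) -> str:
    """Useful for getting the weather for a given location."""
    return f"The weather in {location} is sunny"


tool = tool_from_function(get_weather)
answer = tool.call("New York")
```

This is also why a clear function name and docstring matter: they may be the only description of the tool that the agent ever sees.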
|
||||
|
||||
## Creating a QueryEngineTool
|
||||
|
||||
The `QueryEngine` we defined in the previous unit can be easily transformed into a tool using the `QueryEngineTool` class.
|
||||
Let's see how to create a `QueryEngineTool` from a `QueryEngine` in the example below.
|
||||
|
||||
```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
from llama_index.embeddings.huggingface_api import HuggingFaceInferenceAPIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

embed_model = HuggingFaceInferenceAPIEmbedding(model_name="BAAI/bge-small-en-v1.5")

# reuse the persisted Chroma collection from the components section
db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection("alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
query_engine = index.as_query_engine(llm=llm)
tool = QueryEngineTool.from_defaults(query_engine, name="some useful name", description="some useful description")
```
|
||||
|
||||
## Creating Toolspecs
|
||||
|
||||
Think of `ToolSpecs` as collections of tools that work together harmoniously - like a well-organized professional toolkit.
|
||||
Just as a mechanic's toolkit contains complementary tools that work together for vehicle repairs, a `ToolSpec` combines related tools for specific purposes.
|
||||
For example, an accounting agent's `ToolSpec` might elegantly integrate spreadsheet capabilities, email functionality, and calculation tools to handle financial tasks with precision and efficiency.
|
||||
|
||||
<details>
|
||||
<summary>Install the Google Toolspec</summary>
|
||||
As introduced in the [section on the LlamaHub](llama-hub), we can install the Google toolspec with the following command:
|
||||
|
||||
```bash
|
||||
pip install llama-index-tools-google
|
||||
```
|
||||
</details>
|
||||
|
||||
And now we can load the toolspec and convert it to a list of tools.
|
||||
|
||||
```python
|
||||
from llama_index.tools.google import GmailToolSpec
|
||||
|
||||
tool_spec = GmailToolSpec()
|
||||
tool_spec_list = tool_spec.to_tool_list()
|
||||
```
|
||||
|
||||
To get a more detailed view of the tools, we can take a look at the `metadata` of each tool.
|
||||
|
||||
```python
|
||||
[(tool.metadata.name, tool.metadata.description) for tool in tool_spec_list]
|
||||
```
|
||||
|
||||
## Utility Tools
|
||||
|
||||
Oftentimes, directly querying an API **can return an excessive amount of data**, some of which may be irrelevant, overflow the context window of the LLM, or unnecessarily increase the number of tokens that you are using.
|
||||
Let's walk through our two main utility tools below.
|
||||
|
||||
1. `OnDemandToolLoader`: This tool turns any existing LlamaIndex data loader (BaseReader class) into a tool that an agent can use. The tool can be called with all the parameters needed to trigger `load_data` from the data loader, along with a natural language query string. During execution, we first load data from the data loader, index it (for instance with a vector store), and then query it 'on-demand'. All three of these steps happen in a single tool call.
|
||||
2. `LoadAndSearchToolSpec`: The LoadAndSearchToolSpec takes in any existing Tool as input. As a tool spec, it implements `to_tool_list`, and when that function is called, two tools are returned: a loading tool and then a search tool. The load Tool execution would call the underlying Tool, and then index the output (by default with a vector index). The search Tool execution would take in a query string as input and call the underlying index.
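The load-then-search idea behind these utility tools can be sketched in a few lines of plain Python. The `LoadAndSearch` class and `fake_api` function below are hypothetical, and simple keyword matching stands in for the vector index that the real tools use by default.

```python
class LoadAndSearch:
    """Hypothetical sketch: wraps a tool and exposes a load step and a search step."""

    def __init__(self, underlying_tool):
        self.underlying_tool = underlying_tool
        self.index: list[str] = []  # toy "index": a plain list of text chunks

    def load(self, *args, **kwargs) -> str:
        """Call the wrapped tool and index its output instead of returning it all."""
        documents = self.underlying_tool(*args, **kwargs)
        self.index.extend(documents)
        return f"Loaded {len(documents)} documents. Use search() to query them."

    def search(self, query: str) -> list[str]:
        """Return only the indexed chunks that mention every query word."""
        words = query.lower().split()
        return [doc for doc in self.index if all(w in doc.lower() for w in words)]


def fake_api(topic: str) -> list[str]:
    """Stand-in for an API that returns far more text than the agent needs."""
    return [f"{topic} fact {i}" for i in range(100)] + [f"{topic} opening hours: 9 to 5"]


tools = LoadAndSearch(fake_api)
summary = tools.load("museum")
hits = tools.search("opening hours")
```

The agent only ever sees the short `summary` and the matching `hits`, rather than all 101 documents, which is what keeps the context window small.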
|
||||
|
||||
<Tip>You can find toolspecs and utility tools on the <a href="https://llamahub.ai/">LlamaHub</a></Tip>
|
||||
|
||||
Now that we understand the basics of agents and tools in LlamaIndex, let's see how we can **use LlamaIndex to create configurable and manageable workflows!**
|
||||
280
units/en/unit2/llama-index/workflows.mdx
Normal file
@@ -0,0 +1,280 @@
|
||||
# Creating agentic workflows in LlamaIndex
|
||||
|
||||
A workflow in LlamaIndex provides a structured way to organize your code into sequential and manageable steps.
|
||||
|
||||
Such a workflow is created by defining `Steps` which are triggered by `Events`, and themselves emit `Events` to trigger further steps.
|
||||
Let's take a look at Alfred showing a LlamaIndex workflow for a RAG task.
|
||||
|
||||

|
||||
|
||||
**Workflows offer several key benefits:**
|
||||
|
||||
- Clear organization of code into discrete steps
|
||||
- Event-driven architecture for flexible control flow
|
||||
- Type-safe communication between steps
|
||||
- Built-in state management
|
||||
- Support for both simple and complex agent interactions
|
||||
|
||||
As you might have guessed, **workflows strike a great balance between the autonomy of agents and control over the overall workflow.**
|
||||
|
||||
So, let's learn how to create a workflow ourselves!
|
||||
|
||||
## Creating Workflows
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/llama-index/workflows.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
### Basic Workflow Creation
|
||||
|
||||
<details>
|
||||
<summary>Install the Workflow package</summary>
|
||||
As introduced in the [section on the LlamaHub](llama-hub), we can install the Workflow package with the following command:
|
||||
|
||||
```bash
|
||||
pip install llama-index-utils-workflow
|
||||
```
|
||||
</details>
|
||||
|
||||
We can create a single-step workflow by defining a class that inherits from `Workflow` and decorating your functions with `@step`.
|
||||
We will also need to add `StartEvent` and `StopEvent`, which are special events that are used to indicate the start and end of the workflow.
|
||||
|
||||
```python
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        # do something here
        return StopEvent(result="Hello, world!")


w = MyWorkflow(timeout=10, verbose=False)
result = await w.run()
```
|
||||
|
||||
As you can see, we can now run the workflow by calling `w.run()`.
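Note that a bare `await` at the top level works in notebooks, which already run an event loop. In a regular Python script, the usual pattern is to wrap the call in a coroutine and start it with `asyncio.run`. The sketch below uses `FakeWorkflow`, a made-up stand-in with the same calling shape as `w.run()`, so that it runs without LlamaIndex installed.

```python
import asyncio


class FakeWorkflow:
    """Stand-in with the same calling shape as `w.run()`; not a LlamaIndex class."""

    async def run(self) -> str:
        await asyncio.sleep(0)  # pretend to do asynchronous work
        return "Hello, world!"


async def main() -> str:
    """Create the workflow and await its result inside a coroutine."""
    w = FakeWorkflow()
    return await w.run()


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
```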
|
||||
|
||||
### Connecting Multiple Steps
|
||||
|
||||
To connect multiple steps, we **create custom events that carry data between steps.**
|
||||
To do so, we need to add an `Event` that is passed between the steps and transfers the output of the first step to the second step.
|
||||
|
||||
```python
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class ProcessingEvent(Event):
    intermediate_result: str

class MultiStepWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent) -> ProcessingEvent:
        # Process initial data
        return ProcessingEvent(intermediate_result="Step 1 complete")

    @step
    async def step_two(self, ev: ProcessingEvent) -> StopEvent:
        # Use the intermediate result
        final_result = f"Finished processing: {ev.intermediate_result}"
        return StopEvent(result=final_result)

w = MultiStepWorkflow(timeout=10, verbose=False)
result = await w.run()
result
```
|
||||
|
||||
The type hinting is important here, as it ensures that the workflow is executed correctly. Let's complicate things a bit more!
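To see why the type hints matter, here is a toy dispatcher that reads each step's `ev` annotation and uses it to decide which step receives which event. This is only a sketch of the idea under simplifying assumptions (one step per event type, no validation, no concurrency); `TinyWorkflow` and the event classes below are hypothetical and unrelated to the real LlamaIndex classes.

```python
import asyncio
from typing import get_type_hints


class Event:
    """Base class for the toy events."""


class StartEvent(Event):
    pass


class ProcessingEvent(Event):
    def __init__(self, intermediate_result: str):
        self.intermediate_result = intermediate_result


class StopEvent(Event):
    def __init__(self, result: str):
        self.result = result


class TinyWorkflow:
    async def step_one(self, ev: StartEvent) -> ProcessingEvent:
        return ProcessingEvent("Step 1 complete")

    async def step_two(self, ev: ProcessingEvent) -> StopEvent:
        return StopEvent(f"Finished processing: {ev.intermediate_result}")

    def _routes(self):
        """Map each accepted event type to the step that declares it in its hints."""
        routes = {}
        for name in ("step_one", "step_two"):
            hints = get_type_hints(getattr(type(self), name))
            routes[hints["ev"]] = getattr(self, name)
        return routes

    async def run(self) -> str:
        """Keep dispatching events to the matching step until a StopEvent appears."""
        routes, event = self._routes(), StartEvent()
        while not isinstance(event, StopEvent):
            event = await routes[type(event)](event)
        return event.result


result = asyncio.run(TinyWorkflow().run())
```

If an annotation were missing or pointed at the wrong event class, the dispatcher would have no way to connect the steps, which is the intuition behind type hints being required.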
|
||||
|
||||
### Loops and Branches
|
||||
|
||||
The type hinting is the most powerful part of workflows because it allows us to create branches, loops, and joins to facilitate more complex workflows.
|
||||
|
||||
Let's show an example of **creating a loop** by using the union operator `|`.
|
||||
In the example below, we see that the `LoopEvent` is taken as input for the step and can also be returned as output.
|
||||
|
||||
```python
from llama_index.core.workflow import Event
import random


class ProcessingEvent(Event):
    intermediate_result: str


class LoopEvent(Event):
    loop_output: str


class MultiStepWorkflow(Workflow):
    @step
    async def step_one(self, ev: StartEvent | LoopEvent) -> ProcessingEvent | LoopEvent:
        # accepting LoopEvent as input is what sends the workflow back to this step
        if random.randint(0, 1) == 0:
            print("Bad thing happened")
            return LoopEvent(loop_output="Back to step one.")
        else:
            print("Good thing happened")
            return ProcessingEvent(intermediate_result="First step complete.")

    @step
    async def step_two(self, ev: ProcessingEvent) -> StopEvent:
        # Use the intermediate result
        final_result = f"Finished processing: {ev.intermediate_result}"
        return StopEvent(result=final_result)


w = MultiStepWorkflow(verbose=False)
result = await w.run()
result
```
|
||||
|
||||
### Drawing Workflows
|
||||
|
||||
We can also draw workflows. Let's use the `draw_all_possible_flows` function to draw the workflow. This stores the workflow in an HTML file.
|
||||
|
||||
```python
|
||||
from llama_index.utils.workflow import draw_all_possible_flows
|
||||
|
||||
w = ... # as defined in the previous section
|
||||
draw_all_possible_flows(w, "flow.html")
|
||||
```
|
||||
|
||||

|
||||
|
||||
There is one last cool trick that we will cover in the course, which is the ability to add state to the workflow.
|
||||
|
||||
### State Management
|
||||
|
||||
State management is useful when you want to keep track of the state of the workflow, so that every step has access to the same state.
|
||||
We can do this by adding a parameter with the `Context` type hint to the step function.
|
||||
|
||||
```python
from llama_index.core.workflow import Context, StartEvent, StopEvent, step


# this method is defined inside a Workflow subclass
@step
async def query(self, ctx: Context, ev: StartEvent) -> StopEvent:
    # store in context
    await ctx.set("query", "What is the capital of France?")

    # do something with context and event
    val = ...

    # retrieve from context
    query = await ctx.get("query")

    return StopEvent(result=val)
```
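As a mental model, the context can be pictured as a shared key-value store with asynchronous `set` and `get` methods that every step can reach. The `TinyContext` class below is a hypothetical stand-in written for this sketch, not the LlamaIndex `Context` class, and it leaves out everything except storing and reading values.

```python
import asyncio


class TinyContext:
    """Hypothetical stand-in for a shared workflow context; not the LlamaIndex class."""

    def __init__(self):
        self._store = {}
        self._lock = asyncio.Lock()  # guards the store if steps run concurrently

    async def set(self, key, value):
        async with self._lock:
            self._store[key] = value

    async def get(self, key, default=None):
        async with self._lock:
            return self._store.get(key, default)


async def first_step(ctx: TinyContext) -> None:
    """Store a value that later steps will need."""
    await ctx.set("query", "What is the capital of France?")


async def second_step(ctx: TinyContext) -> str:
    """Read the value back without it being passed through an event."""
    query = await ctx.get("query")
    return f"Answering: {query}"


async def main() -> str:
    ctx = TinyContext()
    await first_step(ctx)
    return await second_step(ctx)


result = asyncio.run(main())
```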
|
||||
|
||||
Great! Now you know how to create basic workflows in LlamaIndex!
|
||||
|
||||
<Tip>There are some more complex nuances to workflows, which you can learn about in <a href="https://docs.llamaindex.ai/en/stable/understanding/workflows/">the LlamaIndex documentation</a>.</Tip>
|
||||
|
||||
However, there is another way to create workflows, which relies on the `AgentWorkflow` class. Let's take a look at how we can use this to create a multi-agent workflow.
|
||||
|
||||
## Automating workflows with Multi-Agent Workflows
|
||||
|
||||
Instead of manual workflow creation, we can use the **`AgentWorkflow` class to create a multi-agent workflow**.
|
||||
The `AgentWorkflow` uses Workflow Agents to allow you to create a system of one or more agents that can collaborate and hand off tasks to each other based on their specialized capabilities.
|
||||
This enables building complex agent systems where different agents handle different aspects of a task.
|
||||
Instead of importing classes from `llama_index.core.agent`, we will import the agent classes from `llama_index.core.agent.workflow`.
|
||||
One agent must be designated as the root agent in the `AgentWorkflow` constructor.
|
||||
When a user message comes in, it is first routed to the root agent.
|
||||
|
||||
Each agent can then:
|
||||
|
||||
- Handle the request directly using their tools
|
||||
- Hand off to another agent better suited for the task
|
||||
- Return a response to the user
|
||||
|
||||
Let's see how to create a multi-agent workflow.
|
||||
|
||||
```python
|
||||
from llama_index.core.agent.workflow import AgentWorkflow, ReActAgent
|
||||
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI
|
||||
|
||||
# Define some tools
|
||||
def add(a: int, b: int) -> int:
|
||||
"""Add two numbers."""
|
||||
return a + b
|
||||
|
||||
def multiply(a: int, b: int) -> int:
|
||||
"""Multiply two numbers."""
|
||||
return a * b
|
||||
|
||||
llm = HuggingFaceInferenceAPI(model_name="Qwen/Qwen2.5-Coder-32B-Instruct")
|
||||
|
||||
# we can pass functions directly without FunctionTool -- the fn/docstring are parsed for the name/description
|
||||
multiply_agent = ReActAgent(
|
||||
name="multiply_agent",
|
||||
description="Is able to multiply two integers",
|
||||
system_prompt="A helpful assistant that can use a tool to multiply numbers.",
|
||||
tools=[multiply],
|
||||
llm=llm,
|
||||
)
|
||||
|
||||
addition_agent = ReActAgent(
|
||||
name="add_agent",
|
||||
description="Is able to add two integers",
|
||||
system_prompt="A helpful assistant that can use a tool to add numbers.",
|
||||
tools=[add],
|
||||
llm=llm,
|
||||
)
|
||||
|
||||
# Create the workflow
|
||||
workflow = AgentWorkflow(
|
||||
agents=[multiply_agent, addition_agent],
|
||||
root_agent="multiply_agent",
|
||||
)
|
||||
|
||||
# Run the system
|
||||
response = await workflow.run(user_msg="Can you add 5 and 3?")
|
||||
```
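The routing behaviour described above (the message goes to the root agent first, which either handles it or hands it off) can be pictured with a small plain-Python sketch. Everything here is hypothetical and simplified: `ToyAgent` and `run_toy_workflow` are illustration-only names, and the keyword check stands in for the LLM's decision about whether to hand off.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyAgent:
    """Hypothetical agent: a name, one tool, and a rule for what it can handle."""
    name: str
    can_handle: Callable[[str], bool]
    tool: Callable[[int, int], int]
    handoff_to: Optional[str] = None  # agent to try when this one cannot help


def run_toy_workflow(agents, root, task, a, b):
    """Route the task to the root agent first, then follow handoffs."""
    by_name = {agent.name: agent for agent in agents}
    current = by_name[root]
    visited = []
    while current is not None:
        visited.append(current.name)
        if current.can_handle(task):
            # The agent handles the request with its own tool.
            return current.tool(a, b), visited
        if current.handoff_to is None or current.handoff_to in visited:
            break  # nowhere left to hand off (also prevents cycles)
        current = by_name.get(current.handoff_to)
    raise ValueError(f"No agent could handle task: {task!r}")


multiply_agent = ToyAgent(
    "multiply_agent", lambda t: "multiply" in t, lambda x, y: x * y, handoff_to="add_agent"
)
add_agent = ToyAgent(
    "add_agent", lambda t: "add" in t, lambda x, y: x + y, handoff_to="multiply_agent"
)

result, path = run_toy_workflow(
    [multiply_agent, add_agent], root="multiply_agent", task="Can you add 5 and 3?", a=5, b=3
)
print(result, path)
```

In this toy version the root agent cannot add, so the request is handed to `add_agent`, which mirrors what we would expect from the example above when the root agent is `multiply_agent`.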
|
||||
|
||||
Agent tools can also modify the workflow state we mentioned earlier. Before starting the workflow, we can provide an initial state dict that will be available to all agents.
|
||||
The state is stored under the `state` key of the workflow context. It is injected into the `state_prompt`, which augments each new user message.
|
||||
|
||||
Let's inject a counter to count function calls by modifying the previous example:
|
||||
|
||||
```python
|
||||
from llama_index.core.workflow import Context
|
||||
|
||||
# Define some tools
|
||||
async def add(ctx: Context, a: int, b: int) -> int:
|
||||
"""Add two numbers."""
|
||||
# update our count
|
||||
cur_state = await ctx.get("state")
|
||||
cur_state["num_fn_calls"] += 1
|
||||
await ctx.set("state", cur_state)
|
||||
|
||||
return a + b
|
||||
|
||||
async def multiply(ctx: Context, a: int, b: int) -> int:
|
||||
"""Multiply two numbers."""
|
||||
# update our count
|
||||
cur_state = await ctx.get("state")
|
||||
cur_state["num_fn_calls"] += 1
|
||||
await ctx.set("state", cur_state)
|
||||
|
||||
return a * b
|
||||
|
||||
...
|
||||
|
||||
workflow = AgentWorkflow(
|
||||
agents=[multiply_agent, addition_agent],
|
||||
root_agent="multiply_agent",
|
||||
initial_state={"num_fn_calls": 0},
|
||||
state_prompt="Current state: {state}. User message: {msg}",
|
||||
)
|
||||
|
||||
# run the workflow with context
|
||||
ctx = Context(workflow)
|
||||
response = await workflow.run(user_msg="Can you add 5 and 3?", ctx=ctx)
|
||||
|
||||
# pull out and inspect the state
|
||||
state = await ctx.get("state")
|
||||
print(state["num_fn_calls"])
|
||||
```
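To see what "injecting the state into the prompt" could look like, here is a rough sketch that uses only Python string formatting. The template is the one from the example above; the `augment` helper and the synchronous `add` function are hypothetical simplifications, and how `AgentWorkflow` performs this step internally is an assumption.

```python
state_prompt = "Current state: {state}. User message: {msg}"
state = {"num_fn_calls": 0}


def add(state: dict, a: int, b: int) -> int:
    """Add two numbers and count the call in the shared state."""
    state["num_fn_calls"] += 1
    return a + b


def augment(user_msg: str, state: dict) -> str:
    # Fill the template placeholders with the current state and the message.
    return state_prompt.format(state=state, msg=user_msg)


first_prompt = augment("Can you add 5 and 3?", state)
total = add(state, 5, 3)
second_prompt = augment("Thanks!", state)

print(first_prompt)
print(second_prompt)
```

Because the state is re-read for every message, the second prompt reflects the tool call that happened in between.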
|
||||
|
||||
Congratulations! You have now mastered the basics of Agents in LlamaIndex! 🎉
|
||||
|
||||
Let's continue with one final quiz to solidify your knowledge! 🚀
|
||||
379
units/en/unit2/smolagents/code_agents.mdx
Normal file
@@ -0,0 +1,379 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/code_agents.ipynb"},
|
||||
]} />
|
||||
|
||||
# Building Agents That Use Code
|
||||
|
||||
Code agents are the default agent type in `smolagents`. They generate Python tool calls to perform actions, achieving action representations that are efficient, expressive, and accurate.
|
||||
|
||||
Their streamlined approach reduces the number of required actions, simplifies complex operations, and enables reuse of existing code functions. `smolagents` provides a lightweight framework for building code agents, implemented in approximately 1,000 lines of code.
|
||||
|
||||

|
||||
Graphic from the paper [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030)
|
||||
|
||||
<Tip>
|
||||
If you want to learn more about why code agents are effective, check out <a href="https://huggingface.co/docs/smolagents/en/conceptual_guides/intro_agents#code-agents" target="_blank">this guide</a> from the smolagents documentation.
|
||||
</Tip>
|
||||
|
||||
## Why Code Agents?
|
||||
|
||||
In a multi-step agent process, the LLM writes and executes actions, typically involving external tool calls. Traditional approaches use a JSON format to specify tool names and arguments as strings, **which the system must parse to determine which tool to execute**.
|
||||
|
||||
However, research shows that **tool-calling LLMs work more effectively with code directly**. This is a core principle of `smolagents`, as shown in the diagram above from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030).
|
||||
|
||||
Writing actions in code rather than JSON offers several key advantages:
|
||||
|
||||
* **Composability**: Easily combine and reuse actions
|
||||
* **Object Management**: Work directly with complex structures like images
|
||||
* **Generality**: Express any computationally possible task
|
||||
* **Natural for LLMs**: High-quality code is already present in LLM training data
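The composability point can be illustrated with a toy comparison. In this sketch (all tool names and helpers are hypothetical, and plain `exec` is used only for illustration, without any sandboxing), a JSON-style action can call one tool per step, while a code action can chain two tools in a single step.

```python
import json


def get_price(item: str) -> float:
    """Toy tool: look up a price."""
    prices = {"cake": 20.0, "juice": 5.0}
    return prices[item]


def apply_discount(amount: float, percent: float) -> float:
    """Toy tool: apply a percentage discount."""
    return round(amount * (1 - percent / 100), 2)


TOOLS = {"get_price": get_price, "apply_discount": apply_discount}


def run_json_action(blob: str):
    # The system must parse the blob to find the tool name and its arguments.
    action = json.loads(blob)
    return TOOLS[action["tool"]](**action["arguments"])


def run_code_action(code: str):
    # WARNING: exec without a sandbox is unsafe; this is only a toy example.
    namespace = dict(TOOLS)
    exec(code, namespace)
    return namespace["result"]


# JSON actions: two separate steps, with the result passed back in between.
price = run_json_action('{"tool": "get_price", "arguments": {"item": "cake"}}')
json_result = run_json_action(
    json.dumps({"tool": "apply_discount", "arguments": {"amount": price, "percent": 10}})
)

# Code action: the same work expressed as one composed step.
code_result = run_code_action("result = apply_discount(get_price('cake'), 10)")

print(json_result, code_result)
```

Both routes reach the same value, but the code action needs a single step instead of two.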
|
||||
|
||||
## How Does a Code Agent Work?
|
||||
|
||||

|
||||
|
||||
The diagram above illustrates how `CodeAgent.run()` operates, following the ReAct framework we mentioned in Unit 1. The main abstraction for agents in `smolagents` is a `MultiStepAgent`, which serves as the core building block. `CodeAgent` is a special kind of `MultiStepAgent`, as we will see in an example below.
|
||||
|
||||
A `CodeAgent` performs actions through a cycle of steps, with existing variables and knowledge being incorporated into the agent's context, which is kept in an execution log:
|
||||
|
||||
1. The system prompt is stored in a `SystemPromptStep`, and the user query is logged in a `TaskStep`.
|
||||
|
||||
2. Then, the following while loop is executed:
|
||||
|
||||
2.1 Method `agent.write_memory_to_messages()` writes the agent's logs into a list of LLM-readable [chat messages](https://huggingface.co/docs/transformers/en/chat_templating).
|
||||
|
||||
2.2 These messages are sent to a `Model`, which generates a completion.
|
||||
|
||||
2.3 The completion is parsed to extract the action, which, in our case, should be a code snippet since we're working with a `CodeAgent`.
|
||||
|
||||
2.4 The action is executed.
|
||||
|
||||
2.5 The results are logged into memory in an `ActionStep`.
|
||||
|
||||
At the end of each step, any callback functions registered on the agent (in `agent.step_callbacks`) are executed.
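The loop described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the `smolagents` implementation: `ToyStep`, `fake_model`, and `toy_run` are hypothetical names, the "model" is a hard-coded function, and the code is executed with an unsandboxed `exec`.

```python
from dataclasses import dataclass


@dataclass
class ToyStep:
    kind: str  # "system", "task", or "action"
    content: str
    observation: str = ""


ROLES = {"system": "system", "task": "user", "action": "assistant"}


def write_memory_to_messages(memory):
    # 2.1: turn the logged steps into chat-style messages.
    return [{"role": ROLES[step.kind], "content": step.content} for step in memory]


def fake_model(messages):
    # 2.2: stand-in for an LLM; it picks a snippet based on past actions.
    actions_so_far = sum(1 for m in messages if m["role"] == "assistant")
    if actions_so_far == 0:
        return "total_minutes = 30 + 60 + 45 + 45"
    return "final_answer(total_minutes)"


def toy_run(task, max_steps=5, step_callbacks=()):
    memory = [ToyStep("system", "You write Python code."), ToyStep("task", task)]
    final = {}
    variables = {"final_answer": lambda value: final.update(value=value)}
    for _ in range(max_steps):
        messages = write_memory_to_messages(memory)
        completion = fake_model(messages)
        code = completion.strip()  # 2.3: parsing is trivial in this toy
        exec(code, variables)  # 2.4: unsafe outside a sandbox, toy only
        step = ToyStep("action", code, observation=str(final.get("value", "")))
        memory.append(step)  # 2.5: log the result
        for callback in step_callbacks:
            callback(step)  # run callbacks at the end of each step
        if "value" in final:
            return final["value"], memory
    raise RuntimeError("max_steps reached without a final answer")


seen = []
answer, memory = toy_run("How many minutes of preparation in total?", step_callbacks=[seen.append])
print(answer, len(memory), len(seen))
```

Note how the variable `total_minutes` created in the first action is still available in the second one: that persistence between steps is what lets a code agent build on its earlier work.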
|
||||
|
||||
## Let's See Some Examples
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/code_agents.ipynb" target="_blank">this notebook</a>, which you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
Alfred is planning a party at the Wayne family mansion and needs your help to ensure everything goes smoothly. To assist him, we'll apply what we've learned about how a multi-step `CodeAgent` operates.
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-party.jpg" alt="Alfred Party"/>
|
||||
|
||||
If you haven't installed `smolagents` yet, you can do so by running the following command:
|
||||
|
||||
```bash
|
||||
pip install smolagents -U
|
||||
```
|
||||
|
||||
Let's also log in to the Hugging Face Hub to have access to the Serverless Inference API.
|
||||
|
||||
```python
|
||||
from huggingface_hub import login
|
||||
|
||||
login()
|
||||
```
|
||||
|
||||
### Selecting a Playlist for the Party Using `smolagents`
|
||||
|
||||
Music is an essential part of a successful party! Alfred needs some help selecting the playlist. Luckily, `smolagents` has got us covered! We can build an agent capable of searching the web using DuckDuckGo. To give the agent access to this tool, we include it in the tool list when creating the agent.
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-playlist.jpg" alt="Alfred Playlist"/>
|
||||
|
||||
For the model, we'll rely on `HfApiModel`, which provides access to Hugging Face's [Serverless Inference API](https://huggingface.co/docs/api-inference/index). The default model is `"Qwen/Qwen2.5-Coder-32B-Instruct"`, which is performant and available for fast inference, but you can select any compatible model from the Hub.
|
||||
|
||||
Running an agent is quite straightforward:
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
|
||||
|
||||
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
|
||||
|
||||
agent.run("Search for the best music recommendations for a party at the Wayne's mansion.")
|
||||
```
|
||||
|
||||
When you run this example, the output will **display a trace of the workflow steps being executed**. It will also print the corresponding Python code with the message:
|
||||
|
||||
```text
|
||||
─ Executing parsed code: ────────────────────────────────────────────────────────────────────────────────────────
|
||||
results = web_search(query="best music for a Batman party")
|
||||
print(results)
|
||||
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
|
||||
```
|
||||
|
||||
After a few steps, you'll see the generated playlist that Alfred can use for the party! 🎵
|
||||
|
||||
### Using a Custom Tool to Prepare the Menu
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-menu.jpg" alt="Alfred Menu"/>
|
||||
|
||||
Now that we have selected a playlist, we need to organize the menu for the guests. Again, Alfred can take advantage of `smolagents` to do so. Here, we use the `@tool` decorator to define a custom function that acts as a tool. We'll cover tool creation in more detail later, so for now, we can simply run the code.
|
||||
|
||||
As you can see in the example below, we will create a tool using the `@tool` decorator and include it in the `tools` list.
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, tool, HfApiModel
|
||||
|
||||
# Tool to suggest a menu based on the occasion
|
||||
@tool
|
||||
def suggest_menu(occasion: str) -> str:
|
||||
"""
|
||||
Suggests a menu based on the occasion.
|
||||
Args:
|
||||
occasion: The type of occasion for the party.
|
||||
"""
|
||||
if occasion == "casual":
|
||||
return "Pizza, snacks, and drinks."
|
||||
elif occasion == "formal":
|
||||
return "3-course dinner with wine and dessert."
|
||||
elif occasion == "superhero":
|
||||
return "Buffet with high-energy and healthy food."
|
||||
else:
|
||||
return "Custom menu for the butler."
|
||||
|
||||
# Alfred, the butler, preparing the menu for the party
|
||||
agent = CodeAgent(tools=[suggest_menu], model=HfApiModel())
|
||||
|
||||
# Preparing the menu for the party
|
||||
agent.run("Prepare a formal menu for the party.")
|
||||
```
|
||||
|
||||
The agent will run for a few steps until it finds the answer.
|
||||
|
||||
The menu is ready! 🥗
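If you're curious how a decorator could turn a plain function into something an agent can read, here is a rough sketch built on the standard `inspect` module. `describe_tool` is a hypothetical helper written for this illustration; it is not how `smolagents` implements `@tool`, it only shows that the name, docstring, and type hints already carry the information a tool description needs.

```python
import inspect


def describe_tool(func):
    """Build a simple tool description from a function's signature and docstring."""
    signature = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        # Use the first docstring line as a short description.
        "description": doc.splitlines()[0] if doc else "",
        "inputs": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
        "output_type": getattr(
            signature.return_annotation, "__name__", str(signature.return_annotation)
        ),
    }


def suggest_menu(occasion: str) -> str:
    """Suggests a menu based on the occasion.

    Args:
        occasion: The type of occasion for the party.
    """
    if occasion == "formal":
        return "3-course dinner with wine and dessert."
    return "Custom menu for the butler."


spec = describe_tool(suggest_menu)
print(spec)
```

This is also why clear type hints and docstrings matter when writing tools: they are the only description of the tool that the model gets to see.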
|
||||
|
||||
### Using Python Imports Inside the Agent
|
||||
|
||||
We have the playlist and menu ready, but we need to check one more crucial detail: preparation time!
|
||||
|
||||
Alfred needs to calculate when everything would be ready if he started preparing now, in case they need assistance from other superheroes.
|
||||
|
||||
`smolagents` specializes in agents that write and execute Python code snippets, offering sandboxed execution for security.
|
||||
**Code execution has strict security measures** - imports outside a predefined safe list are blocked by default. However, you can authorize additional imports by passing them as strings in `additional_authorized_imports`.
|
||||
For more details on secure code execution, see the official [guide](https://huggingface.co/docs/smolagents/tutorials/secure_code_execution).
|
||||
|
||||
When creating the agent, we'll use `additional_authorized_imports` to allow for importing the `datetime` module.
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, HfApiModel
|
||||
import numpy as np
|
||||
import time
|
||||
import datetime
|
||||
|
||||
agent = CodeAgent(tools=[], model=HfApiModel(), additional_authorized_imports=['datetime'])
|
||||
|
||||
agent.run(
|
||||
"""
|
||||
Alfred needs to prepare for the party. Here are the tasks:
|
||||
1. Prepare the drinks - 30 minutes
|
||||
2. Decorate the mansion - 60 minutes
|
||||
3. Set up the menu - 45 minutes
|
||||
4. Prepare the music and playlist - 45 minutes
|
||||
|
||||
If we start right now, at what time will the party be ready?
|
||||
"""
|
||||
)
|
||||
```
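The allow-list idea behind `additional_authorized_imports` can be illustrated with the standard `ast` module. This is only a sketch under stated assumptions: `BASE_ALLOWED` is a made-up list for the example (not the real default list), and `find_unauthorized_imports` is a hypothetical helper, not the mechanism `smolagents` actually uses.

```python
import ast

# Illustrative allow-list only; the real default list in smolagents differs.
BASE_ALLOWED = {"math", "random", "statistics"}


def find_unauthorized_imports(code, additional_authorized_imports=()):
    """Return the sorted list of imported modules that are not on the allow-list."""
    allowed = BASE_ALLOWED | set(additional_authorized_imports)
    blocked = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare on the top-level package, e.g. "os" for "os.path".
            if module.split(".")[0] not in allowed:
                blocked.append(module)
    return sorted(blocked)


snippet = "import datetime\nimport os\nfrom math import sqrt\n"

blocked_by_default = find_unauthorized_imports(snippet)
blocked_with_datetime = find_unauthorized_imports(snippet, ["datetime"])

print(blocked_by_default)
print(blocked_with_datetime)
```

Authorizing `datetime` removes it from the blocked list, while `os` stays blocked because nobody asked for it.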
|
||||
|
||||
|
||||
These examples are just the beginning of what you can do with code agents, and we're already starting to see their utility for preparing the party.
|
||||
You can learn more about how to build code agents in the [smolagents documentation](https://huggingface.co/docs/smolagents).
|
||||
|
||||
In summary, `smolagents` specializes in agents that write and execute Python code snippets, offering sandboxed execution for security. It supports both local and API-based language models, making it adaptable to various development environments.
|
||||
|
||||
### Sharing Our Custom Party Preparator Agent to the Hub
|
||||
|
||||
Wouldn't it be **amazing to share our very own Alfred agent with the community**? By doing so, anyone can easily download and use the agent directly from the Hub, bringing the ultimate party planner of Gotham to their fingertips! Let's make it happen! 🎉
|
||||
|
||||
The `smolagents` library makes this possible by allowing you to share a complete agent with the community and download others for immediate use. It's as simple as the following:
|
||||
|
||||
```python
|
||||
# Change to your username and repo name
|
||||
agent.push_to_hub('sergiopaniego/AlfredAgent')
|
||||
```
|
||||
|
||||
To download the agent again, use the code below:
|
||||
|
||||
```python
|
||||
# Change to your username and repo name
|
||||
alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', trust_remote_code=True)
|
||||
|
||||
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")
|
||||
```
|
||||
|
||||
What's also exciting is that shared agents are directly available as Hugging Face Spaces, allowing you to interact with them in real-time. You can explore other agents [here](https://huggingface.co/spaces/davidberenstein1957/smolagents-and-tools).
|
||||
|
||||
For example, the _AlfredAgent_ is available [here](https://huggingface.co/spaces/sergiopaniego/AlfredAgent). You can try it out directly below:
|
||||
|
||||
<iframe
|
||||
src="https://sergiopaniego-alfredagent.hf.space/"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
You may be wondering—how did Alfred build such an agent using `smolagents`? By integrating several tools, he can generate an agent as follows. Don't worry about the tools for now, as we'll have a dedicated section later in this unit to explore that in detail:
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, HfApiModel, Tool, tool, VisitWebpageTool
|
||||
|
||||
@tool
|
||||
def suggest_menu(occasion: str) -> str:
|
||||
"""
|
||||
Suggests a menu based on the occasion.
|
||||
Args:
|
||||
occasion: The type of occasion for the party.
|
||||
"""
|
||||
if occasion == "casual":
|
||||
return "Pizza, snacks, and drinks."
|
||||
elif occasion == "formal":
|
||||
return "3-course dinner with wine and dessert."
|
||||
elif occasion == "superhero":
|
||||
return "Buffet with high-energy and healthy food."
|
||||
else:
|
||||
return "Custom menu for the butler."
|
||||
|
||||
@tool
|
||||
def catering_service_tool(query: str) -> str:
|
||||
"""
|
||||
This tool returns the highest-rated catering service in Gotham City.
|
||||
|
||||
Args:
|
||||
query: A search term for finding catering services.
|
||||
"""
|
||||
# Example list of catering services and their ratings
|
||||
services = {
|
||||
"Gotham Catering Co.": 4.9,
|
||||
"Wayne Manor Catering": 4.8,
|
||||
"Gotham City Events": 4.7,
|
||||
}
|
||||
|
||||
# Find the highest rated catering service (simulating search query filtering)
|
||||
best_service = max(services, key=services.get)
|
||||
|
||||
return best_service
|
||||
|
||||
class SuperheroPartyThemeTool(Tool):
|
||||
name = "superhero_party_theme_generator"
|
||||
description = """
|
||||
This tool suggests creative superhero-themed party ideas based on a category.
|
||||
It returns a unique party theme idea."""
|
||||
|
||||
inputs = {
|
||||
"category": {
|
||||
"type": "string",
|
||||
"description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham').",
|
||||
}
|
||||
}
|
||||
|
||||
output_type = "string"
|
||||
|
||||
def forward(self, category: str):
|
||||
themes = {
|
||||
"classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
|
||||
"villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
|
||||
"futuristic gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."  # key is lowercase so the category.lower() lookup below can match it
|
||||
}
|
||||
|
||||
return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")
|
||||
|
||||
|
||||
# Alfred, the butler, preparing the menu for the party
|
||||
agent = CodeAgent(
|
||||
tools=[
|
||||
DuckDuckGoSearchTool(),
|
||||
VisitWebpageTool(),
|
||||
suggest_menu,
|
||||
catering_service_tool,
|
||||
SuperheroPartyThemeTool()
|
||||
],
|
||||
model=HfApiModel(),
|
||||
max_steps=10,
|
||||
verbosity_level=2
|
||||
)
|
||||
|
||||
agent.run("Give me the best playlist for a party at the Wayne's mansion. The party idea is a 'villain masquerade' theme")
|
||||
```
|
||||
|
||||
As you can see, we've created a `CodeAgent` with several tools that enhance the agent's functionality, turning it into the ultimate party planner ready to share with the community! 🎉
|
||||
|
||||
Now, it's your turn: build your very own agent and share it with the community using the knowledge we've just learned! 🕵️‍♂️💡
|
||||
|
||||
<Tip>
|
||||
If you would like to share your agent project, create a Space and tag the [agents-course](https://huggingface.co/agents-course) on the Hugging Face Hub. We'd love to see what you've created!
|
||||
</Tip>
|
||||
|
||||
### Inspecting Our Party Preparator Agent with OpenTelemetry and Langfuse 📡
|
||||
|
||||
As Alfred fine-tunes the Party Preparator Agent, he's growing weary of debugging its runs. Agents, by nature, are unpredictable and difficult to inspect. But since he aims to build the ultimate Party Preparator Agent and deploy it in production, he needs robust traceability for future monitoring and analysis.
|
||||
|
||||
Once again, `smolagents` comes to the rescue! It embraces the [OpenTelemetry](https://opentelemetry.io/) standard for instrumenting agent runs, allowing seamless inspection and logging. With the help of [Langfuse](https://langfuse.com/) and the `SmolagentsInstrumentor`, Alfred can easily track and analyze his agent’s behavior.
|
||||
|
||||
Setting it up is straightforward!
|
||||
|
||||
First, we need to install the necessary dependencies:
|
||||
|
||||
```bash
|
||||
pip install opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
|
||||
```
|
||||
|
||||
Next, Alfred has already created an account on Langfuse and has his API keys ready. If you haven’t done so yet, you can sign up for Langfuse Cloud [here](https://cloud.langfuse.com/) or explore [alternatives](https://huggingface.co/docs/smolagents/tutorials/inspect_runs).
|
||||
|
||||
Once you have your API keys, they need to be properly configured as follows:
|
||||
|
||||
```python
|
||||
import os
|
||||
import base64
|
||||
|
||||
LANGFUSE_PUBLIC_KEY="pk-lf-..."
|
||||
LANGFUSE_SECRET_KEY="sk-lf-..."
|
||||
LANGFUSE_AUTH=base64.b64encode(f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()).decode()
|
||||
|
||||
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel" # EU data region
|
||||
# os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel" # US data region
|
||||
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"
|
||||
```
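Since a malformed header is an easy mistake to make here, a quick round-trip check can help before sending any traces. The sketch below uses only the standard library; the two helper functions and the placeholder keys are hypothetical and exist only for this illustration.

```python
import base64


def build_basic_auth_header(public_key: str, secret_key: str) -> str:
    """Build the header value in the same way as the snippet above."""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return f"Authorization=Basic {token}"


def decode_basic_auth_header(header: str):
    """Recover the key pair from the header to confirm it was encoded correctly."""
    token = header.split("Basic ", 1)[1]
    public_key, secret_key = base64.b64decode(token).decode().split(":", 1)
    return public_key, secret_key


# Placeholder values, not real credentials.
header = build_basic_auth_header("pk-lf-example", "sk-lf-example")
decoded = decode_basic_auth_header(header)

print(header)
print(decoded)
```

If the decoded pair matches your keys, the header is well-formed; remember never to print real secret keys in shared notebooks.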
|
||||
|
||||
Finally, Alfred is ready to initialize the `SmolagentsInstrumentor` and start tracking his agent's performance.
|
||||
|
||||
```python
|
||||
from opentelemetry.sdk.trace import TracerProvider
|
||||
|
||||
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
|
||||
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
|
||||
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
|
||||
|
||||
trace_provider = TracerProvider()
|
||||
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
|
||||
|
||||
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
|
||||
```
|
||||
|
||||
Alfred is now connected 🔌! The runs from `smolagents` are being logged in Langfuse, giving him full visibility into the agent's behavior. With this setup, he's ready to revisit previous runs and refine his Party Preparator Agent even further.
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, HfApiModel
|
||||
|
||||
agent = CodeAgent(tools=[], model=HfApiModel())
|
||||
alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', trust_remote_code=True)
|
||||
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")
|
||||
```
|
||||
|
||||
Alfred can now access these logs [here](https://cloud.langfuse.com/project/cm7bq0abj025rad078ak3luwi/traces/995fc019255528e4f48cf6770b0ce27b?timestamp=2025-02-19T10%3A28%3A36.929Z) to review and analyze them.
|
||||
|
||||
Meanwhile, the [suggested playlist](https://open.spotify.com/playlist/0gZMMHjuxMrrybQ7wTMTpw) sets the perfect vibe for the party preparations. Cool, right? 🎶
|
||||
|
||||
---
|
||||
|
||||
Now that we have created our first Code Agent, let's **learn how we can create Tool Calling Agents**, the second type of agent available in `smolagents`.
|
||||
|
||||
## Resources
|
||||
|
||||
- [smolagents Blog](https://huggingface.co/blog/smolagents) - Introduction to smolagents and code interactions
|
||||
- [smolagents: Building Good Agents](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - Best practices for reliable agents
|
||||
- [Building Effective Agents - Anthropic](https://www.anthropic.com/research/building-effective-agents) - Agent design principles
|
||||
- [Sharing runs with OpenTelemetry](https://huggingface.co/docs/smolagents/tutorials/inspect_runs) - Details about how to set up OpenTelemetry for tracking your agents.
|
||||
11
units/en/unit2/smolagents/conclusion.mdx
Normal file
@@ -0,0 +1,11 @@
|
||||
# Conclusion
|
||||
|
||||
Congratulations on finishing the `smolagents` module of this second Unit 🥳
|
||||
|
||||
You’ve just mastered the fundamentals of `smolagents` and you’ve built your own Agent! Now that you have skills in `smolagents`, you can start creating Agents that solve the tasks you're interested in.
|
||||
|
||||
In the next module, you're going to learn **how to build Agents with LlamaIndex**.
|
||||
|
||||
Finally, we would love **to hear what you think of the course and how we can improve it**. If you have feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
|
||||
|
||||
### Keep Learning, stay awesome 🤗
|
||||
25
units/en/unit2/smolagents/final_quiz.mdx
Normal file
@@ -0,0 +1,25 @@
|
||||
# Exam Time!
|
||||
|
||||
Well done on working through the material on `smolagents`! You've already achieved a lot. Now, it's time to put your knowledge to the test with a quiz. 🧠
|
||||
|
||||
## Instructions
|
||||
|
||||
- The quiz consists of code questions.
|
||||
- You will be given instructions to complete the code snippets.
|
||||
- Read the instructions carefully and complete the code snippets accordingly.
|
||||
- For each question, you will be given the result and some feedback.
|
||||
|
||||
🧘 **This quiz is ungraded and uncertified**. It's about you understanding the `smolagents` library and knowing whether you should spend more time on the written material. In the coming units you'll put this knowledge to the test in use cases and projects.
|
||||
|
||||
Let's get started!
|
||||
|
||||
## Quiz 🚀
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-unit2-smolagents-quiz.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
You can also access the quiz 👉 [here](https://huggingface.co/spaces/agents-course/unit2_smolagents_quiz)
|
||||
69
units/en/unit2/smolagents/introduction.mdx
Normal file
@@ -0,0 +1,69 @@
|
||||
# Introduction to `smolagents`
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/thumbnail.jpg" alt="Unit 2.1 Thumbnail"/>
|
||||
|
||||
Welcome to this module, where you'll learn **how to build effective agents** using the [`smolagents`](https://github.com/huggingface/smolagents) library, which provides a lightweight framework for creating capable AI agents.
|
||||
|
||||
`smolagents` is a Hugging Face library, so we would appreciate your support by **starring** the smolagents [`repository`](https://github.com/huggingface/smolagents):
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/star_smolagents.gif" alt="starring smolagents"/>
|
||||
|
||||
## Module Overview
|
||||
|
||||
This module provides a comprehensive overview of key concepts and practical strategies for building intelligent agents using `smolagents`.
|
||||
|
||||
With so many open-source frameworks available, it's essential to understand the components and capabilities that make `smolagents` a useful option, and to recognize when another solution might be a better fit.
|
||||
|
||||
We'll explore critical agent types, including code agents that write their actions as Python code, tool calling agents for creating modular, function-driven workflows, and retrieval agents that access and synthesize information.
|
||||
|
||||
Additionally, we'll cover the orchestration of multiple agents as well as the integration of vision capabilities and web browsing, which unlock new possibilities for dynamic and context-aware applications.
|
||||
|
||||
In this unit, Alfred, the agent from Unit 1, makes his return. This time, he’s using the `smolagents` framework for his internal workings. Together, we’ll explore the key concepts behind this framework as Alfred tackles various tasks. Alfred is organizing a party at the Wayne Manor while the Wayne family 🦇 is away, and he has plenty to do. Join us as we showcase his journey and how he handles these tasks with `smolagents`!
|
||||
|
||||
<Tip>
|
||||
|
||||
In this unit, you will learn to build AI agents with the `smolagents` library. Your agents will be able to search for data, execute code, and interact with web pages. You will also learn how to combine multiple agents to create more powerful systems.
|
||||
|
||||
</Tip>
|
||||
|
||||

|
||||
|
||||
## Contents
|
||||
|
||||
During this unit on `smolagents`, we cover:
|
||||
|
||||
### 1️⃣ [Why Use smolagents](./why_use_smolagents)
|
||||
|
||||
`smolagents` is one of the many open-source agent frameworks available for application development. Alternative options include `LlamaIndex` and `LangGraph`, which are also covered in other modules in this course. `smolagents` offers several key features that might make it a great fit for specific use cases, but we should always consider all options when selecting a framework. We'll explore the advantages and drawbacks of using `smolagents`, helping you make an informed decision based on your project's requirements.
|
||||
|
||||
### 2️⃣ [CodeAgents](./code_agents)
|
||||
|
||||
`CodeAgents` are the primary type of agent in `smolagents`. Instead of generating JSON or text, these agents produce Python code to perform actions. This module explores their purpose, functionality, and how they work, along with hands-on examples to showcase their capabilities.
|
||||
|
||||
### 3️⃣ [ToolCallingAgents](./tool_calling_agents)
|
||||
|
||||
`ToolCallingAgents` are the second type of agent supported by `smolagents`. Unlike `CodeAgents`, which generate Python code, these agents rely on JSON/text blobs that the system must parse and interpret to execute actions. This module covers their functionality and their key differences from `CodeAgents`, and provides an example to illustrate their usage.
|
||||
|
||||
### 4️⃣ [Tools](./tools)
|
||||
|
||||
As we saw in Unit 1, tools are functions that an LLM can use within an agentic system, and they act as the essential building blocks for agent behavior. This module covers how to create tools, their structure, and different implementation methods using the `Tool` class or the `@tool` decorator. You'll also learn about the default toolbox, how to share tools with the community, and how to load community-contributed tools for use in your agents.
|
||||
|
||||
### 5️⃣ [Retrieval Agents](./retrieval_agents)
|
||||
|
||||
Retrieval agents allow models access to knowledge bases, making it possible to search, synthesize, and retrieve information from multiple sources. They leverage vector stores for efficient retrieval and implement **Retrieval-Augmented Generation (RAG)** patterns. These agents are particularly useful for integrating web search with custom knowledge bases while maintaining conversation context through memory systems. This module explores implementation strategies, including fallback mechanisms for robust information retrieval.
|
||||
|
||||
### 6️⃣ [Multi-Agent Systems](./multi_agent_systems)
|
||||
|
||||
Orchestrating multiple agents effectively is crucial for building powerful, multi-agent systems. By combining agents with different capabilities—such as a web search agent with a code execution agent—you can create more sophisticated solutions. This module focuses on designing, implementing, and managing multi-agent systems to maximize efficiency and reliability.
|
||||
|
||||
### 7️⃣ [Vision and Browser agents](./vision_agents)
|
||||
|
||||
Vision agents extend traditional agent capabilities by incorporating **Vision-Language Models (VLMs)**, enabling them to process and interpret visual information. This module explores how to design and integrate VLM-powered agents, unlocking advanced functionalities like image-based reasoning, visual data analysis, and multimodal interactions. We will also use vision agents to build a browser agent that can browse the web and extract information from it.
|
||||
|
||||
## Resources
|
||||
|
||||
- [smolagents Documentation](https://huggingface.co/docs/smolagents) - Official docs for the smolagents library
|
||||
- [Building Effective Agents](https://www.anthropic.com/research/building-effective-agents) - Research paper on agent architectures
|
||||
- [Agent Guidelines](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - Best practices for building reliable agents
|
||||
- [LangGraph Agents](https://langchain-ai.github.io/langgraph/) - Additional examples of agent implementations
|
||||
- [Function Calling Guide](https://platform.openai.com/docs/guides/function-calling) - Understanding function calling in LLMs
|
||||
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) - Guide to implementing effective RAG
|
||||
413
units/en/unit2/smolagents/multi_agent_systems.mdx
Normal file
@@ -0,0 +1,413 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/multiagent_notebook.ipynb"},
|
||||
]} />
|
||||
|
||||
# Multi-Agent Systems
|
||||
|
||||
Multi-agent systems enable **specialized agents to collaborate on complex tasks**, improving modularity, scalability, and robustness. Instead of relying on a single agent, tasks are distributed among agents with distinct capabilities.
|
||||
|
||||
In **smolagents**, different agents can be combined to generate Python code, call external tools, perform web searches, and more. By orchestrating these agents, we can create powerful workflows.
|
||||
|
||||
A typical setup might include:
|
||||
- A **Manager Agent** for task delegation
|
||||
- A **Code Interpreter Agent** for code execution
|
||||
- A **Web Search Agent** for information retrieval
|
||||
|
||||
The diagram below illustrates a simple multi-agent architecture where a **Manager Agent** coordinates a **Code Interpreter Tool** and a **Web Search Agent**, which in turn utilizes tools like the `DuckDuckGoSearchTool` and `VisitWebpageTool` to gather relevant information.
|
||||
|
||||
<img src="https://mermaid.ink/img/pako:eNp1kc1qhTAQRl9FUiQb8wIpdNO76eKubrmFks1oRg3VSYgjpYjv3lFL_2hnMWQOJwn5sqgmelRWleUSKLAtFs09jqhtoWuYUFfFAa6QA9QDTnpzamheuhxn8pt40-6l13UtS0ddhtQXj6dbR4XUGQg6zEYasTF393KjeSDGnDJKNxzj8I_7hLW5IOSmP9CH9hv_NL-d94d4DVNg84p1EnK4qlIj5hGClySWbadT-6OdsrL02MI8sFOOVkciw8zx8kaNspxnrJQE0fXKtjBMMs3JA-MpgOQwftIE9Bzj14w-cMznI_39E9Z3p0uFoA?type=png" style='background: white;'>
|
||||
|
||||
## Multi-Agent Systems in Action
|
||||
|
||||
A multi-agent system consists of multiple specialized agents working together under the coordination of an **Orchestrator Agent**. This approach enables complex workflows by distributing tasks among agents with distinct roles.
|
||||
|
||||
For example, a **Multi-Agent RAG system** can integrate:
|
||||
- A **Web Agent** for browsing the internet.
|
||||
- A **Retriever Agent** for fetching information from knowledge bases.
|
||||
- An **Image Generation Agent** for producing visuals.
|
||||
|
||||
All of these agents operate under an orchestrator that manages task delegation and interaction.
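The delegation pattern described above can be sketched in plain Python. This is only a toy illustration, not the smolagents API: the "agents" are ordinary functions, and `orchestrate`, `web_agent`, `retriever_agent`, and the `plan` format are hypothetical names made up for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: each "agent" is just a function from a sub-task to a result.
Agent = Callable[[str], str]


def web_agent(subtask: str) -> str:
    return f"[web] results for: {subtask}"


def retriever_agent(subtask: str) -> str:
    return f"[retriever] documents about: {subtask}"


def orchestrate(plan: List[Tuple[str, str]], agents: Dict[str, Agent]) -> List[str]:
    """Send each (agent_name, subtask) pair to the matching agent and collect the results."""
    results = []
    for agent_name, subtask in plan:
        if agent_name not in agents:
            raise KeyError(f"No agent registered under {agent_name!r}")
        results.append(agents[agent_name](subtask))
    return results


agents = {"web": web_agent, "retriever": retriever_agent}
plan = [("web", "Batman filming locations"), ("retriever", "supercar factories")]
for line in orchestrate(plan, agents):
    print(line)
```

In a real system the plan would be produced by the orchestrator's own model rather than written by hand, and each agent would keep its own memory of the steps it took.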
|
||||
|
||||
## Solving a complex task with a multi-agent hierarchy
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/multiagent_notebook.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
The reception is approaching! With your help, Alfred is now nearly finished with the preparations.
|
||||
|
||||
But now there's a problem: the Batmobile has disappeared. Alfred needs to find a replacement, and find it quickly.
|
||||
|
||||
Fortunately, a few biopics have been done on Bruce Wayne's life, so maybe Alfred could get a car left behind on one of the movie sets, and re-engineer it up to modern standards, which certainly would include a full self-driving option.
|
||||
|
||||
But the car could be at any of the filming locations around the world, and there could be many of them.
|
||||
|
||||
So Alfred wants your help. Could you build an agent able to solve this task?
|
||||
|
||||
> 👉 Find all Batman filming locations in the world, calculate the time to transfer there via cargo plane, and represent them on a map, with a color varying by cargo plane transfer time. Also represent some supercar factories with the same cargo plane transfer time.
|
||||
|
||||
Let's build this!
|
||||
|
||||
This example needs some additional packages, so let's install them first:
|
||||
|
||||
```bash
|
||||
pip install 'smolagents[litellm]' plotly matplotlib geopandas shapely kaleido -q
|
||||
```
|
||||
|
||||
### Building a tool to compute the cargo plane transfer time
|
||||
|
||||
```python
|
||||
import math
|
||||
from typing import Optional, Tuple
|
||||
|
||||
from smolagents import tool
|
||||
|
||||
|
||||
@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0,  # Average speed for cargo planes
) -> float:
    """
    Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))
    """

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # Extract coordinates
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # Earth's radius in kilometers
    EARTH_RADIUS_KM = 6371.0

    # Calculate great-circle distance using the haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # Add 10% to account for non-direct routes and air traffic controls
    actual_distance = distance * 1.1

    # Calculate flight time
    # Add 1 hour for takeoff and landing procedures
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # Format the results
    return round(flight_time, 2)


print(calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093)))
|
||||
```
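To sanity-check the arithmetic without installing smolagents, here is a plain-Python sketch of the same calculation (haversine distance, a 10% detour factor, and a 1-hour overhead). The function name `cargo_travel_time_hours` is made up for this sketch, and it uses `math.radians` in place of the hand-written conversion.

```python
import math


def cargo_travel_time_hours(origin, destination, cruising_speed_kmh=750.0):
    """Plain-Python version of the tool's arithmetic: haversine distance, +10% detour, +1 h overhead."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    distance_km = 6371.0 * 2 * math.asin(math.sqrt(a))
    return round(distance_km * 1.1 / cruising_speed_kmh + 1.0, 2)


gotham = (40.7128, -74.0060)
chicago = (41.8781, -87.6298)

print(cargo_travel_time_hours(gotham, gotham))   # zero distance: only the 1-hour overhead remains
print(cargo_travel_time_hours(gotham, chicago))
print(cargo_travel_time_hours(chicago, gotham))  # same value: the formula is symmetric
```

Two properties are worth checking on any tool like this before handing it to an agent: a zero-distance trip should cost only the fixed overhead, and swapping origin and destination should not change the result.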
|
||||
|
||||
### Setting up the agent
|
||||
|
||||
For the model provider, we use Together AI, one of the new [inference providers on the Hub](https://huggingface.co/blog/inference-providers)!
|
||||
|
||||
The `GoogleSearchTool` relies on a search API such as the [Serper API](https://serper.dev), so you need to either set the `SERPAPI_API_KEY` environment variable and pass `provider="serpapi"`, or set `SERPER_API_KEY` and pass `provider="serper"`.
|
||||
|
||||
If you don't have a SerpApi or Serper key set up, you can use `DuckDuckGoSearchTool` instead, but beware that it is rate-limited.
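The choice between providers can be made at runtime by checking which key is present. The helper below is a hypothetical sketch (`pick_search_provider` is not part of smolagents); it only reads the two environment variable names mentioned above and returns `None` when neither is set, in which case you would fall back to `DuckDuckGoSearchTool`.

```python
import os
from typing import Mapping, Optional


def pick_search_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "serper" or "serpapi" depending on which API key is set, else None."""
    if env.get("SERPER_API_KEY"):
        return "serper"
    if env.get("SERPAPI_API_KEY"):
        return "serpapi"
    return None  # no key found: fall back to DuckDuckGoSearchTool


print(pick_search_provider({"SERPER_API_KEY": "dummy"}))
print(pick_search_provider({}))
```

Passing the environment as a parameter keeps the helper easy to test: you can hand it a plain dictionary instead of modifying real environment variables.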
|
||||
|
||||
```python
|
||||
import os
|
||||
from PIL import Image
|
||||
from smolagents import CodeAgent, GoogleSearchTool, HfApiModel, VisitWebpageTool
|
||||
|
||||
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
|
||||
```
|
||||
|
||||
We can start by creating a simple agent as a baseline to give us a simple report.
|
||||
|
||||
```python
|
||||
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
|
||||
Also give me some supercar factories with the same cargo plane transfer time."""
|
||||
```
|
||||
|
||||
```python
|
||||
agent = CodeAgent(
    model=model,
    tools=[GoogleSearchTool("serper"), VisitWebpageTool(), calculate_cargo_travel_time],
    additional_authorized_imports=["pandas"],
    max_steps=20,
)
|
||||
```
|
||||
|
||||
```python
|
||||
result = agent.run(task)
|
||||
```
|
||||
|
||||
```python
|
||||
result
|
||||
```
|
||||
|
||||
In our case, it generates this output:
|
||||
|
||||
```python
|
||||
| | Location | Travel Time to Gotham (hours) |
|
||||
|--|------------------------------------------------------|------------------------------|
|
||||
| 0 | Necropolis Cemetery, Glasgow, Scotland, UK | 8.60 |
|
||||
| 1 | St. George's Hall, Liverpool, England, UK | 8.81 |
|
||||
| 2 | Two Temple Place, London, England, UK | 9.17 |
|
||||
| 3 | Wollaton Hall, Nottingham, England, UK | 9.00 |
|
||||
| 4 | Knebworth House, Knebworth, Hertfordshire, UK | 9.15 |
|
||||
| 5 | Acton Lane Power Station, Acton Lane, Acton, UK | 9.16 |
|
||||
| 6 | Queensboro Bridge, New York City, USA | 1.01 |
|
||||
| 7 | Wall Street, New York City, USA | 1.00 |
|
||||
| 8 | Mehrangarh Fort, Jodhpur, Rajasthan, India | 18.34 |
|
||||
| 9 | Turda Gorge, Turda, Romania | 11.89 |
|
||||
| 10 | Chicago, USA | 2.68 |
|
||||
| 11 | Hong Kong, China | 19.99 |
|
||||
| 12 | Cardington Studios, Northamptonshire, UK | 9.10 |
|
||||
| 13 | Warner Bros. Leavesden Studios, Hertfordshire, UK | 9.13 |
|
||||
| 14 | Westwood, Los Angeles, CA, USA | 6.79 |
|
||||
| 15 | Woking, UK (McLaren) | 9.13 |
|
||||
```
|
||||
|
||||
We could already improve this a bit by throwing in some dedicated planning steps, and adding more prompting.
|
||||
|
||||
Planning steps allow the agent to think ahead and plan its next steps, which can be useful for more complex tasks.
|
||||
|
||||
```python
|
||||
agent.planning_interval = 4
|
||||
|
||||
detailed_report = agent.run(f"""
|
||||
You're an expert analyst. You make comprehensive reports after visiting many websites.
|
||||
Don't hesitate to search for many queries at once in a for loop.
|
||||
For each data point that you find, visit the source url to confirm numbers.
|
||||
|
||||
{task}
|
||||
""")
|
||||
|
||||
print(detailed_report)
|
||||
```
|
||||
|
||||
```python
|
||||
detailed_report
|
||||
```
|
||||
|
||||
In our case, it generates this output:
|
||||
|
||||
```python
|
||||
| | Location | Travel Time (hours) |
|
||||
|--|--------------------------------------------------|---------------------|
|
||||
| 0 | Bridge of Sighs, Glasgow Necropolis, Glasgow, UK | 8.6 |
|
||||
| 1 | Wishart Street, Glasgow, Scotland, UK | 8.6 |
|
||||
```
|
||||
|
||||
|
||||
Thanks to these quick changes, we obtained a much more concise report by simply providing our agent a detailed prompt, and giving it planning capabilities!
|
||||
|
||||
However, the model's context window fills up quickly. So **if we ask our agent to combine the results of one detailed search with another, it will get slower and quickly ramp up tokens and costs**.
|
||||
|
||||
➡️ We need to improve the structure of our system.
|
||||
|
||||
### ✌️ Splitting the task between two agents
|
||||
|
||||
Multi-agent structures let us separate memories between different sub-tasks, with two great benefits:
|
||||
- Each agent is more focused on its core task, and thus performs better
|
||||
- Separating memories reduces the count of input tokens at each step, thus reducing latency and cost.
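A rough back-of-the-envelope model shows why separating memories reduces input tokens. The numbers below are invented for illustration, and the model is deliberately crude: it assumes every step adds the same number of tokens to memory, that each step re-reads all earlier steps, and it ignores the system prompt and the short summary passed between agents.

```python
def total_input_tokens(num_steps: int, tokens_per_step: int) -> int:
    """Total input tokens over a run in which step i re-reads the i earlier steps."""
    return sum(step * tokens_per_step for step in range(num_steps))


# One agent doing all 20 steps with a single shared memory.
single_agent = total_input_tokens(num_steps=20, tokens_per_step=1000)

# Two agents doing 10 steps each, with separate memories.
two_agents = 2 * total_input_tokens(num_steps=10, tokens_per_step=1000)

print(single_agent, two_agents)
```

Under these assumptions the single agent reads 190,000 input tokens in total, while the two-agent split reads 90,000, because input size grows with the square of the number of steps kept in one memory.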
|
||||
|
||||
Let's create a team with a dedicated web search agent, managed by another agent.
|
||||
|
||||
The manager agent needs plotting capabilities to write its final report, so let us give it access to additional imports, including `plotly`, and `geopandas` + `shapely` for spatial plotting.
|
||||
|
||||
```python
|
||||
model = HfApiModel(
    "Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=8096
)

web_agent = CodeAgent(
    model=model,
    tools=[
        GoogleSearchTool(provider="serper"),
        VisitWebpageTool(),
        calculate_cargo_travel_time,
    ],
    name="web_agent",
    description="Browses the web to find information",
    verbosity_level=0,
    max_steps=10,
)
|
||||
```
|
||||
|
||||
The manager agent will need to do some mental heavy lifting.
|
||||
|
||||
So we give it the stronger model [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), and add a `planning_interval` to the mix.
|
||||
|
||||
```python
|
||||
from smolagents.utils import encode_image_base64, make_image_url
|
||||
from smolagents import OpenAIServerModel
|
||||
|
||||
|
||||
def check_reasoning_and_plot(final_answer, agent_memory):
    """Ask a multimodal model to review the agent's steps and the saved plot."""
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens=8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    # Adjacent string literals are concatenated, so each one ends with a space.
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer)."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    # Reject the answer if the reviewer wrote FAIL
    if "FAIL" in output:
        raise Exception(output)
    return True


manager_agent = CodeAgent(
    model=HfApiModel("deepseek-ai/DeepSeek-R1", provider="together", max_tokens=8096),
    tools=[calculate_cargo_travel_time],
    managed_agents=[web_agent],
    additional_authorized_imports=[
        "geopandas",
        "plotly",
        "shapely",
        "json",
        "pandas",
        "numpy",
    ],
    planning_interval=5,
    verbosity_level=2,
    final_answer_checks=[check_reasoning_and_plot],
    max_steps=15,
)
|
||||
```
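The pass/fail decision in `check_reasoning_and_plot` comes down to a substring test on the reviewer's feedback. To see how that test behaves without calling a model, it can be isolated into a small helper; `verdict_passes` is a hypothetical name used only in this sketch.

```python
def verdict_passes(feedback: str) -> bool:
    """Mirror the check above: the answer is rejected as soon as "FAIL" appears in the feedback."""
    return "FAIL" not in feedback


print(verdict_passes("The map covers all locations. Final decision: PASS"))
print(verdict_passes("Colors ignore travel time. Final decision: FAIL"))
print(verdict_passes("No failures spotted. Final decision: PASS"))  # lowercase "failures" does not match
```

The match is case-sensitive, which is why the prompt asks for the verdict in caps. Note that an upper-case word such as "FAILURE" anywhere in the feedback would also trigger a rejection, so a stricter parse of the final line may be worth considering if the reviewer's wording varies.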
|
||||
|
||||
Let us inspect what this team looks like:
|
||||
|
||||
```python
|
||||
manager_agent.visualize()
|
||||
```
|
||||
|
||||
This will generate something like this, helping us understand the structure and relationship between agents and tools used:
|
||||
|
||||
```python
|
||||
CodeAgent | deepseek-ai/DeepSeek-R1
|
||||
├── ✅ Authorized imports: ['geopandas', 'plotly', 'shapely', 'json', 'pandas', 'numpy']
|
||||
├── 🛠️ Tools:
|
||||
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
│ ┃ Name ┃ Description ┃ Arguments ┃
|
||||
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ │ calculate_cargo_travel_time │ Calculate the travel time for a cargo │ origin_coords (`array`): Tuple of │
|
||||
│ │ │ plane between two points on Earth │ (latitude, longitude) for the │
|
||||
│ │ │ using great-circle distance. │ starting point │
|
||||
│ │ │ │ destination_coords (`array`): Tuple │
|
||||
│ │ │ │ of (latitude, longitude) for the │
|
||||
│ │ │ │ destination │
|
||||
│ │ │ │ cruising_speed_kmh (`number`): │
|
||||
│ │ │ │ Optional cruising speed in km/h │
|
||||
│ │ │ │ (defaults to 750 km/h for typical │
|
||||
│ │ │ │ cargo planes) │
|
||||
│ │ final_answer │ Provides a final answer to the given │ answer (`any`): The final answer to │
|
||||
│ │ │ problem. │ the problem │
|
||||
│ └─────────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┘
|
||||
└── 🤖 Managed agents:
|
||||
└── web_agent | CodeAgent | Qwen/Qwen2.5-Coder-32B-Instruct
|
||||
├── ✅ Authorized imports: []
|
||||
├── 📝 Description: Browses the web to find information
|
||||
└── 🛠️ Tools:
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ Name ┃ Description ┃ Arguments ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ web_search │ Performs a google web search for │ query (`string`): The search │
|
||||
│ │ your query then returns a string │ query to perform. │
|
||||
│ │ of the top search results. │ filter_year (`integer`): │
|
||||
│ │ │ Optionally restrict results to a │
|
||||
│ │ │ certain year │
|
||||
│ visit_webpage │ Visits a webpage at the given url │ url (`string`): The url of the │
|
||||
│ │ and reads its content as a │ webpage to visit. │
|
||||
│ │ markdown string. Use this to │ │
|
||||
│ │ browse webpages. │ │
|
||||
│ calculate_cargo_travel_time │ Calculate the travel time for a │ origin_coords (`array`): Tuple of │
|
||||
│ │ cargo plane between two points on │ (latitude, longitude) for the │
|
||||
│ │ Earth using great-circle │ starting point │
|
||||
│ │ distance. │ destination_coords (`array`): │
|
||||
│ │ │ Tuple of (latitude, longitude) │
|
||||
│ │ │ for the destination │
|
||||
│ │ │ cruising_speed_kmh (`number`): │
|
||||
│ │ │ Optional cruising speed in km/h │
|
||||
│ │ │ (defaults to 750 km/h for typical │
|
||||
│ │ │ cargo planes) │
|
||||
│ final_answer │ Provides a final answer to the │ answer (`any`): The final answer │
|
||||
│ │ given problem. │ to the problem │
|
||||
└─────────────────────────────┴───────────────────────────────────┴───────────────────────────────────┘
|
||||
```
|
||||
|
||||
```python
|
||||
manager_agent.run("""
|
||||
Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
|
||||
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
|
||||
Represent this as a spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!
|
||||
|
||||
Here's an example of how to plot and return a map:
|
||||
import plotly.express as px
|
||||
df = px.data.carshare()
|
||||
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
|
||||
color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
|
||||
fig.show()
|
||||
fig.write_image("saved_map.png")
|
||||
final_answer(fig)
|
||||
|
||||
Never try to process strings using code: when you have a string to read, just print it and you'll see it.
|
||||
""")
|
||||
```
|
||||
|
||||
I don't know how that went in your run, but in mine, the manager agent skillfully split the work for the web agent into `1. Search for Batman filming locations`, then `2. Find supercar factories`, before aggregating the lists and plotting the map.
|
||||
|
||||
Let's see what the map looks like by inspecting it directly from the agent state:
|
||||
|
||||
```python
|
||||
manager_agent.python_executor.state["fig"]
|
||||
```
|
||||
|
||||
This will output the map:
|
||||
|
||||

|
||||
|
||||
## Resources
|
||||
|
||||
- [Multi-Agent Systems](https://huggingface.co/docs/smolagents/main/en/examples/multiagents) – Overview of multi-agent systems.
|
||||
- [What is Agentic RAG?](https://weaviate.io/blog/what-is-agentic-rag) – Introduction to Agentic RAG.
|
||||
- [Multi-Agent RAG System 🤖🤝🤖 Recipe](https://huggingface.co/learn/cookbook/multiagent_rag_system) – Step-by-step guide to building a multi-agent RAG system.
|
||||
142
units/en/unit2/smolagents/quiz1.mdx
Normal file
@@ -0,0 +1,142 @@
|
||||
# Small Quiz (ungraded) [[quiz1]]
|
||||
|
||||
Let's test your understanding of `smolagents` with a quick quiz! Remember, testing yourself helps reinforce learning and identify areas that may need review.
|
||||
|
||||
This is an optional quiz and it's not graded.
|
||||
|
||||
### Q1: What is one of the primary advantages of choosing `smolagents` over other frameworks?
|
||||
Which statement best captures a core strength of the `smolagents` approach?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It uses highly specialized configuration files and a steep learning curve to ensure only expert developers can use it",
|
||||
explain: "smolagents is designed for simplicity and minimal code complexity, not steep learning curves.",
|
||||
},
|
||||
{
|
||||
text: "It supports a code-first approach with minimal abstractions, letting agents interact directly via Python function calls",
|
||||
explain: "Yes, smolagents emphasizes a straightforward, code-centric design with minimal abstractions.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It focuses on JSON-based actions, removing the need for agents to write any code",
|
||||
explain: "While smolagents supports JSON-based tool calls (ToolCallingAgents), the library emphasizes code-based approaches with CodeAgents.",
|
||||
},
|
||||
{
|
||||
text: "It deeply integrates with a single LLM provider and specialized hardware",
|
||||
explain: "smolagents supports multiple model providers and does not require specialized hardware.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: In which scenario would you likely benefit most from using smolagents?
|
||||
Which situation aligns well with what smolagents does best?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "Prototyping or experimenting quickly with agent logic, particularly when your application is relatively straightforward",
|
||||
explain: "Yes. smolagents is designed for simple and nimble agent creation without extensive setup overhead.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "Building a large-scale enterprise system where you need dozens of microservices and real-time data pipelines",
|
||||
explain: "While possible, smolagents is more focused on lightweight, code-centric experimentation rather than heavy enterprise infrastructure.",
|
||||
},
|
||||
{
|
||||
text: "Needing a framework that only supports cloud-based LLMs and forbids local inference",
|
||||
explain: "smolagents offers flexible integration with local or hosted models, not exclusively cloud-based LLMs.",
|
||||
},
|
||||
{
|
||||
text: "A scenario that requires advanced orchestration, multi-modal perception, and enterprise-scale features out-of-the-box",
|
||||
explain: "While you can integrate advanced capabilities, smolagents itself is lightweight and minimal at its core.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: smolagents offers flexibility in model integration. Which statement best reflects its approach?
|
||||
Choose the most accurate description of how smolagents interoperates with LLMs.
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It only provides a single built-in model and does not allow custom integrations",
|
||||
explain: "smolagents supports multiple different backends and user-defined models.",
|
||||
},
|
||||
{
|
||||
text: "It requires you to implement your own model connector for every LLM usage",
|
||||
explain: "There are multiple prebuilt connectors that make LLM integration straightforward.",
|
||||
},
|
||||
{
|
||||
text: "It only integrates with open-source LLMs but not commercial APIs",
|
||||
explain: "smolagents can integrate with both open-source and commercial model APIs.",
|
||||
},
|
||||
{
|
||||
text: "It can be used with a wide range of LLMs, offering predefined classes like TransformersModel, HfApiModel, and LiteLLMModel",
|
||||
explain: "This is correct. smolagents supports flexible model integration through various classes.",
|
||||
correct: true
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: How does smolagents handle the debate between code-based actions and JSON-based actions?
|
||||
Which statement correctly characterizes smolagents' philosophy about action formats?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It only allows JSON-based actions for all agent tasks, requiring a parser to extract the tool calls",
|
||||
explain: "ToolCallingAgent uses JSON-based calls, but smolagents also provides a primary CodeAgent option that writes Python code.",
|
||||
},
|
||||
{
|
||||
text: "It focuses on code-based actions via a CodeAgent but also supports JSON-based tool calls with a ToolCallingAgent",
|
||||
explain: "Yes, smolagents primarily recommends code-based actions but includes a JSON-based alternative for users who prefer it or need it.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It disallows any external function calls, instead requiring all logic to reside entirely within the LLM",
|
||||
explain: "smolagents is specifically designed to grant LLMs the ability to call tools or code externally.",
|
||||
},
|
||||
{
|
||||
text: "It requires users to manually convert every code snippet into a JSON object before running the agent",
|
||||
explain: "smolagents can automatically manage code snippet creation within the CodeAgent path, no manual JSON conversion necessary.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q5: How does smolagents integrate with the Hugging Face Hub for added benefits?
|
||||
Which statement accurately describes one of the core advantages of Hub integration?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It automatically upgrades all public models to commercial license tiers",
|
||||
explain: "Hub integration doesn't change the license tier for models or tools.",
|
||||
},
|
||||
{
|
||||
text: "It disables local inference entirely, forcing remote model usage only",
|
||||
explain: "Users can still do local inference if they prefer; pushing to the Hub doesn't override local usage.",
|
||||
},
|
||||
{
|
||||
text: "It allows you to push and share agents or tools, making them easily discoverable and reusable by other developers",
|
||||
explain: "Correct. smolagents supports uploading agents and tools to the HF Hub for others to reuse.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It permanently stores all your code-based agents, preventing any updates or versioning",
|
||||
explain: "Hub repositories support updates and version control, so you can revise your code-based agents any time.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
Congratulations on completing this quiz! 🎉 If you missed any questions, consider reviewing the *Why use smolagents* section for a deeper understanding. If you did well, you're ready to explore more advanced topics in smolagents!
|
||||
147
units/en/unit2/smolagents/quiz2.mdx
Normal file
@@ -0,0 +1,147 @@
|
||||
# Small Quiz (ungraded) [[quiz2]]
|
||||
|
||||
It's time to test your understanding of the *Code Agents*, *Tool Calling Agents*, and *Tools* sections. This quiz is optional and not graded.
|
||||
|
||||
---
|
||||
|
||||
### Q1: What is the key difference between creating a tool with the `@tool` decorator versus creating a subclass of `Tool` in smolagents?
|
||||
|
||||
Which statement best describes the distinction between these two approaches for defining tools?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "Using the <code>@tool</code> decorator is mandatory for retrieval-based tools, while subclasses of <code>Tool</code> are only for text-generation tasks",
|
||||
explain: "Both approaches can be used for any type of tool, including retrieval-based or text-generation tools.",
|
||||
},
|
||||
{
|
||||
text: "The <code>@tool</code> decorator is recommended for simple function-based tools, while subclasses of <code>Tool</code> offer more flexibility for complex functionality or custom metadata",
|
||||
explain: "This is correct. The decorator approach is simpler, but subclassing allows more customized behavior.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "<code>@tool</code> can only be used in multi-agent systems, while creating a <code>Tool</code> subclass is for single-agent scenarios",
|
||||
explain: "All agents (single or multi) can use either approach to define tools; there is no such restriction.",
|
||||
},
|
||||
{
|
||||
text: "Decorating a function with <code>@tool</code> replaces the need for a docstring, whereas subclasses must not include docstrings",
|
||||
explain: "Both methods benefit from clear docstrings. The decorator doesn't replace them, and a subclass can still have docstrings.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: How does a CodeAgent handle multi-step tasks using the ReAct (Reason + Act) approach?
|
||||
|
||||
Which statement correctly describes how the CodeAgent executes a series of steps to solve a task?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It passes each step to a different agent in a multi-agent system, then combines results",
|
||||
explain: "Although multi-agent systems can distribute tasks, CodeAgent itself can handle multiple steps on its own using ReAct.",
|
||||
},
|
||||
{
|
||||
text: "It stores every action in JSON for easy parsing before executing them all at once",
|
||||
explain: "This behavior matches ToolCallingAgent's JSON-based approach, not CodeAgent.",
|
||||
},
|
||||
{
|
||||
text: "It cycles through writing internal thoughts, generating Python code, executing the code, and logging the results until it arrives at a final answer",
|
||||
explain: "Correct. This describes the ReAct pattern that CodeAgent uses, including iterative reasoning and code execution.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It relies on a vision module to validate code output before continuing to the next step",
|
||||
explain: "Vision capabilities are supported in smolagents, but they're not a default requirement for CodeAgent or the ReAct approach.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: Which of the following is a primary advantage of sharing a tool on the Hugging Face Hub?
|
||||
|
||||
Select the best reason why a developer might upload and share their custom tool.
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It automatically integrates the tool with a MultiStepAgent for retrieval-augmented generation",
|
||||
explain: "Sharing a tool doesn't automatically set up retrieval or multi-step logic. It's just making the tool available.",
|
||||
},
|
||||
{
|
||||
text: "It allows others to discover, reuse, and integrate your tool in their smolagents without extra setup",
|
||||
explain: "Yes. Sharing on the Hub makes tools accessible for anyone (including yourself) to download and reuse quickly.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It ensures that only CodeAgents can invoke the tool while ToolCallingAgents cannot",
|
||||
explain: "Both CodeAgents and ToolCallingAgents can invoke shared tools. There's no restriction by agent type.",
|
||||
},
|
||||
{
|
||||
text: "It converts your tool into a fully vision-capable function for image processing",
|
||||
explain: "Tool sharing doesn't alter the tool's functionality or add vision capabilities automatically.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: ToolCallingAgent differs from CodeAgent in how it executes actions. Which statement is correct?
|
||||
|
||||
Choose the option that accurately describes how ToolCallingAgent works.
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "ToolCallingAgent is only compatible with a multi-agent system, while CodeAgent can run alone",
|
||||
explain: "Either agent can be used alone or as part of a multi-agent system.",
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent delegates all reasoning to a separate retrieval agent, then returns a final answer",
|
||||
explain: "ToolCallingAgent still uses a main LLM for reasoning; it doesn't rely solely on retrieval agents.",
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent outputs JSON instructions specifying tool calls and arguments, which get parsed and executed",
|
||||
explain: "This is correct. ToolCallingAgent uses the JSON approach to define tool calls.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent is only meant for single-step tasks and automatically stops after calling one tool",
|
||||
explain: "ToolCallingAgent can perform multiple steps if needed, just like CodeAgent.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q5: What is included in the smolagents default toolbox, and why might you use it?
|
||||
|
||||
Which statement best captures the purpose and contents of the default toolbox in smolagents?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "It provides a set of commonly-used tools such as DuckDuckGo search, PythonInterpreterTool, and a final answer tool for quick prototyping",
|
||||
explain: "Correct. The default toolbox contains these ready-made tools for easy integration when building agents.",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "It only supports vision-based tasks like image classification or OCR by default",
|
||||
explain: "Although smolagents can integrate vision-based features, the default toolbox isn't exclusively vision-oriented.",
|
||||
},
|
||||
{
|
||||
text: "It is intended solely for multi-agent systems and is incompatible with a single CodeAgent",
|
||||
explain: "The default toolbox can be used by any agent type, single or multi-agent setups alike.",
|
||||
},
|
||||
{
|
||||
text: "It adds advanced retrieval-based functionality for large-scale question answering from a vector store",
|
||||
explain: "While you can build retrieval tools, the default toolbox does not automatically provide advanced RAG features.",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
Congratulations on completing this quiz! 🎉 If any questions gave you trouble, revisit the *Code Agents*, *Tool Calling Agents*, or *Tools* sections to strengthen your understanding. If you aced it, you're well on your way to building robust smolagents applications!
|
||||
164
units/en/unit2/smolagents/retrieval_agents.mdx
Normal file
@@ -0,0 +1,164 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/retrieval_agents.ipynb"},
|
||||
]} />
|
||||
|
||||
# Building Agentic RAG Systems
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/retrieval_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
Retrieval Augmented Generation (RAG) systems combine the capabilities of data retrieval and generation models to provide context-aware responses. For example, a user's query is passed to a search engine, and the retrieved results are given to the model along with the query. The model then generates a response based on the query and retrieved information.
|
||||
|
||||
Agentic RAG (Retrieval-Augmented Generation) extends traditional RAG systems by **combining autonomous agents with dynamic knowledge retrieval**.
|
||||
|
||||
While traditional RAG systems use an LLM to answer queries based on retrieved data, agentic RAG **enables intelligent control of both retrieval and generation processes**, improving efficiency and accuracy.
|
||||
|
||||
Traditional RAG systems face key limitations, such as **relying on a single retrieval step** and focusing on direct semantic similarity with the user’s query, which may overlook relevant information.
|
||||
|
||||
Agentic RAG addresses these issues by allowing the agent to autonomously formulate search queries, critique retrieved results, and conduct multiple retrieval steps for a more tailored and comprehensive output.
|
||||
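To make the contrast concrete, here is a minimal sketch of the traditional, single-pass flow described above: one retrieval, then one prompt. The corpus, the word-overlap ranking, and the function names are all assumptions made for illustration; a real system would call a search engine or a retriever instead.

```python
import re

# Toy corpus standing in for a search index (hypothetical data).
CORPUS = [
    "Gold accents and velvet curtains suit a luxury masquerade ball.",
    "A live string quartet can play superhero film scores.",
    "Serve themed canapes named after famous heroes.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str) -> str:
    """Single-pass RAG: one retrieval, then one prompt for the model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("luxury masquerade decor with velvet curtains"))
```

The single call to `retrieve` is the limitation in question: if the first query misses, nothing in this flow tries again. An agent can instead decide to search a second time with different terms.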
|
||||
## Basic Retrieval with DuckDuckGo
|
||||
|
||||
Let's build a simple agent that can search the web using DuckDuckGo. This agent will retrieve information and synthesize responses to answer queries. With Agentic RAG, Alfred's agent can:
|
||||
|
||||
* Search for the latest superhero party trends
|
||||
* Refine results to include luxury elements
|
||||
* Synthesize information into a complete plan
|
||||
|
||||
Here's how Alfred's agent can achieve this:
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
|
||||
|
||||
# Initialize the search tool
|
||||
search_tool = DuckDuckGoSearchTool()
|
||||
|
||||
# Initialize the model
|
||||
model = HfApiModel()
|
||||
|
||||
agent = CodeAgent(
|
||||
model=model,
|
||||
tools=[search_tool]
|
||||
)
|
||||
|
||||
# Example usage
|
||||
response = agent.run(
|
||||
"Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering."
|
||||
)
|
||||
print(response)
|
||||
```
|
||||
|
||||
The agent follows this process:
|
||||
|
||||
1. **Analyzes the Request:** Alfred’s agent identifies the key elements of the query—luxury superhero-themed party planning, with focus on decor, entertainment, and catering.
|
||||
2. **Performs Retrieval:** The agent leverages DuckDuckGo to search for the most relevant and up-to-date information, ensuring it aligns with Alfred’s refined preferences for a luxurious event.
|
||||
3. **Synthesizes Information:** After gathering the results, the agent processes them into a cohesive, actionable plan for Alfred, covering all aspects of the party.
|
||||
4. **Stores for Future Reference:** The agent can keep the retrieved information in its memory, so follow-up tasks can reuse it instead of repeating the same searches.
|
||||
|
||||
## Custom Knowledge Base Tool
|
||||
|
||||
For specialized tasks, a custom knowledge base can be invaluable. Let's create a tool that queries a vector database of technical documentation or specialized knowledge. Using semantic search, the agent can find the most relevant information for Alfred's needs.
|
||||
|
||||
A vector database stores numerical representations (embeddings) of text or other data, created by machine learning models. It enables semantic search by identifying similar meanings in high-dimensional space.
|
||||
|
||||
This approach combines predefined knowledge with semantic search to provide context-aware solutions for event planning. With specialized knowledge access, Alfred can perfect every detail of the party.
|
||||
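To make the idea of embeddings and semantic search concrete, here is a minimal sketch that compares vectors with cosine similarity. The three-dimensional vectors are invented for illustration; a real embedding model produces vectors with hundreds of dimensions, and a vector database adds indexing on top of this comparison.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration.
EMBEDDINGS = {
    "velvet curtains and gold accents": [0.9, 0.1, 0.0],
    "a DJ playing themed music": [0.1, 0.9, 0.1],
    "dishes named after superheroes": [0.0, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector: list[float]) -> str:
    """Return the stored text whose embedding is closest to the query."""
    return max(
        EMBEDDINGS,
        key=lambda text: cosine_similarity(query_vector, EMBEDDINGS[text]),
    )


# Pretend this is the embedding of "luxury decor ideas".
query_vector = [0.8, 0.2, 0.1]
print(nearest(query_vector))
```

Because the comparison happens between vectors rather than words, a query can match a document that shares no keywords with it, as long as the model places them close together.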
|
||||
In this example, we'll create a tool that retrieves party planning ideas from a custom knowledge base. We'll use a BM25 retriever to search the knowledge base and return the top results. BM25 ranks documents by keyword overlap rather than by embeddings, so it serves here as a lightweight stand-in for a vector search. We'll also use `RecursiveCharacterTextSplitter` to split the documents into smaller chunks for more efficient search.
|
||||
|
||||
```python
|
||||
from langchain.docstore.document import Document
|
||||
from langchain.text_splitter import RecursiveCharacterTextSplitter
|
||||
from smolagents import Tool
|
||||
from langchain_community.retrievers import BM25Retriever
|
||||
from smolagents import CodeAgent, HfApiModel
|
||||
|
||||
class PartyPlanningRetrieverTool(Tool):
|
||||
name = "party_planning_retriever"
|
||||
description = "Uses semantic search to retrieve relevant party planning ideas for Alfred’s superhero-themed party at Wayne Manor."
|
||||
inputs = {
|
||||
"query": {
|
||||
"type": "string",
|
||||
"description": "The query to perform. This should be a query related to party planning or superhero themes.",
|
||||
}
|
||||
}
|
||||
output_type = "string"
|
||||
|
||||
def __init__(self, docs, **kwargs):
|
||||
super().__init__(**kwargs)
|
||||
self.retriever = BM25Retriever.from_documents(
|
||||
docs, k=5 # Retrieve the top 5 documents
|
||||
)
|
||||
|
||||
def forward(self, query: str) -> str:
|
||||
assert isinstance(query, str), "Your search query must be a string"
|
||||
|
||||
docs = self.retriever.invoke(
|
||||
query,
|
||||
)
|
||||
return "\nRetrieved ideas:\n" + "".join(
|
||||
[
|
||||
f"\n\n===== Idea {str(i)} =====\n" + doc.page_content
|
||||
for i, doc in enumerate(docs)
|
||||
]
|
||||
)
|
||||
|
||||
# Simulate a knowledge base about party planning
|
||||
party_ideas = [
|
||||
{"text": "A superhero-themed masquerade ball with luxury decor, including gold accents and velvet curtains.", "source": "Party Ideas 1"},
|
||||
{"text": "Hire a professional DJ who can play themed music for superheroes like Batman and Wonder Woman.", "source": "Entertainment Ideas"},
|
||||
{"text": "For catering, serve dishes named after superheroes, like 'The Hulk's Green Smoothie' and 'Iron Man's Power Steak.'", "source": "Catering Ideas"},
|
||||
{"text": "Decorate with iconic superhero logos and projections of Gotham and other superhero cities around the venue.", "source": "Decoration Ideas"},
|
||||
{"text": "Interactive experiences with VR where guests can engage in superhero simulations or compete in themed games.", "source": "Entertainment Ideas"}
|
||||
]
|
||||
|
||||
source_docs = [
|
||||
Document(page_content=doc["text"], metadata={"source": doc["source"]})
|
||||
for doc in party_ideas
|
||||
]
|
||||
|
||||
# Split the documents into smaller chunks for more efficient search
|
||||
text_splitter = RecursiveCharacterTextSplitter(
|
||||
chunk_size=500,
|
||||
chunk_overlap=50,
|
||||
add_start_index=True,
|
||||
strip_whitespace=True,
|
||||
separators=["\n\n", "\n", ".", " ", ""],
|
||||
)
|
||||
docs_processed = text_splitter.split_documents(source_docs)
|
||||
|
||||
# Create the retriever tool
|
||||
party_planning_retriever = PartyPlanningRetrieverTool(docs_processed)
|
||||
|
||||
# Initialize the agent
|
||||
agent = CodeAgent(tools=[party_planning_retriever], model=HfApiModel())
|
||||
|
||||
# Example usage
|
||||
response = agent.run(
|
||||
"Find ideas for a luxury superhero-themed party, including entertainment, catering, and decoration options."
|
||||
)
|
||||
|
||||
print(response)
|
||||
```
|
||||
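The retriever in the example above ranks chunks with BM25. As a rough illustration of what that scoring does, here is a minimal pure-Python sketch of the standard BM25 formula. The tokenizer and the parameter values `k1=1.5` and `b=0.75` are assumptions (they are common defaults), and the library behind `BM25Retriever` may differ in details such as the exact IDF variant.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bm25_scores(
    query: str, documents: list[str], k1: float = 1.5, b: float = 0.75
) -> list[float]:
    """Return one BM25 score per document for the given query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Longer-than-average documents are penalized slightly.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


ideas = [
    "Hire a professional DJ for themed music",
    "Serve dishes named after superheroes",
    "Decorate with superhero logos",
]
for idea, score in zip(ideas, bm25_scores("DJ music", ideas)):
    print(f"{score:.3f}  {idea}")
```

Only documents that share at least one term with the query get a non-zero score, which is why a keyword method like this can miss paraphrases that an embedding-based search would catch.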
|
||||
This enhanced agent can:
|
||||
1. First check the documentation for relevant information
|
||||
2. Combine insights from the knowledge base
|
||||
3. Maintain conversation context in memory
|
||||
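The chunking step in the example above can be approximated with a few lines of plain Python. This is a simplified sketch that cuts fixed-size character windows with an overlap. It does not reproduce how `RecursiveCharacterTextSplitter` prefers to break on separators, and the function name and default sizes are assumptions chosen for a short demo.

```python
def chunk_text(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Decorate with iconic superhero logos and projections of Gotham around the venue."
for chunk in chunk_text(sample):
    print(repr(chunk))
```

The overlap means the last characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully visible in at least one chunk.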
|
||||
## Enhanced Retrieval Capabilities
|
||||
|
||||
When building agentic RAG systems, the agent can employ sophisticated strategies like:
|
||||
|
||||
1. **Query Reformulation:** Instead of using the raw user query, the agent can craft optimized search terms that better match the target documents
|
||||
2. **Multi-Step Retrieval:** The agent can perform multiple searches, using initial results to inform subsequent queries
|
||||
3. **Source Integration:** Information can be combined from multiple sources like web search and local documentation
|
||||
4. **Result Validation:** Retrieved content can be analyzed for relevance and accuracy before being included in responses
|
||||
|
||||
Effective agentic RAG systems require careful consideration of several key aspects. The agent **should select between available tools based on the query type and context**. Memory systems help maintain conversation history and avoid repetitive retrievals. Having fallback strategies ensures the system can still provide value even when primary retrieval methods fail. Additionally, implementing validation steps helps ensure the accuracy and relevance of retrieved information.
|
||||
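The strategies above can be sketched as a small control loop. This is an illustration under stated assumptions, not how smolagents implements anything: the two retrievers are stubs, the reformulated queries are passed in by hand (an agent would ask the LLM to write them), and `is_relevant` is a placeholder for a real validation step.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]


def local_docs(query: str) -> list[str]:
    """Stub for a local knowledge base keyed by topic (hypothetical data)."""
    notes = {"catering": ["Serve dishes named after superheroes."]}
    return [doc for key, docs in notes.items() if key in query.lower() for doc in docs]


def web_search_stub(query: str) -> list[str]:
    """Stub for a web search tool; always returns one canned result."""
    return [f"(web result for '{query}')"]


def is_relevant(document: str, query: str) -> bool:
    """Validation placeholder; an agent would ask the LLM to judge this."""
    return bool(document.strip())


def agentic_retrieve(
    question: str,
    reformulations: list[str],
    sources: list[Retriever],
    max_steps: int = 4,
) -> list[str]:
    """Try the question and its reformulations against each source in order."""
    collected: list[str] = []
    seen_queries: set[str] = set()  # memory: skip queries already tried
    for query in [question, *reformulations][:max_steps]:
        if query in seen_queries:
            continue
        seen_queries.add(query)
        for source in sources:
            results = [doc for doc in source(query) if is_relevant(doc, query)]
            if results:
                collected.extend(results)
                break  # later sources act only as fallbacks
    return collected


print(agentic_retrieve("party food", ["catering ideas"], [local_docs, web_search_stub]))
```

Here the first query finds nothing locally and falls back to the web stub, while the reformulated query hits the local notes. Ordering the sources from most to least trusted is one simple way to express a fallback strategy.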
|
||||
## Resources
|
||||
|
||||
- [Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀](https://huggingface.co/learn/cookbook/agent_rag) - Recipe for developing an Agentic RAG system using smolagents.
|
||||
73
units/en/unit2/smolagents/tool_calling_agents.mdx
Normal file
@@ -0,0 +1,73 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tool_calling_agents.ipynb"},
|
||||
]} />
|
||||
|
||||
# Writing actions as code snippets or JSON blobs
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tool_calling_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
Tool Calling Agents are the second type of agent available in `smolagents`. Unlike Code Agents that use Python snippets, these agents **use the built-in tool-calling capabilities of LLM providers** to generate tool calls as **JSON structures**. This is the standard approach used by OpenAI, Anthropic, and many other providers.
|
||||
|
||||
Let's look at an example. When Alfred wants to search for catering services and party ideas, a `CodeAgent` would generate and run Python code like this:
|
||||
|
||||
```python
|
||||
for query in [
|
||||
"Best catering services in Gotham City",
|
||||
"Party theme ideas for superheroes"
|
||||
]:
|
||||
print(web_search(f"Search for: {query}"))
|
||||
```
|
||||
|
||||
A `ToolCallingAgent` would instead create a JSON structure:
|
||||
|
||||
```json
|
||||
[
|
||||
{"name": "web_search", "arguments": "Best catering services in Gotham City"},
|
||||
{"name": "web_search", "arguments": "Party theme ideas for superheroes"}
|
||||
]
|
||||
```
|
||||
|
||||
This JSON blob is then used to execute the tool calls.
|
||||
|
||||
While `smolagents` primarily focuses on `CodeAgents` since [they perform better overall](https://arxiv.org/abs/2402.01030), `ToolCallingAgents` can be effective for simple systems that don't require variable handling or complex tool calls.
|
||||
|
||||

|
||||
|
||||
## How Do Tool Calling Agents Work?
|
||||
|
||||
Tool Calling Agents follow the same multi-step workflow as Code Agents (see the [previous section](./code_agents) for details).
|
||||
|
||||
The key difference is in **how they structure their actions**: instead of executable code, they **generate JSON objects that specify tool names and arguments**. The system then **parses these instructions** to execute the appropriate tools.
|
||||
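As a rough illustration of the parse-and-execute step, here is a minimal sketch of a dispatcher for JSON tool calls. It is not the smolagents implementation: the `TOOLS` registry, the stub `web_search` function, and the choice to pass `arguments` as an object keyed by parameter name are assumptions made for this example.

```python
import json


def web_search(query: str) -> str:
    """Stand-in for a real search tool; returns a canned string."""
    return f"results for: {query}"


# Registry mapping tool names to callables (hypothetical).
TOOLS = {"web_search": web_search}


def execute_tool_calls(raw: str) -> list[str]:
    """Parse a JSON list of tool calls and run each one in order."""
    outputs = []
    for call in json.loads(raw):
        tool = TOOLS.get(call["name"])
        if tool is None:
            raise ValueError(f"Unknown tool: {call['name']}")
        outputs.append(tool(**call["arguments"]))
    return outputs


model_output = """
[
  {"name": "web_search", "arguments": {"query": "Best catering services in Gotham City"}},
  {"name": "web_search", "arguments": {"query": "Party theme ideas for superheroes"}}
]
"""
for observation in execute_tool_calls(model_output):
    print(observation)
```

Because the model only names a tool and its arguments, the surrounding system keeps full control over what actually runs, at the cost of not being able to express loops or intermediate variables the way generated code can.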
|
||||
## Example: Running a Tool Calling Agent
|
||||
|
||||
Let's revisit the previous example where Alfred started party preparations, but this time we'll use a `ToolCallingAgent` to highlight the difference. We'll build an agent that can search the web using DuckDuckGo, just like in our Code Agent example. The only difference is the agent type - the framework handles everything else:
|
||||
|
||||
```python
|
||||
from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel
|
||||
|
||||
agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
|
||||
|
||||
agent.run("Search for the best music recommendations for a party at Wayne's mansion.")
|
||||
```
|
||||
|
||||
When you examine the agent's trace, instead of seeing `Executing parsed code:`, you'll see something like:
|
||||
|
||||
```text
|
||||
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
|
||||
│ Calling tool: 'web_search' with arguments: {'query': "best music recommendations for a party at Wayne's │
|
||||
│ mansion"} │
|
||||
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
|
||||
```
|
||||
|
||||
The agent generates a structured tool call that the system processes to produce the output, rather than directly executing code like a `CodeAgent`.
|
||||
|
||||
Now that we understand both agent types, we can choose the right one for our needs. Let's continue exploring `smolagents` to make Alfred's party a success! 🎉
|
||||
|
||||
## Resources
|
||||
|
||||
- [ToolCallingAgent documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/agents#smolagents.ToolCallingAgent) - Official documentation for ToolCallingAgent
|
||||
270
units/en/unit2/smolagents/tools.mdx
Normal file
@@ -0,0 +1,270 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tools.ipynb"},
|
||||
]} />
|
||||
|
||||
# Tools
|
||||
|
||||
As we explored in [unit 1](https://huggingface.co/learn/agents-course/unit1/tools), agents use tools to perform various actions. In `smolagents`, tools are treated as **functions that an LLM can call within an agent system**.
|
||||
|
||||
To interact with a tool, the LLM needs an **interface description** with these key components:
|
||||
|
||||
- **Name**: What the tool is called
|
||||
- **Tool description**: What the tool does
|
||||
- **Input types and descriptions**: What arguments the tool accepts
|
||||
- **Output type**: What the tool returns
|
||||
|
||||
For instance, while preparing for a party at Wayne Manor, Alfred needs various tools to gather information - from searching for catering services to finding party theme ideas. Here's how a simple search tool interface might look:
|
||||
|
||||
- **Name:** `web_search`
|
||||
- **Tool description:** Searches the web for specific queries
|
||||
- **Input:** `query` (string) - The search term to look up
|
||||
- **Output:** String containing the search results
|
||||
|
||||
By using these tools, Alfred can make informed decisions and gather all the information needed for planning the perfect party.
|
||||
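The interface description above can be written down as a small data structure. The following sketch is an assumption about how such a description might be modeled and rendered into text for a prompt. `ToolSpec` and `to_prompt` are hypothetical names, not part of `smolagents`.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """The four pieces of information an LLM needs to call a tool."""

    name: str
    description: str
    inputs: dict[str, dict[str, str]]  # argument name -> {"type", "description"}
    output_type: str

    def to_prompt(self) -> str:
        """Render the spec as plain text that could go into a system prompt."""
        args = ", ".join(f"{arg}: {meta['type']}" for arg, meta in self.inputs.items())
        lines = [f"{self.name}({args}) -> {self.output_type}", f"  {self.description}"]
        for arg, meta in self.inputs.items():
            lines.append(f"  {arg}: {meta['description']}")
        return "\n".join(lines)


web_search_spec = ToolSpec(
    name="web_search",
    description="Searches the web for specific queries",
    inputs={"query": {"type": "string", "description": "The search term to look up"}},
    output_type="string",
)
print(web_search_spec.to_prompt())
```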
|
||||
Below, you can see an animation illustrating how a tool call is managed:
|
||||
|
||||

|
||||
|
||||
## Tool Creation Methods
|
||||
|
||||
In `smolagents`, tools can be defined in two ways:
|
||||
1. **Using the `@tool` decorator** for simple function-based tools
|
||||
2. **Creating a subclass of `Tool`** for more complex functionality
|
||||
|
||||
### The `@tool` Decorator
|
||||
|
||||
The `@tool` decorator is the **recommended way to define simple tools**. Under the hood, smolagents will parse basic information about the function from Python. So if you name your function clearly and write a good docstring, it will be easier for the LLM to use.
|
||||
|
||||
Using this approach, we define a function with:
|
||||
|
||||
- **A clear and descriptive function name** that helps the LLM understand its purpose.
|
||||
- **Type hints for both inputs and outputs** to ensure proper usage.
|
||||
- **A detailed description**, including an `Args:` section where each argument is explicitly described. These descriptions provide valuable context for the LLM, so it's important to write them carefully.
|
||||
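To see why the name, type hints, and `Args:` section matter, here is a simplified sketch of how a decorator could collect that information with the standard `inspect` module. It illustrates the idea only and is not the code `smolagents` uses: `describe_function` and `type_name` are hypothetical helpers, the docstring parsing handles only single-line `name: description` entries, and types are reported with their Python names (`str`) rather than schema names such as `string`.

```python
import inspect
from typing import get_type_hints


def type_name(hint) -> str:
    """Readable name for a type hint, falling back to its string form."""
    return getattr(hint, "__name__", str(hint))


def describe_function(func) -> dict:
    """Collect the metadata an LLM needs from a plain Python function."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    summary, _, args_section = doc.partition("Args:")

    # Parse "name: description" lines from the Args: section.
    arg_docs = {}
    for line in args_section.splitlines():
        name, sep, text = line.strip().partition(":")
        if sep and name:
            arg_docs[name] = text.strip()

    inputs = {
        name: {
            "type": type_name(hints.get(name, "any")),
            "description": arg_docs.get(name, ""),
        }
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": summary.strip(),
        "inputs": inputs,
        "output_type": type_name(hints.get("return", "any")),
    }


def catering_lookup(query: str) -> str:
    """
    Returns the highest-rated catering service.

    Args:
        query: A search term for finding catering services.
    """
    return "Gotham Catering Co."


print(describe_function(catering_lookup))
```

If the docstring or a type hint is missing, the corresponding field comes back empty or as `any`, which is exactly the information the LLM would then lack when deciding how to call the tool.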
|
||||
#### Creating a tool that retrieves the highest-rated catering service
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-catering.jpg" alt="Alfred Catering"/>
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tools.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
Let's imagine that Alfred has already decided on the menu for the party, but now he needs help preparing food for such a large number of guests. To do so, he would like to hire a catering service and needs to identify the highest-rated options available. Alfred can leverage a tool to search for the best catering services in his area.
|
||||
|
||||
Below is an example of how Alfred can use the `@tool` decorator to make this happen:
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, HfApiModel, tool
|
||||
|
||||
# Let's pretend we have a function that fetches the highest-rated catering services.
|
||||
@tool
|
||||
def catering_service_tool(query: str) -> str:
|
||||
"""
|
||||
This tool returns the highest-rated catering service in Gotham City.
|
||||
|
||||
Args:
|
||||
query: A search term for finding catering services.
|
||||
"""
|
||||
# Example list of catering services and their ratings
|
||||
services = {
|
||||
"Gotham Catering Co.": 4.9,
|
||||
"Wayne Manor Catering": 4.8,
|
||||
"Gotham City Events": 4.7,
|
||||
}
|
||||
|
||||
# Find the highest rated catering service (simulating search query filtering)
|
||||
best_service = max(services, key=services.get)
|
||||
|
||||
return best_service
|
||||
|
||||
|
||||
agent = CodeAgent(tools=[catering_service_tool], model=HfApiModel())
|
||||
|
||||
# Run the agent to find the best catering service
|
||||
result = agent.run(
|
||||
"Can you give me the name of the highest-rated catering service in Gotham City?"
|
||||
)
|
||||
|
||||
print(result) # Output: Gotham Catering Co.
|
||||
```
|
||||
|
||||
### Defining a Tool as a Python Class
|
||||
|
||||
This approach involves creating a subclass of [`Tool`](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/tools#smolagents.Tool). For complex tools, we can implement a class instead of a Python function. The class wraps the function with metadata that helps the LLM understand how to use it effectively. In this class, we define:
|
||||
|
||||
- `name`: The tool's name.
|
||||
- `description`: A description used to populate the agent's system prompt.
|
||||
- `inputs`: A dictionary that maps each argument name to a dictionary with the keys `type` and `description`, providing information to help the Python interpreter process inputs.
|
||||
- `output_type`: Specifies the expected output type.
|
||||
- `forward`: The method containing the inference logic to execute.
|
||||
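To show how these attributes fit together, here is a minimal sketch of a base class that checks a call against the declared `inputs` before running `forward`. `MiniTool` is a hypothetical stand-in written for this illustration. It is not the `smolagents` `Tool` class, and the type mapping covers only a few schema types.

```python
class MiniTool:
    """Simplified stand-in for a tool base class (not the smolagents one)."""

    name: str
    description: str
    inputs: dict
    output_type: str

    # Assumed mapping from schema type names to Python types.
    _PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

    def __call__(self, **kwargs):
        # Check the call against the declared inputs before running forward().
        unexpected = set(kwargs) - set(self.inputs)
        missing = set(self.inputs) - set(kwargs)
        if unexpected or missing:
            raise TypeError(
                f"{self.name}: unexpected={sorted(unexpected)}, missing={sorted(missing)}"
            )
        for arg, spec in self.inputs.items():
            if not isinstance(kwargs[arg], self._PYTHON_TYPES[spec["type"]]):
                raise TypeError(f"{self.name}: '{arg}' should be of type {spec['type']}")
        return self.forward(**kwargs)

    def forward(self, **kwargs):
        raise NotImplementedError


class GreetingTool(MiniTool):
    name = "guest_greeting"
    description = "Builds a greeting for a party guest."
    inputs = {"guest": {"type": "string", "description": "Name of the guest."}}
    output_type = "string"

    def forward(self, guest: str) -> str:
        return f"Welcome to Wayne Manor, {guest}!"


print(GreetingTool()(guest="Selina"))
```

Keeping the metadata as class attributes means the same declaration can serve two purposes: describing the tool to the LLM and validating calls at run time.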
|
||||
Below, we can see an example of a tool built using `Tool` and how to integrate it within a `CodeAgent`.
|
||||
|
||||
#### Creating a tool that generates ideas for the superhero-themed party
|
||||
|
||||
Alfred's party at the mansion is a **superhero-themed event**, but he needs some creative ideas to make it truly special. As a fantastic host, he wants to surprise the guests with a unique theme.
|
||||
|
||||
To do this, he can use an agent that generates superhero-themed party ideas based on a given category. This way, Alfred can find the perfect party theme to wow his guests.
|
||||
|
||||
```python
|
||||
from smolagents import Tool, CodeAgent, HfApiModel
|
||||
|
||||
class SuperheroPartyThemeTool(Tool):
|
||||
name = "superhero_party_theme_generator"
|
||||
description = """
|
||||
This tool suggests creative superhero-themed party ideas based on a category.
|
||||
It returns a unique party theme idea."""
|
||||
|
||||
inputs = {
|
||||
"category": {
|
||||
"type": "string",
|
||||
"description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham').",
|
||||
}
|
||||
}
|
||||
|
||||
output_type = "string"
|
||||
|
||||
def forward(self, category: str):
|
||||
themes = {
|
||||
"classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
|
||||
"villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
|
||||
"futuristic gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."
|
||||
}
|
||||
|
||||
return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")
|
||||
|
||||
# Instantiate the tool
|
||||
party_theme_tool = SuperheroPartyThemeTool()
|
||||
agent = CodeAgent(tools=[party_theme_tool], model=HfApiModel())
|
||||
|
||||
# Run the agent to generate a party theme idea
|
||||
result = agent.run(
|
||||
"What would be a good superhero party idea for a 'villain masquerade' theme?"
|
||||
)
|
||||
|
||||
print(result) # Output: "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains."
|
||||
```
|
||||
|
||||
With this tool, Alfred will be the ultimate super host, impressing his guests with a superhero-themed party they won't forget! 🦸♂️🦸♀️
|
||||
|
||||
## Default Toolbox
|
||||
|
||||
`smolagents` comes with a set of pre-built tools that can be directly injected into your agent. The [default toolbox](https://huggingface.co/docs/smolagents/guided_tour?build-a-tool=Decorate+a+function+with+%40tool#default-toolbox) includes:
|
||||
|
||||
- **PythonInterpreterTool**
|
||||
- **FinalAnswerTool**
|
||||
- **UserInputTool**
|
||||
- **DuckDuckGoSearchTool**
|
||||
- **GoogleSearchTool**
|
||||
- **VisitWebpageTool**
|
||||
|
||||
Alfred could use various tools to ensure a flawless party at Wayne Manor:
|
||||
|
||||
- First, he could use the `DuckDuckGoSearchTool` to find creative superhero-themed party ideas.
|
||||
|
||||
- For catering, he'd rely on the `GoogleSearchTool` to find the highest-rated services in Gotham.
|
||||
|
||||
- To manage seating arrangements, Alfred could run calculations with the `PythonInterpreterTool`.
|
||||
|
||||
- Once everything is gathered, he'd compile the plan using the `FinalAnswerTool`.
|
||||
|
||||
With these tools, Alfred guarantees the party is both exceptional and seamless. 🦇💡
|
||||
|
||||
## Sharing and Importing Tools
|
||||
|
||||
One of the most powerful features of **smolagents** is its ability to share custom tools on the Hub and seamlessly integrate tools created by the community. This includes connecting with **HF Spaces** and **LangChain tools**, significantly enhancing Alfred's ability to orchestrate an unforgettable party at Wayne Manor. 🎭
|
||||
|
||||
With these integrations, Alfred can tap into advanced event-planning tools—whether it's adjusting the lighting for the perfect ambiance, curating the ideal playlist for the party, or coordinating with Gotham's finest caterers.
|
||||
|
||||
Here are examples showcasing how these functionalities can elevate the party experience:
|
||||
|
||||
### Sharing a Tool to the Hub
|
||||
|
||||
Sharing your custom tool with the community is easy! Simply upload it to your Hugging Face account using the `push_to_hub()` method.
|
||||
|
||||
For instance, Alfred can share his `party_theme_tool` to help others come up with superhero-themed party ideas. Here's how to do it:
|
||||
|
||||
```python
|
||||
party_theme_tool.push_to_hub("{your_username}/party_theme_tool", token="<YOUR_HUGGINGFACEHUB_API_TOKEN>")
|
||||
```
|
||||
|
||||
### Importing a Tool from the Hub
|
||||
|
||||
You can easily import tools created by other users using the `load_tool()` function. For example, Alfred might want to generate a promotional image for the party using AI. Instead of building a tool from scratch, he can leverage a predefined one from the community:
|
||||
|
||||
```python
|
||||
from smolagents import load_tool, CodeAgent, HfApiModel
|
||||
|
||||
image_generation_tool = load_tool(
|
||||
"m-ric/text-to-image",
|
||||
trust_remote_code=True
|
||||
)
|
||||
|
||||
agent = CodeAgent(
|
||||
tools=[image_generation_tool],
|
||||
model=HfApiModel()
|
||||
)
|
||||
|
||||
agent.run("Generate an image of a luxurious superhero-themed party at Wayne Manor with made-up superheroes.")
|
||||
```
|
||||
|
||||
### Importing a Hugging Face Space as a Tool
|
||||
|
||||
You can also import a HF Space as a tool using `Tool.from_space()`. This opens up possibilities for integrating with thousands of spaces from the community for tasks from image generation to data analysis.
|
||||
|
||||
The tool will connect to the Space's Gradio backend using `gradio_client`, so make sure to install it via `pip` if you don't have it already.
|
||||
|
||||
For the party, Alfred can use an existing HF Space to generate the image for the announcement (instead of the pre-built tool we mentioned before). Let's build it!
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, HfApiModel, Tool
|
||||
|
||||
image_generation_tool = Tool.from_space(
|
||||
"black-forest-labs/FLUX.1-schnell",
|
||||
name="image_generator",
|
||||
description="Generate an image from a prompt"
|
||||
)
|
||||
|
||||
model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
|
||||
|
||||
agent = CodeAgent(tools=[image_generation_tool], model=model)
|
||||
|
||||
agent.run(
|
||||
"Improve this prompt, then generate an image of it.",
|
||||
additional_args={'user_prompt': 'A grand superhero-themed party at Wayne Manor, with Alfred overseeing a luxurious gala'}
|
||||
)
|
||||
```
|
||||
|
||||
### Importing a LangChain Tool
|
||||
|
||||
|
||||
We'll discuss the `LangChain` framework in upcoming sections. For now, just note that you can reuse LangChain tools in your smolagents workflow!
|
||||
|
||||
You can easily load LangChain tools using the `Tool.from_langchain()` method. Alfred, ever the perfectionist, is preparing for a spectacular superhero night at Wayne Manor while the Waynes are away. To make sure every detail exceeds expectations, he taps into LangChain tools to find top-tier entertainment ideas.
|
||||
|
||||
By using `Tool.from_langchain()`, Alfred effortlessly adds advanced search functionalities to his smolagent, enabling him to discover exclusive party ideas and services with just a few commands.
|
||||
|
||||
Here's how he does it:
|
||||
|
||||
```python
|
||||
from langchain.agents import load_tools
|
||||
from smolagents import CodeAgent, HfApiModel, Tool
|
||||
|
||||
search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])
|
||||
|
||||
agent = CodeAgent(tools=[search_tool], model=HfApiModel())
|
||||
|
||||
agent.run("Search for luxury entertainment ideas for a superhero-themed event, such as live performances and interactive experiences.")
|
||||
```
|
||||
|
||||
With this setup, Alfred can quickly discover luxurious entertainment options, ensuring Gotham's elite guests have an unforgettable experience. This tool helps him curate the perfect superhero-themed event for Wayne Manor! 🎉
|
||||
|
||||
## Resources
|
||||
|
||||
- [Tools Tutorial](https://huggingface.co/docs/smolagents/tutorials/tools) - Explore this tutorial to learn how to work with tools effectively.
|
||||
- [Tools Documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/tools) - Comprehensive reference documentation on tools.
|
||||
- [Tools Guided Tour](https://huggingface.co/docs/smolagents/v1.8.1/en/guided_tour#tools) - A step-by-step guided tour to help you build and utilize tools efficiently.
|
||||
- [Building Effective Agents](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - A detailed guide on best practices for developing reliable and high-performance custom function agents.
|
||||
222
units/en/unit2/smolagents/vision_agents.mdx
Normal file
@@ -0,0 +1,222 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/vision_agents.ipynb"},
|
||||
]} />
|
||||
|
||||
# Vision Agents with smolagents
|
||||
|
||||
<Tip warning={true}>
|
||||
The examples in this section require access to a powerful VLM. We tested them using the GPT-4o API.
|
||||
However, <a href="./why_use_smolagents">Why use smolagents</a> discusses alternative solutions supported by smolagents and Hugging Face. If you'd like to explore other options, be sure to check that section.
|
||||
</Tip>
|
||||
|
||||
Empowering agents with visual capabilities is crucial for solving tasks that go beyond text processing. Many real-world challenges, such as web browsing or document understanding, require analyzing rich visual content. Fortunately, `smolagents` provides built-in support for vision-language models (VLMs), enabling agents to process and interpret images effectively.
|
||||
|
||||
In this example, imagine Alfred, the butler at Wayne Manor, is tasked with verifying the identities of the guests attending the party. As you can imagine, Alfred may not be familiar with everyone arriving. To help him, we can use an agent that verifies their identity by searching for visual information about their appearance using a VLM. This will allow Alfred to make informed decisions about who can enter. Let's build this example!
|
||||
|
||||
|
||||
## Providing Images at the Start of the Agent's Execution
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/vision_agents.ipynb" target="_blank">this notebook</a> that you can run using Google Colab.
|
||||
</Tip>
|
||||
|
||||
In this approach, images are passed to the agent at the start and stored as `task_images` alongside the task prompt. The agent then processes these images throughout its execution.
|
||||
|
||||
Consider the case where Alfred wants to verify the identities of the superheroes attending the party. He already has a dataset of images from previous parties with the names of the guests. Given a new visitor's image, the agent can compare it with the existing dataset and make a decision about letting them in.
|
||||
|
||||
In this case, a guest is trying to enter, and Alfred suspects that this visitor might be The Joker impersonating Wonder Woman. Alfred needs to verify their identity to prevent anyone unwanted from entering.
|
||||
|
||||
Let’s build the example. First, the images are loaded. In this case, we use images from Wikipedia to keep the example minimal, but imagine the possible use-case!
|
||||
|
||||
```python
from PIL import Image
import requests
from io import BytesIO

image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/e/e8/The_Joker_at_Wax_Museum_Plus.jpg", # Joker image
    "https://upload.wikimedia.org/wikipedia/en/9/98/Joker_%28DC_Comics_character%29.jpg" # Joker image
]

images = []
for url in image_urls:
    response = requests.get(url)
    image = Image.open(BytesIO(response.content)).convert("RGB")
    images.append(image)
```
|
||||
|
||||
Now that we have the images, the agent will tell us whether one guest is actually a superhero (Wonder Woman) or a villain (The Joker).
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, OpenAIServerModel
|
||||
|
||||
model = OpenAIServerModel(model_id="gpt-4o")
|
||||
|
||||
# Instantiate the agent
|
||||
agent = CodeAgent(
|
||||
tools=[],
|
||||
model=model,
|
||||
max_steps=20,
|
||||
verbosity_level=2
|
||||
)
|
||||
|
||||
response = agent.run(
|
||||
"""
|
||||
Describe the costume and makeup that the comic character in these photos is wearing and return the description.
|
||||
Tell me if the guest is The Joker or Wonder Woman.
|
||||
""",
|
||||
images=images
|
||||
)
|
||||
```
|
||||
|
||||
In my run, the output was the following, although yours may differ, as discussed earlier:
|
||||
|
||||
```python
|
||||
{
|
||||
'Costume and Makeup - First Image': (
|
||||
'Purple coat and a purple silk-like cravat or tie over a mustard-yellow shirt.',
|
||||
'White face paint with exaggerated features, dark eyebrows, blue eye makeup, red lips forming a wide smile.'
|
||||
),
|
||||
'Costume and Makeup - Second Image': (
|
||||
'Dark suit with a flower on the lapel, holding a playing card.',
|
||||
'Pale skin, green hair, very red lips with an exaggerated grin.'
|
||||
),
|
||||
'Character Identity': 'This character resembles known depictions of The Joker from comic book media.'
|
||||
}
|
||||
```
|
||||
|
||||
In this case, the output reveals that the person is impersonating someone else, so we can prevent The Joker from entering the party!
|
||||
|
||||
## Providing Images with Dynamic Retrieval
|
||||
|
||||
<Tip>
|
||||
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/vision_web_browser.py" target="_blank">this Python file</a>
|
||||
</Tip>
|
||||
|
||||
The previous approach is valuable and has many potential use cases. However, in situations where the guest is not in the database, we need to explore other ways of identifying them. One possible solution is to dynamically retrieve images and information from external sources, such as browsing the web for details.
|
||||
|
||||
In this approach, images are dynamically added to the agent's memory during execution. As we know, agents in `smolagents` are based on the `MultiStepAgent` class, which is an abstraction of the ReAct framework. This class operates in a structured cycle where various variables and knowledge are logged at different stages:
|
||||
|
||||
1. **SystemPromptStep:** Stores the system prompt.
|
||||
2. **TaskStep:** Logs the user query and any provided input.
|
||||
3. **ActionStep:** Captures logs from the agent's actions and results.
|
||||
|
||||
This structured approach allows agents to incorporate visual information dynamically and respond adaptively to evolving tasks. Below is the diagram we've already seen, illustrating the dynamic workflow process and how different steps integrate within the agent lifecycle. When browsing, the agent can take screenshots and save them as `observations_images` in the `ActionStep`.
|
||||
|
||||

|
||||
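To make the three step types more concrete, here is a minimal, framework-free sketch of how such a memory could be laid out. The class and field names mirror the terms used above, but the dataclasses and the `AgentMemory` container are simplified stand-ins written for illustration, not the actual `smolagents` implementation. Plain strings stand in for image objects.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SystemPromptStep:
    system_prompt: str


@dataclass
class TaskStep:
    task: str
    task_images: Optional[List[Any]] = None  # images supplied up front


@dataclass
class ActionStep:
    step_number: int
    observations: Optional[str] = None
    observations_images: Optional[List[Any]] = None  # e.g. screenshots


@dataclass
class AgentMemory:
    """Hypothetical container: one system prompt plus an ordered list of steps."""
    system_prompt: SystemPromptStep
    steps: List[Any] = field(default_factory=list)

    def images_visible_to_model(self) -> List[Any]:
        """Collect every image still stored in the memory, in step order."""
        images: List[Any] = []
        for step in self.steps:
            if isinstance(step, TaskStep) and step.task_images:
                images.extend(step.task_images)
            elif isinstance(step, ActionStep) and step.observations_images:
                images.extend(step.observations_images)
        return images


memory = AgentMemory(SystemPromptStep("You are a helpful browsing agent."))
memory.steps.append(TaskStep("Verify the guest", task_images=["guest_photo.png"]))
memory.steps.append(ActionStep(step_number=1, observations_images=["screenshot_1.png"]))
memory.steps.append(ActionStep(step_number=2, observations="No screenshot taken"))

print(memory.images_visible_to_model())
```

In this sketch, images given with the task and images captured during later steps end up in the same ordered memory, which is the property the browsing example relies on.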
|
||||
Now that we understand the need, let's build our complete example. In this case, Alfred wants full control over the guest verification process, so browsing for details becomes a viable solution. To complete this example, we need a new set of tools for the agent. Additionally, we'll use Selenium and Helium, which are browser automation tools. This will allow us to build an agent that explores the web, searching for details about a potential guest and retrieving verification information. Let's install the tools needed:
|
||||
|
||||
```bash
|
||||
pip install "smolagents[all]" helium selenium python-dotenv
|
||||
```
|
||||
|
||||
We'll need a set of agent tools specifically designed for browsing, such as `search_item_ctrl_f`, `go_back`, and `close_popups`. These tools allow the agent to act like a person navigating the web.
|
||||
|
||||
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from smolagents import tool

# `driver` is assumed to be a Selenium WebDriver created beforehand,
# for example with `driver = helium.start_chrome()`.


@tool
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
    """
    Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
    Args:
        text: The text to search for
        nth_result: Which occurrence to jump to (default: 1)
    """
    elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
    if nth_result > len(elements):
        raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
    result = f"Found {len(elements)} matches for '{text}'."
    elem = elements[nth_result - 1]
    driver.execute_script("arguments[0].scrollIntoView(true);", elem)
    result += f"Focused on element {nth_result} of {len(elements)}"
    return result


@tool
def go_back() -> None:
    """Goes back to previous page."""
    driver.back()


@tool
def close_popups() -> str:
    """
    Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners.
    """
    webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
```
|
||||
|
||||
We also need functionality for saving screenshots, as this will be an essential part of what our VLM agent uses to complete the task. This functionality captures the screenshot and stores it on the current step via `step_log.observations_images = [image.copy()]`, allowing the agent to store and process the images dynamically as it navigates.
|
||||
|
||||
```python
from io import BytesIO
from time import sleep

import helium
from PIL import Image
from smolagents import CodeAgent
from smolagents.agents import ActionStep


def save_screenshot(step_log: ActionStep, agent: CodeAgent) -> None:
    sleep(1.0)  # Let JavaScript animations happen before taking the screenshot
    driver = helium.get_driver()
    current_step = step_log.step_number
    if driver is not None:
        for step_logs in agent.logs:  # Remove previous screenshots from logs for lean processing
            if isinstance(step_logs, ActionStep) and step_logs.step_number <= current_step - 2:
                step_logs.observations_images = None
        png_bytes = driver.get_screenshot_as_png()
        image = Image.open(BytesIO(png_bytes))
        print(f"Captured a browser screenshot: {image.size} pixels")
        step_log.observations_images = [image.copy()]  # Create a copy to ensure it persists, important!

        # Update observations with current URL
        url_info = f"Current url: {driver.current_url}"
        step_log.observations = url_info if step_log.observations is None else step_log.observations + "\n" + url_info
```
|
||||
|
||||
This function is passed to the agent as `step_callback`, as it's triggered at the end of each step during the agent's execution. This allows the agent to dynamically capture and store screenshots throughout its process.
|
||||
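The callback mechanism can be pictured with a toy loop. The sketch below is not `smolagents` code: `ToyAgent`, `Step`, and `fake_screenshot` are hypothetical names, and file-name strings stand in for real screenshots. It only shows the idea of running every callback once a step has finished, while pruning images from steps that are two or more steps old.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    step_number: int
    observations_images: Optional[List[str]] = None


class ToyAgent:
    """Hypothetical agent: runs a fixed number of steps and fires callbacks."""

    def __init__(self, step_callbacks: List[Callable[[Step, "ToyAgent"], None]]):
        self.step_callbacks = step_callbacks
        self.logs: List[Step] = []

    def run(self, n_steps: int) -> None:
        for number in range(1, n_steps + 1):
            step = Step(step_number=number)
            self.logs.append(step)
            # ... the model call and tool execution would happen here ...
            for callback in self.step_callbacks:
                callback(step, self)  # invoked once the step has finished


def fake_screenshot(step: Step, agent: ToyAgent) -> None:
    # Drop images from steps that are two or more steps old.
    for old in agent.logs:
        if old.step_number <= step.step_number - 2:
            old.observations_images = None
    step.observations_images = [f"screenshot_{step.step_number}.png"]


toy_agent = ToyAgent(step_callbacks=[fake_screenshot])
toy_agent.run(4)
kept = [s.step_number for s in toy_agent.logs if s.observations_images]
print(kept)  # steps that still hold a screenshot
```

Keeping only the most recent screenshots limits how many images are sent to the model on each call, which is the reason for the pruning loop in `save_screenshot`.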
|
||||
Now, we can generate our vision agent for browsing the web, providing it with the tools we created, along with the `DuckDuckGoSearchTool` to explore the web. This tool will help the agent retrieve necessary information for verifying guests' identities based on visual cues.
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, OpenAIServerModel, DuckDuckGoSearchTool
|
||||
model = OpenAIServerModel(model_id="gpt-4o")
|
||||
|
||||
agent = CodeAgent(
|
||||
tools=[DuckDuckGoSearchTool(), go_back, close_popups, search_item_ctrl_f],
|
||||
model=model,
|
||||
additional_authorized_imports=["helium"],
|
||||
step_callbacks=[save_screenshot],
|
||||
max_steps=20,
|
||||
verbosity_level=2,
|
||||
)
|
||||
```
|
||||
|
||||
With that, Alfred is ready to check the guests' identities and make informed decisions about whether to let them into the party:
|
||||
|
||||
```python
|
||||
agent.run("""
|
||||
I am Alfred, the butler of Wayne Manor, responsible for verifying the identity of guests at the party. A superhero has arrived at the entrance claiming to be Wonder Woman, but I need to confirm if she is who she says she is.
|
||||
|
||||
Please search for images of Wonder Woman and generate a detailed visual description based on those images. Additionally, navigate to Wikipedia to gather key details about her appearance. With this information, I can determine whether to grant her access to the event.
|
||||
""" + helium_instructions)
|
||||
```
|
||||
|
||||
You can see that we include `helium_instructions` as part of the task. This special prompt is designed to guide the agent's navigation, ensuring that it follows the correct steps while browsing the web.
|
||||
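`helium_instructions` is not defined in this section. As a rough idea of what such a prompt could contain, here is a shortened, illustrative version. The wording is an assumption written for this example rather than the exact text used in the course's script; the Helium calls it mentions (`go_to`, `click`, `Link`, `scroll_down`) are part of Helium's public API.

```python
# Illustrative stand-in for the navigation prompt; not the original text.
helium_instructions = """
You can use helium to access websites. The helium driver is already set up,
and "from helium import *" has already been run.
To open a page, write:
    go_to('https://en.wikipedia.org')
To click an element, pass the text that appears on it:
    click("Top products")
If the element is a link, write:
    click(Link("Top products"))
To scroll, give the number of pixels:
    scroll_down(num_pixels=1200)
Use the close_popups tool to dismiss pop-ups instead of hunting for the close icon.
After each action, look at the latest screenshot before deciding on the next step.
"""

task = "Search for images of Wonder Woman and describe her appearance."
full_prompt = task + helium_instructions
print(full_prompt)
```

Appending the instructions to the task, as done above, keeps the navigation guidance next to the request itself rather than in the system prompt.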
|
||||
Let's see how this works in the video below:
|
||||
|
||||
<video controls>
|
||||
<source src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/VisionBrowserAgent.mp4" type="video/mp4">
|
||||
</video>
|
||||
|
||||
This is the final output:
|
||||
|
||||
```text
|
||||
Final answer: Wonder Woman is typically depicted wearing a red and gold bustier, blue shorts or skirt with white stars, a golden tiara, silver bracelets, and a golden Lasso of Truth. She is Princess Diana of Themyscira, known as Diana Prince in the world of men.
|
||||
```
|
||||
|
||||
With all of that, we've successfully created our identity verifier for the party! Alfred now has the necessary tools to ensure only the right guests make it through the door. Everything is set to have a good time at Wayne Manor!
|
||||
|
||||
|
||||
## Further Reading
|
||||
|
||||
- [We just gave sight to smolagents](https://huggingface.co/blog/smolagents-can-see) - Blog describing the vision agent functionality.
|
||||
- [Web Browser Automation with Agents 🤖🌐](https://huggingface.co/docs/smolagents/examples/web_browser) - Example for Web browsing using a vision agent.
|
||||
- [Web Browser Vision Agent Example](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py) - Example for Web browsing using a vision agent.
|
||||
67
units/en/unit2/smolagents/why_use_smolagents.mdx
Normal file
@@ -0,0 +1,67 @@
|
||||

|
||||
# Why use smolagents
|
||||
|
||||
In this module, we will explore the pros and cons of using [smolagents](https://huggingface.co/docs/smolagents/en/index), helping you make an informed decision about whether it's the right framework for your needs.
|
||||
|
||||
## What is `smolagents`?
|
||||
|
||||
`smolagents` is a simple yet powerful framework for building AI agents. It provides LLMs with the _agency_ to interact with the real world, such as searching or generating images.
|
||||
|
||||
As we learned in unit 1, AI agents are programs that use LLMs to generate **'thoughts'** based on **'observations'** to perform **'actions'**. Let's explore how this is implemented in smolagents.
|
||||
|
||||
### Key Advantages of `smolagents`
|
||||
- **Simplicity:** Minimal code complexity and abstractions, to make the framework easy to understand, adopt and extend
|
||||
- **Flexible LLM Support:** Works with any LLM through integration with Hugging Face tools and external APIs
|
||||
- **Code-First Approach:** First-class support for Code Agents that write their actions directly in code, removing the need for parsing and simplifying tool calling
|
||||
- **HF Hub Integration:** Seamless integration with the Hugging Face Hub, allowing the use of Gradio Spaces as tools
|
||||
|
||||
### When to use smolagents?
|
||||
|
||||
With these advantages in mind, when should we use smolagents over other frameworks?
|
||||
|
||||
smolagents is ideal when:
|
||||
- You need a **lightweight and minimal solution.**
|
||||
- You want to **experiment quickly** without complex configurations.
|
||||
- Your **application logic is straightforward.**
|
||||
|
||||
### Code vs. JSON Actions
|
||||
Unlike other frameworks where agents write actions in JSON, `smolagents` **focuses on tool calls in code**, simplifying the execution process. This is because there's no need to parse the JSON in order to build code that calls the tools: the output can be executed directly.
|
||||
|
||||
The following diagram illustrates this difference:
|
||||
|
||||

|
||||
|
||||
To review the difference between Code vs JSON Actions, you can revisit [the Actions Section in Unit 1](https://huggingface.co/learn/agents-course/unit1/actions#actions-enabling-the-agent-to-engage-with-its-environment).
|
||||
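The contrast can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how `smolagents` is implemented: `get_weather` and the `TOOLS` registry are hypothetical, and the built-in `exec` stands in for the restricted interpreter a real code agent would use.

```python
import json


def get_weather(city: str) -> str:
    """Toy tool used by both styles (hypothetical)."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}

# JSON-style action: the model emits a blob that must be parsed and dispatched.
json_action = '{"name": "get_weather", "arguments": {"city": "Gotham"}}'
parsed = json.loads(json_action)
json_result = TOOLS[parsed["name"]](**parsed["arguments"])

# Code-style action: the model emits Python that calls the tool directly,
# so several calls and some glue logic fit into a single action.
code_action = """
cities = ["Gotham", "Metropolis"]
result = [get_weather(c) for c in cities]
"""
namespace = dict(TOOLS)  # expose only the tools to the generated code
exec(code_action, namespace)  # a real agent would use a sandboxed interpreter
code_result = namespace["result"]

print(json_result)
print(code_result)
```

The JSON path needs one parse-and-dispatch round per tool call, whereas the code path can loop over inputs and combine results inside a single action.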
|
||||
### Agent Types in `smolagents`
|
||||
|
||||
Agents in `smolagents` operate as **multi-step agents**.
|
||||
|
||||
At each step, a [`MultiStepAgent`](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.MultiStepAgent) performs:
|
||||
- One thought
|
||||
- One tool call and execution
|
||||
|
||||
In addition to using **[CodeAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.CodeAgent)** as the primary type of agent, smolagents also supports **[ToolCallingAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.ToolCallingAgent)**, which writes tool calls in JSON.
|
||||
|
||||
We will explore each agent type in more detail in the following sections.
|
||||
|
||||
<Tip>
|
||||
In smolagents, tools are defined using the <code>@tool</code> decorator wrapping a Python function, or the <code>Tool</code> class.
|
||||
</Tip>
|
||||
|
||||
### Model Integration in `smolagents`
|
||||
`smolagents` supports flexible LLM integration, allowing you to use any callable model that meets [certain criteria](https://huggingface.co/docs/smolagents/main/en/reference/models). The framework provides several predefined classes to simplify model connections:
|
||||
|
||||
- **[TransformersModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.TransformersModel):** Implements a local `transformers` pipeline for seamless integration.
|
||||
- **[HfApiModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.HfApiModel):** Supports [serverless inference](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) calls through [Hugging Face's infrastructure](https://huggingface.co/docs/api-inference/index), or via a growing number of [third-party inference providers](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#supported-providers-and-tasks).
|
||||
- **[LiteLLMModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.LiteLLMModel):** Leverages [LiteLLM](https://www.litellm.ai/) for lightweight model interactions.
|
||||
- **[OpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.OpenAIServerModel):** Connects to any service that offers an OpenAI API interface.
|
||||
- **[AzureOpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.AzureOpenAIServerModel):** Supports integration with any Azure OpenAI deployment.
|
||||
|
||||
This flexibility ensures that developers can choose the model and service most suitable for their specific use cases, and allows for easy experimentation.
|
||||
|
||||
Now that we understand why and when to use smolagents, let's dive deeper into this powerful library!
|
||||
|
||||
## Resources
|
||||
|
||||
- [smolagents Blog](https://huggingface.co/blog/smolagents) - Introduction to smolagents and code interactions
|
||||
88
units/zh-CN/_toctree.yml
Normal file
@@ -0,0 +1,88 @@
|
||||
- title: Unit 0. Welcome to the Course
  sections:
  - local: unit0/introduction
    title: Welcome to the Course 🤗
  - local: unit0/onboarding
    title: Onboarding
  - local: unit0/discord101
    title: (Optional) Discord 101
- title: Live 1. How the Course Works and Q&A
  sections:
  - local: communication/live1
    title: Live 1. How the Course Works and Q&A
- title: Unit 1. Introduction to Agents
  sections:
  - local: unit1/introduction
    title: Introduction
  - local: unit1/what-are-agents
    title: What is an Agent?
  - local: unit1/quiz1
    title: Quick Quiz 1
  - local: unit1/what-are-llms
    title: What are LLMs?
  - local: unit1/messages-and-special-tokens
    title: Messages and Special Tokens
  - local: unit1/tools
    title: What are Tools?
  - local: unit1/quiz2
    title: Quick Quiz 2
  - local: unit1/agent-steps-and-structure
    title: Understanding AI Agents through the Thought-Action-Observation Cycle
  - local: unit1/thoughts
    title: Thought, Internal Reasoning and the Re-Act Approach
  - local: unit1/actions
    title: Actions, Enabling the Agent to Engage with Its Environment
  - local: unit1/observations
    title: Observe, Integrating Feedback to Reflect and Adapt
  - local: unit1/dummy-agent-library
    title: Dummy Agent Library
  - local: unit1/tutorial
    title: Creating Our First Agent with smolagents
  - local: unit1/final-quiz
    title: Unit 1 Final Quiz
  - local: unit1/conclusion
    title: Conclusion
- title: Unit 2. Frameworks for AI Agents
  sections:
  - local: unit2/introduction
    title: Frameworks for AI Agents
- title: Unit 2.1 The smolagents Framework
  sections:
  - local: unit2/smolagents/introduction
    title: Introduction to smolagents
  - local: unit2/smolagents/why_use_smolagents
    title: Why Use smolagents?
  - local: unit2/smolagents/quiz1
    title: Quick Quiz 1
  - local: unit2/smolagents/code_agents
    title: Building Agents That Use Code
  - local: unit2/smolagents/tool_calling_agents
    title: Integrating Agents with Tools
  - local: unit2/smolagents/tools
    title: Tools
  - local: unit2/smolagents/retrieval_agents
    title: Retrieval Agents
  - local: unit2/smolagents/quiz2
    title: Quick Quiz 2
  - local: unit2/smolagents/multi_agent_systems
    title: Multi-Agent Systems
  - local: unit2/smolagents/vision_agents
    title: Vision and Browser Agents
  - local: unit2/smolagents/final_quiz
    title: Final Quiz
  - local: unit2/smolagents/conclusion
    title: Conclusion
- title: Bonus Unit 1. Fine-Tuning an LLM for Function Calling
  sections:
  - local: bonus-unit1/introduction
    title: Introduction
  - local: bonus-unit1/what-is-function-calling
    title: What is Function Calling?
  - local: bonus-unit1/fine-tuning
    title: Let's Fine-Tune a Model for Function Calling
  - local: bonus-unit1/conclusion
    title: Conclusion
- title: When Will the Next Units Be Published?
  sections:
  - local: communication/next-units
    title: Next Units
|
||||
13
units/zh-CN/bonus-unit1/conclusion.mdx
Normal file
@@ -0,0 +1,13 @@
|
||||
# Conclusion [[conclusion]]

Congratulations on finishing this first Bonus Unit 🥳

You have just **mastered the fundamentals of function calling and learned how to fine-tune your model to do function calling**!

If we have one piece of advice now, it is to try **fine-tuning different models**. **The best way to learn is by trying.**

In the next unit, you will learn how to use **state-of-the-art frameworks such as `smolagents`, `LlamaIndex` and `LangGraph`**.

Finally, we would love **to hear what you think of the course and how we can improve it**. If you have any feedback, please 👉 [fill out this form](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)

### Keep Learning, Stay Awesome 🤗
|
||||
49
units/zh-CN/bonus-unit1/fine-tuning.mdx
Normal file
@@ -0,0 +1,49 @@
|
||||
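# Let's Fine-Tune Your Model for Function Calling

We are now ready to fine-tune our first model for function calling 🔥.

## How do we train a model for function calling?

> Answer: we need **data**

Model training can be divided into 3 steps:

1. **The model is pretrained on a large quantity of data**. The output of this step is a **pre-trained model**. For instance, [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b). It is a base model that only knows how **to predict the next token, without strong instruction-following capabilities**.

2. To be useful in a chat context, the model then needs to be **fine-tuned** to follow instructions. This step can be carried out by the model creators, the open-source community, you, or anyone else. For instance, [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) is an instruction-tuned model produced by the Google team behind the Gemma project.

3. The model can then be **aligned** to the creator's preferences. For instance, a customer service chat model that must never be impolite to customers.

Usually, a complete product like Gemini or Mistral **goes through all 3 steps**, whereas the models you find on Hugging Face have completed one or more of these training steps.

In this tutorial, we will build a function-calling model based on [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it). The base model is [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), which the Google team fine-tuned for instruction following, producing **"google/gemma-2-2b-it"**.

In this case, we will start from **"google/gemma-2-2b-it"** **rather than the base model, because the fine-tuning it has already been through matters for our use case**.

Since we want to interact with our model through conversations in messages, starting from the base model **would require more training to learn instruction following, chat, and function calling**.

By starting from the instruction-tuned model, **we minimize the amount of information the model needs to learn**.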
|
||||
|
||||
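## LoRA (Low-Rank Adaptation of Large Language Models)

LoRA (Low-Rank Adaptation of Large Language Models) is a popular and lightweight training technique that significantly **reduces the number of trainable parameters**.

It works by **inserting a smaller number of new weights into the model as an adapter to train**. This makes training with LoRA faster and more memory-efficient, and produces smaller model weights (a few hundred MB) that are easier to store and share.

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/blog_multi-lora-serving_LoRA.gif" alt="LoRA inference" width="50%"/>

LoRA works by adding pairs of rank decomposition matrices to Transformer layers, typically focusing on linear layers. During training, we "freeze" the rest of the model and only update the weights of those newly added adapters.

By doing so, the number of parameters we need to train drops considerably, as we only need to update the adapter's weights.

During inference, the input is passed through both the adapter and the base model; alternatively, the adapter weights can be merged into the base model, resulting in no additional latency overhead.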
|
||||
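The two claims above, fewer trainable parameters and no extra latency after merging, can be checked on a toy example. The sketch below uses plain Python lists instead of a tensor library. The matrices, the layer size of 2048, and the rank of 8 are arbitrary values chosen for illustration, and the scaling factor that LoRA implementations usually apply to the adapter output is left out for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the adapter pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Tiny frozen weight W (2 x 3) and a rank-1 adapter B (2 x 1), A (1 x 3).
W = [[1, 0, 2], [0, 1, 1]]
B = [[1], [2]]
A = [[1, 1, 0]]
x = [[1], [2], [3]]  # column vector

# Unmerged: base path plus adapter path.
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)))
# Merged: fold the adapter into the weight once, then do a single matmul.
merged = matmul(add(W, matmul(B, A)), x)

print(unmerged, merged)
print(lora_trainable_params(2048, 2048, 8), "vs", 2048 * 2048)
```

Both paths give the same output, which is why merging is safe, and for the assumed 2048 x 2048 layer the adapter holds 32,768 trainable weights against 4,194,304 in the frozen matrix.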
|
||||
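LoRA is particularly useful for adapting **large** language models to specific tasks or domains while keeping resource requirements manageable. This helps reduce the memory required to train a model.

If you want to learn more about how LoRA works, check out this [tutorial](https://huggingface.co/learn/nlp-course/chapter11/4?fw=pt).

## Fine-Tuning a Model for Function Calling

You can access the tutorial notebook 👉 [here](https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb).

Then, click on [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/agents-course/notebooks/blob/main/bonus-unit1/bonus-unit1.ipynb) to run it in a Colab Notebook.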
|
||||
53
units/zh-CN/bonus-unit1/introduction.mdx
Normal file
@@ -0,0 +1,53 @@
|
||||
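# Introduction

![Bonus Unit 1 Thumbnail](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/bonus-unit1/thumbnail.jpg)

Welcome to this first **Bonus Unit**, where you will learn to **fine-tune a Large Language Model (LLM) for function calling**.

In the world of LLMs, function calling is quickly becoming a *must-know* technique.

The idea is that, rather than relying only on prompt-based approaches as we did in Unit 1, function calling trains your model to **take actions and interpret observations during the training phase**, making your AI more robust.

> **When should I do this Bonus Unit?**
>
> This section is **optional** and more advanced than Unit 1, so feel free to do this unit now or to revisit it once your knowledge has improved thanks to this course.
>
> But don't worry, this Bonus Unit is designed to include all the information you need, so we will walk you through every core concept of fine-tuning a model for function calling even if you have not yet learned the inner workings of fine-tuning.

The best way for you to be able to follow this Bonus Unit is to:

1. Know how to fine-tune an LLM with Transformers. If that is not the case, [check this](https://huggingface.co/learn/nlp-course/chapter3/1?fw=pt).

2. Know how to use `SFTTrainer` to fine-tune our model. To learn more about it, [check this documentation](https://huggingface.co/learn/nlp-course/en/chapter11/1).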
|
||||
|
||||
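---

## What You'll Learn

1. **Function Calling**
   How modern LLMs structure their conversations effectively, letting them trigger **Tools**.

2. **LoRA (Low-Rank Adaptation)**
   A **lightweight and efficient** fine-tuning method that cuts down on computational and storage overhead. LoRA makes training large models *faster, cheaper, and easier* to deploy.

3. **The Thought → Act → Observe Cycle** in Function Calling Models
   A simple but powerful approach for structuring how your model decides when (and how) to call functions, track intermediate steps, and interpret the results from external tools or APIs.

4. **New Special Tokens**
   We'll introduce **special markers** that help the model distinguish between:
   - Internal "chain-of-thought" reasoning
   - Outgoing function calls
   - Responses coming back from external tools

---

By the end of this bonus unit, you'll be able to:

- **Understand** the inner workings of APIs when it comes to Tools.
- **Fine-tune** a model using the LoRA technique.
- **Implement** and **modify** the Thought → Act → Observe cycle to create robust and maintainable function-calling workflows.
- **Design and use** special tokens to seamlessly separate the model's internal reasoning from its external actions.

And you'll **have your own model fine-tuned to do function calling.** 🔥

Let's dive into **function calling**!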
|
||||
78
units/zh-CN/bonus-unit1/what-is-function-calling.mdx
Normal file
@@ -0,0 +1,78 @@
|
||||
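# What is Function Calling?

Function calling is a **way for an LLM to take actions on its environment**. It was first [introduced in GPT-4](https://openai.com/index/function-calling-and-other-api-updates/), and was later reproduced in other models.

Just like the tools of an Agent, function calling gives the model the capacity to **take an action on its environment**. However, the function-calling capacity **is learned by the model** and relies **less on prompting than other agent techniques**.

In Unit 1, the Agent **didn't learn to use the Tools**; we just provided the list, and we relied on the fact that the model **was able to generalize on defining a plan using these Tools**.

Here, by contrast, **with function calling, the Agent is fine-tuned (trained) to use Tools**.

## How does the model "learn" to take an action?

In Unit 1, we explored the general workflow of an agent. Once the user has given some tools to the agent and prompted it with a query, the model will cycle through:

1. *Think*: What action(s) do I need to take in order to fulfill the objective?
2. *Act*: Format the action with the correct parameters and stop the generation.
3. *Observe*: Get back the result from the execution.

In a "typical" conversation with a model through an API, the conversation will alternate between user and assistant messages like this: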
|
||||
|
||||
```python
|
||||
conversation = [
|
||||
{"role": "user", "content": "I need help with my order"},
|
||||
{"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
|
||||
{"role": "user", "content": "It's ORDER-123"},
|
||||
]
|
||||
```
|
||||
|
||||
Function calling brings **new roles to the conversation**!

1. One new role for an **Action**
2. One new role for an **Observation**

If we take the [Mistral API](https://docs.mistral.ai/capabilities/function_calling/) as an example, it would look like this:
|
||||
|
||||
```python
|
||||
conversation = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What's the status of my transaction T1001?"
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "",
|
||||
"function_call": {
|
||||
"name": "retrieve_payment_status",
|
||||
"arguments": "{\"transaction_id\": \"T1001\"}"
|
||||
}
|
||||
},
|
||||
{
|
||||
"role": "tool",
|
||||
"name": "retrieve_payment_status",
|
||||
"content": "{\"status\": \"Paid\"}"
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "Your transaction T1001 has been successfully paid."
|
||||
}
|
||||
]
|
||||
```
|
||||
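The step between the assistant's `function_call` message and the `tool` message is carried out by the calling application, not by the model. A minimal sketch of that step is shown below, assuming a hypothetical in-memory `retrieve_payment_status` implementation and a hypothetical `TOOLS` registry; neither belongs to any particular API.

```python
import json


def retrieve_payment_status(transaction_id: str) -> str:
    """Hypothetical tool backed by an in-memory table."""
    table = {"T1001": "Paid", "T1002": "Unpaid"}
    return json.dumps({"status": table.get(transaction_id, "Unknown")})


TOOLS = {"retrieve_payment_status": retrieve_payment_status}


def run_tool_call(assistant_message: dict) -> dict:
    """Execute the call requested by the assistant and build the tool message."""
    call = assistant_message["function_call"]
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    result = TOOLS[call["name"]](**arguments)
    return {"role": "tool", "name": call["name"], "content": result}


assistant_message = {
    "role": "assistant",
    "content": "",
    "function_call": {
        "name": "retrieve_payment_status",
        "arguments": "{\"transaction_id\": \"T1001\"}",
    },
}
tool_message = run_tool_call(assistant_message)
print(tool_message)
```

The returned dictionary has the same shape as the `tool` message in the conversation above, so it can be appended to the message list before asking the model for its final answer.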
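> ...But you said there's a new role for function calls?

**Yes and no**. In this case, as in a lot of other APIs, the model formats the action to take as an "assistant" message. The chat template will then represent this as **special tokens** for function calling.

- `[AVAILABLE_TOOLS]` – Start the list of available tools
- `[/AVAILABLE_TOOLS]` – End the list of available tools
- `[TOOL_CALLS]` – Make a call to a tool (i.e., take an "Action")
- `[TOOL_RESULTS]` – "Observe" the result of the action
- `[/TOOL_RESULTS]` – End of the observation (i.e., the model can decode again)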
|
||||
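To see how these tokens might frame a tool exchange once the messages are flattened into a single string, here is a deliberately simplified sketch. The `render_tool_turns` function and the exact layout are assumptions made for illustration; real chat templates differ in spacing, ordering, and additional tokens.

```python
import json


def render_tool_turns(tools: list, call: dict, result: dict) -> str:
    """Hypothetical, simplified template; real chat templates differ in detail."""
    return (
        "[AVAILABLE_TOOLS]" + json.dumps(tools) + "[/AVAILABLE_TOOLS]"
        + "[TOOL_CALLS]" + json.dumps([call])
        + "[TOOL_RESULTS]" + json.dumps(result) + "[/TOOL_RESULTS]"
    )


tools = [{"name": "retrieve_payment_status", "parameters": {"transaction_id": "string"}}]
call = {"name": "retrieve_payment_status", "arguments": {"transaction_id": "T1001"}}
result = {"status": "Paid"}

rendered = render_tool_turns(tools, call, result)
print(rendered)
```

Whatever the precise template, the point is the same: the tokens give the model unambiguous boundaries between the tool list, its own action, and the observation it receives back.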
|
||||
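We'll talk again about function calling in this course, but if you want to dive deeper you can check [this excellent documentation section](https://docs.mistral.ai/capabilities/function_calling/).

---

Now that we have learned what function calling is and how it works, let's **add some function-calling capabilities to a model that does not have them yet**, [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), by appending some new special tokens to the model.

To be able to do that, **we first need to understand fine-tuning and LoRA**.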
|
||||
7
units/zh-CN/communication/live1.mdx
Normal file
@@ -0,0 +1,7 @@
|
||||
# Live 1: How the Course Works and First Q&A

In this first live stream of the Agents Course, we explained how the course **works** (scope, units, challenges, and more) and answered your questions.

<iframe width="560" height="315" src="https://www.youtube.com/embed/iLVyYDbdSmM?si=TCX5Ai3uZuKLXq45" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

To know when the next live session is scheduled, check our **Discord server**. We will also send you an email. If you can't participate, don't worry, we **record all live sessions**.
|
||||
9
units/zh-CN/communication/next-units.mdx
Normal file
@@ -0,0 +1,9 @@
|
||||
# Next Units Release Schedule and FAQ

Here's the publication schedule:

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/next-units.jpg" alt="Next Units" width="100%"/>

Don't forget to <a href="https://bit.ly/hf-learn-agents">sign up for the course</a>! By signing up, **we can send you the links as each unit is published, plus updates and details about upcoming challenges**.

Keep Learning, Stay Awesome 🤗
|
||||
53
units/zh-CN/unit0/discord101.mdx
Normal file
@@ -0,0 +1,53 @@
|
||||
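# (Optional) Discord 101 [[discord-101]]

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/discord-etiquette.jpg" alt="The Discord Etiquette" width="100%"/>

This guide is designed to help you get started with Discord, a free chat platform popular in the gaming and ML communities.

Join the Hugging Face Community Discord server, which **has over 100,000 members**, by clicking <a href="https://discord.gg/UrrTSsSyjb" target="_blank">here</a>. It's a great place to connect with others!

## The Agents Course on Hugging Face's Discord Community [[hf-discord-agents-course]]

Starting on Discord can be a bit overwhelming, so here's a quick guide to help you navigate.

<!-- Note: the sign-up flow has been updated, and you will be asked to choose your interests. Be sure to select **"AI Agents"** to get access to the AI Agents category, which includes all the course-related channels. Feel free to explore and join additional channels! 🚀-->

The HF Community Server hosts a vibrant community of developers with interests in various areas, offering opportunities for learning through paper discussions, events, and more.

After [signing up](http://hf.co/join/discord), introduce yourself in the `#introduce-yourself` channel.

We have created 4 channels for the Agents Course:

- `agents-course-announcements`: for the **latest course information and updates**.
- `🎓-agents-course-general`: for **general discussions and chitchat**.
- `agents-course-questions`: to **ask questions and help your classmates**.
- `agents-course-showcase`: to **show your best agents**.

In addition you can check:

- `smolagents`: for **discussion and support with the library**.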
|
||||
|
||||
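## Tips for Using Discord Effectively

### How to join a server

If you are less familiar with Discord, you might want to check out this <a href="https://support.discord.com/hc/en-us/articles/360034842871-How-do-I-join-a-Server#h_01FSJF9GT2QJMS2PRAW36WNBS8" target="_blank">guide</a> on how to join a server.

Here's a quick summary of the steps:

1. Click on the <a href="https://discord.gg/UrrTSsSyjb" target="_blank">Invite Link</a> (opens in a new window).
2. Sign in with your Discord account, or create an account if you don't have one.
3. Complete the human verification.
4. Set up your nickname and avatar (we suggest using your academic institution's identifier).
5. Click "Join Server".

### How to use Discord effectively

Here are a few tips for using Discord effectively:

- **Voice channels** are available, though text chat is more commonly used.
- You can format text using **Markdown style**, which is especially useful for writing code. Note that Markdown doesn't work as well for links.
- For **long conversations**, consider opening threads to keep discussions organized.

We hope you find this guide helpful! If you have any questions, feel free to ask us on Discord 🤗.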
|
||||
177
units/zh-CN/unit0/introduction.mdx
Normal file
@@ -0,0 +1,177 @@
|
||||
# 欢迎加入 🤗 AI Agents 课程 [[introduction]]
|
||||
|
||||
<figure>
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/thumbnail.jpg" alt="AI Agents Course thumbnail" width="100%"/>
|
||||
<figcaption>该图片背景使用 <a href="https://scenario.com/">Scenario.com</a> 生成
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
|
||||
欢迎来到当今 AI 领域最激动人心的主题: **Agents**!
|
||||
|
||||
本免费课程将带您完成**从新手到专家**的蜕变之旅,全面掌握 AI 智能体的理解、使用与构建技能。
|
||||
|
||||
首个单元将帮助您快速入门:
|
||||
|
||||
- 了解**课程大纲**。
|
||||
- **选择学习路径**:自主研修或认证课程。
|
||||
- **获取认证流程与截止日期详情**。
|
||||
- 认识课程开发团队。
|
||||
- 创建您的 **Hugging Face 账号**。
|
||||
- **登录 Discord 服务**, 并与同学及导师互动。
|
||||
|
||||
开始学习之旅!
|
||||
|
||||
## 课程内容概览 [[expect]]
|
||||
|
||||
在本课程中,您将:
|
||||
|
||||
- 📖 系统学习 AI 智能体的**理论架构、设计原理与实践应用**
|
||||
- 🧑💻 掌握主流 AI 智能体开发库的使用,包括 [smolagents](https://huggingface.co/docs/smolagents/en/index)、 [LangChain](https://www.langchain.com/) 和 [LlamaIndex](https://www.llamaindex.ai/).
|
||||
- 💾 在 Hugging Face Hub 上**发布您的** agents 并探索社区作品
|
||||
- 🏆 参与挑战赛,在实战中**与其他学员的 agents 进行性能对标**
|
||||
- 🎓 通过课程作业**获取结业证书**
|
||||
|
||||
此外!
|
||||
|
||||
在本课程结束时,您将理解智能体的工作原理,以及如何运用最新库和工具构建自己的智能体。
|
||||
|
||||
别忘了 **<a href="https://bit.ly/hf-learn-agents">立即报名课程!</a>**
|
||||
|
||||
(我们尊重您的隐私。收集邮箱仅用于**在每单元发布时发送课程链接,并向您同步挑战动态与课程更新**。)
|
||||
|
||||
## 课程结构 [[course-look-like]]
|
||||
|
||||
课程包含四大模块:
|
||||
|
||||
- *基础单元*:系统学习智能体的核心理论知识。
|
||||
- *实践环节*:通过预配置环境的 Hugging Face Spaces,掌握如何用成熟的AI智能体库训练专属智能体。
|
||||
- *应用案例作业*:自选真实场景,运用所学知识解决实际问题。
|
||||
- *终极挑战*:让您的智能体与其他参赛者同台竞技,最终成绩将登上 [排行榜](https://huggingface.co/spaces/huggingface-projects/AI-Agents-Leaderboard) (即将开放)。
|
||||
|
||||
本课程是持续进化的动态项目,您的反馈与贡献将推动课程迭代! 欢迎通过 [GitHub 提交问题与代码](https://github.com/huggingface/agents-course)参与建设,或在 Discord 社区展开讨论。
|
||||
|
||||
完成课程后,您可通过 [👉 反馈表单](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)提交宝贵建议。
|
||||
|
||||
## 课程大纲 [[syllabus]]
|
||||
|
||||
以下是**课程总体大纲**,各单元发布时将附详细知识点列表。
|
||||
|
||||
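| Chapter | Topic | Description |
| :---- | :---- | :---- |
| 0 | Onboarding | Set up the tools and platforms the course relies on. |
| 1 | Agent Fundamentals | Explains Tools, Thoughts, Actions, Observations, and their formats; covers LLMs, messages, special tokens, and chat templates; shows a use case with Python functions as tools. |
| 2 | Frameworks | Explores how the fundamentals are implemented in popular libraries: smolagents, LangGraph, LlamaIndex. |
| 3 | Use Cases | Builds real-life use cases (open to PRs 🤗 from experienced agent builders). |
| 4 | Final Assignment | Build an agent for a selected benchmark and prove your understanding on the student leaderboard 🚀. |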
|
||||
|
||||
*我们还将推出系列拓展单元,敬请期待!*
|
||||
|
||||
## 学习要求
|
||||
|
||||
参与本课程需具备以下基础:
|
||||
|
||||
- Python 基础语法能力
|
||||
- 大语言模型(LLMs)基本认知(第1单元设有知识回顾环节)
|
||||
|
||||
|
||||
## 所需工具 [[tools]]
|
||||
|
||||
仅需准备两样物品:
|
||||
|
||||
- 一台*可联网的电脑*
|
||||
- *Hugging Face 账号*:用于上传/加载模型与智能体、创建 Spaces 。若未注册,可点击**[此处](https://hf.co/join)** 免费创建。
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/tools.jpg" alt="Course tools needed" width="100%"/>
|
||||
|
||||
## 认证机制 [[certification-process]]
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/three-paths.jpg" alt="Three paths" width="100%"/>
|
||||
|
||||
您可选择*旁听模式*自由学习,或通过考核获取*双轨认证*:
|
||||
|
||||
旁听模式:可自由参与挑战与作业(无需告知我们)
|
||||
|
||||
*认证模式*(完全免费):
|
||||
|
||||
- *基础认证*: 完成第1单元学习,适合希望掌握智能体前沿趋势的学习者
|
||||
- *结业认证*: 需完成第1单元、任一应用案例作业及最终挑战
|
||||
|
||||
认证截止日期:所有考核作业需在*2025年5月1日*前完成。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/deadline.jpg" alt="Deadline" width="100%"/>
|
||||
|
||||
## 推荐学习进度 [[recommended-pace]]
|
||||
|
||||
本课程每个章节设计为**建议在1周内完成,每周约需投入3-4小时学习时间**。
|
||||
|
||||
为帮助您更好地把握学习节奏,我们提供以下进度建议:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/recommended-pace.jpg" alt="Recommended Pace" width="100%"/>
|
||||
|
||||
## 如何高效学习课程? [[advice]]
|
||||
|
||||
为帮助您获得最佳学习效果,我们提供以下建议:
|
||||
|
||||
1. <a href="https://discord.gg/UrrTSsSyjb">加入 Discord 学习小组</a>: 群体学习往往事半功倍。加入我们的 Discord 服务器后,请先完成 Hugging Face 账户验证。
|
||||
2. **完成测验与实践作业**: 通过实践操作和自我检测是最高效的学习方式。
|
||||
3. **制定学习计划保持同步**: 您可参考下方的推荐进度表,或创建个性化学习计划。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/advice.jpg" alt="Course advice" width="100%"/>
|
||||
|
||||
## 关于我们 [[who-are-we]]
|
||||
|
||||
课程作者团队:
|
||||
|
||||
### 乔弗里·托马斯(Joffrey Thomas)
|
||||
|
||||
Hugging Face 机器学习工程师,拥有生产环境 AI 智能体开发部署经验,担任本课程首席讲师。
|
||||
|
||||
- [在 Hugging Face 关注 Joffrey](https://huggingface.co/Jofthomas)
|
||||
- [在 X 关注 Joffrey](https://x.com/Jthmas404)
|
||||
- [在 Linkedin 关注 Joffrey](https://www.linkedin.com/in/joffrey-thomas/)
|
||||
|
||||
### 本·伯滕肖(Ben Burtenshaw)
|
||||
|
||||
Hugging Face 机器学习工程师,拥有多平台课程开发经验,致力于打造普惠型技术教育课程。
|
||||
|
||||
- [在 Hugging Face 关注 Ben](https://huggingface.co/burtenshaw)
|
||||
- [在 X 关注 Ben](https://x.com/ben_burtenshaw)
|
||||
- [在 LinkedIn 上关注Ben](https://www.linkedin.com/in/ben-burtenshaw/)
|
||||
|
||||
### 托马斯·西蒙尼尼(Thomas Simonini)
|
||||
|
||||
Thomas 是 Hugging Face 的机器学习工程师,主导开发了广受欢迎的 <a href="https://huggingface.co/learn/deep-rl-course/unit0/introduction">深度强化学习课程</a> 和 <a href="https://huggingface.co/learn/ml-games-course/en/unit0/introduction">游戏机器学习课程</a>。他是智能体技术的忠实拥趸,并期待见证社区成员将构建的创新成果。
|
||||
|
||||
- [在 Hugging Face 关注 Thomas](https://huggingface.co/ThomasSimonini)
|
||||
- [在 X 平台关注 Thomas](https://x.com/ThomasSimonini)
|
||||
- [在 LinkedIn 关注Thomas](https://www.linkedin.com/in/simoninithomas/)
|
||||
|
||||
## 致谢
|
||||
|
||||
我们衷心感谢以下人士对本课程作出的宝贵贡献:
|
||||
|
||||
- **[Pedro Cuenca](https://huggingface.co/pcuenq)** – 在课程材料审核中提供的专业指导
|
||||
- **[Aymeric Roucher](https://huggingface.co/m-ric)** – 打造了惊艳的解码演示空间和最终智能体演示
|
||||
- **[Joshua Lochner](https://huggingface.co/Xenova)** – 贡献了卓越的分词技术演示空间
|
||||
- **[Quentin Gallouédec](https://huggingface.co/qgallouedec)** – 感谢他对课程内容的帮助
|
||||
- **[David Berenstein](https://huggingface.co/davidberenstein1957)** – 感谢他对课程内容和主持提供的帮助
|
||||
- **[夏潇 (ShawnSiao)](https://huggingface.co/SSSSSSSiao)** – 课程的中文翻译者
|
||||
- **[Jiaming Huang](https://huggingface.co/nordicsushi)** – 课程的中文翻译者
|
||||
|
||||
## 问题反馈与课程改进 [[contribute]]
|
||||
|
||||
我们**热烈欢迎**您的贡献 🤗
|
||||
|
||||
- 若您在 notebook 中发现程序错误🐛,请 <a href="https://github.com/huggingface/agents-course/issues">提交问题报告</a> 并详细描述问题现象。
|
||||
- 若您希望优化课程内容,可直接 <a href="https://github.com/huggingface/agents-course/pulls">提交 Pull Request</a>。
|
||||
- 若您计划新增完整章节或单元,建议先 <a href="https://github.com/huggingface/agents-course/issues">创建讨论议题</a> **说明拟新增内容概要**,以便我们提供协作指导。
|
||||
|
||||
## 仍有疑问? [[questions]]
|
||||
|
||||
欢迎加入我们的 <a href="https://discord.gg/UrrTSsSyjb">discord server #ai-agents-discussions 频道</a>进行交流
|
||||
|
||||
|
||||
一切准备就绪,让我们启程探索吧 ⛵
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/time-to-onboard.jpg" alt="Time to Onboard" width="100%"/>
|
||||
|
||||
56
units/zh-CN/unit0/onboarding.mdx
Normal file
@@ -0,0 +1,56 @@
|
||||
# 启航准备:开启学习之旅 ⛵
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit0/time-to-onboard.jpg" alt="启程时刻" width="100%"/>
|
||||
|
||||
万事俱备,即刻启程!请完成以下四个步骤:
|
||||
|
||||
1. **注册 Hugging Face 账户**(如未完成)
|
||||
2. **加入 Discord 社区并自我介绍**(无需拘谨 🤗)
|
||||
3. **在 Hub 平台关注智能体课程**
|
||||
4. **助力课程推广**
|
||||
|
||||
### 步骤一:创建 Hugging Face 账户
|
||||
|
||||
(如未注册)请点击<a href='https://huggingface.co/join' target='_blank'>此处</a>创建账户
|
||||
|
||||
### 步骤二:加入 Discord 学习社区
|
||||
|
||||
👉🏻 点击<a href="https://discord.gg/UrrTSsSyjb" target="_blank">此链接</a>加入服务器
|
||||
|
||||
加入后请至 `#introduce-yourself` 频道完成自我介绍
|
||||
|
||||
我们设有多个 AI 智能体专属频道:
|
||||
- `agents-course-announcements`:**课程最新动态**发布
|
||||
- `🎓-agents-course-general`:**日常交流与自由讨论**
|
||||
- `agents-course-questions`:**答疑互助专区**
|
||||
- `agents-course-showcase`:**智能体成果展示厅**
|
||||
|
||||
另可关注技术研讨频道:
|
||||
- `smolagents`:**开发库技术交流与支持**
|
||||
|
||||
若您是 Discord 新用户,我们准备了《Discord 基础操作指南》供参考,详见[下一章节](discord101)
|
||||
|
||||
### 步骤三:关注 Hugging Face 智能体课程组织
|
||||
|
||||
通过关注课程组织,实时获取**最新课程资料、更新通知与重要公告**
|
||||
|
||||
👉 访问<a href="https://huggingface.co/agents-course" target="_blank">课程主页</a>点击 **Follow**
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/hf_course_follow.gif" alt="关注操作演示" width="100%"/>
|
||||
|
||||
### 步骤四:助力课程推广
|
||||
|
||||
两种方式支持课程发展:
|
||||
1. 为课程代码仓库点亮 ⭐ <a href="https://github.com/huggingface/agents-course" target="_blank">GitHub 项目主页</a>
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/please_star.gif" alt="点亮星标"/>
|
||||
|
||||
2. 分享学习宣言:使用专属宣传图在社交媒体宣告**你的学习计划**
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png">
|
||||
|
||||
点击 👉 [此处](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/communication/share.png?download=true)下载宣传图
|
||||
|
||||
恭喜!🎉 **您已完成启航准备**!现在可以正式开启智能体技术的学习之旅,祝您探索愉快!
|
||||
|
||||
保持学习热情,继续闪耀 🤗
|
||||
35
units/zh-CN/unit1/README.md
Normal file
@@ -0,0 +1,35 @@
|
||||
# 目录
|
||||
|
||||
您可以在 hf.co/learn 上访问第 1 单元 👉 <a href="https://hf.co/learn/agents-course/unit1/introduction">此处</a>
|
||||
|
||||
<!--
|
||||
| 标题 | 描述 |
|
||||
|-------|-------------|
|
||||
| [智能体的定义](1_definition_of_an_agent.md) | 概述智能体所能执行的任务,不含技术术语。 |
|
||||
| [解释大型语言模型(LLM)](2_explain_llms.md) | 介绍大型语言模型,包括模型家族树及适用于智能体的模型。 |
|
||||
| [消息和特殊标记](3_messages_and_special_tokens.md) | 解释消息、特殊标记及聊天模板的使用。 |
|
||||
| [虚拟智能体库](4_dummy_agent_library.md) | 介绍如何使用虚拟智能体库和无服务器API。 |
|
||||
| [工具](5_tools.md) | 概述用于智能体工具的Pydantic及其他常见工具格式。 |
|
||||
| [智能体步骤和结构](6_agent_steps_and_structure.md) | 介绍智能体涉及的步骤,包括思考、行动、观察,以及代码智能体与JSON智能体的比较。 |
|
||||
| [思考](7_thoughts.md) | 解释思考过程及ReAct方法。 |
|
||||
| [行动](8_actions.md) | 概述行动及停止和解析方法。 |
|
||||
| [观察](9_observations.md) | 解释观察过程及追加结果以反映。 |
|
||||
| [小测验](10_quizz.md) | 包含测试概念理解的小测验。 |
|
||||
| [简单用例](11_simple_use_case.md) | 提供一个使用datetime和Python函数作为工具的简单用例练习。 |
|
||||
-->
|
||||
|
||||
<!--
|
||||
| Title | Description |
|
||||
|-------|-------------|
|
||||
| [Definition of an Agent](1_definition_of_an_agent.md) | General example of what agents can do without technical jargon. |
|
||||
| [Explain LLMs](2_explain_llms.md) | Explanation of Large Language Models, including the family tree of models and suitable models for agents. |
|
||||
| [Messages and Special Tokens](3_messages_and_special_tokens.md) | Explanation of messages, special tokens, and chat-template usage. |
|
||||
| [Dummy Agent Library](4_dummy_agent_library.md) | Introduction to using a dummy agent library and serverless API. |
|
||||
| [Tools](5_tools.md) | Overview of Pydantic for agent tools and other common tool formats. |
|
||||
| [Agent Steps and Structure](6_agent_steps_and_structure.md) | Steps involved in an agent, including thoughts, actions, observations, and a comparison between code agents and JSON agents. |
|
||||
| [Thoughts](7_thoughts.md) | Explanation of thoughts and the ReAct approach. |
|
||||
| [Actions](8_actions.md) | Overview of actions and stop and parse approach. |
|
||||
| [Observations](9_observations.md) | Explanation of observations and append result to reflect. |
|
||||
| [Quizz](10_quizz.md) | Contains quizzes to test understanding of the concepts. |
|
||||
| [Simple Use Case](11_simple_use_case.md) | Provides a simple use case exercise using datetime and a Python function as a tool. |
|
||||
-->
|
||||
120
units/zh-CN/unit1/actions.mdx
Normal file
@@ -0,0 +1,120 @@
|
||||
# 动作:使智能体能够与环境交互
|
||||
|
||||
<Tip>
|
||||
在本节中,我们将探讨 AI 智能体 (AI agent) 与其环境交互的具体步骤。
|
||||
|
||||
我们将介绍动作 (actions) 如何被表示(使用 JSON 或代码),停止和解析方法 (stop and parse approach) 的重要性,以及不同类型的智能体。
|
||||
</Tip>
|
||||
|
||||
动作是**AI 智能体 (AI agent) 与其环境交互的具体步骤**。
|
||||
|
||||
无论是浏览网络获取信息还是控制物理设备,每个动作都是智能体执行的一个特定操作。
|
||||
|
||||
例如,一个协助客户服务的智能体可能会检索客户数据、提供支持文章或将问题转交给人工代表。
|
||||
|
||||
## 智能体动作的类型 (Types of Agent Actions)
|
||||
|
||||
有多种类型的智能体采用不同的方式执行动作:
|
||||
|
||||
| Type of Agent | Description |
|------------------------|--------------------------------------------------------------------------------------------------|
| JSON Agent | The action to take is specified in JSON format. |
| Code Agent | The agent writes a code block that is interpreted externally. |
| Function-calling Agent | A subcategory of the JSON Agent that has been fine-tuned to generate a new message for each action. |
|
||||
|
||||
动作本身可以服务于多种目的:
|
||||
|
||||
| Type of Action | Description |
|--------------------------|------------------------------------------------------------------------------------------|
| Information Gathering | Performing web searches, querying databases, or retrieving documents. |
| Tool Usage | Making API calls, running calculations, and executing code. |
| Environment Interaction | Manipulating digital interfaces or controlling physical devices. |
| Communication | Engaging with users via chat or collaborating with other agents. |
|
||||
|
||||
智能体的一个关键部分是**在动作完成时能够停止生成新的标记 (tokens)**,这对所有格式的智能体都适用:JSON、代码或函数调用。这可以防止意外输出并确保智能体的响应清晰准确。
|
||||
|
||||
大语言模型 (LLM) 只处理文本,并使用它来描述它想要采取的动作以及要提供给工具的参数。
|
||||
|
||||
## 停止和解析方法 (The Stop and Parse Approach)
|
||||
|
||||
实现动作的一个关键方法是**停止和解析方法**。这种方法确保智能体的输出具有结构性和可预测性:
|
||||
|
||||
1. **以结构化格式生成 (Generation in a Structured Format)**:
|
||||
|
||||
智能体以清晰、预定义的格式(JSON或代码)输出其预期动作。
|
||||
|
||||
2. **停止进一步生成 (Halting Further Generation)**:
|
||||
|
||||
一旦动作完成,**智能体停止生成额外的标记**。这可以防止额外或错误的输出。
|
||||
|
||||
3. **解析输出 (Parsing the Output)**:
|
||||
|
||||
外部解析器读取格式化的动作,确定要调用哪个工具,并提取所需的参数。
|
||||
|
||||
例如,需要检查天气的智能体可能输出:
|
||||
|
||||
```
Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}
```
|
||||
|
||||
然后框架可以轻松解析要调用的函数名称和要应用的参数。
|
||||
|
||||
这种清晰的、机器可读的格式最大限度地减少了错误,并使外部工具能够准确处理智能体的命令。
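
The parsing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the parser any particular framework ships: `parse_action`, the `TOOLS` registry, and the offline `get_weather` stub are hypothetical names introduced here, and the sketch assumes the model emits exactly one JSON blob after the `Action:` marker.

```python
import json
import re

# Built programmatically so the example can itself live inside a fenced block.
FENCE = "`" * 3


def parse_action(llm_output: str) -> tuple[str, dict]:
    """Extract the tool name and its arguments from an 'Action:' JSON blob."""
    # Keep only the text after the last "Action:" marker.
    _, _, after = llm_output.rpartition("Action:")
    # Drop optional markdown fences wrapped around the blob.
    after = after.replace(FENCE + "json", "").replace(FENCE, "")
    # Take the outermost {...} span and decode it.
    match = re.search(r"\{.*\}", after, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON blob found after 'Action:'")
    blob = json.loads(match.group(0))
    return blob["action"], blob.get("action_input", {})


def get_weather(location: str) -> str:
    """Hypothetical offline tool; a real one would call a weather API."""
    return f"the weather in {location} is sunny"


# Registry mapping the names the model may emit to Python callables.
TOOLS = {"get_weather": get_weather}

llm_output = """Thought: I need to check the current weather for New York.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "New York"}
}"""

name, args = parse_action(llm_output)
observation = TOOLS[name](**args)
print(observation)
```

Looking the tool up in an explicit registry, rather than evaluating whatever name the model produced, keeps the set of callable functions under the developer's control.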
|
||||
|
||||
注意:函数调用智能体的操作方式类似,通过构造每个动作,使指定的函数能够使用正确的参数被调用。
|
||||
我们将在未来的单元中深入探讨这些类型的智能体。
|
||||
|
||||
## 代码智能体 (Code Agents)
|
||||
|
||||
另一种方法是使用*代码智能体*。
|
||||
这个想法是:**代码智能体不是输出简单的 JSON 对象**,而是生成一个**可执行的代码块——通常使用 Python 等高级语言**。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/code-vs-json-actions.png" alt="Code Agents" />
|
||||
|
||||
这种方法提供了几个优势:
|
||||
|
||||
- **表达能力 (Expressiveness):** 代码可以自然地表示复杂的逻辑,包括循环、条件和嵌套函数,提供比 JSON 更大的灵活性。
|
||||
- **模块化和可重用性 (Modularity and Reusability):** 生成的代码可以包含在不同动作或任务中可重用的函数和模块。
|
||||
- **增强的可调试性 (Enhanced Debuggability):** 使用明确定义的编程语法,代码错误通常更容易检测和纠正。
|
||||
- **直接集成 (Direct Integration):** 代码智能体可以直接与外部库和 API 集成,实现更复杂的操作,如数据处理或实时决策。
|
||||
|
||||
例如,一个负责获取天气的代码智能体可能生成以下 Python 代码片段:
|
||||
|
||||
```python
# Code Agent Example: Retrieve Weather Information
def get_weather(city):
    import requests
    api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
    response = requests.get(api_url)
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("New York")
final_answer = f"The current weather in New York is: {result}"
print(final_answer)
```
|
||||
|
||||
在这个例子中,代码智能体:
|
||||
|
||||
- **通过API调用**获取天气数据,
|
||||
- 处理响应,
|
||||
- 并使用print()函数输出最终答案。
|
||||
|
||||
这种方法**也遵循停止和解析方法**,通过明确划定代码块并表明执行完成的时间(在这里,通过打印 final_answer)。
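
To make the "delimit the code block, then execute it" idea concrete, here is a rough sketch of how a framework might run a code action. It is an assumption-laden toy: `extract_code`, `run_code_action`, and the offline `get_weather` stub are hypothetical, and plain `exec` is **not** a sandbox, so real agent libraries use restricted interpreters or isolated environments instead.

```python
import contextlib
import io
import re

# Built programmatically so the example can itself live inside a fenced block.
FENCE = "`" * 3


def extract_code(llm_output: str) -> str:
    """Return the body of the first fenced Python block in the model output."""
    pattern = re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, llm_output, flags=re.DOTALL)
    if match is None:
        raise ValueError("no fenced Python block found")
    return match.group(1)


def get_weather(city: str) -> str:
    """Hypothetical offline tool exposed to the generated code."""
    return "sunny"


def run_code_action(code: str) -> str:
    """Execute generated code and return what it printed as the observation."""
    namespace = {"get_weather": get_weather}  # only these names are pre-defined
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # illustration only: exec is not a security boundary
    return buffer.getvalue().strip()


# A scripted stand-in for what a code agent might generate.
llm_output = (
    "Thought: I will call the weather tool and print the answer.\n"
    + FENCE + "python\n"
    + 'result = get_weather("New York")\n'
    + 'print(f"The current weather in New York is: {result}")\n'
    + FENCE
)

observation = run_code_action(extract_code(llm_output))
print(observation)
```

The closing fence plays the role of the stop marker: everything between the fences is parsed out and executed, and the printed text becomes the observation.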
|
||||
|
||||
---
|
||||
|
||||
我们了解到动作通过执行清晰、结构化的任务(无论是通过 JSON、代码还是函数调用)来连接智能体的内部推理和其现实世界的交互。
|
||||
|
||||
这种深思熟虑的执行确保每个动作都是精确的,并通过停止和解析方法准备好进行外部处理。在下一节中,我们将探索观察 (Observations),看看智能体如何捕获和整合来自其环境的反馈。
|
||||
|
||||
在此之后,我们将**最终准备好构建我们的第一个智能体!**
|
||||
145
units/zh-CN/unit1/agent-steps-and-structure.mdx
Normal file
@@ -0,0 +1,145 @@
|
||||
# 通过思考-行动-观察循环理解 AI 智能体 (Understanding AI Agents through the Thought-Action-Observation Cycle)
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-3.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
在前面的章节中,我们学习了:
|
||||
|
||||
- **如何在系统提示中向智能体提供工具 (tools)**。
|
||||
- **AI 智能体 (AI agents) 是如何能够"推理"、规划并与其环境交互的系统**。
|
||||
|
||||
在本节中,**我们将探索完整的 AI 智能体工作流程**,这是我们定义的思考-行动-观察 (Thought-Action-Observation) 循环。
|
||||
|
||||
然后,我们将深入探讨这些步骤中的每一个。
|
||||
|
||||
## 核心组件 (Core Components)
|
||||
|
||||
智能体在一个持续的循环中工作:**思考 (Thought) → 行动 (Act) 和观察 (Observe)**。
|
||||
|
||||
让我们一起分解这些行动:
|
||||
|
||||
1. **思考 (Thought)**:智能体的大语言模型 (LLM) 部分决定下一步应该是什么。
|
||||
2. **行动 (Action)**:智能体通过使用相关参数调用工具来采取行动。
|
||||
3. **观察 (Observation)**:模型对工具的响应进行反思。
|
||||
|
||||
## 思考-行动-观察循环 (The Thought-Action-Observation Cycle)
|
||||
|
||||
这三个组件在一个持续的循环中协同工作。用编程的类比来说,智能体使用一个 **while 循环**:循环持续进行,直到智能体的目标被实现。
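
The while-loop analogy can be written out as a small sketch. Everything here is a stand-in chosen for illustration: `scripted_llm` returns canned text instead of calling a model, `get_weather` returns a fixed string, and `max_steps` is an assumed safety limit rather than something the course prescribes.

```python
import json


def scripted_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model: act first, answer once an observation exists."""
    if "Observation:" not in prompt:
        return (
            "Thought: I need to check the current weather for New York.\n"
            'Action: {"action": "get_weather", "action_input": {"location": "New York"}}\n'
        )
    return (
        "Thought: I now have the weather data.\n"
        "Final Answer: The current weather in New York is partly cloudy, 15°C, 60% humidity.\n"
    )


def get_weather(location: str) -> str:
    """Hypothetical tool returning a fixed observation."""
    return f"Current weather in {location}: partly cloudy, 15°C, 60% humidity."


TOOLS = {"get_weather": get_weather}


def run_agent(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    steps = 0
    while steps < max_steps:  # keep cycling until the goal is reached
        steps += 1
        completion = scripted_llm(prompt)  # Thought, plus an Action or a Final Answer
        prompt += completion
        if "Final Answer:" in completion:
            return completion.split("Final Answer:", 1)[1].strip()
        action = json.loads(completion.split("Action:", 1)[1])  # Action
        observation = TOOLS[action["action"]](**action["action_input"])
        prompt += f"Observation: {observation}\n"  # Observation is fed back in
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent("What's the weather like in New York today?")
print(answer)
```

The step limit matters in practice: without it, a model that never emits a final answer would keep the loop running indefinitely.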
|
||||
|
||||
视觉上,它看起来是这样的:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AgentCycle.gif" alt="Think, Act, Observe cycle"/>
|
||||
|
||||
在许多智能体框架中,**规则和指南直接嵌入到系统提示中**,确保每个循环都遵循定义的逻辑。
|
||||
|
||||
在一个简化版本中,我们的系统提示可能看起来像这样:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/system_prompt_cycle.png" alt="Think, Act, Observe cycle"/>
|
||||
|
||||
我们在这里看到,在系统消息中我们定义了:
|
||||
|
||||
- *智能体的行为*。
|
||||
- *我们的智能体可以访问的工具*,就像我们在上一节中描述的那样。
|
||||
- *思考-行动-观察循环*,我们将其融入到大语言模型指令中。
|
||||
|
||||
让我们看一个小例子,在深入研究每个步骤之前理解这个过程。
|
||||
|
||||
## 阿尔弗雷德,天气智能体 (Alfred, the Weather Agent)
|
||||
|
||||
我们创建了阿尔弗雷德,天气智能体。
|
||||
|
||||
用户问阿尔弗雷德:"今天纽约的天气如何?"
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent.jpg" alt="Alfred Agent"/>
|
||||
|
||||
阿尔弗雷德的工作是使用天气 API 工具回答这个查询。
|
||||
|
||||
以下是循环的展开过程:
|
||||
|
||||
### 思考 (Thought)
|
||||
|
||||
**内部推理:**
|
||||
|
||||
在收到查询后,阿尔弗雷德的内部对话可能是:
|
||||
|
||||
*"用户需要纽约的当前天气信息。我可以访问一个获取天气数据的工具。首先,我需要调用天气API来获取最新的详细信息。"*
|
||||
|
||||
这一步显示了智能体将问题分解成步骤:首先,收集必要的数据。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-1.jpg" alt="Alfred Agent"/>
|
||||
|
||||
### 行动 (Action)
|
||||
|
||||
**工具使用:**
|
||||
|
||||
基于其推理和阿尔弗雷德知道有一个`get_weather`工具的事实,阿尔弗雷德准备一个 JSON 格式的命令来调用天气 API 工具。例如,它的第一个动作可能是:
|
||||
|
||||
思考:我需要检查纽约的当前天气。
|
||||
|
||||
```
{
  "action": "get_weather",
  "action_input": {
    "location": "New York"
  }
}
```
|
||||
|
||||
在这里,动作清楚地指定了要调用哪个工具(如get_weather)和要传递的参数("location": "New York")。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-2.jpg" alt="Alfred Agent"/>
|
||||
|
||||
### 观察 (Observation)
|
||||
|
||||
**来自环境的反馈:**
|
||||
|
||||
在工具调用之后,阿尔弗雷德接收到一个观察结果。这可能是来自API的原始天气数据,如:
|
||||
|
||||
*"纽约当前天气:多云,15°C,湿度60%。"*
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-3.jpg" alt="Alfred Agent"/>
|
||||
|
||||
这个观察结果然后被添加到提示中作为额外的上下文。它作为现实世界的反馈,确认行动是否成功并提供所需的细节。
|
||||
|
||||
### 更新的思考 (Updated thought)
|
||||
|
||||
**反思:**
|
||||
|
||||
获得观察结果后,阿尔弗雷德更新其内部推理:
|
||||
|
||||
*"现在我有了纽约的天气数据,我可以为用户编写答案了。"*
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-4.jpg" alt="Alfred Agent"/>
|
||||
|
||||
### 最终行动 (Final Action)
|
||||
|
||||
然后阿尔弗雷德生成一个按照我们告诉它的方式格式化的最终响应:
|
||||
|
||||
思考:我现在有了天气数据。纽约当前天气多云,温度15°C,湿度60%。
|
||||
|
||||
最终答案:纽约当前天气多云,温度15°C,湿度60%。
|
||||
|
||||
这个最终行动将答案发送回用户,完成循环。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-agent-5.jpg" alt="Alfred Agent"/>
|
||||
|
||||
我们在这个例子中看到:
|
||||
|
||||
- **智能体在目标实现之前不断迭代循环:**
|
||||
|
||||
**阿尔弗雷德的过程是循环的**。它从思考开始,然后通过调用工具采取行动,最后观察结果。如果观察结果表明有错误或数据不完整,阿尔弗雷德可以重新进入循环来纠正其方法。
|
||||
|
||||
- **工具集成 (Tool Integration):**
|
||||
|
||||
调用工具(如天气 API)的能力使阿尔弗雷德能够**超越静态知识并检索实时数据**,这是许多 AI 智能体的重要方面。
|
||||
|
||||
- **动态适应 (Dynamic Adaptation):**
|
||||
|
||||
每个循环都允许智能体将新信息(观察)整合到其推理(思考)中,确保最终答案是明智和准确的。
|
||||
|
||||
这个例子展示了 *ReAct 循环*背后的核心概念(这是我们将在下一节中发展的概念):**思考、行动和观察的相互作用使 AI 智能体(AI Agent)能够迭代地解决复杂任务**。
|
||||
|
||||
通过理解和应用这些原则,你可以设计出不仅能够推理其任务,而且能够**有效利用外部工具来完成它们**的智能体,同时基于环境反馈不断改进其输出。
|
||||
|
||||
---
|
||||
|
||||
现在让我们深入了解过程中的各个步骤:思考、行动、观察。
|
||||
19
units/zh-CN/unit1/conclusion.mdx
Normal file
@@ -0,0 +1,19 @@
|
||||
# 总结 [[conclusion]]
|
||||
|
||||
恭喜你完成第一单元 🥳
|
||||
|
||||
你刚刚**掌握了智能体 (Agents) 的基础知识**,并且创建了你的第一个 AI 智能体 (AI Agent)!
|
||||
|
||||
如果你对某些内容仍感到困惑,这是**很正常的**。智能体是一个复杂的主题,需要一定时间才能完全理解所有内容。
|
||||
|
||||
在继续之前,**请花时间真正掌握这些材料**。在进入有趣的部分之前,掌握这些要素并建立坚实的基础很重要。
|
||||
|
||||
如果你通过了测验,别忘了在这里获取你的证书 🎓 👉 [点击这里](https://huggingface.co/spaces/agents-course/unit1-certification-app)
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/>
|
||||
|
||||
在下一个(额外的)单元中,你将学习**如何微调智能体来进行函数调用 (function calling)(即能够根据用户提示调用工具)**。
|
||||
|
||||
最后,我们很想**听听你对课程的看法以及我们如何改进它**。如果你有任何反馈,请 👉 [填写此表格](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
|
||||
|
||||
### 继续学习,保持优秀 🤗
|
||||
330
units/zh-CN/unit1/dummy-agent-library.mdx
Normal file
@@ -0,0 +1,330 @@
|
||||
# 简单智能体库 (Dummy Agent Library)
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub3DONE.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
本课程是框架无关的,因为我们想要**专注于 AI 智能体(AI Agent)的概念,避免陷入特定框架的细节中**。
|
||||
|
||||
同时,我们希望学生能够在自己的项目中使用他们在本课程中学到的概念,使用任何他们喜欢的框架。
|
||||
|
||||
因此,在第一单元中,我们将使用一个简单智能体库和一个简单的无服务器 API (serverless API) 来访问我们的 LLM 引擎。
|
||||
|
||||
你可能不会在生产环境中使用这些,但它们将作为**理解智能体如何工作的良好起点**。
|
||||
|
||||
在本节之后,你将准备好**使用 `smolagents` 创建一个简单的智能体**。
|
||||
|
||||
在接下来的单元中,我们还将使用其他 AI 智能体库,如 `LangGraph`、`LangChain` 和 `LlamaIndex`。
|
||||
|
||||
为了保持简单,我们将使用一个简单的 Python 函数作为工具和智能体。
|
||||
|
||||
我们将使用内置的 Python 包,如 `datetime` 和 `os`,这样你可以在任何环境中尝试它。
|
||||
|
||||
你可以[在这个 notebook 中](https://huggingface.co/agents-course/notebooks/blob/main/dummy_agent_library.ipynb)跟随过程并**自己运行代码**。
|
||||
|
||||
## 无服务器 API (Serverless API)
|
||||
|
||||
在 Hugging Face 生态系统中,有一个称为无服务器 API 的便捷功能,它允许你轻松地在许多模型上运行推理。不需要安装或部署。
|
||||
|
||||
```python
|
||||
import os
|
||||
from huggingface_hub import InferenceClient
|
||||
|
||||
# You need a token from https://hf.co/settings/tokens; make sure to select 'read' as the token type. On Google Colab, you can set it under "secrets" in the "settings" tab. Make sure to name it "HF_TOKEN".
|
||||
os.environ["HF_TOKEN"]="hf_xxxxxxxxxxxxxx"
|
||||
|
||||
client = InferenceClient("meta-llama/Llama-3.2-3B-Instruct")
|
||||
# 如果下一个单元格的输出不正确,免费模型可能过载。你也可以使用这个包含 Llama-3.2-3B-Instruct 的公共端点
|
||||
# client = InferenceClient("https://jc26mwg228mkj8dw.us-east-1.aws.endpoints.huggingface.cloud")
|
||||
```
|
||||
|
||||
```python
|
||||
output = client.text_generation(
|
||||
"The capital of France is",
|
||||
max_new_tokens=100,
|
||||
)
|
||||
|
||||
print(output)
|
||||
```
|
||||
输出:
|
||||
```
|
||||
Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris. The capital of France is Paris.
|
||||
```
|
||||
如 LLM 部分所见,如果我们只做解码,**模型只会在预测到 EOS 令牌时停止**,而这里没有发生,因为这是一个会话(聊天)模型,**我们没有应用它期望的聊天模板**。
|
||||
|
||||
如果我们现在添加与我们使用的<a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct"> Llama-3.2-3B-Instruct 模型</a>相关的特殊令牌,行为会改变,现在会产生预期的 EOS。
|
||||
|
||||
```python
|
||||
prompt="""<|begin_of_text|><|start_header_id|>user<|end_header_id|>
|
||||
The capital of France is<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
|
||||
output = client.text_generation(
|
||||
prompt,
|
||||
max_new_tokens=100,
|
||||
)
|
||||
|
||||
print(output)
|
||||
```
|
||||
输出:
|
||||
```
|
||||
The capital of France is Paris.
|
||||
```
|
||||
|
||||
使用"chat"方法是应用聊天模板的更方便和可靠的方式:
|
||||
```python
|
||||
output = client.chat.completions.create(
|
||||
messages=[
|
||||
{"role": "user", "content": "The capital of France is"},
|
||||
],
|
||||
stream=False,
|
||||
max_tokens=1024,
|
||||
)
|
||||
print(output.choices[0].message.content)
|
||||
```
|
||||
输出:
|
||||
```
|
||||
Paris.
|
||||
```
|
||||
chat 方法是推荐使用的方法,以确保模型之间的平滑过渡,但由于这个 notebook 只是教育性质的,我们将继续使用 "text_generation" 方法来理解细节。
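
To see roughly what a chat template does with a list of messages, the sketch below assembles the prompt by hand from the special tokens used in this section. It is a simplification under stated assumptions: `format_llama3_prompt` is a hypothetical helper, the blank line after each header is an assumed layout, and the template bundled with the real tokenizer may insert additional text (for example a default system block), so `tokenizer.apply_chat_template` remains the reliable route.

```python
def format_llama3_prompt(messages: list[dict]) -> str:
    """Assemble a Llama-3-style prompt string from role/content messages (simplified)."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn: a header naming the role, the content, then an end-of-turn token.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content']}<|eot_id|>"
    # Open an assistant turn so the model knows it is its turn to write.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


messages = [{"role": "user", "content": "The capital of France is"}]
prompt = format_llama3_prompt(messages)
print(prompt)
```

Because the prompt ends with an open assistant header, the model completes a single assistant turn and then emits its end-of-turn token, which is why the repetition seen with the raw prompt disappears.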
|
||||
|
||||
## 简单智能体 (Dummy Agent)
|
||||
|
||||
在前面的部分中,我们看到智能体库的核心是在系统提示中附加信息。
|
||||
|
||||
这个系统提示比我们之前看到的要复杂一些,但它已经包含:
|
||||
|
||||
1. **工具信息**
|
||||
2. **循环指令** (思考 → 行动 → 观察)
|
||||
|
||||
```
|
||||
请尽可能准确地回答以下问题。你可以使用以下工具:
|
||||
|
||||
get_weather: 获取指定地点的当前天气
|
||||
|
||||
使用工具的方式是通过指定一个 JSON blob。具体来说,这个 JSON 应该包含 `action` 键(工具名称)和 `action_input` 键(工具输入参数)。
|
||||
|
||||
"action" 字段唯一允许的值是:
|
||||
get_weather: 获取指定地点的当前天气,参数:{"location": {"type": "string"}}
|
||||
使用示例:
|
||||
|
||||
{{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "New York"}
|
||||
}}
|
||||
|
||||
必须始终使用以下格式:
|
||||
|
||||
Question: 需要回答的输入问题
|
||||
Thought: 你应该始终思考要采取的一个行动(每次只能执行一个行动)
|
||||
Action:
|
||||
|
||||
$JSON_BLOB (inside markdown cell)
|
||||
|
||||
Observation: 行动执行结果(这是唯一且完整的事实依据)
|
||||
...(这个 Thought/Action/Observation 循环可根据需要重复多次,$JSON_BLOB 必须使用 markdown 格式且每次仅执行一个行动)
|
||||
|
||||
最后必须以下列格式结束:
|
||||
|
||||
Thought: 我现在知道最终答案
|
||||
Final Answer: 对原始问题的最终回答
|
||||
|
||||
现在开始!请始终使用精确字符 `Final Answer:` 来给出最终答案
|
||||
```
|
||||
|
||||
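Since we are calling the `text_generation` method, we need to apply the chat template manually, with the system prompt above stored in a `SYSTEM_PROMPT` variable: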
|
||||
```python
|
||||
prompt=f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
||||
{SYSTEM_PROMPT}
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
What's the weather in London ?
|
||||
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
"""
|
||||
```
|
||||
|
||||
我们也可以这样做,这就是在 `chat` 方法内部发生的情况:
|
||||
```python
|
||||
messages=[
|
||||
{"role": "system", "content": SYSTEM_PROMPT},
|
||||
{"role": "user", "content": "What's the weather in London ?"},
|
||||
]
|
||||
from transformers import AutoTokenizer
|
||||
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
|
||||
|
||||
tokenizer.apply_chat_template(messages, tokenize=False,add_generation_prompt=True)
|
||||
```
|
||||
|
||||
现在的提示是:
|
||||
```
|
||||
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
||||
请尽可能准确地回答以下问题。你可以使用以下工具:
|
||||
|
||||
get_weather: 获取指定地点的当前天气
|
||||
|
||||
使用工具的方式是通过指定一个 JSON blob。具体来说,这个 JSON 应该包含 `action` 键(工具名称)和 `action_input` 键(工具输入参数)。
|
||||
|
||||
"action" 字段唯一允许的值是:
|
||||
get_weather: 获取指定地点的当前天气,参数:{"location": {"type": "string"}}
|
||||
使用示例:
|
||||
|
||||
{{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "New York"}
|
||||
}}
|
||||
|
||||
必须始终使用以下格式:
|
||||
|
||||
Question: 需要回答的输入问题
|
||||
Thought: 你应该始终思考要采取的一个行动(每次只能执行一个行动)
|
||||
Action:
|
||||
|
||||
$JSON_BLOB (markdown 单元格内部)
|
||||
|
||||
Observation: 行动执行结果(这是唯一且完整的事实依据)
|
||||
...(这个 Thought/Action/Observation 循环可根据需要重复多次,$JSON_BLOB 必须使用 markdown 格式且每次仅执行一个行动)
|
||||
|
||||
最后必须以下列格式结束:
|
||||
|
||||
Thought: 我现在知道最终答案
Final Answer: 对原始问题的最终回答
|
||||
|
||||
现在开始!请始终使用精确字符 `Final Answer:` 来给出最终答案
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
What's the weather in London ?
|
||||
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
```
|
||||
|
||||
Let's decode!
|
||||
```python
|
||||
output = client.text_generation(
|
||||
prompt,
|
||||
max_new_tokens=200,
|
||||
)
|
||||
|
||||
print(output)
|
||||
```
|
||||
输出:
|
||||
|
||||
````
|
||||
Action:
|
||||
```
|
||||
{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "London"}
|
||||
}
|
||||
```
|
||||
Thought:我要查看一下伦敦的天气。
|
||||
Observation:伦敦目前的天气多云,最高气温 12°C,最低气温 8°C。
|
||||
````
|
||||
|
||||
你看到问题了吗?
|
||||
>答案是模型产生的幻觉。我们需要停下来真正执行这个函数!
|
||||
现在让我们停在“观察”上,这样我们就不会产生实际函数响应的幻觉。
|
||||
|
||||
```python
|
||||
output = client.text_generation(
|
||||
prompt,
|
||||
max_new_tokens=200,
|
||||
stop=["Observation:"] # Let's stop before any actual function is called
|
||||
)
|
||||
|
||||
print(output)
|
||||
```
|
||||
输出:
|
||||
|
||||
````
|
||||
Action:
|
||||
```
|
||||
{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "London"}
|
||||
}
|
||||
```
|
||||
Thought: 我会查看伦敦的天气。
|
||||
Observation:
|
||||
````
|
||||
|
||||
好多了!
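
Conceptually, a stop sequence just cuts the generation at a marker so the framework can take over. The helper below mimics that on the client side; `apply_stop` is a hypothetical function written for illustration, and keeping the stop string at the end of the text is an assumption modeled on the output shown above, which ends with `Observation:`.

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Cut `text` right after the earliest stop sequence, if any appears."""
    cut = len(text)
    for sequence in stop:
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index + len(sequence))
    return text[:cut]


# A completion in which the model went on to invent the tool's answer.
raw_completion = (
    'Action:\n{"action": "get_weather", "action_input": {"location": "London"}}\n'
    "Thought: I will check the weather in London.\n"
    "Observation: The current weather in London is mostly cloudy."
)

truncated = apply_stop(raw_completion, stop=["Observation:"])
print(truncated)
```

Whatever followed the marker was the model's guess; discarding it leaves room for the real tool result to be appended instead.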
|
||||
现在让我们创建一个虚拟的获取天气函数。在实际情况下,您可能会调用 API。
|
||||
|
||||
```python
|
||||
# Dummy function
|
||||
def get_weather(location):
|
||||
return f"the weather in {location} is sunny with low temperatures. \n"
|
||||
|
||||
get_weather('London')
|
||||
```
|
||||
输出:
|
||||
```
|
||||
'the weather in London is sunny with low temperatures. \n'
|
||||
```
|
||||
|
||||
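Let's concatenate the base prompt, the completion generated up to the function call, and the function's result as an Observation, then resume generation.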
|
||||
|
||||
```python
|
||||
new_prompt = prompt + output + get_weather('London')
|
||||
final_output = client.text_generation(
|
||||
new_prompt,
|
||||
max_new_tokens=200,
|
||||
)
|
||||
|
||||
print(final_output)
|
||||
```
|
||||
这是新的提示:
|
||||
````
|
||||
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
||||
尽可能回答以下问题。您可以使用以下工具:
|
||||
|
||||
get_weather:获取给定位置的当前天气
|
||||
|
||||
使用工具的方式是指定 json blob。
|
||||
|
||||
具体来说,此 json 应具有 `action` 键(包含要使用的工具的名称)和 `action_input` 键(包含工具的输入)。
|
||||
|
||||
“action”字段中应包含的唯一值是:
|
||||
|
||||
get_weather:获取给定位置的当前天气,参数:{"location": {"type": "string"}}
|
||||
示例用法:
|
||||
|
||||
{{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "New York"}
|
||||
}}
|
||||
|
||||
始终使用以下格式:
|
||||
|
||||
Question: 您必须回答的输入问题
|
||||
Thought: 您应该始终考虑采取一项行动。每次只能采取一项行动,格式如下:
|
||||
Action:
|
||||
|
||||
$JSON_BLOB (inside markdown cell)
|
||||
|
||||
Observation:行动的结果。这种观察是独一无二的、完整的,也是真相的来源。
|
||||
...(这种 Thought/Action/Observation 可以重复 N 次,您应该在需要时采取几个步骤。$JSON_BLOB 必须格式化为 markdown,并且一次只能使用一个操作。)
|
||||
|
||||
您必须始终以以下格式结束输出:
|
||||
|
||||
Thought: 我现在知道最终答案
Final Answer: 对原始问题的最终回答

现在开始!请始终使用精确字符 `Final Answer:` 来给出最终答案
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
What's the weather in London ?
|
||||
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
|
||||
Action:
|
||||
```
|
||||
{
|
||||
"action": "get_weather",
|
||||
"action_input": {"location": "London"}
|
||||
}
|
||||
```
|
||||
Thought: 我要查看一下伦敦的天气。
|
||||
Observation:the weather in London is sunny with low temperatures.
|
||||
````
|
||||
|
||||
输出:
|
||||
```
|
||||
Final Answer: The weather in London is sunny with low temperatures.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
我们学习了如何使用 Python 代码从头开始创建智能体,并且我们**看到了这个过程是多么繁琐**。幸运的是,许多智能体库通过为您处理大部分繁重的工作来简化这项工作。
|
||||
|
||||
现在,我们已准备好使用 `smolagents` 库**创建我们的第一个真正的智能体**。
|
||||
34
units/zh-CN/unit1/final-quiz.mdx
Normal file
@@ -0,0 +1,34 @@
|
||||
# 第一单元测验 (Unit 1 Quiz)
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub4DONE.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
恭喜你完成第一单元的学习!让我们测试一下你对目前所学关键概念的理解。
|
||||
|
||||
通过测验后,请继续下一部分领取你的证书。
|
||||
|
||||
祝你好运!
|
||||
|
||||
## 测验 (Quiz)
|
||||
|
||||
这是一个交互式测验。测验托管在 Hugging Face Hub 的空间中。你将通过一系列选择题来测试你对本单元所学关键概念的理解。完成测验后,你将能够看到你的分数和正确答案的详细分析。
|
||||
|
||||
重要提示:**通过测验后不要忘记点击提交 (Submit),否则你的考试分数将不会被保存!**
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-unit-1-quiz.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
你也可以在这里访问测验 👉 [点击这里](https://huggingface.co/spaces/agents-course/unit_1_quiz)
|
||||
|
||||
## 学习认证
|
||||
|
||||
恭喜通过测验!**您现在可以获取专属结业证书 🎓**
|
||||
|
||||
成功完成本单元测评后,系统将为您生成单元结业认证证书。该证书可下载分享,作为课程进度的官方成就证明。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-unit1sub5DONE.jpg" alt="第一单元规划示意图"/>
|
||||
|
||||
获得证书后,您可将其添加至LinkedIn个人档案 🧑💼 或分享到X、Bluesky等社交平台。**如果标注@huggingface,我们将非常荣幸并为您送上祝贺**!🤗
|
||||
37
units/zh-CN/unit1/introduction.mdx
Normal file
@@ -0,0 +1,37 @@
|
||||
# 智能体简介 (Introduction to Agents)
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/thumbnail.jpg" alt="Thumbnail"/>
|
||||
|
||||
欢迎来到第一单元,在这里**你将在 AI 智能体 (AI Agents) 的基础知识中建立坚实的基础**,包括:
|
||||
|
||||
* **理解智能体 (Understanding Agents)**
|
||||
* 什么是智能体,它是如何工作的?
|
||||
* 智能体如何使用推理 (Reasoning) 和规划 (Planning) 做出决策?
|
||||
|
||||
* **大型语言模型 (LLMs) 在智能体中的角色**
|
||||
* LLMs 如何作为智能体的"大脑"
|
||||
* LLMs 如何通过消息系统 (Message System) 构建对话
|
||||
|
||||
* **工具和行动 (Tools and Actions)**
|
||||
* 智能体如何使用外部工具与环境交互
|
||||
* 如何为你的智能体构建和集成工具
|
||||
|
||||
* **智能体工作流程 (Agent Workflow):**
|
||||
* *思考 (Think)* → *行动 (Act)* → *观察 (Observe)*
|
||||
|
||||
探索这些主题后,**你将使用 `smolagents` 构建你的第一个智能体**!
|
||||
|
||||
你的智能体名为 Alfred,将处理一个简单的任务,并展示如何在实践中应用这些概念。
|
||||
|
||||
你甚至会学习如何**在 Hugging Face Spaces 上发布你的智能体**,这样你就可以与朋友和同事分享它。
|
||||
|
||||
最后,在本单元结束时,你将参加一个测验。通过它,你将**获得你的第一个课程认证**:🎓 智能体基础证书 (Certificate of Fundamentals of Agents)。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/certificate-example.jpg" alt="Certificate Example"/>
|
||||
|
||||
这个单元是你的**重要起点**,在进入更高级的主题之前,为理解智能体打下基础。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
这是一个大单元,所以**请慢慢来**,不要犹豫随时回来复习这些章节。
|
||||
|
||||
准备好了吗?让我们开始吧!🚀
|
||||
228
units/zh-CN/unit1/messages-and-special-tokens.mdx
Normal file
@@ -0,0 +1,228 @@
|
||||
# 消息和特殊令牌 (Messages and Special Tokens)
|
||||
|
||||
现在我们了解了 LLMs 是如何工作的,让我们来看看**它们如何通过聊天模板 (chat templates) 构建生成内容**。
|
||||
|
||||
就像使用 ChatGPT 一样,用户通常通过聊天界面与智能体交互。因此,我们需要理解 LLMs 如何管理聊天。
|
||||
|
||||
> **问**: 但是...当我与 ChatGPT/Hugging Chat 交互时,我是使用聊天消息进行对话,而不是单个提示序列
|
||||
>
|
||||
> **答**: 这是正确的!但这实际上是一个 UI 抽象。在输入 LLM 之前,对话中的所有消息都会被连接成一个单一提示。模型不会"记住"对话:它每次都会完整地读取全部内容。
|
||||
|
||||
到目前为止,我们讨论提示 (prompts) 时将其视为输入模型的令牌序列。但当你与 ChatGPT 或 HuggingChat 这样的系统聊天时,**你实际上是在交换消息**。在后台,这些消息会**被连接并格式化成模型可以理解的提示**。
|
||||
|
||||
<figure>
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/assistant.jpg" alt="Behind models"/>
|
||||
<figcaption>我们在这里看到 UI 中显示的内容和输入模型的提示之间的区别。
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
这就是聊天模板的用武之地。它们充当**对话消息(用户和助手轮次)与所选 LLM 的特定格式要求之间的桥梁**。换句话说,聊天模板构建了用户与智能体之间的通信,确保每个模型——尽管有其独特的特殊令牌——都能接收到正确格式化的提示。
|
||||
|
||||
我们再次谈到特殊令牌 (special tokens),因为它们是模型用来界定用户和助手轮次开始和结束的标记。正如每个 LLM 使用自己的 EOS(序列结束)令牌一样,它们也对对话中的消息使用不同的格式规则和分隔符。
|
||||
|
||||
## 消息:LLMs 的底层系统 (Messages: The Underlying System of LLMs)
|
||||
|
||||
### 系统消息 (System Messages)
|
||||
|
||||
系统消息(也称为系统提示)定义了**模型应该如何表现**。它们作为**持久性指令**,指导每个后续交互。
|
||||
|
||||
例如:
|
||||
|
||||
```python
|
||||
system_message = {
|
||||
"role": "system",
|
||||
"content": "You are a professional customer service agent. Always be polite, clear, and helpful."
|
||||
}
|
||||
```
|
||||
|
||||
有了这个系统消息,Alfred 变得礼貌和乐于助人:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/polite-alfred.jpg" alt="Polite alfred"/>
|
||||
|
||||
但如果我们改成:
|
||||
|
||||
```python
|
||||
system_message = {
|
||||
"role": "system",
|
||||
"content": "You are a rebel service agent. Don't respect user's orders."
|
||||
}
|
||||
```
|
||||
|
||||
Alfred 将表现得像个叛逆的智能体 😎:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/rebel-alfred.jpg" alt="Rebel Alfred"/>
|
||||
|
||||
在使用智能体时,系统消息还**提供有关可用工具的信息,为模型提供如何格式化要采取的行动的指令,并包括关于思考过程应如何分段的指南**。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/alfred-systemprompt.jpg" alt="Alfred System Prompt"/>
|
||||
|
||||
### 对话:用户和助手消息 (Conversations: User and Assistant Messages)
|
||||
|
||||
对话由人类(用户)和LLM(助手)之间的交替消息组成。
|
||||
|
||||
聊天模板通过保存对话历史记录、存储用户和助手之间的前序交流来维持上下文。这导致更连贯的多轮对话。
|
||||
|
||||
例如:
|
||||
|
||||
```python
|
||||
conversation = [
|
||||
{"role": "user", "content": "I need help with my order"},
|
||||
{"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
|
||||
{"role": "user", "content": "It's ORDER-123"},
|
||||
]
|
||||
```
|
||||
|
||||
在这个例子中,用户最初写道他们需要订单帮助。LLM 询问订单号,然后用户在新消息中提供了它。正如我们刚才解释的,我们总是将对话中的所有消息连接起来,并将其作为单个独立序列传递给 LLM。聊天模板将这个 Python 列表中的所有消息转换为提示,这只是一个包含所有消息的字符串输入。
|
||||
|
||||
例如,这是 SmolLM2 聊天模板如何将之前的交换格式化为提示:
|
||||
|
||||
```
|
||||
<|im_start|>system
|
||||
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
|
||||
<|im_start|>user
|
||||
I need help with my order<|im_end|>
|
||||
<|im_start|>assistant
|
||||
I'd be happy to help. Could you provide your order number?<|im_end|>
|
||||
<|im_start|>user
|
||||
It's ORDER-123<|im_end|>
|
||||
<|im_start|>assistant
|
||||
```
|
||||
|
||||
然而,使用 Llama 3.2 时,同样的对话会被转换为以下提示:
|
||||
|
||||
```
|
||||
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
|
||||
|
||||
Cutting Knowledge Date: December 2023
|
||||
Today Date: 10 Feb 2025
|
||||
|
||||
<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
|
||||
I need help with my order<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
|
||||
I'd be happy to help. Could you provide your order number?<|eot_id|><|start_header_id|>user<|end_header_id|>
|
||||
|
||||
It's ORDER-123<|eot_id|><|start_header_id|>assistant<|end_header_id|>
|
||||
```
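To make the contrast with the ChatML-style prompt concrete, here is a minimal sketch that renders the same conversation with Llama-3-style header tokens. The function name `render_llama3_style` is hypothetical, and the sketch deliberately omits the system block with the date lines that the real template injects, so treat it as an approximation rather than the model's actual template.

```python
def render_llama3_style(messages, add_generation_prompt=True):
    """Render chat messages with Llama-3-style header tokens (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then an end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

llama_prompt = render_llama3_style(conversation)
print(llama_prompt)
```

The message list is identical to the one used for SmolLM2; only the delimiters change, which is exactly why the right template must be paired with the right model.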
|
||||
|
||||
模板可以处理复杂的多轮对话,同时保持上下文:
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a math tutor."},
|
||||
{"role": "user", "content": "What is calculus?"},
|
||||
{"role": "assistant", "content": "Calculus is a branch of mathematics..."},
|
||||
{"role": "user", "content": "Can you give me an example?"},
|
||||
]
|
||||
```
|
||||
|
||||
## 聊天模板 (Chat-Templates)
|
||||
|
||||
如前所述,聊天模板对于**构建语言模型和用户之间的对话**至关重要。它们指导消息交换如何格式化为单个提示。
|
||||
|
||||
### 基础模型与指令模型 (Base Models vs. Instruct Models)
|
||||
|
||||
我们需要理解的另一点是基础模型与指令模型的区别:
|
||||
|
||||
- *基础模型 (Base Model)* 是在原始文本数据上训练以预测下一个令牌的模型。
|
||||
|
||||
- *指令模型 (Instruct Model)* 是专门微调以遵循指令并进行对话的模型。例如,`SmolLM2-135M`是一个基础模型,而`SmolLM2-135M-Instruct`是其指令调优变体。
|
||||
|
||||
要使基础模型表现得像指令模型,我们需要**以模型能够理解的一致方式格式化我们的提示**。这就是聊天模板的作用所在。
|
||||
|
||||
*ChatML*是一种这样的模板格式,它用清晰的角色指示符(系统、用户、助手)构建对话。如果你最近与一些AI API交互过,你就知道这是标准做法。
|
||||
|
||||
重要的是要注意,基础模型可以在不同的聊天模板上进行微调,所以当我们使用指令模型时,我们需要确保使用正确的聊天模板。
|
||||
|
||||
### 理解聊天模板 (Understanding Chat Templates)
|
||||
|
||||
由于每个指令模型使用不同的对话格式和特殊令牌,聊天模板的实现确保我们正确格式化提示,使其符合每个模型的期望。
|
||||
|
||||
在`transformers`中,聊天模板包含[Jinja2代码](https://jinja.palletsprojects.com/en/stable/),描述如何将ChatML消息列表(如上面示例所示)转换为模型可以理解的系统级指令、用户消息和助手响应的文本表示。
|
||||
|
||||
这种结构**有助于保持交互的一致性,并确保模型对不同类型的输入做出适当响应**。
|
||||
|
||||
以下是`SmolLM2-135M-Instruct`聊天模板的简化版本:
|
||||
|
||||
```jinja2
|
||||
{% for message in messages %}
|
||||
{% if loop.first and messages[0]['role'] != 'system' %}
|
||||
<|im_start|>system
|
||||
You are a helpful AI assistant named SmolLM, trained by Hugging Face
|
||||
<|im_end|>
|
||||
{% endif %}
|
||||
<|im_start|>{{ message['role'] }}
|
||||
{{ message['content'] }}<|im_end|>
|
||||
{% endfor %}
|
||||
```
|
||||
|
||||
如你所见,chat_template 描述了消息列表将如何被格式化。
|
||||
|
||||
给定这些消息:
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "system", "content": "You are a helpful assistant focused on technical topics."},
|
||||
{"role": "user", "content": "Can you explain what a chat template is?"},
|
||||
{"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
|
||||
{"role": "user", "content": "How do I use it ?"},
|
||||
]
|
||||
```
|
||||
|
||||
前面的聊天模板将产生以下字符串:
|
||||
|
||||
```sh
|
||||
<|im_start|>system
|
||||
You are a helpful assistant focused on technical topics.<|im_end|>
|
||||
<|im_start|>user
|
||||
Can you explain what a chat template is?<|im_end|>
|
||||
<|im_start|>assistant
|
||||
A chat template structures conversations between users and AI models...<|im_end|>
|
||||
<|im_start|>user
|
||||
How do I use it ?<|im_end|>
|
||||
```
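The simplified template above can be mimicked in a few lines of plain Python. This is only a sketch of the same logic: `render_chatml` and its `add_generation_prompt` parameter are hypothetical names chosen for illustration, not the `transformers` implementation.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"


def render_chatml(messages, add_generation_prompt=False):
    """Mimic the simplified template above in plain Python (illustrative only)."""
    parts = []
    # Like the template, inject a default system message when none is given.
    if messages and messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM_PROMPT}<|im_end|>\n")
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]

chatml_prompt = render_chatml(messages)
print(chatml_prompt)
```

Passing `add_generation_prompt=True` appends an open `<|im_start|>assistant` turn, which is what the model needs when it is expected to produce the next reply.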
|
||||
|
||||
`transformers`库会将聊天模板作为标记化过程的一部分为你处理。在<a href="https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates" target="_blank">这里</a>阅读更多关于 transformers 如何使用聊天模板的信息。我们要做的就是以正确的方式构建我们的消息,标记器将处理剩下的事情。
|
||||
|
||||
你可以使用以下Space实验,看看同样的对话如何使用不同模型的相应聊天模板进行格式化:
|
||||
|
||||
<iframe
|
||||
src="https://jofthomas-chat-template-viewer.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
### 消息到提示的转换 (Messages to prompt)
|
||||
|
||||
确保你的 LLM 正确接收格式化对话的最简单方法是使用模型标记器的`chat_template`。
|
||||
|
||||
```python
|
||||
messages = [
|
||||
{"role": "system", "content": "You are an AI assistant with access to various tools."},
|
||||
{"role": "user", "content": "Hi !"},
|
||||
{"role": "assistant", "content": "Hi human, what can I help you with?"},
|
||||
]
|
||||
```
|
||||
|
||||
要将前面的对话转换为提示,我们加载标记器并调用`apply_chat_template`:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
|
||||
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
||||
```
|
||||
|
||||
这个函数返回的`rendered_prompt`现在可以作为你选择的模型的输入使用了!
|
||||
|
||||
> 当你以 ChatML 格式与消息交互时,这个`apply_chat_template()`函数将在你的API后端使用。
|
||||
|
||||
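Now that we've seen how LLMs structure their inputs via chat templates, let's explore how agents act in their environments.

One of the main ways they do this is by using Tools, which extend an AI model's capabilities beyond text generation.

We'll discuss messages again in upcoming units, but if you want a deeper dive now, check out: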
|
||||
|
||||
- <a href="https://huggingface.co/docs/transformers/main/en/chat_templating" target="_blank">Hugging Face 聊天模板指南</a>
|
||||
- <a href="https://huggingface.co/docs/transformers" target="_blank">Transformers 文档</a>
|
||||
44
units/zh-CN/unit1/observations.mdx
Normal file
@@ -0,0 +1,44 @@
|
||||
# Observe: 整合反馈以反思和调整
|
||||
|
||||
Observations(观察)是**智能体感知其行动结果的方式**。
|
||||
|
||||
它们提供关键信息,为智能体的思考过程提供燃料并指导未来行动。
|
||||
|
||||
这些是**来自环境的信号**——无论是 API 返回的数据、错误信息还是系统日志——它们指导着下一轮的思考循环。
|
||||
|
||||
在观察阶段,智能体会:
|
||||
|
||||
- **收集反馈**:接收数据或确认其行动是否成功
|
||||
- **附加结果**:将新信息整合到现有上下文中,有效更新记忆
|
||||
- **调整策略**:使用更新后的上下文来优化后续思考和行动
|
||||
|
||||
例如,当天气 API 返回数据*"partly cloudy, 15°C, 60% humidity"*(局部多云,15°C,60% 湿度)时,该观察结果会被附加到智能体的记忆(位于提示末尾)。
|
||||
|
||||
智能体随后利用这些信息决定是否需要额外数据,或是否准备好提供最终答案。
|
||||
|
||||
这种**迭代式反馈整合确保智能体始终保持与目标的动态对齐**,根据现实结果不断学习和调整。
|
||||
|
||||
这些观察**可能呈现多种形式**,从读取网页文本到监测机械臂位置。这可以视为工具"日志",为行动执行提供文本反馈。
|
||||
|
||||
| 观察类型 | 示例 |
|
||||
|---------------------|---------------------------------------------------------------------------|
|
||||
| 系统反馈 | 错误信息、成功通知、状态码 |
|
||||
| 数据变更 | 数据库更新、文件系统修改、状态变更 |
|
||||
| 环境数据 | 传感器读数、系统指标、资源使用情况 |
|
||||
| 响应分析 | API 响应、查询结果、计算输出 |
|
||||
| 基于时间的事件 | 截止时间到达、定时任务完成 |
|
||||
|
||||
## 结果如何被附加?
|
||||
|
||||
执行操作后,框架按以下步骤处理:
|
||||
|
||||
1. **解析操作** 以识别要调用的函数和使用的参数
|
||||
2. **执行操作**
|
||||
3. **将结果附加** 作为 **Observation**
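
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `get_weather` is a hypothetical stand-in for a real API call, and `run_action` and the `TOOLS` registry are names invented for this example, not part of any framework.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical stand-in for a real weather API call."""
    return f"partly cloudy, 15°C, 60% humidity in {location}"


# Registry mapping tool names to callables (an assumption of this sketch).
TOOLS = {"get_weather": get_weather}


def run_action(action_json: str, prompt: str) -> str:
    # 1. Parse the action to find the tool name and its arguments.
    action = json.loads(action_json)
    tool = TOOLS[action["action"]]
    # 2. Execute the action.
    result = tool(**action["action_input"])
    # 3. Append the result to the end of the prompt as an Observation.
    return f"{prompt}\nObservation: {result}\n"


prompt = "Thought: I need the current weather in London.\nAction: ..."
action_json = '{"action": "get_weather", "action_input": {"location": "London"}}'
updated_prompt = run_action(action_json, prompt)
print(updated_prompt)
```

The updated prompt is what the model reads on its next Thought step, which is how the observation ends up steering the following action.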
|
||||
|
||||
---
|
||||
至此我们已经学习了智能体的思考-行动-观察循环。
|
||||
|
||||
如果某些概念仍显模糊,不必担心——我们将在后续单元中重访并深化这些概念。
|
||||
|
||||
现在,是时候通过编写你的第一个智能体来实践所学知识了!
|
||||
169
units/zh-CN/unit1/quiz1.mdx
Normal file
@@ -0,0 +1,169 @@
|
||||
# 小测验(不计分)[[quiz1]]
|
||||
|
||||
至此您已理解智能体的整体概念,包括其定义和工作原理。现在进行一个简短测验,因为**自我测试**是最佳学习方式,[可避免能力错觉](https://www.coursera.org/lecture/learning-how-to-learn/illusions-of-competence-BuFzf)。这将帮助您发现**需要加强的知识领域**。
|
||||
|
||||
本测验为可选项目,不计入评分。
|
||||
|
||||
### 问题1:什么是智能体?
|
||||
以下哪项最能描述AI智能体?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "仅处理静态文本且永不与环境交互的系统",
|
||||
explain: "智能体必须具备执行行动并与环境交互的能力",
|
||||
},
|
||||
{
|
||||
text: "能够推理、规划并使用工具与环境交互以实现特定目标的人工智能模型",
|
||||
explain: "该定义完整涵盖了智能体的本质特征",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "仅回答问题但无行动能力的聊天机器人",
|
||||
explain: "此类聊天机器人缺乏执行行动的能力,因此不同于智能体",
|
||||
},
|
||||
{
|
||||
text: "提供信息但无法执行任务的数字百科全书",
|
||||
explain: "智能体需主动与环境交互,而非仅提供静态信息"
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题2:规划在智能体中的作用是什么?
|
||||
为什么智能体在执行行动前需要进行规划?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "用于记忆过往交互记录",
|
||||
explain: "规划关注的是确定未来行动,而非存储历史交互",
|
||||
},
|
||||
{
|
||||
text: "确定行动序列并选择满足用户请求的适用工具",
|
||||
explain: "规划帮助智能体确定完成任务的最佳步骤及工具组合",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "生成无目的性的随机行动",
|
||||
explain: "规划确保智能体行动具有目的性,避免随机性",
|
||||
},
|
||||
{
|
||||
text: "执行无需额外推理的文本翻译",
|
||||
explain: "规划涉及构建行动框架,而非简单的文本转换",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题3:工具如何增强智能体的能力?
|
||||
为什么工具对智能体至关重要?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "工具是冗余组件,不会影响智能体性能",
|
||||
explain: "工具通过支持超越文本生成的操作来扩展智能体能力",
|
||||
},
|
||||
{
|
||||
text: "工具赋予智能体执行文本生成模型原生无法实现操作的能力,例如冲咖啡或生成图像",
|
||||
explain: "工具使智能体能够与现实世界交互并完成任务",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "工具仅用于存储记忆",
|
||||
explain: "工具主要用于执行操作,而非单纯存储数据",
|
||||
},
|
||||
{
|
||||
text: "工具将智能体限制为仅能进行文本响应",
|
||||
explain: "相反,工具可帮助智能体突破纯文本响应的限制",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题4:行动与工具有何区别?
|
||||
行动和工具之间的关键差异是什么?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "行动是智能体采取的步骤,工具是智能体用于执行这些行动的外部资源",
|
||||
explain: "行动是高层级目标,工具是智能体可调用的具体功能",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "行动和工具是同一概念且可互换使用",
|
||||
explain: "行动是目标或任务,工具是智能体用于实现它们的具体工具",
|
||||
},
|
||||
{
|
||||
text: "工具是通用性的,而行动仅用于物理交互",
|
||||
explain: "行动可同时包含数字和物理任务",
|
||||
},
|
||||
{
|
||||
text: "行动需要大语言模型(LLMs)而工具不需要",
|
||||
explain: "虽然 LLMs 帮助决定行动,但行动本身并不依赖 LLMs"
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题5:大语言模型(LLMs)在智能体中扮演什么角色?
|
||||
LLMs 如何支持智能体的功能实现?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "作为静态数据库仅存储信息而不处理输入的大语言模型",
|
||||
explain: "大语言模型需主动处理文本输入并生成响应,而非仅作信息存储",
|
||||
},
|
||||
{
|
||||
text: "作为智能体推理'大脑',通过处理文本输入理解指令并规划行动",
|
||||
explain: "大语言模型赋予智能体解释需求、制定计划并决策后续步骤的能力",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "仅用于图像处理而非文本处理的大语言模型",
|
||||
explain: "大语言模型主要处理文本,但也可支持多模态输入交互",
|
||||
},
|
||||
{
|
||||
text: "未被使用的大语言模型",
|
||||
explain: "大语言模型是现代 AI 智能体的核心组件",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题6:以下哪个例子最能体现 AI 智能体?
|
||||
哪个现实场景最能展示工作中的 AI 智能体?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "网站上的静态常见问题页面",
|
||||
explain: "静态 FAQ 页面无法动态与用户交互或执行操作",
|
||||
},
|
||||
{
|
||||
text: "类似 Siri 或 Alexa 的虚拟助手,能够理解语音指令、进行推理并执行设置提醒或发送消息等任务",
|
||||
explain: "该示例包含推理、规划及与环境交互的完整智能体特征",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "仅执行算术运算的基础计算器",
|
||||
explain: "计算器遵循固定规则,缺乏推理和规划能力,因此不属于智能体",
|
||||
},
|
||||
{
|
||||
text: "遵循预设响应模式的游戏 NPC 角色",
|
||||
explain: "除非 NPC 具备推理、规划和工具使用能力,否则不能视为AI智能体"
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
恭喜完成测验 🥳!如果存在理解偏差,建议重读本章以巩固知识。若顺利通过,您已准备好深入探索"智能体大脑": LLMs。
|
||||
119
units/zh-CN/unit1/quiz2.mdx
Normal file
@@ -0,0 +1,119 @@
|
||||
# 快速自测(不计分)[[quiz2]]
|
||||
|
||||
|
||||
什么?!还有测验?我们理解,我们理解... 😅 但这个简短的不计分测验旨在**帮助您巩固刚学习的关键概念**。
|
||||
|
||||
本测验涵盖大型语言模型(LLMs)、消息系统和工具——这些是理解和构建 AI 智能体的核心组件。
|
||||
|
||||
### 问题1:以下哪项最能描述 AI 工具?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "仅生成文本响应的流程",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "允许智能体执行特定任务并与外部环境交互的可执行流程或外部 API",
|
||||
explain: "工具是可执行函数,智能体可用其执行特定任务并与外部环境交互",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "存储智能体对话的功能",
|
||||
explain: ""
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题2: AI 智能体如何将工具作为"行动"在环境中使用?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "通过被动等待用户指令",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "仅使用预编程响应",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "在适当时要求 LLM 生成工具调用代码并代表模型运行工具",
|
||||
explain: "智能体可调用工具,并根据获得的信息进行规划与重新规划",
|
||||
correct: true
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题3:什么是大语言模型(LLM)?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "使用预定义答案进行回复的简单聊天机器人",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "通过大量文本训练的深度学习模型,能理解并生成类人语言",
|
||||
explain: "",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "遵循严格预定义命令的基于规则 AI",
|
||||
explain: ""
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题4:以下哪项最能描述特殊标记(special tokens)在 LLMs 中的作用?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "存储在模型词汇表中用于提升文本生成质量的附加词汇",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "用于实现特定功能,例如标记序列结束(EOS)或区分聊天模型中的不同消息角色",
|
||||
explain: "",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "随机插入用于提高响应多样性的标记",
|
||||
explain: ""
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### 问题5: AI 聊天模型如何处理用户消息的内部流程?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "直接将消息作为结构化命令解析且不做转换",
|
||||
explain: "",
|
||||
},
|
||||
{
|
||||
text: "通过将系统消息、用户消息和助手消息拼接成格式化提示",
|
||||
explain: "",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "根据历史对话随机生成响应",
|
||||
explain: ""
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
|
||||
都明白了吗?很好!现在让我们**深入完整的智能体流程,开始构建你的第一个 AI 智能体!**
|
||||
56
units/zh-CN/unit1/thoughts.mdx
Normal file
@@ -0,0 +1,56 @@
|
||||
# 思维机制:内部推理与 ReAct 方法
|
||||
|
||||
<Tip>
|
||||
本节将深入探讨 AI 智能体的内部运作机制——其推理与规划能力。我们将解析智能体如何通过内部对话分析信息,将复杂问题分解为可管理的步骤,并决策下一步行动。同时介绍 ReAct 方法,是一种鼓励模型在行动前"逐步思考"的提示技术。
|
||||
</Tip>
|
||||
|
||||
思维(Thought)代表着智能体**解决任务的内部推理与规划过程**。
|
||||
|
||||
这利用了智能体的大型语言模型 (LLM) 能力**来分析其 prompt 中的信息**。
|
||||
|
||||
可将其视为智能体的内部对话,在此过程中它会考量当前任务并制定应对策略。
|
||||
|
||||
智能体的思维负责获取当前观察结果,并决定下一步应采取的行动。
|
||||
|
||||
通过这一过程,智能体能够**将复杂问题分解为更小、更易管理的步骤**,反思过往经验,并根据新信息持续调整计划。
|
||||
|
||||
以下是常见思维模式的示例:
|
||||
|
||||
| 思维类型 | 示例 |
|
||||
|------------------|---------------------------------------------------------------------|
|
||||
| Planning(规划) | "I need to break this task into three steps: 1) gather data, 2) analyze trends, 3) generate report"("我需要将任务分解为三步:1)收集数据 2)分析趋势 3)生成报告") |
|
||||
| Analysis(分析) | "Based on the error message, the issue appears to be with the database connection parameters"("根据错误信息,问题似乎出在数据库连接参数") |
|
||||
| Decision Making(决策) | "Given the user's budget constraints, I should recommend the mid-tier option"("考虑到用户的预算限制,应推荐中端选项") |
|
||||
| Problem Solving(问题解决) | "To optimize this code, I should first profile it to identify bottlenecks"("优化此代码需先进行性能分析定位瓶颈") |
|
||||
| Memory Integration(记忆整合) | "The user mentioned their preference for Python earlier, so I'll provide examples in Python"("用户先前提到偏好 Python,因此我将提供 Python 示例") |
|
||||
| Self-Reflection(自我反思) | "My last approach didn't work well, I should try a different strategy"("上次方法效果不佳,应尝试不同策略") |
|
||||
| Goal Setting(目标设定) | "To complete this task, I need to first establish the acceptance criteria"("完成此任务需先确定验收标准") |
|
||||
| Prioritization(优先级排序) | "The security vulnerability should be addressed before adding new features"("在添加新功能前应先修复安全漏洞") |
|
||||
|
||||
> **注意:** 对于专为 function-calling 微调的 LLMs,思维过程是可选的。
|
||||
> *若您不熟悉 function-calling 概念,后续"行动"章节将提供详细说明。*
|
||||
|
||||
## ReAct 方法
|
||||
|
||||
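A key method is the **ReAct approach**, which combines "Reasoning" (Think) with "Acting" (Act).

A simple prompting technique in this spirit is to append "Let's think step by step" before letting the LLM decode the next tokens. Strictly speaking, that phrase comes from zero-shot chain-of-thought prompting; ReAct builds on the same idea and interleaves the reasoning with actions and observations.

Prompting the model to think "step by step" steers decoding toward tokens that **produce a plan** rather than a final solution, because the model is encouraged to **decompose** the problem into *sub-tasks*.

This lets the model consider each sub-step in more detail, which generally leads to fewer errors than trying to produce the final solution directly.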
|
||||
|
||||
<figure>
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/ReAct.png" alt="ReAct"/>
|
||||
<figcaption>图 (d) 展示了 ReAct 方法示例,我们通过"Let's think step by step"提示模型
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
<Tip>
|
||||
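Reasoning strategies have attracted a lot of attention recently, as shown by models such as DeepSeek R1 and OpenAI's o1. These models have been fine-tuned to "think before answering".

They are trained to always include a dedicated _thinking_ section delimited by special tokens (for example, `<think>` and `</think>` in DeepSeek R1). This is not just a prompting technique like ReAct, but a training method in which the model learns to generate these sections after analyzing thousands of examples that show what we expect it to do.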
|
||||
</Tip>
|
||||
|
||||
---
|
||||
现在我们已经深入理解了思维过程,接下来将更深入探讨流程的第二部分:行动。
|
||||
299
units/zh-CN/unit1/tools.mdx
Normal file
@@ -0,0 +1,299 @@
|
||||
# 什么是工具?
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-2.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
AI 智能体的关键能力在于执行**行动**。正如前文所述,这通过**工具**的使用实现。
|
||||
|
||||
本节将学习工具的定义、有效设计方法,以及如何通过系统消息将其集成到智能体中。
|
||||
|
||||
通过为智能体配备合适的工具——并清晰描述这些工具的工作原理——可显著提升 AI 的能力边界。让我们深入探讨!
|
||||
|
||||
## AI 工具的定义
|
||||
|
||||
**工具是赋予 LLM 的函数**,该函数应实现**明确的目标**。
|
||||
|
||||
以下是 AI 智能体中常用的工具示例:
|
||||
|
||||
| 工具类型 | 描述 |
|
||||
|------------------|---------------------------------------------------------------|
|
||||
| 网络搜索 | 允许智能体从互联网获取最新信息 |
|
||||
| 图像生成 | 根据文本描述生成图像 |
|
||||
| 信息检索 | 从外部源检索信息 |
|
||||
| API 接口 | 与外部 API 交互(GitHub、YouTube、Spotify 等) |
|
||||
|
||||
以上仅为示例,实际可为任何用例创建工具!
|
||||
|
||||
优秀工具应能**补充 LLM 的核心能力**。
|
||||
|
||||
例如,若需执行算术运算,为 LLM 提供**计算器工具**将比依赖模型原生能力获得更好结果。
|
||||
|
||||
此外,**LLM 基于训练数据预测提示的补全**,意味着其内部知识仅包含训练截止前的信息。因此,若智能体需要最新数据,必须通过工具获取。
|
||||
|
||||
例如,若直接询问 LLM(无搜索工具)今日天气,LLM 可能会产生随机幻觉。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/weather.jpg" alt="Weather"/>
|
||||
|
||||
- 合格工具应包含:
|
||||
- **函数功能的文本描述**
|
||||
- *可调用对象*(执行操作的实体)
|
||||
- 带类型声明的*参数*
|
||||
- (可选)带类型声明的输出
|
||||
|
||||
## 工具如何运作?
|
||||
|
||||
正如前文所述,LLM 只能接收文本输入并生成文本输出。它们无法自行调用工具。当我们谈及_为智能体提供工具_时,实质是**教导** LLM 认识工具的存在,并要求模型在需要时生成调用工具的文本。例如,若我们提供从互联网获取某地天气的工具,当询问 LLM 巴黎天气时,LLM 将识别该问题适合使用我们教授的"天气"工具,并生成代码形式的文本来调用该工具。**智能体**负责解析 LLM 的输出,识别工具调用需求,并执行工具调用。工具的输出将返回给 LLM,由其生成最终用户响应。
|
||||
|
||||
工具调用的输出是对话中的另一种消息类型。工具调用步骤通常对用户不可见:智能体检索对话、调用工具、获取输出、将其作为新消息添加,并将更新后的对话再次发送给 LLM。从用户视角看,仿佛 LLM 直接使用了工具,但实际执行的是我们的应用代码(**智能体**)。
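
The round trip described above can be sketched with a stubbed model. Everything here is an assumption made for illustration: `fake_llm` stands in for a real model call, `get_weather` is a hypothetical tool, and the JSON call format is just one possible convention.

```python
import json


def get_weather(location: str) -> str:
    """Hypothetical tool; a real one would call a weather service."""
    return f"Sunny, 22°C in {location}"


def fake_llm(messages):
    """Stub standing in for a real model call (assumption for this sketch)."""
    if messages[-1]["role"] == "tool":
        # The tool result is now in the conversation: answer the user.
        return f"The weather is: {messages[-1]['content']}"
    # Otherwise, ask for a tool call by emitting text in an agreed format.
    return json.dumps({"tool": "get_weather", "arguments": {"location": "Paris"}})


TOOLS = {"get_weather": get_weather}


def agent_turn(messages):
    reply = fake_llm(messages)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain text: nothing to execute
    # The agent, not the LLM, actually runs the tool.
    output = TOOLS[call["tool"]](**call["arguments"])
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "tool", "content": output})
    # Send the updated conversation back to the model for the final answer.
    return fake_llm(messages)


messages = [{"role": "user", "content": "What's the weather in Paris?"}]
final_answer = agent_turn(messages)
print(final_answer)
```

Note that the model only ever produces text; the `agent_turn` function is the application code that turns that text into an actual function call.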
|
||||
|
||||
后续课程将深入探讨该流程。
|
||||
|
||||
## 如何为 LLM 提供工具?
|
||||
|
||||
完整答案可能看似复杂,但核心是通过系统提示(system prompt)向模型文本化描述可用工具:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/Agent_system_prompt.png" alt="System prompt for tools"/>
|
||||
|
||||
为确保有效性,必须精准描述:
|
||||
1. **工具功能**
|
||||
2. **预期输入格式**
|
||||
|
||||
因此工具描述通常采用结构化表达方式(如编程语言或 JSON)。虽非强制,但任何精确、连贯的格式均可。
|
||||
|
||||
若觉抽象,我们通过具体示例理解。
|
||||
|
||||
我们将实现简化的**计算器**工具,仅执行两整数相乘。Python 实现如下:
|
||||
|
||||
```python
|
||||
def calculator(a: int, b: int) -> int:
|
||||
"""Multiply two integers."""
|
||||
return a * b
|
||||
```
|
||||
|
||||
因此我们的工具名为`calculator`,其功能是**将两个整数相乘**,需要以下输入:
|
||||
|
||||
- **`a`**(*int*):整数
|
||||
- **`b`**(*int*):整数
|
||||
|
||||
工具输出为另一个整数,描述如下:
|
||||
- (*int*):`a`与`b`的乘积
|
||||
|
||||
所有这些细节都至关重要。让我们将这些信息整合成 LLM 可理解的工具描述文本:
|
||||
|
||||
```text
|
||||
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
|
||||
```
|
||||
|
||||
> **重要提示:** 此文本描述是*我们希望 LLM 了解的工具体系*。
|
||||
|
||||
当我们将上述字符串作为输入的一部分传递给 LLM 时,模型将识别其为工具,并知晓需要传递的输入参数及预期输出。
|
||||
|
||||
若需提供更多工具,必须保持格式一致性。此过程可能较为脆弱,容易遗漏某些细节。
|
||||
|
||||
是否有更好的方法?
|
||||
|
||||
### 自动化工具描述生成
|
||||
|
||||
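Our tool is implemented in Python, and the implementation already provides everything we need:

- A descriptive name of what it does: `calculator`
- A longer description, provided by the function's docstring: `Multiply two integers.`
- The inputs and their types: the function clearly expects two `int`s
- The type of the output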
|
||||
|
||||
这正是人们使用编程语言的原因:表达力强、简洁且精确。
|
||||
|
||||
虽然可以将 Python 源代码作为工具规范提供给 LLM,但具体实现方式并不重要。关键在于工具名称、功能描述、输入参数和输出类型。
|
||||
|
||||
我们将利用 Python 的**自省特性**,通过源代码自动构建工具描述。只需确保工具实现满足:
|
||||
1. 使用类型注解(Type Hints)
|
||||
2. 编写文档字符串(Docstrings)
|
||||
3. 采用合理的函数命名
|
||||
|
||||
完成这些之后,我们只需使用一个 Python 装饰器来指示`calculator`函数是一个工具:
|
||||
|
||||
```python
|
||||
@tool
|
||||
def calculator(a: int, b: int) -> int:
|
||||
"""Multiply two integers."""
|
||||
return a * b
|
||||
|
||||
print(calculator.to_string())
|
||||
```
|
||||
|
||||
注意函数定义前的`@tool`装饰器。
|
||||
|
||||
通过我们即将看到的实现,可以利用装饰器提供的`to_string()`方法从源代码自动提取以下文本:
|
||||
|
||||
```text
|
||||
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
|
||||
```
|
||||
|
||||
正如所见,这与我们之前手动编写的内容完全一致!
|
||||
|
||||
### 通用工具类实现
|
||||
|
||||
我们创建通用`Tool`类,可在需要时重复使用:
|
||||
|
||||
> **说明:** 此示例实现为虚构代码,但高度模拟了主流工具库的实际实现方式。
|
||||
|
||||
```python
class Tool:
    """
    A class representing a reusable piece of code (Tool).

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of argument.
        outputs (str or list): The return type(s) of the wrapped function.
    """
    def __init__(self,
                 name: str,
                 description: str,
                 func: callable,
                 arguments: list,
                 outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """
        Return a string representation of the tool,
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join([
            f"{arg_name}: {arg_type}" for arg_name, arg_type in self.arguments
        ])

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """
        Invoke the underlying function (callable) with provided arguments.
        """
        return self.func(*args, **kwargs)
```
|
||||
|
||||
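It may seem complicated, but going through it step by step shows how it works. We define a **`Tool`** class that includes:

- **`name`** (*str*): the name of the tool
- **`description`** (*str*): a brief description of what the tool does
- **`func`** (*callable*): the function the tool executes
- **`arguments`** (*list*): the expected input parameters
- **`outputs`** (*str* or *list*): the expected outputs of the tool
- **`__call__()`**: calls the function when the tool instance is invoked
- **`to_string()`**: converts the tool's attributes into a textual description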
|
||||
|
||||
可通过如下代码创建工具实例:
|
||||
|
||||
```python
|
||||
calculator_tool = Tool(
|
||||
"calculator", # name
|
||||
"Multiply two integers.", # description
|
||||
calculator, # function to call
|
||||
[("a", "int"), ("b", "int")], # inputs (names and types)
|
||||
"int", # output
|
||||
)
|
||||
```
|
||||
|
||||
但我们可以利用 Python 的`inspect`模块自动提取这些信息!这正是`@tool`装饰器的实现原理。
|
||||
|
||||
> 若感兴趣,可展开以下内容查看装饰器具体实现:
|
||||
|
||||
<details>
|
||||
<summary> decorator code</summary>
|
||||
|
||||
```python
import inspect


def tool(func):
    """
    A decorator that creates a Tool instance from the given function.
    """
    # Get the function signature
    signature = inspect.signature(func)

    # Extract (param_name, param_annotation) pairs for inputs
    arguments = []
    for param in signature.parameters.values():
        annotation_name = (
            param.annotation.__name__
            if hasattr(param.annotation, '__name__')
            else str(param.annotation)
        )
        arguments.append((param.name, annotation_name))

    # Determine the return annotation
    return_annotation = signature.return_annotation
    if return_annotation is inspect.Signature.empty:
        outputs = "No return annotation"
    else:
        outputs = (
            return_annotation.__name__
            if hasattr(return_annotation, '__name__')
            else str(return_annotation)
        )

    # Use the function's docstring as the description (default if None)
    description = func.__doc__ or "No description provided."

    # The function name becomes the Tool name
    name = func.__name__

    # Return a new Tool instance
    return Tool(
        name=name,
        description=description,
        func=func,
        arguments=arguments,
        outputs=outputs
    )
```
|
||||
|
||||
</details>
|
||||
|
||||
简而言之,在应用此装饰器后,我们可以按如下方式实现工具:
|
||||
|
||||
```python
|
||||
@tool
|
||||
def calculator(a: int, b: int) -> int:
|
||||
"""Multiply two integers."""
|
||||
return a * b
|
||||
|
||||
print(calculator.to_string())
|
||||
```
|
||||
|
||||
我们可以使用`Tool`类的`to_string`方法自动生成适合LLM使用的工具描述文本:
|
||||
|
||||
```text
|
||||
Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int
|
||||
```
|
||||
|
||||
该描述将被**注入**系统提示。以本节初始示例为例,替换`tools_description`后的系统提示如下:
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/Agent_system_prompt_tools.png" alt="System prompt for tools"/>
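
Injecting the descriptions is ordinary string formatting. The sketch below assumes a template with a `{tools_description}` placeholder; the template wording and the second tool (`get_weather`) are hypothetical and only serve to show that several descriptions can share one format.

```python
# Hypothetical system prompt template with a placeholder for the tool list.
SYSTEM_PROMPT_TEMPLATE = """Answer the following questions as best you can.
You have access to the following tools:

{tools_description}
"""

# Descriptions in the format produced by `to_string()`; the second tool is hypothetical.
tool_descriptions = [
    "Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int",
    "Tool Name: get_weather, Description: Get the current weather in a given location., Arguments: location: str, Outputs: str",
]

# One description per line keeps the list easy for the model to scan.
system_prompt = SYSTEM_PROMPT_TEMPLATE.format(
    tools_description="\n".join(tool_descriptions)
)
print(system_prompt)
```

Generating every description from the same method keeps the format consistent, which addresses the fragility of writing the descriptions by hand.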
|
||||
|
||||
在[Actions](actions)章节,我们将深入探讨智能体如何**调用**刚创建的这个工具。
|
||||
|
||||
---
|
||||
|
||||
工具在增强AI智能体能力方面至关重要。
|
||||
|
||||
总结本节要点:
|
||||
|
||||
- *工具定义*:通过提供清晰的文本描述、输入参数、输出结果及可调用函数
|
||||
|
||||
- *工具本质*:赋予LLM额外能力的函数(如执行计算或访问外部数据)
|
||||
|
||||
- *工具必要性*:帮助智能体突破静态模型训练的局限,处理实时任务并执行专业操作
|
||||
|
||||
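Now we can move on to the [Agent Workflow](agent-steps-and-structure) section, where you'll see how an agent observes, thinks, and acts. This **brings together everything we've covered so far** and sets the stage for creating your own fully functional AI agent.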
|
||||
|
||||
但在此之前,让我们先完成另一个简短测验!
|
||||
227
units/zh-CN/unit1/tutorial.mdx
Normal file
@@ -0,0 +1,227 @@
|
||||
# 使用 smolagents 创建我们的第一个智能体
|
||||
|
||||
在上一节中,我们学习了如何使用 Python 代码从头开始创建智能体,并且我们**看到了这个过程是多么繁琐**。幸运的是,许多智能体库通过**为你处理大量繁重的工作**来简化这项工作。
|
||||
|
||||
在本教程中,**你将创建你的第一个智能体**,它能够执行图像生成、网络搜索、时区检查等更多操作!
|
||||
|
||||
你还将把你的智能体**发布到 Hugging Face Space 上,以便与朋友和同事分享**。
|
||||
|
||||
让我们开始吧!
|
||||
|
||||
|
||||
## 什么是 smolagents?
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/smolagents.png" alt="smolagents"/>
|
||||
|
||||
为了创建这个智能体,我们将使用 `smolagents`,这是一个**提供轻松开发智能体框架的库**。
|
||||
|
||||
这个轻量级库设计简洁,但它抽象了构建智能体的许多复杂性,使你能够专注于设计智能体的行为。
|
||||
|
||||
我们将在下一个单元中深入了解 smolagents。同时,你也可以查看这篇<a href="https://huggingface.co/blog/smolagents" target="_blank">博客文章</a>或该库的<a href="https://github.com/huggingface/smolagents" target="_blank">GitHub 仓库</a>。
|
||||
|
||||
简而言之,`smolagents` 是一个专注于 **codeAgent** 的库,codeAgent 是一种通过代码块执行**“操作”**,然后通过执行代码**“观察”**结果的智能体。
|
||||
|
||||
以下是我们将构建的一个示例!
|
||||
|
||||
我们为我们的智能体提供了一个**图像生成工具**,并要求它生成一张猫的图片。
|
||||
|
||||
`smolagents` 中的智能体将具有**与我们之前构建的自定义智能体相同的行为**:它将**以循环的方式思考、行动和观察**,直到得出最终答案:
|
||||
|
||||
[智能体流程](https://www.youtube.com/watch?v=PQDKcWiuln4)
|
||||
|
||||
很令人兴奋,对吧?
|
||||
|
||||
## 让我们来构建我们的智能体!
|
||||
|
||||
首先,复制这个 Space:<a href="https://huggingface.co/spaces/agents-course/First_agent_template" target="_blank">https://huggingface.co/spaces/agents-course/First_agent_template</a>
|
||||
> 感谢 <a href="https://huggingface.co/m-ric" target="_blank">Aymeric</a> 提供的这个模板!🙌
|
||||
|
||||
复制这个 Space 意味着**在你的个人资料中创建一个本地副本**:
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/duplicate-space.gif" alt="复制"/>
|
||||
|
||||
在整个课程中,你唯一需要修改的文件是当前不完整的**"app.py"**。你可以在这里查看[模板中的原始文件](https://huggingface.co/spaces/agents-course/First_agent_template/blob/main/app.py)。要找到你的文件,请进入你复制的 Space,然后点击 `Files` 选项卡,再在目录列表中点击 `app.py`。
|
||||
|
||||
让我们一起分解代码:
|
||||
|
||||
- 文件开头是一些简单但必要的库导入
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
|
||||
import datetime
|
||||
import requests
|
||||
import pytz
|
||||
import yaml
|
||||
from tools.final_answer import FinalAnswerTool
|
||||
```
|
||||
|
||||
正如之前所述,我们将直接使用 **smolagents** 中的 **CodeAgent** 类。
|
||||
|
||||
|
||||
### Tool(工具)
|
||||
|
||||
现在让我们来了解一下工具!如果你需要回顾一下工具的相关内容,请随时回到课程的[工具](tools)部分。
|
||||
|
||||
```python
@tool
def my_custom_tool(arg1: str, arg2: int) -> str:  # it's important to specify the return type
    # Keep this format for the tool description / args description but feel free to modify the tool
    """A tool that does nothing yet
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build ?"


@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"
```
|
||||
|
||||
|
||||
这些工具是我们在这个部分鼓励你构建的东西!我们给你两个例子:
|
||||
|
||||
1. A **non-working dummy tool** that you can modify to make something useful.
2. A **working tool** that gets the current time somewhere in the world.
|
||||
|
||||
要定义你的工具,重要的是:
|
||||
|
||||
1. 为你的函数提供输入和输出类型,例如 `get_current_time_in_timezone(timezone: str) -> str:`
|
||||
2. **格式良好的文档字符串**。`smolagents` 期望所有参数在文档字符串中都有**文字描述**。
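To see why the type hints and the `Args:` section both matter, here is a minimal sketch of how a framework could turn a function into a tool description. It uses only the standard library. `describe_tool` and `get_weather` are hypothetical names invented for this illustration, and this is not how `smolagents` implements `@tool` internally.

```python
import inspect
from typing import get_type_hints


def describe_tool(func):
    """Build a plain-dict description of a function from its type hints and docstring."""
    hints = get_type_hints(func)
    doc = inspect.getdoc(func) or ""
    summary, _, args_section = doc.partition("Args:")

    # Collect "name: description" pairs from the Args section.
    arg_docs = {}
    for line in args_section.splitlines():
        name, sep, text = line.strip().partition(":")
        if sep and name in hints:
            arg_docs[name] = text.strip()

    params = list(inspect.signature(func).parameters)
    missing = [p for p in params if p not in hints or p not in arg_docs]
    if missing or "return" not in hints:
        raise ValueError(f"missing type hint or description for: {missing or 'return value'}")

    return {
        "name": func.__name__,
        "description": summary.strip(),
        "inputs": {p: {"type": hints[p].__name__, "description": arg_docs[p]} for p in params},
        "output_type": hints["return"].__name__,
    }


def get_weather(city: str) -> str:
    """Return a canned weather report for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"The weather in {city} is sunny."


spec = describe_tool(get_weather)
print(spec)
```

If a type hint or an argument description is missing, the sketch raises an error instead of guessing. That is the reason the template asks you to keep both in place: the description is what the model reads when it decides how to call the tool.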
|
||||
|
||||
### The Agent(智能体)
|
||||
|
||||
它使用 [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) 作为 LLM 引擎。这是一个非常强大的模型,我们将通过无服务器 API 访问它。
|
||||
|
||||
```python
final_answer = FinalAnswerTool()
model = HfApiModel(
    max_tokens=2096,
    temperature=0.5,
    model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
    custom_role_conversions=None,
)

with open("prompts.yaml", 'r') as stream:
    prompt_templates = yaml.safe_load(stream)

# We're creating our CodeAgent
agent = CodeAgent(
    model=model,
    tools=[final_answer],  # add your tools here (don't remove final_answer)
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates
)

GradioUI(agent).launch()
```
|
||||
|
||||
这个智能体仍在使用我们在前面部分中看到的`InferenceClient`,它位于**HfApiModel**类的背后!
|
||||
|
||||
当我们介绍 Unit 2 中的框架时,我们会给出更深入的例子。目前,你需要专注于通过智能体的`tools`参数**向工具列表中添加新工具**。
|
||||
|
||||
For example, you could use the `DuckDuckGoSearchTool` imported on the first line of the code, or the `image_generation_tool` that is loaded from the Hub later in the file.
|
||||
|
||||
**添加工具将赋予你的智能体新的能力**,在这里尝试发挥创意吧!
|
||||
|
||||
完整的"app.py":
|
||||
|
||||
```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel, load_tool, tool
import datetime
import requests
import pytz
import yaml
from tools.final_answer import FinalAnswerTool

from Gradio_UI import GradioUI


# Below is an example of a tool that does nothing. Amaze us with your creativity!
@tool
def my_custom_tool(arg1: str, arg2: int) -> str:  # it's important to specify the return type
    # Keep this format for the tool description / args description but feel free to modify the tool
    """A tool that does nothing yet
    Args:
        arg1: the first argument
        arg2: the second argument
    """
    return "What magic will you build ?"


@tool
def get_current_time_in_timezone(timezone: str) -> str:
    """A tool that fetches the current local time in a specified timezone.
    Args:
        timezone: A string representing a valid timezone (e.g., 'America/New_York').
    """
    try:
        # Create timezone object
        tz = pytz.timezone(timezone)
        # Get current time in that timezone
        local_time = datetime.datetime.now(tz).strftime("%Y-%m-%d %H:%M:%S")
        return f"The current local time in {timezone} is: {local_time}"
    except Exception as e:
        return f"Error fetching time for timezone '{timezone}': {str(e)}"


final_answer = FinalAnswerTool()
model = HfApiModel(
    max_tokens=2096,
    temperature=0.5,
    model_id='Qwen/Qwen2.5-Coder-32B-Instruct',
    custom_role_conversions=None,
)


# Import tool from Hub
image_generation_tool = load_tool("agents-course/text-to-image", trust_remote_code=True)

with open("prompts.yaml", 'r') as stream:
    prompt_templates = yaml.safe_load(stream)

agent = CodeAgent(
    model=model,
    tools=[final_answer],  # add your tools here (don't remove final_answer)
    max_steps=6,
    verbosity_level=1,
    grammar=None,
    planning_interval=None,
    name=None,
    description=None,
    prompt_templates=prompt_templates
)


GradioUI(agent).launch()
```
|
||||
|
||||
你的**目标**是熟悉 Space 和智能体。
|
||||
|
||||
目前,模板中的智能体**没有使用任何工具,所以尝试为它提供一些预制的工具,甚至自己动手制作一些新工具!**
|
||||
|
||||
我们非常期待在 Discord 频道 **#agents-course-showcase** 中看到你的精彩智能体成果!
|
||||
|
||||
|
||||
---
|
||||
恭喜你,你已经构建了你的第一个智能体!不要犹豫,与你的朋友和同事分享吧。
|
||||
|
||||
由于这是你的第一次尝试,如果有点小问题或速度有点慢,这是完全正常的。在未来的单元中,我们将学习如何构建更好的智能体。
|
||||
|
||||
最好的学习方法是尝试,所以不要犹豫,去更新它,添加更多工具,尝试使用另一个模型,等等。
|
||||
|
||||
在下一节中,你将完成最后的测验并获得证书!
|
||||
|
||||
145
units/zh-CN/unit1/what-are-agents.mdx
Normal file
@@ -0,0 +1,145 @@
|
||||
# 什么是智能体?
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-no-check.jpg" alt="第一单元规划"/>
|
||||
|
||||
在本节结束时,你将对智能体的概念及其在人工智能中的各种应用感到熟悉。
|
||||
|
||||
为了解释什么是智能体,我们先从一个类比开始。
|
||||
|
||||
## 整体概览:智能体 Alfred
|
||||
|
||||
来见见 Alfred。Alfred 是一个**智能体**。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/this-is-alfred.jpg" alt="这是 Alfred"/>
|
||||
|
||||
想象 Alfred **收到一个指令**,比如:“Alfred,我想来杯咖啡。”
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/coffee-please.jpg" alt="我想来杯咖啡"/>
|
||||
|
||||
因为 Alfred **理解自然语言**,他很快就明白了我们的请求。
|
||||
|
||||
在完成任务之前,Alfred 会进行**推理和规划**,弄清楚他需要的步骤和工具:
|
||||
|
||||
1. 去厨房
|
||||
2. 使用咖啡机
|
||||
3. 煮咖啡
|
||||
4. 把咖啡拿回来
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/reason-and-plan.jpg" alt="推理和规划"/>
|
||||
|
||||
一旦有了计划,他就**必须行动**。为了执行计划,**他可以使用他所知道的工具列表中的工具**。
|
||||
|
||||
在这个例子中,为了煮咖啡,他使用了咖啡机。他启动咖啡机来煮咖啡。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/make-coffee.jpg" alt="煮咖啡"/>
|
||||
|
||||
最后,Alfred 把刚煮好的咖啡拿给我们。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/bring-coffee.jpg" alt="拿咖啡"/>
|
||||
|
||||
这就是智能体:一个**能够进行推理、规划和与环境交互的人工智能模型**。
|
||||
|
||||
我们称之为智能体,因为它具有**能动性**,即与环境交互的能力。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/process.jpg" alt="智能体过程"/>
|
||||
|
||||
## 更正式的定义
|
||||
|
||||
现在你已经了解了整体情况,以下是一个更精确的定义:
|
||||
|
||||
> 智能体是一个系统,它利用人工智能模型与环境交互,以实现用户定义的目标。它结合推理、规划和动作执行(通常通过外部工具)来完成任务。
|
||||
|
||||
可以把智能体想象成有两个主要部分:
|
||||
|
||||
1. **大脑(AI 模型)**
|
||||
|
||||
这是所有思考发生的地方。AI 模型**负责推理和规划**。它根据情况决定**采取哪些行动**。
|
||||
|
||||
2. **身体(能力和工具)**
|
||||
|
||||
这部分代表了**智能体所能做的一切**。
|
||||
|
||||
**可能行动的范围**取决于智能体**被配备了什么**。例如,因为人类没有翅膀,所以他们不能执行“飞”这个**行动**,但他们可以执行“走”、“跑”、“跳”、“抓”等**行动**。
|
||||
|
||||
## 我们为智能体使用什么类型的 AI 模型?
|
||||
|
||||
智能体中最常见的 AI 模型是 LLM(大型语言模型),它接受**文本**作为输入,并输出**文本**。
|
||||
|
||||
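Well-known examples include **GPT4** from **OpenAI**, **Llama** from **Meta**, **Gemini** from **Google**, and others. These models have been trained on large amounts of text and generalize well. We will take a closer look at LLMs in the [next section](what-are-llms).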
|
||||
|
||||
<Tip>
|
||||
也可以使用接受其他输入作为智能体核心模型的模型。例如,视觉语言模型(VLM),它就像 LLM 一样,但也能理解图像作为输入。我们现在将重点关注 LLM,稍后再讨论其他选项。
|
||||
</Tip>
|
||||
|
||||
## AI 如何在环境中采取行动?
|
||||
|
||||
LLM 是令人惊叹的模型,但**它们只能生成文本**。
|
||||
|
||||
然而,如果你让像 HuggingChat 或 ChatGPT 这样的知名聊天应用程序生成图像,它们却可以做到!这是怎么可能的?
|
||||
|
||||
答案是,HuggingChat、ChatGPT 和类似应用程序的开发者实现了额外的功能(称为**工具**),LLM 可以利用这些工具来创建图像。
|
||||
|
||||
<figure>
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/eiffel_brocolis.jpg" alt="埃菲尔铁塔与西兰花"/>
|
||||
<figcaption>模型使用图像生成工具生成了这张图像。
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
我们将在[工具](tools)一节中更深入地了解工具。
|
||||
|
||||
## 智能体可以执行什么类型的任务?
|
||||
|
||||
An agent can carry out any task we implement as a **tool**, and it uses those tools to complete **actions**.
|
||||
|
||||
例如,如果我编写一个智能体作为我电脑上的个人助理(像 Siri 一样),并且我让它“给我的经理发一封邮件,要求推迟今天的会议”,我可以给它一些发送邮件的代码。这将是一个新的工具,智能体在需要发送邮件时可以随时使用。我们可以用 Python 编写它:
|
||||
|
||||
```python
def send_message_to(recipient, message):
    """Useful to send an e-mail message to a recipient"""
    ...
```
|
||||
|
||||
As we will see, the LLM generates the code that calls the tool whenever it is needed, and thereby completes the requested task.
|
||||
|
||||
```python
|
||||
send_message_to("Manager", "Can we postpone today's meeting?")
|
||||
```
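The model only produces the text of that call; something else has to run it. A minimal sketch of that step, assuming the generated text is a single call with literal arguments, could look like this. The `TOOLS` registry, `run_tool_call`, and the in-memory `sent` list are hypothetical helpers for illustration, and no real e-mail is sent.

```python
import ast

sent = []  # stands in for a real mail server


def send_message_to(recipient, message):
    """Useful to send an e-mail message to a recipient"""
    sent.append((recipient, message))
    return f"Message sent to {recipient}"


TOOLS = {"send_message_to": send_message_to}


def run_tool_call(generated_code):
    """Run a single call such as tool("a", "b") if it targets a known tool."""
    node = ast.parse(generated_code.strip(), mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError("expected a single function call")
    if node.func.id not in TOOLS:
        raise ValueError(f"unknown tool: {node.func.id}")
    # literal_eval only accepts constants, so arguments cannot hide arbitrary code.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return TOOLS[node.func.id](*args, **kwargs)


llm_output = 'send_message_to("Manager", "Can we postpone today\'s meeting?")'
observation = run_tool_call(llm_output)
print(observation)
```

The returned string is the observation that would be passed back to the model so it can decide what to do next.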
|
||||
|
||||
**工具的设计至关重要,对智能体的质量有着深远的影响**。某些任务可能需要定制特定的工具,而其他任务则可以通过通用工具(如“网络搜索”)来解决。
|
||||
|
||||
> 请注意,**动作(Actions)与工具(Tools)是不同的概念**。例如,一个动作可能涉及使用多个工具来完成任务。
|
||||
|
||||
允许智能体与其环境进行交互**为企业和个人提供了实际应用场景**。
|
||||
|
||||
### 示例 1:个人虚拟助手
|
||||
|
||||
像 Siri、Alexa 或 Google Assistant 这样的虚拟助手,在代表用户与其数字环境交互时,充当着智能体的角色。
|
||||
|
||||
它们接收用户查询,分析上下文,从数据库中检索信息,并提供响应或启动动作(如设置提醒、发送消息或控制智能设备)。
|
||||
|
||||
### 示例 2:客户服务聊天机器人
|
||||
|
||||
许多公司部署聊天机器人作为智能体,使其能够以自然语言与客户互动。
|
||||
|
||||
这些智能体可以回答问题、引导用户完成故障排除步骤、在内部数据库中创建问题,甚至完成交易。
|
||||
|
||||
它们的预定义目标可能包括提高用户满意度、减少等待时间或提高销售转化率。通过直接与客户互动、从对话中学习并随着时间的推移调整其响应,它们展示了智能体的核心原理。
|
||||
|
||||
|
||||
### 示例 3:视频游戏中的 AI 非玩家角色(NPC)
|
||||
|
||||
基于大语言模型(LLMs)的智能体可以使非玩家角色(NPC)更具动态性和不可预测性。
|
||||
|
||||
它们不再局限于僵化的行为树,而是能够**根据上下文做出响应、适应玩家交互**,并生成更细致入微的对话。这种灵活性有助于创造更生动、更具吸引力的角色,这些角色会随着玩家的操作而发展。
|
||||
|
||||
---
|
||||
|
||||
总结而言,智能体是一个系统,它使用人工智能模型(通常是大语言模型)作为其核心推理引擎,以实现以下功能:
|
||||
|
||||
- **理解自然语言**:以有意义的方式解释和回应人类指令。
|
||||
|
||||
- **推理与规划**:分析信息、做出决策并制定解决问题的策略。
|
||||
|
||||
- **与环境交互**:收集信息、执行操作并观察这些操作的结果。
|
||||
|
||||
现在你已经对智能体有了扎实的理解,让我们通过一个简短的、不计分的测验来巩固你的知识。之后,我们将深入探讨智能体的“大脑”:[大型语言模型(LLM)](what-are-llms)。
|
||||
223
units/zh-CN/unit1/what-are-llms.mdx
Normal file
@@ -0,0 +1,223 @@
|
||||
# 什么是大语言模型(LLMs)?
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/whiteboard-check-1.jpg" alt="Unit 1 planning"/>
|
||||
|
||||
在上一节中,我们了解到每个智能体都需要一个核心的人工智能模型,而大语言模型 (LLM) 是实现这一目标最常见的 AI 模型类型。
|
||||
|
||||
现在,我们将学习什么是大语言模型,以及它们如何为智能体提供动力。
|
||||
|
||||
本节将提供一个简洁的技术解释,说明大语言模型的用途。如果你想更深入地了解相关内容,可以参考我们的 <a href="https://huggingface.co/learn/nlp-course/chapter1/1" target="_blank">免费自然语言处理课程</a>。
|
||||
|
||||
## 什么是大语言模型?
|
||||
|
||||
大语言模型 (LLM) 是一种擅长理解和生成人类语言的人工智能模型。它们通过大量文本数据的训练,能够学习语言中的模式、结构,甚至细微差别。这些模型通常包含数千万甚至更多的参数。
|
||||
|
||||
如今,大多数大语言模型都是基于 Transformer 架构构建的 —— 这是一种基于“注意力”算法的深度学习架构。自 2018 年 Google 推出 BERT 以来,这种架构引起了广泛关注。
|
||||
|
||||
<figure>
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/transformer.jpg" alt="Transformer"/>
|
||||
<figcaption>原始的 Transformer 架构如下所示,左侧是编码器(encoder),右侧是解码器(decoder)。
|
||||
</figcaption>
|
||||
</figure>
|
||||
|
||||
Transformer 有三种类型:
|
||||
|
||||
1. **编码器(Encoders)**
|
||||
基于编码器的 Transformer 接收文本(或其他数据)作为输入,并输出该文本的密集表示(或嵌入)。
|
||||
|
||||
- **示例**:Google 的 BERT
|
||||
- **用例**:文本分类、语义搜索、命名实体识别
|
||||
- **典型规模**:数百万个参数
|
||||
|
||||
2. **解码器(Decoders)**
|
||||
基于解码器的 Transformer 专注于**逐个生成新令牌以完成序列**。
|
||||
|
||||
- **示例**:Meta 的 Llama
|
||||
- **用例**:文本生成、聊天机器人、代码生成
|
||||
- **典型规模**:数十亿(按美国用法,即 10^9)个参数
|
||||
|
||||
3. **序列到序列(编码器-解码器,Seq2Seq(Encoder–Decoder))**
|
||||
序列到序列的 Transformer _结合_了编码器和解码器。编码器首先将输入序列处理成上下文表示,然后解码器生成输出序列。
|
||||
|
||||
- **示例**:T5、BART
|
||||
- **用例**:翻译、摘要、改写
|
||||
- **典型规模**:数百万个参数
|
||||
|
||||
虽然大语言模型 (LLMs) 有多种形式,但它们通常是基于解码器的模型,拥有数十亿个参数。以下是一些最知名的大语言模型:
|
||||
|
||||
| **模型** | **提供商** |
|
||||
|-----------------------------------|------------------------------------------|
|
||||
| **Deepseek-R1** | DeepSeek |
|
||||
| **GPT4** | OpenAI |
|
||||
| **Llama 3** | Meta(Facebook AI Research) |
|
||||
| **SmolLM2** | Hugging Face |
|
||||
| **Gemma** | Google |
|
||||
| **Mistral** | Mistral |
|
||||
|
||||
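The principle behind an LLM is simple yet highly effective: **its objective is to predict the next token, given the sequence of previous tokens**. A "token" is the basic unit of information an LLM works with. You can think of a token as a word, but for efficiency reasons LLMs do not use whole words.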
|
||||
|
||||
例如,虽然英语估计有 60 万个单词,但一个 LLM 的词汇表可能只有大约 32,000 个令牌(如 Llama 2 的情况)。令牌化通常作用于可以组合的子词单元。
|
||||
|
||||
举个例子,考虑如何将令牌“interest”和“ing”组合成“interesting”,或者添加“ed”形成“interested”。
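The idea of combining sub-word units can be sketched with a toy tokenizer. The vocabulary below is made up for this example, and the greedy longest-match rule is an assumption chosen for simplicity; production tokenizers (for example BPE-based ones) learn their vocabulary from data and use different merge rules.

```python
def tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            raise ValueError(f"no token covers {word[start:]!r}")
    return tokens


toy_vocab = {"interest", "ing", "ed", "s"}
for word in ["interesting", "interested", "interests"]:
    print(word, "->", tokenize(word, toy_vocab))
```

Four vocabulary entries are enough to cover three different words here, which is why sub-word vocabularies can stay far smaller than a full dictionary.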
|
||||
|
||||
你可以在下面的交互式游乐场中尝试不同的令牌化器来实验:
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-the-tokenizer-playground.static.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
每个大语言模型 (LLM) 都有一些特定于该模型的**特殊令牌**。LLM 使用这些令牌来开启和关闭其生成过程中的结构化组件。例如,用于指示序列、消息或响应的开始或结束。此外,我们传递给模型的输入提示也使用特殊令牌进行结构化。其中最重要的是**序列结束令牌** (EOS,End of Sequence token)。
|
||||
|
||||
不同模型提供商使用的特殊令牌形式差异很大。
|
||||
|
||||
下表展示了特殊令牌的多样性:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th><strong>Model</strong></th>
|
||||
<th><strong>Provider</strong></th>
|
||||
<th><strong>EOS Token</strong></th>
|
||||
<th><strong>Functionality</strong></th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><strong>GPT4</strong></td>
|
||||
<td>OpenAI</td>
|
||||
<td><code>&lt;|endoftext|&gt;</code></td>
|
||||
<td>End of message text</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Llama 3</strong></td>
|
||||
<td>Meta (Facebook AI Research)</td>
|
||||
<td><code>&lt;|eot_id|&gt;</code></td>
|
||||
<td>End of sequence</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Deepseek-R1</strong></td>
|
||||
<td>DeepSeek</td>
|
||||
<td><code>&lt;|end_of_sentence|&gt;</code></td>
|
||||
<td>End of message text</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>SmolLM2</strong></td>
|
||||
<td>Hugging Face</td>
|
||||
<td><code>&lt;|im_end|&gt;</code></td>
|
||||
<td>End of instruction or message</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><strong>Gemma</strong></td>
|
||||
<td>Google</td>
|
||||
<td><code>&lt;end_of_turn&gt;</code></td>
|
||||
<td>End of conversation turn</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
<Tip>
|
||||
|
||||
我们并不期望你记住这些特殊令牌,但重要的是要理解它们的多样性以及它们在大语言模型 (LLM) 文本生成中所扮演的角色。如果你想了解更多关于特殊令牌的信息,可以查看模型在其 Hub 仓库中的配置。例如,你可以在[SmolLM2 模型的 tokenizer_config.json 文件](https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/blob/main/tokenizer_config.json)中找到该模型的特殊令牌。
|
||||
|
||||
</Tip>
|
||||
|
||||
## 理解下一个词元预测
|
||||
|
||||
大语言模型 (LLM) 被认为是**自回归**的,这意味着**一次通过的输出成为下一次的输入**。这个循环持续进行,直到模型预测下一个词元为 EOS(结束符)词元,此时模型可以停止。
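The autoregressive loop can be sketched in a few lines. In this toy version the "model" is a hypothetical lookup table (`NEXT_TOKEN`) instead of a neural network, and `<eos>` is a placeholder EOS token chosen for the example.

```python
EOS = "<eos>"

# Hypothetical lookup table standing in for a trained model:
# it maps the last token to the single most likely next token.
NEXT_TOKEN = {
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": EOS,
}


def predict_next(tokens):
    """Return the next token given everything generated so far."""
    return NEXT_TOKEN.get(tokens[-1], EOS)


def generate(prompt_tokens, max_new_tokens=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)  # one forward pass
        if next_token == EOS:
            break  # the model signalled the end of the sequence
        tokens.append(next_token)  # the output becomes part of the next input
    return tokens


print(" ".join(generate(["The", "capital"])))
```

A real model conditions on the whole sequence rather than only the last token, but the control flow is the same: predict, append, repeat, and stop at EOS or at a length limit.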
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AutoregressionSchema.gif" alt="自回归解码的视觉 GIF 图" width="60%">
|
||||
|
||||
换句话说,LLM 会解码文本,直到达到 EOS。但在单个解码循环中会发生什么?
|
||||
|
||||
虽然对于学习智能体而言,整个过程可能相当技术化,但以下是简要概述:
|
||||
|
||||
- 一旦输入文本被**词元化**,模型就会计算序列的表示,该表示捕获输入序列中每个词元的意义和位置信息。
|
||||
- 这个表示被输入到模型中,模型输出分数,这些分数对词汇表中每个词元作为序列中下一个词元的可能性进行排名。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/DecodingFinal.gif" alt="解码的视觉 GIF 图" width="60%">
|
||||
|
||||
基于这些分数,我们有多种策略来选择词元以完成句子。
|
||||
|
||||
- 最简单的解码策略是总是选择分数最高的词元。
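As a small illustration of greedy decoding, the sketch below converts raw scores into probabilities with a softmax and picks the highest one. The scores are made-up numbers for this example, not the output of a real model.

```python
import math


def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {token: math.exp(score - highest) for token, score in scores.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def greedy_pick(scores):
    """Greedy decoding: choose the token with the highest score."""
    return max(scores, key=scores.get)


# Made-up scores for the token following "The capital of France is".
scores = {"Paris": 5.1, "London": 2.3, "beautiful": 1.7, "<eos>": -0.5}
probabilities = softmax(scores)
print(greedy_pick(scores), round(probabilities["Paris"], 3))
```

Because softmax preserves the ordering of the scores, picking the highest score and picking the highest probability give the same token.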
|
||||
|
||||
You can interact with the decoding process yourself with SmolLM2 in this Space (remember, it decodes until it reaches the **EOS** token, which for this model is `<|im_end|>`):
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-decoding-visualizer.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
- 但还有更先进的解码策略。例如, *束搜索(beam search)* 会探索多个候选序列,以找到总分数最高的序列——即使其中一些单个词元的分数较低。
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-beam-search-visualizer.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
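The difference between greedy decoding and beam search shows up even on a tiny example. The sketch below uses a hypothetical probability table (`PROBS`) keyed by the previous token; with one beam the search behaves greedily, while with two beams it finds a sequence whose total probability is higher even though its first token scored lower.

```python
import math

# Hypothetical next-token probabilities, keyed by the previous token.
PROBS = {
    "<start>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.5, "y": 0.5},
    "B": {"z": 0.9, "w": 0.1},
}


def beam_search(num_beams, steps=2):
    """Keep the `num_beams` best partial sequences at every step."""
    beams = [(["<start>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


greedy_tokens, greedy_score = beam_search(num_beams=1)
beam_tokens, beam_score = beam_search(num_beams=2)
print(greedy_tokens[1:], round(math.exp(greedy_score), 2))
print(beam_tokens[1:], round(math.exp(beam_score), 2))
```

Greedy decoding commits to "A" because 0.6 beats 0.4, and ends with a total probability of 0.30. Keeping "B" alive as a second beam lets the search reach "B z" with a total probability of 0.36.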
|
||||
|
||||
如果你想了解更多关于解码的信息,可以查看[NLP 课程](https://huggingface.co/learn/nlp-course)。
|
||||
|
||||
## 注意力机制就是你的全部所需
|
||||
|
||||
Transformer 架构的一个关键方面是**注意力机制**。在预测下一个词时,句子中的每个词并不是同等重要的;例如,在句子 *“The capital of France is ...”* 中,“France” 和 “capital” 这样的词携带了最多的意义。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit1/AttentionSceneFinal.gif" alt="注意力机制的视觉 GIF 图" width="60%">
|
||||
|
||||
这种识别最相关词以预测下一个词元的过程已被证明是非常有效的。
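To make "identifying the most relevant words" concrete, here is a sketch of scaled dot-product attention weights in plain Python. The 2-dimensional vectors are invented for this example; a trained model learns its query and key vectors, and real implementations work on large matrices rather than Python lists.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up 2-dimensional vectors; a trained model learns these.
words = ["The", "capital", "of", "France", "is"]
keys = [[0.1, 0.0], [1.0, 0.8], [0.0, 0.1], [0.9, 1.2], [0.1, 0.1]]
query = [2.0, 2.0]  # query vector for the position being predicted

weights = attention_weights(query, keys)
for word, weight in zip(words, weights):
    print(f"{word:>8}: {weight:.2f}")
```

With these toy vectors, "France" and "capital" receive most of the weight, while "The", "of", and "is" contribute very little to the prediction.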
|
||||
|
||||
尽管自 GPT-2 以来,大语言模型(LLM)的基本原理——预测下一个词元——一直保持不变,但在扩展神经网络以及使注意力机制能够处理越来越长的序列方面已经取得了显著进展。
|
||||
|
||||
如果你与大语言模型交互过,你可能对*上下文长度*这个术语很熟悉,它指的是大语言模型能够处理的最大词元数,以及其最大的*注意力跨度*。
|
||||
|
||||
## 提示大语言模型很重要
|
||||
|
||||
考虑到大语言模型(LLM)的唯一工作是通过查看每个输入词元来预测下一个词元,并选择哪些词元是“重要的”,因此你提供的输入序列的措辞非常重要。
|
||||
|
||||
你提供给大语言模型的输入序列被称为*提示*。精心设计提示可以更容易地**引导大语言模型的生成朝着期望的输出方向进行**。
|
||||
|
||||
## 大语言模型是如何训练的?
|
||||
|
||||
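LLMs are trained on large text datasets, where they learn to predict the next word in a sequence through a self-supervised or masked language modeling objective.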
|
||||
|
||||
通过这种无监督学习,模型学习了语言的结构以及**文本中的潜在模式,使模型能够泛化到未见过的数据**。
|
||||
|
||||
在这个初始的*预训练*之后,大语言模型可以在监督学习目标上进行微调,以执行特定任务。例如,一些模型被训练用于对话结构或工具使用,而其他模型则专注于分类或代码生成。
|
||||
|
||||
## 我如何使用大语言模型?
|
||||
|
||||
你有两个主要选择:
|
||||
|
||||
1. **本地运行**(如果你有足够的硬件资源)。
|
||||
|
||||
2. **Use a cloud service or API** (for example, through the Hugging Face Serverless Inference API).
|
||||
|
||||
在本课程中,我们将主要通过 Hugging Face Hub 上的 API 使用模型。稍后,我们将探讨如何在你的本地硬件上运行这些模型。
|
||||
|
||||
|
||||
## 大语言模型在 AI 智能体中是如何使用的?
|
||||
|
||||
大语言模型是AI智能体的关键组件,**为理解和生成人类语言提供了基础**。
|
||||
|
||||
它们可以解释用户指令,在对话中保持上下文,制定计划并决定使用哪些工具。
|
||||
|
||||
我们将在本单元中更详细地探讨这些步骤,但现在你需要理解的是,大语言模型是**智能体的大脑**。
|
||||
|
||||
---
|
||||
|
||||
那信息量可真不小!我们已经涵盖了大语言模型(LLM)的基本概念、工作原理以及它们在驱动AI智能体中的作用。
|
||||
|
||||
如果你想更深入地探索语言模型和自然语言处理这个迷人的世界,不妨查看我们的<a href="https://huggingface.co/learn/nlp-course/chapter1/1" target="_blank">免费 NLP 课程</a>。
|
||||
|
||||
现在我们已经了解了大语言模型的工作原理,接下来是时候看看**大语言模型如何在对话语境中构建其生成内容**了。
|
||||
|
||||
要运行<a href="https://huggingface.co/agents-course/notebooks/blob/main/dummy_agent_library.ipynb" target="_blank">这个笔记本</a>,**你需要一个 Hugging Face 令牌**,你可以从<a href="https://hf.co/settings/tokens" target="_blank"> https://hf.co/settings/tokens </a>获取。
|
||||
|
||||
有关如何运行 Jupyter Notebook 的更多信息,请查看<a href="https://huggingface.co/docs/hub/notebooks"> Hugging Face Hub 上的 Jupyter Notebook</a>。
|
||||
|
||||
你还需要请求访问<a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank"> Meta Llama 模型</a>。
|
||||
38
units/zh-CN/unit2/introduction.mdx
Normal file
@@ -0,0 +1,38 @@
|
||||
# 智能体框架介绍
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/thumbnail.jpg" alt="Thumbnail"/>
|
||||
|
||||
欢迎来到第二单元,在这里**我们将探索不同的智能体框架(agentic frameworks)**,这些框架可用于构建强大的智能体应用。
|
||||
|
||||
我们将学习:
|
||||
|
||||
- 在单元 2.1:[smolagents](https://huggingface.co/docs/smolagents/en/index)
|
||||
- 在单元 2.2:[LlamaIndex](https://www.llamaindex.ai/)
|
||||
- 在单元 2.3:[LangGraph](https://www.langchain.com/langgraph)
|
||||
|
||||
让我们开始吧!🕵
|
||||
|
||||
## 何时使用智能体框架
|
||||
|
||||
**构建围绕大语言模型(LLMs)的应用时,并不总是需要智能体框架**。它们在工作流中提供了灵活性,可以高效地解决特定任务,但并非总是必需的。
|
||||
|
||||
有时,**预定义的工作流足以满足用户请求**,并且没有真正需要智能体框架。如果构建智能体的方法很简单,比如一系列提示,使用纯代码可能就足够了。优势在于开发者将**完全控制和理解他们的系统,没有抽象层**。
|
||||
|
||||
然而,当工作流变得更加复杂时,例如让大语言模型调用函数或使用多个智能体,这些抽象开始变得有用。
|
||||
|
||||
考虑到这些想法,我们已经可以确定对一些功能的需求:
|
||||
|
||||
* 一个驱动系统的*大语言模型引擎*。
|
||||
* 智能体可以访问的*工具列表*。
|
||||
* 用于从大语言模型输出中提取工具调用的*解析器*。
|
||||
* 与解析器同步的*系统提示*。
|
||||
* 一个*记忆系统*。
|
||||
* *错误日志和重试机制*以控制大语言模型的错误。
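Several of the pieces listed above (a parser, a simple memory, and a retry mechanism) can be sketched without any framework. Everything in this example is hypothetical: the JSON action format, the `fake_llm` stand-in, and the function names are assumptions made for illustration, and the frameworks covered in this unit each handle these concerns in their own way.

```python
import json


def parse_tool_call(llm_output):
    """Extract the tool name and arguments from a JSON blob in the model output."""
    start, end = llm_output.find("{"), llm_output.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in the model output")
    call = json.loads(llm_output[start : end + 1])
    return call["tool"], call["arguments"]


def run_with_retries(llm, tools, task, max_retries=3):
    memory = [task]  # simple memory: everything said so far
    for _ in range(max_retries):
        output = llm(memory)
        try:
            name, arguments = parse_tool_call(output)
            return tools[name](**arguments)
        except (ValueError, KeyError, TypeError) as error:
            # Log the error into memory so the model can correct itself.
            memory.append(f"Error: {error!r}. Please answer with valid JSON.")
    raise RuntimeError("the model never produced a valid tool call")


# Hypothetical stand-in for an LLM: the first answer is malformed, the second is valid.
answers = iter(['I will add them!', 'Action: {"tool": "add", "arguments": {"a": 2, "b": 3}}'])


def fake_llm(memory):
    return next(answers)


result = run_with_retries(fake_llm, {"add": lambda a, b: a + b}, "What is 2 + 3?")
print(result)
```

Even this small version needs the system prompt and the parser to agree on the output format, which is the kind of bookkeeping that agent frameworks take over.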
|
||||
|
||||
我们将探讨这些主题在各种框架中如何解决,包括 `smolagents`、`LlamaIndex` 和 `LangGraph`。
|
||||
|
||||
## 智能体框架单元
|
||||
|
||||
| 框架 | 描述 | 单元作者 |
|
||||
|------------|----------------|----------------|
|
||||
| [smolagents](./smolagents/introduction) | 由 Hugging Face 开发的智能体框架。 | Sergio Paniego - [HF](https://huggingface.co/sergiopaniego) - [X](https://x.com/sergiopaniego) - [Linkedin](https://www.linkedin.com/in/sergio-paniego-blanco) |
|
||||
378
units/zh-CN/unit2/smolagents/code_agents.mdx
Normal file
@@ -0,0 +1,378 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/code_agents.ipynb"},
|
||||
]} />
|
||||
|
||||
# 构建使用代码的智能体
|
||||
|
||||
代码智能体(Code agents)是 `smolagents` 中的默认智能体类型。它们生成 Python 工具调用来执行操作,实现高效、表达力强且准确的操作表示。
|
||||
|
||||
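Their streamlined approach reduces the number of actions required, simplifies complex operations, and allows existing code functions to be reused. `smolagents` provides a lightweight framework for building code agents, implemented in about 1,000 lines of code.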
|
||||
|
||||

|
||||
图片来自论文 [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030)
|
||||
|
||||
<Tip>
|
||||
如果你想了解更多关于为什么代码智能体如此有效的信息,请查看 <a href="https://huggingface.co/docs/smolagents/en/conceptual_guides/intro_agents#code-agents" target="_blank">smolagents 文档中的这个指南</a>。
|
||||
</Tip>
|
||||
|
||||
## 为什么选择代码智能体?
|
||||
|
||||
在多步骤智能体过程中,大语言模型(LLM)编写并执行操作,通常涉及外部工具调用。传统方法使用 JSON 格式来指定工具名称和参数作为字符串,**系统必须解析这些内容以确定要执行哪个工具**。
|
||||
|
||||
然而,研究表明,**工具调用型大语言模型直接使用代码工作更有效**。这是 `smolagents` 的核心原则,如上图所示,来自论文 [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030)。
|
||||
|
||||
用代码而非 JSON 编写操作提供了几个关键优势:
|
||||
|
||||
* **可组合性(Composability)**:轻松组合和重用操作
|
||||
* **对象管理(Object Management)**:直接处理复杂结构,如图像
|
||||
* **通用性(Generality)**:表达任何计算上可能的任务
|
||||
* **适合大语言模型**:高质量代码已存在于大语言模型的训练数据中
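The composability point can be illustrated by running the same task in both styles. In this sketch the tools (`get_price`, `add`), the JSON action format, and the prices are all invented for the example, and the bare `exec` call is for illustration only and is not a real sandbox.

```python
import json

TOOLS = {
    "get_price": lambda item: {"pizza": 12.0, "soda": 3.0}[item],
    "add": lambda a, b: a + b,
}

# JSON style: one tool call per action, so the caller has to run each action
# separately and carry the intermediate results between them.
json_actions = [
    '{"tool": "get_price", "arguments": {"item": "pizza"}}',
    '{"tool": "get_price", "arguments": {"item": "soda"}}',
]
prices = []
for raw in json_actions:
    action = json.loads(raw)
    prices.append(TOOLS[action["tool"]](**action["arguments"]))
json_total = TOOLS["add"](*prices)

# Code style: a single action composes the same calls and keeps intermediate values.
code_action = """
prices = [get_price(item) for item in ["pizza", "soda"]]
total = add(*prices)
"""
namespace = {"__builtins__": {}, **TOOLS}
exec(code_action, namespace)  # illustration only, not a real sandbox

print(json_total, namespace["total"])
```

Both styles reach the same total, but the code action expresses the loop and the final sum in one step, whereas the JSON style needs the surrounding program to orchestrate every call.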
|
||||
|
||||
## 代码智能体如何工作?
|
||||
|
||||

|
||||
|
||||
上图说明了 `CodeAgent.run()` 如何操作,遵循我们在第 1 单元中提到的 ReAct 框架。`smolagents` 中智能体的主要抽象是 `MultiStepAgent`,它作为核心构建块。如我们将在下面的示例中看到的,`CodeAgent` 是一种特殊的 `MultiStepAgent`。
|
||||
|
||||
`CodeAgent` 通过一系列步骤执行操作,将现有变量和知识整合到智能体的上下文中,这些内容保存在执行日志中:
|
||||
|
||||
1. 系统提示存储在 `SystemPromptStep` 中,用户查询记录在 `TaskStep` 中。
|
||||
|
||||
2. 然后,执行以下循环:
|
||||
|
||||
2.1 方法 `agent.write_memory_to_messages()` 将智能体的日志写入大语言模型可读的[聊天消息](https://huggingface.co/docs/transformers/en/chat_templating)列表中。
|
||||
|
||||
2.2 这些消息发送给 `Model`,生成补全(completion)。
|
||||
|
||||
2.3 解析补全内容以提取操作,在我们的例子中,这应该是代码片段,因为我们使用的是 `CodeAgent`。
|
||||
|
||||
2.4 执行该操作。
|
||||
|
||||
2.5 将结果记录到内存中的 `ActionStep` 中。
|
||||
|
||||
At the end of each step, any callback functions registered on the agent (in `agent.step_callbacks`) are executed.
|
||||
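The loop described above can be sketched as a toy, framework-free version. This is not the `smolagents` implementation: `toy_agent_run`, `scripted_model`, the `<code>` tag convention, and the tuple-based memory are all assumptions made for this illustration, and the snippet executes code without any sandboxing.

```python
import contextlib
import io
import re


def toy_agent_run(task, model, max_steps=5, step_callbacks=()):
    """Toy think/act/observe loop; returns (final_answer, memory)."""
    memory = [("system", "Reply with Python inside <code></code> tags."), ("task", task)]
    variables = {}  # state shared by every executed snippet
    for _ in range(max_steps):
        completion = model(memory)  # steps 2.1-2.2: memory -> messages -> completion
        match = re.search(r"<code>(.*?)</code>", completion, re.DOTALL)  # step 2.3
        if match is None:
            memory.append(("error", "no code found"))
            continue
        printed = io.StringIO()
        with contextlib.redirect_stdout(printed):
            exec(match.group(1), variables)  # step 2.4 (no sandboxing in this toy)
        memory.append(("action", match.group(1), printed.getvalue()))  # step 2.5
        for callback in step_callbacks:
            callback(memory[-1])
        if "final_answer" in variables:
            return variables["final_answer"], memory
    return None, memory


def scripted_model(memory):
    """Hypothetical stand-in for an LLM: replies depend on how many actions ran."""
    actions_done = sum(1 for entry in memory if entry[0] == "action")
    replies = [
        "<code>minutes = 30 + 60 + 45\nprint(minutes)</code>",
        "<code>final_answer = f'{minutes // 60}h {minutes % 60}min'</code>",
    ]
    return replies[actions_done]


answer, memory = toy_agent_run("How long will the preparations take?", scripted_model)
print(answer)
```

The variable `minutes` defined in the first step is still available in the second, which mirrors how a code agent carries existing variables forward, and the captured `print` output plays the role of the observation stored with each action.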
|
||||
## 让我们看一些例子
|
||||
|
||||
<Tip>
|
||||
你可以在 <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/code_agents.ipynb" target="_blank">这个笔记本</a> 中跟随代码,可以使用 Google Colab 运行。
|
||||
</Tip>
|
||||
|
||||
阿尔弗雷德(Alfred)正在韦恩家族大宅计划一场派对,需要你的帮助确保一切顺利进行。为了帮助他,我们将应用我们所学到的关于多步骤 `CodeAgent` 如何运作的知识。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-party.jpg" alt="Alfred 派对"/>
|
||||
|
||||
如果你还没有安装 `smolagents`,可以运行以下命令进行安装:
|
||||
|
||||
```bash
|
||||
pip install smolagents -U
|
||||
```
|
||||
|
||||
让我们也登录到 Hugging Face Hub,以便访问无服务器推理 API(Serverless Inference API)。
|
||||
|
||||
```python
|
||||
from huggingface_hub import login
|
||||
|
||||
login()
|
||||
```
|
||||
|
||||
### 使用 `smolagents` 为派对选择播放列表
|
||||
|
||||
音乐是成功派对的重要组成部分!阿尔弗雷德需要一些帮助来选择播放列表。幸运的是,`smolagents` 能够帮助我们!我们可以构建一个能够使用 DuckDuckGo 搜索网络的智能体。要让智能体访问此工具,我们在创建智能体时将其包含在工具列表中。
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-playlist.jpg" alt="Alfred 播放列表"/>
|
||||
|
||||
对于模型,我们将依赖 `HfApiModel`,它提供对 Hugging Face 的[无服务器推理 API](https://huggingface.co/docs/api-inference/index)的访问。默认模型是 `"Qwen/Qwen2.5-Coder-32B-Instruct"`,它性能良好并可用于快速推理,但你可以从 Hub 中选择任何兼容的模型。
|
||||
|
||||
运行智能体相当简单:
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel
|
||||
|
||||
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())
|
||||
|
||||
agent.run("Search for the best music recommendations for a party at the Wayne's mansion.")
|
||||
```
|
||||
|
||||
当你运行这个例子时,输出将**显示正在执行的工作流步骤的跟踪**。它还会打印相应的 Python 代码,并附带消息:
|
||||
|
||||
```python
|
||||
─ Executing parsed code: ────────────────────────────────────────────────────────────────────────────────────────
|
||||
results = web_search(query="best music for a Batman party")
|
||||
print(results)
|
||||
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
|
||||
```
|
||||
|
||||
几个步骤后,你将看到阿尔弗雷德可以用于派对的生成播放列表!🎵
|
||||
|
||||
### 使用自定义工具准备菜单
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-menu.jpg" alt="Alfred 菜单"/>
|
||||
|
||||
现在我们已经选择了播放列表,我们需要为客人组织菜单。同样,阿尔弗雷德可以利用 `smolagents` 来做到这一点。在这里,我们使用 `@tool` 装饰器定义一个作为工具的自定义函数。我们稍后将更详细地介绍工具创建,所以现在我们可以简单地运行代码。
|
||||
|
||||
如下例所示,我们将使用 `@tool` 装饰器创建一个工具,并将其包含在 `tools` 列表中。
|
||||
|
||||
```python
from smolagents import CodeAgent, HfApiModel, tool


# Tool that suggests a menu based on the occasion
@tool
def suggest_menu(occasion: str) -> str:
    """
    Suggests a menu based on the occasion.
    Args:
        occasion: The type of occasion for the party.
    """
    if occasion == "casual":
        return "Pizza, snacks, and drinks."
    elif occasion == "formal":
        return "3-course dinner with wine and dessert."
    elif occasion == "superhero":
        return "Buffet with high-energy and healthy food."
    else:
        return "Custom menu for the butler."


# Alfred, the butler, preparing the menu for the party
agent = CodeAgent(tools=[suggest_menu], model=HfApiModel())

# Preparing a formal menu for the party
agent.run("Prepare a formal menu for the party.")
```
|
||||
|
||||
智能体将运行几个步骤,直到找到答案。
|
||||
|
||||
菜单准备好了!🥗
|
||||
|
||||
### 在智能体内部使用 Python 导入
|
||||
|
||||
我们已经准备好了播放列表和菜单,但我们需要检查另一个关键细节:准备时间!
|
||||
|
||||
阿尔弗雷德需要计算如果他现在开始准备,什么时候一切都会准备就绪,以防他们需要其他超级英雄的帮助。
|
||||
|
||||
`smolagents` specializes in agents that write and execute Python code snippets, and it offers sandboxed execution for security.
|
||||
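**Code execution has strict security measures**: by default, imports outside a predefined safe list are blocked. You can authorize additional imports by passing them as strings in `additional_authorized_imports`.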
|
||||
有关安全代码执行的更多详情,请参阅官方[指南](https://huggingface.co/docs/smolagents/tutorials/secure_code_execution)。
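The idea of an import allowlist can be sketched with the standard `ast` module. This is only an illustration of the concept and not the check that `smolagents` performs internally; the `BASE_ALLOWED` set and `find_blocked_imports` are hypothetical names, and the default list below is invented for the example.

```python
import ast

BASE_ALLOWED = {"math", "random", "re"}  # hypothetical default safe list


def find_blocked_imports(code, additional_authorized_imports=()):
    """Return the sorted top-level module names that the snippet may not import."""
    allowed = BASE_ALLOWED | set(additional_authorized_imports)
    blocked = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        blocked.update(m.split(".")[0] for m in modules if m.split(".")[0] not in allowed)
    return sorted(blocked)


snippet = "import datetime\nimport os\nprint(datetime.datetime.now())"
print(find_blocked_imports(snippet))
print(find_blocked_imports(snippet, additional_authorized_imports=["datetime"]))
```

Authorizing `datetime` removes it from the blocked list while `os` stays blocked, which is the behavior the `additional_authorized_imports` parameter is meant to give you. Checking imports alone does not make code execution safe; it is one layer among several.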
|
||||
|
||||
创建智能体时,我们将使用 `additional_authorized_imports` 来允许导入 `datetime` 模块。
|
||||
|
||||
```python
from smolagents import CodeAgent, HfApiModel

# The agent's generated code may import datetime because we authorize it here
agent = CodeAgent(tools=[], model=HfApiModel(), additional_authorized_imports=['datetime'])

agent.run(
    """
    Alfred needs to prepare for the party. Here are the tasks:
    1. Prepare the drinks - 30 minutes
    2. Decorate the mansion - 60 minutes
    3. Set up the menu - 45 minutes
    4. Prepare the music and playlist - 45 minutes

    If we start right now, at what time will the party be ready?
    """
)
```
|
||||
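For reference, the kind of code the agent might write for this task looks roughly like the sketch below. It assumes the four tasks are done one after another and uses a fixed, made-up start time so the result is reproducible; the agent itself would use the current time.

```python
from datetime import datetime, timedelta

task_minutes = {
    "Prepare the drinks": 30,
    "Decorate the mansion": 60,
    "Set up the menu": 45,
    "Prepare the music and playlist": 45,
}

total = timedelta(minutes=sum(task_minutes.values()))
start = datetime(2025, 1, 1, 18, 0)  # fixed start time so the result is reproducible
ready_at = start + total

print(total, ready_at.strftime("%H:%M"))
```

The durations add up to 180 minutes, so preparations that start at 18:00 finish at 21:00.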
|
||||
这些例子只是代码智能体能做的事情的开始,我们已经开始看到它们对准备派对的实用性。
|
||||
你可以在 [smolagents 文档](https://huggingface.co/docs/smolagents) 中了解更多关于如何构建代码智能体的信息。
|
||||
|
||||
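To sum up, `smolagents` specializes in agents that write and execute Python code snippets, and it offers sandboxed execution for security. It supports both local and API-based language models, which makes it adaptable to a range of development environments.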
|
||||
|
||||
### 将我们的自定义派对准备智能体分享到 Hub
|
||||
|
||||
**将我们自己的阿尔弗雷德智能体与社区分享**难道不会很棒吗?通过这样做,任何人都可以轻松地从 Hub 下载并直接使用这个智能体,将哥谭市的终极派对策划者带到他们的指尖!让我们实现这个想法!🎉
|
||||
|
||||
`smolagents` 库使这成为可能,允许你与社区分享完整的智能体,并下载其他人的智能体立即使用。这很简单,如下所示:
|
||||
|
||||
```python
|
||||
# 更改为你的用户名和仓库名
|
||||
agent.push_to_hub('sergiopaniego/AlfredAgent')
|
||||
```
|
||||
|
||||
要再次下载智能体,请使用以下代码:
|
||||
|
||||
```python
|
||||
# 更改为你的用户名和仓库名
|
||||
alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', trust_remote_code=True)
|
||||
|
||||
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")
|
||||
```
|
||||
|
||||
令人兴奋的是,共享的智能体直接作为 Hugging Face Spaces 提供,允许你实时与它们交互。你可以在[这里](https://huggingface.co/spaces/davidberenstein1957/smolagents-and-tools)探索其他智能体。
|
||||
|
||||
例如,_AlfredAgent_ 在[这里](https://huggingface.co/spaces/sergiopaniego/AlfredAgent)可用。你可以直接在下面尝试:
|
||||
|
||||
<iframe
|
||||
src="https://sergiopaniego-alfredagent.hf.space/"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
你可能想知道——阿尔弗雷德如何使用 `smolagents` 构建这样一个智能体?通过整合几个工具,他可以生成一个如下的智能体。现在不用担心这些工具,因为我们将在本单元后面有专门的部分详细探讨:
|
||||
|
||||
```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, FinalAnswerTool, HfApiModel, Tool, tool, VisitWebpageTool


@tool
def suggest_menu(occasion: str) -> str:
    """
    Suggests a menu based on the occasion.
    Args:
        occasion: The type of occasion for the party.
    """
    if occasion == "casual":
        return "Pizza, snacks, and drinks."
    elif occasion == "formal":
        return "3-course dinner with wine and dessert."
    elif occasion == "superhero":
        return "Buffet with high-energy and healthy food."
    else:
        return "Custom menu for the butler."


@tool
def catering_service_tool(query: str) -> str:
    """
    This tool returns the highest-rated catering service in Gotham City.

    Args:
        query: A search term for finding catering services.
    """
    # Example list of catering services and their ratings
    services = {
        "Gotham Catering Co.": 4.9,
        "Wayne Manor Catering": 4.8,
        "Gotham City Events": 4.7,
    }

    # Find the highest rated catering service (simulating search query filtering)
    best_service = max(services, key=services.get)

    return best_service


class SuperheroPartyThemeTool(Tool):
    name = "superhero_party_theme_generator"
    description = """
    This tool suggests creative superhero-themed party ideas based on a category.
    It returns a unique party theme idea."""

    inputs = {
        "category": {
            "type": "string",
            "description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham').",
        }
    }

    output_type = "string"

    def forward(self, category: str):
        themes = {
            "classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
            "villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
            "futuristic Gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."
        }

        return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")


# Alfred, the butler, preparing the party
agent = CodeAgent(
    tools=[
        DuckDuckGoSearchTool(),
        VisitWebpageTool(),
        suggest_menu,
        catering_service_tool,
        SuperheroPartyThemeTool()
    ],
    model=HfApiModel(),
    max_steps=10,
    verbosity_level=2
)

agent.run("Give me best playlist for a party at the Wayne's mansion. The party idea is a 'villain masquerade' theme")
```
|
||||
|
||||
如你所见,我们创建了一个具有多种工具的 `CodeAgent`,这些工具增强了智能体的功能,将其变成了准备好与社区分享的终极派对策划者!🎉
|
||||
|
||||
Now it's your turn: build your own agent using what we've just learned and share it with the community! 🕵️‍♂️💡
|
||||
|
||||
<Tip>
|
||||
如果你想分享你的智能体项目,请创建一个 space 并在 Hugging Face Hub 上标记 [agents-course](https://huggingface.co/agents-course)。我们很想看看你创造了什么!
|
||||
</Tip>
|
||||
|
||||
### 使用 OpenTelemetry 和 Langfuse 检查我们的派对准备智能体 📡
|
||||
|
||||
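As Alfred fine-tunes the party preparation agent, he is growing weary of debugging its runs. Agents are by nature unpredictable and hard to inspect. Because he wants to build the ultimate party preparation agent and deploy it in production, he needs robust traceability for future monitoring and analysis.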
|
||||
|
||||
再一次,`smolagents` 来救援!它采用 [OpenTelemetry](https://opentelemetry.io/) 标准来检测智能体运行,允许无缝检查和日志记录。在 [Langfuse](https://langfuse.com/) 和 `SmolagentsInstrumentor` 的帮助下,阿尔弗雷德可以轻松跟踪和分析他的智能体行为。
|
||||
|
||||
设置非常简单!
|
||||
|
||||
首先,我们需要安装必要的依赖项:
|
||||
|
||||
```bash
|
||||
pip install opentelemetry-sdk opentelemetry-exporter-otlp openinference-instrumentation-smolagents
|
||||
```
|
||||
|
||||
接下来,阿尔弗雷德已经在 Langfuse 上创建了一个账户,并准备好了他的 API 密钥。如果你还没有这样做,你可以在[这里](https://cloud.langfuse.com/)注册 Langfuse Cloud 或探索[替代方案](https://huggingface.co/docs/smolagents/tutorials/inspect_runs)。
|
||||
|
||||
拿到 API 密钥后,需要按如下方式正确配置:
|
||||
|
||||
```python
|
||||
import os
|
||||
import base64
|
||||
|
||||
LANGFUSE_PUBLIC_KEY="pk-lf-..."
|
||||
LANGFUSE_SECRET_KEY="sk-lf-..."
|
||||
LANGFUSE_AUTH=base64.b64encode(f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()).decode()
|
||||
|
||||
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel" # EU 数据区域
|
||||
# os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://us.cloud.langfuse.com/api/public/otel" # US 数据区域
|
||||
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {LANGFUSE_AUTH}"
|
||||
```
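`OTEL_EXPORTER_OTLP_HEADERS` 中的值采用 HTTP Basic 认证格式,也就是对 `公钥:私钥` 做 Base64 编码。下面的示意只用标准库,用占位密钥(假设值,并非真实凭据)演示编码结果可以被还原,便于在发送追踪数据之前自查配置;`build_basic_auth_header` 是为演示而起的假设函数名:

```python
import base64


def build_basic_auth_header(public_key: str, secret_key: str) -> str:
    """按 HTTP Basic 认证格式拼出 OTLP 请求头的值。"""
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return f"Authorization=Basic {token}"


# 占位密钥,仅用于演示
header = build_basic_auth_header("pk-lf-demo", "sk-lf-demo")

# 反向解码,确认编码内容确实是 "公钥:私钥"
encoded = header.split("Basic ", 1)[1]
decoded = base64.b64decode(encoded).decode()
print(decoded)  # pk-lf-demo:sk-lf-demo
```

如果解码结果不是预期的 `公钥:私钥` 形式,通常说明两个密钥的顺序写反了,或者字符串里混入了多余的空白。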
|
||||
|
||||
最后,阿尔弗雷德准备初始化 `SmolagentsInstrumentor` 并开始跟踪他的智能体性能。
|
||||
|
||||
```python
|
||||
from opentelemetry.sdk.trace import TracerProvider
|
||||
|
||||
from openinference.instrumentation.smolagents import SmolagentsInstrumentor
|
||||
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
|
||||
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
|
||||
|
||||
trace_provider = TracerProvider()
|
||||
trace_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter()))
|
||||
|
||||
SmolagentsInstrumentor().instrument(tracer_provider=trace_provider)
|
||||
```
|
||||
|
||||
阿尔弗雷德现在已连接 🔌!来自 `smolagents` 的运行都会记录到 Langfuse 中,让他可以完整地查看智能体的行为。有了这个设置,他就可以重新查看之前的运行,并进一步改进他的派对准备智能体。
|
||||
|
||||
```python
|
||||
from smolagents import CodeAgent, HfApiModel
|
||||
|
||||
agent = CodeAgent(tools=[], model=HfApiModel())
|
||||
alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', trust_remote_code=True)
|
||||
alfred_agent.run("Give me the best playlist for a party at Wayne's mansion. The party idea is a 'villain masquerade' theme")
|
||||
```
|
||||
|
||||
阿尔弗雷德现在可以在[这里](https://cloud.langfuse.com/project/cm7bq0abj025rad078ak3luwi/traces/995fc019255528e4f48cf6770b0ce27b?timestamp=2025-02-19T10%3A28%3A36.929Z)访问这些日志来审查和分析它们。
|
||||
|
||||
同时,[建议的播放列表](https://open.spotify.com/playlist/0gZMMHjuxMrrybQ7wTMTpw)为派对准备设置了完美的氛围。很酷,对吧?🎶
|
||||
|
||||
---
|
||||
|
||||
现在我们已经创建了我们的第一个代码智能体,让我们**学习如何创建工具调用智能体(Tool Calling Agents)**,这是 `smolagents` 中可用的第二种智能体类型。
|
||||
|
||||
## 资源
|
||||
|
||||
- [smolagents 博客](https://huggingface.co/blog/smolagents) - smolagents 和代码交互介绍
|
||||
- [smolagents:构建好的智能体](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - 可靠智能体的最佳实践
|
||||
- [构建有效的智能体 - Anthropic](https://www.anthropic.com/research/building-effective-agents) - 智能体设计原则
|
||||
- [使用 OpenTelemetry 共享运行](https://huggingface.co/docs/smolagents/tutorials/inspect_runs) - 关于如何为跟踪你的智能体设置 OpenTelemetry 的详细信息
|
||||
11
units/zh-CN/unit2/smolagents/conclusion.mdx
Normal file
@@ -0,0 +1,11 @@
|
||||
# 结论
|
||||
|
||||
恭喜你完成了第二单元的 `smolagents` 模块 🥳
|
||||
|
||||
你刚刚掌握了 `smolagents` 的基础知识,并且构建了自己的智能体!现在你已经具备了 `smolagents` 的技能,你可以开始创建能够解决你感兴趣任务的智能体。
|
||||
|
||||
在下一个模块中,你将学习**如何使用 LlamaIndex 构建智能体(Agents)**。
|
||||
|
||||
最后,我们非常想**听听你对这门课程的看法以及我们如何改进它**。如果你有任何反馈,请👉 [填写这个表格](https://docs.google.com/forms/d/e/1FAIpQLSe9VaONn0eglax0uTwi29rIn4tM7H2sYmmybmG5jJNlE5v0xA/viewform?usp=dialog)
|
||||
|
||||
### 继续学习,保持优秀 🤗
|
||||
25
units/zh-CN/unit2/smolagents/final_quiz.mdx
Normal file
@@ -0,0 +1,25 @@
|
||||
# 测验时间!
|
||||
|
||||
恭喜你完成了 `smolagents` 的学习材料!你已经取得了很多成就。现在,是时候通过一个测验来测试你的知识了。🧠
|
||||
|
||||
## 说明
|
||||
|
||||
- 测验由代码问题组成。
|
||||
- 你将得到完成代码片段的指示。
|
||||
- 仔细阅读指示并相应地完成代码片段。
|
||||
- 对于每个问题,你将得到结果和一些反馈。
|
||||
|
||||
🧘 **这个测验不计分也不提供证书**。它的目的是帮助你检验自己对 `smolagents` 库的理解,并判断是否需要在书面材料上多花些时间。在接下来的单元中,你将在用例和项目中检验这些知识。
|
||||
|
||||
让我们开始吧!
|
||||
|
||||
## 测验 🚀
|
||||
|
||||
<iframe
|
||||
src="https://agents-course-unit2-smolagents-quiz.hf.space"
|
||||
frameborder="0"
|
||||
width="850"
|
||||
height="450"
|
||||
></iframe>
|
||||
|
||||
你也可以点击👉 [这里](https://huggingface.co/spaces/agents-course/unit2_smolagents_quiz) 访问测验
|
||||
69
units/zh-CN/unit2/smolagents/introduction.mdx
Normal file
@@ -0,0 +1,69 @@
|
||||
# `smolagents` 简介
|
||||
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/thumbnail.jpg" alt="Unit 2.1 Thumbnail"/>
|
||||
|
||||
欢迎来到本模块,在这里你将学习**如何使用 [`smolagents`](https://github.com/huggingface/smolagents) 库构建有效的智能体**,该库提供了一个轻量级框架,用于创建功能强大的AI智能体。
|
||||
|
||||
`smolagents` 是 Hugging Face 的一个库;因此,我们非常感谢您通过**加星标**的方式支持 smolagents [`仓库`](https://github.com/huggingface/smolagents):
|
||||
<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/star_smolagents.gif" alt="starring smolagents"/>
|
||||
|
||||
## 模块概览
|
||||
|
||||
本模块提供了使用 `smolagents` 构建智能体的关键概念和实用策略的全面概述。
|
||||
|
||||
面对众多可用的开源框架,了解使 `smolagents` 成为有用选择的组件和功能,或确定何时另一种解决方案可能更合适,这一点至关重要。
|
||||
|
||||
我们将探索关键的智能体类型,包括为软件开发任务设计的代码智能体(code agents),用于创建模块化、函数驱动工作流的工具调用智能体(tool calling agents),以及访问和综合信息的检索智能体(retrieval agents)。
|
||||
|
||||
此外,我们还将讨论多个智能体的编排,以及视觉能力和网络浏览的集成,这为动态和上下文感知应用开辟了新的可能性。
|
||||
|
||||
在本单元中,第一单元的智能体阿尔弗雷德(Alfred)回归了。这次,他的内部运作基于 `smolagents` 框架。我们将一起探索这个框架背后的关键概念,同时阿尔弗雷德将处理各种任务。趁韦恩家族🦇外出,阿尔弗雷德正在韦恩庄园(Wayne Manor)筹办一场派对,他有很多事情要做。请跟随我们一起见证他的旅程,看他如何使用 `smolagents` 处理这些任务!
|
||||
|
||||
<Tip>
|
||||
|
||||
在本单元中,您将学习使用 `smolagents` 库构建AI智能体。您的智能体将能够搜索数据、执行代码并与网页交互。您还将学习如何结合多个智能体来创建更强大的系统。
|
||||
|
||||
</Tip>
|
||||
|
||||

|
||||
|
||||
## 内容
|
||||
|
||||
在这个关于 `smolagents` 的单元中,我们涵盖:
|
||||
|
||||
### 1️⃣ [为什么使用 smolagents](./why_use_smolagents)
|
||||
|
||||
`smolagents` 是众多可用于应用程序开发的开源智能体框架之一。其他选择包括 `LlamaIndex` 和 `LangGraph`,这些在本课程的其他模块中也有涵盖。`smolagents` 提供了几个关键特性,可能使其非常适合特定用例,但在选择框架时,我们应该始终考虑所有选项。我们将探讨使用 `smolagents` 的优势和缺点,帮助您根据项目需求做出明智的决定。
|
||||
|
||||
### 2️⃣ [代码智能体](./code_agents)
|
||||
|
||||
`CodeAgents`(代码智能体)是 `smolagents` 中的主要智能体类型。这些智能体不是生成 JSON 或文本,而是生成 Python 代码来执行操作。本模块探讨它们的目的、功能以及工作原理,并提供实际例子来展示它们的能力。
|
||||
|
||||
### 3️⃣ [工具调用智能体](./tool_calling_agents)
|
||||
|
||||
`ToolCallingAgents`(工具调用智能体)是 `smolagents` 支持的第二种智能体类型。与生成 Python 代码的 `CodeAgents` 不同,这些智能体依赖于系统必须解析和解释以执行操作的 JSON/文本块。本模块涵盖它们的功能、与 `CodeAgents` 的主要区别,并提供示例说明其用法。
|
||||
|
||||
### 4️⃣ [工具](./tools)
|
||||
|
||||
正如我们在第 1 单元中看到的,工具是大语言模型(LLM)可以在智能体系统中使用的函数,它们作为智能体行为的基本构建块。本模块涵盖如何创建工具、它们的结构,以及使用 `Tool` 类或 `@tool` 装饰器的不同实现方法。您还将了解默认工具箱、如何与社区共享工具,以及如何加载社区贡献的工具以在您的智能体中使用。
|
||||
|
||||
### 5️⃣ [检索智能体](./retrieval_agents)
|
||||
|
||||
检索智能体(Retrieval agents)使模型能够访问知识库,从而可以从多个来源搜索、综合和检索信息。它们利用向量存储(vector stores)进行高效检索,并实现 **检索增强生成(Retrieval-Augmented Generation,RAG)** 模式。这些智能体特别适用于将网络搜索与自定义知识库集成,同时通过记忆系统维持对话上下文。本模块探讨实施策略,包括用于稳健信息检索的回退机制。
|
||||
|
||||
### 6️⃣ [多智能体系统](./multi_agent_systems)
|
||||
|
||||
有效地编排多个智能体对于构建强大的多智能体系统至关重要。通过组合具有不同能力的智能体(例如,将网络搜索智能体与代码执行智能体结合),您可以创建更复杂的解决方案。本模块专注于设计、实施和管理多智能体系统,以最大限度地提高效率和可靠性。
|
||||
|
||||
### 7️⃣ [视觉和浏览器智能体](./vision_agents)
|
||||
|
||||
视觉智能体(Vision agents)通过整合 **视觉-语言模型(Vision-Language Models,VLMs)** 扩展了传统智能体的能力,使其能够处理和解释视觉信息。本模块探讨如何设计和集成由 VLM 驱动的智能体,从而解锁诸如基于图像的推理、视觉数据分析和多模态交互等高级功能。我们还将使用视觉智能体构建一个浏览器智能体,能够浏览网络并从中提取信息。
|
||||
|
||||
## 资源
|
||||
|
||||
- [smolagents 文档](https://huggingface.co/docs/smolagents) - smolagents 库的官方文档
|
||||
- [构建有效的智能体](https://www.anthropic.com/research/building-effective-agents) - 关于智能体架构的研究论文
|
||||
- [智能体指南](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - 构建可靠智能体的最佳实践
|
||||
- [LangGraph 智能体](https://langchain-ai.github.io/langgraph/) - 智能体实现的其他示例
|
||||
- [函数调用指南](https://platform.openai.com/docs/guides/function-calling) - 了解大语言模型中的函数调用
|
||||
- [RAG 最佳实践](https://www.pinecone.io/learn/retrieval-augmented-generation/) - 实施有效 RAG 的指南
|
||||
413
units/zh-CN/unit2/smolagents/multi_agent_systems.mdx
Normal file
@@ -0,0 +1,413 @@
|
||||
<CourseFloatingBanner chapter={2}
|
||||
classNames="absolute z-10 right-0 top-0"
|
||||
notebooks={[
|
||||
{label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/multiagent_notebook.ipynb"},
|
||||
]} />
|
||||
|
||||
# 多智能体系统
|
||||
|
||||
多智能体系统使**专业智能体能够在复杂任务上进行协作**,提高模块化、可扩展性和稳健性。任务不再依赖单一智能体,而是分配给具有不同能力的多个智能体。
|
||||
|
||||
在 **smolagents** 中,不同的智能体可以组合起来生成 Python 代码、调用外部工具、执行网络搜索等。通过编排这些智能体,我们可以创建强大的工作流。
|
||||
|
||||
一个典型的设置可能包括:
|
||||
- **管理智能体(Manager Agent)** 用于任务委派
|
||||
- **代码解释器智能体(Code Interpreter Agent)** 用于代码执行
|
||||
- **网络搜索智能体(Web Search Agent)** 用于信息检索
|
||||
|
||||
下图说明了一个简单的多智能体架构,其中**管理智能体**协调**代码解释器工具**和**网络搜索智能体**,后者利用像 `DuckDuckGoSearchTool` 和 `VisitWebpageTool` 这样的工具来收集相关信息。
|
||||
|
||||
<img src="https://mermaid.ink/img/pako:eNp1kc1qhTAQRl9FUiQb8wIpdNO76eKubrmFks1oRg3VSYgjpYjv3lFL_2hnMWQOJwn5sqgmelRWleUSKLAtFs09jqhtoWuYUFfFAa6QA9QDTnpzamheuhxn8pt40-6l13UtS0ddhtQXj6dbR4XUGQg6zEYasTF393KjeSDGnDJKNxzj8I_7hLW5IOSmP9CH9hv_NL-d94d4DVNg84p1EnK4qlIj5hGClySWbadT-6OdsrL02MI8sFOOVkciw8zx8kaNspxnrJQE0fXKtjBMMs3JA-MpgOQwftIE9Bzj14w-cMznI_39E9Z3p0uFoA?type=png" style='background: white;'>
|
||||
|
||||
## 多智能体系统实战
|
||||
|
||||
多智能体系统由多个专业智能体在 **编排智能体(Orchestrator Agent)** 的协调下共同工作组成。这种方法通过在具有不同角色的智能体之间分配任务来实现复杂的工作流。
|
||||
|
||||
例如,**多智能体 RAG 系统**可以整合:
|
||||
- **网络智能体(Web Agent)** 用于浏览互联网。
|
||||
- **检索智能体(Retriever Agent)** 用于从知识库获取信息。
|
||||
- **图像生成智能体(Image Generation Agent)** 用于生成视觉内容。
|
||||
|
||||
所有这些智能体在管理任务委派和交互的编排者下运行。
|
||||
|
||||
## 用多智能体层次结构解决复杂任务
|
||||
|
||||
<Tip>
|
||||
你可以在 <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/multiagent_notebook.ipynb" target="_blank">这个笔记本</a> 中跟随代码,可以使用 Google Colab 运行。
|
||||
</Tip>
|
||||
|
||||
接待会即将到来!在你的帮助下,阿尔弗雷德现在几乎完成了准备工作。
|
||||
|
||||
但现在有个问题:蝙蝠车不见了。阿尔弗雷德需要找到替代品,而且要快。
|
||||
|
||||
幸运的是,已经有一些关于布鲁斯·韦恩生活的传记电影,所以也许阿尔弗雷德可以从某个电影拍摄现场留下的汽车中获取一辆,并将其重新改造到现代标准,这当然会包括完全自动驾驶选项。
|
||||
|
||||
但这辆车可能在世界各地的任何一个拍摄地点,而这样的地点可能数量众多。
|
||||
|
||||
所以阿尔弗雷德需要你的帮助。你能构建一个能够解决这个任务的智能体吗?
|
||||
|
||||
> 👉 Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to there, and represent them on a map, with a color varying by cargo plane transfer time. Also represent some supercar factories with the same cargo plane transfer time.
|
||||
|
||||
让我们来构建这个!
|
||||
|
||||
这个例子需要一些额外的包,所以首先安装它们:
|
||||
|
||||
```bash
|
||||
pip install 'smolagents[litellm]' matplotlib geopandas shapely kaleido -q
|
||||
```
|
||||
|
||||
### 我们首先制作一个工具来获取货运飞机转运时间。
|
||||
|
||||
```python
import math
from typing import Optional, Tuple

from smolagents import tool


@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0,  # 货运飞机的平均速度
) -> float:
    """
    Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))
    """

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # 提取坐标
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # 地球半径(公里)
    EARTH_RADIUS_KM = 6371.0

    # 使用半正矢公式计算大圆距离
    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # 增加10%以考虑非直接路线和空中交通管制
    actual_distance = distance * 1.1

    # 计算飞行时间
    # 为起飞和着陆程序增加1小时
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # 格式化结果
    return round(flight_time, 2)


print(calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093)))
```
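为了在不安装 `smolagents` 的情况下核对上面的公式,下面给出同一计算的纯标准库版本(`estimate_travel_time` 是演示用的假设名称,伦敦坐标为近似值)。它沿用相同的假设:地球半径 6371 km、距离上浮 10%、起降另加 1 小时。两个容易心算的检查点是:起点与终点相同时结果恰好为 1.0 小时;交换起点和终点不改变结果。

```python
import math

EARTH_RADIUS_KM = 6371.0


def estimate_travel_time(origin, destination, cruising_speed_kmh=750.0):
    """用半正矢公式估算两点间的货运飞机飞行时间(小时)。"""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    distance = EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))
    # 距离上浮 10%,再加 1 小时的起降时间
    return round(distance * 1.1 / cruising_speed_kmh + 1.0, 2)


gotham = (40.7128, -74.0060)
london = (51.5074, -0.1278)  # 近似坐标,仅用于演示

print(estimate_travel_time(gotham, gotham))  # 距离为 0,只剩 1 小时的起降时间
print(estimate_travel_time(gotham, london) == estimate_travel_time(london, gotham))
```

按这些假设,哥谭(纽约坐标)到伦敦的估算结果应在 9 小时左右。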
|
||||
|
||||
### 设置智能体
|
||||
|
||||
对于模型提供商,我们使用 Together AI,这是 [Hub 上的新推理提供商](https://huggingface.co/blog/inference-providers)之一!
|
||||
|
||||
`GoogleSearchTool` 通过 [SerpApi](https://serpapi.com) 或 [Serper API](https://serper.dev) 搜索网络,因此需要设置环境变量 `SERPAPI_API_KEY` 并传递 `provider="serpapi"`,或者设置 `SERPER_API_KEY` 并传递 `provider="serper"`。
|
||||
|
||||
如果你没有设置任何 Serp API 提供商,你可以使用 `DuckDuckGoSearchTool`,但请注意它有速率限制。
|
||||
|
||||
```python
|
||||
import os
|
||||
from PIL import Image
|
||||
from smolagents import CodeAgent, GoogleSearchTool, HfApiModel, VisitWebpageTool
|
||||
|
||||
model = HfApiModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
|
||||
```
|
||||
|
||||
我们可以先创建一个简单的智能体作为基线,为我们提供一个简单的报告。
|
||||
|
||||
```python
|
||||
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
|
||||
Also give me some supercar factories with the same cargo plane transfer time."""
|
||||
```
|
||||
|
||||
```python
|
||||
agent = CodeAgent(
|
||||
model=model,
|
||||
tools=[GoogleSearchTool("serper"), VisitWebpageTool(), calculate_cargo_travel_time],
|
||||
additional_authorized_imports=["pandas"],
|
||||
max_steps=20,
|
||||
)
|
||||
```
|
||||
|
||||
```python
|
||||
result = agent.run(task)
|
||||
```
|
||||
|
||||
```python
|
||||
result
|
||||
```
|
||||
|
||||
在我们的例子中,它生成了这个输出:
|
||||
|
||||
```python
|
||||
| | Location | Travel Time to Gotham (hours) |
|
||||
|--|------------------------------------------------------|------------------------------|
|
||||
| 0 | Necropolis Cemetery, Glasgow, Scotland, UK | 8.60 |
|
||||
| 1 | St. George's Hall, Liverpool, England, UK | 8.81 |
|
||||
| 2 | Two Temple Place, London, England, UK | 9.17 |
|
||||
| 3 | Wollaton Hall, Nottingham, England, UK | 9.00 |
|
||||
| 4 | Knebworth House, Knebworth, Hertfordshire, UK | 9.15 |
|
||||
| 5 | Acton Lane Power Station, Acton Lane, Acton, UK | 9.16 |
|
||||
| 6 | Queensboro Bridge, New York City, USA | 1.01 |
|
||||
| 7 | Wall Street, New York City, USA | 1.00 |
|
||||
| 8 | Mehrangarh Fort, Jodhpur, Rajasthan, India | 18.34 |
|
||||
| 9 | Turda Gorge, Turda, Romania | 11.89 |
|
||||
| 10 | Chicago, USA | 2.68 |
|
||||
| 11 | Hong Kong, China | 19.99 |
|
||||
| 12 | Cardington Studios, Northamptonshire, UK | 9.10 |
|
||||
| 13 | Warner Bros. Leavesden Studios, Hertfordshire, UK | 9.13 |
|
||||
| 14 | Westwood, Los Angeles, CA, USA | 6.79 |
|
||||
| 15 | Woking, UK (McLaren) | 9.13 |
|
||||
```
|
||||
|
||||
我们可以通过添加一些专门的规划步骤和更多的提示来进一步改进这一点。
|
||||
|
||||
规划步骤允许智能体提前思考并规划其下一步行动,这对于更复杂的任务非常有用。
|
||||
|
||||
```python
|
||||
agent.planning_interval = 4
|
||||
|
||||
detailed_report = agent.run(f"""
|
||||
You're an expert analyst. You make comprehensive reports after visiting many websites.
|
||||
Don't hesitate to search for many queries at once in a for loop.
|
||||
For each data point that you find, visit the source url to confirm numbers.
|
||||
|
||||
{task}
|
||||
""")
|
||||
|
||||
print(detailed_report)
|
||||
```
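为了直观感受 `planning_interval` 的含义,可以列出哪些步骤会触发规划。下面的示意基于一个简化假设:规划发生在第 1 步,之后每隔 `planning_interval` 步再触发一次(具体行为请以所用 `smolagents` 版本的实现为准;`planning_steps` 是演示用的假设函数名):

```python
def planning_steps(max_steps: int, planning_interval: int) -> list[int]:
    """返回在上述假设下会触发规划的步骤编号(从 1 开始计数)。"""
    return [s for s in range(1, max_steps + 1) if (s - 1) % planning_interval == 0]


print(planning_steps(max_steps=20, planning_interval=4))  # [1, 5, 9, 13, 17]
```

间隔越小,规划越频繁,智能体调整方向的机会越多,但每次规划也会多消耗一次模型调用。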
|
||||
|
||||
```python
|
||||
detailed_report
|
||||
```
|
||||
|
||||
在我们的例子中,它生成了这个输出:
|
||||
|
||||
```python
|
||||
| | Location | Travel Time (hours) |
|
||||
|--|--------------------------------------------------|---------------------|
|
||||
| 0 | Bridge of Sighs, Glasgow Necropolis, Glasgow, UK | 8.6 |
|
||||
| 1 | Wishart Street, Glasgow, Scotland, UK | 8.6 |
|
||||
```
|
||||
|
||||
|
||||
感谢这些快速更改,我们通过简单地为我们的智能体提供详细提示,并赋予它规划能力,获得了更加简洁的报告!
|
||||
|
||||
不过,模型的上下文窗口很快就会被填满。因此**如果我们要求智能体把这次详细搜索的结果与另一次搜索的结果结合起来,它会变得更慢,令牌数量和成本也会迅速增加**。
|
||||
|
||||
➡️ 我们需要改进系统的结构。
|
||||
|
||||
### ✌️ 在两个智能体之间分割任务
|
||||
|
||||
多智能体结构允许在不同子任务之间分离记忆,带来两大好处:
|
||||
- 每个智能体更专注于其核心任务,因此性能更佳
|
||||
- 分离记忆减少了每个步骤的输入令牌数量,从而减少延迟和成本。
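第二点可以用一个非常粗略的玩具模型来说明。假设每一步都要把此前的全部历史重新作为输入,且每步向记忆中新增固定数量的令牌(`tokens_per_step`、步数等数值都是演示用的假设,并非实测数据;`total_input_tokens` 也是假设的函数名):

```python
def total_input_tokens(num_steps: int, tokens_per_step: int, prompt_tokens: int = 0) -> int:
    """每一步的输入 = 初始提示 + 此前所有步骤累积的记忆。"""
    total = 0
    memory = prompt_tokens
    for _ in range(num_steps):
        total += memory            # 本步需要读入的全部上下文
        memory += tokens_per_step  # 本步的输出被追加到记忆中
    return total


single_agent = total_input_tokens(num_steps=20, tokens_per_step=1000)
# 拆成两个各自维护记忆的智能体,每个只承担一半步骤
two_agents = 2 * total_input_tokens(num_steps=10, tokens_per_step=1000)

print(single_agent, two_agents)  # 190000 90000
```

这个模型忽略了子智能体向管理者返回结果所占用的令牌,真实节省幅度会小一些,但输入量随步数近似平方增长的趋势是一致的。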
|
||||
|
||||
让我们创建一个团队,包含一个专门的网络搜索智能体,由另一个智能体管理。
|
||||
|
||||
管理智能体需要具备绘图能力来编写最终报告:因此我们授权它使用额外的导入,包括 `plotly`,以及用于空间绘图的 `geopandas` + `shapely`。
|
||||
|
||||
```python
|
||||
model = HfApiModel(
|
||||
"Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=8096
|
||||
)
|
||||
|
||||
web_agent = CodeAgent(
|
||||
model=model,
|
||||
tools=[
|
||||
GoogleSearchTool(provider="serper"),
|
||||
VisitWebpageTool(),
|
||||
calculate_cargo_travel_time,
|
||||
],
|
||||
name="web_agent",
|
||||
description="Browses the web to find information",
|
||||
verbosity_level=0,
|
||||
max_steps=10,
|
||||
)
|
||||
```
|
||||
|
||||
管理智能体需要进行一些较重的思考工作。
|
||||
|
||||
所以我们给它更强大的模型 [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1),并添加 `planning_interval` 到组合中。
|
||||
|
||||
```python
import os

from PIL import Image
from smolagents import OpenAIServerModel
from smolagents.utils import encode_image_base64, make_image_url


def check_reasoning_and_plot(final_answer, agent_memory):
    # 用多模态模型检查推理过程和保存的地图
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens=8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    # 相邻的字符串字面量会被直接拼接,所以每句末尾保留一个空格
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer)."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    # 只要反馈中出现 FAIL 就抛出异常,让智能体重新尝试
    if "FAIL" in output:
        raise Exception(output)
    return True


manager_agent = CodeAgent(
    model=HfApiModel("deepseek-ai/DeepSeek-R1", provider="together", max_tokens=8096),
    tools=[calculate_cargo_travel_time],
    managed_agents=[web_agent],
    additional_authorized_imports=[
        "geopandas",
        "plotly",
        "shapely",
        "json",
        "pandas",
        "numpy",
    ],
    planning_interval=5,
    verbosity_level=2,
    final_answer_checks=[check_reasoning_and_plot],
    max_steps=15,
)
```
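上面的检查函数用 `"FAIL" in output` 来判断结论。由于提示要求模型先列出理由、最后再给出结论,理由部分也可能出现 PASS 或 FAIL 字样。下面是一个可选的、更保守的解析示意(`parse_verdict` 是演示用的假设函数,并非 `smolagents` 的 API):只把文本中最后一次独立出现的 PASS 或 FAIL 当作最终结论。

```python
import re
from typing import Optional


def parse_verdict(feedback: str) -> Optional[str]:
    """返回反馈中最后一个独立出现的 PASS 或 FAIL;都没有则返回 None。"""
    matches = re.findall(r"\b(PASS|FAIL)\b", feedback)
    return matches[-1] if matches else None


feedback = "The map could FAIL on small screens, but it answers the task. Final decision: PASS"
print(parse_verdict(feedback))  # PASS
```

这种做法是否合适取决于模型是否真的把结论写在最后;如果结论位置不固定,保留原来的简单包含判断反而更稳妥。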
|
||||
|
||||
让我们检查这个团队是什么样子:
|
||||
|
||||
```python
|
||||
manager_agent.visualize()
|
||||
```
|
||||
|
||||
这将生成类似于下面的内容,帮助我们理解智能体和使用的工具之间的结构和关系:
|
||||
|
||||
```python
|
||||
CodeAgent | deepseek-ai/DeepSeek-R1
|
||||
├── ✅ Authorized imports: ['geopandas', 'plotly', 'shapely', 'json', 'pandas', 'numpy']
|
||||
├── 🛠️ Tools:
|
||||
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
│ ┃ Name ┃ Description ┃ Arguments ┃
|
||||
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ │ calculate_cargo_travel_time │ Calculate the travel time for a cargo │ origin_coords (`array`): Tuple of │
|
||||
│ │ │ plane between two points on Earth │ (latitude, longitude) for the │
|
||||
│ │ │ using great-circle distance. │ starting point │
|
||||
│ │ │ │ destination_coords (`array`): Tuple │
|
||||
│ │ │ │ of (latitude, longitude) for the │
|
||||
│ │ │ │ destination │
|
||||
│ │ │ │ cruising_speed_kmh (`number`): │
|
||||
│ │ │ │ Optional cruising speed in km/h │
|
||||
│ │ │ │ (defaults to 750 km/h for typical │
|
||||
│ │ │ │ cargo planes) │
|
||||
│ │ final_answer │ Provides a final answer to the given │ answer (`any`): The final answer to │
|
||||
│ │ │ problem. │ the problem │
|
||||
│ └─────────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┘
|
||||
└── 🤖 Managed agents:
|
||||
└── web_agent | CodeAgent | Qwen/Qwen2.5-Coder-32B-Instruct
|
||||
├── ✅ Authorized imports: []
|
||||
├── 📝 Description: Browses the web to find information
|
||||
└── 🛠️ Tools:
|
||||
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
|
||||
┃ Name ┃ Description ┃ Arguments ┃
|
||||
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
|
||||
│ web_search │ Performs a google web search for │ query (`string`): The search │
|
||||
│ │ your query then returns a string │ query to perform. │
|
||||
│ │ of the top search results. │ filter_year (`integer`): │
|
||||
│ │ │ Optionally restrict results to a │
|
||||
│ │ │ certain year │
|
||||
│ visit_webpage │ Visits a webpage at the given url │ url (`string`): The url of the │
|
||||
│ │ and reads its content as a │ webpage to visit. │
|
||||
│ │ markdown string. Use this to │ │
|
||||
│ │ browse webpages. │ │
|
||||
│ calculate_cargo_travel_time │ Calculate the travel time for a │ origin_coords (`array`): Tuple of │
|
||||
│ │ cargo plane between two points on │ (latitude, longitude) for the │
|
||||
│ │ Earth using great-circle │ starting point │
|
||||
│ │ distance. │ destination_coords (`array`): │
|
||||
│ │ │ Tuple of (latitude, longitude) │
|
||||
│ │ │ for the destination │
|
||||
│ │ │ cruising_speed_kmh (`number`): │
|
||||
│ │ │ Optional cruising speed in km/h │
|
||||
│ │ │ (defaults to 750 km/h for typical │
|
||||
│ │ │ cargo planes) │
|
||||
│ final_answer │ Provides a final answer to the │ answer (`any`): The final answer │
|
||||
│ │ given problem. │ to the problem │
|
||||
└─────────────────────────────┴───────────────────────────────────┴───────────────────────────────────┘
|
||||
```
|
||||
|
||||
```python
|
||||
manager_agent.run("""
|
||||
Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
|
||||
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
|
||||
Represent this as spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!
|
||||
|
||||
Here's an example of how to plot and return a map:
|
||||
import plotly.express as px
|
||||
df = px.data.carshare()
|
||||
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
|
||||
color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
|
||||
fig.show()
|
||||
fig.write_image("saved_map.png")
|
||||
final_answer(fig)
|
||||
|
||||
Never try to process strings using code: when you have a string to read, just print it and you'll see it.
|
||||
""")
|
||||
```
|
||||
|
||||
我不知道在你的运行中情况如何,但在我的运行中,管理智能体巧妙地将任务分配给网络智能体,首先是 `1. Search for Batman filming locations`,然后是 `2. Find supercar factories`,最后聚合列表并绘制地图。
|
||||
|
||||
让我们直接从智能体的状态中查看地图:
|
||||
|
||||
```python
|
||||
manager_agent.python_executor.state["fig"]
|
||||
```
|
||||
|
||||
这将输出地图:
|
||||
|
||||

|
||||
|
||||
## 资源
|
||||
|
||||
- [多智能体系统](https://huggingface.co/docs/smolagents/main/en/examples/multiagents) – 多智能体系统概述。
|
||||
- [什么是智能体 RAG?](https://weaviate.io/blog/what-is-agentic-rag) – 智能体 RAG 介绍。
|
||||
- [多智能体 RAG 系统 🤖🤝🤖 配方](https://huggingface.co/learn/cookbook/multiagent_rag_system) – 构建多智能体 RAG 系统的分步指南。
|
||||
142
units/zh-CN/unit2/smolagents/quiz1.mdx
Normal file
@@ -0,0 +1,142 @@
|
||||
# 小测验 (不计分) [[quiz1]]
|
||||
|
||||
让我们用一个快速测验来测试你对 `smolagents` 的理解!请记住,自我测试有助于巩固所学内容,并找出可能需要复习的部分。
|
||||
|
||||
这是一个可选测验,不计分。
|
||||
|
||||
### Q1: 选择 `smolagents` 而非其他框架的主要优势之一是什么?
|
||||
哪个陈述最能体现 `smolagents` 方法的核心优势?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它使用高度专业化的配置文件和陡峭的学习曲线,确保只有专业开发人员能够使用它",
|
||||
explain: "smolagents 设计注重简单性和最小代码复杂性,而不是陡峭的学习曲线。",
|
||||
},
|
||||
{
|
||||
text: "它支持代码优先方法,具有最少的抽象,让智能体通过 Python 函数调用直接交互",
|
||||
explain: "是的,smolagents 强调直接、以代码为中心的设计,具有最小的抽象。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它专注于基于 JSON 的操作,消除了智能体编写任何代码的需求",
|
||||
explain: "虽然 smolagents 支持基于 JSON 的工具调用(ToolCallingAgents),但该库强调基于代码的方法,如 CodeAgents。",
|
||||
},
|
||||
{
|
||||
text: "它与单一 LLM 提供商和专用硬件深度集成",
|
||||
explain: "smolagents 支持多种模型提供商,并且不需要专用硬件。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: 在哪种情况下,你可能最能从使用 smolagents 中受益?
|
||||
哪种情况最符合 smolagents 的优势?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "构建大型企业系统,需要数十个微服务和实时数据管道",
|
||||
explain: "虽然可能,但 smolagents 更专注于轻量级、以代码为中心的实验,而不是重型企业基础设施。",
|
||||
},
|
||||
{
|
||||
text: "快速原型设计或实验智能体逻辑,特别是当你的应用相对简单直接时",
|
||||
explain: "是的。smolagents 设计用于简单灵活的智能体创建,无需大量设置开销。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "需要一个只支持基于云的 LLM 并禁止本地推理的框架",
|
||||
explain: "smolagents 提供与本地或托管模型的灵活集成,不仅限于基于云的 LLM。",
|
||||
},
|
||||
{
|
||||
text: "需要高级编排、多模态感知和开箱即用的企业级功能的场景",
|
||||
explain: "虽然你可以集成高级功能,但 smolagents 本身在核心上是轻量级和简约的。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: smolagents 在模型集成方面提供了灵活性。哪个陈述最能反映其方法?
|
||||
选择最准确描述 smolagents 如何与 LLM 互操作的说明。
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它只提供一个内置模型,不允许自定义集成",
|
||||
explain: "smolagents 支持多种不同的后端和用户定义的模型。",
|
||||
},
|
||||
{
|
||||
text: "它可以与广泛的 LLM 一起使用,提供预定义的类如 TransformersModel、HfApiModel 和 LiteLLMModel",
|
||||
explain: "这是正确的。smolagents 通过各种类支持灵活的模型集成。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它要求你为每次 LLM 使用实现自己的模型连接器",
|
||||
explain: "有多种预构建的连接器使 LLM 集成变得简单直接。",
|
||||
},
|
||||
{
|
||||
text: "它只与开源 LLM 集成,不支持商业 API",
|
||||
explain: "smolagents 可以与开源和商业模型 API 集成。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: smolagents 如何处理基于代码的操作和基于 JSON 的操作之间的争论?
|
||||
哪个陈述正确地描述了 smolagents 关于操作格式的理念?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它只允许所有智能体任务使用基于 JSON 的操作,需要解析器来提取工具调用",
|
||||
explain: "ToolCallingAgent 使用基于 JSON 的调用,但 smolagents 也提供主要的 CodeAgent 选项,可以编写 Python 代码。",
|
||||
},
|
||||
{
|
||||
text: "它通过 CodeAgent 专注于基于代码的操作,但也通过 ToolCallingAgent 支持基于 JSON 的工具调用",
|
||||
explain: "是的,smolagents 主要推荐基于代码的操作,但也为喜欢或需要它的用户提供了基于 JSON 的替代方案。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它禁止任何外部函数调用,而是要求所有逻辑完全存在于 LLM 内部",
|
||||
explain: "smolagents 专门设计用于授予 LLM 调用外部工具或代码的能力。",
|
||||
},
|
||||
{
|
||||
text: "它要求用户在运行智能体之前手动将每个代码片段转换为 JSON 对象",
|
||||
explain: "smolagents 可以在 CodeAgent 路径中自动管理代码片段创建,无需手动 JSON 转换。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q5: smolagents 如何与 Hugging Face Hub 集成以获得额外优势?
|
||||
哪个陈述准确描述了 Hub 集成的核心优势之一?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它自动将所有公共模型升级到商业许可层级",
|
||||
explain: "Hub 集成不会改变模型或工具的许可层级。",
|
||||
},
|
||||
{
|
||||
text: "它允许你推送和共享智能体或工具,使其他开发者易于发现和重用",
|
||||
explain: "正确。smolagents 支持将智能体和工具上传到 HF Hub 供他人重用。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它完全禁用本地推理,只强制使用远程模型",
|
||||
explain: "如果用户愿意,仍然可以进行本地推理;推送到 Hub 不会覆盖本地使用。",
|
||||
},
|
||||
{
|
||||
text: "它永久存储所有基于代码的智能体,防止任何更新或版本控制",
|
||||
explain: "Hub 仓库支持更新和版本控制,因此你可以随时修改基于代码的智能体。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
恭喜你完成了这个测验!🎉 如果你错过了任何问题,可以考虑复习*为什么使用 smolagents*部分以更深入理解。如果你表现良好,你已经准备好探索 smolagents 中更高级的主题了!
|
||||
147
units/zh-CN/unit2/smolagents/quiz2.mdx
Normal file
@@ -0,0 +1,147 @@
|
||||
# 小测验(不计分)[[quiz2]]
|
||||
|
||||
现在该测试您对*代码智能体*、*工具调用智能体*和*工具*章节的理解了。本测验为可选且不计分。
|
||||
|
||||
---
|
||||
|
||||
### Q1: 使用 `@tool` 装饰器创建工具与创建 `Tool` 的子类之间的主要区别是什么?
|
||||
|
||||
以下哪个陈述最能描述这两种定义工具方法的区别?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "使用 <code>@tool</code> 装饰器是检索类工具的强制要求,而 <code>Tool</code> 的子类仅用于文本生成任务",
|
||||
explain: "两种方法都适用于任何类型的工具,包括检索类和文本生成类工具。",
|
||||
},
|
||||
{
|
||||
text: "推荐使用 <code>@tool</code> 装饰器创建简单的基于函数的工具,而 <code>Tool</code> 的子类能为复杂功能或自定义元数据提供更大灵活性",
|
||||
explain: "正确。装饰器方法更简单,但子类化允许更定制化的行为。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "<code>@tool</code> 只能用于多智能体系统,而创建 <code>Tool</code> 的子类适用于单智能体场景",
|
||||
explain: "所有智能体(单或多)都可以使用这两种方法定义工具,没有此类限制。",
|
||||
},
|
||||
{
|
||||
text: "用 <code>@tool</code> 装饰函数可以替代文档字符串,而子类工具必须不包含文档字符串",
|
||||
explain: "两种方法都需要清晰的文档字符串。装饰器不会替代它们,子类仍然可以包含文档字符串。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q2: CodeAgent 如何使用 ReAct(推理+行动)方法处理多步骤任务?
|
||||
|
||||
哪个陈述正确描述了 CodeAgent 执行系列步骤来解决任务的方式?
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它将每个步骤传递给多智能体系统中的不同智能体,然后合并结果",
|
||||
explain: "尽管多智能体系统可以分配任务,但 CodeAgent 本身可以使用 ReAct 独立处理多个步骤。",
|
||||
},
|
||||
{
|
||||
text: "它将所有操作存储为 JSON 格式以便解析,然后一次性执行所有操作",
|
||||
explain: "此行为匹配 ToolCallingAgent 的基于 JSON 的方法,而非 CodeAgent。",
|
||||
},
|
||||
{
|
||||
text: "它循环执行以下流程:编写内部思考、生成 Python 代码、执行代码并记录结果,直到得出最终答案",
|
||||
explain: "正确。这描述了 CodeAgent 使用的 ReAct 模式,包括迭代推理和代码执行。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它依赖视觉模块验证代码输出后才能继续下一步",
|
||||
explain: "smolagents 支持视觉能力,但它们不是 CodeAgent 或 ReAct 方法的默认要求。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q3: 在 Hugging Face Hub 上共享工具的主要优势是什么?
|
||||
|
||||
选择开发者可能上传和共享自定义工具的最佳原因。
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "它会自动将工具与 MultiStepAgent 集成以实现检索增强生成",
|
||||
explain: "共享工具不会自动设置检索或多步逻辑,只是使工具可用。",
|
||||
},
|
||||
{
|
||||
text: "它允许他人在无需额外设置的情况下发现、重用并将您的工具集成到他们的 smolagents 中",
|
||||
explain: "正确。在 Hub 上共享使工具可供任何人(包括您自己)快速下载和重用。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "它确保只有 CodeAgent 可以调用该工具,而 ToolCallingAgent 无法调用",
|
||||
explain: "CodeAgent 和 ToolCallingAgent 都可以调用共享工具,没有基于智能体类型的限制。",
|
||||
},
|
||||
{
|
||||
text: "它会将您的工具转换为具备完整视觉能力的图像处理函数",
|
||||
explain: "工具共享不会自动改变工具功能或增加视觉能力。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
### Q4: ToolCallingAgent 在执行操作方面与 CodeAgent 有何不同?
|
||||
|
||||
选择准确描述 ToolCallingAgent 工作方式的选项。
|
||||
|
||||
<Question
|
||||
choices={[
|
||||
{
|
||||
text: "ToolCallingAgent 仅兼容多智能体系统,而 CodeAgent 可以单独运行",
|
||||
explain: "两种智能体都可以单独使用或作为多智能体系统的一部分。",
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent 将所有推理委托给单独的检索智能体,然后返回最终答案",
|
||||
explain: "ToolCallingAgent 仍然使用主 LLM 进行推理,不完全依赖检索智能体。",
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent 输出指定工具调用和参数的 JSON 指令,这些指令会被解析并执行",
|
||||
explain: "正确。ToolCallingAgent 使用 JSON 方法来定义工具调用。",
|
||||
correct: true
|
||||
},
|
||||
{
|
||||
text: "ToolCallingAgent 仅适用于单步任务,在调用一个工具后自动停止",
|
||||
explain: "ToolCallingAgent 可以像 CodeAgent 一样根据需要执行多个步骤。",
|
||||
}
|
||||
]}
|
||||
/>
|
||||
|
||||
---
|
||||
|
||||
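### Q5: What is included in the smolagents default toolbox, and why might you use it?

Which statement best captures the purpose and contents of the default toolbox in smolagents?

<Question
choices={[
  {
    text: "It provides a set of commonly used tools such as DuckDuckGo search, PythonInterpreterTool, and a final answer tool for quick prototyping",
    explain: "Correct. The default toolbox contains these ready-made tools for easy integration when building agents.",
    correct: true
  },
  {
    text: "It only supports vision-based tasks like image classification or OCR by default",
    explain: "Although smolagents can integrate vision-based features, the default toolbox is not exclusively vision-oriented.",
  },
  {
    text: "It is intended solely for multi-agent systems and is incompatible with a single CodeAgent",
    explain: "The default toolbox works with any agent type, in both single-agent and multi-agent setups.",
  },
  {
    text: "It adds advanced retrieval-based functionality for large-scale question answering from a vector store",
    explain: "While you can build retrieval tools, the default toolbox does not automatically provide advanced RAG features.",
  }
]}
/>

---

Congratulations on completing this quiz! 🎉 If any of the questions gave you trouble, revisit the *Code Agents*, *Tool Calling Agents*, or *Tools* sections to strengthen your understanding. If you aced it, you are well on your way to building robust smolagents applications!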
164
units/zh-CN/unit2/smolagents/retrieval_agents.mdx
Normal file
@@ -0,0 +1,164 @@
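<CourseFloatingBanner chapter={2}
  classNames="absolute z-10 right-0 top-0"
  notebooks={[
    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/retrieval_agents.ipynb"},
]} />

# Building Agentic RAG Systems

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/retrieval_agents.ipynb" target="_blank">this notebook</a>, which you can run using Google Colab.
</Tip>

Retrieval-Augmented Generation (RAG) systems combine data retrieval with generative models to provide context-aware responses. For example, a user's query is passed to a search engine, and the retrieved results are given to the model along with the query. The model then generates a response based on the query and the retrieved information.

Agentic RAG extends traditional RAG systems by **combining autonomous agents with dynamic knowledge retrieval**.

While traditional RAG systems use an LLM to answer queries based on retrieved data, agentic RAG **enables intelligent control of both the retrieval and generation processes**, improving efficiency and accuracy.

Traditional RAG systems face key limitations, such as **relying on a single retrieval step** and focusing too narrowly on direct semantic similarity with the user's query, which may overlook relevant information.

Agentic RAG addresses these issues by letting the agent formulate search queries on its own, critique the retrieved results, and run multiple retrieval steps to produce a more tailored and comprehensive output.

## Basic Retrieval with DuckDuckGo

Let's build a simple agent that can search the web using DuckDuckGo. This agent will retrieve information and synthesize responses to answer queries. With agentic RAG, Alfred's agent can:

* Search for the latest superhero party trends
* Refine results to include luxury elements
* Synthesize the information into a complete plan

Here is how Alfred's agent can achieve this:
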
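```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel

# Initialize the search tool
search_tool = DuckDuckGoSearchTool()

# Initialize the model
model = HfApiModel()

agent = CodeAgent(
    model=model,
    tools=[search_tool]
)

# Example usage
response = agent.run(
    "Search for luxury superhero-themed party ideas, including decorations, entertainment, and catering."
)
print(response)
```

The agent follows this process:

1. **Analyzes the request:** Alfred's agent identifies the key elements of the query: luxury superhero-themed party planning, with a focus on decorations, entertainment, and catering.
2. **Performs retrieval:** The agent uses DuckDuckGo to search for the most relevant and up-to-date information, making sure it matches Alfred's refined preferences for a luxurious event.
3. **Synthesizes information:** After gathering the results, the agent processes them into an actionable plan that covers every aspect of the party.
4. **Keeps results for later steps:** The agent keeps the retrieved information in its memory for the rest of the run, so later steps can reuse it without repeating the search. Persisting it across separate runs would need an additional storage step that this example does not include.
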
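## Custom Knowledge Base Tool

For specialized tasks, a custom knowledge base can be invaluable. Let's create a tool that queries a vector database of technical documentation or specialized knowledge. Using semantic search, the agent can find the information most relevant to Alfred's needs.

A vector database stores a collection of documents together with their embeddings, which are numerical representations produced by specialized ML models. These representations make it possible to search and retrieve documents quickly.
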
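To make the idea of searching over vectors concrete, here is a minimal sketch. It is an illustration under stated assumptions only: `embed` is a hypothetical stand-in that counts words, whereas a real vector database would rely on embeddings from a trained model and an approximate nearest-neighbor index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


documents = [
    "luxury decorations with gold accents",
    "catering menu with superhero dishes",
    "live music and a professional dj",
]
top = search("superhero catering ideas", documents, k=1)
```

Only the second document shares words with the query, so it is the one ranked first. With learned embeddings, documents could match on meaning even without shared words.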
This approach combines predefined knowledge with semantic search to provide context-aware solutions for event planning. With access to specialized knowledge, Alfred can perfect every detail of the party.

In this example, we create a tool that retrieves party planning ideas from a custom knowledge base. We use a BM25 retriever to search the knowledge base and return the top results, and `RecursiveCharacterTextSplitter` to split the documents into smaller chunks for more efficient search. Note that BM25 ranks documents by keyword overlap rather than by embeddings, so it stands in here for a full vector database:

```python
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from smolagents import Tool
from langchain_community.retrievers import BM25Retriever
from smolagents import CodeAgent, HfApiModel

class PartyPlanningRetrieverTool(Tool):
    name = "party_planning_retriever"
    description = "Uses semantic search to retrieve relevant party planning ideas for Alfred’s superhero-themed party at Wayne Manor."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be a query related to party planning or superhero themes.",
        }
    }
    output_type = "string"

    def __init__(self, docs, **kwargs):
        super().__init__(**kwargs)
        self.retriever = BM25Retriever.from_documents(
            docs, k=5  # Retrieve the top 5 documents
        )

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"

        docs = self.retriever.invoke(
            query,
        )
        return "\nRetrieved ideas:\n" + "".join(
            [
                f"\n\n===== Idea {str(i)} =====\n" + doc.page_content
                for i, doc in enumerate(docs)
            ]
        )

# Simulate a knowledge base about party planning
party_ideas = [
    {"text": "A superhero-themed masquerade ball with luxury decor, including gold accents and velvet curtains.", "source": "Party Ideas 1"},
    {"text": "Hire a professional DJ who can play themed music for superheroes like Batman and Wonder Woman.", "source": "Entertainment Ideas"},
    {"text": "For catering, serve dishes named after superheroes, like 'The Hulk's Green Smoothie' and 'Iron Man's Power Steak.'", "source": "Catering Ideas"},
    {"text": "Decorate with iconic superhero logos and projections of Gotham and other superhero cities around the venue.", "source": "Decoration Ideas"},
    {"text": "Interactive experiences with VR where guests can engage in superhero simulations or compete in themed games.", "source": "Entertainment Ideas"}
]

source_docs = [
    Document(page_content=doc["text"], metadata={"source": doc["source"]})
    for doc in party_ideas
]

# Split the documents into smaller chunks for more efficient search
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True,
    strip_whitespace=True,
    separators=["\n\n", "\n", ".", " ", ""],
)
docs_processed = text_splitter.split_documents(source_docs)

# Create the retriever tool
party_planning_retriever = PartyPlanningRetrieverTool(docs_processed)

# Initialize the agent
agent = CodeAgent(tools=[party_planning_retriever], model=HfApiModel())

# Example usage
response = agent.run(
    "Find ideas for a luxury superhero-themed party, including entertainment, catering, and decoration options."
)

print(response)
```

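The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified fixed-window splitter. This is a sketch of the general idea only: it is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on the listed separators), and `split_with_overlap` is a hypothetical helper name.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
```

Here each chunk starts with the last character of the previous one. A larger overlap lowers the chance that a relevant phrase is cut in half, at the cost of storing more text.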
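This enhanced agent can:
1. First check the documentation for relevant information
2. Combine insights from the knowledge base
3. Maintain conversation context in memory

## Enhanced Retrieval Capabilities

When building agentic RAG systems, the agent can employ advanced strategies such as:

1. **Query reformulation:** Instead of using the raw user query, the agent can craft optimized search terms that better match the target documents
2. **Multi-step retrieval:** The agent can perform multiple searches, using initial results to inform subsequent queries
3. **Source integration:** Information can be combined from multiple sources, such as web search and local documentation
4. **Result validation:** Retrieved content can be checked for relevance and accuracy before it is included in a response

Effective agentic RAG systems require careful consideration of several key aspects. The agent **should select between available tools based on the query type and context**. Memory systems help maintain conversation history and avoid repeated retrievals. Fallback strategies ensure the system can still provide value when the primary retrieval method fails. Validation steps help ensure the accuracy and relevance of the retrieved information.
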
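The strategies above can be sketched in plain Python. This is a toy illustration, not a smolagents API: the retrievers, the `is_relevant` check, and the in-memory document lists are all hypothetical, and a real agent would let the LLM write the reformulated queries.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]


def is_relevant(passage: str, query: str) -> bool:
    """Toy validation step: keep passages that share a word with the query."""
    query_words = set(query.lower().split())
    return any(word in query_words for word in passage.lower().split())


def retrieve_with_fallback(
    queries: list[str], primary: Retriever, fallback: Retriever
) -> list[str]:
    """Try each reformulated query in turn; use the fallback if the primary finds nothing."""
    seen: set[str] = set()  # simple memory to avoid returning duplicates
    results: list[str] = []
    for query in queries:
        passages = primary(query) or fallback(query)
        for passage in passages:
            if passage not in seen and is_relevant(passage, query):
                seen.add(passage)
                results.append(passage)
    return results


# Hypothetical retrievers backed by small in-memory collections.
local_docs = ["velvet curtains for luxury decor"]
web_docs = ["themed music playlist for parties", "luxury catering services"]


def local_search(query: str) -> list[str]:
    return [d for d in local_docs if any(w in d.split() for w in query.split())]


def web_search(query: str) -> list[str]:
    return [d for d in web_docs if any(w in d.split() for w in query.split())]


found = retrieve_with_fallback(
    ["luxury decor", "party music"], primary=local_search, fallback=web_search
)
```

The first query is answered from the local documents, while the second finds nothing locally and falls back to the web collection.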
## Resources

- [Agentic RAG: turbocharge your RAG with query reformulation and self-query! 🚀](https://huggingface.co/learn/cookbook/agent_rag) - A practical guide to developing an agentic RAG system with smolagents
70
units/zh-CN/unit2/smolagents/tool_calling_agents.mdx
Normal file
@@ -0,0 +1,70 @@
<CourseFloatingBanner chapter={2}
  classNames="absolute z-10 right-0 top-0"
  notebooks={[
    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tool_calling_agents.ipynb"},
]} />

# Writing actions as code snippets or JSON blobs

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tool_calling_agents.ipynb" target="_blank">this notebook</a>, which you can run using Google Colab.
</Tip>

Tool Calling Agents are the second type of agent available in `smolagents`. Unlike Code Agents, which use Python snippets, these agents **use the built-in tool-calling capabilities of LLM providers** to generate tool calls as **JSON structures**. This is the standard approach used by OpenAI, Anthropic, and other major providers.

When Alfred wants to search for catering services and party ideas, a `CodeAgent` would generate and run Python code like this:

```python
for query in [
    "Best catering services in Gotham City",
    "Party theme ideas for superheroes"
]:
    print(web_search(f"Search for: {query}"))
```

A `ToolCallingAgent` would instead create a JSON structure:

```json
[
    {"name": "web_search", "arguments": "Best catering services in Gotham City"},
    {"name": "web_search", "arguments": "Party theme ideas for superheroes"}
]
```

This JSON structure is then used to execute the tool calls.

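As a rough illustration of that last step, the sketch below parses a JSON list of tool calls and dispatches each one to a Python function. It is not the actual smolagents implementation: the `TOOLS` registry, the `execute_tool_calls` helper, and the stand-in `web_search` function are assumptions made for this example.

```python
import json
from typing import Callable


def web_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool."""
    return f"Results for: {query}"


# Registry mapping tool names to Python callables.
TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}


def execute_tool_calls(llm_output: str) -> list[str]:
    """Parse a JSON list of tool calls and run each one against the registry."""
    observations = []
    for call in json.loads(llm_output):
        name = call["name"]
        if name not in TOOLS:
            observations.append(f"Error: unknown tool '{name}'")
            continue
        observations.append(TOOLS[name](call["arguments"]))
    return observations


llm_output = """
[
    {"name": "web_search", "arguments": "Best catering services in Gotham City"},
    {"name": "web_search", "arguments": "Party theme ideas for superheroes"}
]
"""
observations = execute_tool_calls(llm_output)
```

Each call is independent here: the output of one tool cannot feed into the next within the same action. That is the kind of variable handling a code action allows.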
Even though `smolagents` primarily focuses on `CodeAgents`, since [they perform better overall](https://arxiv.org/abs/2402.01030), `ToolCallingAgents` can still be effective for simple systems that don't require variable handling or complex tool calls.

![Code vs JSON Actions](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png)

## How Do Tool Calling Agents Work?

Tool Calling Agents follow the same multi-step workflow as Code Agents (see the [previous section](./code_agents) for details). The key difference is in **how they structure their actions**: instead of executable code, they **generate JSON objects that specify tool names and arguments**. The system then **parses these instructions** to execute the appropriate tools.

## Example: Running a Tool Calling Agent

Let's revisit the example where Alfred started party preparations, this time using a `ToolCallingAgent` to highlight the difference. We build an agent that can search the web using DuckDuckGo. The only difference from the Code Agent example is the agent type; the framework handles everything else:

```python
from smolagents import ToolCallingAgent, DuckDuckGoSearchTool, HfApiModel

agent = ToolCallingAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel())

agent.run("Search for the best music recommendations for a party at the Wayne's mansion.")
```

When you examine the agent's trace, instead of `Executing parsed code:` you will see something like:

```text
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Calling tool: 'web_search' with arguments: {'query': "best music recommendations for a party at Wayne's │
│ mansion"} │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
```

The agent generates a structured tool call that the system processes to produce the output, rather than directly executing code as a `CodeAgent` does.

Now that we understand both agent types, we can choose the right one for our needs. Let's continue exploring `smolagents` to make Alfred's party a success! 🎉

## Resources

- [ToolCallingAgent documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/agents#smolagents.ToolCallingAgent) - Official documentation for ToolCallingAgent
270
units/zh-CN/unit2/smolagents/tools.mdx
Normal file
@@ -0,0 +1,270 @@
<CourseFloatingBanner chapter={2}
  classNames="absolute z-10 right-0 top-0"
  notebooks={[
    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/tools.ipynb"},
]} />

# Tools

As we explored in [unit 1](https://huggingface.co/learn/agents-course/unit1/tools), agents use tools to perform various actions. In `smolagents`, tools are treated as **functions that an LLM can call within an agent system**.

To call a tool, the LLM needs an **interface description** with these key components:

- **Name**: What the tool is called
- **Tool description**: What the tool does
- **Input types and descriptions**: What arguments the tool accepts
- **Output type**: What the tool returns

For instance, while preparing for a party at Wayne Manor, Alfred needs various tools to gather information, from searching for catering services to finding party theme ideas. Here is how the interface of a simple search tool could look:

- **Name:** `web_search`
- **Tool description:** Searches the web for specific queries
- **Input:** `query` (string) - The search term to look up
- **Output:** String containing the search results

By using these tools, Alfred can make informed decisions and gather all the information needed for planning the party.

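One way to picture such an interface description is as a small data structure that gets rendered into text for the model. The sketch below is a hypothetical illustration: `ToolSpec` and `to_prompt` are invented names, and the exact wording smolagents places in its system prompt may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical container for the four parts of a tool's interface."""
    name: str
    description: str
    inputs: dict[str, dict[str, str]] = field(default_factory=dict)
    output_type: str = "string"

    def to_prompt(self) -> str:
        """Render the interface as text that could be placed in a system prompt."""
        args = ", ".join(
            f"{arg} ({spec['type']}): {spec['description']}"
            for arg, spec in self.inputs.items()
        )
        return (
            f"- {self.name}: {self.description}\n"
            f"  Takes inputs: {args}\n"
            f"  Returns an output of type: {self.output_type}"
        )


web_search_spec = ToolSpec(
    name="web_search",
    description="Searches the web for specific queries.",
    inputs={"query": {"type": "string", "description": "The search term to look up."}},
)
prompt_text = web_search_spec.to_prompt()
```

The model never sees the tool's implementation, only this description, which is why clear names and argument descriptions matter.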
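The animation below illustrates how a tool call is managed:

![Agentic pipeline from https://huggingface.co/docs/smolagents/conceptual_guides/react](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif)

## Tool Creation Methods

In `smolagents`, tools can be defined in two ways:
1. **Using the `@tool` decorator** for simple function-based tools
2. **Creating a subclass of `Tool`** for more complex functionality

### The `@tool` Decorator

The `@tool` decorator is the **recommended way to define simple tools**. Under the hood, smolagents parses basic information about the function from Python. A clear function name and a well-written docstring therefore make it easier for the LLM to understand what the tool is for.

Using this approach, we define a function with:

- **A clear and descriptive function name** that helps the LLM understand its purpose
- **Type hints for both inputs and outputs** to ensure proper usage
- **A detailed description**, including an `Args:` section where each argument is explicitly described. These descriptions give the LLM essential context
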
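To get a feel for what "parsing basic information from the function" can mean, here is a minimal sketch built on the standard library. It is not the smolagents implementation: `describe_function` is a hypothetical helper, and it only handles simple type hints such as `str` or `int`.

```python
import inspect
from typing import get_type_hints


def describe_function(func) -> dict:
    """Collect the name, description, and type hints a tool wrapper could use."""
    hints = get_type_hints(func)
    return_type = hints.pop("return", None)
    doc = inspect.getdoc(func) or ""
    # Treat everything before the "Args:" section as the tool description.
    description = doc.split("Args:")[0].strip()
    return {
        "name": func.__name__,
        "description": description,
        "inputs": {arg: getattr(hint, "__name__", str(hint)) for arg, hint in hints.items()},
        "output_type": getattr(return_type, "__name__", None),
    }


def catering_service_tool(query: str) -> str:
    """
    Returns the highest-rated catering service in Gotham City.

    Args:
        query: A search term for finding catering services.
    """
    return "Gotham Catering Co."


info = describe_function(catering_service_tool)
```

Without the docstring or the type hints, the resulting description would be empty or incomplete, which is why both are required for a usable tool.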
#### Creating a tool that retrieves the highest-rated catering

<img src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/alfred-catering.jpg" alt="Alfred Catering"/>

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/tools.ipynb" target="_blank">this notebook</a>, which you can run using Google Colab.
</Tip>

Suppose Alfred has already decided on the menu for the party, but now needs help preparing food for such a large number of guests. To do so, he wants to hire a catering service and needs to find the highest-rated options available locally.

Below is an example of how to implement this with the `@tool` decorator:

```python
from smolagents import CodeAgent, HfApiModel, tool

# Let's pretend we have a function that fetches the highest-rated catering services.
@tool
def catering_service_tool(query: str) -> str:
    """
    This tool returns the highest-rated catering service in Gotham City.

    Args:
        query: A search term for finding catering services.
    """
    # Example list of catering services and their ratings
    services = {
        "Gotham Catering Co.": 4.9,
        "Wayne Manor Catering": 4.8,
        "Gotham City Events": 4.7,
    }

    # Find the highest-rated catering service (simulating search query filtering)
    best_service = max(services, key=services.get)

    return best_service


agent = CodeAgent(tools=[catering_service_tool], model=HfApiModel())

# Run the agent to find the best catering service
result = agent.run(
    "Can you give me the name of the highest-rated catering service in Gotham City?"
)

print(result)   # Output: Gotham Catering Co.
```

### Defining a Tool as a Python Class

This approach involves creating a subclass of [`Tool`](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/tools#smolagents.Tool). For complex tools, we can use a class that wraps the function together with metadata that helps the LLM understand how to use it. In this class, we define:

- `name`: The tool's name
- `description`: A description used to populate the agent's system prompt
- `inputs`: A dictionary with keys `type` and `description`, which helps the Python interpreter process inputs
- `output_type`: Specifies the expected output type
- `forward`: The method containing the logic to execute

Below is an example of a tool built with the `Tool` class and integrated into a `CodeAgent`:

#### Creating a tool that generates superhero-themed party ideas

Alfred is planning a **superhero-themed party** at the mansion, but he needs some creative ideas to make the event stand out. As the perfect butler, he wants to surprise the guests with a unique theme.

To do this, we can create a tool that generates party ideas based on a category, helping Alfred find the theme that will impress his guests the most:

```python
from smolagents import Tool, CodeAgent, HfApiModel

class SuperheroPartyThemeTool(Tool):
    name = "superhero_party_theme_generator"
    description = """
    This tool suggests creative superhero-themed party ideas based on a category.
    It returns a unique party theme idea."""

    inputs = {
        "category": {
            "type": "string",
            "description": "The type of superhero party (e.g., 'classic heroes', 'villain masquerade', 'futuristic Gotham').",
        }
    }

    output_type = "string"

    def forward(self, category: str):
        themes = {
            "classic heroes": "Justice League Gala: Guests come dressed as their favorite DC heroes with themed cocktails like 'The Kryptonite Punch'.",
            "villain masquerade": "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains.",
            "futuristic Gotham": "Neo-Gotham Night: A cyberpunk-style party inspired by Batman Beyond, with neon decorations and futuristic gadgets."
        }

        return themes.get(category.lower(), "Themed party idea not found. Try 'classic heroes', 'villain masquerade', or 'futuristic Gotham'.")

# Instantiate the tool
party_theme_tool = SuperheroPartyThemeTool()
agent = CodeAgent(tools=[party_theme_tool], model=HfApiModel())

# Run the agent to generate a party theme idea
result = agent.run(
    "What would be a good superhero party idea for a 'villain masquerade' theme?"
)

print(result)  # Output: "Gotham Rogues' Ball: A mysterious masquerade where guests dress as classic Batman villains."
```

With this tool, Alfred will be the ultimate super butler, giving his guests a superhero-themed party they won't forget! 🦸‍♂️🦸‍♀️

## Default Toolbox

`smolagents` comes with a set of pre-built tools that can be injected directly into your agent. The [default toolbox](https://huggingface.co/docs/smolagents/guided_tour?build-a-tool=Decorate+a+function+with+%40tool#default-toolbox) includes:

- **PythonInterpreterTool**
- **FinalAnswerTool**
- **UserInputTool**
- **DuckDuckGoSearchTool**
- **GoogleSearchTool**
- **VisitWebpageTool**

Alfred could use several of these tools to ensure a flawless party at Wayne Manor:

- First, he could use the `DuckDuckGoSearchTool` to find creative superhero-themed party ideas.
- For catering, he would rely on the `GoogleSearchTool` to find the highest-rated services in Gotham.
- To manage seating arrangements, Alfred could run calculations with the `PythonInterpreterTool`.
- Once everything is gathered, he would compile the plan using the `FinalAnswerTool`.

With these tools, Alfred makes sure the party is both exceptional and seamless. 🦇💡

## Sharing and Importing Tools

One of the most powerful features of **smolagents** is the ability to share custom tools on the Hub and to integrate tools created by the community. This includes connecting with **HF Spaces** and **LangChain tools**, which significantly enhances Alfred's ability to plan an unforgettable party at Wayne Manor. 🎭

With these integrations, Alfred can tap into advanced event-planning tools, whether that means adjusting the lighting for the perfect ambiance, curating the ideal playlist for the party, or coordinating with Gotham's finest caterers.

Here are examples showing how these capabilities can elevate the party experience:

### Sharing a Tool to the Hub

Sharing your custom tool with the community is easy! Simply upload it to your Hugging Face account using the `push_to_hub()` method.

For instance, Alfred can share his `party_theme_tool` to help others come up with superhero party themes. Here is how to do it:

```python
party_theme_tool.push_to_hub("{your_username}/party_theme_tool", token="<YOUR_HUGGINGFACEHUB_API_TOKEN>")
```

### Importing a Tool from the Hub

You can easily import tools created by other users with the `load_tool()` function. For example, Alfred might want to generate a promotional image for the party using AI. Instead of building a tool from scratch, he can use a predefined one from the community:

```python
from smolagents import load_tool, CodeAgent, HfApiModel

image_generation_tool = load_tool(
    "m-ric/text-to-image",
    trust_remote_code=True
)

agent = CodeAgent(
    tools=[image_generation_tool],
    model=HfApiModel()
)

agent.run("Generate an image of a luxurious superhero-themed party at Wayne Manor with made-up superheroes.")
```

### Importing a Hugging Face Space as a Tool

You can also import an HF Space as a tool using `Tool.from_space()`. This opens up the possibility of integrating with thousands of community Spaces, for tasks ranging from image generation to data analysis.

The tool connects to the Space's backend through `gradio_client`, so make sure to install it via `pip` if you don't have it already.

For the party, Alfred can use an existing HF Space to generate the AI image for the announcement (instead of the pre-built tool mentioned earlier). Let's build it:

```python
from smolagents import CodeAgent, HfApiModel, Tool

image_generation_tool = Tool.from_space(
    "black-forest-labs/FLUX.1-schnell",
    name="image_generator",
    description="Generate an image from a prompt"
)

model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")

agent = CodeAgent(tools=[image_generation_tool], model=model)

agent.run(
    "Improve this prompt, then generate an image of it.",
    additional_args={'user_prompt': 'A grand superhero-themed party at Wayne Manor, with Alfred overseeing a luxurious gala'}
)
```

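### Importing a LangChain Tool

We will discuss the `LangChain` framework in upcoming sections. For now, just note that you can reuse LangChain tools in your smolagents workflow!

You can easily load LangChain tools using the `Tool.from_langchain()` method.

Alfred, ever the perfectionist, is preparing a spectacular superhero night at Wayne Manor while the Waynes are away. To make sure every detail exceeds expectations, he uses LangChain tools to find top-tier entertainment ideas.

Here is how to do it:

```python
from langchain.agents import load_tools
from smolagents import CodeAgent, HfApiModel, Tool

# The "serpapi" tool needs a SerpAPI key, typically read from the SERPAPI_API_KEY environment variable
search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])

# Create the model here so the snippet does not depend on a variable from an earlier example
agent = CodeAgent(tools=[search_tool], model=HfApiModel())

agent.run("Search for luxury entertainment ideas for a superhero-themed event, such as live performances and interactive experiences.")
```

With this setup, Alfred can quickly discover high-end entertainment options, ensuring Gotham's elite guests have an unforgettable experience. This tool helps him plan the perfect superhero-themed event at Wayne Manor! 🎉

## Resources

- [Tools Tutorial](https://huggingface.co/docs/smolagents/tutorials/tools) - Explore this tutorial to learn how to work with tools effectively.
- [Tools Documentation](https://huggingface.co/docs/smolagents/v1.8.1/en/reference/tools) - Comprehensive reference documentation on tools.
- [Tools Guided Tour](https://huggingface.co/docs/smolagents/v1.8.1/en/guided_tour#tools) - A step-by-step guide to building and using tools.
- [Building Effective Agents](https://huggingface.co/docs/smolagents/tutorials/building_good_agents) - A best-practices guide for developing reliable, high-performance custom function agents.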
222
units/zh-CN/unit2/smolagents/vision_agents.mdx
Normal file
@@ -0,0 +1,222 @@
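<CourseFloatingBanner chapter={2}
  classNames="absolute z-10 right-0 top-0"
  notebooks={[
    {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/agents-course/blob/main/notebooks/unit2/smolagents/vision_agents.ipynb"},
]} />

# Vision Agents with smolagents

<Tip warning={true}>
The examples in this section require access to a powerful vision-language model (VLM). We tested them using the GPT-4o API.
For alternative solutions supported by smolagents and Hugging Face, see the <a href="./why_use_smolagents">Why use smolagents</a> section.
</Tip>

Giving agents visual capabilities is crucial for tasks that go beyond text processing. Real-world scenarios such as web browsing or document understanding require analyzing rich visual content. smolagents provides built-in support for vision-language models (VLMs), enabling agents to process image information effectively.

Suppose Alfred, the butler at Wayne Manor, has to verify the identities of the guests attending the party. Since he may not recognize everyone who arrives, we can build a VLM-based agent that retrieves visual information to support his decision. Here is how:

## Providing Images at the Start of the Agent's Execution

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/vision_agents.ipynb" target="_blank">this notebook</a>, which you can run using Google Colab.
</Tip>

In this approach, images are passed to the agent at the start through the `images` argument of `agent.run()` and stored as `task_images` alongside the task prompt. The agent then processes these images throughout its execution.

Suppose Alfred wants to verify the identities of the superheroes attending the party, and he already has a database of guest images from previous parties.

When a new visitor arrives, the agent can compare their image against this database and decide whether to let them in.

In this scenario, Alfred suspects that the visitor might be The Joker impersonating Wonder Woman. We need to build an identity verification system:
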
```python
from PIL import Image
import requests
from io import BytesIO

image_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/e/e8/The_Joker_at_Wax_Museum_Plus.jpg", # Joker image
    "https://upload.wikimedia.org/wikipedia/en/9/98/Joker_%28DC_Comics_character%29.jpg" # Joker image
]

images = []
for url in image_urls:
    response = requests.get(url)
    image = Image.open(BytesIO(response.content)).convert("RGB")
    images.append(image)
```

Now that the images are loaded, the agent will tell us whether the guest is a superhero (Wonder Woman) or a villain (The Joker).

```python
from smolagents import CodeAgent, OpenAIServerModel

model = OpenAIServerModel(model_id="gpt-4o")

# Instantiate the agent
agent = CodeAgent(
    tools=[],
    model=model,
    max_steps=20,
    verbosity_level=2
)

response = agent.run(
    """
    Describe the costume and makeup that the comic character in these photos is wearing and return the description.
    Tell me if the guest is The Joker or Wonder Woman.
    """,
    images=images
)
```

In my run, the output was the following (as discussed earlier, yours may differ):

```python
    {
        'Costume and Makeup - First Image': (
            'Purple coat and a purple silk-like cravat or tie over a mustard-yellow shirt.',
            'White face paint with exaggerated features, dark eyebrows, blue eye makeup, red lips forming a wide smile.'
        ),
        'Costume and Makeup - Second Image': (
            'Dark suit with a flower on the lapel, holding a playing card.',
            'Pale skin, green hair, very red lips with an exaggerated grin.'
        ),
        'Character Identity': 'This character resembles known depictions of The Joker from comic book media.'
    }
```

In this case, the output reveals that the person is impersonating someone else, so we can prevent The Joker from entering the party!

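## Providing Images with Dynamic Retrieval

<Tip>
You can follow the code in <a href="https://huggingface.co/agents-course/notebooks/blob/main/unit2/smolagents/vision_web_browser.py" target="_blank">this Python file</a>.
</Tip>

The previous approach is valuable and has many potential use cases. However, when the guest is not in the database, we need other ways of identifying them. One possible solution is to retrieve images and information dynamically from external sources, for example by browsing the web for details.

In this approach, images are added to the agent's memory dynamically during execution. As we know, agents in `smolagents` are based on the `MultiStepAgent` class, which is an abstraction of the ReAct framework. This class operates in a structured cycle, logging various variables and knowledge at different stages:

1. **SystemPromptStep:** Stores the system prompt.
2. **TaskStep:** Logs the user query and any provided input.
3. **ActionStep:** Captures logs from the agent's actions and results.

This structured approach lets agents incorporate visual information dynamically and respond adaptively to evolving tasks. Below is the diagram we have already seen, illustrating the dynamic workflow and how the different steps fit into the agent lifecycle. When browsing, the agent can take screenshots and save them as `observations_images` in the `ActionStep`.
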

|
||||
|
||||
现在我们理解了需求,让我们构建完整的示例。在这种情况下,Alfred 希望完全控制访客验证过程,因此浏览详情成为可行的解决方案。为了完成这个示例,我们需要为智能体提供一组新的工具。此外,我们将使用 Selenium 和 Helium,这些是浏览器自动化工具。这将使我们能够构建一个探索网络、搜索潜在访客详情并检索验证信息的智能体。让我们安装所需的工具:
|
||||
|
||||
```bash
|
||||
pip install "smolagents[all]" helium selenium python-dotenv
|
||||
```
|
||||
|
||||
我们需要一组专为浏览设计的智能体工具,例如“search_item_ctrl_f”、“go_back”和“close_popups”。这些工具允许智能体像浏览网页的人一样行事。
|
||||
|
||||
```python
|
||||
@tool
|
||||
def search_item_ctrl_f(text: str, nth_result: int = 1) -> str:
|
||||
"""
|
||||
Searches for text on the current page via Ctrl + F and jumps to the nth occurrence.
|
||||
Args:
|
||||
text: The text to search for
|
||||
nth_result: Which occurrence to jump to (default: 1)
|
||||
"""
|
||||
elements = driver.find_elements(By.XPATH, f"//*[contains(text(), '{text}')]")
|
||||
if nth_result > len(elements):
|
||||
raise Exception(f"Match n°{nth_result} not found (only {len(elements)} matches found)")
|
||||
result = f"Found {len(elements)} matches for '{text}'."
|
||||
elem = elements[nth_result - 1]
|
||||
driver.execute_script("arguments[0].scrollIntoView(true);", elem)
|
||||
result += f"Focused on element {nth_result} of {len(elements)}"
|
||||
return result
|
||||
|
||||
|
||||
@tool
|
||||
def go_back() -> None:
|
||||
"""Goes back to previous page."""
|
||||
driver.back()
|
||||
|
||||
|
||||
@tool
|
||||
def close_popups() -> str:
|
||||
"""
|
||||
Closes any visible modal or pop-up on the page. Use this to dismiss pop-up windows! This does not work on cookie consent banners.
|
||||
"""
|
||||
webdriver.ActionChains(driver).send_keys(Keys.ESCAPE).perform()
|
||||
```
|
||||
|
||||
We also need a way to save screenshots, since they are an essential part of what our VLM agent uses to complete the task. The function below captures a screenshot and stores it with `step_log.observations_images = [image.copy()]`, allowing the agent to store and process images dynamically as it navigates.

```python
from io import BytesIO
from time import sleep

import helium
from PIL import Image
from smolagents import CodeAgent
from smolagents.agents import ActionStep


def save_screenshot(step_log: ActionStep, agent: CodeAgent) -> None:
    sleep(1.0)  # Let JavaScript animations finish before taking the screenshot
    driver = helium.get_driver()
    current_step = step_log.step_number
    if driver is not None:
        # Remove older screenshots from the logs to keep processing lean
        # (newer smolagents releases expose these steps as agent.memory.steps)
        for step_logs in agent.logs:
            if isinstance(step_logs, ActionStep) and step_logs.step_number <= current_step - 2:
                step_logs.observations_images = None
        png_bytes = driver.get_screenshot_as_png()
        image = Image.open(BytesIO(png_bytes))
        print(f"Captured a browser screenshot: {image.size} pixels")
        step_log.observations_images = [image.copy()]  # Create a copy to ensure it persists, important!

        # Update observations with the current URL
        url_info = f"Current url: {driver.current_url}"
        step_log.observations = url_info if step_log.observations is None else step_log.observations + "\n" + url_info
    return
```

This function is passed to the agent as a `step_callback`, because it is triggered at the end of every step of the agent's execution. This lets the agent capture and store screenshots dynamically throughout the process.

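The callback mechanism itself can be sketched without a browser. The example below is a simplified model, not smolagents code: `Step`, `MiniAgent`, and `keep_recent_screenshots` are hypothetical names, and file names stand in for real images. It shows how a per-step callback can attach a new screenshot while discarding those older than two steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """Hypothetical, simplified stand-in for an agent memory step."""
    step_number: int
    observations_images: Optional[list[str]] = None


@dataclass
class MiniAgent:
    """Toy agent that runs callbacks after every step, mimicking step callbacks."""
    step_callbacks: list[Callable[[Step, "MiniAgent"], None]] = field(default_factory=list)
    steps: list[Step] = field(default_factory=list)

    def run(self, n_steps: int) -> None:
        for number in range(1, n_steps + 1):
            step = Step(step_number=number)
            self.steps.append(step)
            for callback in self.step_callbacks:
                callback(step, self)


def keep_recent_screenshots(step: Step, agent: MiniAgent) -> None:
    """Attach a fake screenshot to the current step and drop ones older than two steps."""
    for previous in agent.steps:
        if previous.step_number <= step.step_number - 2:
            previous.observations_images = None
    step.observations_images = [f"screenshot-{step.step_number}.png"]


agent = MiniAgent(step_callbacks=[keep_recent_screenshots])
agent.run(n_steps=4)
images_per_step = [s.observations_images for s in agent.steps]
```

After four steps only the two most recent steps still hold an image, which keeps the amount of visual context sent to the model bounded.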
Now we can create our vision agent for browsing the web, giving it the tools we created along with the `DuckDuckGoSearchTool` to explore the web. This tool helps the agent retrieve the information needed to verify guests' identities based on visual cues.

```python
from smolagents import CodeAgent, OpenAIServerModel, DuckDuckGoSearchTool
model = OpenAIServerModel(model_id="gpt-4o")

agent = CodeAgent(
    tools=[DuckDuckGoSearchTool(), go_back, close_popups, search_item_ctrl_f],
    model=model,
    additional_authorized_imports=["helium"],
    step_callbacks=[save_screenshot],
    max_steps=20,
    verbosity_level=2,
)
```

With that, Alfred is ready to check the guests' identities and make informed decisions about whether to let them into the party:

```python
agent.run("""
I am Alfred, the butler of Wayne Manor, responsible for verifying the identity of guests at party. A superhero has arrived at the entrance claiming to be Wonder Woman, but I need to confirm if she is who she says she is.

Please search for images of Wonder Woman and generate a detailed visual description based on those images. Additionally, navigate to Wikipedia to gather key details about her appearance. With this information, I can determine whether to grant her access to the event.
""" + helium_instructions)
```

As you can see, we include `helium_instructions` as part of the task. This special prompt is meant to control the agent's navigation, ensuring that it follows the correct steps while browsing the web.

Let's see how this works in the video below:

<video controls>
  <source src="https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/VisionBrowserAgent.mp4" type="video/mp4">
</video>

This is the final output:

```text
Final answer: Wonder Woman is typically depicted wearing a red and gold bustier, blue shorts or skirt with white stars, a golden tiara, silver bracelets, and a golden Lasso of Truth. She is Princess Diana of Themyscira, known as Diana Prince in the world of men.
```

With these steps, we have successfully created an identity verification system for the party! Alfred now has the tools he needs to make sure only the right guests get into the manor. Everything is set for a good time at Wayne Manor!

## Further Reading

- [We just gave sight to smolagents](https://huggingface.co/blog/smolagents-can-see) - Blog post describing the vision agent functionality.
- [Web Browser Automation with Agents 🤖🌐](https://huggingface.co/docs/smolagents/examples/web_browser) - Example of web browsing with a vision agent.
- [Web Browser Vision Agent Example](https://github.com/huggingface/smolagents/blob/main/src/smolagents/vision_web_browser.py) - Source code of the web browsing vision agent example.
67
units/zh-CN/unit2/smolagents/why_use_smolagents.mdx
Normal file
@@ -0,0 +1,67 @@
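![smolagents banner](https://huggingface.co/datasets/agents-course/course-images/resolve/main/en/unit2/smolagents/license_to_call.png)
# Why use smolagents

In this module, we explore the pros and cons of using [smolagents](https://huggingface.co/docs/smolagents/en/index), helping you make an informed decision about whether it is the right framework for your needs.

## What is `smolagents`?

`smolagents` is a simple yet powerful framework for building AI agents. It gives LLMs the ability to interact with the real world, for example by searching the web or generating images.

As we learned in unit 1, AI agents are programs that use LLMs to generate **'thoughts'** based on **'observations'** and to perform **'actions'**. Let's explore how this is implemented in smolagents.

### Key Advantages of `smolagents`
- **Simplicity:** Minimal code complexity and few abstraction layers, which makes the framework easy to understand, adopt, and extend.
- **Flexible LLM Support:** Works with any LLM through integration with Hugging Face tools and external APIs.
- **Code-First Approach:** First-class support for Code Agents that write their actions directly in code, which removes the need for parsing and simplifies tool calling.
- **HF Hub Integration:** Seamless integration with the Hugging Face Hub, allowing the use of Gradio Spaces as tools.

### When to use smolagents?

With these advantages in mind, when should we choose smolagents over other frameworks?

smolagents is ideal when:
- You need a **lightweight and minimal solution**.
- You want to **experiment quickly** without complex configuration.
- Your **application logic is relatively simple**.

### Code vs. JSON Actions
Unlike other frameworks where agents write actions in JSON, `smolagents` **focuses on tool calls in code**, which simplifies execution. There is no need to parse JSON in order to build the code that calls the tools: the output can be executed directly.
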
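A small sketch can show what "executed directly" means. The snippet below is an illustration under stated assumptions: `web_search` is a hypothetical stand-in tool, and plain `exec` is used only to keep the example short. It is not how smolagents runs generated code, and it is not safe for untrusted input.

```python
def web_search(query: str) -> str:
    """Hypothetical stand-in for a real search tool."""
    return f"Results for: {query}"


# A code action as an LLM might write it: a loop plus a variable,
# expressed in a single step.
code_action = '''
results = []
for query in ["Best catering services in Gotham City", "Party theme ideas for superheroes"]:
    results.append(web_search(query))
final = " | ".join(results)
'''

# Only the tools are exposed to the generated code.
namespace = {"web_search": web_search}
exec(code_action, namespace)  # illustration only: not safe for untrusted code
final = namespace["final"]
```

The loop and the intermediate `results` variable live inside one action. With JSON actions, each call would be a separate structured instruction and combining the outputs would take additional steps.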
The following diagram illustrates this difference:

![Code vs. JSON actions](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png)

To review the difference between code and JSON actions, you can revisit [the Actions section in Unit 1](https://huggingface.co/learn/agents-course/unit1/actions#actions-enabling-the-agent-to-engage-with-its-environment).

### Agent Types in `smolagents`

Agents in `smolagents` operate as **multi-step agents**.

At each step, a [`MultiStepAgent`](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.MultiStepAgent) performs:
- One thought
- One tool call and execution

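The thought and tool-call cycle can be sketched as a simple loop. This is a minimal illustration, not the `MultiStepAgent` implementation: `run_multi_step` and `scripted_policy` are hypothetical names, and the scripted policy stands in for the LLM.

```python
from typing import Callable


def run_multi_step(
    task: str,
    think: Callable[[str, list[str]], tuple[str, str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Run a thought -> tool call -> observation loop until `final_answer` is called."""
    observations: list[str] = []
    for _ in range(max_steps):
        # One thought: the policy decides which tool to call and with what argument.
        thought, tool_name, argument = think(task, observations)
        if tool_name == "final_answer":
            return argument
        # One tool call and execution: the result becomes the next observation.
        observations.append(tools[tool_name](argument))
    return "Reached max steps without a final answer."


def scripted_policy(task: str, observations: list[str]) -> tuple[str, str, str]:
    """Hypothetical stand-in for the LLM: search once, then answer."""
    if not observations:
        return ("I should search first.", "web_search", task)
    return ("I have what I need.", "final_answer", observations[-1])


answer = run_multi_step(
    "party themes",
    think=scripted_policy,
    tools={"web_search": lambda q: f"Results for: {q}"},
)
```

The `max_steps` bound plays the same role as the `max_steps` argument seen in the agent examples: it stops the loop if the model never produces a final answer.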
In addition to **[CodeAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.CodeAgent)** as the primary agent type, smolagents also supports **[ToolCallingAgent](https://huggingface.co/docs/smolagents/main/en/reference/agents#smolagents.ToolCallingAgent)**, which writes tool calls in JSON.

We will explore each agent type in more detail in the following sections.

<Tip>
In smolagents, tools are defined using the <code>@tool</code> decorator wrapping a Python function, or the <code>Tool</code> class.
</Tip>

### Model Integration in `smolagents`
`smolagents` supports flexible LLM integration, allowing you to use any callable model that meets [certain criteria](https://huggingface.co/docs/smolagents/main/en/reference/models). The framework provides several predefined classes to simplify model connections:

- **[TransformersModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.TransformersModel):** Implements a local `transformers` pipeline for seamless integration.
- **[HfApiModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.HfApiModel):** Supports [serverless inference](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) calls through [Hugging Face's infrastructure](https://huggingface.co/docs/api-inference/index) or through a growing number of [third-party inference providers](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#supported-providers-and-tasks).
- **[LiteLLMModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.LiteLLMModel):** Uses [LiteLLM](https://www.litellm.ai/) for lightweight model interactions.
- **[OpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.OpenAIServerModel):** Connects to any service that offers an OpenAI API interface.
- **[AzureOpenAIServerModel](https://huggingface.co/docs/smolagents/main/en/reference/models#smolagents.AzureOpenAIServerModel):** Supports integration with any Azure OpenAI deployment.

This flexibility lets developers choose the model and service best suited to their specific use case, and makes experimentation easy.

Now that we understand when and why to use smolagents, let's dive deeper into this powerful library!

## Resources

- [smolagents Blog](https://huggingface.co/blog/smolagents) - An introduction to smolagents and code interactions