Mirror of https://github.com/Jonathan-Adly/AgentRun.git (synced 2024-06-02 16:24:31 +03:00)
complete
84
README.md
@@ -1,4 +1,4 @@
-# Agentrun: Run AI Generated Code Safely
+# AgentRun: Run AI Generated Code Safely

 [](https://pypi.org/project/agentrun/)
 [](https://github.com/jonathan-adly/agentrun/actions/workflows/test.yml)
@@ -6,9 +6,9 @@
 [](https://github.com/jonathan-adly/agentrun/blob/main/LICENSE)
 [](https://twitter.com/Jonathan_Adly_)

-Agentrun is a Python library that makes it easy to run Python code safely from large language models (LLMs) with a single line of code. Built on top of the Docker Python SDK and RestrictedPython, it provides a simple, transparent, and user-friendly API to manage isolated code execution.
+AgentRun is a Python library that makes it easy to run Python code safely from large language models (LLMs) with a single line of code. Built on top of the Docker Python SDK and RestrictedPython, it provides a simple, transparent, and user-friendly API to manage isolated code execution.

-Agentrun automatically installs and uninstalls dependencies, limits resource consumption, checks code safety, and sets execution timeouts. It has 97% test coverage with full static typing and only two dependencies.
+AgentRun automatically installs and uninstalls dependencies with optional caching, limits resource consumption, checks code safety, and sets execution timeouts. It has 97% test coverage with full static typing and only two dependencies.

 ## Why?

@@ -27,46 +27,27 @@ This package gives code execution ability to **any LLM** in a single line of cod

 ## Key Features

-- **Safe code execution**: Agentrun checks the generated code for dangerous elements before execution
+- **Safe code execution**: AgentRun checks the generated code for dangerous elements before execution
 - **Isolated Environment**: Code is executed in a fully isolated docker container
 - **Configurable Resource Management**: You can set how much compute resources the code can consume, with sane defaults
 - **Timeouts**: Set time limits on how long a script can take to run
 - **Dependency Management**: Complete control on what dependencies are allowed to install
-- **Automatic Cleanups**: Agentrun cleans any artifacts created by the generated code.
-- **Comes with a REST API**: Hate setting up docker? Agentrun comes with already configured docker setup for self-hosting.
-- **Transparent Exception Handling**: Agentrun returns the same exact output as running Python in your system - exceptions and tracebacks included. No cryptic docker messages.
+- **Dependency Caching**: AgentRun gives you the ability to cache any dependency in advance in the docker container to optimize performance.
+- **Automatic Cleanups**: AgentRun cleans any artifacts created by the generated code.
+- **Comes with a REST API**: Hate setting up docker? AgentRun comes with already configured docker setup for self-hosting.
+- **Transparent Exception Handling**: AgentRun returns the same exact output as running Python in your system - exceptions and tracebacks included. No cryptic docker messages.

-If you want to use your own Docker configuration, install this package with pip and simply initialize Agentrun with a running Docker container. Additionally, you can use an already configured Docker Compose setup and API that is ready for self-hosting by cloning this repo.
+If you want to use your own Docker configuration, install this package with pip and simply initialize AgentRun with a running Docker container. Additionally, you can use an already configured Docker Compose setup and API that is ready for self-hosting by cloning this repo.

 Unless you are comfortable with Docker, **we highly recommend using the REST API with the already configured Docker as a standalone service.**


 ## Get Started in Minutes

-There are two ways to use Agentrun, depending on your needs: with pip for your own Docker setup, or directly as a REST API as a standalone service (recommended).
+There are two ways to use AgentRun, depending on your needs: with pip for your own Docker setup, or directly as a REST API as a standalone service (recommended).

-1. Install Agentrun with a single command via pip (you will need to configure your own Docker setup):
+1. **REST API**: Clone this repository and start immediately with a standalone REST API.

-```bash
-pip install agentrun
-```
-
-Now, let's see AgentRun in action with a simple example:
-
-```Python
-from agentrun import AgentRun
-
-runner = AgentRun(container_name="my_container") # container should be running
-code_from_llm = get_code_from_llm(prompt) # "print('hello, world!')"
-
-result = runner.execute_code_in_container(code_from_llm)
-print(result)
-#> "Hello, world!"
-```
-
-Worried about spinning up Docker containers? No problem.
-
-2. Clone this repository and start immediately with a standalone REST API:
 ```bash
 git clone https://github.com/Jonathan-Adly/agentrun
 cd agentrun/agentrun-api
@@ -95,6 +76,26 @@ Or if you prefer the terminal.

 `curl -X POST http://localhost:8000/v1/run/ -H "Content-Type: application/json" -d '{"code": "print(\"hello, world!\")"}'`

+2. Install AgentRun with a single command via pip (you will need to configure your own Docker setup):
+
+```bash
+pip install agentrun
+```
+
+Now, let's see AgentRun in action with a simple example:
+
+```Python
+from agentrun import AgentRun
+
+runner = AgentRun(container_name="my_container") # container should be running
+code_from_llm = get_code_from_llm(prompt) # "print('hello, world!')"
+
+result = runner.execute_code_in_container(code_from_llm)
+print(result)
+#> "Hello, world!"
+```
+
+

 Difference | Python Package | REST API |
 --------- | -------------- | ----------- |
@@ -108,11 +109,11 @@ Customize | Fully | Partially |

 ## Usage

-Now, let's see AgentRun in action with something more complicated. We will take advantage of function calling and agentrun to have LLMs write and execute code on the fly to solve arbitrary tasks. You can find the full code under `examples/`.
+Now, let's see AgentRun in action with something more complicated. We will take advantage of function calling and AgentRun to have LLMs write and execute code on the fly to solve arbitrary tasks. You can find the full code under `examples/`.

-First, we will install the needed packages. We are using Mixtral via Groq here to keep things fast and with minimal dependencies, but agentrun works with any LLM out of the box. All that's required is for the LLM to return a code snippet.
+First, we will install the needed packages. We are using Mixtral via Groq here to keep things fast and with minimal dependencies, but AgentRun works with any LLM out of the box. All that's required is for the LLM to return a code snippet.

-> FYI: OpenAI assistant tool `code_interpreter` can execute code. Agentrun is a transparent, open-source version that can work with any LLM.
+> FYI: OpenAI assistant tool `code_interpreter` can execute code. AgentRun is a transparent, open-source version that can work with any LLM.

 ```bash
 !pip install groq
@@ -138,7 +139,7 @@ def execute_python_code(code: str) -> str:

 Next, we will set up our LLM function-calling skeleton code. We need:

-1. An LLM client such as Groq, OpenAI, or Anthropic (alternatively, you can use liteLLm as a wrapper)
+1. An LLM client such as Groq, OpenAI, or Anthropic (alternatively, you can use litellm as a wrapper)
 2. The model you will use
 3. Our code execution tool, which encourages the LLM to send us Python code to execute reliably

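As an illustration of item 3, a code execution tool is typically declared as a JSON-schema function definition. The sketch below assumes the OpenAI-style `tools` format; only the function name `execute_python_code(code: str)` comes from the README, while the description strings and the `parse_tool_arguments` helper are hypothetical.

```python
import json

# Hypothetical tool declaration in the OpenAI-style function-calling format.
# Only the name `execute_python_code` and its `code` argument come from the README.
EXECUTE_PYTHON_TOOL = {
    "type": "function",
    "function": {
        "name": "execute_python_code",
        "description": "Run a Python snippet in an isolated container and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "The Python code to execute.",
                }
            },
            "required": ["code"],
        },
    },
}


def parse_tool_arguments(raw_arguments: str) -> str:
    """Extract the `code` argument from the JSON string carried by a tool call."""
    return json.loads(raw_arguments)["code"]


tools = [EXECUTE_PYTHON_TOOL]
print(parse_tool_arguments('{"code": "print(1 + 1)"}'))
```

The extracted string is what would be handed to `execute_python_code`, which in turn would pass it to AgentRun.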
@@ -188,7 +189,7 @@ def chat_completion_request(messages, tools=None, tool_choice=None, model=GPT_MO
         return e
 ```

-Finally, we will set up a function that takes the user query and returns an answer, using Agentrun to execute code when the LLM determines that code execution is necessary to answer the question.
+Finally, we will set up a function that takes the user query and returns an answer, using AgentRun to execute code when the LLM determines that code execution is necessary to answer the question.

 ```python
 def get_answer(query):
@@ -245,7 +246,7 @@ average_move = moves.mean()
 print(f'{average_move:.2f}')
 ```

-That code was sent to agentrun, which outputted:
+That code was sent to AgentRun, which outputted:
 `'\r[*********************100%%**********************] 1 of 1 completed\n2.39'`

 Lastly, the output was sent to the LLM again to make it human-friendly, giving us the final answer: $2.39
@@ -254,10 +255,14 @@ Lastly, the output was sent to the LLM again to make human friendly. Giving us t

 ## Customize

-Agentrun has sane defaults, but is fully customizable. You can change:
+AgentRun has sane defaults, but is fully customizable. You can change:

 1. dependencies_whitelist - by default, anything that can be pip installed is allowed.
-2. cpu_quota - the default is 50000. Here is GPT-4 explaining what that means.
+2. cached_dependencies - these are dependencies that are installed in the container on initialization, and stay there until the container is brought down. `[]` by default.
+
+> It will take longer to initialize the container with cached_dependencies; however, subsequent runs using those dependencies will be much faster.
+
+3. cpu_quota - the default is 50000. Here is GPT-4 explaining what that means.

 > In Docker SDK, the cpu_quota parameter is used to limit CPU usage for a container.
 > The value of cpu_quota specifies the amount of CPU time that the container is allowed to use in microseconds per scheduling period.
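To make the quoted explanation concrete, the share of one CPU a container may use is `cpu_quota / cpu_period`. A minimal sketch, assuming Docker's default scheduling period of 100000 microseconds (the visible README text does not state which period AgentRun uses, and `cpu_fraction` is a hypothetical helper):

```python
# Assumption: the container runs with Docker's default CFS period of 100000 microseconds.
DEFAULT_CPU_PERIOD_US = 100_000


def cpu_fraction(cpu_quota: int, cpu_period: int = DEFAULT_CPU_PERIOD_US) -> float:
    """Return the fraction of a single CPU that the quota allows per period."""
    return cpu_quota / cpu_period


# AgentRun's default quota of 50000 would correspond to half of one CPU.
print(cpu_fraction(50_000))
```

Under that assumption, doubling the quota to 100000 would allow one full CPU.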
@@ -290,6 +295,7 @@ runner = AgentRun(
     container_name="my_container",
     # only allowed to pip install requests
     dependencies_whitelist = ["requests"], # [] = no dependencies
+    cached_dependencies = ["requests"],
     # 3 minutes timeout
     default_timeout = 3 * 60,
     # how much RAM can the script use
@@ -307,7 +313,7 @@ print(result)

 ## Benchmarks

-Agentrun's median execution time is ~220ms without dependencies. Installing dependencies is usually the bottleneck; its cost depends on the size of the package and on how many dependencies the package has.
+AgentRun's median execution time is ~220ms without dependencies. Installing dependencies is usually the bottleneck; its cost depends on the size of the package and on how many dependencies the package has.

@@ -37,14 +37,18 @@ class AgentRun:
     def __init__(
         self,
         container_name,
-        dependencies_whitelist=["*"],
-        cached_dependencies=[],
+        dependencies_whitelist=None,
+        cached_dependencies=None,
         cpu_quota=50000,
         default_timeout=20,
         memory_limit="100m",
         memswap_limit="512m",
         client=None,
     ):
+        if dependencies_whitelist is None:
+            dependencies_whitelist = ["*"]
+        if cached_dependencies is None:
+            cached_dependencies = []

         self.cpu_quota = cpu_quota
         self.default_timeout = default_timeout
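The switch from list defaults to `None` in this hunk matters because Python evaluates default arguments once, at function definition time. The old constructor also appended cached dependencies to the whitelist, so a shared default list could leak entries between instances. A minimal sketch of the pitfall, using two hypothetical stand-in functions rather than the real `AgentRun.__init__`:

```python
def shared_default(cached, whitelist=["*"]):
    # The default list is created once and mutated on every call.
    whitelist.extend(cached)
    return whitelist


def fresh_default(cached, whitelist=None):
    # A new list is created on each call when no whitelist is passed.
    if whitelist is None:
        whitelist = ["*"]
    whitelist.extend(cached)
    return whitelist


first = shared_default(["numpy"])
second = shared_default(["requests"])
print(second)  # the second call still sees "numpy" from the first call
print(fresh_default(["requests"]))
```

With the `None` sentinel, each call starts from its own `["*"]` list.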
@@ -55,22 +59,69 @@ class AgentRun:
         # this is to allow a mock client to be passed in for testing if docker is not available (not implemented yet)
         self.client = client or docker.from_env()
         self.cached_dependencies = cached_dependencies
-        for dep in self.cached_dependencies:
-            self.dependencies_whitelist.append(dep)
-        self.dependencies_whitelist = list(set(self.dependencies_whitelist))
-        # install the cached dependencies in the container in a separate thread
-        thread = Thread(
-            target=self.install_dependencies,
-            args=(
-                self.client.containers.get(self.container_name),
-                self.cached_dependencies,
-            ),
+
+        try:
+            self.client = client or docker.from_env()
+            self.client.ping()
+        except docker.errors.DockerException as e:
+            raise RuntimeError(
+                f"Failed to connect to Docker daemon. Please make sure Docker is running. {e}"
         )
-        thread.start()

+        try:
+            container = self.client.containers.get(self.container_name)
+            if container.status != "running":
+                raise ValueError(f"Container {self.container_name} is not running.")
+        except docker.errors.NotFound:
+            raise ValueError(f"Container {self.container_name} not found.")
+
+        if (
+            not self.is_everything_whitelisted()
+            and not self.validate_cached_dependencies()
+        ):
+            raise ValueError("Some cached dependencies are not in the whitelist.")
+
+        if self.cached_dependencies:
+            self.install_cached_dependencies()
+
     class CommandTimeout(Exception):
         """Exception raised when a command execution times out."""
+
         pass
+
+    def is_everything_whitelisted(self) -> bool:
+        """
+        Check if everything is whitelisted.
+
+        Returns:
+            bool: True if everything is whitelisted, False otherwise.
+        """
+        return "*" in self.dependencies_whitelist
+
+    def validate_cached_dependencies(self) -> bool:
+        """
+        Validates the cached dependencies against the whitelist.
+
+        Returns:
+            bool: True if all cached dependencies are whitelisted, False otherwise.
+        """
+        if self.is_everything_whitelisted():
+            return True
+        return all(
+            dep in self.dependencies_whitelist for dep in self.cached_dependencies
+        )
+
+    def install_cached_dependencies(self) -> None:
+        """
+        Attempts to install cached dependencies into the specified Docker container.
+        Raises:
+            ValueError: If the dependencies could not be successfully installed.
+        """
+        container = self.client.containers.get(self.container_name)
+        output = self.install_dependencies(container, self.cached_dependencies)
+        if output != "Dependencies installed successfully.":
+            raise ValueError(output)
+
     def execute_command_in_container(
         self, container: Container, cmd: str, timeout: int
     ) -> tuple[Any | None, Any | str]:
@@ -237,7 +288,7 @@ class AgentRun:
             Success message or error message

         """
-        everything_whitelisted = "*" in self.dependencies_whitelist
+        everything_whitelisted = self.is_everything_whitelisted()

         # Perform a pre-check to ensure all dependencies are in the whitelist (or everything is whitelisted)
         if not everything_whitelisted:
@@ -245,9 +296,8 @@ class AgentRun:
                 if dep not in self.dependencies_whitelist:
                     return f"Dependency: {dep} is not in the whitelist."

-        exit_code, output = self.execute_command_in_container(
-            container, "pip list", timeout=3
-        )
+        exec_log = container.exec_run(cmd="pip list", workdir="/code")
+        exit_code, output = exec_log.exit_code, exec_log.output.decode("utf-8")
         installed_packages = output.splitlines()
         installed_packages = [
             line.split()[0].lower() for line in installed_packages if " " in line
@@ -354,10 +404,8 @@ class AgentRun:
         if not safe:
             return safety_message

-        try:
-            container = client.containers.get(self.container_name)
-        except docker.errors.NotFound:
-            return f"Container with name {self.container_name} not found."
+        container = client.containers.get(self.container_name)
+
         # update the container with the new limits
         container.update(
             cpu_quota=self.cpu_quota,
@@ -367,12 +415,11 @@ class AgentRun:
         # Copy the code to the container
         exec_result = self.copy_code_to_container(container, python_code)
         successful_copy = exec_result["success"]
-        copy_message = exec_result["message"]
+        message = exec_result["message"]
         if not successful_copy:
-            copy_message = exec_result["message"]
-            return copy_message
+            return message

-        script_name = copy_message
+        script_name = message

         # Install dependencies in the container
         dependencies = self.parse_dependencies(python_code)
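The whitelist check that this file's diff adds to the constructor can be exercised without Docker. The sketch below re-implements the two helpers as free functions; their names mirror the methods in the diff, while `check_config` is a hypothetical stand-in for the relevant part of `__init__`.

```python
def is_everything_whitelisted(whitelist: list[str]) -> bool:
    """True when the wildcard entry allows any dependency."""
    return "*" in whitelist


def validate_cached_dependencies(whitelist: list[str], cached: list[str]) -> bool:
    """True when every cached dependency is allowed by the whitelist."""
    if is_everything_whitelisted(whitelist):
        return True
    return all(dep in whitelist for dep in cached)


def check_config(whitelist: list[str], cached: list[str]) -> None:
    """Hypothetical stand-in for the constructor check; raises on a mismatch."""
    if not validate_cached_dependencies(whitelist, cached):
        raise ValueError("Some cached dependencies are not in the whitelist.")


check_config(["*"], ["requests"])    # wildcard: accepted
check_config(["numpy"], ["numpy"])   # explicitly whitelisted: accepted
try:
    check_config([], ["requests"])   # empty whitelist: rejected
except ValueError as err:
    print(err)
```

Because `validate_cached_dependencies` already returns `True` for a wildcard whitelist, the additional `is_everything_whitelisted()` test in the constructor's condition looks redundant, though harmless.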
@@ -158,44 +158,47 @@ def test_parse_dependencies(code, expected, docker_container):


 @pytest.mark.parametrize(
-    "code, expected, whitelist",
+    "code, expected, whitelist, cached",
     [
         # dependencies: arrow, open whitelist
         (
             "import arrow\nfixed_date = arrow.get('2023-04-15T12:00:00')\nprint(fixed_date.format('YYYY-MM-DD HH:mm:ss'))",
             "2023-04-15 12:00:00\n",
             ["*"],
+            ["requests"],
         ),
         # dependencies: numpy, but not in the whitelist
         (
             "import numpy as np\nprint(np.array([1, 2, 3]))",
             "Dependency: numpy is not in the whitelist.",
             ["pandas"],
+            [],
         ),
         # python built-in
-        (
-            "import math\nprint(math.sqrt(16))",
-            "4.0\n",
-            ["requests"],
-        ),
+        ("import math\nprint(math.sqrt(16))", "4.0\n", ["requests"], []),
         # dependencies: requests, in the whitelist
         (
             "import numpy as np\nprint(np.array([1, 2, 3]))",
             "[1 2 3]\n",
             ["numpy"],
+            ["numpy"],
         ),
         # a dependency that doesn't exist
         (
             "import unknownpackage",
             "Failed to install dependency unknownpackage",
             ["*"],
+            [],
         ),
     ],
 )
-def test_execute_code_with_dependencies(code, expected, whitelist, docker_container):
+def test_execute_code_with_dependencies(
+    code, expected, whitelist, cached, docker_container
+):
     runner = AgentRun(
         container_name=docker_container.name,
         dependencies_whitelist=whitelist,
+        cached_dependencies=cached,
     )
     output = runner.execute_code_in_container(code)
     assert output == expected
@@ -219,37 +222,93 @@ def test_execute_code_in_container(code, expected, docker_container):
     assert output == expected


-# test with wrong container name
-def test_execute_code_in_container_with_wrong_container_name():
-    runner = AgentRun(
-        container_name="wrong-container-name",
-    )
-    output = runner.execute_code_in_container("print('Hello, World!')")
-    assert output == "Container with name wrong-container-name not found."
+def test_init_with_wrong_container_name(docker_container):
+    with pytest.raises(ValueError) as excinfo:
+        runner = AgentRun(container_name="wrong-container-name")
+
+    assert "Container wrong-container-name not found" in str(excinfo.value)


-def execute_code_in_container_benchmark(docker_container, code):
-    runner = AgentRun(
-        container_name=docker_container.name,
-    )
+def test_init_with_stopped_container(docker_container):
+    # stop the docker_container
+    docker_container.stop()
+    with pytest.raises(ValueError) as excinfo:
+        runner = AgentRun(container_name=docker_container.name)
+
+    assert f"Container {docker_container.name} is not running." in str(excinfo.value)
+    docker_container.start()
+
+
+def test_init_with_docker_not_running():
+    from unittest.mock import Mock, patch
+
+    # Create a mock client that raises an exception when ping is called
+    with patch("docker.DockerClient") as MockClient:
+        mock_client = MockClient.return_value
+        mock_client.ping.side_effect = docker.errors.DockerException(
+            "Docker daemon not available"
+        )
+
+        # Test that initializing AgentRun with this mock client raises RuntimeError
+        with pytest.raises(RuntimeError) as excinfo:
+            runner = AgentRun(container_name="any-name", client=mock_client)
+
+        assert (
+            "Failed to connect to Docker daemon. Please make sure Docker is running. Docker daemon not available"
+            in str(excinfo.value)
+        )
+
+
+def test_init_w_dependency_mismatch(docker_container):
+    with pytest.raises(ValueError) as excinfo:
+        runner = AgentRun(
+            container_name=docker_container.name,
+            dependencies_whitelist=[],
+            cached_dependencies=["requests"],
+        )
+    assert "Some cached dependencies are not in the whitelist." in str(excinfo.value)
+
+
+"""**benchmarking**"""
+
+
+def execute_code_in_container_benchmark(runner, code):
     output = runner.execute_code_in_container(code)
     return output


-def test_dependency_benchmark(benchmark, docker_container):
+def test_cached_dependency_benchmark(benchmark, docker_container):
+    runner = AgentRun(
+        container_name=docker_container.name,
+        cached_dependencies=["numpy"],
+    )
     result = benchmark(
         execute_code_in_container_benchmark,
-        docker_container=docker_container,
+        runner=runner,
         code="import numpy as np\nprint(np.array([1, 2, 3]))",
     )
     assert result == "[1 2 3]\n"


-def test_exception_benchmark(benchmark, docker_container):
+def test_dependency_benchmark(benchmark, docker_container):
+    runner = AgentRun(
+        container_name=docker_container.name,
+    )
     result = benchmark(
         execute_code_in_container_benchmark,
-        docker_container=docker_container,
+        runner=runner,
+        code="import requests\nprint(requests.get('https://example.com').status_code)",
+    )
+    assert result == "200\n"
+
+
+def test_exception_benchmark(benchmark, docker_container):
+    runner = AgentRun(
+        container_name=docker_container.name,
+    )
+    result = benchmark(
+        execute_code_in_container_benchmark,
+        runner=runner,
         code="print(f'{1/0}')",
     )
     ends_with = "ZeroDivisionError: division by zero\n"
@@ -257,9 +316,12 @@ def test_exception_benchmark(benchmark, docker_container):


 def test_vanilla_benchmark(benchmark, docker_container):
+    runner = AgentRun(
+        container_name=docker_container.name,
+    )
     result = benchmark(
         execute_code_in_container_benchmark,
-        docker_container=docker_container,
+        runner=runner,
         code="print('Hello, World!')",
     )
     assert result == "Hello, World!\n"
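
The mock-based pattern used in `test_init_with_docker_not_running` can be tried without Docker or pytest installed. The sketch below is an assumption-laden stand-in: `DaemonUnavailable` plays the role of `docker.errors.DockerException`, and `connect` is a hypothetical function mirroring only the ping check from the constructor.

```python
from unittest.mock import Mock


class DaemonUnavailable(Exception):
    """Hypothetical stand-in for docker.errors.DockerException."""


def connect(client):
    """Mirror of the constructor's ping check, reduced to its error handling."""
    try:
        client.ping()
    except DaemonUnavailable as e:
        raise RuntimeError(
            f"Failed to connect to Docker daemon. Please make sure Docker is running. {e}"
        ) from e
    return client


# A plain Mock whose ping() raises, simulating an unreachable daemon.
mock_client = Mock()
mock_client.ping.side_effect = DaemonUnavailable("Docker daemon not available")

message = ""
try:
    connect(mock_client)
except RuntimeError as err:
    message = str(err)
print(message)
```

Since the test passes the mock in through the `client` argument, a plain `Mock()` like this one appears sufficient; patching `docker.DockerClient` is one way to obtain such an object, not a requirement.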