mirror of
https://github.com/All-Hands-AI/OpenHands.git
synced 2024-08-29 01:18:33 +03:00
docs: More consistent documentation (#3608)
# 📚 Misc

## ⭐️ Research Strategy

Achieving full replication of production-grade applications with LLMs is a complex endeavor. Our strategy involves:

1. **Core Technical Research:** Focusing on foundational research to understand and improve the technical aspects of code generation and handling
2. **Specialist Abilities:** Enhancing the effectiveness of core components through data curation, training methods, and more
3. **Task Planning:** Developing capabilities for bug detection, codebase management, and optimization
4. **Evaluation:** Establishing comprehensive evaluation metrics to better understand and improve our models
## 🚧 Default Agent

Our default Agent is currently the [CodeActAgent](agents), which is capable of generating code and handling files.
## 🤝 How to Contribute

OpenHands is a community-driven project, and we welcome contributions from everyone. Whether you're a developer, a researcher, or simply enthusiastic about advancing the field of software engineering with AI, there are many ways to get involved:

- **Code Contributions:** Help us develop the core functionalities, frontend interface, or sandboxing solutions
- **Research and Evaluation:** Contribute to our understanding of LLMs in software engineering, participate in evaluating the models, or suggest improvements
- **Feedback and Testing:** Use the OpenHands toolset, report bugs, suggest features, or provide feedback on usability

For details, please check [this document](https://github.com/All-Hands-AI/OpenHands/blob/main/CONTRIBUTING.md).
## 🤖 Join Our Community

We have both a Slack workspace for collaborating on building OpenHands and a Discord server for discussing anything related, e.g., this project, LLMs, agents, etc.

- [Slack workspace](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA)

If you would love to contribute, feel free to join our community.

[Star History Chart](https://star-history.com/#All-Hands-AI/OpenHands&Date)
## 🛠️ Built With

OpenHands is built using a combination of powerful frameworks and libraries, providing a robust foundation for its development. Here are the key technologies used in the project:

Please note that the selection of these technologies is in progress, and additional technologies may be added or existing ones may be removed as the project evolves. We strive to adopt the most suitable and efficient tools to enhance the capabilities of OpenHands.
## 📜 License

Distributed under the MIT License. See [our license](https://github.com/All-Hands-AI/OpenHands/blob/main/LICENSE) for more information.
---
sidebar_position: 3
---

# 🧠 Main Agent and Capabilities

## CodeActAgent

### Description

This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both _simplicity_ and _performance_.
The conceptual idea is illustrated below. At each turn, the agent can:

### Plugin System

To make the CodeActAgent more powerful with only access to the `bash` action space, it leverages OpenHands's plugin system:

- [Jupyter plugin](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/runtime/plugins/jupyter): for IPython execution via bash command
- [SWE-agent tool plugin](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/runtime/plugins/swe_agent_commands): powerful bash command-line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent)

### Demo

https://github.com/All-Hands-AI/OpenHands/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

_Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)_
### Actions

`Action`,
`CmdRunAction`,
`IPythonRunCellAction`,
`AgentEchoAction`,
`AgentFinishAction`,
`AgentTalkAction`
### Observations

`CmdOutputObservation`,
`IPythonRunCellObservation`,
`AgentMessageObservation`,
`UserMessageObservation`
### Methods

| Method | Description |
| ------ | ----------- |
| `__init__` | Initializes an agent with `llm` and a list of messages `list[Mapping[str, str]]` |
| `step` | Performs one step using the CodeActAgent. This includes gathering info on previous steps and prompting the model to make a command to execute. |
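The `step` loop described in the table can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual OpenHands implementation: the class name and the callable-`llm` stub are hypothetical, and only the `__init__`/`step` names come from the table above.

```python
# Hypothetical sketch of the step loop described above; everything except the
# `__init__`/`step` method names is illustrative, not the OpenHands API.
class MiniCodeActAgent:
    def __init__(self, llm, messages):
        self.llm = llm            # any callable: list[dict] -> str
        self.messages = messages  # list[Mapping[str, str]] chat history

    def step(self, observation: str) -> str:
        # Gather info on previous steps: append the latest observation.
        self.messages.append({"role": "user", "content": observation})
        # Prompt the model to produce the next command to execute.
        command = self.llm(self.messages)
        self.messages.append({"role": "assistant", "content": command})
        return command


# Usage with a stubbed LLM that always proposes the same command:
agent = MiniCodeActAgent(llm=lambda msgs: "ls -la", messages=[])
print(agent.step("$ pwd\n/workspace"))  # -> ls -la
```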
## Planner Agent

### Description

The planner agent utilizes a special prompting strategy to create long-term plans for solving problems. At every step, the agent is given its previous action-observation pairs, the current task, and a hint based on the last action taken.
### Actions

`NullAction`,
`CmdRunAction`,
`BrowseURLAction`,
`GithubPushAction`,
`FileReadAction`,
`FileWriteAction`,
`AgentThinkAction`,
`AgentFinishAction`,
`AgentSummarizeAction`,
`AddTaskAction`,
`ModifyTaskAction`
### Observations

`Observation`,
`NullObservation`,
`CmdOutputObservation`,
`FileReadObservation`,
`BrowserOutputObservation`
### Methods

| Method | Description |
| ------ | ----------- |
| `__init__` | Initializes an agent with `llm` |
| `step` | Checks to see if the current step is completed, and returns `AgentFinishAction` if it is. Otherwise, creates a plan prompt and sends it to the model for inference, adding the result as the next action. |
It creates a sandboxed environment using Docker, where arbitrary code can be run.

OpenHands needs to execute arbitrary code in a secure, isolated environment for several reasons:

1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources
2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues
3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system
4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system
5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable
## How does the Runtime work?

The OpenHands Runtime system uses a client-server architecture implemented with Docker containers. Here's an overview of how it works:
1. User Input: The user provides a custom base Docker image
2. Image Building: OpenHands builds a new Docker image (the "OD runtime image") based on the user-provided image. This new image includes OpenHands-specific code, primarily the "runtime client"
3. Container Launch: When OpenHands starts, it launches a Docker container using the OD runtime image
4. Client Initialization: The runtime client initializes inside the container, setting up necessary components like a bash shell and loading any specified plugins
5. Communication: The OpenHands backend (`runtime.py`) communicates with the runtime client over a RESTful API, sending actions and receiving observations
6. Action Execution: The runtime client receives actions from the backend, executes them in the sandboxed environment, and sends back observations
7. Observation Return: The client sends execution results back to the OpenHands backend as observations
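The action/observation round trip in steps 5-7 can be sketched as follows. This is a hedged illustration: the payload fields and the in-process stand-in for the HTTP hop are assumptions for demonstration, not the actual OpenHands wire format.

```python
# Hedged sketch of the action/observation exchange described above. The
# dict shapes are illustrative; see runtime.py for the real protocol.
import json
import subprocess


def execute_action(action: dict) -> dict:
    """What the runtime client does server-side for a shell `run` action."""
    if action["action"] == "run":
        proc = subprocess.run(
            action["args"]["command"], shell=True,
            capture_output=True, text=True,
        )
        return {"observation": "run", "content": proc.stdout,
                "exit_code": proc.returncode}
    return {"observation": "error",
            "content": f"unknown action {action['action']}"}


# Backend side: serialize an action, "send" it, receive an observation back.
action = {"action": "run", "args": {"command": "echo hello"}}
observation = execute_action(json.loads(json.dumps(action)))  # stands in for the HTTP hop
print(observation["content"].strip())  # -> hello
```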
The role of the client:

- It acts as an intermediary between the OpenHands backend and the sandboxed environment
- It executes various types of actions (shell commands, file operations, Python code, etc.) safely within the container
- It manages the state of the sandboxed environment, including the current working directory and loaded plugins
- It formats and returns observations to the backend, ensuring a consistent interface for processing results
## How OpenHands builds and maintains OD Runtime images

OpenHands' approach to building and managing runtime images ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments.

Check out the [relevant code](https://github.com/All-Hands-AI/OpenHands/blob/main/openhands/runtime/utils/runtime_build.py) if you are interested in more details.
### Image Tagging System

OpenHands uses a dual-tagging system for its runtime images to balance reproducibility with flexibility:

1. Hash-based tag: `{target_image_repo}:{target_image_hash_tag}`.
   Example: `runtime:abc123def456`

   - This tag is based on the MD5 hash of the Docker build folder, which includes the source code (of the runtime client and related dependencies) and Dockerfile
   - Identical hash tags guarantee that the images were built with exactly the same source code and Dockerfile
   - This ensures reproducibility; the same hash always means the same image contents
2. Generic tag: `{target_image_repo}:{target_image_tag}`.
   Example: `runtime:od_v0.8.3_ubuntu_tag_22.04`

   - This tag follows the format: `runtime:od_v{OD_VERSION}_{BASE_IMAGE_NAME}_tag_{BASE_IMAGE_TAG}`
   - It represents the latest build for a particular base image and OpenHands version combination
   - This tag is updated whenever a new image is built from the same base image, even if the source code changes

The hash-based tag ensures reproducibility, while the generic tag provides a stable reference to the latest version of a particular configuration. This dual-tagging approach allows OpenHands to efficiently manage both development and production environments.
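A hash-based tag of the kind described above can be derived like this. This is a sketch assuming the hash covers file paths and contents of the build folder; the exact inputs OpenHands hashes may differ, so treat `dir_hash_tag` as illustrative and consult `runtime_build.py` for the real logic.

```python
# Sketch: derive a reproducible tag from the MD5 of a Docker build folder
# (Dockerfile plus source files), per the description above.
import hashlib
from pathlib import Path


def dir_hash_tag(build_folder: str) -> str:
    md5 = hashlib.md5()
    # Sort paths so the hash is deterministic regardless of filesystem order.
    for path in sorted(Path(build_folder).rglob("*")):
        if path.is_file():
            md5.update(str(path.relative_to(build_folder)).encode())
            md5.update(path.read_bytes())
    return md5.hexdigest()


# Identical folder contents always yield an identical tag, which is exactly
# what makes the hash-based tag reproducible:
# tag = dir_hash_tag("./build"); image = f"runtime:{tag}"
```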
### Build Process

1. Image Naming Convention:
   - Hash-based tag: `{target_image_repo}:{target_image_hash_tag}`.
     Example: `runtime:abc123def456`
   - Generic tag: `{target_image_repo}:{target_image_tag}`.
     Example: `runtime:od_v0.8.3_ubuntu_tag_22.04`

2. Build Process:
   - a. Convert the base image name to an OD runtime image name.
     Example: `ubuntu:22.04` -> `runtime:od_v0.8.3_ubuntu_tag_22.04`
   - b. Generate a build context (Dockerfile and OpenHands source code) and calculate its hash
   - c. Check for an existing image with the calculated hash
   - d. If not found, check for a recent compatible image to use as a base
   - e. If no compatible image exists, build from scratch using the original base image
   - f. Tag the new image with both hash-based and generic tags
3. Image Reuse and Rebuilding Logic:
   The system follows these steps to determine whether to build a new image or use an existing one from a user-provided (base) image (e.g., `ubuntu:22.04`):

   - a. If an image exists with the same hash (e.g., `runtime:abc123def456`), it will be reused as is
   - b. If the exact hash is not found, the system will try to rebuild using the latest generic image (e.g., `runtime:od_v0.8.3_ubuntu_tag_22.04`) as a base. This saves time by leveraging existing dependencies
   - c. If neither the hash-tagged nor the generic-tagged image is found, the system will build the image completely from scratch

4. Caching and Efficiency:
   - The system attempts to reuse existing images when possible to save build time
   - If an exact match (by hash) is found, it's used without rebuilding
   - If a compatible image is found, it's used as a base for rebuilding, saving time on dependency installation
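The reuse/rebuild decision in steps a-c above can be sketched as a small function. This is a simplified illustration: `existing_images` stands in for whatever registry or local-image lookup the real builder performs, and the return labels are hypothetical.

```python
# Sketch of the image reuse/rebuild decision described above.
def choose_build_strategy(hash_tag: str, generic_tag: str,
                          existing_images: set[str]) -> str:
    if hash_tag in existing_images:
        return "reuse"                  # a. exact hash match: use as is
    if generic_tag in existing_images:
        return "rebuild-from-generic"   # b. reuse latest generic as base
    return "build-from-scratch"         # c. nothing compatible found


images = {"runtime:od_v0.8.3_ubuntu_tag_22.04"}
print(choose_build_strategy("runtime:abc123def456",
                            "runtime:od_v0.8.3_ubuntu_tag_22.04", images))
# -> rebuild-from-generic
```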
Here's a flowchart illustrating the build process:

This approach ensures that:

1. Identical source code and Dockerfile always produce the same image (via hash-based tags)
2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images)
3. The generic tag (e.g., `runtime:od_v0.8.3_ubuntu_tag_22.04`) always points to the latest build for a particular base image and OpenHands version combination
By using this method, OpenHands maintains an efficient and flexible system for building and managing runtime images, adapting to both development needs and production requirements.

## Runtime Plugin System

The OpenHands Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the runtime client starts up.
Key aspects of the plugin system:

1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class
2. Plugin Registration: Available plugins are registered in an `ALL_PLUGINS` dictionary
3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime
4. Initialization: Plugins are initialized asynchronously when the runtime client starts
5. Usage: The runtime client can use initialized plugins to extend its capabilities (e.g., the JupyterPlugin for running IPython cells)
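The pattern in points 1-4 can be sketched as follows. The `Plugin` and `ALL_PLUGINS` names come from the description above, but the method names and the `JupyterPlugin` body are illustrative assumptions, not the real classes in `openhands/runtime/plugins`.

```python
# Minimal sketch of the plugin pattern described above; the real base class
# and registry live in openhands/runtime/plugins.
import asyncio


class Plugin:
    name: str = "base"

    async def initialize(self) -> None:  # called when the client starts
        raise NotImplementedError


class JupyterPlugin(Plugin):
    name = "jupyter"

    async def initialize(self) -> None:
        self.ready = True  # e.g. an IPython kernel would be started here


# 2. Registration: available plugins keyed by name.
ALL_PLUGINS: dict[str, type[Plugin]] = {JupyterPlugin.name: JupyterPlugin}

# 4. Initialization: the runtime client looks up requested plugins by name
# and initializes them asynchronously at startup.
plugin = ALL_PLUGINS["jupyter"]()
asyncio.run(plugin.initialize())
```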
---
sidebar_position: 5
---

# ✅ Providing Feedback

When using OpenHands, you will encounter cases where things work well, and others where they don't. We encourage you to provide feedback when you use OpenHands, both to help the development team and, perhaps more importantly, to create an open corpus of coding agent training examples -- Share-OpenHands!

## 📝 How to Provide Feedback

Providing feedback is easy! When you are using OpenHands, you can press the thumbs-up or thumbs-down button at any point during your interaction. You will be prompted to provide your email address (e.g. so we can contact you if we want to ask any follow-up questions), and you can choose whether you want to provide feedback publicly or privately.

<iframe width="560" height="315" src="https://www.youtube.com/embed/5rFx-StMVV0?si=svo7xzp6LhGK_GXr" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
The public data will be released when we hit fixed milestones. At this time, we will follow this release process:

1. All people who contributed public feedback will receive an email describing the data release and will be given an opportunity to opt out
2. The person or people in charge of the data release will perform quality control of the data, removing low-quality feedback, removing submitter email addresses, and attempting to remove any sensitive information
3. The data will be released publicly under the MIT license through commonly used sites such as GitHub or Hugging Face
### What if I want my data deleted?

# Create and Use a Custom Docker Sandbox

The default OpenHands sandbox comes with a [minimal ubuntu configuration](https://github.com/All-Hands-AI/OpenHands/blob/main/containers/sandbox/Dockerfile).

Your use case may need additional software installed by default. There are two ways you can do so:
This guide provides an overview of how to integrate your own evaluation benchmark into the OpenHands framework.

## Setup Environment and LLM Configuration

Please follow the instructions [here](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to set up your local development environment. OpenHands in development mode uses `config.toml` to keep track of most configurations.

Here's an example configuration file you can use to define and use multiple LLMs:
The main entry point for OpenHands is in `openhands/core/main.py`. Here's a simplified flow of how it works:

1. Parse command-line arguments and load the configuration
2. Create a runtime environment using `create_runtime()`
3. Initialize the specified agent
4. Run the controller using `run_controller()`, which:
   - Attaches the runtime to the agent
   - Executes the agent's task
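The four steps above can be sketched end to end. This is a hedged illustration: the real `create_runtime()` and `run_controller()` in `openhands/core/main.py` take different parameters, so the stub signatures and return values here are assumptions for demonstration only.

```python
# Hedged sketch of the simplified flow above, with stubbed-out versions of
# create_runtime() and run_controller(); the real ones live in
# openhands/core/main.py.
def create_runtime(config: dict):
    return {"sandbox": config.get("sandbox", "docker")}  # stand-in runtime


def run_controller(agent: str, runtime: dict) -> list[str]:
    # Attaches the runtime to the agent, then executes the task.
    return [f"{agent} running in {runtime['sandbox']} sandbox", "task done"]


config = {"sandbox": "docker"}            # 1. load configuration
runtime = create_runtime(config)          # 2. create runtime environment
agent = "CodeActAgent"                    # 3. initialize the agent
events = run_controller(agent, runtime)   # 4. run the controller
print(events[-1])  # -> task done
```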
Here's a more accurate visual representation:

In this workflow:

- Executable actions (like running commands or executing code) are handled directly by the Runtime
- Non-executable actions (typically when the agent wants to communicate or ask for clarification) are handled by the `user_response_fn`
- The agent then processes the feedback, whether it's an Observation from the Runtime or a simulated response from the `user_response_fn`

This approach allows for automated handling of both concrete actions and simulated user interactions, making it suitable for evaluation scenarios where you want to test the agent's ability to complete tasks with minimal human intervention.
This function (`codeact_user_response(state: State | None) -> str`) does the following:

1. Provides a standard message encouraging the agent to continue working
2. Checks how many times the agent has attempted to communicate with the user
3. If the agent has made multiple attempts, it provides an option to give up

By using this function, you can ensure consistent behavior across multiple evaluation runs and prevent the agent from getting stuck waiting for human input.
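A function with the three behaviors listed above could be sketched like this. The signature matches the one shown, but the `State` shape is an assumption (we pretend it is a dict carrying the agent's past messages), and the message text and attempt threshold are illustrative, not the values OpenHands uses.

```python
# Sketch of the three behaviors described above; the State shape, message
# wording, and threshold are illustrative assumptions.
MSG = "Please continue working on the task using the approach you think best."


def codeact_user_response(state) -> str:
    if state is None:
        return MSG
    # 2. Count how many times the agent has already asked the user something.
    attempts = sum(1 for m in state.get("agent_messages", [])
                   if m.get("asks_user"))
    if attempts >= 2:
        # 3. After multiple attempts, offer a way out.
        return MSG + " If you cannot make progress, you may give up."
    # 1. Default: a standard message encouraging the agent to keep going.
    return MSG


print(codeact_user_response(None))
```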
# Use OpenHands in OpenShift/K8S

There are different ways this can be accomplished. This guide goes through one possible way:

1. Create a PV "as a cluster admin" to map workspace_base data and the docker directory to the pod through the worker node
2. Create a PVC to be able to mount those PVs to the pod
3. Create a pod which contains two containers: the OpenHands and Sandbox containers

## Detailed Steps for the Example Above

> Note: Make sure you are logged in to the cluster first with the proper account for each step. PV creation requires a cluster administrator!
> Make sure you have read/write permissions on the hostPath used below (i.e. /tmp/workspace)

```
LAST SEEN   TYPE     REASON                 OBJECT
10s         Normal   WaitForFirstConsumer   persistentvolumeclaim/workspace-pvc   waiting for first consumer to be created before binding
```

3. Create the pod yaml file:
   Sample pod yaml file below:

- pod.yaml
6. Connect to OpenHands UI, configure the Agent, then test:

## Challenges

Some of the challenges that still need to be addressed:

1. Install Git into the container:
   This can be resolved by building a custom image which includes Git and using that image during pod deployment.

   Example below (to be tested!):
```dockerfile
FROM ghcr.io/all-hands-ai/openhands:main

# Install Git
RUN apt-get update && apt-get install -y git

# Ensure /opt/workspace_base is writable
RUN mkdir -p /opt/workspace_base && chown -R 1000:1000 /opt/workspace_base

# Verify Git installation
RUN git --version
```
2. Mount a shared development directory "i.e. one hosted in an EC2 instance" to the POD:
This can also be done by sharing the development directory to the worker node through a sharing protocol such as NFS, then creating a PV and PVC as described above to access that directory.
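An NFS-backed PV for such a shared directory could be sketched as below; the server address, export path, and sizes are untested placeholders:

```yaml
# Illustrative only: PV backed by an NFS export shared from the dev host.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev-share-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10    # placeholder: NFS server (e.g. the EC2 instance)
    path: /export/dev    # placeholder: exported development directory
```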
3. Not all Agents work! So far only CoderAgent has been tested "with an OpenAI API key" and produced results.


## Discuss

For other issues or questions join the [Slack](https://join.slack.com/t/opendevin/shared_invite/zt-2oikve2hu-UDxHeo8nsE69y6T7yFX_BA) or [Discord](https://discord.gg/ESHStjSjD4) and ask!
@@ -4,7 +4,11 @@ sidebar_position: 2

# 🤖 LLM Backends

OpenHands can work with any LLM backend.
OpenHands can connect to many LLMs. However, the recommended models to use are GPT-4 and Claude 3.5.

Current local and open source models are not nearly as powerful. When using an alternative model, you may see long
wait times between messages, poor responses, or errors about malformed JSON. OpenHands can only be as powerful as the
models driving it.
For a full list of the LM providers and models available, please consult the
[litellm documentation](https://docs.litellm.ai/docs/providers).
@@ -33,17 +37,10 @@ We have a few guides for running OpenHands with specific model providers:

If you're using another provider, we encourage you to open a PR to share your setup!

## Note on Alternative Models

The best models are GPT-4 and Claude 3. Current local and open source models are
not nearly as powerful. When using an alternative model,
you may see long wait times between messages,
poor responses, or errors about malformed JSON. OpenHands
can only be as powerful as the models driving it--fortunately folks on our team
are actively working on building better open source models!
## API retries and rate limits

Some LLMs have rate limits and may require retries. OpenHands will automatically retry requests if it receives a 429 error or API connection error.
You can set `LLM_NUM_RETRIES`, `LLM_RETRY_MIN_WAIT`, `LLM_RETRY_MAX_WAIT` environment variables to control the number of retries and the time between retries.
By default, `LLM_NUM_RETRIES` is 5 and `LLM_RETRY_MIN_WAIT`, `LLM_RETRY_MAX_WAIT` are 3 seconds and 60 seconds respectively.
You can set the following environment variables to control the number of retries and the time between retries:

* `LLM_NUM_RETRIES` (Default of 5)
* `LLM_RETRY_MIN_WAIT` (Default of 3 seconds)
* `LLM_RETRY_MAX_WAIT` (Default of 60 seconds)
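For example, to allow more retries with a longer backoff window, these variables can be exported before starting OpenHands. The variable names are the ones listed above; the values here are purely illustrative:

```shell
# Illustrative values: allow more retries and a longer backoff window.
export LLM_NUM_RETRIES=8        # default is 5
export LLM_RETRY_MIN_WAIT=5     # default is 3 seconds
export LLM_RETRY_MAX_WAIT=120   # default is 60 seconds
```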
@@ -5,18 +5,11 @@ sidebar_position: 4

# 🚧 Troubleshooting

There are some error messages that frequently get reported by users.

We'll try to make the install process easier and these error messages
better in the future. But for now, you can look for your error message below and see if there are any workarounds.

For each of these error messages **there is an existing issue**. Please do not
open a new issue--just comment there.

If you find more information or a workaround for one of these issues, please
open a *PR* to add details to this file.
We'll try to make the install process easier, but for now you can look for your error message below and see if there are any workarounds.
If you find more information or a workaround for one of these issues, please open a *PR* to add details to this file.
:::tip
If you're running on Windows and having trouble, check out our [guide for Windows (WSL) users](troubleshooting/windows).
If you're running on Windows and having trouble, check out our [Notes for Windows and WSL users](troubleshooting/windows).
:::

## Common Issues
@@ -141,7 +134,7 @@ the API endpoint you're trying to connect to. Most often this happens for Azure

**Workarounds**

* Check that you've set `LLM_BASE_URL` properly
* Check that model is set properly, based on the [LiteLLM docs](https://docs.litellm.ai/docs/providers)
* Check that the model is set properly, based on the [LiteLLM docs](https://docs.litellm.ai/docs/providers)
* If you're running inside the UI, be sure to set the `model` in the settings modal
* If you're running headless (via main.py) be sure to set `LLM_MODEL` in your env/config
* Make sure you've followed any special instructions for your LLM provider
@@ -8,12 +8,11 @@ Please be sure to run all commands inside your WSL terminal.

### Recommendation: Do not run as root user

For security reasons, it is highly recommended to not run OpenHands as the root user, but as a user with a non-zero UID.
In addition, persistent sandboxes won't be supported when running as root, and an appropriate message may appear during startup of OpenHands.

References:

* [Why it is bad to login as root](https://askubuntu.com/questions/16178/why-is-it-bad-to-log-in-as-root)
* [Set default user in WSL](https://www.tenforums.com/tutorials/128152-set-default-user-windows-subsystem-linux-distro-windows-10-a.html#option2)
Hint about the 2nd reference: for Ubuntu users, the command could actually be "ubuntupreview" instead of "ubuntu".

---
@@ -22,21 +21,6 @@ Hint about the 2nd reference: for Ubuntu users, the command could actually be "u

If you are using Docker Desktop, make sure to start it before calling any docker command from inside WSL.
Docker also needs to have the WSL integration option activated.

---
### Failed to create openhands user

If you encounter the following error during setup:

```sh
Exception: Failed to create openhands user in sandbox: 'useradd: UID 0 is not unique'
```

You can resolve it by running:

```sh
export SANDBOX_USER_ID=1000
```

---
### Poetry Installation

@@ -76,5 +60,5 @@ localhostForwarding=true

* Save the `.wslconfig` file.
* Restart WSL2 completely by exiting any running WSL2 instances and executing the command `wsl --shutdown` in your command prompt or terminal.
* After restarting WSL, attempt to execute `make run` again.
The networking issue should be resolved.