Mirror of https://github.com/xlang-ai/OSWorld.git, synced 2024-04-29 12:26:03 +03:00 (26ed70ef705dcfe404b09ee9e5ebc7e29c11e4b6)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Updates
- 2024-03-28: We released our paper, environment, benchmark, and project page. Check them out!
Install
- Install VMware and configure the vmrun command, then verify the setup by running:
vmrun -T ws list
- Install the environment package, download the examples and the virtual machine image.
On Linux or Windows with an x86_64 CPU, run the following commands to install the environment package and download the examples and the virtual machine image.
Remove the nogui parameter if you want to see what happens in the virtual machine.
git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
pip install -r requirements.txt
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"
On Apple-chip macOS, install the specially prepared virtual machine image by running the following commands:
gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
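The vmrun commands above differ only in the value passed to -T (ws for the x86_64 Linux or Windows setup, fusion for Apple-chip macOS). The sketch below shows one way to choose that value and repeat the verification step from a script. The helper names (vmrun_host_type, vmrun_command, check_vmrun) are hypothetical and not part of OSWorld, and the platform detection is an assumption rather than something the project prescribes.

```python
import platform
import shutil
import subprocess
from typing import List


def vmrun_host_type(system: str, machine: str) -> str:
    """Pick the value for `vmrun -T`, following the commands above:
    "fusion" on Apple-chip macOS, "ws" otherwise (assumed default)."""
    if system == "Darwin" and machine.lower() in ("arm64", "aarch64"):
        return "fusion"
    return "ws"


def vmrun_command(host_type: str, *args: str) -> List[str]:
    """Build an argument list such as ["vmrun", "-T", "ws", "list"]."""
    return ["vmrun", "-T", host_type, *args]


def check_vmrun() -> bool:
    """Return True if vmrun is on PATH and `vmrun -T <type> list` exits with 0."""
    if shutil.which("vmrun") is None:
        return False
    host_type = vmrun_host_type(platform.system(), platform.machine())
    result = subprocess.run(
        vmrun_command(host_type, "list"), capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("vmrun is ready" if check_vmrun() else "vmrun is missing or not configured")
```

Building the argument list separately from running it keeps the platform choice easy to check without VMware installed.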
Quick Start
Run the following minimal example to interact with the environment:
from desktop_env.envs.desktop_env import DesktopEnv
example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [
        {"type": "execute", "parameters": {"command": ["python", "-c", "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"]}}
    ],
    "evaluator": {
        "func": "check_include_exclude",
        "result": {"type": "vm_command_line", "command": "which spotify"},
        "expected": {"type": "rule", "rules": {"include": ["spotify"], "exclude": ["not found"]}}
    }
}
env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",
    task_config=example
)
obs = env.reset()
obs, reward, done, info = env.step("pyautogui.rightClick()")
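The evaluator entry in the example pairs a command run inside the virtual machine (which spotify) with a rule that lists strings to include and exclude. The sketch below illustrates one plausible reading of such a rule: every include string must appear in the command output and no exclude string may appear. The function include_exclude_reward is a hypothetical stand-in written for illustration; it is not OSWorld's check_include_exclude, whose exact behavior should be checked in the repository.

```python
from typing import Dict, List


def include_exclude_reward(output: str, rules: Dict[str, List[str]]) -> float:
    """Hypothetical rule check (not the OSWorld implementation).

    Returns 1.0 when every "include" string occurs in `output` and no
    "exclude" string does; returns 0.0 otherwise.
    """
    includes = rules.get("include", [])
    excludes = rules.get("exclude", [])
    has_all = all(s in output for s in includes)
    has_none = not any(s in output for s in excludes)
    return 1.0 if has_all and has_none else 0.0


# Rule copied from the example task above.
rules = {"include": ["spotify"], "exclude": ["not found"]}

print(include_exclude_reward("/snap/bin/spotify\n", rules))  # 1.0
print(include_exclude_reward("spotify not found\n", rules))  # 0.0
print(include_exclude_reward("", rules))                     # 0.0
```

The sample output strings are made up for the demonstration; the real output of which spotify depends on how and where Spotify was installed.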
Run Benchmark
Run the Baseline Agent
To run the baseline agent used in our paper, you can run the following command as an example:
Run Evaluation of Your Agent
First, read through the agent interface and the environment interface.
Next, implement the agent interface correctly and import your customized agent in the run.py file.
Then run the following command to evaluate your agent:
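Separately from the evaluation command, a customized agent can be smoke-tested without a virtual machine. The sketch below is not that command and does not reproduce OSWorld's agent interface: StubEnv, RightClickAgent, the predict method, and rollout are all hypothetical names. The only parts taken from this README are the reset/step call shape and the action string from the Quick Start.

```python
from typing import Any, Dict, List, Tuple


class StubEnv:
    """Stand-in with the same reset/step shape as the Quick Start example.

    It needs no virtual machine and ends the episode after `max_steps` actions.
    """

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Dict[str, Any]:
        self.steps = 0
        return {"step": 0}

    def step(self, action: str) -> Tuple[Dict[str, Any], float, bool, Dict[str, Any]]:
        self.steps += 1
        done = self.steps >= self.max_steps
        return {"step": self.steps}, 0.0, done, {"action": action}


class RightClickAgent:
    """Hypothetical agent that always returns the Quick Start action string."""

    def predict(self, instruction: str, obs: Dict[str, Any]) -> str:
        return "pyautogui.rightClick()"


def rollout(env: Any, agent: Any, instruction: str, step_limit: int = 15) -> List[str]:
    """Run one episode and return the actions sent to the environment."""
    actions: List[str] = []
    obs = env.reset()
    for _ in range(step_limit):
        action = agent.predict(instruction, obs)
        obs, reward, done, info = env.step(action)
        actions.append(action)
        if done:
            break
    return actions


actions = rollout(StubEnv(max_steps=3), RightClickAgent(), "Install Spotify")
print(len(actions))  # 3
```

Swapping StubEnv for a real DesktopEnv instance would exercise the same loop against the virtual machine, assuming the agent returns actions in the configured action space.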
Citation
If you find this environment useful, please consider citing our work:
@article{DesktopEnv,
title={},
author={},
journal={arXiv preprint arXiv:xxxx.xxxx},
year={2024}
}
License: Apache-2.0