These instructions assume a system that is not itself virtualized, meaning you are not running on AWS, Azure, or a k8s environment. If you are, refer to the "Virtualized platform" part instead.

Install VMware Workstation Pro (on Apple-chip Macs, use VMware Fusion instead) and make the vmrun command available on your PATH. Verify the installation by running:

vmrun -T ws list

If the installation succeeded and the environment variable is set correctly, the command prints the list of currently running virtual machines.
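The same check can be scripted. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the parser assumes that `vmrun list` prints a summary line such as "Total running VMs: 1" followed by one .vmx path per line.

```python
import shutil
from typing import List


def vmrun_available() -> bool:
    """Return True if a `vmrun` executable can be found on PATH."""
    return shutil.which("vmrun") is not None


def parse_vmrun_list(output: str) -> List[str]:
    """Extract VM paths from the text printed by `vmrun list`.

    Assumption: the summary line starts with "Total running VMs" and
    every other non-empty line is the path of one running VM.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [
        line for line in lines
        if not line.lower().startswith("total running vms")
    ]
```

To use it, capture the output of `vmrun -T ws list` (for example with `subprocess.run(..., capture_output=True, text=True)`) and pass the resulting string to `parse_vmrun_list`.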

Next, install the environment package and download the examples and the virtual machine image. On x86_64 Linux or Windows, run the following commands. Remove the nogui parameter if you want to watch what happens inside the virtual machine.

git clone https://github.com/xlang-ai/OSWorld
cd OSWorld
pip install -r requirements.txt
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"

On Apple-chip macOS, download the virtual machine image prepared specifically for that platform and start it with the following commands:

gdown https://drive.google.com/drive/folders/xxx -O Ubuntu --folder
vmrun -T fusion start "Ubuntu/Ubuntu.vmx"
vmrun -T fusion snapshot "Ubuntu/Ubuntu.vmx" "init_state"
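The two command sequences above differ only in the host type passed to `-T` and in the optional nogui argument. A minimal sketch of building them from Python follows. The function names and the `system` and `nogui` parameters are illustrative assumptions, and the mapping simply treats every macOS host as Fusion and everything else as Workstation.

```python
import platform
from typing import List, Optional


def vmrun_host_type(system: Optional[str] = None) -> str:
    """Pick the value for `vmrun -T`: "fusion" on macOS, "ws" elsewhere."""
    system = system or platform.system()
    return "fusion" if system == "Darwin" else "ws"


def setup_commands(
    vmx_path: str,
    system: Optional[str] = None,
    nogui: bool = True,
    snapshot_name: str = "init_state",
) -> List[List[str]]:
    """Build the start and snapshot commands as argument lists."""
    host = vmrun_host_type(system)
    start = ["vmrun", "-T", host, "start", vmx_path]
    if nogui:
        # Headless start; drop this to watch the VM window.
        start.append("nogui")
    snapshot = ["vmrun", "-T", host, "snapshot", vmx_path, snapshot_name]
    return [start, snapshot]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. Note that the Apple-chip commands above start the VM without nogui, which corresponds to `nogui=False` here.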

Virtualized platform

We are working on supporting it👷, hold tight!

Quick Start

Run the following minimal example to interact with the environment:

from desktop_env.envs.desktop_env import DesktopEnv

example = {
    "id": "94d95f96-9699-4208-98ba-3c3119edf9c2",
    "instruction": "I want to install Spotify on my current system. Could you please help me?",
    "config": [{"type": "execute", "parameters": {
        "command": ["python", "-c", "import pyautogui; import time; pyautogui.click(960, 540); time.sleep(0.5);"]}}],
    "evaluator": {"func": "check_include_exclude", "result": {"type": "vm_command_line", "command": "which spotify"},
                  "expected": {"type": "rule", "rules": {"include": ["spotify"], "exclude": ["not found"]}}}
}
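# Create the environment on top of the local virtual machine image.
env = DesktopEnv(
    path_to_vm="Ubuntu/Ubuntu.vmx",
    action_space="pyautogui",
    task_config=example
)
# Reset the environment and get the initial observation.
obs = env.reset()
# Send one action; with action_space="pyautogui" an action is a string of pyautogui code.
obs, reward, done, info = env.step("pyautogui.rightClick()")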
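The evaluator entry in the example pairs a result (the output of `which spotify` inside the VM) with include and exclude rules. As an illustration of how such a rule could be scored, here is a minimal sketch. It is an assumption about the intended semantics, not the project's own `check_include_exclude` implementation, and `score_include_exclude` is a hypothetical name.

```python
from typing import Dict, List


def score_include_exclude(result: str, rules: Dict[str, List[str]]) -> float:
    """Return 1.0 if the result satisfies the rules, else 0.0.

    Assumed semantics: every string under "include" must occur in the
    result, and no string under "exclude" may occur in it.
    """
    include = rules.get("include", [])
    exclude = rules.get("exclude", [])
    has_all = all(item in result for item in include)
    has_none = not any(item in result for item in exclude)
    return 1.0 if has_all and has_none else 0.0
```

Under these assumed semantics, an output containing the path to a spotify binary scores 1.0, while an output containing "not found" scores 0.0.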

Run Benchmark

Run the Baseline Agent

To run the baseline agent used in our paper, use a command like the following, which runs it under the GPT-4V pure-screenshot setting:

python run.py --path_to_vm Ubuntu/Ubuntu.vmx --headless --observation_type screenshot --model gpt-4-vision-preview

Run Evaluation of Your Agent

First read through the agent interface and the environment interface. Implement the agent interface and import your customized agent in the run.py file. Then run a command similar to the one in the previous section to benchmark your agent.
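As a rough picture of how a custom agent plugs into the environment, the sketch below drives an episode through the `reset` and `step` calls shown in the Quick Start. Everything else is hypothetical: the `act` method, the `RightClickAgent` class, and the `max_steps` cap are placeholders, and the real agent interface is the one defined in the repository.

```python
from typing import Any, Tuple


class RightClickAgent:
    """Placeholder agent that always issues the same pyautogui action."""

    def act(self, instruction: str, obs: Any) -> str:
        # A real agent would inspect the instruction and observation here.
        return "pyautogui.rightClick()"


def run_episode(env: Any, agent: Any, instruction: str,
                max_steps: int = 15) -> Tuple[float, int]:
    """Step the environment until it reports done or max_steps is reached.

    Assumes env.step returns (obs, reward, done, info) as in the
    Quick Start example. Returns the last reward and the step count.
    """
    obs = env.reset()
    reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = agent.act(instruction, obs)
        obs, reward, done, info = env.step(action)
        steps += 1
        if done:
            break
    return reward, steps
```

With the Quick Start objects, this would be called as `run_episode(env, RightClickAgent(), example["instruction"])`.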

Citation

If you find this environment useful, please consider citing our work:

@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}