OSWorld: Open-Ended Tasks in Real Computer Environments


Overview

Updates

Install

  1. Install VMware and configure the vmrun command. Please refer to the guidance.

  2. Install the environment package and download the examples and the virtual machine image. On x86_64 Linux or Windows, run the following commands:

git clone https://github.com/xlang-ai/DesktopEnv
cd DesktopEnv
pip install -r requirements.txt
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"

Quick Start

Run the following minimal example to interact with the environment:

import json
from desktop_env.envs.desktop_env import DesktopEnv

with open("evaluation_examples/examples/gimp/f723c744-e62c-4ae6-98d1-750d3cd7d79d.json", "r", encoding="utf-8") as f:
    example = json.load(f)

env = DesktopEnv(
    path_to_vm=r"path_to_vm",  # path to the downloaded .vmx file, e.g. Ubuntu/Ubuntu.vmx
    action_space="computer_13",
    task_config=example
)
# Reset the environment to the initial state defined by the task config
observation = env.reset()

# Execute a single right-click in the computer_13 action space
observation, reward, done, info = env.step({"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}})
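
Beyond a single step, a task is normally driven by calling env.step in a loop until the environment reports that the episode is done. The following is a minimal sketch of such a loop built only on the interface shown above; the fixed right-click action and the max_steps cap are placeholder assumptions standing in for a real policy.

# Minimal interaction loop (sketch): keep stepping until the task reports done.
# The fixed right-click action is only a placeholder for a real policy,
# and max_steps is an arbitrary safety cap.
max_steps = 10
for _ in range(max_steps):
    action = {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}}
    observation, reward, done, info = env.step(action)
    if done:
        break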

Annotation Tool Usage

We provide an annotation tool to help you annotate the examples.

Agent Usage

We provide a simple agent to interact with the environment. You can use it as a starting point to build your own agent.
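
As a starting point, the sketch below shows one way a simple agent loop could be wired up using only the reset/step interface from Quick Start. The FixedActionAgent class and the run_episode helper are hypothetical names introduced here for illustration, not part of the package; replace the predict method with your own policy.

import json
from desktop_env.envs.desktop_env import DesktopEnv


class FixedActionAgent:
    # Hypothetical toy agent: always returns the same right-click action.
    # Replace predict() with your own policy, e.g. a model that maps
    # observations to computer_13 actions.
    def predict(self, observation):
        return {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}}


def run_episode(env, agent, max_steps=15):
    # Hypothetical helper: roll out one episode and return the final reward.
    observation = env.reset()
    reward, done, info = 0, False, {}
    for _ in range(max_steps):
        action = agent.predict(observation)
        observation, reward, done, info = env.step(action)
        if done:
            break
    return reward


with open("evaluation_examples/examples/gimp/f723c744-e62c-4ae6-98d1-750d3cd7d79d.json", "r", encoding="utf-8") as f:
    example = json.load(f)

env = DesktopEnv(path_to_vm=r"path_to_vm", action_space="computer_13", task_config=example)
print(run_episode(env, FixedActionAgent()))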

Roadmap of infrastructure (Proposed)

  • Explore VMware and whether it can be connected to and controlled through the mouse package
  • Explore whether Windows and macOS can be installed
    • macOS is closed source and cannot be legally installed
    • Windows is legally available and can be installed
  • Build a gym-like Python interface for controlling the VM
  • Record human actions (mouse movement, clicks, keystrokes) for annotation, so that they can be replayed and compressed
  • Build a simple task, e.g. open a browser, navigate to a website, click a button, and close the browser
  • Set up a pipeline and build an agent implementation (zero-shot) for the task
  • Decide which tasks inside DesktopEnv to focus on, and start wrapping up the environment for public release
  • Start annotating examples for training and testing
  • Handle errors during file passing, file opening, etc.
  • Add the accessibility tree from the OS to the observation space
  • Add pre-processing and post-processing action support for benchmark setup and evaluation
  • Add multiprocessing support to make reinforcement learning more efficient
  • Build an experiment logging and visualization system
  • Add more tasks, possibly scaling to 300 for v1.0.0, and create a dynamic leaderboard

Roadmap of benchmark, tools, and resources (Proposed)

  • Improve the annotation tool based on DuckTrack, making it more robust and aligned with the accessibility tree
  • Annotate the steps for completing each task
  • Build a website for the project
  • Crawl all the resources we explored from the internet and make them easy to access
  • Set up ways for the community to contribute new examples

Citation

If you find this environment useful, please consider citing our work:

@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}