OSWorld: Open-Ended Tasks in Real Computer Environments


Overview

Updates

Install

  1. Install VMware and configure the vmrun command. Please refer to the guidance.

  2. Install the environment package and download the examples and the virtual machine image. On x86_64 Linux or Windows, run the following commands:

git clone https://github.com/xlang-ai/DesktopEnv
cd DesktopEnv
pip install -r requirements.txt
gdown https://drive.google.com/drive/folders/1HX5gcf7UeyR-2UmiA15Q9U-Wr6E6Gio8 -O Ubuntu --folder
vmrun -T ws start "Ubuntu/Ubuntu.vmx" nogui
vmrun -T ws snapshot "Ubuntu/Ubuntu.vmx" "init_state"

Quick Start

Run the following minimal example to interact with the environment:

import json
from desktop_env.envs.desktop_env import DesktopEnv

with open("evaluation_examples/examples/gimp/f723c744-e62c-4ae6-98d1-750d3cd7d79d.json", "r", encoding="utf-8") as f:
    example = json.load(f)

env = DesktopEnv(
    path_to_vm=r"path_to_vm",  # path to the downloaded .vmx file, e.g. Ubuntu/Ubuntu.vmx
    action_space="computer_13",
    task_config=example
)
# Reset the environment to the initial state defined by the task config
observation = env.reset()

# Execute a single right-click in the computer_13 action space
observation, reward, done, info = env.step({"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}})
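
Beyond a single step, a task is normally driven by calling env.step in a loop until the environment reports that the episode is done. The following is a minimal sketch of such a loop built only on the interface shown above; the fixed right-click action and the max_steps cap are placeholder assumptions standing in for a real policy.

# Minimal interaction loop (sketch): keep stepping until the task reports done.
# The fixed right-click action is only a placeholder for a real policy,
# and max_steps is an arbitrary safety cap.
max_steps = 10
for _ in range(max_steps):
    action = {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}}
    observation, reward, done, info = env.step(action)
    if done:
        break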

Annotation Tool Usage

We provide an annotation tool to help you annotate the examples.

Agent Usage

We provide a simple agent to interact with the environment. You can use it as a starting point to build your own agent.
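
As a starting point, the sketch below shows one way a simple agent loop could be wired up using only the reset/step interface from Quick Start. The FixedActionAgent class and the run_episode helper are hypothetical names introduced here for illustration, not part of the package; replace the predict method with your own policy.

import json
from desktop_env.envs.desktop_env import DesktopEnv


class FixedActionAgent:
    # Hypothetical toy agent: always returns the same right-click action.
    # Replace predict() with your own policy, e.g. a model that maps
    # observations to computer_13 actions.
    def predict(self, observation):
        return {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}}


def run_episode(env, agent, max_steps=15):
    # Hypothetical helper: roll out one episode and return the final reward.
    observation = env.reset()
    reward, done, info = 0, False, {}
    for _ in range(max_steps):
        action = agent.predict(observation)
        observation, reward, done, info = env.step(action)
        if done:
            break
    return reward


with open("evaluation_examples/examples/gimp/f723c744-e62c-4ae6-98d1-750d3cd7d79d.json", "r", encoding="utf-8") as f:
    example = json.load(f)

env = DesktopEnv(path_to_vm=r"path_to_vm", action_space="computer_13", task_config=example)
print(run_episode(env, FixedActionAgent()))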

Roadmap of infrastructure (Proposed)

  • Explore VMware and whether it can be connected to and controlled through the mouse package
  • Explore whether Windows and macOS can be installed
    • macOS is closed source and cannot be legally installed
    • Windows is legally available and can be installed
  • Build a gym-like Python interface for controlling the VM
  • Record human actions (mouse movement, clicks, keystrokes) for annotation, so that they can be replayed and compressed
  • Build a simple task, e.g. open a browser, navigate to a website, click a button, and close the browser
  • Set up a pipeline and build an agent implementation (zero-shot) for the task
  • Decide which tasks inside DesktopEnv to focus on, and start wrapping up the environment for public release
  • Start annotating examples for training and testing
  • Handle errors during file passing, file opening, etc.
  • Add the accessibility tree from the OS to the observation space
  • Add pre-processing and post-processing action support for benchmark setup and evaluation
  • Add multiprocessing support to make reinforcement learning more efficient
  • Build an experiment logging and visualization system
  • Add more tasks, possibly scaling to 300 for v1.0.0, and create a dynamic leaderboard

Roadmap of benchmark, tools, and resources (Proposed)

  • Improve the annotation tool based on DuckTrack, making it more robust and aligned with the accessibility tree
  • Annotate the steps for completing each task
  • Build a website for the project
  • Crawl all the resources we explored from the internet and make them easy to access
  • Set up ways for the community to contribute new examples

Citation

If you find this environment useful, please consider citing our work:

@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}