alihan/OSWorld

mirror of https://github.com/xlang-ai/OSWorld.git synced 2024-04-29 12:26:03 +03:00


OSWorld: Open-Ended Tasks in Real Computer Environments

SLOGAN

Website • Paper

Overview

Updates

2024-03-01: We released our paper, environment code, dataset, and project page. Check it out!

Install

Install VMware and configure the vmrun command; please refer to the guidance.
Install the environment package, then download the examples and the virtual machine image.

pip install desktop-env
gdown xxxx
gdown xxxx
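
Before launching the environment, it can help to confirm that vmrun is actually reachable. The following is a minimal sketch, assuming the command only needs to be discoverable on PATH; the helper name `find_vmrun` is hypothetical and is not part of desktop-env.

```python
import shutil
from typing import Optional


def find_vmrun(command: str = "vmrun") -> Optional[str]:
    """Return the full path of the vmrun executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_vmrun()
    if path is None:
        print("vmrun not found; check the VMware installation and PATH")
    else:
        print(f"vmrun found at {path}")
```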

Quick Start

Run the following minimal example to interact with the environment:

import json
from desktop_env.envs.desktop_env import DesktopEnv

# Load the task configuration for one of the downloaded examples
with open("evaluation_examples/examples/gimp/f723c744-e62c-4ae6-98d1-750d3cd7d79d.json", "r", encoding="utf-8") as f:
    example = json.load(f)

# Replace "path_to_vm" with the path to the downloaded virtual machine image
env = DesktopEnv(
    path_to_vm=r"path_to_vm",
    action_space="computer_13",
    task_config=example
)
observation = env.reset()

# Send a single right click and read back the new observation and reward
observation, reward, done, info = env.step({"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}})
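
A single step generalizes naturally to an episode loop. The sketch below assumes only the reset/step interface used in the example above; `run_episode`, `StubEnv`, and `right_click_policy` are hypothetical names introduced for illustration, and `StubEnv` stands in for DesktopEnv so the loop can run without a virtual machine.

```python
from typing import Any, Callable, Dict, Tuple

Action = Dict[str, Any]


def run_episode(env: Any, policy: Callable[[Any], Action], max_steps: int = 15) -> Tuple[float, int]:
    """Step `env` with actions from `policy` until done or max_steps is reached.

    Returns the last reward and the number of steps taken.
    """
    observation = env.reset()
    reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        steps += 1
        if done:
            break
    return reward, steps


class StubEnv:
    """Hypothetical stand-in for DesktopEnv; finishes after a fixed number of steps."""

    def __init__(self, episode_length: int = 3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return {"step": self.t}

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        # Reward is 1.0 only on the final step of the stub episode
        return {"step": self.t}, float(done), done, {}


def right_click_policy(observation: Any) -> Action:
    # Always issue the same right click, in the action format shown in Quick Start
    return {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}}


if __name__ == "__main__":
    print(run_episode(StubEnv(), right_click_policy))
```

Passing a real DesktopEnv instance in place of `StubEnv()` should work the same way, provided its `reset` and `step` return values match the Quick Start example.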

Annotation Tool Usage

We provide an annotation tool to help you annotate the examples.

Agent Usage

We provide a simple agent to interact with the environment. You can use it as a starting point to build your own agent.
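
As a rough illustration of what a starting point might look like, here is a sketch of an agent that replays a fixed script of actions. The class name `ScriptedAgent` and its `reset`/`predict` methods are assumptions made for this example and do not describe the agent shipped in this repository.

```python
from typing import Any, Dict, List

Action = Dict[str, Any]


class ScriptedAgent:
    """Hypothetical agent that replays a fixed list of actions, ignoring observations."""

    def __init__(self, actions: List[Action]):
        if not actions:
            raise ValueError("actions must contain at least one action")
        self.actions = list(actions)
        self.cursor = 0

    def reset(self) -> None:
        """Restart the script from its first action."""
        self.cursor = 0

    def predict(self, observation: Any) -> Action:
        """Return the next scripted action; repeat the last one when the script runs out."""
        index = min(self.cursor, len(self.actions) - 1)
        self.cursor += 1
        return self.actions[index]


if __name__ == "__main__":
    agent = ScriptedAgent([
        {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 1}},
        {"action_type": "CLICK", "parameters": {"button": "right", "num_clicks": 2}},
    ])
    for _ in range(3):
        print(agent.predict(observation=None))
```

A learned or model-backed agent would replace the body of `predict` with logic that actually inspects the observation.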

Road map of infra (Proposed)

Explore VMware, and whether it can be connected to and controlled through a mouse package
Explore Windows and macOS, and whether they can be installed
- macOS is closed source and cannot be legally installed
- Windows is available legally and can be installed
Build a gym-like Python interface for controlling the VM
Record actions (mouse movement, clicks, keyboard) for humans to annotate, with support for replaying and compressing the recordings
Build a simple task, e.g. open a browser, open a website, click on a button, and close the browser
Set up a pipeline and build a zero-shot agent implementation for the task
Start deciding which tasks inside DesktopEnv to focus on, and start wrapping up the environment for public release
Start to annotate the examples for ~~training~~ and testing
Error handling during file passing and file opening, etc.
Add accessibility tree from the OS into the observation space
Add pre-process and post-process action support for benchmarking setup and evaluation
Multiprocess support, which can make reinforcement learning more efficient
Experiment logging and visualization system
Add more tasks, maybe scale to 300 for v1.0.0, and create a dynamic leaderboard
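
The roadmap item on recording, replaying, and compressing actions could be approached in many ways. One possible sketch is shown below, assuming a hypothetical event format (a dict with a `type` key) that is not DuckTrack's actual recording format: consecutive mouse-move events are collapsed so that only the final cursor position of each run is kept.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def compress_moves(events: List[Event]) -> List[Event]:
    """Collapse each run of consecutive "move" events into its final event."""
    compressed: List[Event] = []
    for event in events:
        if compressed and event["type"] == "move" and compressed[-1]["type"] == "move":
            compressed[-1] = event  # keep only the latest cursor position
        else:
            compressed.append(event)
    return compressed


if __name__ == "__main__":
    recording = [
        {"type": "move", "x": 0, "y": 0},
        {"type": "move", "x": 5, "y": 5},
        {"type": "click", "x": 5, "y": 5},
        {"type": "move", "x": 6, "y": 6},
        {"type": "move", "x": 9, "y": 9},
        {"type": "key", "key": "a"},
    ]
    for event in compress_moves(recording):
        print(event)
```

This discards the intermediate cursor path, so it only suits replays where the endpoint of each movement matters rather than the trajectory.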

Road map of benchmark, tools and resources (Proposed)

Improve the annotation tool based on DuckTrack, making it more robust and aligned with the accessibility tree
Annotate the steps of doing the task
Build a website for the project
Crawl all resources we explored from the internet, and make it easy to access
Set up ways for community to contribute new examples

Citation

If you find this environment useful, please consider citing our work:

@article{DesktopEnv,
  title={},
  author={},
  journal={arXiv preprint arXiv:xxxx.xxxx},
  year={2024}
}