
DesktopEnv: A Learning Environment for Human-like Computer Task Mastery

Setup guide

  1. Download OS image
    1. Download kubuntu from https://kubuntu.org/getkubuntu/
    2. Download ubuntu from https://ubuntu.com/download/desktop
      1. On a macOS host with Apple Silicon (arm64), use https://cdimage.ubuntu.com/jammy/daily-live/current/jammy-desktop-arm64.iso
    3. Download Windows, TODO
    4. Download MacOS, TODO
  2. Set up virtual machine
    1. Create a Host-Only Adapter and add it as a network adapter in the VM settings
  3. Set up bridge for connecting to VM
    1. Option 1: Install xdotool on VM
    2. Option 2: Install mouse TODO
  4. Set up SSH server on VM: https://averagelinuxuser.com/ssh-into-virtualbox/
    1. sudo apt install openssh-server
    2. sudo systemctl enable ssh --now
    3. sudo ufw disable (disables the firewall; acceptable on a local network, otherwise use sudo ufw allow ssh)
    4. ip a (find the VM's IP address)
    5. ssh <username>@<ip_address>
    6. On host, run ssh-copy-id <username>@<ip_address>
  5. Install screenshot tool (in vm)
    1. sudo apt install imagemagick-6.q16hdri
    2. DISPLAY=:0 import -window root screenshot.png
  6. Get screenshot
    1. scp user@192.168.7.128:~/screenshot.png screenshot.png (on host; replace user@192.168.7.128 with your own <username>@<ip_address>)
    2. rm -rf ~/screenshot.png
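
Steps 3 to 6 can be driven from the host with a short script. The following is a minimal sketch, not the project's own implementation. It assumes key-based SSH login is already working (step 4.6), that xdotool and ImageMagick are installed in the VM, and that step 6.2 is meant to remove the copy left in the VM. The address user@192.168.7.128 is the example from step 6; the function names are illustrative.

```python
import subprocess

VM = "user@192.168.7.128"  # example address from step 6; replace with your own
REMOTE_PATH = "~/screenshot.png"  # path used in steps 5.2 and 6


def screenshot_cmds(vm=VM, local_path="screenshot.png"):
    """Return the commands for steps 5.2, 6.1 and 6.2 as argument lists."""
    return [
        # Take the screenshot inside the VM (step 5.2).
        ["ssh", vm, f"DISPLAY=:0 import -window root {REMOTE_PATH}"],
        # Copy it to the host (step 6.1).
        ["scp", f"{vm}:{REMOTE_PATH}", local_path],
        # Remove the copy in the VM (assumed meaning of step 6.2).
        ["ssh", vm, f"rm -f {REMOTE_PATH}"],
    ]


def click_cmd(x, y, vm=VM):
    """Return an ssh command that moves the VM pointer to (x, y) and left-clicks."""
    remote = f"DISPLAY=:0 xdotool mousemove {int(x)} {int(y)} click 1"
    return ["ssh", vm, remote]


def run_all(cmds):
    """Run each command in order; raises CalledProcessError on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

With the VM running, run_all(screenshot_cmds()) fetches a screenshot and run_all([click_cmd(100, 200)]) clicks at pixel (100, 200). Building the commands as lists keeps them easy to inspect before anything is sent to the VM.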

Road map (Proposed)

  • Explore VMware and whether it can be connected to and controlled through the mouse package
  • Explore whether Windows and macOS can be installed
  • Build a gym-like Python interface for controlling the VM
  • Record actions (mouse movement, clicks, keyboard input) so that humans can annotate them and we can replay them
  • Build a simple task, e.g. open a browser, open a website, click a button, and close the browser
  • Set up a pipeline and build a zero-shot agent implementation for the task
  • Decide which tasks inside DesktopEnv to focus on, and start wrapping up the environment for public release
  • Start annotating examples for training and testing
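
To make the gym-like interface and the record/replay items more concrete, here is one possible shape for them. It is a hypothetical sketch only: the names Action, to_xdotool and DesktopEnvSketch are invented for illustration, the observation and reward are placeholders, and the send callback stands in for whatever transport delivers a command to the VM (for example SSH).

```python
import json
import shlex
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str  # "move", "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""


def to_xdotool(action: Action) -> str:
    """Translate an Action into an xdotool command line to run inside the VM."""
    if action.kind == "move":
        return f"xdotool mousemove {action.x} {action.y}"
    if action.kind == "click":
        return f"xdotool mousemove {action.x} {action.y} click 1"
    if action.kind == "type":
        return "xdotool type " + shlex.quote(action.text)
    raise ValueError(f"unknown action kind: {action.kind}")


class DesktopEnvSketch:
    """Gym-style reset/step loop; `send` delivers a command string to the VM."""

    def __init__(self, send: Callable[[str], None]):
        self.send = send
        self.trajectory: List[Action] = []

    def reset(self):
        self.trajectory = []
        return None  # placeholder; a real environment would return a screenshot

    def step(self, action: Action):
        self.send(to_xdotool(action))
        self.trajectory.append(action)  # record every action for later replay
        observation, reward, done, info = None, 0.0, False, {}
        return observation, reward, done, info

    def save(self) -> str:
        """Serialize the recorded actions as JSON for annotation."""
        return json.dumps([asdict(a) for a in self.trajectory])

    def replay(self, recorded: str):
        """Re-execute a JSON recording produced by save()."""
        for item in json.loads(recorded):
            self.step(Action(**item))
```

Passing send as a callback keeps the environment independent of the transport, so the same loop can be exercised with a plain list's append method before a VM is attached.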