Mirror of https://github.com/xlang-ai/OSWorld.git (synced 2024-04-29 12:26:03 +03:00)
b3da09a8607dbd66ad26dbce662c45ae8c61084d

DesktopEnv: A Learning Environment for Human-like Computer Task Mastery
Setup guide
- Download OS image
  - Download Kubuntu from https://kubuntu.org/getkubuntu/
  - Download Ubuntu from https://ubuntu.com/download/desktop
  - Download Windows, TODO
  - Download macOS, TODO
 
- Set up virtual machine
  - Create a Host Only Adapter and add it to the network adapter in the settings
  - Create
  - Set up bridge for connecting to VM
  - Set up SSH server on VM: https://averagelinuxuser.com/ssh-into-virtualbox/
    - sudo apt install openssh-server
    - sudo systemctl enable ssh --now
    - sudo ufw disable (disables the firewall; safe on a local network, otherwise use sudo ufw allow ssh)
    - ip a (find the IP address)
    - ssh username@<ip_address>
  - On host, run ssh-copy-id <username>@<ip_address>
  - Install screenshot tool (in VM)
    - sudo apt install imagemagick-6.q16hdri
    - DISPLAY=:0 import -window root screenshot.png
  - Get screenshot
    - scp user@192.168.7.128:~/screenshot.png screenshot.png
    - rm -rf ~/screenshot.png
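
The screenshot steps above could be scripted from the host roughly as follows. This is only a sketch, not part of the project: the function names are hypothetical, it assumes key-based SSH login is already working (the ssh-copy-id step), and it assumes the final rm is meant to run inside the VM, which the guide does not state.

```python
import subprocess
from typing import Callable, List

REMOTE_PATH = "~/screenshot.png"  # same remote path as in the guide


def capture_cmd(user: str, host: str) -> List[str]:
    """ssh command that takes a full-screen screenshot inside the VM."""
    return ["ssh", f"{user}@{host}", f"DISPLAY=:0 import -window root {REMOTE_PATH}"]


def fetch_cmd(user: str, host: str, local_path: str) -> List[str]:
    """scp command that copies the screenshot from the VM to the host."""
    return ["scp", f"{user}@{host}:{REMOTE_PATH}", local_path]


def cleanup_cmd(user: str, host: str) -> List[str]:
    """ssh command that removes the screenshot on the VM (assumed location)."""
    return ["ssh", f"{user}@{host}", f"rm -rf {REMOTE_PATH}"]


def get_screenshot(
    user: str,
    host: str,
    local_path: str = "screenshot.png",
    run: Callable = subprocess.run,
) -> str:
    """Capture, fetch, and clean up; `run` is injectable so it can be dry-run."""
    for cmd in (
        capture_cmd(user, host),
        fetch_cmd(user, host, local_path),
        cleanup_cmd(user, host),
    ):
        run(cmd, check=True)  # raises CalledProcessError if a step fails
    return local_path
```

Calling get_screenshot("user", "192.168.7.128") would run the three commands in order. Passing a different `run` callable makes it possible to inspect the commands without touching a VM.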
 
Road map (Proposed)
- Explore VMware, and whether it can be connected to and controlled through a mouse package
- Explore whether Windows and macOS can be installed
- Build a gym-like Python interface for controlling the VM
- Record actions (mouse movement, clicks, keyboard) for humans to annotate, so that we can replay them
  - This part may conflict with work from Aran Komatsuzaki's team, a.k.a. Duck AI
- Build a simple task, e.g. open a browser, open a website, click on a button, and close the browser
- Set up a pipeline and build agent implementations (zero-shot) for the task
- Start to design which tasks inside DesktopEnv to focus on, and start to wrap up the environment for public release
- Start to annotate the examples for training and testing
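
To make the gym-like interface and the record/replay items more concrete, here is one possible shape for them. Everything in it is an assumption for illustration: the class names, the action vocabulary, and the `controller` object (anything with `execute` and `screenshot` methods) are hypothetical and are not taken from the project's code. Reward and termination are placeholders because no tasks are defined yet.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Action:
    kind: str      # hypothetical vocabulary: "move", "click", or "key"
    x: int = 0
    y: int = 0
    key: str = ""


class DesktopEnvSketch:
    """Gym-style wrapper around a hypothetical VM controller."""

    def __init__(self, controller: Any) -> None:
        self.controller = controller
        self.trajectory: List[Action] = []

    def reset(self) -> Any:
        """Clear the recorded trajectory and return the first observation."""
        self.trajectory = []
        return self.controller.screenshot()

    def step(self, action: Action) -> Tuple[Any, float, bool, Dict]:
        """Execute one action, record it, and return (obs, reward, done, info)."""
        self.controller.execute(action)
        self.trajectory.append(action)
        observation = self.controller.screenshot()
        return observation, 0.0, False, {}  # placeholder reward/done/info

    def save(self) -> str:
        """Serialize the recorded actions to JSON for annotation."""
        return json.dumps([asdict(a) for a in self.trajectory])

    def replay(self, recorded: str) -> Any:
        """Reset, then re-execute a saved trajectory; returns the last observation."""
        observation = self.reset()
        for item in json.loads(recorded):
            observation, _, _, _ = self.step(Action(**item))
        return observation


class FakeController:
    """Stand-in controller so the sketch runs without a VM."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def execute(self, action: Action) -> None:
        self.log.append(action)

    def screenshot(self) -> str:
        return f"frame-{len(self.log)}"
```

A real controller would send mouse and keyboard events to the VM and return an actual screenshot. The fake one only counts actions, which is enough to check that a saved trajectory replays into the same sequence.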
 
Description
OSWorld: A real computer environment for multimodal agents to evaluate open-ended computer tasks
Topics: agent, artificial-intelligence, benchmark, cli, code-generation, gui, language-model, large-action-model, llm, multimodal, natural-language-processing, reinforcement-learning, rpa, vlm
License: Apache-2.0
Size: 45 MiB
Languages: Python 95.6%, JavaScript 4.4%