yadonglu
2024-10-01 17:25:16 +00:00
parent 6cd06a7a86
commit 80572b823a
10 changed files with 423 additions and 54 deletions

.gitignore (new file, 1 line)

@@ -0,0 +1 @@
weights/


@@ -6,6 +6,12 @@
**OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
## Install
```bash
conda create -n "omni" python==3.12
pip install -r requirements.txt
```
## Examples:
We put together a few simple examples in the demo.ipynb.

Binary file not shown.

Binary file not shown.

File diff suppressed because one or more lines are too long

Binary image file changed (not shown): 3.0 MiB before, 59 KiB after.


@@ -1,14 +1,15 @@
torch==2.2.2
easyocr==1.7.1
torchvision==0.17.2
torch
easyocr
torchvision
supervision==0.18.0
openai==1.3.5
transformers==4.40.2
transformers
ultralytics==8.1.24
azure-identity
numpy
opencv-python==4.8.1.78
opencv-python-headless==4.8.0.74
supervision==0.18.0
gradio==4.40.0
opencv-python
opencv-python-headless
gradio
dill
accelerate
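
Because this hunk drops most version pins, the versions that actually get installed will vary by environment. The following is a minimal sketch, not part of the commit, of how one might record what was resolved. The names `PACKAGES`, `installed_versions`, and `report` are hypothetical; the package list is copied from the hunk above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they appear in the requirements hunk above.
PACKAGES = [
    "torch", "easyocr", "torchvision", "supervision", "openai",
    "transformers", "ultralytics", "azure-identity", "numpy",
    "opencv-python", "opencv-python-headless", "gradio", "dill",
    "accelerate",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def report(names=PACKAGES):
    """Print one pinned line per installed package, and a comment for missing ones."""
    for name, ver in installed_versions(names).items():
        if ver is None:
            print(f"# {name}: not installed")
        else:
            print(f"{name}=={ver}")
```

Calling `report()` in the `omni` environment prints lines that could be saved alongside the project to reproduce a known-good setup later.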


@@ -74,11 +74,10 @@ def get_caption_model_processor(model_name="Salesforce/blip2-opt-2.7b", device=N
     return {'model': model.to(device), 'processor': processor}
-def get_yolo_model():
+def get_yolo_model(model_path):
     from ultralytics import YOLO
     # Load the model.
-    # model = YOLO('/home/yadonglu/sandbox/data/yolo/runs/detect/yolov8n_v8_xcyc/weights/best.pt')
-    model = YOLO('/home/yadonglu/sandbox/data/yolo/runs/detect/yolov8n_v8_seq_xcyc_b32_n4_office_ep20/weights/best.pt')
+    model = YOLO(model_path)
     return model
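
With the hard-coded path gone, callers now have to supply the weights location themselves. A minimal sketch of one way to do that follows, assuming the weights live in a local file. The helper `resolve_weights` is hypothetical and not part of the commit, and the path in the usage note is only an illustration, since the diff shows no more than that `weights/` is git-ignored.

```python
from pathlib import Path


def resolve_weights(path):
    """Return the absolute weights path, failing early if the file is missing."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"YOLO weights not found: {resolved}")
    return resolved


def get_yolo_model(model_path):
    # Imported inside the function, as in the diff, so that importing this
    # module does not require ultralytics to be installed.
    from ultralytics import YOLO

    # Validate the path first so a typo gives a clear error message.
    return YOLO(str(resolve_weights(model_path)))
```

A call might then look like `model = get_yolo_model("weights/best.pt")`, where the file name is an assumption. Checking the path before loading keeps the failure message specific to the missing file.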