Merge branch 'main' into feature/arm64-support

.env.example (18 lines changed)
@@ -2,6 +2,7 @@ OPENAI_ENDPOINT=https://api.openai.com/v1
OPENAI_API_KEY=

ANTHROPIC_API_KEY=
ANTHROPIC_ENDPOINT=https://api.anthropic.com

GOOGLE_API_KEY=

@@ -11,6 +12,11 @@ AZURE_OPENAI_API_KEY=
DEEPSEEK_ENDPOINT=https://api.deepseek.com
DEEPSEEK_API_KEY=

MISTRAL_API_KEY=
MISTRAL_ENDPOINT=https://api.mistral.ai/v1

OLLAMA_ENDPOINT=http://localhost:11434

# Set to false to disable anonymized telemetry
ANONYMIZED_TELEMETRY=true

@@ -22,12 +28,16 @@ CHROME_PATH=
CHROME_USER_DATA=
CHROME_DEBUGGING_PORT=9222
CHROME_DEBUGGING_HOST=localhost
CHROME_PERSISTENT_SESSION=false # Set to true to keep browser open between AI tasks
# Set to true to keep browser open between AI tasks
CHROME_PERSISTENT_SESSION=false

# Display settings
RESOLUTION=1920x1080x24 # Format: WIDTHxHEIGHTxDEPTH
RESOLUTION_WIDTH=1920 # Width in pixels
RESOLUTION_HEIGHT=1080 # Height in pixels
# Format: WIDTHxHEIGHTxDEPTH
RESOLUTION=1920x1080x24
# Width in pixels
RESOLUTION_WIDTH=1920
# Height in pixels
RESOLUTION_HEIGHT=1080

# VNC settings
VNC_PASSWORD=youvncpassword
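The display settings above pack the resolution into a single `WIDTHxHEIGHTxDEPTH` string; a minimal sketch of parsing that format (hypothetical helper, not part of the repo):

```python
import os

def parse_resolution(value: str) -> tuple[int, int, int]:
    """Split a WIDTHxHEIGHTxDEPTH string such as '1920x1080x24' into integers."""
    width, height, depth = (int(part) for part in value.split("x"))
    return width, height, depth

# Fall back to the documented default when RESOLUTION is unset.
print(parse_resolution(os.getenv("RESOLUTION", "1920x1080x24")))  # → (1920, 1080, 24)
```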
@@ -3,6 +3,7 @@ FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    wget \
    netcat-traditional \
    gnupg \
    curl \
    unzip \
README.md (107 lines changed)
@@ -11,7 +11,7 @@ This project builds upon the foundation of the [browser-use](https://github.com/

We would like to officially thank [WarmShao](https://github.com/warmshao) for his contribution to this project.

**WebUI:** is built on Gradio and supports a most of `browser-use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent.
**WebUI:** is built on Gradio and supports most of `browser-use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent.

**Expanded LLM Support:** We've integrated support for various Large Language Models (LLMs), including: Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, Ollama etc. And we plan to add support for even more models in the future.

@@ -21,64 +21,93 @@ We would like to officially thank [WarmShao](https://github.com/warmshao) for hi

<video src="https://github.com/user-attachments/assets/56bc7080-f2e3-4367-af22-6bf2245ff6cb" controls="controls">Your browser does not support playing this video!</video>

## Installation Options
## Installation Guide

### Prerequisites
- Python 3.11 or higher
- Git (for cloning the repository)

### Option 1: Local Installation

Read the [quickstart guide](https://docs.browser-use.com/quickstart#prepare-the-environment) or follow the steps below to get started.

> Python 3.11 or higher is required.
#### Step 1: Clone the Repository
```bash
git clone https://github.com/browser-use/web-ui.git
cd web-ui
```

First, we recommend using [uv](https://docs.astral.sh/uv/) to setup the Python environment.
#### Step 2: Set Up Python Environment
We recommend using [uv](https://docs.astral.sh/uv/) for managing the Python environment.

Using uv (recommended):
```bash
uv venv --python 3.11
```

and activate it with:

Activate the virtual environment:
- Windows (Command Prompt):
```cmd
.venv\Scripts\activate
```
- Windows (PowerShell):
```powershell
.\.venv\Scripts\Activate.ps1
```
- macOS/Linux:
```bash
source .venv/bin/activate
```

Install the dependencies:

#### Step 3: Install Dependencies
Install Python packages:
```bash
uv pip install -r requirements.txt
```

Then install playwright:

Install Playwright:
```bash
playwright install
```

#### Step 4: Configure Environment
1. Create a copy of the example environment file:
- Windows (Command Prompt):
```bash
copy .env.example .env
```
- macOS/Linux/Windows (PowerShell):
```bash
cp .env.example .env
```
2. Open `.env` in your preferred text editor and add your API keys and other settings
### Option 2: Docker Installation

1. **Prerequisites:**
   - Docker and Docker Compose installed on your system
   - Git to clone the repository
#### Prerequisites
- Docker and Docker Compose installed
  - [Docker Desktop](https://www.docker.com/products/docker-desktop/) (For Windows/macOS)
  - [Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) (For Linux)

2. **Setup:**
#### Installation Steps
1. Clone the repository:
```bash
# Clone the repository
git clone https://github.com/browser-use/web-ui.git
cd web-ui

# Copy and configure environment variables
cp .env.example .env
# Edit .env with your preferred text editor and add your API keys
```

3. **Run with Docker:**
2. Create and configure environment file:
- Windows (Command Prompt):
```bash
# Build and start the container with default settings (browser closes after AI tasks)
docker compose up --build

# Or run with persistent browser (browser stays open between AI tasks)
CHROME_PERSISTENT_SESSION=true docker compose up --build
copy .env.example .env
```
- macOS/Linux/Windows (PowerShell):
```bash
cp .env.example .env
```
Edit `.env` with your preferred text editor and add your API keys
4. **Access the Application:**
   - WebUI: `http://localhost:7788`
   - VNC Viewer (to see browser interactions): `http://localhost:6080/vnc.html`
@@ -86,16 +115,32 @@ playwright install

Default VNC password is "vncpassword". You can change it by setting the `VNC_PASSWORD` environment variable in your `.env` file.

3. Run with Docker:
```bash
# Build and start the container with default settings (browser closes after AI tasks)
docker compose up --build
```
```bash
# Or run with persistent browser (browser stays open between AI tasks)
CHROME_PERSISTENT_SESSION=true docker compose up --build
```

4. Access the Application:
   - Web Interface: Open `http://localhost:7788` in your browser
   - VNC Viewer (for watching browser interactions): Open `http://localhost:6080/vnc.html`
   - Default VNC password: "youvncpassword"
   - Can be changed by setting `VNC_PASSWORD` in your `.env` file

## Usage

### Local Setup
1. Copy `.env.example` to `.env` and set your environment variables, including API keys for the LLM. `cp .env.example .env`
2. **Run the WebUI:**
1. **Run the WebUI:**
    After completing the installation steps above, start the application:
    ```bash
    python webui.py --ip 127.0.0.1 --port 7788
    ```
4. WebUI options:
2. WebUI options:
   - `--ip`: The IP address to bind the WebUI to. Default is `127.0.0.1`.
   - `--port`: The port to bind the WebUI to. Default is `7788`.
   - `--theme`: The theme for the user interface. Default is `Ocean`.
@@ -109,7 +154,7 @@ playwright install
   - `--dark-mode`: Enables dark mode for the user interface.
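The CLI flags above can be mirrored with `argparse`; this is a minimal, hypothetical sketch of `webui.py`'s option handling, not the actual implementation:

```python
import argparse

parser = argparse.ArgumentParser(description="Launch the browser-use WebUI")
parser.add_argument("--ip", default="127.0.0.1", help="IP address to bind the WebUI to")
parser.add_argument("--port", type=int, default=7788, help="Port to bind the WebUI to")
parser.add_argument("--theme", default="Ocean", help="Theme for the user interface")
parser.add_argument("--dark-mode", action="store_true", help="Enable dark mode")

# Parse an empty list to see the documented defaults; real use passes sys.argv.
args = parser.parse_args([])
print(args.ip, args.port, args.theme)  # → 127.0.0.1 7788 Ocean
```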
3. **Access the WebUI:** Open your web browser and navigate to `http://127.0.0.1:7788`.
4. **Using Your Own Browser (Optional):**
   - Set `CHROME_PATH` to the executable path of your browser and `CHROME_USER_DATA` to the user data directory of your browser.
   - Set `CHROME_PATH` to the executable path of your browser and `CHROME_USER_DATA` to the user data directory of your browser. Leave `CHROME_USER_DATA` empty if you want to use local user data.
   - Windows
     ```env
     CHROME_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
@@ -119,7 +164,7 @@ playwright install
   - Mac
     ```env
     CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
     CHROME_USER_DATA="~/Library/Application Support/Google/Chrome/Profile 1"
     CHROME_USER_DATA="/Users/YourUsername/Library/Application Support/Google/Chrome"
     ```
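One detail worth noting in the Mac example: the old value used a leading `~`, which not every consumer expands automatically; `os.path.expanduser` makes the expansion explicit (illustrative sketch, not repo code):

```python
import os

raw = "~/Library/Application Support/Google/Chrome"
# expanduser replaces the leading ~ with the current user's home directory
expanded = os.path.expanduser(raw)
print(expanded)
```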
   - Close all Chrome windows
   - Open the WebUI in a non-Chrome browser, such as Firefox or Edge. This is important because the persistent browser context will use the Chrome data when running the agent.
@@ -185,6 +230,6 @@ playwright install
```

## Changelog

- [x] **2025/01/26:** Thanks to @vvincent1234. Now browser-use-webui can combine with DeepSeek-r1 to engage in deep thinking!
- [x] **2025/01/10:** Thanks to @casistack. Now we have Docker Setup option and also Support keep browser open between tasks. [Video tutorial demo](https://github.com/browser-use/web-ui/issues/1#issuecomment-2582511750).
- [x] **2025/01/06:** Thanks to @richard-devbot. A New and Well-Designed WebUI is released. [Video tutorial demo](https://github.com/warmshao/browser-use-webui/issues/1#issuecomment-2573393113).

SECURITY.md (new file, 19 lines)
@@ -0,0 +1,19 @@
## Reporting Security Issues

If you believe you have found a security vulnerability in browser-use, please report it through coordinated disclosure.

**Please do not report security vulnerabilities through the repository issues, discussions, or pull requests.**

Instead, please open a new [Github security advisory](https://github.com/browser-use/web-ui/security/advisories/new).

Please include as much of the information listed below as you can to help me better understand and resolve the issue:

* The type of issue (e.g., buffer overflow, SQL injection, or cross-site scripting)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help me triage your report more quickly.
@@ -1,5 +1,6 @@
services:
  browser-use-webui:
    platform: linux/amd64
    build:
      context: .
      dockerfile: ${DOCKERFILE:-Dockerfile}
@@ -1,6 +1,5 @@
browser-use==0.1.19
langchain-google-genai==2.0.8
browser-use==0.1.29
pyperclip==1.9.0
gradio==5.9.1
langchain-ollama==0.2.2
langchain-openai==0.2.14
gradio==5.10.0
json-repair
langchain-mistralai==0.2.4
@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py

@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py
@@ -1,23 +1,18 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/2
# @Author : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_agent.py

import json
import logging
import pdb
import traceback
from typing import Optional, Type
from typing import Optional, Type, List, Dict, Any, Callable
from PIL import Image, ImageDraw, ImageFont
import os
import base64
import io

from browser_use.agent.prompts import SystemPrompt
import platform
from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
from browser_use.agent.service import Agent
from browser_use.agent.views import (
    ActionResult,
    ActionModel,
    AgentHistoryList,
    AgentOutput,
    AgentHistory,
@@ -29,13 +24,14 @@ from browser_use.controller.service import Controller
from browser_use.telemetry.views import (
    AgentEndTelemetryEvent,
    AgentRunTelemetryEvent,
    AgentStepErrorTelemetryEvent,
    AgentStepTelemetryEvent,
)
from browser_use.utils import time_execution_async
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import (
    BaseMessage,
)
from json_repair import repair_json
from src.utils.agent_state import AgentState

from .custom_massage_manager import CustomMassageManager
@@ -58,6 +54,7 @@ class CustomAgent(Agent):
        max_failures: int = 5,
        retry_delay: int = 10,
        system_prompt_class: Type[SystemPrompt] = SystemPrompt,
        agent_prompt_class: Type[AgentMessagePrompt] = AgentMessagePrompt,
        max_input_tokens: int = 128000,
        validate_output: bool = False,
        include_attributes: list[str] = [
@@ -76,6 +73,11 @@ class CustomAgent(Agent):
        max_actions_per_step: int = 10,
        tool_call_in_content: bool = True,
        agent_state: AgentState = None,
        initial_actions: Optional[List[Dict[str, Dict[str, Any]]]] = None,
        # Cloud Callbacks
        register_new_step_callback: Callable[['BrowserState', 'AgentOutput', int], None] | None = None,
        register_done_callback: Callable[['AgentHistoryList'], None] | None = None,
        tool_calling_method: Optional[str] = 'auto',
    ):
        super().__init__(
            task=task,
@@ -94,26 +96,36 @@ class CustomAgent(Agent):
            max_error_length=max_error_length,
            max_actions_per_step=max_actions_per_step,
            tool_call_in_content=tool_call_in_content,
            initial_actions=initial_actions,
            register_new_step_callback=register_new_step_callback,
            register_done_callback=register_done_callback,
            tool_calling_method=tool_calling_method
        )
        if self.llm.model_name in ["deepseek-reasoner"]:
            self.use_function_calling = False
            # TODO: deepseek-reasoner only support 64000 context
        if self.model_name in ["deepseek-reasoner"] or "deepseek-r1" in self.model_name:
            # deepseek-reasoner does not support function calling
            self.use_deepseek_r1 = True
            # deepseek-reasoner only support 64000 context
            self.max_input_tokens = 64000
        else:
            self.use_function_calling = True
            self.use_deepseek_r1 = False

        # record last actions
        self._last_actions = None
        # custom new info
        self.add_infos = add_infos
        # agent_state for Stop
        self.agent_state = agent_state
        self.agent_prompt_class = agent_prompt_class
        self.message_manager = CustomMassageManager(
            llm=self.llm,
            task=self.task,
            action_descriptions=self.controller.registry.get_prompt_description(),
            system_prompt_class=self.system_prompt_class,
            agent_prompt_class=agent_prompt_class,
            max_input_tokens=self.max_input_tokens,
            include_attributes=self.include_attributes,
            max_error_length=self.max_error_length,
            max_actions_per_step=self.max_actions_per_step,
            tool_call_in_content=tool_call_in_content,
            use_function_calling=self.use_function_calling
            max_actions_per_step=self.max_actions_per_step
        )

    def _setup_action_models(self) -> None:
@@ -172,52 +184,35 @@ class CustomAgent(Agent):
    @time_execution_async("--get_next_action")
    async def get_next_action(self, input_messages: list[BaseMessage]) -> AgentOutput:
        """Get next action from LLM based on current state"""
        if self.use_function_calling:
            try:
                structured_llm = self.llm.with_structured_output(self.AgentOutput, include_raw=True)
                response: dict[str, Any] = await structured_llm.ainvoke(input_messages)  # type: ignore
        messages_to_process = (
            self.message_manager.merge_successive_human_messages(input_messages)
            if self.use_deepseek_r1
            else input_messages
        )

                parsed: AgentOutput = response['parsed']
                # cut the number of actions to max_actions_per_step
                parsed.action = parsed.action[: self.max_actions_per_step]
                self._log_response(parsed)
                self.n_steps += 1
        ai_message = self.llm.invoke(messages_to_process)
        self.message_manager._add_message_with_tokens(ai_message)

                return parsed
            except Exception as e:
                # If something goes wrong, try to invoke the LLM again without structured output,
                # and Manually parse the response. Temporarily solution for DeepSeek
                ret = self.llm.invoke(input_messages)
                if isinstance(ret.content, list):
                    parsed_json = json.loads(ret.content[0].replace("```json", "").replace("```", ""))
        if self.use_deepseek_r1:
            logger.info("🤯 Start Deep Thinking: ")
            logger.info(ai_message.reasoning_content)
            logger.info("🤯 End Deep Thinking")

        if isinstance(ai_message.content, list):
            ai_content = ai_message.content[0]
                else:
                    parsed_json = json.loads(ret.content.replace("```json", "").replace("```", ""))
        else:
            ai_content = ai_message.content

        ai_content = ai_content.replace("```json", "").replace("```", "")
        ai_content = repair_json(ai_content)
        parsed_json = json.loads(ai_content)
        parsed: AgentOutput = self.AgentOutput(**parsed_json)

        if parsed is None:
            raise ValueError(f'Could not parse response.')
            logger.debug(ai_message.content)
            raise ValueError('Could not parse response.')

        # cut the number of actions to max_actions_per_step
        parsed.action = parsed.action[: self.max_actions_per_step]
        self._log_response(parsed)
        self.n_steps += 1

        return parsed
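The manual-parse path above strips markdown code fences before handing the text to the JSON parser (the repair step comes from the third-party `json-repair` package). The fence-stripping part in isolation, as a stdlib-only sketch:

```python
import json

FENCE = "`" * 3  # a markdown code fence, built indirectly to avoid literal backticks

def parse_model_json(raw: str) -> dict:
    """Strip the markdown fences an LLM may wrap around its JSON, then parse it."""
    cleaned = raw.replace(FENCE + "json", "").replace(FENCE, "").strip()
    return json.loads(cleaned)

sample = FENCE + 'json\n{"action": []}\n' + FENCE
print(parse_model_json(sample))  # → {'action': []}
```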
        else:
            ret = self.llm.invoke(input_messages)
        if not self.use_function_calling:
            self.message_manager._add_message_with_tokens(ret)
            logger.info(f"🤯 Start Deep Thinking: ")
            logger.info(ret.reasoning_content)
            logger.info(f"🤯 End Deep Thinking")
            if isinstance(ret.content, list):
                parsed_json = json.loads(ret.content[0].replace("```json", "").replace("```", ""))
            else:
                parsed_json = json.loads(ret.content.replace("```json", "").replace("```", ""))
            parsed: AgentOutput = self.AgentOutput(**parsed_json)
            if parsed is None:
                raise ValueError(f'Could not parse response.')

            # cut the number of actions to max_actions_per_step
            # Limit actions to maximum allowed per step
            parsed.action = parsed.action[: self.max_actions_per_step]
            self._log_response(parsed)
            self.n_steps += 1
@@ -234,50 +229,200 @@ class CustomAgent(Agent):

        try:
            state = await self.browser_context.get_state(use_vision=self.use_vision)
            self.message_manager.add_state_message(state, self._last_result, step_info)
            self.message_manager.add_state_message(state, self._last_actions, self._last_result, step_info)
            input_messages = self.message_manager.get_messages()
            try:
                model_output = await self.get_next_action(input_messages)
                if self.register_new_step_callback:
                    self.register_new_step_callback(state, model_output, self.n_steps)
                self.update_step_info(model_output, step_info)
                logger.info(f"🧠 All Memory: \n{step_info.memory}")
                self._save_conversation(input_messages, model_output)
                if self.use_function_calling:
                    self.message_manager._remove_last_state_message()  # we dont want the whole state in the chat history
                    self.message_manager.add_model_output(model_output)
                if self.model_name != "deepseek-reasoner":
                    # remove prev message
                    self.message_manager._remove_state_message_by_index(-1)
            except Exception as e:
                # model call failed, remove last state message from history
                self.message_manager._remove_state_message_by_index(-1)
                raise e

            actions: list[ActionModel] = model_output.action
            result: list[ActionResult] = await self.controller.multi_act(
                model_output.action, self.browser_context
                actions, self.browser_context
            )
            if len(result) != len(model_output.action):
                for ri in range(len(result), len(model_output.action)):
            if len(result) != len(actions):
                # I think something changes, such information should let LLM know
                for ri in range(len(result), len(actions)):
                    result.append(ActionResult(extracted_content=None,
                                               include_in_memory=True,
                        error=f"{model_output.action[ri].model_dump_json(exclude_unset=True)} is Failed to execute. \
                Something new appeared after action {model_output.action[len(result) - 1].model_dump_json(exclude_unset=True)}",
                        error=f"{actions[ri].model_dump_json(exclude_unset=True)} is Failed to execute. \
                Something new appeared after action {actions[len(result) - 1].model_dump_json(exclude_unset=True)}",
                                               is_done=False))
            if len(actions) == 0:
                # TODO: fix no action case
                result = [ActionResult(is_done=True, extracted_content=step_info.memory, include_in_memory=True)]
            self._last_result = result
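The length-mismatch handling above pads the result list so every issued action ends up with an outcome the LLM can see. A simplified stdlib-only sketch of that pattern (plain dicts stand in for `ActionResult`):

```python
def pad_results(actions: list[str], results: list[dict]) -> list[dict]:
    """Give every action an outcome; actions that never ran are recorded as errors."""
    padded = list(results)
    for i in range(len(results), len(actions)):
        padded.append({"error": f"{actions[i]} failed to execute", "is_done": False})
    return padded

# One result came back for three issued actions; the missing two become errors.
print(pad_results(["click", "type", "submit"], [{"ok": True}]))
```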

            self._last_actions = actions
            if len(result) > 0 and result[-1].is_done:
                logger.info(f"📄 Result: {result[-1].extracted_content}")

            self.consecutive_failures = 0

        except Exception as e:
            result = self._handle_step_error(e)
            result = await self._handle_step_error(e)
            self._last_result = result

        finally:
            actions = [a.model_dump(exclude_unset=True) for a in model_output.action] if model_output else []
            self.telemetry.capture(
                AgentStepTelemetryEvent(
                    agent_id=self.agent_id,
                    step=self.n_steps,
                    actions=actions,
                    consecutive_failures=self.consecutive_failures,
                    step_error=[r.error for r in result if r.error] if result else ['No result'],
                )
            )
            if not result:
                return
            for r in result:
                if r.error:
                    self.telemetry.capture(
                        AgentStepErrorTelemetryEvent(
                            agent_id=self.agent_id,
                            error=r.error,
                        )
                    )

            if state:
                self._make_history_item(model_output, state, result)

    async def run(self, max_steps: int = 100) -> AgentHistoryList:
        """Execute the task with maximum number of steps"""
        try:
            self._log_agent_run()

            # Execute initial actions if provided
            if self.initial_actions:
                result = await self.controller.multi_act(self.initial_actions, self.browser_context, check_for_new_elements=False)
                self._last_result = result

            step_info = CustomAgentStepInfo(
                task=self.task,
                add_infos=self.add_infos,
                step_number=1,
                max_steps=max_steps,
                memory="",
                task_progress="",
                future_plans=""
            )

            for step in range(max_steps):
                # 1) Check if stop requested
                if self.agent_state and self.agent_state.is_stop_requested():
                    logger.info("🛑 Stop requested by user")
                    self._create_stop_history_item()
                    break

                # 2) Store last valid state before step
                if self.browser_context and self.agent_state:
                    state = await self.browser_context.get_state(use_vision=self.use_vision)
                    self.agent_state.set_last_valid_state(state)

                if self._too_many_failures():
                    break

                # 3) Do the step
                await self.step(step_info)

                if self.history.is_done():
                    if (
                        self.validate_output and step < max_steps - 1
                    ):  # if last step, we dont need to validate
                        if not await self._validate_output():
                            continue

                    logger.info("✅ Task completed successfully")
                    break
            else:
                logger.info("❌ Failed to complete task in maximum steps")

            return self.history

        finally:
            self.telemetry.capture(
                AgentEndTelemetryEvent(
                    agent_id=self.agent_id,
                    success=self.history.is_done(),
                    steps=self.n_steps,
                    max_steps_reached=self.n_steps >= max_steps,
                    errors=self.history.errors(),
                )
            )

            if not self.injected_browser_context:
                await self.browser_context.close()

            if not self.injected_browser and self.browser:
                await self.browser.close()

            if self.generate_gif:
                output_path: str = 'agent_history.gif'
                if isinstance(self.generate_gif, str):
                    output_path = self.generate_gif

                self.create_history_gif(output_path=output_path)

    def _create_stop_history_item(self):
        """Create a history item for when the agent is stopped."""
        try:
            # Attempt to retrieve the last valid state from agent_state
            state = None
            if self.agent_state:
                last_state = self.agent_state.get_last_valid_state()
                if last_state:
                    # Convert to BrowserStateHistory
                    state = BrowserStateHistory(
                        url=getattr(last_state, 'url', ""),
                        title=getattr(last_state, 'title', ""),
                        tabs=getattr(last_state, 'tabs', []),
                        interacted_element=[None],
                        screenshot=getattr(last_state, 'screenshot', None)
                    )
                else:
                    state = self._create_empty_state()
            else:
                state = self._create_empty_state()

            # Create a final item in the agent history indicating done
            stop_history = AgentHistory(
                model_output=None,
                state=state,
                result=[ActionResult(extracted_content=None, error=None, is_done=True)]
            )
            self.history.history.append(stop_history)

        except Exception as e:
            logger.error(f"Error creating stop history item: {e}")
            # Create empty state as fallback
            state = self._create_empty_state()
            stop_history = AgentHistory(
                model_output=None,
                state=state,
                result=[ActionResult(extracted_content=None, error=None, is_done=True)]
            )
            self.history.history.append(stop_history)

    def _convert_to_browser_state_history(self, browser_state):
        return BrowserStateHistory(
            url=getattr(browser_state, 'url', ""),
            title=getattr(browser_state, 'title', ""),
            tabs=getattr(browser_state, 'tabs', []),
            interacted_element=[None],
            screenshot=getattr(browser_state, 'screenshot', None)
        )
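`_convert_to_browser_state_history` leans on `getattr` defaults so it tolerates partially-populated state objects. The pattern in isolation (plain dict standing in for `BrowserStateHistory`):

```python
class PartialState:
    url = "https://example.com"  # note: no title or tabs attributes defined

def to_history_dict(state) -> dict:
    # getattr's third argument supplies a fallback when the attribute is missing
    return {
        "url": getattr(state, "url", ""),
        "title": getattr(state, "title", ""),
        "tabs": getattr(state, "tabs", []),
    }

print(to_history_dict(PartialState()))  # → {'url': 'https://example.com', 'title': '', 'tabs': []}
```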

    def _create_empty_state(self):
        return BrowserStateHistory(
            url="",
            title="",
            tabs=[],
            interacted_element=[None],
            screenshot=None
        )

    def create_history_gif(
        self,
        output_path: str = 'agent_history.gif',
@@ -310,10 +455,9 @@ class CustomAgent(Agent):

        for font_name in font_options:
            try:
                import platform
                if platform.system() == "Windows":
                if platform.system() == 'Windows':
                    # Need to specify the abs font path on Windows
                    font_name = os.path.join(os.getenv("WIN_FONT_DIR", "C:\\Windows\\Fonts"), font_name + ".ttf")
                    font_name = os.path.join(os.getenv('WIN_FONT_DIR', 'C:\\Windows\\Fonts'), font_name + '.ttf')
                regular_font = ImageFont.truetype(font_name, font_size)
                title_font = ImageFont.truetype(font_name, title_font_size)
                goal_font = ImageFont.truetype(font_name, goal_font_size)
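The Windows branch above builds an absolute `.ttf` path because Pillow needs one there, while bare font names work elsewhere. The path logic alone, as a stdlib sketch:

```python
import os
import platform

def resolve_font_path(font_name: str) -> str:
    """On Windows, return an absolute .ttf path; elsewhere the bare name suffices."""
    if platform.system() == "Windows":
        font_dir = os.getenv("WIN_FONT_DIR", "C:\\Windows\\Fonts")
        return os.path.join(font_dir, font_name + ".ttf")
    return font_name

print(resolve_font_path("Arial"))
```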
|
||||
@@ -391,133 +535,3 @@ class CustomAgent(Agent):
|
||||
logger.info(f'Created GIF at {output_path}')
|
||||
else:
|
||||
logger.warning('No images found in history to create GIF')
|
||||
|
||||
async def run(self, max_steps: int = 100) -> AgentHistoryList:
|
||||
"""Execute the task with maximum number of steps"""
|
||||
try:
|
||||
logger.info(f"🚀 Starting task: {self.task}")
|
||||
|
||||
self.telemetry.capture(
|
||||
AgentRunTelemetryEvent(
|
||||
agent_id=self.agent_id,
|
||||
task=self.task,
|
||||
)
|
||||
)
|
||||
|
||||
step_info = CustomAgentStepInfo(
|
||||
task=self.task,
|
||||
add_infos=self.add_infos,
|
||||
step_number=1,
|
||||
max_steps=max_steps,
|
||||
memory="",
|
||||
task_progress="",
|
||||
future_plans=""
|
||||
)
|
||||
|
||||
for step in range(max_steps):
|
||||
# 1) Check if stop requested
|
||||
if self.agent_state and self.agent_state.is_stop_requested():
|
||||
logger.info("🛑 Stop requested by user")
|
||||
self._create_stop_history_item()
|
||||
break
|
||||
|
||||
# 2) Store last valid state before step
|
||||
if self.browser_context and self.agent_state:
|
||||
state = await self.browser_context.get_state(use_vision=self.use_vision)
|
||||
self.agent_state.set_last_valid_state(state)
|
||||
|
||||
if self._too_many_failures():
|
||||
break
|
||||
|
||||
# 3) Do the step
|
||||
await self.step(step_info)
|
||||
|
||||
if self.history.is_done():
|
||||
if (
|
||||
self.validate_output and step < max_steps - 1
|
||||
                ):  # if last step, we don't need to validate
                    if not await self._validate_output():
                        continue

                logger.info("✅ Task completed successfully")
                break
            else:
                logger.info("❌ Failed to complete task in maximum steps")

            return self.history

        finally:
            self.telemetry.capture(
                AgentEndTelemetryEvent(
                    agent_id=self.agent_id,
                    task=self.task,
                    success=self.history.is_done(),
                    steps=len(self.history.history),
                )
            )
            if not self.injected_browser_context:
                await self.browser_context.close()

            if not self.injected_browser and self.browser:
                await self.browser.close()

            if self.generate_gif:
                self.create_history_gif()

    def _create_stop_history_item(self):
        """Create a history item for when the agent is stopped."""
        try:
            # Attempt to retrieve the last valid state from agent_state
            state = None
            if self.agent_state:
                last_state = self.agent_state.get_last_valid_state()
                if last_state:
                    # Convert to BrowserStateHistory
                    state = BrowserStateHistory(
                        url=getattr(last_state, 'url', ""),
                        title=getattr(last_state, 'title', ""),
                        tabs=getattr(last_state, 'tabs', []),
                        interacted_element=[None],
                        screenshot=getattr(last_state, 'screenshot', None)
                    )
                else:
                    state = self._create_empty_state()
            else:
                state = self._create_empty_state()

            # Create a final item in the agent history indicating done
            stop_history = AgentHistory(
                model_output=None,
                state=state,
                result=[ActionResult(extracted_content=None, error=None, is_done=True)]
            )
            self.history.history.append(stop_history)

        except Exception as e:
            logger.error(f"Error creating stop history item: {e}")
            # Create empty state as fallback
            state = self._create_empty_state()
            stop_history = AgentHistory(
                model_output=None,
                state=state,
                result=[ActionResult(extracted_content=None, error=None, is_done=True)]
            )
            self.history.history.append(stop_history)

    def _convert_to_browser_state_history(self, browser_state):
        return BrowserStateHistory(
            url=getattr(browser_state, 'url', ""),
            title=getattr(browser_state, 'title', ""),
            tabs=getattr(browser_state, 'tabs', []),
            interacted_element=[None],
            screenshot=getattr(browser_state, 'screenshot', None)
        )

    def _create_empty_state(self):
        return BrowserStateHistory(
            url="",
            title="",
            tabs=[],
            interacted_element=[None],
            screenshot=None
        )
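The `_create_stop_history_item` fallback above leans on `getattr` with defaults so a stop can be recorded even from a partial or foreign state object. A minimal standalone sketch of that defensive-conversion pattern; the `StateHistory` dataclass here is a hypothetical stand-in, not the library's `BrowserStateHistory`:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Hypothetical stand-in mirroring the fields used in the diff above.
@dataclass
class StateHistory:
    url: str = ""
    title: str = ""
    tabs: List[Any] = field(default_factory=list)
    interacted_element: List[Optional[Any]] = field(default_factory=lambda: [None])
    screenshot: Optional[str] = None


def to_state_history(browser_state: Any) -> StateHistory:
    """Convert an arbitrary state object defensively: missing attributes
    fall back to safe defaults instead of raising AttributeError."""
    return StateHistory(
        url=getattr(browser_state, "url", ""),
        title=getattr(browser_state, "title", ""),
        tabs=getattr(browser_state, "tabs", []),
        interacted_element=[None],
        screenshot=getattr(browser_state, "screenshot", None),
    )


class PartialState:
    url = "https://example.com"  # deliberately has no title/tabs/screenshot


print(to_state_history(PartialState()).url)  # https://example.com
```

The same shape works for `_convert_to_browser_state_history`; only the input object differs.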
@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_massage_manager.py

from __future__ import annotations

import logging
@@ -11,15 +5,20 @@ from typing import List, Optional, Type

from browser_use.agent.message_manager.service import MessageManager
from browser_use.agent.message_manager.views import MessageHistory
from browser_use.agent.prompts import SystemPrompt
from browser_use.agent.views import ActionResult, AgentStepInfo
from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
from browser_use.agent.views import ActionResult, AgentStepInfo, ActionModel
from browser_use.browser.views import BrowserState
from langchain_core.language_models import BaseChatModel
from langchain_anthropic import ChatAnthropic
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    AIMessage
    ToolMessage
)

from langchain_openai import ChatOpenAI
from ..utils.llm import DeepSeekR1ChatOpenAI
from .custom_prompts import CustomAgentMessagePrompt

logger = logging.getLogger(__name__)

@@ -32,14 +31,14 @@ class CustomMassageManager(MessageManager):
        task: str,
        action_descriptions: str,
        system_prompt_class: Type[SystemPrompt],
        agent_prompt_class: Type[AgentMessagePrompt],
        max_input_tokens: int = 128000,
        estimated_tokens_per_character: int = 3,
        estimated_characters_per_token: int = 3,
        image_tokens: int = 800,
        include_attributes: list[str] = [],
        max_error_length: int = 400,
        max_actions_per_step: int = 10,
        tool_call_in_content: bool = False,
        use_function_calling: bool = True
        message_context: Optional[str] = None
    ):
        super().__init__(
            llm=llm,
@@ -47,72 +46,72 @@ class CustomMassageManager(MessageManager):
            action_descriptions=action_descriptions,
            system_prompt_class=system_prompt_class,
            max_input_tokens=max_input_tokens,
            estimated_tokens_per_character=estimated_tokens_per_character,
            estimated_characters_per_token=estimated_characters_per_token,
            image_tokens=image_tokens,
            include_attributes=include_attributes,
            max_error_length=max_error_length,
            max_actions_per_step=max_actions_per_step,
            tool_call_in_content=tool_call_in_content,
            message_context=message_context
        )
        self.use_function_calling = use_function_calling
        self.agent_prompt_class = agent_prompt_class
        # Custom: Move Task info to state_message
        self.history = MessageHistory()
        self._add_message_with_tokens(self.system_prompt)

        if self.use_function_calling:
            tool_calls = [
                {
                    'name': 'CustomAgentOutput',
                    'args': {
                        'current_state': {
                            'prev_action_evaluation': 'Unknown - No previous actions to evaluate.',
                            'important_contents': '',
                            'completed_contents': '',
                            'thought': 'Now Google is open. Need to type OpenAI to search.',
                            'summary': 'Type OpenAI to search.',
                        },
                        'action': [],
                    },
                    'id': '',
                    'type': 'tool_call',
                }
            ]
            if self.tool_call_in_content:
                # openai throws error if tool_calls are not responded -> move to content
                example_tool_call = AIMessage(
                    content=f'{tool_calls}',
                    tool_calls=[],
                )
            else:
                example_tool_call = AIMessage(
                    content=f'',
                    tool_calls=tool_calls,
                )

            self._add_message_with_tokens(example_tool_call)
        if self.message_context:
            context_message = HumanMessage(content=self.message_context)
            self._add_message_with_tokens(context_message)

    def cut_messages(self):
        """Get current message list, potentially trimmed to max tokens"""
        diff = self.history.total_tokens - self.max_input_tokens
        i = 1  # start from 1 to keep system message in history
        while diff > 0 and i < len(self.history.messages):
            self.history.remove_message(i)
        min_message_len = 2 if self.message_context is not None else 1

        while diff > 0 and len(self.history.messages) > min_message_len:
            self.history.remove_message(min_message_len)  # always remove the oldest message
            diff = self.history.total_tokens - self.max_input_tokens
            i += 1

    def add_state_message(
        self,
        state: BrowserState,
        actions: Optional[List[ActionModel]] = None,
        result: Optional[List[ActionResult]] = None,
        step_info: Optional[AgentStepInfo] = None,
    ) -> None:
        """Add browser state as human message"""
        # otherwise add state message and result to next message (which will not stay in memory)
        state_message = CustomAgentMessagePrompt(
        state_message = self.agent_prompt_class(
            state,
            actions,
            result,
            include_attributes=self.include_attributes,
            max_error_length=self.max_error_length,
            step_info=step_info,
        ).get_user_message()
        self._add_message_with_tokens(state_message)

    def _count_text_tokens(self, text: str) -> int:
        if isinstance(self.llm, (ChatOpenAI, ChatAnthropic, DeepSeekR1ChatOpenAI)):
            try:
                tokens = self.llm.get_num_tokens(text)
            except Exception:
                tokens = (
                    len(text) // self.estimated_characters_per_token
                )  # Rough estimate if no tokenizer available
        else:
            tokens = (
                len(text) // self.estimated_characters_per_token
            )  # Rough estimate if no tokenizer available
        return tokens

    def _remove_state_message_by_index(self, remove_ind=-1) -> None:
        """Remove last state message from history"""
        i = len(self.history.messages) - 1
        remove_cnt = 0
        while i >= 0:
            if isinstance(self.history.messages[i].message, HumanMessage):
                remove_cnt += 1
                if remove_cnt == abs(remove_ind):
                    self.history.remove_message(i)
                    break
            i -= 1
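The new `cut_messages` above switches from an index-walking loop to repeatedly dropping the oldest unprotected message until the token total fits the budget, while `min_message_len` shields the system prompt (and an optional context message). A standalone sketch of that trimming loop, under the assumption that each message carries its own token count; the names here are illustrative, not the library's:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Msg:
    text: str
    tokens: int


def cut_messages(history: List[Msg], max_tokens: int, protected: int = 1) -> List[Msg]:
    """Drop the oldest unprotected messages until the total fits the budget.

    The first `protected` messages (e.g. the system prompt, plus an optional
    context message) are never removed, matching min_message_len in the diff.
    """
    history = list(history)
    total = sum(m.tokens for m in history)
    while total > max_tokens and len(history) > protected:
        removed = history.pop(protected)  # always remove the oldest unprotected message
        total -= removed.tokens
    return history


msgs = [Msg("system", 10), Msg("step1", 50), Msg("step2", 50), Msg("step3", 50)]
print([m.text for m in cut_messages(msgs, max_tokens=120)])  # ['system', 'step2', 'step3']
```

Note the design choice: trimming front-to-back keeps the most recent browser state in context, which is what the agent needs for its next action.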
@@ -1,13 +1,8 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_prompts.py
import pdb
from typing import List, Optional

from browser_use.agent.prompts import SystemPrompt
from browser_use.agent.views import ActionResult
from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
from browser_use.agent.views import ActionResult, ActionModel
from browser_use.browser.views import BrowserState
from langchain_core.messages import HumanMessage, SystemMessage

@@ -19,24 +14,19 @@ class CustomSystemPrompt(SystemPrompt):
        """
        Returns the important rules for the agent.
        """
        text = """
        text = r"""
    1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
       {
         "current_state": {
           "prev_action_evaluation": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not. Note that the result you output must be consistent with the reasoning you output afterwards. If you consider it to be 'Failed,' you should reflect on this during your thought.",
           "important_contents": "Output important contents closely related to user\'s instruction or task on the current page. If there is, please output the contents. If not, please output empty string ''.",
           "important_contents": "Output important contents closely related to user\'s instruction on the current page. If there is, please output the contents. If not, please output empty string ''.",
           "task_progress": "Task Progress is a general summary of the current contents that have been completed. Just summarize the contents that have been actually completed based on the content at current step and the history operations. Please list each completed item individually, such as: 1. Input username. 2. Input Password. 3. Click confirm button. Please return string type not a list.",
           "future_plans": "Based on the user's request and the current state, outline the remaining steps needed to complete the task. This should be a concise list of actions yet to be performed, such as: 1. Select a date. 2. Choose a specific time slot. 3. Confirm booking. Please return string type not a list.",
           "thought": "Think about the requirements that have been completed in previous operations and the requirements that need to be completed in the next one operation. If your output of prev_action_evaluation is 'Failed', please reflect and output your reflection here.",
           "summary": "Please generate a brief natural language description for the operation in next actions based on your Thought."
         },
         "action": [
           {
             "action_name": {
               // action-specific parameters
             }
           },
           // ... more actions in sequence
           * actions in sequences, please refer to **Common action sequences**. Each output action MUST be formatted as: \{action_name\: action_params\}*
         ]
       }

@@ -49,7 +39,6 @@ class CustomSystemPrompt(SystemPrompt):
         {"click_element": {"index": 3}}
       ]
    - Navigation and extraction: [
         {"open_new_tab": {}},
         {"go_to_url": {"url": "https://example.com"}},
         {"extract_page_content": {}}
      ]
@@ -67,7 +56,7 @@ class CustomSystemPrompt(SystemPrompt):
       - Use scroll to find elements you are looking for

    5. TASK COMPLETION:
       - If you think all the requirements of user\'s instruction have been completed and no further operation is required, output the done action to terminate the operation process.
       - If you think all the requirements of user\'s instruction have been completed and no further operation is required, output the **Done** action to terminate the operation process.
       - Don't hallucinate actions.
       - If the task requires specific information - make sure to include everything in the done function. This is what the user will see.
       - If you are running out of steps (current step), think about speeding it up, and ALWAYS use the done action as the last action.
@@ -132,7 +121,7 @@ class CustomSystemPrompt(SystemPrompt):
        AGENT_PROMPT = f"""You are a precise browser automation agent that interacts with websites through structured commands. Your role is to:
    1. Analyze the provided webpage elements and structure
    2. Plan a sequence of actions to accomplish the given task
    3. Respond with valid JSON containing your action sequence and state assessment
    3. Your final result MUST be a valid JSON as the **RESPONSE FORMAT** described, containing your action sequence and state assessment; no extra content is needed to explain it.

    Current date and time: {time_str}

@@ -147,33 +136,54 @@ class CustomSystemPrompt(SystemPrompt):
        return SystemMessage(content=AGENT_PROMPT)


class CustomAgentMessagePrompt:
class CustomAgentMessagePrompt(AgentMessagePrompt):
    def __init__(
        self,
        state: BrowserState,
        actions: Optional[List[ActionModel]] = None,
        result: Optional[List[ActionResult]] = None,
        include_attributes: list[str] = [],
        max_error_length: int = 400,
        step_info: Optional[CustomAgentStepInfo] = None,
    ):
        self.state = state
        self.result = result
        self.max_error_length = max_error_length
        self.include_attributes = include_attributes
        self.step_info = step_info
        super(CustomAgentMessagePrompt, self).__init__(state=state,
                                                       result=result,
                                                       include_attributes=include_attributes,
                                                       max_error_length=max_error_length,
                                                       step_info=step_info
                                                       )
        self.actions = actions

    def get_user_message(self) -> HumanMessage:
        if self.step_info:
            step_info_description = f'Current step: {self.step_info.step_number + 1}/{self.step_info.max_steps}'
            step_info_description = f'Current step: {self.step_info.step_number}/{self.step_info.max_steps}\n'
        else:
            step_info_description = ''

        elements_text = self.state.element_tree.clickable_elements_to_string(include_attributes=self.include_attributes)
        if not elements_text:

        has_content_above = (self.state.pixels_above or 0) > 0
        has_content_below = (self.state.pixels_below or 0) > 0

        if elements_text != '':
            if has_content_above:
                elements_text = (
                    f'... {self.state.pixels_above} pixels above - scroll or extract content to see more ...\n{elements_text}'
                )
            else:
                elements_text = f'[Start of page]\n{elements_text}'
            if has_content_below:
                elements_text = (
                    f'{elements_text}\n... {self.state.pixels_below} pixels below - scroll or extract content to see more ...'
                )
            else:
                elements_text = f'{elements_text}\n[End of page]'
        else:
            elements_text = 'empty page'

        state_description = f"""
{step_info_description}
1. Task: {self.step_info.task}
1. Task: {self.step_info.task}.
2. Hints(Optional):
{self.step_info.add_infos}
3. Memory:
@@ -185,15 +195,20 @@ class CustomAgentMessagePrompt:
{elements_text}
"""

        if self.result:
        if self.actions and self.result:
            state_description += "\n **Previous Actions** \n"
            state_description += f'Previous step: {self.step_info.step_number-1}/{self.step_info.max_steps} \n'
            for i, result in enumerate(self.result):
                action = self.actions[i]
                state_description += f"Previous action {i + 1}/{len(self.result)}: {action.model_dump_json(exclude_unset=True)}\n"
                if result.include_in_memory:
                    if result.extracted_content:
                        state_description += f"\nResult of action {i + 1}/{len(self.result)}: {result.extracted_content}"
                        state_description += f"Result of previous action {i + 1}/{len(self.result)}: {result.extracted_content}\n"
                    if result.error:
                        # only use last max_error_length characters of error
                        error = result.error[-self.max_error_length:]
                        state_description += (
                            f"\nError of action {i + 1}/{len(self.result)}: ...{error}"
                            f"Error of previous action {i + 1}/{len(self.result)}: ...{error}\n"
                        )

        if self.state.screenshot:
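The new `pixels_above`/`pixels_below` branching above frames the element dump with scroll hints so the model knows whether it is seeing the whole page. The same logic extracted into a standalone helper (an illustrative function, not part of the library):

```python
from typing import Optional


def annotate_elements(elements_text: str,
                      pixels_above: Optional[int],
                      pixels_below: Optional[int]) -> str:
    """Frame a clickable-elements dump with scroll hints, mirroring the
    branching added to get_user_message in the diff above."""
    if elements_text == "":
        return "empty page"
    # Prefix: either a scroll hint or an explicit start-of-page marker.
    if (pixels_above or 0) > 0:
        elements_text = f"... {pixels_above} pixels above - scroll or extract content to see more ...\n{elements_text}"
    else:
        elements_text = f"[Start of page]\n{elements_text}"
    # Suffix: same idea for content below the viewport.
    if (pixels_below or 0) > 0:
        elements_text = f"{elements_text}\n... {pixels_below} pixels below - scroll or extract content to see more ..."
    else:
        elements_text = f"{elements_text}\n[End of page]"
    return elements_text


print(annotate_elements("[3]<button>Search</button>", 0, 450))
```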
@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_views.py

from dataclasses import dataclass
from typing import Type

@@ -51,7 +45,7 @@ class CustomAgentOutput(AgentOutput):
    ) -> Type["CustomAgentOutput"]:
        """Extend actions with custom actions"""
        return create_model(
            "AgentOutput",
            "CustomAgentOutput",
            __base__=CustomAgentOutput,
            action=(
                list[custom_actions],
@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/1
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py
@@ -1,30 +0,0 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/6
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: config.py

import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowserPersistenceConfig:
    """Configuration for browser persistence"""

    persistent_session: bool = False
    user_data_dir: Optional[str] = None
    debugging_port: Optional[int] = None
    debugging_host: Optional[str] = None

    @classmethod
    def from_env(cls) -> "BrowserPersistenceConfig":
        """Create config from environment variables"""
        return cls(
            persistent_session=os.getenv("CHROME_PERSISTENT_SESSION", "").lower()
            == "true",
            user_data_dir=os.getenv("CHROME_USER_DATA"),
            debugging_port=int(os.getenv("CHROME_DEBUGGING_PORT", "9222")),
            debugging_host=os.getenv("CHROME_DEBUGGING_HOST", "localhost"),
        )
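The deleted `from_env` above, and the `use_own_browser` fix later in this diff, both hinge on the same pitfall: `os.getenv("X", False)` returns the raw string when the variable is set, and non-empty strings like `"false"` are truthy in Python. A short sketch of the safe pattern (the `env_flag` helper is illustrative, not project code):

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean flag.

    Any casing of "true" enables the flag; an unset variable yields the
    default; any other value is treated as false, matching the
    .lower() == "true" comparison used in the diff above.
    """
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


os.environ["CHROME_PERSISTENT_SESSION"] = "False"
print(env_flag("CHROME_PERSISTENT_SESSION"))  # False

os.environ["CHROME_PERSISTENT_SESSION"] = "true"
print(env_flag("CHROME_PERSISTENT_SESSION"))  # True
```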
@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: browser.py

import asyncio
import pdb

@@ -20,7 +14,6 @@ from browser_use.browser.context import BrowserContext, BrowserContextConfig
from playwright.async_api import BrowserContext as PlaywrightBrowserContext
import logging

from .config import BrowserPersistenceConfig
from .custom_context import CustomBrowserContext

logger = logging.getLogger(__name__)
@@ -33,12 +26,10 @@ class CustomBrowser(Browser):
    ) -> CustomBrowserContext:
        return CustomBrowserContext(config=config, browser=self)

    async def _setup_browser(self, playwright: Playwright) -> PlaywrightBrowser:
    async def _setup_browser_with_instance(self, playwright: Playwright) -> PlaywrightBrowser:
        """Sets up and returns a Playwright Browser instance with anti-detection measures."""
        if self.config.wss_url:
            browser = await playwright.chromium.connect(self.config.wss_url)
            return browser
        elif self.config.chrome_instance_path:
            if not self.config.chrome_instance_path:
                raise ValueError('Chrome instance path is required')
            import subprocess

            import requests
@@ -61,12 +52,12 @@ class CustomBrowser(Browser):
                [
                    self.config.chrome_instance_path,
                    '--remote-debugging-port=9222',
                ],
                ] + self.config.extra_chromium_args,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )

            # Attempt to connect again after starting a new instance
            # try to connect first in case the browser has not started
            for _ in range(10):
                try:
                    response = requests.get('http://localhost:9222/json/version', timeout=2)
@@ -76,6 +67,7 @@ class CustomBrowser(Browser):
                    pass
                await asyncio.sleep(1)

            # Attempt to connect again after starting a new instance
            try:
                browser = await playwright.chromium.connect_over_cdp(
                    endpoint_url='http://localhost:9222',
@@ -87,41 +79,3 @@ class CustomBrowser(Browser):
                raise RuntimeError(
                    'To start Chrome in debug mode, you need to close all existing Chrome instances and try again, otherwise we cannot connect to the instance.'
                )

        else:
            try:
                disable_security_args = []
                if self.config.disable_security:
                    disable_security_args = [
                        '--disable-web-security',
                        '--disable-site-isolation-trials',
                        '--disable-features=IsolateOrigins,site-per-process',
                    ]

                browser = await playwright.chromium.launch(
                    headless=self.config.headless,
                    args=[
                        '--no-sandbox',
                        '--disable-blink-features=AutomationControlled',
                        '--disable-infobars',
                        '--disable-background-timer-throttling',
                        '--disable-popup-blocking',
                        '--disable-backgrounding-occluded-windows',
                        '--disable-renderer-backgrounding',
                        '--disable-window-activation',
                        '--disable-focus-on-load',
                        '--no-first-run',
                        '--no-default-browser-check',
                        '--no-startup-window',
                        '--window-position=0,0',
                        # '--window-size=1280,1000',
                    ]
                    + disable_security_args
                    + self.config.extra_chromium_args,
                    proxy=self.config.proxy,
                )

                return browser
            except Exception as e:
                logger.error(f'Failed to initialize Playwright browser: {str(e)}')
                raise
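The `/json/version` loop above polls the DevTools endpoint up to ten times before giving up, so Chrome has time to start listening before `connect_over_cdp` is attempted. The same retry shape extracted into a generic helper; the probe is injected here so the sketch needs no running browser, and the names are illustrative:

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     attempts: int = 10,
                     delay: float = 1.0) -> bool:
    """Poll `probe` until it succeeds or attempts run out, mirroring the
    /json/version readiness loop in the diff above. `probe` should return
    True once the debug endpoint answers; exceptions count as "not yet"."""
    for _ in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # endpoint not listening yet; retry after the delay
        time.sleep(delay)
    return False


calls = {"n": 0}

def fake_probe() -> bool:
    # Simulates a Chrome that starts answering on the third attempt.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not listening yet")
    return True


print(wait_until_ready(fake_probe, attempts=5, delay=0.01))  # True
```

In the real code the probe would be a `requests.get('http://localhost:9222/json/version', timeout=2)` whose success ends the loop.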
@@ -1,10 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/1
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: context.py

import json
import logging
import os
@@ -14,7 +7,6 @@ from browser_use.browser.context import BrowserContext, BrowserContextConfig
from playwright.async_api import Browser as PlaywrightBrowser
from playwright.async_api import BrowserContext as PlaywrightBrowserContext

from .config import BrowserPersistenceConfig
logger = logging.getLogger(__name__)


@@ -25,72 +17,3 @@ class CustomBrowserContext(BrowserContext):
        config: BrowserContextConfig = BrowserContextConfig()
    ):
        super(CustomBrowserContext, self).__init__(browser=browser, config=config)

    async def _create_context(self, browser: PlaywrightBrowser) -> PlaywrightBrowserContext:
        """Creates a new browser context with anti-detection measures and loads cookies if available."""
        # If we have a context, return it directly

        # Check if we should use existing context for persistence
        if self.browser.config.chrome_instance_path and len(browser.contexts) > 0:
            # Connect to existing Chrome instance instead of creating new one
            context = browser.contexts[0]
        else:
            # Original code for creating new context
            context = await browser.new_context(
                viewport=self.config.browser_window_size,
                no_viewport=False,
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    "(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
                ),
                java_script_enabled=True,
                bypass_csp=self.config.disable_security,
                ignore_https_errors=self.config.disable_security,
                record_video_dir=self.config.save_recording_path,
                record_video_size=self.config.browser_window_size,
            )

        if self.config.trace_path:
            await context.tracing.start(screenshots=True, snapshots=True, sources=True)

        # Load cookies if they exist
        if self.config.cookies_file and os.path.exists(self.config.cookies_file):
            with open(self.config.cookies_file, "r") as f:
                cookies = json.load(f)
                logger.info(
                    f"Loaded {len(cookies)} cookies from {self.config.cookies_file}"
                )
                await context.add_cookies(cookies)

        # Expose anti-detection scripts
        await context.add_init_script(
            """
            // Webdriver property
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined
            });

            // Languages
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en']
            });

            // Plugins
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5]
            });

            // Chrome runtime
            window.chrome = { runtime: {} };

            // Permissions
            const originalQuery = window.navigator.permissions.query;
            window.navigator.permissions.query = (parameters) => (
                parameters.name === 'notifications' ?
                    Promise.resolve({ state: Notification.permission }) :
                    originalQuery(parameters)
            );
            """
        )

        return context
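The cookie branch in `_create_context` above only loads cookies when a path is configured and the file actually exists, so a fresh profile starts clean instead of crashing. That guard-then-load pattern as a standalone sketch (the helper name and file layout are illustrative):

```python
import json
import os
import tempfile


def load_cookies(cookies_file: str) -> list:
    """Load cookies from a JSON file if it exists, mirroring the cookie
    branch in _create_context above; returns [] when the path is unset
    or the file is absent."""
    if cookies_file and os.path.exists(cookies_file):
        with open(cookies_file, "r") as f:
            return json.load(f)
    return []


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "cookies.json")
    with open(path, "w") as f:
        json.dump([{"name": "session", "value": "abc", "domain": "example.com"}], f)
    cookies = load_cookies(path)
    print(len(cookies))  # 1
```

The real code then hands the list to Playwright's `context.add_cookies(...)`, which expects exactly this list-of-dicts shape.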
@@ -1,5 +0,0 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: __init__.py.py
@@ -1,18 +1,16 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_action.py

import pyperclip
from typing import Optional, Type
from pydantic import BaseModel
from browser_use.agent.views import ActionResult
from browser_use.browser.context import BrowserContext
from browser_use.controller.service import Controller
from browser_use.controller.service import Controller, DoneAction


class CustomController(Controller):
    def __init__(self):
        super().__init__()
    def __init__(self, exclude_actions: list[str] = [],
                 output_model: Optional[Type[BaseModel]] = None
                 ):
        super().__init__(exclude_actions=exclude_actions, output_model=output_model)
        self._register_custom_actions()

    def _register_custom_actions(self):
@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/1
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py
@@ -11,13 +11,13 @@ def default_config():
        "max_steps": 100,
        "max_actions_per_step": 10,
        "use_vision": True,
        "tool_call_in_content": True,
        "tool_calling_method": "auto",
        "llm_provider": "openai",
        "llm_model_name": "gpt-4o",
        "llm_temperature": 1.0,
        "llm_base_url": "",
        "llm_api_key": "",
        "use_own_browser": os.getenv("CHROME_PERSISTENT_SESSION", False),
        "use_own_browser": os.getenv("CHROME_PERSISTENT_SESSION", "false").lower() == "true",
        "keep_browser_open": False,
        "headless": False,
        "disable_security": True,
@@ -56,7 +56,7 @@ def save_current_config(*args):
        "max_steps": args[1],
        "max_actions_per_step": args[2],
        "use_vision": args[3],
        "tool_call_in_content": args[4],
        "tool_calling_method": args[4],
        "llm_provider": args[5],
        "llm_model_name": args[6],
        "llm_temperature": args[7],
@@ -86,7 +86,7 @@ def update_ui_from_config(config_file):
        gr.update(value=loaded_config.get("max_steps", 100)),
        gr.update(value=loaded_config.get("max_actions_per_step", 10)),
        gr.update(value=loaded_config.get("use_vision", True)),
        gr.update(value=loaded_config.get("tool_call_in_content", True)),
        gr.update(value=loaded_config.get("tool_calling_method", True)),
        gr.update(value=loaded_config.get("llm_provider", "openai")),
        gr.update(value=loaded_config.get("llm_model_name", "gpt-4o")),
        gr.update(value=loaded_config.get("llm_temperature", 1.0)),
@@ -25,6 +25,7 @@ from langchain_core.outputs import (
    LLMResult,
    RunInfo,
)
from langchain_ollama import ChatOllama
from langchain_core.output_parsers.base import OutputParserLike
from langchain_core.runnables import Runnable, RunnableConfig
from langchain_core.tools import BaseTool
@@ -99,3 +100,37 @@ class DeepSeekR1ChatOpenAI(ChatOpenAI):
        reasoning_content = response.choices[0].message.reasoning_content
        content = response.choices[0].message.content
        return AIMessage(content=content, reasoning_content=reasoning_content)

class DeepSeekR1ChatOllama(ChatOllama):

    async def ainvoke(
        self,
        input: LanguageModelInput,
        config: Optional[RunnableConfig] = None,
        *,
        stop: Optional[list[str]] = None,
        **kwargs: Any,
    ) -> AIMessage:
        org_ai_message = await super().ainvoke(input=input)
        org_content = org_ai_message.content
        reasoning_content = org_content.split("</think>")[0].replace("<think>", "")
        content = org_content.split("</think>")[1]
        if "**JSON Response:**" in content:
            content = content.split("**JSON Response:**")[-1]
        return AIMessage(content=content, reasoning_content=reasoning_content)

    def invoke(
        self,
        input: LanguageModelInput,
        config: Optional[RunnableConfig] = None,
        *,
        stop: Optional[list[str]] = None,
        **kwargs: Any,
    ) -> AIMessage:
        org_ai_message = super().invoke(input=input)
        org_content = org_ai_message.content
        reasoning_content = org_content.split("</think>")[0].replace("<think>", "")
        content = org_content.split("</think>")[1]
        if "**JSON Response:**" in content:
            content = content.split("**JSON Response:**")[-1]
        return AIMessage(content=content, reasoning_content=reasoning_content)
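The new `DeepSeekR1ChatOllama` above splits the raw model output into reasoning and answer around the `</think>` tag, then strips an optional `**JSON Response:**` prefix. That parsing isolated as a pure function, which, like the original, assumes a closing `</think>` tag is present (the function name is illustrative):

```python
from typing import Tuple


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Separate DeepSeek-R1-style output into (reasoning, answer),
    mirroring the <think> handling in DeepSeekR1ChatOllama above.
    Raises IndexError if no </think> tag is present, as the original does."""
    reasoning = raw.split("</think>")[0].replace("<think>", "")
    content = raw.split("</think>")[1]
    # Some completions prefix the final answer; keep only what follows the marker.
    if "**JSON Response:**" in content:
        content = content.split("**JSON Response:**")[-1]
    return reasoning.strip(), content.strip()


raw = "<think>The user wants a JSON plan.</think>**JSON Response:** {\"action\": []}"
reasoning, content = split_reasoning(raw)
print(content)  # {"action": []}
```

Keeping the split in a plain function like this also makes the tag-handling testable without an Ollama server behind it.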
@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/1
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: utils.py
import base64
import os
import time
@@ -11,12 +5,21 @@ from pathlib import Path
from typing import Dict, Optional

from langchain_anthropic import ChatAnthropic
from langchain_mistralai import ChatMistralAI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama
from langchain_openai import AzureChatOpenAI, ChatOpenAI
import gradio as gr

from .llm import DeepSeekR1ChatOpenAI
from .llm import DeepSeekR1ChatOpenAI, DeepSeekR1ChatOllama

PROVIDER_DISPLAY_NAMES = {
    "openai": "OpenAI",
    "azure_openai": "Azure OpenAI",
    "anthropic": "Anthropic",
    "deepseek": "DeepSeek",
    "gemini": "Gemini"
}

def get_llm_model(provider: str, **kwargs):
    """
@@ -25,19 +28,37 @@ def get_llm_model(provider: str, **kwargs):
    :param kwargs:
    :return:
    """
    if provider not in ["ollama"]:
        env_var = "GOOGLE_API_KEY" if provider == "gemini" else f"{provider.upper()}_API_KEY"
        api_key = kwargs.get("api_key", "") or os.getenv(env_var, "")
        if not api_key:
            handle_api_key_error(provider, env_var)
        kwargs["api_key"] = api_key

    if provider == "anthropic":
        if not kwargs.get("base_url", ""):
            base_url = "https://api.anthropic.com"
        else:
            base_url = kwargs.get("base_url")

        return ChatAnthropic(
            model_name=kwargs.get("model_name", "claude-3-5-sonnet-20240620"),
            temperature=kwargs.get("temperature", 0.0),
            base_url=base_url,
            api_key=api_key,
        )
    elif provider == 'mistral':
        if not kwargs.get("base_url", ""):
            base_url = os.getenv("MISTRAL_ENDPOINT", "https://api.mistral.ai/v1")
        else:
            base_url = kwargs.get("base_url")
        if not kwargs.get("api_key", ""):
            api_key = os.getenv("ANTHROPIC_API_KEY", "")
            api_key = os.getenv("MISTRAL_API_KEY", "")
        else:
            api_key = kwargs.get("api_key")

        return ChatAnthropic(
            model_name=kwargs.get("model_name", "claude-3-5-sonnet-20240620"),
        return ChatMistralAI(
            model=kwargs.get("model_name", "mistral-large-latest"),
            temperature=kwargs.get("temperature", 0.0),
            base_url=base_url,
            api_key=api_key,
@@ -48,11 +69,6 @@ def get_llm_model(provider: str, **kwargs):
        else:
            base_url = kwargs.get("base_url")

        if not kwargs.get("api_key", ""):
            api_key = os.getenv("OPENAI_API_KEY", "")
        else:
            api_key = kwargs.get("api_key")

        return ChatOpenAI(
            model=kwargs.get("model_name", "gpt-4o"),
            temperature=kwargs.get("temperature", 0.0),
@@ -65,11 +81,6 @@ def get_llm_model(provider: str, **kwargs):
        else:
            base_url = kwargs.get("base_url")

        if not kwargs.get("api_key", ""):
            api_key = os.getenv("DEEPSEEK_API_KEY", "")
        else:
            api_key = kwargs.get("api_key")

        if kwargs.get("model_name", "deepseek-chat") == "deepseek-reasoner":
            return DeepSeekR1ChatOpenAI(
                model=kwargs.get("model_name", "deepseek-reasoner"),
@@ -85,31 +96,37 @@ def get_llm_model(provider: str, **kwargs):
                api_key=api_key,
            )
    elif provider == "gemini":
        if not kwargs.get("api_key", ""):
            api_key = os.getenv("GOOGLE_API_KEY", "")
        else:
            api_key = kwargs.get("api_key")
        return ChatGoogleGenerativeAI(
            model=kwargs.get("model_name", "gemini-2.0-flash-exp"),
            temperature=kwargs.get("temperature", 0.0),
            google_api_key=api_key,
        )
    elif provider == "ollama":
        if not kwargs.get("base_url", ""):
            base_url = os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434")
        else:
            base_url = kwargs.get("base_url")

        if "deepseek-r1" in kwargs.get("model_name", "qwen2.5:7b"):
            return DeepSeekR1ChatOllama(
                model=kwargs.get("model_name", "deepseek-r1:14b"),
                temperature=kwargs.get("temperature", 0.0),
                num_ctx=kwargs.get("num_ctx", 32000),
                base_url=base_url,
            )
        else:
            return ChatOllama(
                model=kwargs.get("model_name", "qwen2.5:7b"),
|
||||
temperature=kwargs.get("temperature", 0.0),
|
||||
num_ctx=kwargs.get("num_ctx", 32000),
|
||||
base_url=kwargs.get("base_url", "http://localhost:11434"),
|
||||
num_predict=kwargs.get("num_predict", 1024),
|
||||
base_url=base_url,
|
||||
)
|
||||
elif provider == "azure_openai":
|
||||
if not kwargs.get("base_url", ""):
|
||||
base_url = os.getenv("AZURE_OPENAI_ENDPOINT", "")
|
||||
else:
|
||||
base_url = kwargs.get("base_url")
|
||||
if not kwargs.get("api_key", ""):
|
||||
api_key = os.getenv("AZURE_OPENAI_API_KEY", "")
|
||||
else:
|
||||
api_key = kwargs.get("api_key")
|
||||
return AzureChatOpenAI(
|
||||
model=kwargs.get("model_name", "gpt-4o"),
|
||||
temperature=kwargs.get("temperature", 0.0),
|
||||
@@ -123,11 +140,12 @@ def get_llm_model(provider: str, **kwargs):
|
||||
# Predefined model names for common providers
|
||||
model_names = {
|
||||
"anthropic": ["claude-3-5-sonnet-20240620", "claude-3-opus-20240229"],
|
||||
"openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"],
|
||||
"openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo", "o3-mini"],
|
||||
"deepseek": ["deepseek-chat", "deepseek-reasoner"],
|
||||
"gemini": ["gemini-2.0-flash-exp", "gemini-2.0-flash-thinking-exp", "gemini-1.5-flash-latest", "gemini-1.5-flash-8b-latest", "gemini-2.0-flash-thinking-exp-1219" ],
|
||||
"ollama": ["qwen2.5:7b", "llama2:7b"],
|
||||
"azure_openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"]
|
||||
"ollama": ["qwen2.5:7b", "llama2:7b", "deepseek-r1:14b", "deepseek-r1:32b"],
|
||||
"azure_openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"],
|
||||
"mistral": ["pixtral-large-latest", "mistral-large-latest", "mistral-small-latest", "ministral-8b-latest"]
|
||||
}
|
||||
|
||||
# Callback to update the model name dropdown based on the selected provider
|
||||
@@ -147,6 +165,16 @@ def update_model_dropdown(llm_provider, api_key=None, base_url=None):
|
||||
else:
|
||||
return gr.Dropdown(choices=[], value="", interactive=True, allow_custom_value=True)
|
||||
|
||||
def handle_api_key_error(provider: str, env_var: str):
|
||||
"""
|
||||
Handles the missing API key error by raising a gr.Error with a clear message.
|
||||
"""
|
||||
provider_display = PROVIDER_DISPLAY_NAMES.get(provider, provider.upper())
|
||||
raise gr.Error(
|
||||
f"💥 {provider_display} API key not found! 🔑 Please set the "
|
||||
f"`{env_var}` environment variable or provide it in the UI."
|
||||
)
|
||||
|
||||
def encode_image(img_path):
|
||||
if not img_path:
|
||||
return None
|
||||
|
||||
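The centralized key handling introduced above (explicit `api_key` kwarg wins, otherwise the provider's environment variable is consulted, otherwise an error is raised) can be sketched as a standalone function. This is an illustrative reduction, not repo code; the name `resolve_api_key` is ours, and the real function raises `gr.Error` via `handle_api_key_error` instead of `ValueError`.

```python
import os

def resolve_api_key(provider: str, **kwargs) -> str:
    # Gemini is the one provider whose env var does not follow the
    # f"{provider.upper()}_API_KEY" naming convention.
    env_var = "GOOGLE_API_KEY" if provider == "gemini" else f"{provider.upper()}_API_KEY"
    # An explicit kwarg takes precedence over the environment.
    api_key = kwargs.get("api_key", "") or os.getenv(env_var, "")
    if not api_key:
        # The webui raises gr.Error here; ValueError keeps the sketch standalone.
        raise ValueError(f"{provider} API key not found; set {env_var}")
    return api_key
```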
@@ -1,8 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @ProjectName: browser-use-webui
# @FileName: test_browser_use.py
import pdb

from dotenv import load_dotenv
@@ -37,15 +32,27 @@ async def test_browser_use_org():
    #     api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
    # )

    # llm = utils.get_llm_model(
    #     provider="deepseek",
    #     model_name="deepseek-chat",
    #     temperature=0.8
    # )

    llm = utils.get_llm_model(
        provider="deepseek",
        model_name="deepseek-chat",
        temperature=0.8
        provider="ollama", model_name="deepseek-r1:14b", temperature=0.5
    )

    window_w, window_h = 1920, 1080
    use_vision = False
    use_own_browser = False
    if use_own_browser:
        chrome_path = os.getenv("CHROME_PATH", None)
        if chrome_path == "":
            chrome_path = None
    else:
        chrome_path = None

    tool_calling_method = "json_schema"  # set to json_schema when using ollama

    browser = Browser(
        config=BrowserConfig(
@@ -69,7 +76,8 @@ async def test_browser_use_org():
        task="go to google.com and type 'OpenAI' click search and give me the first url",
        llm=llm,
        browser_context=browser_context,
        use_vision=use_vision
        use_vision=use_vision,
        tool_calling_method=tool_calling_method
    )
    history: AgentHistoryList = await agent.run(max_steps=10)

@@ -95,7 +103,7 @@ async def test_browser_use_custom():
    from playwright.async_api import async_playwright

    from src.agent.custom_agent import CustomAgent
    from src.agent.custom_prompts import CustomSystemPrompt
    from src.agent.custom_prompts import CustomSystemPrompt, CustomAgentMessagePrompt
    from src.browser.custom_browser import CustomBrowser
    from src.browser.custom_context import BrowserContextConfig
    from src.controller.custom_controller import CustomController
@@ -103,143 +111,21 @@ async def test_browser_use_custom():
    window_w, window_h = 1920, 1080

    # llm = utils.get_llm_model(
    #     provider="azure_openai",
    #     provider="openai",
    #     model_name="gpt-4o",
    #     temperature=0.8,
    #     base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
    #     api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
    #     base_url=os.getenv("OPENAI_ENDPOINT", ""),
    #     api_key=os.getenv("OPENAI_API_KEY", ""),
    # )

    llm = utils.get_llm_model(
        provider="gemini",
        model_name="gemini-2.0-flash-exp",
        temperature=1.0,
        api_key=os.getenv("GOOGLE_API_KEY", "")
        provider="azure_openai",
        model_name="gpt-4o",
        temperature=0.8,
        base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
        api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
    )

    # llm = utils.get_llm_model(
    #     provider="deepseek",
    #     model_name="deepseek-chat",
    #     temperature=0.8
    # )

    # llm = utils.get_llm_model(
    #     provider="ollama", model_name="qwen2.5:7b", temperature=0.8
    # )

    controller = CustomController()
    use_own_browser = False
    disable_security = True
    use_vision = True  # Set to False when using DeepSeek
    tool_call_in_content = True  # Set to True when using Ollama
    max_actions_per_step = 1
    playwright = None
    browser_context_ = None
    try:
        if use_own_browser:
            playwright = await async_playwright().start()
            chrome_exe = os.getenv("CHROME_PATH", "")
            chrome_use_data = os.getenv("CHROME_USER_DATA", "")
            browser_context_ = await playwright.chromium.launch_persistent_context(
                user_data_dir=chrome_use_data,
                executable_path=chrome_exe,
                no_viewport=False,
                headless=False,  # keep the browser window visible
                user_agent=(
                    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    "(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
                ),
                java_script_enabled=True,
                bypass_csp=disable_security,
                ignore_https_errors=disable_security,
                record_video_dir="./tmp/record_videos",
                record_video_size={"width": window_w, "height": window_h},
            )
        else:
            browser_context_ = None

        browser = CustomBrowser(
            config=BrowserConfig(
                headless=False,
                disable_security=True,
                extra_chromium_args=[f"--window-size={window_w},{window_h}"],
            )
        )

        async with await browser.new_context(
            config=BrowserContextConfig(
                trace_path="./tmp/result_processing",
                save_recording_path="./tmp/record_videos",
                no_viewport=False,
                browser_window_size=BrowserContextWindowSize(
                    width=window_w, height=window_h
                ),
            ),
            context=browser_context_,
        ) as browser_context:
            agent = CustomAgent(
                task="go to google.com and type 'OpenAI' click search and give me the first url",
                add_infos="",  # some hints for llm to complete the task
                llm=llm,
                browser_context=browser_context,
                controller=controller,
                system_prompt_class=CustomSystemPrompt,
                use_vision=use_vision,
                tool_call_in_content=tool_call_in_content,
                max_actions_per_step=max_actions_per_step
            )
            history: AgentHistoryList = await agent.run(max_steps=10)

            print("Final Result:")
            pprint(history.final_result(), indent=4)

            print("\nErrors:")
            pprint(history.errors(), indent=4)

            # e.g. xPaths the model clicked on
            print("\nModel Outputs:")
            pprint(history.model_actions(), indent=4)

            print("\nThoughts:")
            pprint(history.model_thoughts(), indent=4)
            # close browser
    except Exception:
        import traceback

        traceback.print_exc()
    finally:
        # explicitly close the persistent context
        if browser_context_:
            await browser_context_.close()

        # shut down the Playwright instance
        if playwright:
            await playwright.stop()

        await browser.close()


async def test_browser_use_custom_v2():
    from browser_use.browser.context import BrowserContextWindowSize
    from browser_use.browser.browser import BrowserConfig
    from playwright.async_api import async_playwright

    from src.agent.custom_agent import CustomAgent
    from src.agent.custom_prompts import CustomSystemPrompt
    from src.browser.custom_browser import CustomBrowser
    from src.browser.custom_context import BrowserContextConfig
    from src.controller.custom_controller import CustomController

    window_w, window_h = 1920, 1080

    # llm = utils.get_llm_model(
    #     provider="azure_openai",
    #     model_name="gpt-4o",
    #     temperature=0.8,
    #     base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
    #     api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
    # )

    # llm = utils.get_llm_model(
    #     provider="gemini",
    #     model_name="gemini-2.0-flash-exp",
@@ -247,31 +133,45 @@ async def test_browser_use_custom_v2():
    #     api_key=os.getenv("GOOGLE_API_KEY", "")
    # )

    llm = utils.get_llm_model(
        provider="deepseek",
        model_name="deepseek-reasoner",
        temperature=0.8
    )
    # llm = utils.get_llm_model(
    #     provider="deepseek",
    #     model_name="deepseek-reasoner",
    #     temperature=0.8
    # )

    # llm = utils.get_llm_model(
    #     provider="deepseek",
    #     model_name="deepseek-chat",
    #     temperature=0.8
    # )

    # llm = utils.get_llm_model(
    #     provider="ollama", model_name="qwen2.5:7b", temperature=0.5
    # )

    # llm = utils.get_llm_model(
    #     provider="ollama", model_name="deepseek-r1:14b", temperature=0.5
    # )

    controller = CustomController()
    use_own_browser = False
    use_own_browser = True
    disable_security = True
    use_vision = False  # Set to False when using DeepSeek
    tool_call_in_content = True  # Set to True when using Ollama

    max_actions_per_step = 1
    playwright = None
    browser = None
    browser_context = None

    try:
        extra_chromium_args = [f"--window-size={window_w},{window_h}"]
        if use_own_browser:
            chrome_path = os.getenv("CHROME_PATH", None)
            if chrome_path == "":
                chrome_path = None
            chrome_user_data = os.getenv("CHROME_USER_DATA", None)
            if chrome_user_data:
                extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
        else:
            chrome_path = None
        browser = CustomBrowser(
@@ -279,7 +179,7 @@ async def test_browser_use_custom_v2():
                headless=False,
                disable_security=disable_security,
                chrome_instance_path=chrome_path,
                extra_chromium_args=[f"--window-size={window_w},{window_h}"],
                extra_chromium_args=extra_chromium_args,
            )
        )
        browser_context = await browser.new_context(
@@ -293,18 +193,18 @@ async def test_browser_use_custom_v2():
            )
        )
        agent = CustomAgent(
            task="go to google.com and type 'OpenAI' click search and give me the first url",
            task="Search 'Nvidia' and give me the first url",
            add_infos="",  # some hints for llm to complete the task
            llm=llm,
            browser=browser,
            browser_context=browser_context,
            controller=controller,
            system_prompt_class=CustomSystemPrompt,
            agent_prompt_class=CustomAgentMessagePrompt,
            use_vision=use_vision,
            tool_call_in_content=tool_call_in_content,
            max_actions_per_step=max_actions_per_step
        )
        history: AgentHistoryList = await agent.run(max_steps=10)
        history: AgentHistoryList = await agent.run(max_steps=100)

        print("Final Result:")
        pprint(history.final_result(), indent=4)
@@ -336,5 +236,4 @@ async def test_browser_use_custom_v2():

if __name__ == "__main__":
    # asyncio.run(test_browser_use_org())
    # asyncio.run(test_browser_use_custom())
    asyncio.run(test_browser_use_custom_v2())
    asyncio.run(test_browser_use_custom())
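The hunks above replace `launch_persistent_context` with plain Chromium launch arguments: the window size is always passed, and `--user-data-dir` is appended only when the user opts into their own browser and `CHROME_USER_DATA` is set. A standalone sketch of that argument-building logic (the function name `build_chromium_args` is illustrative, not repo code):

```python
import os

def build_chromium_args(window_w: int, window_h: int, use_own_browser: bool) -> list:
    # The window size argument is always present.
    args = [f"--window-size={window_w},{window_h}"]
    if use_own_browser:
        # A user profile directory is only forwarded when configured.
        chrome_user_data = os.getenv("CHROME_USER_DATA", None)
        if chrome_user_data:
            args += [f"--user-data-dir={chrome_user_data}"]
    return args
```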
@@ -1,13 +1,10 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/1
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: test_llm_api.py
import os
import pdb
from dataclasses import dataclass

from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama

load_dotenv()

@@ -15,145 +12,121 @@ import sys

sys.path.append(".")

@dataclass
class LLMConfig:
    provider: str
    model_name: str
    temperature: float = 0.8
    base_url: str = None
    api_key: str = None

def test_openai_model():
    from langchain_core.messages import HumanMessage
def create_message_content(text, image_path=None):
    content = [{"type": "text", "text": text}]

    if image_path:
    from src.utils import utils

    llm = utils.get_llm_model(
        provider="openai",
        model_name="gpt-4o",
        temperature=0.8,
        base_url=os.getenv("OPENAI_ENDPOINT", ""),
        api_key=os.getenv("OPENAI_API_KEY", "")
    )
    image_path = "assets/examples/test.png"
    image_data = utils.encode_image(image_path)
    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe this image"},
            {
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ]
    )
    ai_msg = llm.invoke([message])
    print(ai_msg.content)
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}
        })

    return content

def test_gemini_model():
    # you need to enable your api key first: https://ai.google.dev/palm_docs/oauth_quickstart
    from langchain_core.messages import HumanMessage
def get_env_value(key, provider):
    env_mappings = {
        "openai": {"api_key": "OPENAI_API_KEY", "base_url": "OPENAI_ENDPOINT"},
        "azure_openai": {"api_key": "AZURE_OPENAI_API_KEY", "base_url": "AZURE_OPENAI_ENDPOINT"},
        "gemini": {"api_key": "GOOGLE_API_KEY"},
        "deepseek": {"api_key": "DEEPSEEK_API_KEY", "base_url": "DEEPSEEK_ENDPOINT"},
        "mistral": {"api_key": "MISTRAL_API_KEY", "base_url": "MISTRAL_ENDPOINT"},
    }

    if provider in env_mappings and key in env_mappings[provider]:
        return os.getenv(env_mappings[provider][key], "")
    return ""

def test_llm(config, query, image_path=None, system_message=None):
    from src.utils import utils

    llm = utils.get_llm_model(
        provider="gemini",
        model_name="gemini-2.0-flash-exp",
        temperature=0.8,
        api_key=os.getenv("GOOGLE_API_KEY", "")
    )
    # Special handling for Ollama-based models
    if config.provider == "ollama":
        if "deepseek-r1" in config.model_name:
            from src.utils.llm import DeepSeekR1ChatOllama
            llm = DeepSeekR1ChatOllama(model=config.model_name)
        else:
            llm = ChatOllama(model=config.model_name)

    image_path = "assets/examples/test.png"
    image_data = utils.encode_image(image_path)
    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ]
    )
    ai_msg = llm.invoke([message])
        ai_msg = llm.invoke(query)
    print(ai_msg.content)
        if "deepseek-r1" in config.model_name:
            pdb.set_trace()
        return


def test_azure_openai_model():
    from langchain_core.messages import HumanMessage
    from src.utils import utils

    # For other providers, use the standard configuration
    llm = utils.get_llm_model(
        provider="azure_openai",
        model_name="gpt-4o",
        temperature=0.8,
        base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
        api_key=os.getenv("AZURE_OPENAI_API_KEY", "")
        provider=config.provider,
        model_name=config.model_name,
        temperature=config.temperature,
        base_url=config.base_url or get_env_value("base_url", config.provider),
        api_key=config.api_key or get_env_value("api_key", config.provider)
    )
    image_path = "assets/examples/test.png"
    image_data = utils.encode_image(image_path)
    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ]
    )
    ai_msg = llm.invoke([message])
    print(ai_msg.content)


def test_deepseek_model():
    from langchain_core.messages import HumanMessage
    from src.utils import utils

    llm = utils.get_llm_model(
        provider="deepseek",
        model_name="deepseek-chat",
        temperature=0.8,
        base_url=os.getenv("DEEPSEEK_ENDPOINT", ""),
        api_key=os.getenv("DEEPSEEK_API_KEY", "")
    )
    message = HumanMessage(
        content=[
            {"type": "text", "text": "who are you?"}
        ]
    )
    ai_msg = llm.invoke([message])
    print(ai_msg.content)

def test_deepseek_r1_model():
    from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
    from src.utils import utils

    llm = utils.get_llm_model(
        provider="deepseek",
        model_name="deepseek-reasoner",
        temperature=0.8,
        base_url=os.getenv("DEEPSEEK_ENDPOINT", ""),
        api_key=os.getenv("DEEPSEEK_API_KEY", "")
    )
    # Prepare messages for non-Ollama models
    messages = []
    sys_message = SystemMessage(
        content=[{"type": "text", "text": "you are a helpful AI assistant"}]
    )
    messages.append(sys_message)
    user_message = HumanMessage(
        content=[
            {"type": "text", "text": "9.11 and 9.8, which is greater?"}
        ]
    )
    messages.append(user_message)
    if system_message:
        messages.append(SystemMessage(content=create_message_content(system_message)))
    messages.append(HumanMessage(content=create_message_content(query, image_path)))
    ai_msg = llm.invoke(messages)

    # Handle different response types
    if hasattr(ai_msg, "reasoning_content"):
        print(ai_msg.reasoning_content)
    print(ai_msg.content)

    if config.provider == "deepseek" and "deepseek-reasoner" in config.model_name:
        print(llm.model_name)
        pdb.set_trace()

def test_openai_model():
    config = LLMConfig(provider="openai", model_name="gpt-4o")
    test_llm(config, "Describe this image", "assets/examples/test.png")

def test_gemini_model():
    # Enable your API key first if you haven't: https://ai.google.dev/palm_docs/oauth_quickstart
    config = LLMConfig(provider="gemini", model_name="gemini-2.0-flash-exp")
    test_llm(config, "Describe this image", "assets/examples/test.png")

def test_azure_openai_model():
    config = LLMConfig(provider="azure_openai", model_name="gpt-4o")
    test_llm(config, "Describe this image", "assets/examples/test.png")

def test_deepseek_model():
    config = LLMConfig(provider="deepseek", model_name="deepseek-chat")
    test_llm(config, "Who are you?")

def test_deepseek_r1_model():
    config = LLMConfig(provider="deepseek", model_name="deepseek-reasoner")
    test_llm(config, "Which is greater, 9.11 or 9.8?", system_message="You are a helpful AI assistant.")

def test_ollama_model():
    from langchain_ollama import ChatOllama
    config = LLMConfig(provider="ollama", model_name="qwen2.5:7b")
    test_llm(config, "Sing a ballad of LangChain.")

    llm = ChatOllama(model="qwen2.5:7b")
    ai_msg = llm.invoke("Sing a ballad of LangChain.")
    print(ai_msg.content)
def test_deepseek_r1_ollama_model():
    config = LLMConfig(provider="ollama", model_name="deepseek-r1:14b")
    test_llm(config, "How many 'r's are in the word 'strawberry'?")

def test_mistral_model():
    config = LLMConfig(provider="mistral", model_name="pixtral-large-latest")
    test_llm(config, "Describe this image", "assets/examples/test.png")

if __name__ == '__main__':
if __name__ == "__main__":
    # test_openai_model()
    # test_gemini_model()
    # test_azure_openai_model()
    # test_deepseek_model()
    # test_ollama_model()
    test_deepseek_r1_model()
    # test_deepseek_r1_model()
    # test_deepseek_r1_ollama_model()
    test_mistral_model()
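The refactor above funnels every provider test through `create_message_content`, which builds an OpenAI-style multimodal content list: one text part, plus an optional base64 image part. A runnable sketch of that layout; the base64 payload is passed in directly here so the example does not need `utils.encode_image` or a real image file:

```python
def create_message_content(text, image_data=None):
    # Every message starts with a text part.
    content = [{"type": "text", "text": text}]
    if image_data:
        # An image is appended as a data-URL image_url part.
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        })
    return content
```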
@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time    : 2025/1/2
# @Author  : wenshao
# @Email   : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: test_playwright.py
import pdb
from dotenv import load_dotenv

89
webui.py
@@ -1,10 +1,3 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
# @Time : 2025/1/1
|
||||
# @Author : wenshao
|
||||
# @Email : wenshaoguo1026@gmail.com
|
||||
# @Project : browser-use-webui
|
||||
# @FileName: webui.py
|
||||
|
||||
import pdb
|
||||
import logging
|
||||
|
||||
@@ -28,22 +21,20 @@ from browser_use.browser.context import (
|
||||
BrowserContextConfig,
|
||||
BrowserContextWindowSize,
|
||||
)
|
||||
from langchain_ollama import ChatOllama
|
||||
from playwright.async_api import async_playwright
|
||||
from src.utils.agent_state import AgentState
|
||||
|
||||
from src.utils import utils
|
||||
from src.agent.custom_agent import CustomAgent
|
||||
from src.browser.custom_browser import CustomBrowser
|
||||
from src.agent.custom_prompts import CustomSystemPrompt
|
||||
from src.browser.config import BrowserPersistenceConfig
|
||||
from src.agent.custom_prompts import CustomSystemPrompt, CustomAgentMessagePrompt
|
||||
from src.browser.custom_context import BrowserContextConfig, CustomBrowserContext
|
||||
from src.controller.custom_controller import CustomController
|
||||
from gradio.themes import Citrus, Default, Glass, Monochrome, Ocean, Origin, Soft, Base
|
||||
from src.utils.default_config_settings import default_config, load_config_from_file, save_config_to_file, save_current_config, update_ui_from_config
|
||||
from src.utils.utils import update_model_dropdown, get_latest_files, capture_screenshot
|
||||
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv()
|
||||
|
||||
# Global variables for persistence
|
||||
_global_browser = None
|
||||
@@ -101,7 +92,7 @@ async def run_browser_agent(
|
||||
max_steps,
|
||||
use_vision,
|
||||
max_actions_per_step,
|
||||
tool_call_in_content
|
||||
tool_calling_method
|
||||
):
|
||||
global _global_agent_state
|
||||
_global_agent_state.clear_stop() # Clear any previous stop requests
|
||||
@@ -147,7 +138,7 @@ async def run_browser_agent(
|
||||
max_steps=max_steps,
|
||||
use_vision=use_vision,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
elif agent_type == "custom":
|
||||
final_result, errors, model_actions, model_thoughts, trace_file, history_file = await run_custom_agent(
|
||||
@@ -166,7 +157,7 @@ async def run_browser_agent(
|
||||
max_steps=max_steps,
|
||||
use_vision=use_vision,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
else:
|
||||
raise ValueError(f"Invalid agent type: {agent_type}")
|
||||
@@ -193,6 +184,9 @@ async def run_browser_agent(
|
||||
gr.update(interactive=True) # Re-enable run button
|
||||
)
|
||||
|
||||
except gr.Error:
|
||||
raise
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
@@ -225,7 +219,7 @@ async def run_org_agent(
|
||||
max_steps,
|
||||
use_vision,
|
||||
max_actions_per_step,
|
||||
tool_call_in_content
|
||||
tool_calling_method
|
||||
):
|
||||
try:
|
||||
global _global_browser, _global_browser_context, _global_agent_state
|
||||
@@ -233,10 +227,14 @@ async def run_org_agent(
|
||||
# Clear any previous stop request
|
||||
_global_agent_state.clear_stop()
|
||||
|
||||
extra_chromium_args = [f"--window-size={window_w},{window_h}"]
|
||||
if use_own_browser:
|
||||
chrome_path = os.getenv("CHROME_PATH", None)
|
||||
if chrome_path == "":
|
||||
chrome_path = None
|
||||
chrome_user_data = os.getenv("CHROME_USER_DATA", None)
|
||||
if chrome_user_data:
|
||||
extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
|
||||
else:
|
||||
chrome_path = None
|
||||
|
||||
@@ -246,7 +244,7 @@ async def run_org_agent(
|
||||
headless=headless,
|
||||
disable_security=disable_security,
|
||||
chrome_instance_path=chrome_path,
|
||||
extra_chromium_args=[f"--window-size={window_w},{window_h}"],
|
||||
extra_chromium_args=extra_chromium_args,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -269,7 +267,7 @@ async def run_org_agent(
|
||||
browser=_global_browser,
|
||||
browser_context=_global_browser_context,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
history = await agent.run(max_steps=max_steps)
|
||||
|
||||
@@ -316,7 +314,7 @@ async def run_custom_agent(
|
||||
max_steps,
|
||||
use_vision,
|
||||
max_actions_per_step,
|
||||
tool_call_in_content
|
||||
tool_calling_method
|
||||
):
|
||||
try:
|
||||
global _global_browser, _global_browser_context, _global_agent_state
|
||||
@@ -324,10 +322,14 @@ async def run_custom_agent(
|
||||
# Clear any previous stop request
|
||||
_global_agent_state.clear_stop()
|
||||
|
||||
extra_chromium_args = [f"--window-size={window_w},{window_h}"]
|
||||
if use_own_browser:
|
||||
chrome_path = os.getenv("CHROME_PATH", None)
|
||||
if chrome_path == "":
|
||||
chrome_path = None
|
||||
chrome_user_data = os.getenv("CHROME_USER_DATA", None)
|
||||
if chrome_user_data:
|
||||
extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
|
||||
else:
|
||||
chrome_path = None
|
||||
|
||||
@@ -340,7 +342,7 @@ async def run_custom_agent(
|
||||
headless=headless,
|
||||
disable_security=disable_security,
|
||||
chrome_instance_path=chrome_path,
|
||||
extra_chromium_args=[f"--window-size={window_w},{window_h}"],
|
||||
extra_chromium_args=extra_chromium_args,
|
||||
)
|
||||
)
|
||||
|
||||
@@ -366,9 +368,10 @@ async def run_custom_agent(
|
||||
browser_context=_global_browser_context,
|
||||
controller=controller,
|
||||
system_prompt_class=CustomSystemPrompt,
|
||||
agent_prompt_class=CustomAgentMessagePrompt,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content,
|
||||
agent_state=_global_agent_state
|
||||
agent_state=_global_agent_state,
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
history = await agent.run(max_steps=max_steps)
|
||||
|
||||
@@ -421,7 +424,7 @@ async def run_with_stream(
|
||||
max_steps,
|
||||
use_vision,
|
||||
max_actions_per_step,
|
||||
tool_call_in_content
|
||||
tool_calling_method
|
||||
):
|
||||
global _global_agent_state
|
||||
stream_vw = 80
|
||||
@@ -449,7 +452,7 @@ async def run_with_stream(
|
||||
max_steps=max_steps,
|
||||
use_vision=use_vision,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
# Add HTML content at the start of the result array
|
||||
html_content = f"<h1 style='width:{stream_vw}vw; height:{stream_vh}vh'>Using browser...</h1>"
|
||||
@@ -481,7 +484,7 @@ async def run_with_stream(
|
||||
max_steps=max_steps,
|
||||
use_vision=use_vision,
|
||||
max_actions_per_step=max_actions_per_step,
|
||||
tool_call_in_content=tool_call_in_content
|
||||
tool_calling_method=tool_calling_method
|
||||
)
|
||||
)
|
||||
|
||||
@@ -535,6 +538,12 @@ async def run_with_stream(
|
||||
try:
|
||||
result = await agent_task
|
||||
final_result, errors, model_actions, model_thoughts, latest_videos, trace, history_file, stop_button, run_button = result
|
||||
except gr.Error:
|
||||
final_result = ""
|
||||
model_actions = ""
|
||||
model_thoughts = ""
|
||||
latest_videos = trace = history_file = None
|
||||
|
||||
except Exception as e:
|
||||
errors = f"Agent error: {str(e)}"
|
||||
|
||||
@@ -607,18 +616,8 @@ def create_ui(config, theme_name="Ocean"):
     }
     """

-    js = """
-    function refresh() {
-        const url = new URL(window.location);
-        if (url.searchParams.get('__theme') !== 'dark') {
-            url.searchParams.set('__theme', 'dark');
-            window.location.href = url.href;
-        }
-    }
-    """
-
     with gr.Blocks(
-            title="Browser Use WebUI", theme=theme_map[theme_name], css=css, js=js
+            title="Browser Use WebUI", theme=theme_map[theme_name], css=css
     ) as demo:
         with gr.Row():
             gr.Markdown(
@@ -638,6 +637,7 @@ def create_ui(config, theme_name="Ocean"):
                         value=config['agent_type'],
                         info="Select the type of agent to use",
                     )
+                with gr.Column():
                     max_steps = gr.Slider(
                         minimum=1,
                         maximum=200,
@@ -654,15 +654,20 @@ def create_ui(config, theme_name="Ocean"):
                         label="Max Actions per Step",
                         info="Maximum number of actions the agent will take per step",
                     )
                 with gr.Column():
                     use_vision = gr.Checkbox(
                         label="Use Vision",
                         value=config['use_vision'],
                         info="Enable visual processing capabilities",
                     )
-                    tool_call_in_content = gr.Checkbox(
-                        label="Use Tool Calls in Content",
-                        value=config['tool_call_in_content'],
-                        info="Enable Tool Calls in content",
+                    tool_calling_method = gr.Dropdown(
+                        label="Tool Calling Method",
+                        value=config['tool_calling_method'],
+                        interactive=True,
+                        allow_custom_value=True,  # Allow users to input custom model names
+                        choices=["auto", "json_schema", "function_calling"],
+                        info="Tool Calls Function Name",
+                        visible=False
                     )

         with gr.TabItem("🔧 LLM Configuration", id=2):
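The four-line checkbox becomes a `gr.Dropdown` that is hidden by default (`visible=False`) and, because of `allow_custom_value=True`, accepts values beyond the three listed choices. Outside of Gradio, the accepted-value rule reduces to something like the following (the function name is illustrative, not from the codebase):

```python
CHOICES = ["auto", "json_schema", "function_calling"]

def resolve_tool_calling_method(value, allow_custom_value=True):
    """Accept a listed choice, or any non-empty string when custom
    values are allowed -- mirroring the Dropdown's configuration."""
    if value in CHOICES:
        return value
    if allow_custom_value and isinstance(value, str) and value:
        return value
    raise ValueError(f"unsupported tool calling method: {value!r}")

print(resolve_tool_calling_method("function_calling"))  # -> function_calling
print(resolve_tool_calling_method("my_custom_method"))  # custom value passes through
```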
@@ -813,7 +818,7 @@ def create_ui(config, theme_name="Ocean"):
             fn=update_ui_from_config,
             inputs=[config_file_input],
             outputs=[
-                agent_type, max_steps, max_actions_per_step, use_vision, tool_call_in_content,
+                agent_type, max_steps, max_actions_per_step, use_vision, tool_calling_method,
                 llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
                 use_own_browser, keep_browser_open, headless, disable_security, enable_recording,
                 window_w, window_h, save_recording_path, save_trace_path, save_agent_history_path,
@@ -824,7 +829,7 @@ def create_ui(config, theme_name="Ocean"):
         save_config_button.click(
             fn=save_current_config,
             inputs=[
-                agent_type, max_steps, max_actions_per_step, use_vision, tool_call_in_content,
+                agent_type, max_steps, max_actions_per_step, use_vision, tool_calling_method,
                 llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
                 use_own_browser, keep_browser_open, headless, disable_security,
                 enable_recording, window_w, window_h, save_recording_path, save_trace_path,
@@ -876,7 +881,7 @@ def create_ui(config, theme_name="Ocean"):
                 agent_type, llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
                 use_own_browser, keep_browser_open, headless, disable_security, window_w, window_h,
                 save_recording_path, save_agent_history_path, save_trace_path,  # Include the new path
-                enable_recording, task, add_infos, max_steps, use_vision, max_actions_per_step, tool_call_in_content
+                enable_recording, task, add_infos, max_steps, use_vision, max_actions_per_step, tool_calling_method
             ],
             outputs=[
                 browser_view,  # Browser view
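The final three hunks make the same one-word substitution in the `outputs=` list of the config loader and the `inputs=` lists of the save and run handlers. Gradio binds these lists to callback arguments positionally, so the new component must occupy exactly the slot the old one held in every list. A reduced stand-in for `save_current_config` illustrating that positional contract (field names abbreviated):

```python
# Gradio passes component values positionally, so the field lists on
# both sides of a .click() binding must stay in the same order.
FIELDS = ["agent_type", "max_steps", "max_actions_per_step",
          "use_vision", "tool_calling_method"]

def save_current_config(*values):
    # Relies on values arriving in the same order as FIELDS.
    return dict(zip(FIELDS, values))

config = save_current_config("custom", 100, 10, True, "auto")
print(config["tool_calling_method"])  # -> auto
```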