Merge branch 'main' into feature/arm64-support

Sheldon Aristide authored on 2025-02-05 12:36:41 -05:00 · committed by GitHub
26 changed files with 859 additions and 1012 deletions

.env.example

@@ -2,6 +2,7 @@ OPENAI_ENDPOINT=https://api.openai.com/v1
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
+ANTHROPIC_ENDPOINT=https://api.anthropic.com
GOOGLE_API_KEY=
@@ -11,6 +12,11 @@ AZURE_OPENAI_API_KEY=
DEEPSEEK_ENDPOINT=https://api.deepseek.com
DEEPSEEK_API_KEY=
+MISTRAL_API_KEY=
+MISTRAL_ENDPOINT=https://api.mistral.ai/v1
+OLLAMA_ENDPOINT=http://localhost:11434
# Set to false to disable anonymized telemetry
ANONYMIZED_TELEMETRY=true
@@ -22,12 +28,16 @@ CHROME_PATH=
CHROME_USER_DATA=
CHROME_DEBUGGING_PORT=9222
CHROME_DEBUGGING_HOST=localhost
-CHROME_PERSISTENT_SESSION=false # Set to true to keep browser open between AI tasks
+# Set to true to keep browser open between AI tasks
+CHROME_PERSISTENT_SESSION=false
# Display settings
-RESOLUTION=1920x1080x24 # Format: WIDTHxHEIGHTxDEPTH
-RESOLUTION_WIDTH=1920 # Width in pixels
-RESOLUTION_HEIGHT=1080 # Height in pixels
+# Format: WIDTHxHEIGHTxDEPTH
+RESOLUTION=1920x1080x24
+# Width in pixels
+RESOLUTION_WIDTH=1920
+# Height in pixels
+RESOLUTION_HEIGHT=1080
# VNC settings
VNC_PASSWORD=youvncpassword

Dockerfile

@@ -3,6 +3,7 @@ FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    wget \
+   netcat-traditional \
    gnupg \
    curl \
    unzip \

README.md

@@ -11,7 +11,7 @@ This project builds upon the foundation of the [browser-use](https://github.com/
We would like to officially thank [WarmShao](https://github.com/warmshao) for his contribution to this project.

-**WebUI:** is built on Gradio and supports a most of `browser-use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent.
+**WebUI:** is built on Gradio and supports most of `browser-use` functionalities. This UI is designed to be user-friendly and enables easy interaction with the browser agent.

**Expanded LLM Support:** We've integrated support for various Large Language Models (LLMs), including: Gemini, OpenAI, Azure OpenAI, Anthropic, DeepSeek, Ollama etc. And we plan to add support for even more models in the future.

@@ -21,64 +21,93 @@ We would like to officially thank [WarmShao](https://github.com/warmshao) for hi
<video src="https://github.com/user-attachments/assets/56bc7080-f2e3-4367-af22-6bf2245ff6cb" controls="controls">Your browser does not support playing this video!</video>

-## Installation Options
+## Installation Guide
+### Prerequisites
+- Python 3.11 or higher
+- Git (for cloning the repository)

### Option 1: Local Installation

Read the [quickstart guide](https://docs.browser-use.com/quickstart#prepare-the-environment) or follow the steps below to get started.

-> Python 3.11 or higher is required.
+#### Step 1: Clone the Repository
+```bash
+git clone https://github.com/browser-use/web-ui.git
+cd web-ui
+```

-First, we recommend using [uv](https://docs.astral.sh/uv/) to setup the Python environment.
+#### Step 2: Set Up Python Environment
+We recommend using [uv](https://docs.astral.sh/uv/) for managing the Python environment.
+Using uv (recommended):
```bash
uv venv --python 3.11
```

-and activate it with:
+Activate the virtual environment:
+- Windows (Command Prompt):
+```cmd
+.venv\Scripts\activate
+```
+- Windows (PowerShell):
+```powershell
+.\.venv\Scripts\Activate.ps1
+```
+- macOS/Linux:
```bash
source .venv/bin/activate
```

-Install the dependencies:
+#### Step 3: Install Dependencies
+Install Python packages:
```bash
uv pip install -r requirements.txt
```

-Then install playwright:
+Install Playwright:
```bash
playwright install
```

+#### Step 4: Configure Environment
+1. Create a copy of the example environment file:
+- Windows (Command Prompt):
+```bash
+copy .env.example .env
+```
+- macOS/Linux/Windows (PowerShell):
+```bash
+cp .env.example .env
+```
+2. Open `.env` in your preferred text editor and add your API keys and other settings
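For illustration only, a minimally filled-in `.env` might contain entries like these (all values are placeholders, not real keys; the variable names come from `.env.example` above):

```env
# placeholder values - substitute your own credentials
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=your-anthropic-key
DEEPSEEK_API_KEY=your-deepseek-key
# local Ollama needs no key, only the endpoint
OLLAMA_ENDPOINT=http://localhost:11434
```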
### Option 2: Docker Installation

-1. **Prerequisites:**
-- Docker and Docker Compose installed on your system
-- Git to clone the repository
-2. **Setup:**
-```bash
-# Clone the repository
-git clone https://github.com/browser-use/web-ui.git
-cd web-ui
-# Copy and configure environment variables
-cp .env.example .env
-# Edit .env with your preferred text editor and add your API keys
-```
-3. **Run with Docker:**
-```bash
-# Build and start the container with default settings (browser closes after AI tasks)
-docker compose up --build
-# Or run with persistent browser (browser stays open between AI tasks)
-CHROME_PERSISTENT_SESSION=true docker compose up --build
-```
+#### Prerequisites
+- Docker and Docker Compose installed
+- [Docker Desktop](https://www.docker.com/products/docker-desktop/) (For Windows/macOS)
+- [Docker Engine](https://docs.docker.com/engine/install/) and [Docker Compose](https://docs.docker.com/compose/install/) (For Linux)
+#### Installation Steps
+1. Clone the repository:
+```bash
+git clone https://github.com/browser-use/web-ui.git
+cd web-ui
+```
+2. Create and configure environment file:
+- Windows (Command Prompt):
+```bash
+copy .env.example .env
+```
+- macOS/Linux/Windows (PowerShell):
+```bash
+cp .env.example .env
+```
+Edit `.env` with your preferred text editor and add your API keys
4. **Access the Application:**
- WebUI: `http://localhost:7788`
- VNC Viewer (to see browser interactions): `http://localhost:6080/vnc.html`
@@ -86,16 +115,32 @@ playwright install
Default VNC password is "vncpassword". You can change it by setting the `VNC_PASSWORD` environment variable in your `.env` file.
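For example, the password can be overridden in `.env` (the value below is only a placeholder):

```env
# choose your own value; this one is just an example
VNC_PASSWORD=my-strong-vnc-password
```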
+3. Run with Docker:
+```bash
+# Build and start the container with default settings (browser closes after AI tasks)
+docker compose up --build
+```
+```bash
+# Or run with persistent browser (browser stays open between AI tasks)
+CHROME_PERSISTENT_SESSION=true docker compose up --build
+```
+4. Access the Application:
+- Web Interface: Open `http://localhost:7788` in your browser
+- VNC Viewer (for watching browser interactions): Open `http://localhost:6080/vnc.html`
+- Default VNC password: "youvncpassword"
+- Can be changed by setting `VNC_PASSWORD` in your `.env` file

## Usage

### Local Setup
-1. Copy `.env.example` to `.env` and set your environment variables, including API keys for the LLM. `cp .env.example .env`
-2. **Run the WebUI:**
+1. **Run the WebUI:**
+After completing the installation steps above, start the application:
```bash
python webui.py --ip 127.0.0.1 --port 7788
```
-4. WebUI options:
+2. WebUI options:
- `--ip`: The IP address to bind the WebUI to. Default is `127.0.0.1`.
- `--port`: The port to bind the WebUI to. Default is `7788`.
- `--theme`: The theme for the user interface. Default is `Ocean`.
@@ -109,7 +154,7 @@ playwright install
- `--dark-mode`: Enables dark mode for the user interface.
3. **Access the WebUI:** Open your web browser and navigate to `http://127.0.0.1:7788`.
4. **Using Your Own Browser(Optional):**
-- Set `CHROME_PATH` to the executable path of your browser and `CHROME_USER_DATA` to the user data directory of your browser.
+- Set `CHROME_PATH` to the executable path of your browser and `CHROME_USER_DATA` to the user data directory of your browser. Leave `CHROME_USER_DATA` empty if you want to use local user data.
- Windows
```env
CHROME_PATH="C:\Program Files\Google\Chrome\Application\chrome.exe"
@@ -119,7 +164,7 @@ playwright install
- Mac
```env
CHROME_PATH="/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"
-CHROME_USER_DATA="~/Library/Application Support/Google/Chrome/Profile 1"
+CHROME_USER_DATA="/Users/YourUsername/Library/Application Support/Google/Chrome"
```
- Close all Chrome windows
- Open the WebUI in a non-Chrome browser, such as Firefox or Edge. This is important because the persistent browser context will use the Chrome data when running the agent.
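As an illustration of the WebUI options listed above, a launch command combining them might look like this (the values are arbitrary examples, not recommendations):

```bash
# bind to all interfaces on the default port, using the Ocean theme with dark mode enabled
python webui.py --ip 0.0.0.0 --port 7788 --theme Ocean --dark-mode
```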
@@ -185,6 +230,6 @@ playwright install
```
## Changelog
+- [x] **2025/01/26:** Thanks to @vvincent1234. Now browser-use-webui can combine with DeepSeek-r1 to engage in deep thinking!
- [x] **2025/01/10:** Thanks to @casistack. Now we have Docker Setup option and also Support keep browser open between tasks.[Video tutorial demo](https://github.com/browser-use/web-ui/issues/1#issuecomment-2582511750).
- [x] **2025/01/06:** Thanks to @richard-devbot. A New and Well-Designed WebUI is released. [Video tutorial demo](https://github.com/warmshao/browser-use-webui/issues/1#issuecomment-2573393113).

SECURITY.md (new file)

@@ -0,0 +1,19 @@
## Reporting Security Issues
If you believe you have found a security vulnerability in browser-use, please report it through coordinated disclosure.
**Please do not report security vulnerabilities through the repository issues, discussions, or pull requests.**
Instead, please open a new [Github security advisory](https://github.com/browser-use/web-ui/security/advisories/new).
Please include as much of the information listed below as you can to help me better understand and resolve the issue:
* The type of issue (e.g., buffer overflow, SQL injection, or cross-site scripting)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue
This information will help me triage your report more quickly.

docker-compose.yml

@@ -1,5 +1,6 @@
services:
  browser-use-webui:
+    platform: linux/amd64
    build:
      context: .
      dockerfile: ${DOCKERFILE:-Dockerfile}
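Since the service resolves its build file from the `DOCKERFILE` variable (`${DOCKERFILE:-Dockerfile}`), a different Dockerfile can be selected at build time without editing the compose file. A sketch, assuming a hypothetical `Dockerfile.arm64` exists in the repository root:

```bash
# Dockerfile.arm64 is a hypothetical file name used only for illustration
DOCKERFILE=Dockerfile.arm64 docker compose up --build
```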

requirements.txt

@@ -1,6 +1,5 @@
-browser-use==0.1.19
+browser-use==0.1.29
-langchain-google-genai==2.0.8
pyperclip==1.9.0
-gradio==5.9.1
+gradio==5.10.0
-langchain-ollama==0.2.2
-langchain-openai==0.2.14
+json-repair
+langchain-mistralai==0.2.4

__init__.py (deleted)

@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py

__init__.py (deleted)

@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py

custom_agent.py

@@ -1,23 +1,18 @@
-# -*- coding: utf-8 -*-
-# @Time : 2025/1/2
-# @Author : wenshao
-# @ProjectName: browser-use-webui
-# @FileName: custom_agent.py
import json
import logging
import pdb
import traceback
-from typing import Optional, Type
+from typing import Optional, Type, List, Dict, Any, Callable
from PIL import Image, ImageDraw, ImageFont
import os
import base64
import io
+import platform
-from browser_use.agent.prompts import SystemPrompt
+from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
from browser_use.agent.service import Agent
from browser_use.agent.views import (
ActionResult,
+ActionModel,
AgentHistoryList,
AgentOutput,
AgentHistory,
@@ -27,15 +22,16 @@ from browser_use.browser.context import BrowserContext
from browser_use.browser.views import BrowserStateHistory
from browser_use.controller.service import Controller
from browser_use.telemetry.views import (
AgentEndTelemetryEvent,
AgentRunTelemetryEvent,
-AgentStepErrorTelemetryEvent,
+AgentStepTelemetryEvent,
)
from browser_use.utils import time_execution_async
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import (
BaseMessage,
)
+from json_repair import repair_json
from src.utils.agent_state import AgentState
from .custom_massage_manager import CustomMassageManager
@@ -58,6 +54,7 @@ class CustomAgent(Agent):
max_failures: int = 5,
retry_delay: int = 10,
system_prompt_class: Type[SystemPrompt] = SystemPrompt,
+agent_prompt_class: Type[AgentMessagePrompt] = AgentMessagePrompt,
max_input_tokens: int = 128000,
validate_output: bool = False,
include_attributes: list[str] = [
@@ -76,6 +73,11 @@ class CustomAgent(Agent):
max_actions_per_step: int = 10,
tool_call_in_content: bool = True,
agent_state: AgentState = None,
+initial_actions: Optional[List[Dict[str, Dict[str, Any]]]] = None,
+# Cloud Callbacks
+register_new_step_callback: Callable[['BrowserState', 'AgentOutput', int], None] | None = None,
+register_done_callback: Callable[['AgentHistoryList'], None] | None = None,
+tool_calling_method: Optional[str] = 'auto',
):
super().__init__(
task=task,
@@ -94,26 +96,36 @@ class CustomAgent(Agent):
max_error_length=max_error_length,
max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content,
+initial_actions=initial_actions,
+register_new_step_callback=register_new_step_callback,
+register_done_callback=register_done_callback,
+tool_calling_method=tool_calling_method
)
-if self.llm.model_name in ["deepseek-reasoner"]:
-self.use_function_calling = False
-# TODO: deepseek-reasoner only support 64000 context
+if self.model_name in ["deepseek-reasoner"] or "deepseek-r1" in self.model_name:
+# deepseek-reasoner does not support function calling
+self.use_deepseek_r1 = True
+# deepseek-reasoner only support 64000 context
self.max_input_tokens = 64000
else:
-self.use_function_calling = True
+self.use_deepseek_r1 = False
+# record last actions
+self._last_actions = None
+# custom new info
self.add_infos = add_infos
+# agent_state for Stop
self.agent_state = agent_state
+self.agent_prompt_class = agent_prompt_class
self.message_manager = CustomMassageManager(
llm=self.llm,
task=self.task,
action_descriptions=self.controller.registry.get_prompt_description(),
system_prompt_class=self.system_prompt_class,
+agent_prompt_class=agent_prompt_class,
max_input_tokens=self.max_input_tokens,
include_attributes=self.include_attributes,
max_error_length=self.max_error_length,
-max_actions_per_step=self.max_actions_per_step,
-tool_call_in_content=tool_call_in_content,
-use_function_calling=self.use_function_calling
+max_actions_per_step=self.max_actions_per_step
)
def _setup_action_models(self) -> None:
@@ -172,57 +184,40 @@ class CustomAgent(Agent):
@time_execution_async("--get_next_action")
async def get_next_action(self, input_messages: list[BaseMessage]) -> AgentOutput:
"""Get next action from LLM based on current state"""
-if self.use_function_calling:
-try:
-structured_llm = self.llm.with_structured_output(self.AgentOutput, include_raw=True)
-response: dict[str, Any] = await structured_llm.ainvoke(input_messages) # type: ignore
-parsed: AgentOutput = response['parsed']
-# cut the number of actions to max_actions_per_step
-parsed.action = parsed.action[: self.max_actions_per_step]
-self._log_response(parsed)
-self.n_steps += 1
-return parsed
-except Exception as e:
-# If something goes wrong, try to invoke the LLM again without structured output,
-# and Manually parse the response. Temporarily solution for DeepSeek
-ret = self.llm.invoke(input_messages)
-if isinstance(ret.content, list):
-parsed_json = json.loads(ret.content[0].replace("```json", "").replace("```", ""))
-else:
-parsed_json = json.loads(ret.content.replace("```json", "").replace("```", ""))
-parsed: AgentOutput = self.AgentOutput(**parsed_json)
-if parsed is None:
-raise ValueError(f'Could not parse response.')
-# cut the number of actions to max_actions_per_step
-parsed.action = parsed.action[: self.max_actions_per_step]
-self._log_response(parsed)
-self.n_steps += 1
-return parsed
-else:
-ret = self.llm.invoke(input_messages)
-if not self.use_function_calling:
-self.message_manager._add_message_with_tokens(ret)
-logger.info(f"🤯 Start Deep Thinking: ")
-logger.info(ret.reasoning_content)
-logger.info(f"🤯 End Deep Thinking")
-if isinstance(ret.content, list):
-parsed_json = json.loads(ret.content[0].replace("```json", "").replace("```", ""))
-else:
-parsed_json = json.loads(ret.content.replace("```json", "").replace("```", ""))
-parsed: AgentOutput = self.AgentOutput(**parsed_json)
-if parsed is None:
-raise ValueError(f'Could not parse response.')
-# cut the number of actions to max_actions_per_step
-parsed.action = parsed.action[: self.max_actions_per_step]
-self._log_response(parsed)
-self.n_steps += 1
-return parsed
+messages_to_process = (
+self.message_manager.merge_successive_human_messages(input_messages)
+if self.use_deepseek_r1
+else input_messages
+)
+ai_message = self.llm.invoke(messages_to_process)
+self.message_manager._add_message_with_tokens(ai_message)
+if self.use_deepseek_r1:
+logger.info("🤯 Start Deep Thinking: ")
+logger.info(ai_message.reasoning_content)
+logger.info("🤯 End Deep Thinking")
+if isinstance(ai_message.content, list):
+ai_content = ai_message.content[0]
+else:
+ai_content = ai_message.content
+ai_content = ai_content.replace("```json", "").replace("```", "")
+ai_content = repair_json(ai_content)
+parsed_json = json.loads(ai_content)
+parsed: AgentOutput = self.AgentOutput(**parsed_json)
+if parsed is None:
+logger.debug(ai_message.content)
+raise ValueError('Could not parse response.')
+# Limit actions to maximum allowed per step
+parsed.action = parsed.action[: self.max_actions_per_step]
+self._log_response(parsed)
+self.n_steps += 1
+return parsed
@time_execution_async("--step")
async def step(self, step_info: Optional[CustomAgentStepInfo] = None) -> None:
@@ -234,62 +229,212 @@ class CustomAgent(Agent):
try:
state = await self.browser_context.get_state(use_vision=self.use_vision)
-self.message_manager.add_state_message(state, self._last_result, step_info)
+self.message_manager.add_state_message(state, self._last_actions, self._last_result, step_info)
input_messages = self.message_manager.get_messages()
-model_output = await self.get_next_action(input_messages)
-self.update_step_info(model_output, step_info)
-logger.info(f"🧠 All Memory: \n{step_info.memory}")
-self._save_conversation(input_messages, model_output)
-if self.use_function_calling:
-self.message_manager._remove_last_state_message() # we dont want the whole state in the chat history
-self.message_manager.add_model_output(model_output)
+try:
+model_output = await self.get_next_action(input_messages)
+if self.register_new_step_callback:
+self.register_new_step_callback(state, model_output, self.n_steps)
+self.update_step_info(model_output, step_info)
+logger.info(f"🧠 All Memory: \n{step_info.memory}")
+self._save_conversation(input_messages, model_output)
+if self.model_name != "deepseek-reasoner":
+# remove prev message
+self.message_manager._remove_state_message_by_index(-1)
+except Exception as e:
+# model call failed, remove last state message from history
+self.message_manager._remove_state_message_by_index(-1)
+raise e
+actions: list[ActionModel] = model_output.action
result: list[ActionResult] = await self.controller.multi_act(
-model_output.action, self.browser_context
+actions, self.browser_context
)
-if len(result) != len(model_output.action):
-for ri in range(len(result), len(model_output.action)):
+if len(result) != len(actions):
+# I think something changes, such information should let LLM know
+for ri in range(len(result), len(actions)):
result.append(ActionResult(extracted_content=None,
include_in_memory=True,
-error=f"{model_output.action[ri].model_dump_json(exclude_unset=True)} is Failed to execute. \
-Something new appeared after action {model_output.action[len(result) - 1].model_dump_json(exclude_unset=True)}",
+error=f"{actions[ri].model_dump_json(exclude_unset=True)} is Failed to execute. \
+Something new appeared after action {actions[len(result) - 1].model_dump_json(exclude_unset=True)}",
is_done=False))
+if len(actions) == 0:
+# TODO: fix no action case
+result = [ActionResult(is_done=True, extracted_content=step_info.memory, include_in_memory=True)]
self._last_result = result
+self._last_actions = actions
if len(result) > 0 and result[-1].is_done:
logger.info(f"📄 Result: {result[-1].extracted_content}")
self.consecutive_failures = 0
except Exception as e:
-result = self._handle_step_error(e)
+result = await self._handle_step_error(e)
self._last_result = result
finally:
+actions = [a.model_dump(exclude_unset=True) for a in model_output.action] if model_output else []
+self.telemetry.capture(
+AgentStepTelemetryEvent(
+agent_id=self.agent_id,
+step=self.n_steps,
+actions=actions,
+consecutive_failures=self.consecutive_failures,
+step_error=[r.error for r in result if r.error] if result else ['No result'],
+)
+)
if not result:
return
-for r in result:
-if r.error:
-self.telemetry.capture(
-AgentStepErrorTelemetryEvent(
-agent_id=self.agent_id,
-error=r.error,
-)
-)
if state:
self._make_history_item(model_output, state, result)
async def run(self, max_steps: int = 100) -> AgentHistoryList:
"""Execute the task with maximum number of steps"""
try:
self._log_agent_run()
# Execute initial actions if provided
if self.initial_actions:
result = await self.controller.multi_act(self.initial_actions, self.browser_context, check_for_new_elements=False)
self._last_result = result
step_info = CustomAgentStepInfo(
task=self.task,
add_infos=self.add_infos,
step_number=1,
max_steps=max_steps,
memory="",
task_progress="",
future_plans=""
)
for step in range(max_steps):
# 1) Check if stop requested
if self.agent_state and self.agent_state.is_stop_requested():
logger.info("🛑 Stop requested by user")
self._create_stop_history_item()
break
# 2) Store last valid state before step
if self.browser_context and self.agent_state:
state = await self.browser_context.get_state(use_vision=self.use_vision)
self.agent_state.set_last_valid_state(state)
if self._too_many_failures():
break
# 3) Do the step
await self.step(step_info)
if self.history.is_done():
if (
self.validate_output and step < max_steps - 1
): # if last step, we dont need to validate
if not await self._validate_output():
continue
logger.info("✅ Task completed successfully")
break
else:
logger.info("❌ Failed to complete task in maximum steps")
return self.history
finally:
self.telemetry.capture(
AgentEndTelemetryEvent(
agent_id=self.agent_id,
success=self.history.is_done(),
steps=self.n_steps,
max_steps_reached=self.n_steps >= max_steps,
errors=self.history.errors(),
)
)
if not self.injected_browser_context:
await self.browser_context.close()
if not self.injected_browser and self.browser:
await self.browser.close()
if self.generate_gif:
output_path: str = 'agent_history.gif'
if isinstance(self.generate_gif, str):
output_path = self.generate_gif
self.create_history_gif(output_path=output_path)
def _create_stop_history_item(self):
"""Create a history item for when the agent is stopped."""
try:
# Attempt to retrieve the last valid state from agent_state
state = None
if self.agent_state:
last_state = self.agent_state.get_last_valid_state()
if last_state:
# Convert to BrowserStateHistory
state = BrowserStateHistory(
url=getattr(last_state, 'url', ""),
title=getattr(last_state, 'title', ""),
tabs=getattr(last_state, 'tabs', []),
interacted_element=[None],
screenshot=getattr(last_state, 'screenshot', None)
)
else:
state = self._create_empty_state()
else:
state = self._create_empty_state()
# Create a final item in the agent history indicating done
stop_history = AgentHistory(
model_output=None,
state=state,
result=[ActionResult(extracted_content=None, error=None, is_done=True)]
)
self.history.history.append(stop_history)
except Exception as e:
logger.error(f"Error creating stop history item: {e}")
# Create empty state as fallback
state = self._create_empty_state()
stop_history = AgentHistory(
model_output=None,
state=state,
result=[ActionResult(extracted_content=None, error=None, is_done=True)]
)
self.history.history.append(stop_history)
def _convert_to_browser_state_history(self, browser_state):
return BrowserStateHistory(
url=getattr(browser_state, 'url', ""),
title=getattr(browser_state, 'title', ""),
tabs=getattr(browser_state, 'tabs', []),
interacted_element=[None],
screenshot=getattr(browser_state, 'screenshot', None)
)
def _create_empty_state(self):
return BrowserStateHistory(
url="",
title="",
tabs=[],
interacted_element=[None],
screenshot=None
)
def create_history_gif(
self,
output_path: str = 'agent_history.gif',
duration: int = 3000,
show_goals: bool = True,
show_task: bool = True,
show_logo: bool = False,
font_size: int = 40,
title_font_size: int = 56,
goal_font_size: int = 44,
margin: int = 40,
line_spacing: float = 1.5,
) -> None:
"""Create a GIF from the agent's history with overlaid task and goal text."""
if not self.history.history:
@@ -310,10 +455,9 @@ class CustomAgent(Agent):
for font_name in font_options:
try:
-import platform
-if platform.system() == "Windows":
+if platform.system() == 'Windows':
# Need to specify the abs font path on Windows
-font_name = os.path.join(os.getenv("WIN_FONT_DIR", "C:\\Windows\\Fonts"), font_name + ".ttf")
+font_name = os.path.join(os.getenv('WIN_FONT_DIR', 'C:\\Windows\\Fonts'), font_name + '.ttf')
regular_font = ImageFont.truetype(font_name, font_size)
title_font = ImageFont.truetype(font_name, title_font_size)
goal_font = ImageFont.truetype(font_name, goal_font_size)
@@ -390,134 +534,4 @@ class CustomAgent(Agent):
)
logger.info(f'Created GIF at {output_path}')
else:
logger.warning('No images found in history to create GIF')
async def run(self, max_steps: int = 100) -> AgentHistoryList:
"""Execute the task with maximum number of steps"""
try:
logger.info(f"🚀 Starting task: {self.task}")
self.telemetry.capture(
AgentRunTelemetryEvent(
agent_id=self.agent_id,
task=self.task,
)
)
step_info = CustomAgentStepInfo(
task=self.task,
add_infos=self.add_infos,
step_number=1,
max_steps=max_steps,
memory="",
task_progress="",
future_plans=""
)
for step in range(max_steps):
# 1) Check if stop requested
if self.agent_state and self.agent_state.is_stop_requested():
logger.info("🛑 Stop requested by user")
self._create_stop_history_item()
break
# 2) Store last valid state before step
if self.browser_context and self.agent_state:
state = await self.browser_context.get_state(use_vision=self.use_vision)
self.agent_state.set_last_valid_state(state)
if self._too_many_failures():
break
# 3) Do the step
await self.step(step_info)
if self.history.is_done():
if (
self.validate_output and step < max_steps - 1
): # if last step, we dont need to validate
if not await self._validate_output():
continue
logger.info("✅ Task completed successfully")
break
else:
logger.info("❌ Failed to complete task in maximum steps")
return self.history
finally:
self.telemetry.capture(
AgentEndTelemetryEvent(
agent_id=self.agent_id,
task=self.task,
success=self.history.is_done(),
steps=len(self.history.history),
)
)
if not self.injected_browser_context:
await self.browser_context.close()
if not self.injected_browser and self.browser:
await self.browser.close()
if self.generate_gif:
self.create_history_gif()
def _create_stop_history_item(self):
"""Create a history item for when the agent is stopped."""
try:
# Attempt to retrieve the last valid state from agent_state
state = None
if self.agent_state:
last_state = self.agent_state.get_last_valid_state()
if last_state:
# Convert to BrowserStateHistory
state = BrowserStateHistory(
url=getattr(last_state, 'url', ""),
title=getattr(last_state, 'title', ""),
tabs=getattr(last_state, 'tabs', []),
interacted_element=[None],
screenshot=getattr(last_state, 'screenshot', None)
)
else:
state = self._create_empty_state()
else:
state = self._create_empty_state()
# Create a final item in the agent history indicating done
stop_history = AgentHistory(
model_output=None,
state=state,
result=[ActionResult(extracted_content=None, error=None, is_done=True)]
)
self.history.history.append(stop_history)
except Exception as e:
logger.error(f"Error creating stop history item: {e}")
# Create empty state as fallback
state = self._create_empty_state()
stop_history = AgentHistory(
model_output=None,
state=state,
result=[ActionResult(extracted_content=None, error=None, is_done=True)]
)
self.history.history.append(stop_history)
def _convert_to_browser_state_history(self, browser_state):
return BrowserStateHistory(
url=getattr(browser_state, 'url', ""),
title=getattr(browser_state, 'title', ""),
tabs=getattr(browser_state, 'tabs', []),
interacted_element=[None],
screenshot=getattr(browser_state, 'screenshot', None)
)
def _create_empty_state(self):
return BrowserStateHistory(
url="",
title="",
tabs=[],
interacted_element=[None],
screenshot=None
)

custom_massage_manager.py

@@ -1,9 +1,3 @@
-# -*- coding: utf-8 -*-
-# @Time : 2025/1/2
-# @Author : wenshao
-# @ProjectName: browser-use-webui
-# @FileName: custom_massage_manager.py
from __future__ import annotations
import logging
@@ -11,15 +5,20 @@ from typing import List, Optional, Type
from browser_use.agent.message_manager.service import MessageManager
from browser_use.agent.message_manager.views import MessageHistory
-from browser_use.agent.prompts import SystemPrompt
+from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
-from browser_use.agent.views import ActionResult, AgentStepInfo
+from browser_use.agent.views import ActionResult, AgentStepInfo, ActionModel
from browser_use.browser.views import BrowserState
from langchain_core.language_models import BaseChatModel
+from langchain_anthropic import ChatAnthropic
+from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
-HumanMessage,
-AIMessage
+AIMessage,
+BaseMessage,
+HumanMessage,
+ToolMessage
)
+from langchain_openai import ChatOpenAI
+from ..utils.llm import DeepSeekR1ChatOpenAI
from .custom_prompts import CustomAgentMessagePrompt

logger = logging.getLogger(__name__)
@@ -32,14 +31,14 @@ class CustomMassageManager(MessageManager):
task: str,
action_descriptions: str,
system_prompt_class: Type[SystemPrompt],
+agent_prompt_class: Type[AgentMessagePrompt],
max_input_tokens: int = 128000,
-estimated_tokens_per_character: int = 3,
+estimated_characters_per_token: int = 3,
image_tokens: int = 800,
include_attributes: list[str] = [],
max_error_length: int = 400,
max_actions_per_step: int = 10,
-tool_call_in_content: bool = False,
-use_function_calling: bool = True
+message_context: Optional[str] = None
):
super().__init__(
llm=llm,
@@ -47,72 +46,72 @@ class CustomMassageManager(MessageManager):
action_descriptions=action_descriptions,
system_prompt_class=system_prompt_class,
max_input_tokens=max_input_tokens,
-estimated_tokens_per_character=estimated_tokens_per_character,
+estimated_characters_per_token=estimated_characters_per_token,
image_tokens=image_tokens,
include_attributes=include_attributes,
max_error_length=max_error_length,
max_actions_per_step=max_actions_per_step,
-tool_call_in_content=tool_call_in_content,
+message_context=message_context
)
-self.use_function_calling = use_function_calling
+self.agent_prompt_class = agent_prompt_class
# Custom: Move Task info to state_message
self.history = MessageHistory()
self._add_message_with_tokens(self.system_prompt)
-if self.use_function_calling:
-tool_calls = [
-{
-'name': 'CustomAgentOutput',
-'args': {
-'current_state': {
-'prev_action_evaluation': 'Unknown - No previous actions to evaluate.',
-'important_contents': '',
-'completed_contents': '',
-'thought': 'Now Google is open. Need to type OpenAI to search.',
-'summary': 'Type OpenAI to search.',
-},
-'action': [],
-},
-'id': '',
-'type': 'tool_call',
-}
-]
-if self.tool_call_in_content:
-# openai throws error if tool_calls are not responded -> move to content
-example_tool_call = AIMessage(
-content=f'{tool_calls}',
-tool_calls=[],
-)
-else:
-example_tool_call = AIMessage(
-content=f'',
-tool_calls=tool_calls,
-)
-self._add_message_with_tokens(example_tool_call)
+if self.message_context:
+context_message = HumanMessage(content=self.message_context)
+self._add_message_with_tokens(context_message)

def cut_messages(self):
"""Get current message list, potentially trimmed to max tokens"""
diff = self.history.total_tokens - self.max_input_tokens
-i = 1 # start from 1 to keep system message in history
-while diff > 0 and i < len(self.history.messages):
-self.history.remove_message(i)
-diff = self.history.total_tokens - self.max_input_tokens
-i += 1
+min_message_len = 2 if self.message_context is not None else 1
+while diff > 0 and len(self.history.messages) > min_message_len:
+self.history.remove_message(min_message_len) # alway remove the oldest message
+diff = self.history.total_tokens - self.max_input_tokens

def add_state_message(
self,
state: BrowserState,
+actions: Optional[List[ActionModel]] = None,
result: Optional[List[ActionResult]] = None,
step_info: Optional[AgentStepInfo] = None,
) -> None:
"""Add browser state as human message"""
# otherwise add state message and result to next message (which will not stay in memory)
-state_message = CustomAgentMessagePrompt(
+state_message = self.agent_prompt_class(
state,
+actions,
result,
include_attributes=self.include_attributes,
max_error_length=self.max_error_length,
step_info=step_info,
).get_user_message()
self._add_message_with_tokens(state_message)
def _count_text_tokens(self, text: str) -> int:
if isinstance(self.llm, (ChatOpenAI, ChatAnthropic, DeepSeekR1ChatOpenAI)):
try:
tokens = self.llm.get_num_tokens(text)
except Exception:
tokens = (
len(text) // self.estimated_characters_per_token
) # Rough estimate if no tokenizer available
else:
tokens = (
len(text) // self.estimated_characters_per_token
) # Rough estimate if no tokenizer available
return tokens
def _remove_state_message_by_index(self, remove_ind=-1) -> None:
"""Remove last state message from history"""
i = len(self.history.messages) - 1
remove_cnt = 0
while i >= 0:
if isinstance(self.history.messages[i].message, HumanMessage):
remove_cnt += 1
if remove_cnt == abs(remove_ind):
self.history.remove_message(i)
break
i -= 1

custom_prompts.py

@@ -1,13 +1,8 @@
-# -*- coding: utf-8 -*-
-# @Time : 2025/1/2
-# @Author : wenshao
-# @ProjectName: browser-use-webui
-# @FileName: custom_prompts.py
import pdb
from typing import List, Optional
-from browser_use.agent.prompts import SystemPrompt
+from browser_use.agent.prompts import SystemPrompt, AgentMessagePrompt
-from browser_use.agent.views import ActionResult
+from browser_use.agent.views import ActionResult, ActionModel
from browser_use.browser.views import BrowserState
from langchain_core.messages import HumanMessage, SystemMessage
@@ -19,24 +14,19 @@ class CustomSystemPrompt(SystemPrompt):
""" """
Returns the important rules for the agent. Returns the important rules for the agent.
""" """
text = """ text = r"""
1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format: 1. RESPONSE FORMAT: You must ALWAYS respond with valid JSON in this exact format:
{ {
"current_state": { "current_state": {
"prev_action_evaluation": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not. Note that the result you output must be consistent with the reasoning you output afterwards. If you consider it to be 'Failed,' you should reflect on this during your thought.", "prev_action_evaluation": "Success|Failed|Unknown - Analyze the current elements and the image to check if the previous goals/actions are successful like intended by the task. Ignore the action result. The website is the ground truth. Also mention if something unexpected happened like new suggestions in an input field. Shortly state why/why not. Note that the result you output must be consistent with the reasoning you output afterwards. If you consider it to be 'Failed,' you should reflect on this during your thought.",
"important_contents": "Output important contents closely related to user\'s instruction or task on the current page. If there is, please output the contents. If not, please output empty string ''.", "important_contents": "Output important contents closely related to user\'s instruction on the current page. If there is, please output the contents. If not, please output empty string ''.",
"task_progress": "Task Progress is a general summary of the current contents that have been completed. Just summarize the contents that have been actually completed based on the content at current step and the history operations. Please list each completed item individually, such as: 1. Input username. 2. Input Password. 3. Click confirm button. Please return string type not a list.", "task_progress": "Task Progress is a general summary of the current contents that have been completed. Just summarize the contents that have been actually completed based on the content at current step and the history operations. Please list each completed item individually, such as: 1. Input username. 2. Input Password. 3. Click confirm button. Please return string type not a list.",
"future_plans": "Based on the user's request and the current state, outline the remaining steps needed to complete the task. This should be a concise list of actions yet to be performed, such as: 1. Select a date. 2. Choose a specific time slot. 3. Confirm booking. Please return string type not a list.", "future_plans": "Based on the user's request and the current state, outline the remaining steps needed to complete the task. This should be a concise list of actions yet to be performed, such as: 1. Select a date. 2. Choose a specific time slot. 3. Confirm booking. Please return string type not a list.",
"thought": "Think about the requirements that have been completed in previous operations and the requirements that need to be completed in the next one operation. If your output of prev_action_evaluation is 'Failed', please reflect and output your reflection here.", "thought": "Think about the requirements that have been completed in previous operations and the requirements that need to be completed in the next one operation. If your output of prev_action_evaluation is 'Failed', please reflect and output your reflection here.",
"summary": "Please generate a brief natural language description for the operation in next actions based on your Thought." "summary": "Please generate a brief natural language description for the operation in next actions based on your Thought."
}, },
"action": [ "action": [
{ * actions in sequences, please refer to **Common action sequences**. Each output action MUST be formated as: \{action_name\: action_params\}*
"action_name": {
// action-specific parameters
}
},
// ... more actions in sequence
] ]
} }
@@ -49,7 +39,6 @@ class CustomSystemPrompt(SystemPrompt):
{"click_element": {"index": 3}} {"click_element": {"index": 3}}
] ]
- Navigation and extraction: [ - Navigation and extraction: [
{"open_new_tab": {}},
{"go_to_url": {"url": "https://example.com"}}, {"go_to_url": {"url": "https://example.com"}},
{"extract_page_content": {}} {"extract_page_content": {}}
] ]
@@ -67,7 +56,7 @@ class CustomSystemPrompt(SystemPrompt):
- Use scroll to find elements you are looking for
5. TASK COMPLETION:
-- If you think all the requirements of user\'s instruction have been completed and no further operation is required, output the done action to terminate the operation process.
+- If you think all the requirements of user\'s instruction have been completed and no further operation is required, output the **Done** action to terminate the operation process.
- Don't hallucinate actions.
- If the task requires specific information - make sure to include everything in the done function. This is what the user will see.
- If you are running out of steps (current step), think about speeding it up, and ALWAYS use the done action as the last action.
@@ -132,7 +121,7 @@ class CustomSystemPrompt(SystemPrompt):
AGENT_PROMPT = f"""You are a precise browser automation agent that interacts with websites through structured commands. Your role is to:
1. Analyze the provided webpage elements and structure
2. Plan a sequence of actions to accomplish the given task
-3. Respond with valid JSON containing your action sequence and state assessment
+3. Your final result MUST be a valid JSON as the **RESPONSE FORMAT** described, containing your action sequence and state assessment, No need extra content to expalin.
Current date and time: {time_str}
@@ -147,33 +136,54 @@ class CustomSystemPrompt(SystemPrompt):
return SystemMessage(content=AGENT_PROMPT)

-class CustomAgentMessagePrompt:
+class CustomAgentMessagePrompt(AgentMessagePrompt):
def __init__(
self,
state: BrowserState,
+actions: Optional[List[ActionModel]] = None,
result: Optional[List[ActionResult]] = None,
include_attributes: list[str] = [],
max_error_length: int = 400,
step_info: Optional[CustomAgentStepInfo] = None,
):
-self.state = state
-self.result = result
-self.max_error_length = max_error_length
-self.include_attributes = include_attributes
-self.step_info = step_info
+super(CustomAgentMessagePrompt, self).__init__(state=state,
+result=result,
+include_attributes=include_attributes,
+max_error_length=max_error_length,
+step_info=step_info
+)
+self.actions = actions

def get_user_message(self) -> HumanMessage:
if self.step_info:
-step_info_description = f'Current step: {self.step_info.step_number + 1}/{self.step_info.max_steps}'
+step_info_description = f'Current step: {self.step_info.step_number}/{self.step_info.max_steps}\n'
else:
step_info_description = ''

elements_text = self.state.element_tree.clickable_elements_to_string(include_attributes=self.include_attributes)
-if not elements_text:
+has_content_above = (self.state.pixels_above or 0) > 0
+has_content_below = (self.state.pixels_below or 0) > 0
+if elements_text != '':
+if has_content_above:
+elements_text = (
+f'... {self.state.pixels_above} pixels above - scroll or extract content to see more ...\n{elements_text}'
+)
+else:
+elements_text = f'[Start of page]\n{elements_text}'
+if has_content_below:
+elements_text = (
+f'{elements_text}\n... {self.state.pixels_below} pixels below - scroll or extract content to see more ...'
+)
+else:
+elements_text = f'{elements_text}\n[End of page]'
+else:
elements_text = 'empty page'

state_description = f"""
{step_info_description}
-1. Task: {self.step_info.task}
+1. Task: {self.step_info.task}.
2. Hints(Optional):
{self.step_info.add_infos}
3. Memory:
@@ -185,16 +195,21 @@ class CustomAgentMessagePrompt:
{elements_text}
"""
-if self.result:
-for i, result in enumerate(self.result):
-if result.extracted_content:
-state_description += f"\nResult of action {i + 1}/{len(self.result)}: {result.extracted_content}"
-if result.error:
-# only use last 300 characters of error
-error = result.error[-self.max_error_length:]
-state_description += (
-f"\nError of action {i + 1}/{len(self.result)}: ...{error}"
-)
+if self.actions and self.result:
+state_description += "\n **Previous Actions** \n"
+state_description += f'Previous step: {self.step_info.step_number-1}/{self.step_info.max_steps} \n'
+for i, result in enumerate(self.result):
+action = self.actions[i]
+state_description += f"Previous action {i + 1}/{len(self.result)}: {action.model_dump_json(exclude_unset=True)}\n"
+if result.include_in_memory:
+if result.extracted_content:
+state_description += f"Result of previous action {i + 1}/{len(self.result)}: {result.extracted_content}\n"
+if result.error:
+# only use last 300 characters of error
+error = result.error[-self.max_error_length:]
+state_description += (
+f"Error of previous action {i + 1}/{len(self.result)}: ...{error}\n"
+)
if self.state.screenshot:
# Format message for vision model

custom_views.py

@@ -1,9 +1,3 @@
-# -*- coding: utf-8 -*-
-# @Time : 2025/1/2
-# @Author : wenshao
-# @ProjectName: browser-use-webui
-# @FileName: custom_views.py
from dataclasses import dataclass
from typing import Type
@@ -51,7 +45,7 @@ class CustomAgentOutput(AgentOutput):
) -> Type["CustomAgentOutput"]: ) -> Type["CustomAgentOutput"]:
"""Extend actions with custom actions""" """Extend actions with custom actions"""
return create_model( return create_model(
"AgentOutput", "CustomAgentOutput",
__base__=CustomAgentOutput, __base__=CustomAgentOutput,
action=( action=(
list[custom_actions], list[custom_actions],

__init__.py (deleted)

@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py

config.py (deleted)

@@ -1,30 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/6
# @Author : wenshao
# @ProjectName: browser-use-webui
# @FileName: config.py
import os
from dataclasses import dataclass
from typing import Optional
@dataclass
class BrowserPersistenceConfig:
"""Configuration for browser persistence"""
persistent_session: bool = False
user_data_dir: Optional[str] = None
debugging_port: Optional[int] = None
debugging_host: Optional[str] = None
@classmethod
def from_env(cls) -> "BrowserPersistenceConfig":
"""Create config from environment variables"""
return cls(
persistent_session=os.getenv("CHROME_PERSISTENT_SESSION", "").lower()
== "true",
user_data_dir=os.getenv("CHROME_USER_DATA"),
debugging_port=int(os.getenv("CHROME_DEBUGGING_PORT", "9222")),
debugging_host=os.getenv("CHROME_DEBUGGING_HOST", "localhost"),
)

browser.py

@@ -1,26 +1,19 @@
-# -*- coding: utf-8 -*-
-# @Time : 2025/1/2
-# @Author : wenshao
-# @ProjectName: browser-use-webui
-# @FileName: browser.py
import asyncio
import pdb
from playwright.async_api import Browser as PlaywrightBrowser
from playwright.async_api import (
BrowserContext as PlaywrightBrowserContext,
)
from playwright.async_api import (
Playwright,
async_playwright,
)
from browser_use.browser.browser import Browser
from browser_use.browser.context import BrowserContext, BrowserContextConfig
from playwright.async_api import BrowserContext as PlaywrightBrowserContext
import logging
-from .config import BrowserPersistenceConfig
from .custom_context import CustomBrowserContext

logger = logging.getLogger(__name__)
@@ -32,96 +25,57 @@ class CustomBrowser(Browser):
config: BrowserContextConfig = BrowserContextConfig()
) -> CustomBrowserContext:
return CustomBrowserContext(config=config, browser=self)

-async def _setup_browser(self, playwright: Playwright) -> PlaywrightBrowser:
-"""Sets up and returns a Playwright Browser instance with anti-detection measures."""
-if self.config.wss_url:
-browser = await playwright.chromium.connect(self.config.wss_url)
-return browser
-elif self.config.chrome_instance_path:
-import subprocess
-import requests
-try:
-# Check if browser is already running
-response = requests.get('http://localhost:9222/json/version', timeout=2)
-if response.status_code == 200:
-logger.info('Reusing existing Chrome instance')
-browser = await playwright.chromium.connect_over_cdp(
-endpoint_url='http://localhost:9222',
-timeout=20000, # 20 second timeout for connection
-)
-return browser
-except requests.ConnectionError:
-logger.debug('No existing Chrome instance found, starting a new one')
-# Start a new Chrome instance
-subprocess.Popen(
-[
-self.config.chrome_instance_path,
-'--remote-debugging-port=9222',
-],
-stdout=subprocess.DEVNULL,
-stderr=subprocess.DEVNULL,
-)
-# Attempt to connect again after starting a new instance
-for _ in range(10):
-try:
-response = requests.get('http://localhost:9222/json/version', timeout=2)
-if response.status_code == 200:
-break
-except requests.ConnectionError:
-pass
-await asyncio.sleep(1)
-try:
-browser = await playwright.chromium.connect_over_cdp(
-endpoint_url='http://localhost:9222',
-timeout=20000, # 20 second timeout for connection
-)
-return browser
-except Exception as e:
-logger.error(f'Failed to start a new Chrome instance.: {str(e)}')
-raise RuntimeError(
-' To start chrome in Debug mode, you need to close all existing Chrome instances and try again otherwise we can not connect to the instance.'
-)
-else:
-try:
-disable_security_args = []
-if self.config.disable_security:
-disable_security_args = [
-'--disable-web-security',
-'--disable-site-isolation-trials',
-'--disable-features=IsolateOrigins,site-per-process',
-]
-browser = await playwright.chromium.launch(
-headless=self.config.headless,
-args=[
-'--no-sandbox',
-'--disable-blink-features=AutomationControlled',
-'--disable-infobars',
-'--disable-background-timer-throttling',
-'--disable-popup-blocking',
-'--disable-backgrounding-occluded-windows',
-'--disable-renderer-backgrounding',
-'--disable-window-activation',
-'--disable-focus-on-load',
-'--no-first-run',
-'--no-default-browser-check',
-'--no-startup-window',
-'--window-position=0,0',
-# '--window-size=1280,1000',
-]
-+ disable_security_args
-+ self.config.extra_chromium_args,
-proxy=self.config.proxy,
-)
-return browser
-except Exception as e:
-logger.error(f'Failed to initialize Playwright browser: {str(e)}')
-raise
+async def _setup_browser_with_instance(self, playwright: Playwright) -> PlaywrightBrowser:
+"""Sets up and returns a Playwright Browser instance with anti-detection measures."""
+if not self.config.chrome_instance_path:
+raise ValueError('Chrome instance path is required')
+import subprocess
+import requests
+try:
+# Check if browser is already running
+response = requests.get('http://localhost:9222/json/version', timeout=2)
+if response.status_code == 200:
+logger.info('Reusing existing Chrome instance')
+browser = await playwright.chromium.connect_over_cdp(
+endpoint_url='http://localhost:9222',
+timeout=20000, # 20 second timeout for connection
+)
+return browser
+except requests.ConnectionError:
+logger.debug('No existing Chrome instance found, starting a new one')
+# Start a new Chrome instance
+subprocess.Popen(
+[
+self.config.chrome_instance_path,
+'--remote-debugging-port=9222',
+] + self.config.extra_chromium_args,
+stdout=subprocess.DEVNULL,
+stderr=subprocess.DEVNULL,
+)
+# try to connect first in case the browser have not started
+for _ in range(10):
+try:
+response = requests.get('http://localhost:9222/json/version', timeout=2)
+if response.status_code == 200:
+break
+except requests.ConnectionError:
+pass
+await asyncio.sleep(1)
+# Attempt to connect again after starting a new instance
+try:
+browser = await playwright.chromium.connect_over_cdp(
+endpoint_url='http://localhost:9222',
+timeout=20000, # 20 second timeout for connection
+)
+return browser
+except Exception as e:
+logger.error(f'Failed to start a new Chrome instance.: {str(e)}')
+raise RuntimeError(
+' To start chrome in Debug mode, you need to close all existing Chrome instances and try again otherwise we can not connect to the instance.'
+)


@@ -1,10 +1,3 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: context.py
import json
import logging
import os
@@ -14,7 +7,6 @@ from browser_use.browser.context import BrowserContext, BrowserContextConfig
from playwright.async_api import Browser as PlaywrightBrowser
from playwright.async_api import BrowserContext as PlaywrightBrowserContext
from .config import BrowserPersistenceConfig
logger = logging.getLogger(__name__)
@@ -24,73 +16,4 @@ class CustomBrowserContext(BrowserContext):
browser: "Browser",
config: BrowserContextConfig = BrowserContextConfig()
):
super(CustomBrowserContext, self).__init__(browser=browser, config=config)
async def _create_context(self, browser: PlaywrightBrowser) -> PlaywrightBrowserContext:
"""Creates a new browser context with anti-detection measures and loads cookies if available."""
# If we have a context, return it directly
# Check if we should use existing context for persistence
if self.browser.config.chrome_instance_path and len(browser.contexts) > 0:
# Connect to existing Chrome instance instead of creating new one
context = browser.contexts[0]
else:
# Original code for creating new context
context = await browser.new_context(
viewport=self.config.browser_window_size,
no_viewport=False,
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
),
java_script_enabled=True,
bypass_csp=self.config.disable_security,
ignore_https_errors=self.config.disable_security,
record_video_dir=self.config.save_recording_path,
record_video_size=self.config.browser_window_size,
)
if self.config.trace_path:
await context.tracing.start(screenshots=True, snapshots=True, sources=True)
# Load cookies if they exist
if self.config.cookies_file and os.path.exists(self.config.cookies_file):
with open(self.config.cookies_file, "r") as f:
cookies = json.load(f)
logger.info(
f"Loaded {len(cookies)} cookies from {self.config.cookies_file}"
)
await context.add_cookies(cookies)
# Expose anti-detection scripts
await context.add_init_script(
"""
// Webdriver property
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
// Languages
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
});
// Plugins
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5]
});
// Chrome runtime
window.chrome = { runtime: {} };
// Permissions
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery(parameters)
);
"""
)
return context


@@ -1,5 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/2
# @Author : wenshao
# @ProjectName: browser-use-webui
# @FileName: __init__.py.py


@@ -1,18 +1,16 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/2
# @Author : wenshao
# @ProjectName: browser-use-webui
# @FileName: custom_action.py
import pyperclip
+from typing import Optional, Type
+from pydantic import BaseModel
from browser_use.agent.views import ActionResult
from browser_use.browser.context import BrowserContext
-from browser_use.controller.service import Controller
+from browser_use.controller.service import Controller, DoneAction
class CustomController(Controller):
-    def __init__(self):
-        super().__init__()
+    def __init__(self, exclude_actions: list[str] = [],
+                 output_model: Optional[Type[BaseModel]] = None
+                 ):
+        super().__init__(exclude_actions=exclude_actions, output_model=output_model)
        self._register_custom_actions()
    def _register_custom_actions(self):


@@ -1,6 +0,0 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: __init__.py.py


@@ -11,13 +11,13 @@ def default_config():
"max_steps": 100, "max_steps": 100,
"max_actions_per_step": 10, "max_actions_per_step": 10,
"use_vision": True, "use_vision": True,
"tool_call_in_content": True, "tool_calling_method": "auto",
"llm_provider": "openai", "llm_provider": "openai",
"llm_model_name": "gpt-4o", "llm_model_name": "gpt-4o",
"llm_temperature": 1.0, "llm_temperature": 1.0,
"llm_base_url": "", "llm_base_url": "",
"llm_api_key": "", "llm_api_key": "",
"use_own_browser": os.getenv("CHROME_PERSISTENT_SESSION", False), "use_own_browser": os.getenv("CHROME_PERSISTENT_SESSION", "false").lower() == "true",
"keep_browser_open": False, "keep_browser_open": False,
"headless": False, "headless": False,
"disable_security": True, "disable_security": True,
@@ -56,7 +56,7 @@ def save_current_config(*args):
"max_steps": args[1], "max_steps": args[1],
"max_actions_per_step": args[2], "max_actions_per_step": args[2],
"use_vision": args[3], "use_vision": args[3],
"tool_call_in_content": args[4], "tool_calling_method": args[4],
"llm_provider": args[5], "llm_provider": args[5],
"llm_model_name": args[6], "llm_model_name": args[6],
"llm_temperature": args[7], "llm_temperature": args[7],
@@ -86,7 +86,7 @@ def update_ui_from_config(config_file):
gr.update(value=loaded_config.get("max_steps", 100)), gr.update(value=loaded_config.get("max_steps", 100)),
gr.update(value=loaded_config.get("max_actions_per_step", 10)), gr.update(value=loaded_config.get("max_actions_per_step", 10)),
gr.update(value=loaded_config.get("use_vision", True)), gr.update(value=loaded_config.get("use_vision", True)),
gr.update(value=loaded_config.get("tool_call_in_content", True)), gr.update(value=loaded_config.get("tool_calling_method", True)),
gr.update(value=loaded_config.get("llm_provider", "openai")), gr.update(value=loaded_config.get("llm_provider", "openai")),
gr.update(value=loaded_config.get("llm_model_name", "gpt-4o")), gr.update(value=loaded_config.get("llm_model_name", "gpt-4o")),
gr.update(value=loaded_config.get("llm_temperature", 1.0)), gr.update(value=loaded_config.get("llm_temperature", 1.0)),


@@ -25,6 +25,7 @@ from langchain_core.outputs import (
LLMResult,
RunInfo,
)
from langchain_ollama import ChatOllama
from langchain_core.output_parsers.base import OutputParserLike
from langchain_core.runnables import Runnable, RunnableConfig
from langchain_core.tools import BaseTool
@@ -98,4 +99,38 @@ class DeepSeekR1ChatOpenAI(ChatOpenAI):
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content
return AIMessage(content=content, reasoning_content=reasoning_content)
class DeepSeekR1ChatOllama(ChatOllama):
async def ainvoke(
self,
input: LanguageModelInput,
config: Optional[RunnableConfig] = None,
*,
stop: Optional[list[str]] = None,
**kwargs: Any,
) -> AIMessage:
org_ai_message = await super().ainvoke(input=input)
org_content = org_ai_message.content
reasoning_content = org_content.split("</think>")[0].replace("<think>", "")
content = org_content.split("</think>")[1]
if "**JSON Response:**" in content:
content = content.split("**JSON Response:**")[-1]
return AIMessage(content=content, reasoning_content=reasoning_content)
def invoke(
self,
input: LanguageModelInput,
config: Optional[RunnableConfig] = None,
*,
stop: Optional[list[str]] = None,
**kwargs: Any,
) -> AIMessage:
org_ai_message = super().invoke(input=input)
org_content = org_ai_message.content
reasoning_content = org_content.split("</think>")[0].replace("<think>", "")
content = org_content.split("</think>")[1]
if "**JSON Response:**" in content:
content = content.split("**JSON Response:**")[-1]
return AIMessage(content=content, reasoning_content=reasoning_content)
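As a quick illustration of the reasoning/answer split that DeepSeekR1ChatOllama performs above, the same string handling behaves like this on a sample response (standalone sketch, no Ollama server required; the sample text is made up):

# Standalone sketch of the <think>...</think> split used by DeepSeekR1ChatOllama above.
sample = "<think>The user wants a short greeting.</think>Hello there!"
reasoning_content = sample.split("</think>")[0].replace("<think>", "")
content = sample.split("</think>")[1]
assert reasoning_content == "The user wants a short greeting."
assert content == "Hello there!"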


@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: utils.py
import base64 import base64
import os import os
import time import time
@@ -11,12 +5,21 @@ from pathlib import Path
from typing import Dict, Optional
from langchain_anthropic import ChatAnthropic
from langchain_mistralai import ChatMistralAI
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama
from langchain_openai import AzureChatOpenAI, ChatOpenAI
import gradio as gr
-from .llm import DeepSeekR1ChatOpenAI
+from .llm import DeepSeekR1ChatOpenAI, DeepSeekR1ChatOllama
PROVIDER_DISPLAY_NAMES = {
"openai": "OpenAI",
"azure_openai": "Azure OpenAI",
"anthropic": "Anthropic",
"deepseek": "DeepSeek",
"gemini": "Gemini"
}
def get_llm_model(provider: str, **kwargs): def get_llm_model(provider: str, **kwargs):
""" """
@@ -25,19 +28,37 @@ def get_llm_model(provider: str, **kwargs):
:param kwargs: :param kwargs:
:return: :return:
""" """
if provider not in ["ollama"]:
env_var = "GOOGLE_API_KEY" if provider == "gemini" else f"{provider.upper()}_API_KEY"
api_key = kwargs.get("api_key", "") or os.getenv(env_var, "")
if not api_key:
handle_api_key_error(provider, env_var)
kwargs["api_key"] = api_key
if provider == "anthropic": if provider == "anthropic":
if not kwargs.get("base_url", ""): if not kwargs.get("base_url", ""):
base_url = "https://api.anthropic.com" base_url = "https://api.anthropic.com"
else: else:
base_url = kwargs.get("base_url") base_url = kwargs.get("base_url")
return ChatAnthropic(
model_name=kwargs.get("model_name", "claude-3-5-sonnet-20240620"),
temperature=kwargs.get("temperature", 0.0),
base_url=base_url,
api_key=api_key,
)
elif provider == 'mistral':
if not kwargs.get("base_url", ""):
base_url = os.getenv("MISTRAL_ENDPOINT", "https://api.mistral.ai/v1")
else:
base_url = kwargs.get("base_url")
if not kwargs.get("api_key", ""): if not kwargs.get("api_key", ""):
api_key = os.getenv("ANTHROPIC_API_KEY", "") api_key = os.getenv("MISTRAL_API_KEY", "")
else: else:
api_key = kwargs.get("api_key") api_key = kwargs.get("api_key")
return ChatAnthropic( return ChatMistralAI(
model_name=kwargs.get("model_name", "claude-3-5-sonnet-20240620"), model=kwargs.get("model_name", "mistral-large-latest"),
temperature=kwargs.get("temperature", 0.0), temperature=kwargs.get("temperature", 0.0),
base_url=base_url, base_url=base_url,
api_key=api_key, api_key=api_key,
@@ -48,11 +69,6 @@ def get_llm_model(provider: str, **kwargs):
else: else:
base_url = kwargs.get("base_url") base_url = kwargs.get("base_url")
if not kwargs.get("api_key", ""):
api_key = os.getenv("OPENAI_API_KEY", "")
else:
api_key = kwargs.get("api_key")
return ChatOpenAI( return ChatOpenAI(
model=kwargs.get("model_name", "gpt-4o"), model=kwargs.get("model_name", "gpt-4o"),
temperature=kwargs.get("temperature", 0.0), temperature=kwargs.get("temperature", 0.0),
@@ -65,11 +81,6 @@ def get_llm_model(provider: str, **kwargs):
else: else:
base_url = kwargs.get("base_url") base_url = kwargs.get("base_url")
if not kwargs.get("api_key", ""):
api_key = os.getenv("DEEPSEEK_API_KEY", "")
else:
api_key = kwargs.get("api_key")
if kwargs.get("model_name", "deepseek-chat") == "deepseek-reasoner": if kwargs.get("model_name", "deepseek-chat") == "deepseek-reasoner":
return DeepSeekR1ChatOpenAI( return DeepSeekR1ChatOpenAI(
model=kwargs.get("model_name", "deepseek-reasoner"), model=kwargs.get("model_name", "deepseek-reasoner"),
@@ -85,31 +96,37 @@ def get_llm_model(provider: str, **kwargs):
api_key=api_key, api_key=api_key,
) )
elif provider == "gemini": elif provider == "gemini":
if not kwargs.get("api_key", ""):
api_key = os.getenv("GOOGLE_API_KEY", "")
else:
api_key = kwargs.get("api_key")
return ChatGoogleGenerativeAI( return ChatGoogleGenerativeAI(
model=kwargs.get("model_name", "gemini-2.0-flash-exp"), model=kwargs.get("model_name", "gemini-2.0-flash-exp"),
temperature=kwargs.get("temperature", 0.0), temperature=kwargs.get("temperature", 0.0),
google_api_key=api_key, google_api_key=api_key,
) )
elif provider == "ollama": elif provider == "ollama":
return ChatOllama( if not kwargs.get("base_url", ""):
model=kwargs.get("model_name", "qwen2.5:7b"), base_url = os.getenv("OLLAMA_ENDPOINT", "http://localhost:11434")
temperature=kwargs.get("temperature", 0.0), else:
num_ctx=kwargs.get("num_ctx", 32000), base_url = kwargs.get("base_url")
base_url=kwargs.get("base_url", "http://localhost:11434"),
) if "deepseek-r1" in kwargs.get("model_name", "qwen2.5:7b"):
return DeepSeekR1ChatOllama(
model=kwargs.get("model_name", "deepseek-r1:14b"),
temperature=kwargs.get("temperature", 0.0),
num_ctx=kwargs.get("num_ctx", 32000),
base_url=base_url,
)
else:
return ChatOllama(
model=kwargs.get("model_name", "qwen2.5:7b"),
temperature=kwargs.get("temperature", 0.0),
num_ctx=kwargs.get("num_ctx", 32000),
num_predict=kwargs.get("num_predict", 1024),
base_url=base_url,
)
elif provider == "azure_openai": elif provider == "azure_openai":
if not kwargs.get("base_url", ""): if not kwargs.get("base_url", ""):
base_url = os.getenv("AZURE_OPENAI_ENDPOINT", "") base_url = os.getenv("AZURE_OPENAI_ENDPOINT", "")
else: else:
base_url = kwargs.get("base_url") base_url = kwargs.get("base_url")
if not kwargs.get("api_key", ""):
api_key = os.getenv("AZURE_OPENAI_API_KEY", "")
else:
api_key = kwargs.get("api_key")
return AzureChatOpenAI( return AzureChatOpenAI(
model=kwargs.get("model_name", "gpt-4o"), model=kwargs.get("model_name", "gpt-4o"),
temperature=kwargs.get("temperature", 0.0), temperature=kwargs.get("temperature", 0.0),
@@ -123,11 +140,12 @@ def get_llm_model(provider: str, **kwargs):
# Predefined model names for common providers
model_names = {
"anthropic": ["claude-3-5-sonnet-20240620", "claude-3-opus-20240229"],
-"openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"],
+"openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo", "o3-mini"],
"deepseek": ["deepseek-chat", "deepseek-reasoner"],
"gemini": ["gemini-2.0-flash-exp", "gemini-2.0-flash-thinking-exp", "gemini-1.5-flash-latest", "gemini-1.5-flash-8b-latest", "gemini-2.0-flash-thinking-exp-1219" ],
-"ollama": ["qwen2.5:7b", "llama2:7b"],
-"azure_openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"]
+"ollama": ["qwen2.5:7b", "llama2:7b", "deepseek-r1:14b", "deepseek-r1:32b"],
+"azure_openai": ["gpt-4o", "gpt-4", "gpt-3.5-turbo"],
+"mistral": ["pixtral-large-latest", "mistral-large-latest", "mistral-small-latest", "ministral-8b-latest"]
}
# Callback to update the model name dropdown based on the selected provider
@@ -146,7 +164,17 @@ def update_model_dropdown(llm_provider, api_key=None, base_url=None):
return gr.Dropdown(choices=model_names[llm_provider], value=model_names[llm_provider][0], interactive=True) return gr.Dropdown(choices=model_names[llm_provider], value=model_names[llm_provider][0], interactive=True)
else: else:
return gr.Dropdown(choices=[], value="", interactive=True, allow_custom_value=True) return gr.Dropdown(choices=[], value="", interactive=True, allow_custom_value=True)
def handle_api_key_error(provider: str, env_var: str):
"""
Handles the missing API key error by raising a gr.Error with a clear message.
"""
provider_display = PROVIDER_DISPLAY_NAMES.get(provider, provider.upper())
raise gr.Error(
f"💥 {provider_display} API key not found! 🔑 Please set the "
f"`{env_var}` environment variable or provide it in the UI."
)
def encode_image(img_path):
if not img_path:
return None
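A hedged example of calling the refactored get_llm_model for the new Ollama DeepSeek-R1 path; it mirrors the parameters used by the test code later in this commit, and base_url falls back to OLLAMA_ENDPOINT when omitted.

# Sketch only: exercises the ollama / deepseek-r1 branch added to get_llm_model above.
from src.utils import utils

llm = utils.get_llm_model(
    provider="ollama",
    model_name="deepseek-r1:14b",
    temperature=0.5,
    base_url="http://localhost:11434",
)
print(llm.invoke("Hello").content)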


@@ -1,8 +1,3 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/2
# @Author : wenshao
# @ProjectName: browser-use-webui
# @FileName: test_browser_use.py
import pdb import pdb
from dotenv import load_dotenv from dotenv import load_dotenv
@@ -37,15 +32,27 @@ async def test_browser_use_org():
# api_key=os.getenv("AZURE_OPENAI_API_KEY", ""), # api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
# ) # )
# llm = utils.get_llm_model(
# provider="deepseek",
# model_name="deepseek-chat",
# temperature=0.8
# )
llm = utils.get_llm_model( llm = utils.get_llm_model(
provider="deepseek", provider="ollama", model_name="deepseek-r1:14b", temperature=0.5
model_name="deepseek-chat",
temperature=0.8
) )
window_w, window_h = 1920, 1080 window_w, window_h = 1920, 1080
use_vision = False use_vision = False
chrome_path = os.getenv("CHROME_PATH", None) use_own_browser = False
if use_own_browser:
chrome_path = os.getenv("CHROME_PATH", None)
if chrome_path == "":
chrome_path = None
else:
chrome_path = None
tool_calling_method = "json_schema" # setting to json_schema when using ollma
browser = Browser( browser = Browser(
config=BrowserConfig( config=BrowserConfig(
@@ -69,7 +76,8 @@ async def test_browser_use_org():
task="go to google.com and type 'OpenAI' click search and give me the first url", task="go to google.com and type 'OpenAI' click search and give me the first url",
llm=llm, llm=llm,
browser_context=browser_context, browser_context=browser_context,
use_vision=use_vision use_vision=use_vision,
tool_calling_method=tool_calling_method
) )
history: AgentHistoryList = await agent.run(max_steps=10) history: AgentHistoryList = await agent.run(max_steps=10)
@@ -95,151 +103,29 @@ async def test_browser_use_custom():
from playwright.async_api import async_playwright
from src.agent.custom_agent import CustomAgent
-from src.agent.custom_prompts import CustomSystemPrompt
+from src.agent.custom_prompts import CustomSystemPrompt, CustomAgentMessagePrompt
from src.browser.custom_browser import CustomBrowser
from src.browser.custom_context import BrowserContextConfig
from src.controller.custom_controller import CustomController
window_w, window_h = 1920, 1080
# llm = utils.get_llm_model(
-# provider="azure_openai",
+# provider="openai",
# model_name="gpt-4o",
# temperature=0.8,
-# base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
-# api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
+# base_url=os.getenv("OPENAI_ENDPOINT", ""),
+# api_key=os.getenv("OPENAI_API_KEY", ""),
# )
llm = utils.get_llm_model(
-provider="gemini",
-model_name="gemini-2.0-flash-exp",
-temperature=1.0,
-api_key=os.getenv("GOOGLE_API_KEY", "")
+provider="azure_openai",
+model_name="gpt-4o",
+temperature=0.8,
+base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
+api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
)
# llm = utils.get_llm_model(
# provider="deepseek",
# model_name="deepseek-chat",
# temperature=0.8
# )
# llm = utils.get_llm_model(
# provider="ollama", model_name="qwen2.5:7b", temperature=0.8
# )
controller = CustomController()
use_own_browser = False
disable_security = True
use_vision = True # Set to False when using DeepSeek
tool_call_in_content = True # Set to True when using Ollama
max_actions_per_step = 1
playwright = None
browser_context_ = None
try:
if use_own_browser:
playwright = await async_playwright().start()
chrome_exe = os.getenv("CHROME_PATH", "")
chrome_use_data = os.getenv("CHROME_USER_DATA", "")
browser_context_ = await playwright.chromium.launch_persistent_context(
user_data_dir=chrome_use_data,
executable_path=chrome_exe,
no_viewport=False,
headless=False, # 保持浏览器窗口可见
user_agent=(
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
),
java_script_enabled=True,
bypass_csp=disable_security,
ignore_https_errors=disable_security,
record_video_dir="./tmp/record_videos",
record_video_size={"width": window_w, "height": window_h},
)
else:
browser_context_ = None
browser = CustomBrowser(
config=BrowserConfig(
headless=False,
disable_security=True,
extra_chromium_args=[f"--window-size={window_w},{window_h}"],
)
)
async with await browser.new_context(
config=BrowserContextConfig(
trace_path="./tmp/result_processing",
save_recording_path="./tmp/record_videos",
no_viewport=False,
browser_window_size=BrowserContextWindowSize(
width=window_w, height=window_h
),
),
context=browser_context_,
) as browser_context:
agent = CustomAgent(
task="go to google.com and type 'OpenAI' click search and give me the first url",
add_infos="", # some hints for llm to complete the task
llm=llm,
browser_context=browser_context,
controller=controller,
system_prompt_class=CustomSystemPrompt,
use_vision=use_vision,
tool_call_in_content=tool_call_in_content,
max_actions_per_step=max_actions_per_step
)
history: AgentHistoryList = await agent.run(max_steps=10)
print("Final Result:")
pprint(history.final_result(), indent=4)
print("\nErrors:")
pprint(history.errors(), indent=4)
# e.g. xPaths the model clicked on
print("\nModel Outputs:")
pprint(history.model_actions(), indent=4)
print("\nThoughts:")
pprint(history.model_thoughts(), indent=4)
# close browser
except Exception:
import traceback
traceback.print_exc()
finally:
# 显式关闭持久化上下文
if browser_context_:
await browser_context_.close()
# 关闭 Playwright 对象
if playwright:
await playwright.stop()
await browser.close()
async def test_browser_use_custom_v2():
from browser_use.browser.context import BrowserContextWindowSize
from browser_use.browser.browser import BrowserConfig
from playwright.async_api import async_playwright
from src.agent.custom_agent import CustomAgent
from src.agent.custom_prompts import CustomSystemPrompt
from src.browser.custom_browser import CustomBrowser
from src.browser.custom_context import BrowserContextConfig
from src.controller.custom_controller import CustomController
window_w, window_h = 1920, 1080
# llm = utils.get_llm_model(
# provider="azure_openai",
# model_name="gpt-4o",
# temperature=0.8,
# base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
# api_key=os.getenv("AZURE_OPENAI_API_KEY", ""),
# )
# llm = utils.get_llm_model( # llm = utils.get_llm_model(
# provider="gemini", # provider="gemini",
# model_name="gemini-2.0-flash-exp", # model_name="gemini-2.0-flash-exp",
@@ -247,31 +133,45 @@ async def test_browser_use_custom_v2():
# api_key=os.getenv("GOOGLE_API_KEY", "") # api_key=os.getenv("GOOGLE_API_KEY", "")
# ) # )
-llm = utils.get_llm_model(
-provider="deepseek",
-model_name="deepseek-reasoner",
-temperature=0.8
-)
+# llm = utils.get_llm_model(
+# provider="deepseek",
+# model_name="deepseek-reasoner",
+# temperature=0.8
+# )
# llm = utils.get_llm_model(
# provider="deepseek",
# model_name="deepseek-chat",
# temperature=0.8
# )
# llm = utils.get_llm_model( # llm = utils.get_llm_model(
# provider="ollama", model_name="qwen2.5:7b", temperature=0.5 # provider="ollama", model_name="qwen2.5:7b", temperature=0.5
# ) # )
# llm = utils.get_llm_model(
# provider="ollama", model_name="deepseek-r1:14b", temperature=0.5
# )
controller = CustomController()
-use_own_browser = False
+use_own_browser = True
disable_security = True
use_vision = False # Set to False when using DeepSeek
tool_call_in_content = True # Set to True when using Ollama
max_actions_per_step = 1 max_actions_per_step = 1
playwright = None playwright = None
browser = None browser = None
browser_context = None browser_context = None
try: try:
extra_chromium_args = [f"--window-size={window_w},{window_h}"]
if use_own_browser: if use_own_browser:
chrome_path = os.getenv("CHROME_PATH", None) chrome_path = os.getenv("CHROME_PATH", None)
if chrome_path == "": if chrome_path == "":
chrome_path = None chrome_path = None
chrome_user_data = os.getenv("CHROME_USER_DATA", None)
if chrome_user_data:
extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
else: else:
chrome_path = None chrome_path = None
browser = CustomBrowser( browser = CustomBrowser(
@@ -279,7 +179,7 @@ async def test_browser_use_custom_v2():
headless=False,
disable_security=disable_security,
chrome_instance_path=chrome_path,
-extra_chromium_args=[f"--window-size={window_w},{window_h}"],
+extra_chromium_args=extra_chromium_args,
)
)
browser_context = await browser.new_context(
@@ -293,18 +193,18 @@ async def test_browser_use_custom_v2():
)
)
agent = CustomAgent(
-task="go to google.com and type 'OpenAI' click search and give me the first url",
+task="Search 'Nvidia' and give me the first url",
add_infos="", # some hints for llm to complete the task
llm=llm,
browser=browser,
browser_context=browser_context,
controller=controller,
system_prompt_class=CustomSystemPrompt,
+agent_prompt_class=CustomAgentMessagePrompt,
use_vision=use_vision,
-tool_call_in_content=tool_call_in_content,
max_actions_per_step=max_actions_per_step
)
-history: AgentHistoryList = await agent.run(max_steps=10)
+history: AgentHistoryList = await agent.run(max_steps=100)
print("Final Result:")
pprint(history.final_result(), indent=4)
@@ -336,5 +236,4 @@ async def test_browser_use_custom_v2():
if __name__ == "__main__": if __name__ == "__main__":
# asyncio.run(test_browser_use_org()) # asyncio.run(test_browser_use_org())
# asyncio.run(test_browser_use_custom()) asyncio.run(test_browser_use_custom())
asyncio.run(test_browser_use_custom_v2())


@@ -1,13 +1,10 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: test_llm_api.py
import os
import pdb
from dataclasses import dataclass
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_ollama import ChatOllama
load_dotenv()
@@ -15,145 +12,121 @@ import sys
sys.path.append(".") sys.path.append(".")
@dataclass
class LLMConfig:
provider: str
model_name: str
temperature: float = 0.8
base_url: str = None
api_key: str = None
def create_message_content(text, image_path=None):
content = [{"type": "text", "text": text}]
if image_path:
from src.utils import utils
image_data = utils.encode_image(image_path)
content.append({
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"}
})
return content
def get_env_value(key, provider):
env_mappings = {
"openai": {"api_key": "OPENAI_API_KEY", "base_url": "OPENAI_ENDPOINT"},
"azure_openai": {"api_key": "AZURE_OPENAI_API_KEY", "base_url": "AZURE_OPENAI_ENDPOINT"},
"gemini": {"api_key": "GOOGLE_API_KEY"},
"deepseek": {"api_key": "DEEPSEEK_API_KEY", "base_url": "DEEPSEEK_ENDPOINT"},
"mistral": {"api_key": "MISTRAL_API_KEY", "base_url": "MISTRAL_ENDPOINT"},
}
if provider in env_mappings and key in env_mappings[provider]:
return os.getenv(env_mappings[provider][key], "")
return ""
def test_llm(config, query, image_path=None, system_message=None):
from src.utils import utils
# Special handling for Ollama-based models
if config.provider == "ollama":
if "deepseek-r1" in config.model_name:
from src.utils.llm import DeepSeekR1ChatOllama
llm = DeepSeekR1ChatOllama(model=config.model_name)
else:
llm = ChatOllama(model=config.model_name)
ai_msg = llm.invoke(query)
print(ai_msg.content)
if "deepseek-r1" in config.model_name:
pdb.set_trace()
return
# For other providers, use the standard configuration
llm = utils.get_llm_model(
provider=config.provider,
model_name=config.model_name,
temperature=config.temperature,
base_url=config.base_url or get_env_value("base_url", config.provider),
api_key=config.api_key or get_env_value("api_key", config.provider)
)
# Prepare messages for non-Ollama models
messages = []
if system_message:
messages.append(SystemMessage(content=create_message_content(system_message)))
messages.append(HumanMessage(content=create_message_content(query, image_path)))
ai_msg = llm.invoke(messages)
# Handle different response types
if hasattr(ai_msg, "reasoning_content"):
print(ai_msg.reasoning_content)
print(ai_msg.content)
if config.provider == "deepseek" and "deepseek-reasoner" in config.model_name:
print(llm.model_name)
pdb.set_trace()
def test_openai_model():
-from langchain_core.messages import HumanMessage
-from src.utils import utils
+config = LLMConfig(provider="openai", model_name="gpt-4o")
+test_llm(config, "Describe this image", "assets/examples/test.png")
llm = utils.get_llm_model(
provider="openai",
model_name="gpt-4o",
temperature=0.8,
base_url=os.getenv("OPENAI_ENDPOINT", ""),
api_key=os.getenv("OPENAI_API_KEY", "")
)
image_path = "assets/examples/test.png"
image_data = utils.encode_image(image_path)
message = HumanMessage(
content=[
{"type": "text", "text": "describe this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = llm.invoke([message])
print(ai_msg.content)
def test_gemini_model():
-# you need to enable your api key first: https://ai.google.dev/palm_docs/oauth_quickstart
-from langchain_core.messages import HumanMessage
-from src.utils import utils
+# Enable your API key first if you haven't: https://ai.google.dev/palm_docs/oauth_quickstart
+config = LLMConfig(provider="gemini", model_name="gemini-2.0-flash-exp")
+test_llm(config, "Describe this image", "assets/examples/test.png")
llm = utils.get_llm_model(
provider="gemini",
model_name="gemini-2.0-flash-exp",
temperature=0.8,
api_key=os.getenv("GOOGLE_API_KEY", "")
)
image_path = "assets/examples/test.png"
image_data = utils.encode_image(image_path)
message = HumanMessage(
content=[
{"type": "text", "text": "describe this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = llm.invoke([message])
print(ai_msg.content)
def test_azure_openai_model():
-from langchain_core.messages import HumanMessage
-from src.utils import utils
+config = LLMConfig(provider="azure_openai", model_name="gpt-4o")
+test_llm(config, "Describe this image", "assets/examples/test.png")
llm = utils.get_llm_model(
provider="azure_openai",
model_name="gpt-4o",
temperature=0.8,
base_url=os.getenv("AZURE_OPENAI_ENDPOINT", ""),
api_key=os.getenv("AZURE_OPENAI_API_KEY", "")
)
image_path = "assets/examples/test.png"
image_data = utils.encode_image(image_path)
message = HumanMessage(
content=[
{"type": "text", "text": "describe this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
]
)
ai_msg = llm.invoke([message])
print(ai_msg.content)
def test_deepseek_model():
-from langchain_core.messages import HumanMessage
-from src.utils import utils
+config = LLMConfig(provider="deepseek", model_name="deepseek-chat")
+test_llm(config, "Who are you?")
llm = utils.get_llm_model(
provider="deepseek",
model_name="deepseek-chat",
temperature=0.8,
base_url=os.getenv("DEEPSEEK_ENDPOINT", ""),
api_key=os.getenv("DEEPSEEK_API_KEY", "")
)
message = HumanMessage(
content=[
{"type": "text", "text": "who are you?"}
]
)
ai_msg = llm.invoke([message])
print(ai_msg.content)
def test_deepseek_r1_model():
-from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
-from src.utils import utils
+config = LLMConfig(provider="deepseek", model_name="deepseek-reasoner")
+test_llm(config, "Which is greater, 9.11 or 9.8?", system_message="You are a helpful AI assistant.")
llm = utils.get_llm_model(
provider="deepseek",
model_name="deepseek-reasoner",
temperature=0.8,
base_url=os.getenv("DEEPSEEK_ENDPOINT", ""),
api_key=os.getenv("DEEPSEEK_API_KEY", "")
)
messages = []
sys_message = SystemMessage(
content=[{"type": "text", "text": "you are a helpful AI assistant"}]
)
messages.append(sys_message)
user_message = HumanMessage(
content=[
{"type": "text", "text": "9.11 and 9.8, which is greater?"}
]
)
messages.append(user_message)
ai_msg = llm.invoke(messages)
print(ai_msg.reasoning_content)
print(ai_msg.content)
print(llm.model_name)
pdb.set_trace()
def test_ollama_model():
-from langchain_ollama import ChatOllama
-llm = ChatOllama(model="qwen2.5:7b")
-ai_msg = llm.invoke("Sing a ballad of LangChain.")
-print(ai_msg.content)
+config = LLMConfig(provider="ollama", model_name="qwen2.5:7b")
+test_llm(config, "Sing a ballad of LangChain.")
+def test_deepseek_r1_ollama_model():
+config = LLMConfig(provider="ollama", model_name="deepseek-r1:14b")
+test_llm(config, "How many 'r's are in the word 'strawberry'?")
def test_mistral_model():
config = LLMConfig(provider="mistral", model_name="pixtral-large-latest")
test_llm(config, "Describe this image", "assets/examples/test.png")
-if __name__ == '__main__':
+if __name__ == "__main__":
# test_openai_model()
# test_gemini_model()
# test_azure_openai_model()
-# test_deepseek_model()
+#test_deepseek_model()
# test_ollama_model()
-test_deepseek_r1_model()
+# test_deepseek_r1_model()
# test_deepseek_r1_ollama_model()
test_mistral_model()


@@ -1,9 +1,3 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/2
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: test_playwright.py
import pdb
from dotenv import load_dotenv

webui.py

@@ -1,10 +1,3 @@
# -*- coding: utf-8 -*-
# @Time : 2025/1/1
# @Author : wenshao
# @Email : wenshaoguo1026@gmail.com
# @Project : browser-use-webui
# @FileName: webui.py
import pdb import pdb
import logging import logging
@@ -28,22 +21,20 @@ from browser_use.browser.context import (
BrowserContextConfig,
BrowserContextWindowSize,
)
from langchain_ollama import ChatOllama
from playwright.async_api import async_playwright
from src.utils.agent_state import AgentState
from src.utils import utils
from src.agent.custom_agent import CustomAgent
from src.browser.custom_browser import CustomBrowser
-from src.agent.custom_prompts import CustomSystemPrompt
+from src.agent.custom_prompts import CustomSystemPrompt, CustomAgentMessagePrompt
from src.browser.config import BrowserPersistenceConfig
from src.browser.custom_context import BrowserContextConfig, CustomBrowserContext
from src.controller.custom_controller import CustomController
from gradio.themes import Citrus, Default, Glass, Monochrome, Ocean, Origin, Soft, Base
from src.utils.default_config_settings import default_config, load_config_from_file, save_config_to_file, save_current_config, update_ui_from_config
from src.utils.utils import update_model_dropdown, get_latest_files, capture_screenshot
from dotenv import load_dotenv
load_dotenv()
# Global variables for persistence # Global variables for persistence
_global_browser = None _global_browser = None
@@ -101,7 +92,7 @@ async def run_browser_agent(
max_steps, max_steps,
use_vision, use_vision,
max_actions_per_step, max_actions_per_step,
tool_call_in_content tool_calling_method
): ):
global _global_agent_state global _global_agent_state
_global_agent_state.clear_stop() # Clear any previous stop requests _global_agent_state.clear_stop() # Clear any previous stop requests
@@ -147,7 +138,7 @@ async def run_browser_agent(
max_steps=max_steps, max_steps=max_steps,
use_vision=use_vision, use_vision=use_vision,
max_actions_per_step=max_actions_per_step, max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content tool_calling_method=tool_calling_method
) )
elif agent_type == "custom": elif agent_type == "custom":
final_result, errors, model_actions, model_thoughts, trace_file, history_file = await run_custom_agent( final_result, errors, model_actions, model_thoughts, trace_file, history_file = await run_custom_agent(
@@ -166,7 +157,7 @@ async def run_browser_agent(
max_steps=max_steps, max_steps=max_steps,
use_vision=use_vision, use_vision=use_vision,
max_actions_per_step=max_actions_per_step, max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content tool_calling_method=tool_calling_method
) )
else: else:
raise ValueError(f"Invalid agent type: {agent_type}") raise ValueError(f"Invalid agent type: {agent_type}")
@@ -193,6 +184,9 @@ async def run_browser_agent(
gr.update(interactive=True) # Re-enable run button gr.update(interactive=True) # Re-enable run button
) )
except gr.Error:
raise
except Exception as e: except Exception as e:
import traceback import traceback
traceback.print_exc() traceback.print_exc()
@@ -225,7 +219,7 @@ async def run_org_agent(
max_steps, max_steps,
use_vision, use_vision,
max_actions_per_step, max_actions_per_step,
tool_call_in_content tool_calling_method
): ):
try: try:
global _global_browser, _global_browser_context, _global_agent_state global _global_browser, _global_browser_context, _global_agent_state
@@ -233,20 +227,24 @@ async def run_org_agent(
# Clear any previous stop request # Clear any previous stop request
_global_agent_state.clear_stop() _global_agent_state.clear_stop()
extra_chromium_args = [f"--window-size={window_w},{window_h}"]
if use_own_browser: if use_own_browser:
chrome_path = os.getenv("CHROME_PATH", None) chrome_path = os.getenv("CHROME_PATH", None)
if chrome_path == "": if chrome_path == "":
chrome_path = None chrome_path = None
chrome_user_data = os.getenv("CHROME_USER_DATA", None)
if chrome_user_data:
extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
else: else:
chrome_path = None chrome_path = None
if _global_browser is None: if _global_browser is None:
_global_browser = Browser( _global_browser = Browser(
config=BrowserConfig( config=BrowserConfig(
headless=headless, headless=headless,
disable_security=disable_security, disable_security=disable_security,
chrome_instance_path=chrome_path, chrome_instance_path=chrome_path,
extra_chromium_args=[f"--window-size={window_w},{window_h}"], extra_chromium_args=extra_chromium_args,
) )
) )
@@ -261,7 +259,7 @@ async def run_org_agent(
), ),
) )
) )
agent = Agent( agent = Agent(
task=task, task=task,
llm=llm, llm=llm,
@@ -269,7 +267,7 @@ async def run_org_agent(
browser=_global_browser, browser=_global_browser,
browser_context=_global_browser_context, browser_context=_global_browser_context,
max_actions_per_step=max_actions_per_step, max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content tool_calling_method=tool_calling_method
) )
history = await agent.run(max_steps=max_steps) history = await agent.run(max_steps=max_steps)
@@ -316,7 +314,7 @@ async def run_custom_agent(
max_steps, max_steps,
use_vision, use_vision,
max_actions_per_step, max_actions_per_step,
tool_call_in_content tool_calling_method
): ):
try: try:
global _global_browser, _global_browser_context, _global_agent_state global _global_browser, _global_browser_context, _global_agent_state
@@ -324,10 +322,14 @@ async def run_custom_agent(
# Clear any previous stop request # Clear any previous stop request
_global_agent_state.clear_stop() _global_agent_state.clear_stop()
extra_chromium_args = [f"--window-size={window_w},{window_h}"]
if use_own_browser: if use_own_browser:
chrome_path = os.getenv("CHROME_PATH", None) chrome_path = os.getenv("CHROME_PATH", None)
if chrome_path == "": if chrome_path == "":
chrome_path = None chrome_path = None
chrome_user_data = os.getenv("CHROME_USER_DATA", None)
if chrome_user_data:
extra_chromium_args += [f"--user-data-dir={chrome_user_data}"]
else: else:
chrome_path = None chrome_path = None
@@ -340,7 +342,7 @@ async def run_custom_agent(
headless=headless, headless=headless,
disable_security=disable_security, disable_security=disable_security,
chrome_instance_path=chrome_path, chrome_instance_path=chrome_path,
extra_chromium_args=[f"--window-size={window_w},{window_h}"], extra_chromium_args=extra_chromium_args,
) )
) )
@@ -355,7 +357,7 @@ async def run_custom_agent(
), ),
) )
) )
# Create and run agent # Create and run agent
agent = CustomAgent( agent = CustomAgent(
task=task, task=task,
@@ -366,9 +368,10 @@ async def run_custom_agent(
browser_context=_global_browser_context, browser_context=_global_browser_context,
controller=controller, controller=controller,
system_prompt_class=CustomSystemPrompt, system_prompt_class=CustomSystemPrompt,
agent_prompt_class=CustomAgentMessagePrompt,
max_actions_per_step=max_actions_per_step,
-tool_call_in_content=tool_call_in_content,
-agent_state=_global_agent_state
+agent_state=_global_agent_state,
+tool_calling_method=tool_calling_method
) )
history = await agent.run(max_steps=max_steps) history = await agent.run(max_steps=max_steps)
@@ -421,7 +424,7 @@ async def run_with_stream(
max_steps, max_steps,
use_vision, use_vision,
max_actions_per_step, max_actions_per_step,
tool_call_in_content tool_calling_method
): ):
global _global_agent_state global _global_agent_state
stream_vw = 80 stream_vw = 80
@@ -449,7 +452,7 @@ async def run_with_stream(
max_steps=max_steps, max_steps=max_steps,
use_vision=use_vision, use_vision=use_vision,
max_actions_per_step=max_actions_per_step, max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content tool_calling_method=tool_calling_method
) )
# Add HTML content at the start of the result array # Add HTML content at the start of the result array
html_content = f"<h1 style='width:{stream_vw}vw; height:{stream_vh}vh'>Using browser...</h1>" html_content = f"<h1 style='width:{stream_vw}vw; height:{stream_vh}vh'>Using browser...</h1>"
@@ -481,7 +484,7 @@ async def run_with_stream(
max_steps=max_steps, max_steps=max_steps,
use_vision=use_vision, use_vision=use_vision,
max_actions_per_step=max_actions_per_step, max_actions_per_step=max_actions_per_step,
tool_call_in_content=tool_call_in_content tool_calling_method=tool_calling_method
) )
) )
@@ -535,6 +538,12 @@ async def run_with_stream(
try: try:
result = await agent_task result = await agent_task
final_result, errors, model_actions, model_thoughts, latest_videos, trace, history_file, stop_button, run_button = result final_result, errors, model_actions, model_thoughts, latest_videos, trace, history_file, stop_button, run_button = result
except gr.Error:
final_result = ""
model_actions = ""
model_thoughts = ""
latest_videos = trace = history_file = None
except Exception as e: except Exception as e:
errors = f"Agent error: {str(e)}" errors = f"Agent error: {str(e)}"
@@ -607,18 +616,8 @@ def create_ui(config, theme_name="Ocean"):
} }
""" """
js = """
function refresh() {
const url = new URL(window.location);
if (url.searchParams.get('__theme') !== 'dark') {
url.searchParams.set('__theme', 'dark');
window.location.href = url.href;
}
}
"""
with gr.Blocks(
-title="Browser Use WebUI", theme=theme_map[theme_name], css=css, js=js
+title="Browser Use WebUI", theme=theme_map[theme_name], css=css
) as demo:
with gr.Row(): with gr.Row():
gr.Markdown( gr.Markdown(
@@ -638,32 +637,38 @@ def create_ui(config, theme_name="Ocean"):
value=config['agent_type'],
info="Select the type of agent to use",
)
-max_steps = gr.Slider(
-    minimum=1,
-    maximum=200,
-    value=config['max_steps'],
-    step=1,
-    label="Max Run Steps",
-    info="Maximum number of steps the agent will take",
-)
-max_actions_per_step = gr.Slider(
-    minimum=1,
-    maximum=20,
-    value=config['max_actions_per_step'],
-    step=1,
-    label="Max Actions per Step",
-    info="Maximum number of actions the agent will take per step",
-)
-use_vision = gr.Checkbox(
-    label="Use Vision",
-    value=config['use_vision'],
-    info="Enable visual processing capabilities",
-)
-tool_call_in_content = gr.Checkbox(
-    label="Use Tool Calls in Content",
-    value=config['tool_call_in_content'],
-    info="Enable Tool Calls in content",
-)
+with gr.Column():
+    max_steps = gr.Slider(
+        minimum=1,
+        maximum=200,
+        value=config['max_steps'],
+        step=1,
+        label="Max Run Steps",
+        info="Maximum number of steps the agent will take",
+    )
+    max_actions_per_step = gr.Slider(
+        minimum=1,
+        maximum=20,
+        value=config['max_actions_per_step'],
+        step=1,
+        label="Max Actions per Step",
+        info="Maximum number of actions the agent will take per step",
+    )
+with gr.Column():
+    use_vision = gr.Checkbox(
+        label="Use Vision",
+        value=config['use_vision'],
+        info="Enable visual processing capabilities",
+    )
+    tool_calling_method = gr.Dropdown(
+        label="Tool Calling Method",
+        value=config['tool_calling_method'],
+        interactive=True,
+        allow_custom_value=True, # Allow users to input custom model names
+        choices=["auto", "json_schema", "function_calling"],
+        info="Tool Calls Funtion Name",
+        visible=False
+    )
with gr.TabItem("🔧 LLM Configuration", id=2): with gr.TabItem("🔧 LLM Configuration", id=2):
with gr.Group(): with gr.Group():
@@ -813,7 +818,7 @@ def create_ui(config, theme_name="Ocean"):
fn=update_ui_from_config, fn=update_ui_from_config,
inputs=[config_file_input], inputs=[config_file_input],
outputs=[ outputs=[
agent_type, max_steps, max_actions_per_step, use_vision, tool_call_in_content, agent_type, max_steps, max_actions_per_step, use_vision, tool_calling_method,
llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key, llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
use_own_browser, keep_browser_open, headless, disable_security, enable_recording, use_own_browser, keep_browser_open, headless, disable_security, enable_recording,
window_w, window_h, save_recording_path, save_trace_path, save_agent_history_path, window_w, window_h, save_recording_path, save_trace_path, save_agent_history_path,
@@ -824,7 +829,7 @@ def create_ui(config, theme_name="Ocean"):
save_config_button.click( save_config_button.click(
fn=save_current_config, fn=save_current_config,
inputs=[ inputs=[
agent_type, max_steps, max_actions_per_step, use_vision, tool_call_in_content, agent_type, max_steps, max_actions_per_step, use_vision, tool_calling_method,
llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key, llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
use_own_browser, keep_browser_open, headless, disable_security, use_own_browser, keep_browser_open, headless, disable_security,
enable_recording, window_w, window_h, save_recording_path, save_trace_path, enable_recording, window_w, window_h, save_recording_path, save_trace_path,
@@ -876,7 +881,7 @@ def create_ui(config, theme_name="Ocean"):
agent_type, llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key, agent_type, llm_provider, llm_model_name, llm_temperature, llm_base_url, llm_api_key,
use_own_browser, keep_browser_open, headless, disable_security, window_w, window_h, use_own_browser, keep_browser_open, headless, disable_security, window_w, window_h,
save_recording_path, save_agent_history_path, save_trace_path, # Include the new path save_recording_path, save_agent_history_path, save_trace_path, # Include the new path
enable_recording, task, add_infos, max_steps, use_vision, max_actions_per_step, tool_call_in_content enable_recording, task, add_infos, max_steps, use_vision, max_actions_per_step, tool_calling_method
], ],
outputs=[ outputs=[
browser_view, # Browser view browser_view, # Browser view