Real browser docs

magmueller
2025-01-25 12:05:51 -08:00
parent b437e1d8fe
commit 29f8e0b7a8
6 changed files with 69 additions and 25 deletions

@@ -16,7 +16,7 @@ from langchain_openai import ChatOpenAI
agent = Agent(
task="Search for latest news about AI",
llm=ChatOpenAI(model="gpt-4"),
llm=ChatOpenAI(model="gpt-4o"),
)
```
@@ -33,7 +33,7 @@ Control how the agent operates:
agent = Agent(
task="your task",
llm=llm,
controller=custom_controller, # Custom function registry
controller=custom_controller, # For custom tool calling
use_vision=True, # Enable vision capabilities
save_conversation_path="logs/conversation.json" # Save chat logs
)

@@ -10,13 +10,14 @@ Functions can be either `sync` or `async`. Keep them focused and single-purpose.
```python
from browser_use.controller.service import Controller
from browser_use import ActionResult
# Initialize the controller
controller = Controller()
@controller.action('Ask user for information')
def ask_human(question: str, display_question: bool) -> str:
return input(f'\n{question}\nInput: ')
def ask_human(question: str) -> str:
answer = input(f'\n{question}\nInput: ')
return ActionResult(extracted_content=answer)
```
<Note>
@@ -50,6 +51,7 @@ from browser_use.browser.service import Browser
async def open_website(url: str, browser: Browser):
page = browser.get_current_page()
await page.goto(url)
return ActionResult(extracted_content='Website opened')
```
## Structured Parameters with Pydantic
@@ -110,3 +112,8 @@ await agent2.run()
The controller is stateless and can be used to register multiple actions and
multiple agents.
</Note>
For more examples like file upload or notifications, visit [examples/custom-functions](https://github.com/browser-use/browser-use/tree/main/examples/custom-functions).

@@ -0,0 +1,53 @@
---
title: "Connect to a Real Browser"
description: "With this you can connect to your real browser, where you are logged in with all your accounts."
icon: "message"
---
## Overview
You can connect the agent to your real Chrome browser instance, allowing it to access your existing browser profile with all your logged-in accounts and settings. This is particularly useful when you want the agent to interact with services where you're already authenticated.
<Note>
First make sure to close all running Chrome instances.
</Note>
## Basic Configuration
To connect to your real Chrome browser, you'll need to specify the path to your Chrome executable when creating the Browser instance:
```python
from browser_use import Agent, Browser, BrowserConfig
from langchain_openai import ChatOpenAI
import asyncio
# Configure the browser to connect to your Chrome instance
browser = Browser(
config=BrowserConfig(
# Specify the path to your Chrome executable
chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', # macOS path
# For Windows, typically: 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
# For Linux, typically: '/usr/bin/google-chrome'
)
)
# Create the agent with your configured browser
agent = Agent(
task="Your task here",
llm=ChatOpenAI(model='gpt-4o'),
browser=browser,
)
async def main():
await agent.run()
input('Press Enter to close the browser...')
await browser.close()
if __name__ == '__main__':
asyncio.run(main())
```
<Note>
When using your real browser, the agent will have access to all your logged-in sessions. Make sure to review the task you're giving to the agent and ensure it aligns with your security requirements.
</Note>
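The example above hard-codes the macOS path and lists the Windows and Linux locations only in comments. As a minimal sketch, assuming those three typical install locations are correct for your machine, the path could be chosen at runtime instead. The helper `default_chrome_path` and the `DEFAULT_CHROME_PATHS` table are hypothetical names introduced for illustration; they are not part of `browser_use`.

```python
import platform
from typing import Optional

# Typical Chrome locations, copied from the comments in the example above.
# These are assumptions about a default install, not guaranteed paths.
DEFAULT_CHROME_PATHS = {
    'Darwin': '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    'Windows': 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe',
    'Linux': '/usr/bin/google-chrome',
}


def default_chrome_path(system: Optional[str] = None) -> str:
    """Return the typical Chrome executable path for an OS name.

    `system` uses the values reported by platform.system()
    ('Darwin', 'Windows', 'Linux'); it defaults to the current OS.
    """
    system = system or platform.system()
    try:
        return DEFAULT_CHROME_PATHS[system]
    except KeyError:
        # Fail loudly instead of guessing a path on an unknown platform.
        raise ValueError(f'No default Chrome path known for {system!r}') from None
```

The result can then be passed as `chrome_instance_path=default_chrome_path()` when building the `BrowserConfig`. If Chrome is installed somewhere else, pass the real path directly instead.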

@@ -6,7 +6,7 @@ icon: "message"
## Overview
You can customize the system prompt by extending the `SystemPrompt` class. Internally, this adds extra instructions to the default system prompt (which is general and quite optimized at this point).
You can customize the system prompt by extending the `SystemPrompt` class. Internally, this adds extra instructions to the default system prompt.
<Note>
Custom system prompts allow you to modify the agent's behavior at a
@@ -36,17 +36,6 @@ class MySystemPrompt(SystemPrompt):
return f'{existing_rules}\n{new_rules}'
```
The `important_rules()` are written in format like this. Keeping the format
consistent helps with redundancy and readability.
```text
8. ACTION SEQUENCING:
- Actions are executed in the order they appear in the list
- ...
```
<Note>Try not to override other methods unless you have a specific need.</Note>
## Using Custom System Prompt
Apply your custom system prompt when creating an agent:

@@ -50,6 +50,7 @@
"customize/agent-settings",
"customize/browser-settings",
"customize/system-prompt",
"customize/real-browser",
"customize/custom-functions"
]
},

@@ -15,22 +15,16 @@ from browser_use.browser.context import BrowserContext
browser = Browser(
config=BrowserConfig(
headless=False,
# NOTE: you need to close your chrome browser - so that this can open your browser in debug mode
chrome_instance_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
)
)
controller = Controller()
async def main():
task = f'In docs.google.com write my Papa a quick thank you for everything letter \n - Magnus'
task += f' and save the document as pdf'
model = ChatOpenAI(model='gpt-4o')
agent = Agent(
task=task,
llm=model,
controller=controller,
task='In docs.google.com write my Papa a quick letter',
llm=ChatOpenAI(model='gpt-4o'),
browser=browser,
)