chore: turn vision into capability (#679)

Fixes https://github.com/microsoft/playwright-mcp/issues/420
This commit is contained in:
Pavel Feldman
2025-07-16 16:40:00 -07:00
committed by GitHub
parent 012c906500
commit d61aa16fee
23 changed files with 366 additions and 575 deletions

351
README.md
View File

@@ -193,9 +193,8 @@ Playwright MCP server supports following arguments. They can be provided in the
--browser <browser> browser or chrome channel to use, possible
values: chrome, firefox, webkit, msedge.
--browser-agent <endpoint> Use browser agent (experimental).
--caps <caps> comma-separated list of capabilities to enable,
possible values: tabs, pdf, history, wait, files,
install. Default is all.
--caps <caps> comma-separated list of additional capabilities
to enable, possible values: vision, pdf.
--cdp-endpoint <endpoint> CDP endpoint to connect to.
--config <path> path to the configuration file.
--device <device> device to emulate, for example: "iPhone 15"
@@ -227,8 +226,6 @@ Playwright MCP server supports following arguments. They can be provided in the
specified, a temporary directory will be created.
--viewport-size <size> specify browser viewport size in pixels, for
example "1280, 720"
--vision Run server that uses screenshots (Aria snapshots
are used by default)
```
<!--- End of options generated section -->
@@ -329,21 +326,14 @@ npx @playwright/mcp@latest --config path/to/config.json
host?: string; // Host to bind to (default: localhost)
},
// List of enabled capabilities
// List of additional capabilities
capabilities?: Array<
'core' | // Core browser automation
'tabs' | // Tab management
'pdf' | // PDF generation
'history' | // Browser history
'wait' | // Wait utilities
'files' | // File handling
'install' | // Browser installation
'testing' // Testing
'pdf' | // PDF generation
'vision' | // Coordinate-based interactions
>;
// Enable vision mode (screenshots instead of accessibility snapshots)
vision?: boolean;
// Directory for output files
outputDir?: string;
@@ -433,42 +423,10 @@ http.createServer(async (req, res) => {
### Tools
The tools are available in two modes:
1. **Snapshot Mode** (default): Uses accessibility snapshots for better performance and reliability
2. **Vision Mode**: Uses screenshots for visual-based interactions
To use Vision Mode, add the `--vision` flag when starting the server:
```js
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": [
"@playwright/mcp@latest",
"--vision"
]
}
}
}
```
Vision Mode works best with the computer use models that are able to interact with elements using
X Y coordinate space, based on the provided screenshot.
<!--- Tools generated by update-readme.js -->
<details>
<summary><b>Interactions</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_snapshot**
- Title: Page snapshot
- Description: Capture accessibility snapshot of the current page, this is better than screenshot
- Parameters: None
- Read-only: **true**
<summary><b>Core automation</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
@@ -483,6 +441,22 @@ X Y coordinate space, based on the provided screenshot.
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_close**
- Title: Close browser
- Description: Close the page
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_console_messages**
- Title: Get console messages
- Description: Returns all console messages
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_drag**
- Title: Drag mouse
- Description: Perform drag and drop between two elements
@@ -495,60 +469,17 @@ X Y coordinate space, based on the provided screenshot.
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_hover**
- Title: Hover mouse
- Description: Hover over element on page
- **browser_evaluate**
- Title: Evaluate JavaScript
- Description: Evaluate JavaScript expression on page or element
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_type**
- Title: Type text
- Description: Type text into editable element
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `text` (string): Text to type into the element
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
- `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
- `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
- `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
- `ref` (string, optional): Exact target element reference from the page snapshot
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_select_option**
- Title: Select option
- Description: Select an option in a dropdown
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_press_key**
- Title: Press a key
- Description: Press a key on the keyboard
- Parameters:
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_wait_for**
- Title: Wait for
- Description: Wait for text to appear or disappear or a specified time to pass
- Parameters:
- `time` (number, optional): The time to wait in seconds
- `text` (string, optional): The text to wait for
- `textGone` (string, optional): The text to wait for to disappear
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_file_upload**
- Title: Upload files
- Description: Upload one or multiple files
@@ -566,10 +497,15 @@ X Y coordinate space, based on the provided screenshot.
- `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
- Read-only: **false**
</details>
<!-- NOTE: This has been generated via update-readme.js -->
<details>
<summary><b>Navigation</b></summary>
- **browser_hover**
- Title: Hover mouse
- Description: Hover over element on page
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
@@ -596,26 +532,51 @@ X Y coordinate space, based on the provided screenshot.
- Parameters: None
- Read-only: **true**
</details>
<!-- NOTE: This has been generated via update-readme.js -->
<details>
<summary><b>Evaluation</b></summary>
- **browser_network_requests**
- Title: List network requests
- Description: Returns all network requests since loading the page
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_evaluate**
- Title: Evaluate JavaScript
- Description: Evaluate JavaScript expression on page or element
- **browser_press_key**
- Title: Press a key
- Description: Press a key on the keyboard
- Parameters:
- `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
- `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
- `ref` (string, optional): Exact target element reference from the page snapshot
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- Read-only: **false**
</details>
<!-- NOTE: This has been generated via update-readme.js -->
<details>
<summary><b>Resources</b></summary>
- **browser_resize**
- Title: Resize browser window
- Description: Resize the browser window
- Parameters:
- `width` (number): Width of the browser window
- `height` (number): Height of the browser window
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_select_option**
- Title: Select option
- Description: Select an option in a dropdown
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_snapshot**
- Title: Page snapshot
- Description: Capture accessibility snapshot of the current page, this is better than screenshot
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
@@ -631,64 +592,41 @@ X Y coordinate space, based on the provided screenshot.
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_pdf_save**
- Title: Save as PDF
- Description: Save page as PDF
- **browser_type**
- Title: Type text
- Description: Type text into editable element
- Parameters:
- `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_network_requests**
- Title: List network requests
- Description: Returns all network requests since loading the page
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_console_messages**
- Title: Get console messages
- Description: Returns all console messages
- Parameters: None
- Read-only: **true**
</details>
<details>
<summary><b>Utilities</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_install**
- Title: Install the browser specified in the config
- Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
- Parameters: None
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `ref` (string): Exact target element reference from the page snapshot
- `text` (string): Text to type into the element
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
- `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_close**
- Title: Close browser
- Description: Close the page
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_resize**
- Title: Resize browser window
- Description: Resize the browser window
- **browser_wait_for**
- Title: Wait for
- Description: Wait for text to appear or disappear or a specified time to pass
- Parameters:
- `width` (number): Width of the browser window
- `height` (number): Height of the browser window
- `time` (number, optional): The time to wait in seconds
- `text` (string, optional): The text to wait for
- `textGone` (string, optional): The text to wait for to disappear
- Read-only: **true**
</details>
<details>
<summary><b>Tabs</b></summary>
<summary><b>Tab management</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_tab_close**
- Title: Close a tab
- Description: Close a tab
- Parameters:
- `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
@@ -716,44 +654,29 @@ X Y coordinate space, based on the provided screenshot.
- `index` (number): The index of the tab to select
- Read-only: **true**
</details>
<details>
<summary><b>Browser installation</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_tab_close**
- Title: Close a tab
- Description: Close a tab
- Parameters:
- `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
- **browser_install**
- Title: Install the browser specified in the config
- Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
- Parameters: None
- Read-only: **false**
</details>
<details>
<summary><b>Vision mode</b></summary>
<summary><b>Coordinate-based (opt-in via --caps=vision)</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_screen_capture**
- Title: Take a screenshot
- Description: Take a screenshot of the current page
- Parameters: None
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_screen_move_mouse**
- Title: Move mouse
- Description: Move mouse to a given position
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `x` (number): X coordinate
- `y` (number): Y coordinate
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_screen_click**
- **browser_mouse_click_xy**
- Title: Click
- Description: Click left mouse button
- Description: Click left mouse button at a given position
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `x` (number): X coordinate
@@ -762,9 +685,9 @@ X Y coordinate space, based on the provided screenshot.
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_screen_drag**
- **browser_mouse_drag_xy**
- Title: Drag mouse
- Description: Drag left mouse button
- Description: Drag left mouse button to a given position
- Parameters:
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `startX` (number): Start X coordinate
@@ -775,52 +698,28 @@ X Y coordinate space, based on the provided screenshot.
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_screen_type**
- Title: Type text
- Description: Type text
- **browser_mouse_move_xy**
- Title: Move mouse
- Description: Move mouse to a given position
- Parameters:
- `text` (string): Text to type into the element
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_press_key**
- Title: Press a key
- Description: Press a key on the keyboard
- Parameters:
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
- Read-only: **false**
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_wait_for**
- Title: Wait for
- Description: Wait for text to appear or disappear or a specified time to pass
- Parameters:
- `time` (number, optional): The time to wait in seconds
- `text` (string, optional): The text to wait for
- `textGone` (string, optional): The text to wait for to disappear
- `element` (string): Human-readable element description used to obtain permission to interact with the element
- `x` (number): X coordinate
- `y` (number): Y coordinate
- Read-only: **true**
<!-- NOTE: This has been generated via update-readme.js -->
</details>
- **browser_file_upload**
- Title: Upload files
- Description: Upload one or multiple files
- Parameters:
- `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
- Read-only: **false**
<details>
<summary><b>PDF generation (opt-in via --caps=pdf)</b></summary>
<!-- NOTE: This has been generated via update-readme.js -->
- **browser_handle_dialog**
- Title: Handle a dialog
- Description: Handle a dialog
- **browser_pdf_save**
- Title: Save as PDF
- Description: Save page as PDF
- Parameters:
- `accept` (boolean): Whether to accept the dialog.
- `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
- Read-only: **false**
- `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
- Read-only: **true**
</details>