mirror of
https://github.com/microsoft/playwright-mcp.git
synced 2025-10-12 00:25:14 +03:00
chore: turn vision into capability (#679)
Fixes https://github.com/microsoft/playwright-mcp/issues/420
This commit is contained in:
351
README.md
351
README.md
@@ -193,9 +193,8 @@ Playwright MCP server supports following arguments. They can be provided in the
|
||||
--browser <browser> browser or chrome channel to use, possible
|
||||
values: chrome, firefox, webkit, msedge.
|
||||
--browser-agent <endpoint> Use browser agent (experimental).
|
||||
--caps <caps> comma-separated list of capabilities to enable,
|
||||
possible values: tabs, pdf, history, wait, files,
|
||||
install. Default is all.
|
||||
--caps <caps> comma-separated list of additional capabilities
|
||||
to enable, possible values: vision, pdf.
|
||||
--cdp-endpoint <endpoint> CDP endpoint to connect to.
|
||||
--config <path> path to the configuration file.
|
||||
--device <device> device to emulate, for example: "iPhone 15"
|
||||
@@ -227,8 +226,6 @@ Playwright MCP server supports following arguments. They can be provided in the
|
||||
specified, a temporary directory will be created.
|
||||
--viewport-size <size> specify browser viewport size in pixels, for
|
||||
example "1280, 720"
|
||||
--vision Run server that uses screenshots (Aria snapshots
|
||||
are used by default)
|
||||
```
|
||||
|
||||
<!--- End of options generated section -->
|
||||
@@ -329,21 +326,14 @@ npx @playwright/mcp@latest --config path/to/config.json
|
||||
host?: string; // Host to bind to (default: localhost)
|
||||
},
|
||||
|
||||
// List of enabled capabilities
|
||||
// List of additional capabilities
|
||||
capabilities?: Array<
|
||||
'core' | // Core browser automation
|
||||
'tabs' | // Tab management
|
||||
'pdf' | // PDF generation
|
||||
'history' | // Browser history
|
||||
'wait' | // Wait utilities
|
||||
'files' | // File handling
|
||||
'install' | // Browser installation
|
||||
'testing' // Testing
|
||||
'pdf' | // PDF generation
|
||||
'vision' | // Coordinate-based interactions
|
||||
>;
|
||||
|
||||
// Enable vision mode (screenshots instead of accessibility snapshots)
|
||||
vision?: boolean;
|
||||
|
||||
// Directory for output files
|
||||
outputDir?: string;
|
||||
|
||||
@@ -433,42 +423,10 @@ http.createServer(async (req, res) => {
|
||||
|
||||
### Tools
|
||||
|
||||
The tools are available in two modes:
|
||||
|
||||
1. **Snapshot Mode** (default): Uses accessibility snapshots for better performance and reliability
|
||||
2. **Vision Mode**: Uses screenshots for visual-based interactions
|
||||
|
||||
To use Vision Mode, add the `--vision` flag when starting the server:
|
||||
|
||||
```js
|
||||
{
|
||||
"mcpServers": {
|
||||
"playwright": {
|
||||
"command": "npx",
|
||||
"args": [
|
||||
"@playwright/mcp@latest",
|
||||
"--vision"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Vision Mode works best with the computer use models that are able to interact with elements using
|
||||
X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!--- Tools generated by update-readme.js -->
|
||||
|
||||
<details>
|
||||
<summary><b>Interactions</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_snapshot**
|
||||
- Title: Page snapshot
|
||||
- Description: Capture accessibility snapshot of the current page, this is better than screenshot
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
<summary><b>Core automation</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
@@ -483,6 +441,22 @@ X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_close**
|
||||
- Title: Close browser
|
||||
- Description: Close the page
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_console_messages**
|
||||
- Title: Get console messages
|
||||
- Description: Returns all console messages
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_drag**
|
||||
- Title: Drag mouse
|
||||
- Description: Perform drag and drop between two elements
|
||||
@@ -495,60 +469,17 @@ X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_hover**
|
||||
- Title: Hover mouse
|
||||
- Description: Hover over element on page
|
||||
- **browser_evaluate**
|
||||
- Title: Evaluate JavaScript
|
||||
- Description: Evaluate JavaScript expression on page or element
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_type**
|
||||
- Title: Type text
|
||||
- Description: Type text into editable element
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- `text` (string): Text to type into the element
|
||||
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
|
||||
- `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
|
||||
- `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
|
||||
- `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string, optional): Exact target element reference from the page snapshot
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_select_option**
|
||||
- Title: Select option
|
||||
- Description: Select an option in a dropdown
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_press_key**
|
||||
- Title: Press a key
|
||||
- Description: Press a key on the keyboard
|
||||
- Parameters:
|
||||
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_wait_for**
|
||||
- Title: Wait for
|
||||
- Description: Wait for text to appear or disappear or a specified time to pass
|
||||
- Parameters:
|
||||
- `time` (number, optional): The time to wait in seconds
|
||||
- `text` (string, optional): The text to wait for
|
||||
- `textGone` (string, optional): The text to wait for to disappear
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_file_upload**
|
||||
- Title: Upload files
|
||||
- Description: Upload one or multiple files
|
||||
@@ -566,10 +497,15 @@ X Y coordinate space, based on the provided screenshot.
|
||||
- `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
|
||||
- Read-only: **false**
|
||||
|
||||
</details>
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
<details>
|
||||
<summary><b>Navigation</b></summary>
|
||||
- **browser_hover**
|
||||
- Title: Hover mouse
|
||||
- Description: Hover over element on page
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
@@ -596,26 +532,51 @@ X Y coordinate space, based on the provided screenshot.
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
</details>
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
<details>
|
||||
<summary><b>Evaluation</b></summary>
|
||||
- **browser_network_requests**
|
||||
- Title: List network requests
|
||||
- Description: Returns all network requests since loading the page
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_evaluate**
|
||||
- Title: Evaluate JavaScript
|
||||
- Description: Evaluate JavaScript expression on page or element
|
||||
- **browser_press_key**
|
||||
- Title: Press a key
|
||||
- Description: Press a key on the keyboard
|
||||
- Parameters:
|
||||
- `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
|
||||
- `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string, optional): Exact target element reference from the page snapshot
|
||||
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
|
||||
- Read-only: **false**
|
||||
|
||||
</details>
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
<details>
|
||||
<summary><b>Resources</b></summary>
|
||||
- **browser_resize**
|
||||
- Title: Resize browser window
|
||||
- Description: Resize the browser window
|
||||
- Parameters:
|
||||
- `width` (number): Width of the browser window
|
||||
- `height` (number): Height of the browser window
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_select_option**
|
||||
- Title: Select option
|
||||
- Description: Select an option in a dropdown
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_snapshot**
|
||||
- Title: Page snapshot
|
||||
- Description: Capture accessibility snapshot of the current page, this is better than screenshot
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
@@ -631,64 +592,41 @@ X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_pdf_save**
|
||||
- Title: Save as PDF
|
||||
- Description: Save page as PDF
|
||||
- **browser_type**
|
||||
- Title: Type text
|
||||
- Description: Type text into editable element
|
||||
- Parameters:
|
||||
- `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_network_requests**
|
||||
- Title: List network requests
|
||||
- Description: Returns all network requests since loading the page
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_console_messages**
|
||||
- Title: Get console messages
|
||||
- Description: Returns all console messages
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>Utilities</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_install**
|
||||
- Title: Install the browser specified in the config
|
||||
- Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
|
||||
- Parameters: None
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `ref` (string): Exact target element reference from the page snapshot
|
||||
- `text` (string): Text to type into the element
|
||||
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
|
||||
- `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_close**
|
||||
- Title: Close browser
|
||||
- Description: Close the page
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_resize**
|
||||
- Title: Resize browser window
|
||||
- Description: Resize the browser window
|
||||
- **browser_wait_for**
|
||||
- Title: Wait for
|
||||
- Description: Wait for text to appear or disappear or a specified time to pass
|
||||
- Parameters:
|
||||
- `width` (number): Width of the browser window
|
||||
- `height` (number): Height of the browser window
|
||||
- `time` (number, optional): The time to wait in seconds
|
||||
- `text` (string, optional): The text to wait for
|
||||
- `textGone` (string, optional): The text to wait for to disappear
|
||||
- Read-only: **true**
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>Tabs</b></summary>
|
||||
<summary><b>Tab management</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_tab_close**
|
||||
- Title: Close a tab
|
||||
- Description: Close a tab
|
||||
- Parameters:
|
||||
- `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
@@ -716,44 +654,29 @@ X Y coordinate space, based on the provided screenshot.
|
||||
- `index` (number): The index of the tab to select
|
||||
- Read-only: **true**
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>Browser installation</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_tab_close**
|
||||
- Title: Close a tab
|
||||
- Description: Close a tab
|
||||
- Parameters:
|
||||
- `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
|
||||
- **browser_install**
|
||||
- Title: Install the browser specified in the config
|
||||
- Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
|
||||
- Parameters: None
|
||||
- Read-only: **false**
|
||||
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary><b>Vision mode</b></summary>
|
||||
<summary><b>Coordinate-based (opt-in via --caps=vision)</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_screen_capture**
|
||||
- Title: Take a screenshot
|
||||
- Description: Take a screenshot of the current page
|
||||
- Parameters: None
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_screen_move_mouse**
|
||||
- Title: Move mouse
|
||||
- Description: Move mouse to a given position
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `x` (number): X coordinate
|
||||
- `y` (number): Y coordinate
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_screen_click**
|
||||
- **browser_mouse_click_xy**
|
||||
- Title: Click
|
||||
- Description: Click left mouse button
|
||||
- Description: Click left mouse button at a given position
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `x` (number): X coordinate
|
||||
@@ -762,9 +685,9 @@ X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_screen_drag**
|
||||
- **browser_mouse_drag_xy**
|
||||
- Title: Drag mouse
|
||||
- Description: Drag left mouse button
|
||||
- Description: Drag left mouse button to a given position
|
||||
- Parameters:
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `startX` (number): Start X coordinate
|
||||
@@ -775,52 +698,28 @@ X Y coordinate space, based on the provided screenshot.
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_screen_type**
|
||||
- Title: Type text
|
||||
- Description: Type text
|
||||
- **browser_mouse_move_xy**
|
||||
- Title: Move mouse
|
||||
- Description: Move mouse to a given position
|
||||
- Parameters:
|
||||
- `text` (string): Text to type into the element
|
||||
- `submit` (boolean, optional): Whether to submit entered text (press Enter after)
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_press_key**
|
||||
- Title: Press a key
|
||||
- Description: Press a key on the keyboard
|
||||
- Parameters:
|
||||
- `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
|
||||
- Read-only: **false**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_wait_for**
|
||||
- Title: Wait for
|
||||
- Description: Wait for text to appear or disappear or a specified time to pass
|
||||
- Parameters:
|
||||
- `time` (number, optional): The time to wait in seconds
|
||||
- `text` (string, optional): The text to wait for
|
||||
- `textGone` (string, optional): The text to wait for to disappear
|
||||
- `element` (string): Human-readable element description used to obtain permission to interact with the element
|
||||
- `x` (number): X coordinate
|
||||
- `y` (number): Y coordinate
|
||||
- Read-only: **true**
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
</details>
|
||||
|
||||
- **browser_file_upload**
|
||||
- Title: Upload files
|
||||
- Description: Upload one or multiple files
|
||||
- Parameters:
|
||||
- `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
|
||||
- Read-only: **false**
|
||||
<details>
|
||||
<summary><b>PDF generation (opt-in via --caps=pdf)</b></summary>
|
||||
|
||||
<!-- NOTE: This has been generated via update-readme.js -->
|
||||
|
||||
- **browser_handle_dialog**
|
||||
- Title: Handle a dialog
|
||||
- Description: Handle a dialog
|
||||
- **browser_pdf_save**
|
||||
- Title: Save as PDF
|
||||
- Description: Save page as PDF
|
||||
- Parameters:
|
||||
- `accept` (boolean): Whether to accept the dialog.
|
||||
- `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
|
||||
- Read-only: **false**
|
||||
- `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
|
||||
- Read-only: **true**
|
||||
|
||||
</details>
|
||||
|
||||
|
||||
Reference in New Issue
Block a user