chore: turn vision into capability (#679)

Fixes https://github.com/microsoft/playwright-mcp/issues/420
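
Migration sketch: since this change removes the `--vision` flag and makes `vision` a value of `--caps`, existing launch arguments need a small rewrite. The helper below is hypothetical and not part of `@playwright/mcp`; the name `migrateVisionFlag` is an assumption made for illustration. It only rewrites the flag and does not check that other `--caps` values are still accepted.

```js
// Hypothetical helper (not shipped with @playwright/mcp): rewrites a server
// argument list from the removed `--vision` flag to `--caps=vision`,
// merging with any capabilities that were already requested.
function migrateVisionFlag(args) {
  const caps = [];
  const rest = [];
  let sawVision = false;
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === '--vision') {
      // Drop the old flag and remember that vision was requested.
      sawVision = true;
    } else if (arg.startsWith('--caps=')) {
      caps.push(...arg.slice('--caps='.length).split(',').filter(Boolean));
    } else if (arg === '--caps' && i + 1 < args.length) {
      // Space-separated form: the next argument holds the list.
      caps.push(...args[++i].split(',').filter(Boolean));
    } else {
      rest.push(arg);
    }
  }
  if (sawVision && !caps.includes('vision'))
    caps.push('vision');
  // Emit a single combined --caps argument at the end, if any caps remain.
  return caps.length ? [...rest, `--caps=${caps.join(',')}`] : rest;
}

console.log(migrateVisionFlag(['@playwright/mcp@latest', '--vision']));
```

For an MCP client config, this amounts to replacing `"--vision"` with `"--caps=vision"` in the `args` array. Note that, per the README changes below, `vision` now adds coordinate-based tools alongside the core snapshot tools instead of switching the server into a separate mode.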
2025-10-12 00:25:14 +03:00 · 2025-07-16 16:40:00 -07:00
parent 012c906500
commit d61aa16fee
23 changed files with 366 additions and 575 deletions
--- a/README.md
+++ b/README.md
@@ -193,9 +193,8 @@ Playwright MCP server supports following arguments. They can be provided in the
  --browser <browser>          browser or chrome channel to use, possible
                               values: chrome, firefox, webkit, msedge.
  --browser-agent <endpoint>   Use browser agent (experimental).
-  --caps <caps>                comma-separated list of capabilities to enable,
-                               possible values: tabs, pdf, history, wait, files,
-                               install. Default is all.
+  --caps <caps>                comma-separated list of additional capabilities
+                               to enable, possible values: vision, pdf.
  --cdp-endpoint <endpoint>    CDP endpoint to connect to.
  --config <path>              path to the configuration file.
  --device <device>            device to emulate, for example: "iPhone 15"
@@ -227,8 +226,6 @@ Playwright MCP server supports following arguments. They can be provided in the
                               specified, a temporary directory will be created.
  --viewport-size <size>       specify browser viewport size in pixels, for
                               example "1280, 720"
-  --vision                     Run server that uses screenshots (Aria snapshots
-                               are used by default)
 ```

 <!--- End of options generated section -->
@@ -329,21 +326,14 @@ npx @playwright/mcp@latest --config path/to/config.json
    host?: string;  // Host to bind to (default: localhost)
  },

-  // List of enabled capabilities
+  // List of additional capabilities
  capabilities?: Array<
-    'core' |    // Core browser automation
    'tabs' |    // Tab management
-    'pdf' |     // PDF generation
-    'history' | // Browser history
-    'wait' |    // Wait utilities
-    'files' |   // File handling
    'install' | // Browser installation
-    'testing'   // Testing
+    'pdf' |     // PDF generation
+    'vision' |  // Coordinate-based interactions
  >;

-  // Enable vision mode (screenshots instead of accessibility snapshots)
-  vision?: boolean;
-
  // Directory for output files
  outputDir?: string;

@@ -433,42 +423,10 @@ http.createServer(async (req, res) => {

 ### Tools

-The tools are available in two modes:
-
-1. **Snapshot Mode** (default): Uses accessibility snapshots for better performance and reliability
-2. **Vision Mode**: Uses screenshots for visual-based interactions
-
-To use Vision Mode, add the `--vision` flag when starting the server:
-
-```js
-{
-  "mcpServers": {
-    "playwright": {
-      "command": "npx",
-      "args": [
-        "@playwright/mcp@latest",
-        "--vision"
-      ]
-    }
-  }
-}
-```
-
-Vision Mode works best with the computer use models that are able to interact with elements using
-X Y coordinate space, based on the provided screenshot.
-
 <!--- Tools generated by update-readme.js -->

 <details>
-<summary><b>Interactions</b></summary>
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_snapshot**
-  - Title: Page snapshot
-  - Description: Capture accessibility snapshot of the current page, this is better than screenshot
-  - Parameters: None
-  - Read-only: **true**
+<summary><b>Core automation</b></summary>

 <!-- NOTE: This has been generated via update-readme.js -->

@@ -483,6 +441,22 @@ X Y coordinate space, based on the provided screenshot.

 <!-- NOTE: This has been generated via update-readme.js -->

+- **browser_close**
+  - Title: Close browser
+  - Description: Close the page
+  - Parameters: None
+  - Read-only: **true**
+
+<!-- NOTE: This has been generated via update-readme.js -->
+
+- **browser_console_messages**
+  - Title: Get console messages
+  - Description: Returns all console messages
+  - Parameters: None
+  - Read-only: **true**
+
+<!-- NOTE: This has been generated via update-readme.js -->
+
 - **browser_drag**
  - Title: Drag mouse
  - Description: Perform drag and drop between two elements
@@ -495,60 +469,17 @@ X Y coordinate space, based on the provided screenshot.

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_hover**
-  - Title: Hover mouse
-  - Description: Hover over element on page
+- **browser_evaluate**
+  - Title: Evaluate JavaScript
+  - Description: Evaluate JavaScript expression on page or element
  - Parameters:
-    - `element` (string): Human-readable element description used to obtain permission to interact with the element
-    - `ref` (string): Exact target element reference from the page snapshot
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_type**
-  - Title: Type text
-  - Description: Type text into editable element
-  - Parameters:
-    - `element` (string): Human-readable element description used to obtain permission to interact with the element
-    - `ref` (string): Exact target element reference from the page snapshot
-    - `text` (string): Text to type into the element
-    - `submit` (boolean, optional): Whether to submit entered text (press Enter after)
-    - `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
+    - `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
+    - `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
+    - `ref` (string, optional): Exact target element reference from the page snapshot
  - Read-only: **false**

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_select_option**
-  - Title: Select option
-  - Description: Select an option in a dropdown
-  - Parameters:
-    - `element` (string): Human-readable element description used to obtain permission to interact with the element
-    - `ref` (string): Exact target element reference from the page snapshot
-    - `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
-  - Read-only: **false**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_press_key**
-  - Title: Press a key
-  - Description: Press a key on the keyboard
-  - Parameters:
-    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
-  - Read-only: **false**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_wait_for**
-  - Title: Wait for
-  - Description: Wait for text to appear or disappear or a specified time to pass
-  - Parameters:
-    - `time` (number, optional): The time to wait in seconds
-    - `text` (string, optional): The text to wait for
-    - `textGone` (string, optional): The text to wait for to disappear
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
 - **browser_file_upload**
  - Title: Upload files
  - Description: Upload one or multiple files
@@ -566,10 +497,15 @@ X Y coordinate space, based on the provided screenshot.
    - `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
  - Read-only: **false**

-</details>
+<!-- NOTE: This has been generated via update-readme.js -->

-<details>
-<summary><b>Navigation</b></summary>
+- **browser_hover**
+  - Title: Hover mouse
+  - Description: Hover over element on page
+  - Parameters:
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `ref` (string): Exact target element reference from the page snapshot
+  - Read-only: **true**

 <!-- NOTE: This has been generated via update-readme.js -->

@@ -596,26 +532,51 @@ X Y coordinate space, based on the provided screenshot.
  - Parameters: None
  - Read-only: **true**

-</details>
+<!-- NOTE: This has been generated via update-readme.js -->

-<details>
-<summary><b>Evaluation</b></summary>
+- **browser_network_requests**
+  - Title: List network requests
+  - Description: Returns all network requests since loading the page
+  - Parameters: None
+  - Read-only: **true**

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_evaluate**
-  - Title: Evaluate JavaScript
-  - Description: Evaluate JavaScript expression on page or element
+- **browser_press_key**
+  - Title: Press a key
+  - Description: Press a key on the keyboard
  - Parameters:
-    - `function` (string): () => { /* code */ } or (element) => { /* code */ } when element is provided
-    - `element` (string, optional): Human-readable element description used to obtain permission to interact with the element
-    - `ref` (string, optional): Exact target element reference from the page snapshot
+    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
  - Read-only: **false**

-</details>
+<!-- NOTE: This has been generated via update-readme.js -->

-<details>
-<summary><b>Resources</b></summary>
+- **browser_resize**
+  - Title: Resize browser window
+  - Description: Resize the browser window
+  - Parameters:
+    - `width` (number): Width of the browser window
+    - `height` (number): Height of the browser window
+  - Read-only: **true**
+
+<!-- NOTE: This has been generated via update-readme.js -->
+
+- **browser_select_option**
+  - Title: Select option
+  - Description: Select an option in a dropdown
+  - Parameters:
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `ref` (string): Exact target element reference from the page snapshot
+    - `values` (array): Array of values to select in the dropdown. This can be a single value or multiple values.
+  - Read-only: **false**
+
+<!-- NOTE: This has been generated via update-readme.js -->
+
+- **browser_snapshot**
+  - Title: Page snapshot
+  - Description: Capture accessibility snapshot of the current page, this is better than screenshot
+  - Parameters: None
+  - Read-only: **true**

 <!-- NOTE: This has been generated via update-readme.js -->

@@ -631,64 +592,41 @@ X Y coordinate space, based on the provided screenshot.

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_pdf_save**
-  - Title: Save as PDF
-  - Description: Save page as PDF
+- **browser_type**
+  - Title: Type text
+  - Description: Type text into editable element
  - Parameters:
-    - `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_network_requests**
-  - Title: List network requests
-  - Description: Returns all network requests since loading the page
-  - Parameters: None
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_console_messages**
-  - Title: Get console messages
-  - Description: Returns all console messages
-  - Parameters: None
-  - Read-only: **true**
-
-</details>
-
-<details>
-<summary><b>Utilities</b></summary>
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_install**
-  - Title: Install the browser specified in the config
-  - Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
-  - Parameters: None
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `ref` (string): Exact target element reference from the page snapshot
+    - `text` (string): Text to type into the element
+    - `submit` (boolean, optional): Whether to submit entered text (press Enter after)
+    - `slowly` (boolean, optional): Whether to type one character at a time. Useful for triggering key handlers in the page. By default entire text is filled in at once.
  - Read-only: **false**

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_close**
-  - Title: Close browser
-  - Description: Close the page
-  - Parameters: None
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_resize**
-  - Title: Resize browser window
-  - Description: Resize the browser window
+- **browser_wait_for**
+  - Title: Wait for
+  - Description: Wait for text to appear or disappear or a specified time to pass
  - Parameters:
-    - `width` (number): Width of the browser window
-    - `height` (number): Height of the browser window
+    - `time` (number, optional): The time to wait in seconds
+    - `text` (string, optional): The text to wait for
+    - `textGone` (string, optional): The text to wait for to disappear
  - Read-only: **true**

 </details>

 <details>
-<summary><b>Tabs</b></summary>
+<summary><b>Tab management</b></summary>
+
+<!-- NOTE: This has been generated via update-readme.js -->
+
+- **browser_tab_close**
+  - Title: Close a tab
+  - Description: Close a tab
+  - Parameters:
+    - `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
+  - Read-only: **false**

 <!-- NOTE: This has been generated via update-readme.js -->

@@ -716,44 +654,29 @@ X Y coordinate space, based on the provided screenshot.
    - `index` (number): The index of the tab to select
  - Read-only: **true**

+</details>
+
+<details>
+<summary><b>Browser installation</b></summary>
+
 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_tab_close**
-  - Title: Close a tab
-  - Description: Close a tab
-  - Parameters:
-    - `index` (number, optional): The index of the tab to close. Closes current tab if not provided.
+- **browser_install**
+  - Title: Install the browser specified in the config
+  - Description: Install the browser specified in the config. Call this if you get an error about the browser not being installed.
+  - Parameters: None
  - Read-only: **false**

 </details>

 <details>
-<summary><b>Vision mode</b></summary>
+<summary><b>Coordinate-based (opt-in via --caps=vision)</b></summary>

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_screen_capture**
-  - Title: Take a screenshot
-  - Description: Take a screenshot of the current page
-  - Parameters: None
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_screen_move_mouse**
-  - Title: Move mouse
-  - Description: Move mouse to a given position
-  - Parameters:
-    - `element` (string): Human-readable element description used to obtain permission to interact with the element
-    - `x` (number): X coordinate
-    - `y` (number): Y coordinate
-  - Read-only: **true**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_screen_click**
+- **browser_mouse_click_xy**
  - Title: Click
-  - Description: Click left mouse button
+  - Description: Click left mouse button at a given position
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `x` (number): X coordinate
@@ -762,9 +685,9 @@ X Y coordinate space, based on the provided screenshot.

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_screen_drag**
+- **browser_mouse_drag_xy**
  - Title: Drag mouse
-  - Description: Drag left mouse button
+  - Description: Drag left mouse button to a given position
  - Parameters:
    - `element` (string): Human-readable element description used to obtain permission to interact with the element
    - `startX` (number): Start X coordinate
@@ -775,52 +698,28 @@ X Y coordinate space, based on the provided screenshot.

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_screen_type**
-  - Title: Type text
-  - Description: Type text
+- **browser_mouse_move_xy**
+  - Title: Move mouse
+  - Description: Move mouse to a given position
  - Parameters:
-    - `text` (string): Text to type into the element
-    - `submit` (boolean, optional): Whether to submit entered text (press Enter after)
-  - Read-only: **false**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_press_key**
-  - Title: Press a key
-  - Description: Press a key on the keyboard
-  - Parameters:
-    - `key` (string): Name of the key to press or a character to generate, such as `ArrowLeft` or `a`
-  - Read-only: **false**
-
-<!-- NOTE: This has been generated via update-readme.js -->
-
- **browser_wait_for**
-  - Title: Wait for
-  - Description: Wait for text to appear or disappear or a specified time to pass
-  - Parameters:
-    - `time` (number, optional): The time to wait in seconds
-    - `text` (string, optional): The text to wait for
-    - `textGone` (string, optional): The text to wait for to disappear
+    - `element` (string): Human-readable element description used to obtain permission to interact with the element
+    - `x` (number): X coordinate
+    - `y` (number): Y coordinate
  - Read-only: **true**

-<!-- NOTE: This has been generated via update-readme.js -->
+</details>

- **browser_file_upload**
-  - Title: Upload files
-  - Description: Upload one or multiple files
-  - Parameters:
-    - `paths` (array): The absolute paths to the files to upload. Can be a single file or multiple files.
-  - Read-only: **false**
+<details>
+<summary><b>PDF generation (opt-in via --caps=pdf)</b></summary>

 <!-- NOTE: This has been generated via update-readme.js -->

- **browser_handle_dialog**
-  - Title: Handle a dialog
-  - Description: Handle a dialog
+- **browser_pdf_save**
+  - Title: Save as PDF
+  - Description: Save page as PDF
  - Parameters:
-    - `accept` (boolean): Whether to accept the dialog.
-    - `promptText` (string, optional): The text of the prompt in case of a prompt dialog.
-  - Read-only: **false**
+    - `filename` (string, optional): File name to save the pdf to. Defaults to `page-{timestamp}.pdf` if not specified.
+  - Read-only: **true**

 </details>