MCP (Model Context Protocol)
What can it do?
WebdriverIO MCP is a Model Context Protocol (MCP) server that enables AI assistants like Claude Desktop and Claude Code to automate and interact with web browsers and mobile applications.
Why WebdriverIO MCP?
- Mobile-First: Unlike browser-only MCP servers, WebdriverIO MCP supports iOS and Android native app automation via Appium
- Cross-Platform Selectors: Smart element detection generates multiple locator strategies (accessibility ID, XPath, UiAutomator, iOS predicates) automatically
- WebdriverIO Ecosystem: Built on the battle-tested WebdriverIO framework with its rich ecosystem of services and reporters
Through the @wdio/mcp package, it provides a unified interface for:
- 🖥️ Desktop Browsers (Chrome, in headed or headless mode)
- 📱 Native Mobile Apps (iOS Simulators / Android Emulators / Real Devices via Appium)
- 📳 Hybrid Mobile Apps (Native + WebView context switching via Appium)
This allows AI assistants to:
- Launch and control browsers with configurable dimensions, headless mode, and optional initial navigation
- Navigate websites and interact with elements (click, type, scroll)
- Analyze page content via accessibility tree and visible elements detection with pagination support
- Take screenshots that are automatically optimized (resized and compressed to at most 1 MB)
- Manage cookies for session handling
- Control mobile devices including gestures (tap, swipe, drag and drop)
- Switch contexts in hybrid apps between native and webview
- Execute scripts - JavaScript in browsers, Appium mobile commands on devices
- Handle device features like rotation, keyboard, geolocation
- And much more; see the Tools and Configuration options
NOTE (For Mobile Apps): Mobile automation requires a running Appium server with the appropriate drivers installed. See Prerequisites for setup instructions.
Installation
The easiest way to use @wdio/mcp is via npx without any local installation:
npx @wdio/mcp
Or install it globally:
npm install -g @wdio/mcp
Usage with Claude
To use WebdriverIO MCP with Claude Desktop, add the server to its configuration file (claude_desktop_config.json):
{
"mcpServers": {
"wdio-mcp": {
"command": "npx",
"args": ["-y", "@wdio/mcp"]
}
}
}
After adding the configuration, restart Claude. The WebdriverIO MCP tools will be available for browser and mobile automation tasks.
Usage with Claude Code
Claude Code automatically detects MCP servers. You can configure the server in your project's .claude/settings.json or .mcp.json.
Or add it globally to .claude.json by running:
claude mcp add --transport stdio wdio-mcp -- npx -y @wdio/mcp
Validate it by running the /mcp command inside Claude Code.
Quick Start Examples
Browser Automation
Ask Claude to automate browser tasks:
"Open Chrome and navigate to https://webdriver.io"
"Click the 'Get Started' button"
"Take a screenshot of the page"
"Find all visible links on the page"
Mobile App Automation
Ask Claude to automate mobile apps:
"Start my iOS app on the iPhone 15 simulator"
"Tap the login button"
"Swipe up to scroll down"
"Take a screenshot of the current screen"
Capabilities
Browser Automation (Chrome)
| Feature | Description |
|---|---|
| Session Management | Launch Chrome in headed/headless mode with custom dimensions and optional navigation URL |
| Navigation | Navigate to URLs |
| Element Interaction | Click elements, type text, find elements by various selectors |
| Page Analysis | Get visible elements (with pagination), accessibility tree (with filtering) |
| Screenshots | Capture screenshots (auto-optimized to max 1MB) |
| Scrolling | Scroll up/down by configurable pixel amounts |
| Cookie Management | Get, set, and delete cookies |
| Script Execution | Execute custom JavaScript in browser context |
Mobile App Automation (iOS/Android)
| Feature | Description |
|---|---|
| Session Management | Launch apps on simulators, emulators, or real devices |
| Touch Gestures | Tap, swipe, drag and drop |
| Element Detection | Smart element detection with multiple locator strategies and pagination |
| App Lifecycle | Get app state; activate or terminate apps via execute_script |
| Context Switching | Switch between native and webview contexts in hybrid apps |
| Device Control | Rotate device, keyboard control |
| Geolocation | Get and set device GPS coordinates |
| Permissions | Automatic permission and alert handling |
| Script Execution | Execute Appium mobile commands (pressKey, deepLink, shell, etc.) |
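The Context Switching row relies on Appium's naming convention, where the native context is called NATIVE_APP and webview contexts start with WEBVIEW. A minimal sketch of choosing a webview from a context list follows; the helper name `pickWebviewContext` and the example package name are hypothetical and not part of @wdio/mcp.

```typescript
// Hypothetical helper: choose a webview context from the names Appium reports.
// Appium names the native context "NATIVE_APP" and webviews "WEBVIEW_<suffix>".
function pickWebviewContext(
  contexts: readonly string[],
  preferred?: string,
): string | undefined {
  const webviews = contexts.filter((name) => name.startsWith("WEBVIEW"));
  if (preferred !== undefined) {
    // Match on a substring such as the app's package name.
    return webviews.find((name) => name.includes(preferred));
  }
  // Without a preference, fall back to the first webview (if any).
  return webviews[0];
}

// "com.example.app" is a made-up package name used only for illustration.
const exampleContexts = ["NATIVE_APP", "WEBVIEW_com.example.app"];
const chosenContext = pickWebviewContext(exampleContexts);
```

An assistant would typically list contexts first, pick one in this fashion, and then pass the chosen name to the context-switching tool.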
Prerequisites
Browser Automation
- Chrome must be installed on your system
- WebdriverIO downloads and manages ChromeDriver automatically
Mobile Automation
iOS
- Install Xcode from the Mac App Store
- Install Xcode Command Line Tools:
xcode-select --install
- Install Appium:
npm install -g appium
- Install the XCUITest driver:
appium driver install xcuitest
- Start the Appium server:
appium
- For Simulators: Open Xcode → Window → Devices and Simulators to create/manage simulators
- For Real Devices: You'll need the device UDID (40-character unique identifier)
Android
- Install Android Studio and set up Android SDK
- Set environment variables:
export ANDROID_HOME=$HOME/Library/Android/sdk
export PATH=$PATH:$ANDROID_HOME/emulator
export PATH=$PATH:$ANDROID_HOME/platform-tools
- Install Appium:
npm install -g appium
- Install the UiAutomator2 driver:
appium driver install uiautomator2
- Start the Appium server:
appium
- Create an emulator via Android Studio → Virtual Device Manager
- Start the emulator before running tests
Architecture
How It Works
WebdriverIO MCP acts as a bridge between AI assistants and browser/mobile automation:
┌─────────────────┐     MCP Protocol     ┌─────────────────┐
│  Claude Desktop │ ◄──────────────────► │    @wdio/mcp    │
│  or Claude Code │       (stdio)        │     Server      │
└─────────────────┘                      └────────┬────────┘
                                                  │
                                           WebdriverIO API
                                                  │
                   ┌──────────────────────────────┼──────────────────────────────┐
                   │                              │                              │
           ┌───────▼───────┐              ┌───────▼───────┐              ┌───────▼───────┐
           │    Chrome     │              │    Appium     │              │    Appium     │
           │   (Browser)   │              │     (iOS)     │              │   (Android)   │
           └───────────────┘              └───────────────┘              └───────────────┘
Session Management
- Single-session model: Only one browser OR app session can be active at a time
- Session state is maintained globally across tool calls
- Auto-detach: Sessions with preserved state (noReset: true) automatically detach on close
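The single-session model above can be sketched roughly as follows. This is an illustration, not the @wdio/mcp source: the names `SingleSessionStore` and `SessionHandle` are hypothetical, closing the previous session when a new one starts is an assumption, and everything is synchronous for brevity where a real server would await WebDriver calls.

```typescript
// Illustrative sketch only; names and replace-on-start behavior are assumptions.
type SessionKind = "browser" | "app";

interface SessionHandle {
  kind: SessionKind;
  noReset: boolean;   // true when app state should be preserved
  close(): void;      // terminates the underlying browser or app session
}

class SingleSessionStore {
  private current: SessionHandle | null = null;

  // Starting a new session first ends whichever session is active,
  // so at most one browser OR app session exists at any time.
  start(session: SessionHandle): void {
    this.end();
    this.current = session;
  }

  get active(): SessionHandle | null {
    return this.current;
  }

  // Returns how the session ended: "detached" leaves the app running,
  // "closed" terminates it, "none" means nothing was active.
  end(): "detached" | "closed" | "none" {
    const session = this.current;
    if (session === null) return "none";
    this.current = null;
    if (session.noReset) return "detached";
    session.close();
    return "closed";
  }
}
```

Keeping the handle in one place is what lets every tool call operate on the same session without passing an identifier around.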
Element Detection
Browser (Web)
- Uses an optimized browser script to find all visible, interactable elements
- Returns elements with CSS selectors, IDs, classes, and ARIA information
- Filters to viewport-visible elements by default
Mobile (Native Apps)
- Uses efficient XML page source parsing (2 HTTP calls vs 600+ for traditional queries)
- Platform-specific element classification for Android and iOS
- Generates multiple locator strategies per element:
- Accessibility ID (cross-platform, most stable)
- Resource ID / Name attribute
- Text / Label matching
- XPath (full and simplified)
- UiAutomator (Android) / Predicates (iOS)
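To make the idea of multiple locator strategies per element concrete, here is a simplified sketch that derives candidate selectors from a parsed element. The `ParsedElement` shape and `buildLocators` function are hypothetical, the ordering is an assumption based on the list above, and attribute values are not escaped, so this is not the server's actual algorithm.

```typescript
// Hypothetical element shape, as it might be extracted from the XML page source.
interface ParsedElement {
  platform: "android" | "ios";
  className: string;         // e.g. android.widget.Button or XCUIElementTypeButton
  accessibilityId?: string;  // content-desc (Android) or accessibility identifier (iOS)
  resourceId?: string;       // Android resource-id
  name?: string;             // iOS name attribute
  text?: string;             // Android text or iOS label
}

// Returns candidate selectors, most stable first. Values are inserted
// verbatim; a real implementation would need to escape quotes.
function buildLocators(el: ParsedElement): string[] {
  const locators: string[] = [];
  if (el.accessibilityId) locators.push(`~${el.accessibilityId}`);
  if (el.platform === "android") {
    if (el.resourceId) {
      locators.push(`android=new UiSelector().resourceId("${el.resourceId}")`);
    }
    if (el.text) {
      locators.push(`android=new UiSelector().text("${el.text}")`);
      locators.push(`//${el.className}[@text="${el.text}"]`);
    }
  } else {
    if (el.name) locators.push(`-ios predicate string:name == "${el.name}"`);
    if (el.text) {
      locators.push(`-ios predicate string:label == "${el.text}"`);
      locators.push(`//${el.className}[@label="${el.text}"]`);
    }
  }
  return locators;
}
```

Returning several candidates lets the assistant fall back to a less stable strategy, such as XPath, when no accessibility ID is present.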
Selector Syntax
The MCP server supports multiple selector strategies. See Selectors for detailed documentation.
Web (CSS/XPath)
# CSS Selectors
button.my-class
#element-id
[data-testid="login"]
# XPath
//button[@class='submit']
//a[contains(text(), 'Click')]
# Text Selectors (WebdriverIO specific)
button=Exact Button Text
a*=Partial Link Text
Mobile (Cross-Platform)
# Accessibility ID (recommended - works on iOS & Android)
~loginButton
# Android UiAutomator
android=new UiSelector().text("Login")
# iOS Predicate String
-ios predicate string:label == "Login"
# iOS Class Chain
-ios class chain:**/XCUIElementTypeButton[`label == "Login"`]
# XPath (works on both platforms)
//android.widget.Button[@text="Login"]
//XCUIElementTypeButton[@label="Login"]
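The prefixes in the examples above are what distinguish one strategy from another. The following sketch classifies a selector string by those prefixes; `classifySelector` is a hypothetical helper that covers only the forms shown here and is deliberately simpler than WebdriverIO's real selector resolution.

```typescript
type SelectorStrategy =
  | "accessibility id"
  | "android uiautomator"
  | "ios predicate string"
  | "ios class chain"
  | "xpath"
  | "partial text"
  | "text"
  | "css";

// Simplified classifier based only on the selector forms listed above.
function classifySelector(selector: string): SelectorStrategy {
  if (selector.startsWith("~")) return "accessibility id";
  if (selector.startsWith("android=")) return "android uiautomator";
  if (selector.startsWith("-ios predicate string:")) return "ios predicate string";
  if (selector.startsWith("-ios class chain:")) return "ios class chain";
  if (selector.startsWith("/") || selector.startsWith("(")) return "xpath";
  // Check "tag*=text" before "tag=text" so the partial form is not misread.
  if (/^[a-zA-Z][\w-]*\*=/.test(selector) || selector.startsWith("*=")) return "partial text";
  if (/^[a-zA-Z][\w-]*=/.test(selector) || selector.startsWith("=")) return "text";
  // Everything else is treated as a CSS selector.
  return "css";
}
```

Note that attribute selectors such as [data-testid="login"] contain an equals sign but start with a bracket, so they still fall through to CSS.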
Available Tools
The MCP server provides 25 tools for browser and mobile automation. See Tools for the complete reference.
Browser Tools
| Tool | Description |
|---|---|
| start_browser | Launch Chrome browser (with optional initial URL) |
| close_session | Close or detach from session |
| navigate | Navigate to a URL |
| click_element | Click an element |
| set_value | Type text into input |
| get_visible_elements | Get visible/interactable elements (with pagination) |
| get_accessibility | Get accessibility tree (with filtering) |
| take_screenshot | Capture screenshot (auto-optimized) |
| scroll | Scroll the page up or down |
| get_cookies / set_cookie / delete_cookies | Cookie management |
| execute_script | Execute JavaScript in browser |
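Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests, sent as newline-delimited JSON over stdio. The sketch below builds such messages for a few of the tools above; the tool names come from the table, but the argument names (`headless`, `url`) are assumptions, so check the Tools reference for the real schemas.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Builds a request for the named tool; each call gets a fresh id.
function buildToolCall(
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Argument names below are illustrative assumptions, not the documented schema.
const exampleRequests: ToolCallRequest[] = [
  buildToolCall("start_browser", { headless: true }),
  buildToolCall("navigate", { url: "https://webdriver.io" }),
  buildToolCall("take_screenshot"),
];

// Over the stdio transport, each message is written as one line of JSON.
const wireLines = exampleRequests.map((request) => JSON.stringify(request) + "\n");
```

In practice the AI assistant's MCP client produces these messages for you; the sketch only shows what travels over the connection.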
Mobile Tools
| Tool | Description |
|---|---|
| start_app_session | Launch iOS/Android app |
| tap_element | Tap element or coordinates |
| swipe | Swipe in a direction |
| drag_and_drop | Drag between locations |
| get_app_state | Check if app is running |
| get_contexts / switch_context | Hybrid app context switching |
| rotate_device | Rotate to portrait/landscape |
| get_geolocation / set_geolocation | Get or set GPS coordinates |
| hide_keyboard | Dismiss on-screen keyboard |
| execute_script | Execute Appium mobile commands |
Automatic Handling
Permissions
By default, the MCP server automatically grants app permissions (autoGrantPermissions: true), eliminating the need to manually handle permission dialogs during automation.
System Alerts
System alerts (like "Allow notifications?") are automatically accepted by default (autoAcceptAlerts: true). This can be configured to dismiss instead with autoDismissAlerts: true.
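As a rough illustration of how these defaults could translate into Appium capabilities, the sketch below maps the three options onto `appium:`-prefixed capability names. The function `toAppiumCapabilities` is hypothetical, and letting a dismiss request override the accept default is an assumption about how the two alert options interact.

```typescript
interface AutoHandlingOptions {
  autoGrantPermissions?: boolean;
  autoAcceptAlerts?: boolean;
  autoDismissAlerts?: boolean;
}

// Hypothetical mapping from the documented options to Appium capabilities.
function toAppiumCapabilities(
  options: AutoHandlingOptions = {},
): Record<string, boolean> {
  const autoDismiss = options.autoDismissAlerts ?? false;
  // Assumption: accepting and dismissing are mutually exclusive,
  // so an explicit dismiss request turns auto-accept off.
  const autoAccept = autoDismiss ? false : (options.autoAcceptAlerts ?? true);
  return {
    "appium:autoGrantPermissions": options.autoGrantPermissions ?? true,
    "appium:autoAcceptAlerts": autoAccept,
    "appium:autoDismissAlerts": autoDismiss,
  };
}

const defaultCapabilities = toAppiumCapabilities();
const dismissCapabilities = toAppiumCapabilities({ autoDismissAlerts: true });
```

In Appium itself, autoGrantPermissions is a UiAutomator2 (Android) capability, while autoAcceptAlerts and autoDismissAlerts belong to the XCUITest (iOS) driver.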
Configuration
Environment Variables
Configure the Appium server connection:
| Variable | Default | Description |
|---|---|---|
| APPIUM_URL | 127.0.0.1 | Appium server hostname |
| APPIUM_URL_PORT | 4723 | Appium server port |
| APPIUM_PATH | / | Appium server path |
Example with Custom Appium Server
{
"mcpServers": {
"wdio-mcp": {
"command": "npx",
"args": ["-y", "@wdio/mcp"],
"env": {
"APPIUM_URL": "192.168.1.100",
"APPIUM_URL_PORT": "4724"
}
}
}
}
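For readers curious how such variables might be consumed, here is a minimal sketch that resolves the connection settings with the documented defaults. The variable names and defaults come from the table above; the function `resolveAppiumConnection` and its port validation are illustrative assumptions, not the server's actual code.

```typescript
interface AppiumConnection {
  hostname: string;
  port: number;
  path: string;
}

// Hypothetical resolver: reads the three variables and falls back to
// the documented defaults (127.0.0.1, 4723, /).
function resolveAppiumConnection(
  env: Record<string, string | undefined>,
): AppiumConnection {
  const hostname = env.APPIUM_URL?.trim() || "127.0.0.1";
  const rawPort = env.APPIUM_URL_PORT?.trim() || "4723";
  const port = Number(rawPort);
  // Assumption: reject anything that is not a valid TCP port.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid APPIUM_URL_PORT: ${rawPort}`);
  }
  const path = env.APPIUM_PATH?.trim() || "/";
  return { hostname, port, path };
}

// Mirrors the JSON example above: custom host and port, default path.
const customConnection = resolveAppiumConnection({
  APPIUM_URL: "192.168.1.100",
  APPIUM_URL_PORT: "4724",
});
```

In a Node.js process you would pass `process.env` as the argument; a plain object is used here to keep the sketch self-contained.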
Performance Optimization
The MCP server is optimized for efficient AI assistant communication:
- TOON Format: Uses Token-Oriented Object Notation for minimal token usage
- XML Parsing: Mobile element detection uses 2 HTTP calls (vs 600+ traditionally)
- Screenshot Compression: Images auto-compressed to max 1MB using Sharp
- Viewport Filtering: Only visible elements returned by default
- Pagination: Large element lists can be paginated to reduce response size
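The pagination point can be illustrated with a small offset/limit helper. The names `paginate`, `offset`, and `limit`, as well as the default page size of 50, are assumptions chosen for this sketch; the actual tool parameters are listed in the Tools reference.

```typescript
interface Page<T> {
  items: T[];
  total: number;
  offset: number;
  limit: number;
  hasMore: boolean;
}

// Returns one slice of a larger list plus enough metadata
// for the caller to request the next slice.
function paginate<T>(all: readonly T[], offset = 0, limit = 50): Page<T> {
  const start = Math.max(0, Math.floor(offset));
  const size = Math.max(1, Math.floor(limit));
  const items = all.slice(start, start + size);
  return {
    items,
    total: all.length,
    offset: start,
    limit: size,
    hasMore: start + items.length < all.length,
  };
}

// Example: 120 made-up element labels split into pages of 50.
const allElements = Array.from({ length: 120 }, (_, i) => `element-${i}`);
const firstPage = paginate(allElements);
const lastPage = paginate(allElements, 100);
```

Returning `total` and `hasMore` with each page lets an assistant decide whether another request is worth the extra tokens.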
TypeScript Support
The MCP server is written in TypeScript and includes full type definitions. If you're extending or integrating with the server programmatically, you'll benefit from auto-completion and type safety.
Error Handling
All tools are designed with robust error handling:
- Errors are returned as text content (never thrown), maintaining MCP protocol stability
- Descriptive error messages help diagnose issues
- Session state is preserved even when individual operations fail
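The pattern of returning errors as text content instead of throwing can be sketched with a small wrapper. The result shape follows the MCP tool result format (a `content` array of text items plus an optional `isError` flag); the wrapper `safeTool` is hypothetical, whether the server sets `isError` is an assumption, and the handlers are synchronous here for brevity.

```typescript
// MCP-style tool result containing text content.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Wraps a handler so that any thrown error becomes a text result.
function safeTool<A>(
  name: string,
  handler: (args: A) => string,
): (args: A) => ToolResult {
  return (args) => {
    try {
      return { content: [{ type: "text", text: handler(args) }] };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // The error is reported to the client instead of crashing the server.
      return {
        content: [{ type: "text", text: `Error in ${name}: ${message}` }],
        isError: true,
      };
    }
  };
}

// Two illustrative handlers: one succeeds, one always fails.
const echoTool = safeTool("echo", (args: { text: string }) => args.text);
const failingTool = safeTool("click_element", (_args: { selector: string }) => {
  throw new Error("No active session");
});
```

Because the wrapper never rethrows, a failed call leaves the server process and its session state untouched, and the assistant can read the message and retry.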
Use Cases
Quality Assurance
- AI-powered test case execution
- Visual regression testing with screenshots
- Accessibility auditing via accessibility tree analysis
Web Scraping & Data Extraction
- Navigate complex multi-page flows
- Extract structured data from dynamic content
- Handle authentication and session management
Mobile App Testing
- Cross-platform test automation (iOS + Android)
- Onboarding flow validation
- Deep linking and navigation testing
Integration Testing
- End-to-end workflow testing
- API + UI integration verification
- Multi-platform consistency checks
Troubleshooting
Browser won't start
- Ensure Chrome is installed
- Check that no other process is using the default debugging port (9222)
- Try headless mode if display issues occur
Appium connection failed
- Verify Appium server is running (appium)
- Check the Appium URL and port configuration
- Ensure the appropriate driver is installed (appium driver list)
iOS Simulator issues
- Ensure Xcode is installed and up to date
- Check that simulators are available (xcrun simctl list devices)
- For real devices, verify the UDID is correct
Android Emulator issues
- Ensure Android SDK is properly configured
- Verify emulator is running (adb devices)
- Check that the ANDROID_HOME environment variable is set
Resources
- Tools Reference - Complete list of available tools
- Selectors Guide - Selector syntax documentation
- Configuration - Configuration options
- FAQ - Frequently asked questions
- GitHub Repository - Source code and issues
- NPM Package - Package on npm
- Model Context Protocol - MCP specification