FAQ

Frequently asked questions about WebdriverIO MCP.

General

What is MCP?

MCP (Model Context Protocol) is an open protocol that enables AI assistants like Claude to interact with external tools and services. WebdriverIO MCP implements this protocol to provide browser and mobile automation capabilities to Claude Desktop and Claude Code.

What can I automate with WebdriverIO MCP?

You can automate:

Desktop browsers (Chrome) - navigation, clicking, typing, screenshots
iOS apps - on simulators or real devices
Android apps - on emulators or real devices
Hybrid apps - switching between native and web contexts

Do I need to write code?

No! That's the main benefit of MCP. You can describe what you want to do in natural language, and Claude will use the appropriate tools to accomplish the task.

Example prompts:

"Open Chrome and navigate to webdriver.io"
"Click the Get Started button"
"Take a screenshot of the current page"
"Start my iOS app and log in as test user"

Installation & Setup

How do I install WebdriverIO MCP?

You don't need to install it separately. The MCP server runs automatically via npx when you configure it in Claude Desktop or Claude Code.

Add this to your Claude Desktop config:

{
    "mcpServers": {
        "wdio-mcp": {
            "command": "npx",
            "args": ["-y", "@wdio/mcp"]
        }
    }
}

Where is the Claude Desktop config file?

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

Do I need Appium for browser automation?

No. Browser automation only requires Chrome to be installed. WebdriverIO handles the ChromeDriver automatically.

Do I need Appium for mobile automation?

Yes. Mobile automation requires:

Appium server running (npm install -g appium && appium)
Platform drivers installed (appium driver install xcuitest for iOS, appium driver install uiautomator2 for Android)
Appropriate development tools (Xcode for iOS, Android SDK for Android)

Browser Automation

Which browsers are supported?

Currently, only Chrome is supported. Support for other browsers may be added in future versions.

Can I run Chrome in headless mode?

Yes! Ask Claude to start the browser in headless mode:

"Start Chrome in headless mode"

Or Claude will use this option when appropriate (e.g., in CI/CD contexts).

Can I set the browser window size?

Yes. You can specify dimensions when starting the browser:

"Start Chrome with a window size of 1920x1080"

Supported dimensions: 400-3840 pixels wide, 400-2160 pixels tall. Default is 1920x1080.

Can I start the browser and navigate in one step?

Yes! Use the navigationUrl parameter:

"Start Chrome and navigate to https://webdriver.io"

This is more efficient than starting the browser and then navigating separately.

How do I take screenshots?

Simply ask Claude:

"Take a screenshot of the current page"

Screenshots are automatically optimized:

Scaled to max 2000px dimension
Compressed to max 1MB file size
Format: PNG or JPEG (automatically selected for optimal quality)

Can I interact with iframes?

Currently, the MCP server operates on the main document. iframe interaction may be added in future versions.

Can I execute custom JavaScript?

Yes! Use the execute_script tool:

"Execute script to get the page title" "Execute script: return document.querySelectorAll('button').length"

Mobile Automation

How do I start an iOS app?

Ask Claude with the necessary details:

"Start my iOS app located at /path/to/MyApp.app on the iPhone 15 simulator"

Or for an installed app:

"Start the app with noReset enabled on the iPhone 15 simulator"

How do I start an Android app?

"Start my Android app at /path/to/app.apk on the Pixel 7 emulator"

Or for an installed app:

"Start the app with noReset enabled on the Pixel 7 emulator"

Can I test on real devices?

Yes! For real devices, you'll need the device UDID:

iOS: Connect device, open Finder, click device, click serial number to reveal UDID
Android: Run adb devices in terminal

Then ask Claude:

"Start my iOS app on the real device with UDID abc123..."

How do I handle permission dialogs?

By default, permissions are automatically granted (autoGrantPermissions: true). If you need to test permission flows, you can disable this:

"Start my app without automatically granting permissions"

What gestures are supported?

Tap: Tap on elements or coordinates
Swipe: Swipe up, down, left, or right
Drag and Drop: Drag from one element to another or to coordinates

Note: long_press is available through execute_script with Appium mobile commands.

How do I scroll in mobile apps?

Use swipe gestures:

"Swipe up to scroll down" "Swipe down to scroll up"

Can I rotate the device?

Yes:

"Rotate the device to landscape" "Rotate the device to portrait"

How do I handle hybrid apps?

For apps with webviews, you can switch contexts:

"Get available contexts" "Switch to the webview context" "Switch back to native context"

Can I execute Appium mobile commands?

Yes! Use the execute_script tool:

Execute script "mobile: pressKey" with args [{ keycode: 4 }]  // Press BACK on Android
Execute script "mobile: activateApp" with args [{ appId: "com.example.app" }]
Execute script "mobile: terminateApp" with args [{ bundleId: "com.example.app" }]

Element Selection

How does Claude know which element to interact with?

Claude uses the get_visible_elements tool to identify interactive elements on the page/screen. Each element comes with multiple selector strategies.

What if there are too many elements on the page?

Use pagination to manage large element lists:

"Get the first 20 visible elements" "Get visible elements with offset 20 and limit 20"

The response includes total, showing, and hasMore to help navigate through elements.

Can I get only specific types of elements?

Yes! Use the elementType parameter:

interactable (default): Buttons, links, inputs
visual: Images, SVGs
all: Both types

"Get visible visual elements on the page"

What if Claude clicks the wrong element?

You can be more specific:

Provide exact text: "Click the button that says 'Submit Order'"
Provide selector: "Click the element with selector #submit-btn"
Provide accessibility ID: "Click the element with accessibility ID loginButton"

What's the best selector strategy for mobile?

Accessibility ID (best) - ~loginButton
Resource ID (Android) - id=login_button
Predicate String (iOS) - -ios predicate string:label == "Login"
XPath (last resort) - slower but works everywhere

What is the accessibility tree and when should I use it?

The accessibility tree provides semantic information about page elements (roles, names, states). Use get_accessibility when:

get_visible_elements doesn't return expected elements
You need to find elements by accessibility role (button, link, textbox, etc.)
You need detailed semantic information about elements

"Get accessibility tree filtered to button and link roles"

Session Management

Can I have multiple sessions at once?

No. The MCP server uses a single-session model. Only one browser or app session can be active at a time.

What happens when I close a session?

It depends on the session type and settings:

Browser: Chrome closes completely
Mobile with noReset: false: App terminates
Mobile with noReset: true or no appPath: App stays open (session detaches automatically)

Can I preserve app state between sessions?

Yes! Use the noReset option:

"Start my app with noReset enabled"

This preserves login state, preferences, and other app data.

What's the difference between close and detach?

Close: Terminates the browser/app completely
Detach: Disconnects automation but keeps browser/app running

Detach is useful when you want to manually inspect the state after automation.

My session keeps timing out during debugging

Increase the command timeout:

"Start my app with newCommandTimeout of 300 seconds"

Default is 60 seconds. For long debugging sessions, try 300-600 seconds.

Troubleshooting

"Session not found" error

This means no active session exists. Start a browser or app session first:

"Start Chrome and navigate to google.com"

"Element not found" error

The element might not be visible or might have a different selector. Try:

Asking Claude to get all visible elements first
Providing a more specific selector
Waiting for the page/app to fully load
Using inViewportOnly: false to find off-screen elements

Browser won't start

Ensure Chrome is installed
Check if another process is using the debugging port (9222)
Try headless mode

Appium connection failed

This is the most common issue when starting mobile automation.

Verify Appium is running: curl http://localhost:4723/status
Start Appium if needed: appium
Check your Appium URL configuration matches the server
Ensure drivers are installed: appium driver list --installed

tip

The MCP server requires Appium to be running before starting mobile sessions. Make sure to start Appium first:

appium

Future versions may include automatic Appium service management.

iOS Simulator won't start

Ensure Xcode is installed: xcode-select --install
List available simulators: xcrun simctl list devices
Check for specific simulator errors in Console.app

Android Emulator won't start

Set ANDROID_HOME: export ANDROID_HOME=$HOME/Library/Android/sdk
Check emulators: emulator -list-avds
Start emulator manually: emulator -avd <avd-name>
Verify device is connected: adb devices

Screenshots aren't working

For mobile, ensure the session is active
For browser, try a different page (some pages block screenshots)
Check Claude Desktop logs for errors

Screenshots are automatically compressed to max 1MB, so large screenshots will work but may be lower quality.

Performance

Why is mobile automation slow?

Mobile automation involves:

Network communication with Appium server
Appium communicating with the device/simulator
Device rendering and response

Tips for faster automation:

Use emulators/simulators instead of real devices for development
Use accessibility IDs instead of XPath
Enable inViewportOnly: true for element detection
Use pagination (limit) to reduce token usage

How can I speed up element detection?

The MCP server already optimizes element detection using XML page source parsing (2 HTTP calls vs 600+ for traditional element queries). Additional tips:

Keep inViewportOnly: true (default)
Set includeContainers: false (default)
Use limit and offset for pagination on large screens
Use specific selectors instead of finding all elements

Screenshots are slow or failing

Screenshots are automatically optimized:

Resized if larger than 2000px
Compressed to stay under 1MB
Converted to JPEG if PNG is too large

This optimization reduces processing time and ensures Claude can handle the image.

Limitations

What are the current limitations?

Single session: Only one browser/app at a time
Browser support: Chrome only (for now)
iframe support: Limited support for iframes
File uploads: Not directly supported via tools
Audio/Video: Cannot interact with media playback
Browser extensions: Not supported

Can I use this for production testing?

WebdriverIO MCP is designed for interactive AI-assisted automation. For production CI/CD testing, consider using WebdriverIO's traditional test runner with full programmatic control.

Security

Is my data secure?

The MCP server runs locally on your machine. All automation happens through local browser/Appium connections. No data is sent to external servers beyond what you explicitly navigate to.

Can Claude access my passwords?

Claude can see page content and interact with elements, but:

Passwords in <input type="password"> fields are masked
You should avoid automating sensitive credentials
Use test accounts for automation

Contributing

How can I contribute?

Visit the GitHub repository to:

Report bugs
Request features
Submit pull requests

General​

What is MCP?​

What can I automate with WebdriverIO MCP?​

Do I need to write code?​

Installation & Setup​

How do I install WebdriverIO MCP?​

Where is the Claude Desktop config file?​

Do I need Appium for browser automation?​

Do I need Appium for mobile automation?​

Browser Automation​

Which browsers are supported?​

Can I run Chrome in headless mode?​

Can I set the browser window size?​

Can I start the browser and navigate in one step?​

How do I take screenshots?​

Can I interact with iframes?​

Can I execute custom JavaScript?​

Mobile Automation​

How do I start an iOS app?​

How do I start an Android app?​

Can I test on real devices?​

How do I handle permission dialogs?​

What gestures are supported?​

How do I scroll in mobile apps?​

Can I rotate the device?​

How do I handle hybrid apps?​

Can I execute Appium mobile commands?​

Element Selection​

How does Claude know which element to interact with?​

What if there are too many elements on the page?​

Can I get only specific types of elements?​

What if Claude clicks the wrong element?​

What's the best selector strategy for mobile?​

What is the accessibility tree and when should I use it?​

Session Management​

Can I have multiple sessions at once?​

What happens when I close a session?​

Can I preserve app state between sessions?​

What's the difference between close and detach?​

My session keeps timing out during debugging​

Troubleshooting​

"Session not found" error​

"Element not found" error​

Browser won't start​

Appium connection failed​

iOS Simulator won't start​

Android Emulator won't start​

Screenshots aren't working​

Performance​

Why is mobile automation slow?​

How can I speed up element detection?​

Screenshots are slow or failing​

Limitations​

What are the current limitations?​

Can I use this for production testing?​

Security​

Is my data secure?​

Can Claude access my passwords?​

Contributing​

How can I contribute?​

Where can I get help?​