
OCR Testing

Automated testing on mobile native apps and desktop sites can be particularly challenging when dealing with elements that lack unique identifiers. Standard WebdriverIO selectors may not always help you. Enter the world of the @wdio/ocr-service, a powerful service that leverages OCR (Optical Character Recognition) to search, wait for, and interact with on-screen elements based on their visible text.

The service adds custom commands to the browser/driver object, giving you the right toolset to do your job.

How does it work?

This service will:

  1. Create a screenshot of your screen/device. (If needed, you can provide a haystack, which can be an element or a rectangle object, to pinpoint a specific area. See the documentation for each command.)
  2. Optimize the screenshot for OCR by converting it into a high-contrast black/white image. (The high contrast is needed to prevent a lot of image background noise. It can be customized per command.)
  3. Use Optical Character Recognition from Tesseract.js/Tesseract to get all text from the screen and highlight all found text on an image. It supports several languages, which can be found here.
  4. Use fuzzy logic from Fuse.js to find strings that are approximately equal to a given pattern (rather than exactly equal). This means, for example, that the search value Username can also find the text Usename, or vice versa.
  5. Provide a CLI wizard (npx ocr-service) to validate your images and retrieve text through your terminal.
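
To make the approximate matching in step 4 more concrete, here is a minimal, self-contained sketch. It is not how Fuse.js is implemented and it does not call Fuse.js; it swaps in a plain Levenshtein edit distance to show the idea. The names `editDistance`, `fuzzyScore` and `findBestMatch`, as well as the `0.3` threshold, are hypothetical and chosen only for illustration. The score follows the convention that 0 means an exact match and 1 means a complete mismatch.

```typescript
// Illustrative sketch only: a Levenshtein-based stand-in for fuzzy matching.
// All function names and the default threshold are assumptions, not service APIs.

/** Number of single-character inserts, deletes or substitutions to turn a into b. */
function editDistance(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 characters of a and the first j of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

/** Case-insensitive score in [0, 1]; 0 is an exact match, 1 is a complete mismatch. */
function fuzzyScore(pattern: string, candidate: string): number {
  const p = pattern.toLowerCase();
  const c = candidate.toLowerCase();
  const longest = Math.max(p.length, c.length);
  return longest === 0 ? 0 : editDistance(p, c) / longest;
}

/** Return the closest word whose score is at or below the threshold, if any. */
function findBestMatch(
  pattern: string,
  words: string[],
  threshold = 0.3
): { text: string; score: number } | undefined {
  let best: { text: string; score: number } | undefined;
  for (const text of words) {
    const score = fuzzyScore(pattern, text);
    if (score <= threshold && (best === undefined || score < best.score)) {
      best = { text, score };
    }
  }
  return best;
}

// Words as OCR might return them, including a misread "Usename".
const ocrWords = ["Password", "Usename", "Login"];
console.log(findBestMatch("Username", ocrWords));
```

In this sketch, a stricter (lower) threshold rejects more OCR misreads, while a looser one tolerates more of them at the risk of matching the wrong text.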

An example of steps 1, 2 and 3 can be found in this image

Process steps
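
The black/white conversion from step 2 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the service's actual image pipeline: pixels are plain RGB triples, grayscale uses the common Rec. 601 luma weights, and a fixed threshold of 128 decides between black and white. The names `toGray`, `binarize` and `isTextDistinguishable` are hypothetical.

```typescript
// Illustrative sketch only: fixed-threshold binarization of RGB pixels.
// The types, function names and threshold are assumptions for this example.

type Rgb = [number, number, number];

/** Grayscale value (roughly 0 to 255) using the Rec. 601 luma weights. */
function toGray([r, g, b]: Rgb): number {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}

/** Map every pixel to pure black (0) or pure white (255) around the threshold. */
function binarize(pixels: Rgb[], threshold = 128): number[] {
  return pixels.map((pixel) => (toGray(pixel) >= threshold ? 255 : 0));
}

/** True when text and background end up on different sides of the threshold. */
function isTextDistinguishable(text: Rgb, background: Rgb, threshold = 128): boolean {
  const [textValue, backgroundValue] = binarize([text, background], threshold);
  return textValue !== backgroundValue;
}

// White text on a dark background survives the conversion.
console.log(isTextDistinguishable([255, 255, 255], [30, 30, 30]));
// Light gray text on a white background collapses into the background.
console.log(isTextDistinguishable([230, 230, 230], [255, 255, 255]));
```

With a single fixed threshold, text and background that fall on the same side of it become indistinguishable, which is one reason the contrast may need tuning per command.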

It works with ZERO system dependencies (besides what WebdriverIO uses), but if needed it can also work with a local installation of Tesseract, which will reduce the execution time drastically! (See also Test Execution Optimization for how to speed up your tests.)

Enthusiastic? Start using it today by following the Getting Started guide.

Important

There are a variety of reasons you might not get good quality output from Tesseract. One of the biggest reasons related to your app and this module is a lack of proper color distinction between the text that needs to be found and the background. For example, white text on a dark background can easily be found, but light text on a white background or dark text on a dark background can hardly be found.

See also this page for more information from Tesseract.

Also don't forget to read the FAQ.
