An Easy Way to Localize and Detect Text with Tesseract OCR

How to Utilize Tesseract to Detect, Localize, and OCR Text

In this article, I will show you how to detect and localize text using Tesseract. In case you don’t know, Tesseract is an open-source optical character recognition (OCR) engine that runs on Windows, macOS, and Linux.

If you are not familiar with Tesseract, I suggest reading my previous article about Tesseract OCR, which shows how to install it and perform simple OCR.

Let’s get started. First, we import the dependencies we need: the pytesseract and OpenCV libraries. If you have not installed OpenCV on your machine, please visit this page.

To test Tesseract, I will use the simple image below.

Sample image

Let’s load this image and extract the data.

Unlike the previous article, where we converted the image directly into a string, in this article we convert the OCR output into a dictionary.

The contents of the resulting dictionary are shown below.

{
'level': [1, 2, 3, 4, 5, 5, 5],
'page_num': [1, 1, 1, 1, 1, 1, 1],
'block_num': [0, 1, 1, 1, 1, 1, 1],
'par_num': [0, 0, 1, 1, 1, 1, 1],
'line_num': [0, 0, 0, 1, 1, 1, 1],
'word_num': [0, 0, 0, 0, 1, 2, 3],
'left': [0, 26, 26, 26, 26, 110, 216],
'top': [0, 63, 63, 63, 63, 63, 63],
'width': [300, 249, 249, 249, 77, 100, 59],
'height': [150, 25, 25, 25, 25, 19, 19],
'conf': ['-1', '-1', '-1', '-1', 97, 96, 96],
'text': ['', '', '', '', 'Testing', 'Tesseract', 'OCR']
}

I will not explain the purpose of every key in the dictionary. Here we use left, top, width, and height to draw a bounding box around each piece of text, and we draw the text itself next to the box. We also need the conf key, which holds Tesseract’s confidence score for each detection; we use it to decide which detections to keep.
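Every key in the dictionary holds a list of the same length, so index i across all keys describes one row. The level value says what kind of region that row is: 1 page, 2 block, 3 paragraph, 4 line, 5 word. The helper below (its name is only illustrative) pairs those values up so the structure is easier to read; it assumes conf may be a string or a number, as in the sample output above.

```python
# Tesseract's page-layout levels, as reported in the "level" key of image_to_data
LEVEL_NAMES = {1: "page", 2: "block", 3: "paragraph", 4: "line", 5: "word"}


def describe_rows(data):
    """Yield (level name, text, confidence, (left, top, width, height)) for each row."""
    for i in range(len(data["text"])):
        yield (
            LEVEL_NAMES.get(data["level"][i], "unknown"),
            data["text"][i],
            float(data["conf"][i]),  # normalise: conf may be '-1' (str) or a number
            (data["left"][i], data["top"][i], data["width"][i], data["height"][i]),
        )
```

On the sample dictionary, the first four rows are the page, block, paragraph, and line entries with empty text and a confidence of -1, and the last three are the words Testing, Tesseract, and OCR.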

Next, we extract the bounding box coordinates of each text region from the result.
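One way to pull the coordinates out is sketched below, under the assumption that rows with empty text (the page, block, paragraph, and line rows in the sample) should be skipped. word_boxes is a hypothetical helper name.

```python
def word_boxes(data):
    """Return (text, (x, y, w, h)) for every row of image_to_data output that has text."""
    boxes = []
    for i in range(len(data["text"])):
        text = data["text"][i].strip()
        if not text:  # structural rows (page/block/paragraph/line) carry no text
            continue
        x, y = data["left"][i], data["top"][i]  # top-left corner, in pixels
        w, h = data["width"][i], data["height"][i]  # box size, in pixels
        boxes.append((text, (x, y, w, h)))
    return boxes
```

With the dictionary above, this returns three boxes, starting with ('Testing', (26, 63, 77, 25)).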

Finally, we set the minimum confidence we want to accept. Here I use 70, so only detections with a confidence above 70 are kept.
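The confidence filter can be sketched like this, assuming a strict "greater than" comparison against the threshold of 70. Because conf holds strings such as '-1' in some pytesseract versions and numbers in others, each value is converted with float() before comparing. The function name confident_words is illustrative.

```python
MIN_CONF = 70  # threshold used in this article


def confident_words(data, min_conf=MIN_CONF):
    """Return indices of rows with non-empty text and confidence above min_conf."""
    keep = []
    for i, (text, conf) in enumerate(zip(data["text"], data["conf"])):
        # conf can be a string like '-1' or a number, so normalise it first
        if float(conf) > min_conf and text.strip():
            keep.append(i)
    return keep
```

On the sample dictionary, confident_words(data) returns [4, 5, 6], the indices of the three words. Raising the threshold to 96 keeps only index 4 (Testing, confidence 97).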

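Now everything is set, and we can display the result in an OpenCV window with the code below. (In Google Colab, cv2.imshow is not supported; use cv2_imshow from google.colab.patches instead.)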

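cv2.imshow("Result", image)  # the first argument is the window name
cv2.waitKey(0)  # keep the window open until a key is pressed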

And this is the result.

As the result above shows, detection works well on this image: all three words (Testing, Tesseract, and OCR) have a confidence of 96 or higher, so all of them pass the threshold of 70. You can tune several parameters, such as the confidence threshold, and if you don’t like how the output looks, you can change the thickness or color of the bounding boxes and text.

If you liked this post, I recommend reading the article that inspired me.

A Minimalist | AI/NLP Engineer | https://github.com/fahmisalman
