Install Tesseract On Windows 10

Install OpenCV with Tesseract on Windows. This guide takes you through the installation steps for OpenCV with Tesseract on Windows.

For Windows: installers for Tesseract 3.05 and Tesseract 4 are available from Tesseract at UB Mannheim. They include the training tools, and both 32-bit and 64-bit installers are available. An installer for the old version 3.02 is available for Windows from the Tesseract project's download page. It includes the English training data.
Windows installer of tesseract-ocr 3.02.02: follow the installation steps and check the option 'Tesseract development files'. After finishing the installation, find the Visual Studio project folder; it holds all the relevant libraries that need to be linked when building the OCR library.
Tesseract is open source optical character recognition (OCR) software. It was originally developed at Hewlett-Packard, and Google later sponsored its development. There are many versions of Tesseract, but we will use version 4.0.

The steps below address the following pytesseract error:

'TesseractNotFoundError: tesseract is not installed or it's not in your path'

To run Tesseract on Windows 7, follow the steps below in order:

1. Install Python on the C: drive (using the installer's Custom option) in a Python36 folder, i.e. C:\Python36.
The latest Python version caused problems, so as of the date of this post the preferred version is Python 3.6.

Adding Python and PythonPath to the Windows environment:

Open Explorer.
Right-click 'Computer' in the Navigation Tree Panel on the left.
Select 'Properties' at the bottom of the Context Menu.
Select 'Advanced system settings'
Click 'Environment Variables...' in the Advanced Tab

PY_HOME

%PY_HOME%\Lib;%PY_HOME%\DLLs;%PY_HOME%\Lib\lib-tk;C:\another-library

Append

2. Install Microsoft Visual C++ 14.0, which is required.

Go to Build Tools for Visual Studio 2017.
Select the free download under Visual Studio Community 2017. This downloads the installer. Run the installer.
a. Under Windows, there are 3 choices. Check only 'Desktop development with C++'.
b. Under Web & Cloud, there are 7 choices. Check only 'Python development' (this is optional).

3. Install 'Microsoft Visual Studio 14.0' (visualcppbuildtools_full), downloaded from Microsoft.

4. Install 'BuildTools_Full'.

5. Install Tesseract-OCR using either the 64-bit or the 32-bit installer.

After that, add a new environment variable TESSDATA_PREFIX with the value C:\Program Files (x86)\Tesseract-OCR
Also add C:\Program Files (x86)\Tesseract-OCR to the PATH environment variable.
Ideally this is enough. If it is not, you can also add a new environment variable named 'tesseract' with the value 'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
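
To confirm that the variables from step 5 are visible to Python, a small check like the one below can help. It is only a sketch: the function name tesseract_setup_report is made up for this example, and it uses nothing beyond the standard library's shutil.which and os.environ.

```python
import os
import shutil


def tesseract_setup_report(environ=os.environ, which=shutil.which):
    """Summarise whether Tesseract looks reachable from this process."""
    return {
        # Full path of the tesseract executable if a PATH folder holds it,
        # otherwise None.
        "executable": which("tesseract"),
        # Value of TESSDATA_PREFIX, or None when the variable is not set.
        "tessdata_prefix": environ.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    for name, value in tesseract_setup_report().items():
        print(f"{name}: {value if value else 'NOT FOUND'}")
```

Run it from a newly opened Command Prompt, because windows that were already open keep the old environment. If "executable" comes back as NOT FOUND, the PATH entry is the first thing to recheck.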

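Sometimes, to run Tesseract on Windows 10, we have to follow the steps above and then do the following:

Find the script file pytesseract.py in C:\Python36\Lib\site-packages\pytesseract and open it.

Change the following code
from: tesseract_cmd = 'tesseract'
to: tesseract_cmd = r'D:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
(Use the drive and folder where Tesseract is actually installed. The r prefix matters: without it, Python reads the \t in \tesseract.exe as a tab character.)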
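
An edit made inside site-packages is lost when pytesseract is reinstalled or upgraded. As an alternative, pytesseract lets a script set the same variable at runtime through pytesseract.pytesseract.tesseract_cmd. The sketch below assumes the install path from the step above; the helper name configure_pytesseract is made up for this example.

```python
import os

# Assumption: the install location used in the step above; adjust to yours.
TESSERACT_EXE = r"D:\Program Files (x86)\Tesseract-OCR\tesseract.exe"


def configure_pytesseract(exe_path=TESSERACT_EXE):
    """Point pytesseract at exe_path. Return True if the setting was applied."""
    try:
        import pytesseract
    except ImportError:
        return False  # pytesseract itself is not installed
    if not os.path.isfile(exe_path):
        return False  # nothing to point at, so leave the default untouched
    # Same variable the manual edit changes, but set for this process only.
    pytesseract.pytesseract.tesseract_cmd = exe_path
    return True


if __name__ == "__main__":
    print("configured" if configure_pytesseract() else "not configured")
```

Call configure_pytesseract() once at the top of a script, before any OCR call. The setting only lasts for that run, so the library file stays untouched.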

Simple OCR Guide: Installing and Using Tesseract In Python Code (Ubuntu)

3/19/2018

Introduction: OCR

There are times when an image file contains text that we want to extract. Can we do that programmatically? The answer is yes: that's what optical character recognition (OCR) does.
It's simple enough to OCR an image using the command line in Ubuntu, but we also want to be able to use OCR in programs. Python is a good language for using OCR, and Tesseract is the OCR tool we'll be using.

OCR From the Command Line: Install Tesseract

Let's install Tesseract so that we can use it from the command line. In Ubuntu, it's really simple: run sudo apt-get install tesseract-ocr in a terminal.

To test it, download the following image to your computer.

(Right click and save the image.)
Then, in a terminal opened in the directory the picture was downloaded to, run Tesseract on the image (substitute the correct image name): tesseract ocr_orig.png stdout

For me the output is:
Hello World.
Using Eggﬁggggplg OCR.
From gggmgxg.

Why did it get the words Tesseract and srcmake wrong? Notice the squiggly red lines under those words in the picture. 'Noise' like this in an image often makes OCR imperfect, which is why it's important to clean images up before running OCR on them. A program can do that cleanup automatically, so it's often important to be able to use OCR from a program and not just from the command line.
Let's look at writing a Python program that uses Tesseract now.

Setup Python Project and Install Libraries

We can use Tesseract from the command line, but how about in Python? (Obviously, make sure that you have Python installed. You'll also need Tesseract installed, from the previous section.) (Also, shout out to nikhilkumarsingh on GitHub for providing this really easy install/code guide.)
Install the Python Tesseract library (pytesseract) and Pillow (for processing images in Python) with pip install pytesseract pillow. We'll also install ImageMagick and Wand now, for processing PDF files (and helping with image cleaning later): sudo apt-get install imagemagick, then pip install wand.

Our installation should work, so let's test it with some code.
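
Before running any OCR, it can save time to confirm that Python can actually find the libraries. The following is a rough sketch that uses only the standard library's importlib.util.find_spec; the helper name missing_packages and the REQUIRED table are made up for this example. It only looks for the Python packages and does not check that the Tesseract or ImageMagick programs themselves are installed.

```python
import importlib.util

# pip name -> import name. They differ for Pillow, which is imported as PIL.
REQUIRED = {"pytesseract": "pytesseract", "pillow": "PIL", "wand": "wand"}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module in required.items()
        # find_spec only locates the module; it does not import it.
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    print("All libraries found." if not missing else f"Missing: {missing}")
```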

Some Python OCR Code

We're going to make a simple Python file to OCR an image. In the same folder as the test image you downloaded earlier, create a file named 'main.py'. In main.py, add the following code:
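
A minimal sketch of what main.py could contain, assuming the pytesseract and Pillow libraries are installed. It relies on Pillow's Image.open and pytesseract's image_to_string; the names IMAGE_NAME, ocr_image and main are illustrative choices for this example.

```python
# Assumption: the test image saved earlier; change the name if yours differs.
IMAGE_NAME = "ocr_orig.png"


def ocr_image(path):
    """Return the text Tesseract recognises in the image at `path`."""
    # Imported inside the function so main() can report a missing library.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def main(path=IMAGE_NAME):
    """Print the OCR result; return 0 on success and 1 on a known problem."""
    try:
        print(ocr_image(path))
    except ImportError as exc:
        print(f"Missing Python library: {exc}")
        return 1
    except FileNotFoundError:
        print(f"Image not found: {path}")
        return 1
    return 0


if __name__ == "__main__":
    main()
```

Keeping the OCR call in its own function makes it easy to reuse later, for example after an image cleanup step.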

Of course, make sure the image name in the code matches your file.
To run this code, open a terminal in the directory that contains main.py and the ocr_orig.png file, and run: python3 main.py

You should see the OCR output in your terminal.

Conclusion

We looked at how to OCR an image, both from the command line and through Python code. We chose Tesseract as our OCR tool, and we saw that noise in the image sometimes skews the results. It's best practice to make the text in an image clearer and to clean up anything unnecessary, so that the OCR tool works better.
Going forward, try looking up more advanced image processing tricks to improve the OCR results.
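
As one starting point for that kind of cleanup, the sketch below converts an image to greyscale and then to pure black and white using Pillow's convert and point methods. The function names and the threshold of 128 are assumptions for this example, and whether the step helps depends on the image, so treat the threshold as something to experiment with.

```python
def binarize(pixels, threshold=128):
    """Map each 0-255 grey value to pure black (0) or pure white (255)."""
    return [255 if p >= threshold else 0 for p in pixels]


def clean_image(src_path, dst_path, threshold=128):
    """Save a black-and-white copy of src_path for OCR experiments."""
    from PIL import Image  # needs the Pillow library

    with Image.open(src_path) as img:
        grey = img.convert("L")  # "L" is 8-bit greyscale
        # point() applies the same per-pixel rule as binarize() above.
        cleaned = grey.point(lambda p: 255 if p >= threshold else 0)
        cleaned.save(dst_path)


if __name__ == "__main__":
    # Pure-Python illustration of the rule, no image file needed.
    print(binarize([0, 127, 128, 255]))
```

Running OCR on both the original and the cleaned copy shows whether the cleanup made a difference for a given image.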

References
1. www.pyimagesearch.com/2017/07/03/installing-tesseract-for-ocr/
2. github.com/nikhilkumarsingh/tesseract-python
