• Jaya Darshana Singam

OCR: a meld of pattern recognition, AI and computer vision

In an age of rapidly evolving technology, the pace of daily life has turned into a relay race, and time has become an essential resource that has to be planned carefully. In this never-ending marathon, we are constantly searching for new ways to increase efficiency and ease our work: providing contactless software, automating and streamlining the workplace, capturing important notes and files, making the text in images editable, and extracting the required data from a given form.

Optical Character Recognition (OCR) is a technique that converts captured images of text into a readily editable form, which removes the extra spadework of re-typing the whole content.

Fig 1 : OCR Functioning [Source: Google Images]

Converting an image or a scanned copy into editable text involves several steps that draw on pattern recognition, computer vision and artificial intelligence techniques.

General Working of OCR . .

The goal of OCR is to classify the optical patterns of text or digits contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. The image below gives an overview of how the processing is done.

Fig 2: Process steps involved in OCR Function [2]

Step 1: The alphabetic and numeric characters are extracted by removing noise from the background (grey-scaling) and forming boundaries around the content of the provided image.

Step 2: The pattern is recognized through feature extraction, which includes row and boundary detection, normalization of pixels (scaling), and binarization of characters.

Step 3: The recognized binary characters are divided into tracks and sectors, which are used to form a trained Convolutional Neural Network (ConvNet) model.

Step 4: The ConvNet result goes through a two-step verification using the trained model (the stored characters).

Step 5: The output is generated as editable text corresponding to the text in the provided image.
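The pre-processing ideas in Steps 1 and 2 (grey-scaling, binarization and row detection) can be illustrated with a small pure-Python sketch. This is not the model described in this post: the function names, the fixed threshold of 128 and the tiny hand-made image are all assumptions chosen for illustration, and a real pipeline would typically rely on a library such as OpenCV.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(rgb_image: List[List[Pixel]]) -> List[List[int]]:
    """Convert RGB pixels to 0-255 grey levels using the common BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark dark pixels (ink) as 1 and light pixels (background) as 0."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def detect_text_rows(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of consecutive pixel rows that contain ink.

    The end index is exclusive, so binary[start:end] is one band of text.
    """
    bands = []
    start = None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band of text begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # the band ended on the previous row
            start = None
    if start is not None:                  # band reaches the bottom edge
        bands.append((start, len(binary)))
    return bands


# A hypothetical 5x4 image: white background with two separate bands of ink.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
image = [
    [WHITE, WHITE, WHITE, WHITE],
    [WHITE, BLACK, BLACK, WHITE],
    [WHITE, BLACK, WHITE, WHITE],
    [WHITE, WHITE, WHITE, WHITE],
    [BLACK, WHITE, WHITE, BLACK],
]
binary = binarize(to_grayscale(image))
print(detect_text_rows(binary))  # [(1, 3), (4, 5)]
```

Each detected band could then be split into individual characters in the same way, by scanning columns instead of rows, before the characters are passed to a classifier.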

Fig 3: Detecting the characters and converting them into editable text in practice [Test Results - Own]

In the third of the images above, the 7 refers to the number of words written and the % refers to the proportion of characters recognized across the whole written text. The editable output is not yet accurate, as the model is still in the training phase of detecting characters in varied handwritten text. The number of characters written in the dialog box, however, is detected accurately. This suggests that more training is needed to recognize handwritten characters.
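How the percentage in the test above was computed is not detailed here, so the following is only one possible way to score an OCR result, not the method used for Fig 3. It uses Python's standard difflib module to align the recognized text with the text that was actually written; the function names and the sample strings are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def character_accuracy(expected: str, recognized: str) -> float:
    """Percentage of the expected characters that difflib can align,
    in order, with characters in the recognized text."""
    if not expected:
        return 100.0 if not recognized else 0.0
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(expected)


# Hypothetical example: one character is misread.
print(word_count("the quick brown fox"))     # 4
print(character_accuracy("abcd", "abxd"))    # 75.0
```

A score like this makes it easier to compare training runs, because it reports how close the output is instead of only whether it matches exactly.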

Applications . .

The major applications of OCR in various fields:

  1. Invoice Imaging

  2. Banking

  3. Online CAPTCHA

  4. Automatic Number Plate Recognition

  5. Digital Libraries

  6. Data entry for business documents, including checks, passports, invoices, bank statements and receipts

Advantages . .

The major advantages of OCR applications are:

  1. Extracting required information from images and PDF files

  2. Reducing time consumption

  3. Increasing the efficiency of the workspace

  4. Producing editable output

  5. Keeping back-ups

Future-Work . .

The next step is to train the model several more times with varied inputs to increase its accuracy on handwritten text. The model is built using Pytesseract, OpenCV and a few other libraries in Python. The next blog will include the detailed procedure for a web application built using Pytesseract.
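As a rough preview of how Pytesseract and OpenCV fit together, here is a minimal sketch. It is not the model built for this post: the function name image_to_text and the file name in the usage note are hypothetical, and the code assumes that the opencv-python and pytesseract packages, as well as the Tesseract engine itself, are installed.

```python
def image_to_text(image_path: str, language: str = "eng") -> str:
    """Read an image file and return the text Tesseract recognizes in it."""
    # Imported here so that this module still loads on machines
    # where the OCR libraries are not installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grey-scale, then binarize. With THRESH_OTSU the threshold is chosen
    # automatically, so the 0 passed as the threshold value is ignored.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    return pytesseract.image_to_string(binary, lang=language)
```

Calling image_to_text("handwritten_note.png") would return the recognized text as a plain string, which can then be edited or searched like any other text.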


[1] Google Images

[2] Faisal Mohammad, Jyoti Anarase, Milan Shingote, Pratik Ghanwat, "Optical Character Recognition Implementation Using Pattern Matching", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 5 (2), 2014, 2088-2090.