The 7 Steps of the OCR Process


1. Fuzzy judgment

The main task of fuzzy judgment (blur detection) is to determine whether the imported image is clear or blurry.
This is a two-step process:

  1. Divide the image into a foreground text area and a background area.
  2. Calculate the blur parameter of the foreground text and decide whether the image is usable or too blurry.
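
As a rough illustration of the second step, the sketch below scores sharpness with the variance of the Laplacian, a common blur measure that is not necessarily the one used in this process. It assumes the image is a list of rows of gray levels (0-255), skips the foreground/background split, and uses a hypothetical threshold of 100 that would need tuning on real data.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


def is_blurry(gray, threshold=100.0):
    """Few strong edges give a low variance, which is treated as blurry.

    The threshold is an assumed value for illustration only.
    """
    return laplacian_variance(gray) < threshold


# Hard-edged 2x2 checkerboard blocks versus a featureless gray patch.
sharp = [[255 if (x // 2 + y // 2) % 2 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]
print(is_blurry(sharp), is_blurry(flat))
```

In a fuller version, the score would be computed only over the pixels classified as foreground text in step 1.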

2. Border cutting

The main task of border cutting is to locate the image foreground (by detecting its borders with a Line Segment Detector) and remove the background.
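
The cropping part of this step can be sketched without a line detector. The example below is a simplified stand-in, not the Line Segment Detector approach described above: it assumes a bright page on a dark background, a grayscale image stored as a list of rows, and a hypothetical brightness cutoff of 50.

```python
def crop_to_foreground(gray, background_level=50):
    """Crop to the bounding box of pixels brighter than background_level.

    Assumes a bright document on a dark background; the cutoff is an
    illustrative value, not one taken from the process described here.
    """
    rows = [y for y, row in enumerate(gray)
            if any(p > background_level for p in row)]
    cols = [x for x in range(len(gray[0]))
            if any(row[x] > background_level for row in gray)]
    if not rows or not cols:
        return gray  # no foreground found, so leave the image unchanged
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


# A 2x3 bright page inside a 5x6 dark frame.
image = [[0] * 6 for _ in range(5)]
for y in (1, 2):
    for x in (2, 3, 4):
        image[y][x] = 255
cropped = crop_to_foreground(image)
```

A bounding box only works when the page is roughly axis-aligned, which is why detecting the actual border lines is the more robust choice.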

3. Perspective transformation

The main task of perspective transformation is to correct the distortion or tilt of the foreground.
This is a two-step process:

  1. Calculate the positions of the four corners of the document in the image by detecting its border lines.
  2. Calculate the transformation parameters from those corners and apply the correction to the image.
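
One standard way to obtain the transformation parameters is to solve for the 3x3 homography that maps the four detected corners onto the corners of an upright rectangle. The sketch below assumes the corners are already known; the coordinates and the 210x297 target size are made-up values, and the function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(h, x, y):
    """Apply the homography to a single point."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical detected corners (clockwise from top-left) and target rectangle.
corners = [(10, 20), (200, 30), (220, 300), (5, 280)]
target = [(0, 0), (210, 0), (210, 297), (0, 297)]
H = homography(corners, target)
```

Correcting the whole image then means resampling: for each pixel of the output rectangle, the inverse mapping gives the source location to read from.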

4. Document type detection

The main task of type detection is to match the imported image against the template images and return the most similar template type.
This is a two-step process:

  1. Extract the features of each template image (for example, the title location); these features define the template.
  2. Extract the features of the imported image, compare them with the template features, and return the matching type.
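
The comparison in step 2 can be as simple as a nearest-neighbour lookup over feature vectors. In the sketch below, the template names and the three features (normalized title x, normalized title y, width/height ratio) are hypothetical placeholders chosen for illustration.

```python
import math

# Hypothetical templates: (title_x, title_y, aspect_ratio), all made-up values.
TEMPLATES = {
    "invoice": (0.50, 0.08, 0.71),
    "id_card": (0.30, 0.15, 1.58),
    "receipt": (0.50, 0.05, 0.35),
}


def detect_type(features, templates=TEMPLATES):
    """Return the name of the template whose features are closest."""
    return min(templates, key=lambda name: math.dist(features, templates[name]))


print(detect_type((0.48, 0.10, 0.70)))
```

A practical system would probably also reject images whose best distance is still too large, rather than always returning some type.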

5. Template matching

The main task of template matching is to extract the features of the template and match the feature descriptors that correspond to those features.
This is a four-step process:

  1. Extract features that are invariant to scale and rotation.
  2. Compute the feature descriptors of those invariant features.
  3. Filter the matches so that only the correct features remain.
  4. Calculate the positions of the text boxes and extract them.
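
For step 3, one widely used filter is the ratio test: a match is kept only if the best candidate is clearly closer than the second best. The sketch below assumes descriptors are plain tuples of floats and uses an illustrative ratio of 0.75; the filter actually used in this process is not specified.

```python
import math


def match_descriptors(template_desc, image_desc, ratio=0.75):
    """Return (template_index, image_index) pairs that pass the ratio test."""
    matches = []
    for i, d in enumerate(template_desc):
        ranked = sorted(range(len(image_desc)),
                        key=lambda j: math.dist(d, image_desc[j]))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        best, second = ranked[0], ranked[1]
        if math.dist(d, image_desc[best]) < ratio * math.dist(d, image_desc[second]):
            matches.append((i, best))
    return matches


# Toy descriptors: the third template descriptor is ambiguous and gets dropped.
template_desc = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
image_desc = [(1.0, 0.05, 0.0), (0.0, 0.1, 1.0), (0.9, 0.9, 0.0), (0.0, 0.9, 0.9)]
matches = match_descriptors(template_desc, image_desc)
```

The surviving matches are what a later step would use to estimate where the template's text boxes fall in the imported image.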

6. Binary conversion

The main task of binary conversion is to convert the image blocks to black and white.
This step is critical: if the conversion is incorrect, character recognition and matching are severely affected.
The process first locates each text region, then converts each region to black and white.
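
Converting one text region can be sketched with Otsu's method, which picks the threshold that best separates dark and bright pixels. This is one common choice and is an assumption here, not necessarily the method used in this process; the region is assumed to be a list of rows of integer gray levels (0-255).

```python
def otsu_threshold(pixels):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_bright = total - w_dark
        if w_dark == 0:
            continue  # no pixels at or below t yet
        if w_bright == 0:
            break  # every pixel is at or below t
        mean_dark = sum_dark / w_dark
        mean_bright = (total_sum - sum_dark) / w_bright
        var = w_dark * w_bright * (mean_dark - mean_bright) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(region):
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    t = otsu_threshold([p for row in region for p in row])
    return [[255 if p > t else 0 for p in row] for row in region]


region = [[30, 35, 220, 225],
          [28, 210, 215, 32]]
binary = binarize(region)
```

Thresholding each text region separately, as described above, lets the threshold adapt to local lighting instead of relying on one value for the whole page.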

7. Layout analysis

The main task of layout analysis is to segment the image into blocks. These blocks are then categorized as text blocks, line blocks, or graphic blocks.
This is a three-step process:

  1. Analyze the geometric structure of the layout and distinguish the different blocks.
  2. Analyze the characteristics and attributes of each block, then classify it as a text block, line block, or graphic block.
  3. For each text block, start at the pixel level: combine pixels into characters, characters into words, and words into text lines.
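
The last part of step 3 can be sketched as a grouping of bounding boxes. The example below assumes the characters have already been found and are given as (x0, y0, x1, y1) boxes; the grouping rules (vertical overlap for lines, a hypothetical maximum gap of 5 pixels for words) are simplifications for illustration.

```python
def group_into_lines(boxes, max_gap=5):
    """Group character boxes (x0, y0, x1, y1) into lines of words."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for line in lines:
            # Same line if the vertical range overlaps the line's first box.
            if box[1] < line[0][3] and box[3] > line[0][1]:
                line.append(box)
                break
        else:
            lines.append([box])
    result = []
    for line in lines:
        line.sort(key=lambda b: b[0])
        words = [[line[0]]]
        for prev, cur in zip(line, line[1:]):
            if cur[0] - prev[2] > max_gap:
                words.append([cur])  # wide gap starts a new word
            else:
                words[-1].append(cur)
        result.append(words)
    return result


# Two lines: "ab cd" on the first, "ef" on the second (boxes in random order).
boxes = [(30, 0, 38, 10), (0, 20, 8, 30), (10, 0, 18, 10),
         (40, 0, 48, 10), (0, 0, 8, 10), (10, 20, 18, 30)]
layout = group_into_lines(boxes)
print([[len(word) for word in line] for line in layout])
```

Finding the character boxes themselves, the pixel-to-character part of the step, would typically rely on something like connected-component labelling, which is left out here.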