Friday, August 7, 2009

Preprocessing Text

Extracting information from handwritten text is a common problem in image processing. Obtaining the desired information usually takes several steps, such as removing embedded lines and separating the text into individual letters. In this activity, previously learned image processing techniques are applied to recognize handwriting in a given text document.

Step-by-step procedure
  • Download the text document and crop a portion that shows handwritten text together with the horizontal ruled lines (see Figure 1).
  • Make a filter mask by taking the Fourier transform of the cropped image and editing it in Gimp to paint out the white vertical lines. Horizontal lines in the image appear as vertical lines in the Fourier domain, so blocking them removes the ruled lines from the image (see Figure 2). Filter the image by multiplying its Fourier transform by the mask and taking the inverse transform; by the convolution theorem this is equivalent to filtering in the spatial domain.
  • Binarize the filtered image and apply morphological operations to separate the letters (see Figure 3).
  • Use bwlabel to label the connected regions (blobs). Ideally, each letter gets its own label. That did not happen here: the text in the cropped image is written in cursive, so adjacent letters are joined and are much harder to separate.
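The Fourier filtering step above can be sketched in NumPy. The post builds its mask by hand in Gimp; the hypothetical helper `make_line_mask` below (with made-up parameters `keep_radius` and `half_width`) instead builds one programmatically, assuming the ruled lines are exactly horizontal so that their energy sits on the vertical axis of the centred spectrum. It is a sketch under that assumption, not the code behind Figure 2.

```python
import numpy as np


def make_line_mask(shape, keep_radius=5, half_width=1):
    """Hypothetical helper: a Fourier-domain mask that blocks the vertical
    streak (x-frequency ~ 0) produced by perfectly horizontal lines, while
    keeping a small window around the DC term so overall brightness survives.
    The mask is laid out for an fftshift-ed spectrum (DC at the centre)."""
    rows, cols = shape
    mask = np.ones(shape, dtype=float)
    cy, cx = rows // 2, cols // 2  # position of DC after np.fft.fftshift
    # Block a thin vertical strip through the centre of the spectrum...
    mask[:, cx - half_width:cx + half_width + 1] = 0.0
    # ...but restore the low frequencies close to DC.
    mask[cy - keep_radius:cy + keep_radius + 1,
         cx - half_width:cx + half_width + 1] = 1.0
    return mask


def remove_horizontal_lines(image, mask):
    """Multiply the centred spectrum by the mask and transform back.
    A product in the frequency domain is a filter in the spatial domain."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.abs(filtered)
```

On a synthetic image made of horizontal lines plus one vertical stroke, the lines collapse to a flat background while the vertical stroke, whose energy lies along the horizontal axis of the spectrum, passes through untouched. Real scanned lines are rarely perfectly level, so their streak is tilted, which is one reason editing the mask by hand in Gimp can be the more practical choice.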


Figure 1. Original image and a cropped portion.


Figure 2. Filter mask used


Figure 3. Binarized text of the cropped image.


Figure 4. Output of bwlabel on the binarized image.
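The binarization, morphology, and labeling steps could look like this in Python, with `scipy.ndimage.label` standing in for bwlabel. The function name `label_text`, the threshold, and the size of the opening element are assumptions to be tuned; the sketch also assumes dark ink on a light background with gray values in [0, 1].

```python
import numpy as np
from scipy import ndimage


def label_text(gray, threshold=0.5, open_size=3):
    """Hypothetical helper: binarize, clean up with a morphological opening,
    and label the connected regions (the role bwlabel plays in the post)."""
    ink = gray < threshold  # True where the pixel is (dark) ink
    structure = np.ones((open_size, open_size), dtype=bool)
    # Opening = erosion then dilation; removes specks smaller than the element.
    cleaned = ndimage.binary_opening(ink, structure=structure)
    # Default structuring element in 2D gives 4-connectivity.
    labels, count = ndimage.label(cleaned)
    return labels, count
```

A single-pixel speck disappears under the opening, while blocks larger than the structuring element survive and each gets its own label. Joined cursive letters form one connected region and therefore receive one label, which is the difficulty described for Figure 4. A large opening element can also erase thin pen strokes, so `open_size` is a trade-off.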

The same procedure is repeated for the next task: detecting where the word "description" appears in the document. The first image in the figure below shows the word "description" drawn in Paint. This image must be the same size as the document. It is then binarized using the same threshold value as in the first part of the activity. Correlating this template with the document locates the word "description", and the red enclosures in the last image below mark the places where it appears. As a further improvement, the word "description" in the first column of the figure (the template) should be placed exactly at the center of the image.
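The correlation step can be sketched with FFTs in NumPy; the post does not show its own code, so this is a stand-in rather than the method actually used. The hypothetical helpers `correlate_fft` and `find_matches` (and the `rel_threshold` parameter) assume, as the post requires, that the template has already been padded to the document's size.

```python
import numpy as np


def correlate_fft(document, template):
    """Circular cross-correlation of two equal-sized images via the FFT:
    corr[s] = sum_x template[x] * document[x + s]
            = IFFT( FFT(document) * conj(FFT(template)) )[s]."""
    if document.shape != template.shape:
        raise ValueError("template must be padded to the document's size")
    f_doc = np.fft.fft2(document)
    f_tpl = np.fft.fft2(template)
    return np.real(np.fft.ifft2(f_doc * np.conj(f_tpl)))


def find_matches(document, template, rel_threshold=0.9):
    """Return the (row, col) shifts whose correlation is at least
    rel_threshold times the maximum. A peak at (dy, dx) means the template's
    content, shifted by (dy, dx) modulo the image size, lines up with the
    document."""
    corr = correlate_fft(document, template)
    peaks = np.argwhere(corr >= rel_threshold * corr.max())
    return [tuple(int(v) for v in p) for p in peaks]
```

If the template word sits at the top-left corner, each peak gives the position of a matching word directly. If instead the template word is centred in the frame, the peaks are offset by half the image size, and applying `np.fft.fftshift` to the correlation map moves each peak onto the centre of a matching word, which is consistent with the recommendation above to centre the template word.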


In this activity, I give myself a grade of 9 because I did not get the most desirable results.
I would like to thank Winsome Rara and Thirdy Buno for their useful suggestions during the activity.

Figure 5. Output image after applying the correlation function to recognize the word "description" in the text document.

