Friday, August 7, 2009

Preprocessing Text

Extracting information from handwritten text is one of the classic problems in image processing. Several steps and methods are usually combined to get the desired information, such as the removal of embedded lines and the separation of text. In this activity, previously learned image processing techniques are applied to recognize handwriting in a given text document.

Step by step procedure
  • Download the text document and crop a portion that shows text together with the horizontal lines (see Figure 1).
  • Make a filter mask by taking the Fourier transform of the cropped image and modifying it in GIMP to remove the white vertical lines; this eliminates the horizontal lines in the image (see Figure 2). Filter the image using the idea of correlation (a Scilab sketch of the whole pipeline is given after this list).
  • Binarize the resulting image and apply different morphological operations to separate the letters of the text (see Figure 3).
  • Using bwlabel, label the enclosed blobs. Ideally, each letter should get its own label; in this case, however, that was not achieved because the text in the cropped image is written in cursive, which makes letter separation harder.
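A minimal Scilab sketch of this pipeline is given below. The filenames, the 0.5 threshold, and the 2 x 2 structuring element are placeholders, and imread, im2bw, erode, dilate, and bwlabel are assumed to come from the SIP toolbox used in class.

// Sketch: suppress the horizontal lines with a Fourier-domain mask, then label the letter blobs
im  = double(imread('cropped_text.jpg'));      // hypothetical filename: cropped handwriting
msk = double(imread('line_mask.jpg'));         // hypothetical filename: mask drawn in GIMP (white = pass)
im  = im / max(im);   msk = msk / max(msk);

FIm      = fftshift(fft2(im));                 // centered spectrum of the image
filtered = abs(ifft(FIm .* msk));              // block the line frequencies, then return to the space domain

bw = 1 - im2bw(filtered / max(filtered), 0.5); // binarize and invert so that ink = 1 (threshold is a guess)
se = ones(2, 2);                               // small structuring element
bw = dilate(erode(bw, se), se);                // opening: breaks thin connections between letters

lbl = bwlabel(bw);                             // label the contiguous blobs
disp(max(lbl));                                // ideally one label per letter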


Figure 1. Original image and a cropped portion.


Figure 2. Filter mask used


Figure 3. Binarized text of the cropped image.


Figure 4. Bwlabel of the image.

The procedure above is also repeated for the next task, which is to detect where the word "description" is found in the document. The first image in the figure below shows the word "description" drawn in Paint; the size of this image must be equal to the size of the document. It is then binarized with a threshold value similar to the one used in the first part of the activity. Using the correlation function, the recognition of the word "description" is achieved. The red enclosures in the last image below mark the portions where the word "description" appears. For further enhancement, it is recommended that the word "description" in the first column (the template) be placed exactly at the center.
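Below is a hedged Scilab sketch of this correlation step. The filenames and the 0.7 peak threshold are assumptions; the correlation is computed through the Fourier transform.

// Sketch: locate the word "description" by correlating a template with the document
doc = double(imread('document.jpg'));            // hypothetical filename: the full text document
tmp = double(imread('description.jpg'));         // hypothetical filename: template of the same size as the document
doc = 1 - im2bw(doc / max(doc), 0.8);            // binarize and invert so that ink = 1
tmp = 1 - im2bw(tmp / max(tmp), 0.8);

corr = abs(ifft(fft2(doc) .* conj(fft2(tmp))));  // correlation theorem: inverse FFT of F .* conj(G)
corr = fftshift(corr) / max(corr);               // center and normalize (shift depends on template placement)
hits = corr > 0.7;                               // strong peaks mark likely occurrences of the word
[r, c] = find(hits);
disp([r' c']);                                   // coordinates of the detections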


In this activity, I am giving myself a grade of 9 because I did not get the most desirable results.
I would like to thank Winsome Rara and Thirdy Buno for their useful suggestions on the activity.

Figure 5. Output image after applying the correlation function to recognize the word "description" in the text document.


Color Image Segmentation

In the past activities, the separation of the region of interest (ROI) from the background has been done using binary operations, which proved to be a useful segmentation tool.

However, there are instances wherein the graylevels of the ROI overlap with those of the background. As an alternative, color can be used to segment the ROI from the background. In fact, color is used to segment skin regions in face and hand recognition, land cover in remote sensing, and cells in microscopy.

Considering images of 3D objects, it can be observed that different shades of the object's color appear at different pixel locations. To a large extent, these shading variations are just differing brightness levels of the same color. Thus, it is convenient to represent color in a space whose parameters separate the brightness and chromaticity information. This color space is referred to as the normalized chromaticity coordinates (NCC).

NCC is expressed as the ratio of each RGB value to the sum of the RGB values at each pixel. Mathematically, it is defined as

r = R / (R + G + B),   g = G / (R + G + B),   b = B / (R + G + B),

where R, G, and B are the red, green, and blue color values of the image.

From the equations above, the values of r, g, and b always lie between 0 and 1. Also, b need not be derived from the RGB since it can be obtained from r and g (b = 1 − r − g). This means that the chromaticity information can be compressed from three dimensions down to two, which makes the processing simpler. The figure below shows the r-g color space.
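The conversion to NCC takes only a few lines; here is a minimal Scilab sketch (the filename is a placeholder, and imread is assumed to come from the SIP toolbox).

// Sketch: convert an RGB image to normalized chromaticity coordinates
img = double(imread('object.jpg'));   // hypothetical filename: image of the 3D object
R = img(:,:,1);  G = img(:,:,2);  B = img(:,:,3);
I = R + G + B;
I(find(I == 0)) = 1e6;                // avoid division by zero on black pixels
r = R ./ I;                           // per-pixel chromaticities, each between 0 and 1
g = G ./ I;                           // b = 1 - r - g, so it does not need to be stored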

The two techniques that make use of color to segment images are the parametric and the non-parametric method. In the parametric technique, the probability that a pixel belongs to the color distribution of interest is determined. This is done through the following steps:

  • From the 3D object, crop an ROI patch and determine its r and g values (see the equations above).
  • Calculate the means μr, μg and standard deviations σr, σg of the patch chromaticities.
  • Evaluate the probability p(r) = exp(−(r − μr)² / (2σr²)) / (σr√(2π)), where r here is the r value of each image pixel. This probability function is used to tag whether a pixel belongs to the ROI or not.
  • Evaluate the corresponding p(g) for the g values of the image.
  • Multiply the probabilities p(r) and p(g) to obtain the joint membership of each pixel (a sketch of these steps is given after this list).
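A minimal sketch of the parametric segmentation in Scilab, assuming the r and g matrices of the whole image have already been computed as in the NCC snippet above, and that a small patch of the ROI has been cropped beforehand (filename and imshow from the SIP toolbox are assumptions):

// Sketch: parametric (Gaussian) segmentation in r-g chromaticity space
patch = double(imread('roi_patch.jpg'));   // hypothetical filename: cropped patch of the ROI
Rp = patch(:,:,1);  Gp = patch(:,:,2);  Bp = patch(:,:,3);
Ip = Rp + Gp + Bp;   Ip(find(Ip == 0)) = 1e6;
rp = Rp ./ Ip;   gp = Gp ./ Ip;            // chromaticities of the patch

mu_r = mean(rp);   sd_r = stdev(rp);       // patch statistics
mu_g = mean(gp);   sd_g = stdev(gp);

// Gaussian membership probability of every pixel of the full image
pr = exp(-(r - mu_r).^2 / (2*sd_r^2)) / (sd_r*sqrt(2*%pi));
pg = exp(-(g - mu_g).^2 / (2*sd_g^2)) / (sd_g*sqrt(2*%pi));
seg = pr .* pg;                            // joint probability: bright pixels likely belong to the ROI
imshow(seg / max(seg));                    // display, normalized to [0, 1]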

In the non-parametric technique, the 2D color histogram of the ROI is used to tag the membership of the pixels. Note that the histogram, when normalized, is equal to the probability distribution function of the color. To segment the image using the non-parametric technique, the following steps are taken:

  • From the 3D object, crop the ROI and determine its r and g values.
  • Get the 2D chromaticity histogram of the ROI. (Sample code for the 2D histogram is provided in the manual.)
  • Segment the image using backprojection: determine the r and g values of each image pixel, find the corresponding bin in the histogram, and replace the pixel value with the histogram value of that bin (a sketch is given after this list).
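A sketch of the histogram backprojection, again in Scilab and again assuming r, g (whole image) and rp, gp (ROI patch) from the previous snippets; the choice of 32 bins is an assumption.

// Sketch: non-parametric segmentation by 2D histogram backprojection
bins = 32;
hst = zeros(bins, bins);
ri = round(rp * (bins - 1)) + 1;               // bin indices of the patch chromaticities
gi = round(gp * (bins - 1)) + 1;
for k = 1:length(ri)
    hst(ri(k), gi(k)) = hst(ri(k), gi(k)) + 1;
end
hst = hst / sum(hst);                          // normalized histogram ~ probability distribution

// Backprojection: each image pixel takes the histogram value of its (r, g) bin
[nr, nc] = size(r);
seg = zeros(nr, nc);
for i = 1:nr
    for j = 1:nc
        seg(i, j) = hst(round(r(i,j)*(bins-1)) + 1, round(g(i,j)*(bins-1)) + 1);
    end
end
imshow(seg / max(seg));                        // display (imshow from the SIP toolbox)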

The following figures show the segmentation of different images using parametric and non-parametric techniques.


Figure 2. Segmentation of the image using parametric (2nd column) and non-parametric (3rd column) technique.

Figure 3. Segmentation of the images considering a different color patch, for the parametric (2nd column) and non-parametric (3rd column) techniques.


The results show that the segmentation of the images depends mainly on the color of the patch that we choose and on the number of bins of the histogram. The bright regions in the reconstructed images correspond to pixels whose chromaticities are close to those of the patch. Between the two techniques, the segmentation is better when the non-parametric technique is used, and a proper choice of the number of bins yields good segmentation.


In this activity, I am giving myself a grade of 10 since the segmentation using the two techniques has been carried out and yielded desirable results.


I would like to thank Jica Monsanto and Winsome Rara for helping me debug the code.


References:

1. Applied 186 Activity 12 Handout

2. http://blog.howdesign.com/content/binary/fwy.jpg

3. http://www.archimuse.com/mw2003/papers/rowe/rowefig1.jpg


Thursday, August 6, 2009

Binary Operations

It is always essential in image processing to specify the region of interest (ROI) and separate it from the background. The ROI is a selected subset of samples within a given dataset [1]. It is often detached from the background through various techniques, one of which is the use of binary operations, which requires proper thresholding. However, in the case of an image with several ROIs, further processing is still needed in order to account for the overlap in the graylevel distributions of the ROIs and the background. This is done by applying different morphological operations to the image. Practical uses of these techniques include the area estimation of cells and the tracking of fingerprint ridges.

In this activity, all the ideas and techniques learned in image processing are integrated to give the best estimate of the area of the cells.

The figure below shows an image of scattered punched papers, which resemble cells on a glass slide. The area estimation is done through the following procedure:

· The image is cut into 13 overlapping 256 × 256 pixel subimages.

· Plot the histogram of each subimage and give the best approximation of the threshold value. In this case, the threshold is found to be ~0.8. This enables the separation of the ROI from the background.

· Binarize the subimages using the "im2bw" command.

· The ROI is enhanced using morphological operations. Specifically, the "opening" operator is used since it is defined as the dilation of the erosion of the image. This removes the holes inside the cells, cleans up isolated pixels, and detaches nearly touching cells.

· Using the "bwlabel" command of Scilab, enclosed contiguous blobs in the binary images are labeled.

· Calculate the area of each blob by counting the pixels that carry each label (a sketch of this pipeline is given below).
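A minimal Scilab sketch of the per-subimage pipeline is given below; the filename, threshold, and structuring element size are placeholders, and im2bw, erode, dilate, and bwlabel are assumed to come from the SIP toolbox.

// Sketch: binarize, open, label, and measure one subimage
sub = double(imread('subimage_01.jpg'));   // hypothetical filename: one 256 x 256 crop
sub = sub / max(sub);
bw  = im2bw(sub, 0.8);                     // threshold estimated from the histogram
se  = ones(4, 4);                          // structuring element (size is a guess)
bw  = dilate(erode(bw, se), se);           // opening: erosion followed by dilation

lbl = bwlabel(bw);                         // label the contiguous blobs
areas = [];
for k = 1:max(lbl)
    areas(k) = length(find(lbl == k));     // area = number of pixels carrying label k
end
disp(areas);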



Figure 1. Cropped images of the scattered punched papers.


Figure 2. Enhancement of the images using the "opening" morphological operation.
By using the "opening" operator, the cells are cleaned up in such a way that holes inside the enclosures are removed and partially connected blobs are separated. The opening operator is chosen over the closing operator because it dilates the eroded cells, so if there are small interconnections between the cells, the bond is broken and the cells are detached from each other.
Note that this operator is applied only after binarizing the images. A proper threshold value was chosen from the histogram of each subimage; an example is shown below.

Figure 3. Histogram of the first subimage (uppermost left image from Fig.2)


Figure 4. The corresponding images after using "bwlabel" command.

Once the blobs in each subimage are distinguished, the "bwlabel" command labels each contiguous blob. The images in Figure 4 show the variation in the graylevel assigned to each blob. It is recommended to display the labels in true color in order to see clearly how this command works.

Figure 5. Histogram of the cell area measurements

The histogram above shows that most of the cell areas across all subimages fall in the range of 440-540 pixels. The average area based on the subimages is ~491 pixels with a standard deviation of 41 pixels.
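Assuming the per-subimage areas have been pooled into one vector all_areas, the histogram and statistics above can be obtained with a few lines of Scilab (the bin count is an arbitrary choice):

// Sketch: statistics of the pooled blob areas
m = mean(all_areas);       // mean area (~491 in this activity)
s = stdev(all_areas);      // standard deviation (~41)
histplot(20, all_areas);   // histogram with 20 bins
disp([m s]);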

In this activity, I give myself a 10 for meeting the objectives. I would like to thank Winsome Rara for giving useful comments.

Color Image Processing


In image processing, the attainment of a well-adjusted combination of colors is very important. The quality of the colors recorded by the detector depends on how the RGB values are generated, each being the product of the spectral power distribution of the illuminant S(λ), the reflectance of the surface ρ(λ), and the spectral sensitivity of the camera η(λ) for the red, green, and blue channels. Mathematically, this is expressed as

R = K_R ∫ S(λ) ρ(λ) η_R(λ) dλ,   G = K_G ∫ S(λ) ρ(λ) η_G(λ) dλ,   B = K_B ∫ S(λ) ρ(λ) η_B(λ) dλ,

where

K_R = 1 / ∫ S(λ) η_R(λ) dλ   (and similarly for K_G and K_B).

The constants K are the white balancing constants. By normalizing with these constants, the pixel values of the image are said to be "white balanced" to the light source [1].

White balance (WB) is the process of removing unrealistic color casts so that the color of an object is retained when it is photographed. One of the factors that affect white balancing is the "color temperature" of a light source, which refers to the relative warmth or coolness of white light [2]. Nowadays, digital cameras have a setting called "auto white balance" (AWB). This function allows the camera to adjust its white balance depending on the temperature of the illuminant. However, AWB does not always give the type of image that you want: it does not provide maximum color accuracy, and it becomes a problem when the color of the light is an integral part of the image [3]. This is why cameras also offer other white balancing options, such as incandescent, fluorescent, daylight, and cloudy.

Correct white balance can be obtained using two algorithms, the White Patch Algorithm (WPA) and the Gray World Algorithm (GWA). In WPA, an unbalanced camera image is captured and the RGB values of a known white object in it are used as the dividers for the corresponding channels. GWA, on the other hand, assumes that the average color of the world is gray; since the RGB of a gray object is that of white up to a constant factor, the balancing constants are obtained by averaging the red, green, and blue values of the image and using these averages as the dividers [1].
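A minimal Scilab sketch of the two algorithms is shown below; the filename and the coordinates of the white patch are placeholders, and imread is assumed to come from the SIP toolbox.

// Sketch: white balancing with the White Patch (WPA) and Gray World (GWA) algorithms
img = double(imread('unbalanced.jpg'));    // hypothetical filename: wrongly balanced image
R = img(:,:,1);  G = img(:,:,2);  B = img(:,:,3);

// WPA: divide each channel by the RGB of a known white region (coordinates are placeholders)
Rw = mean(R(100:150, 100:150));
Gw = mean(G(100:150, 100:150));
Bw = mean(B(100:150, 100:150));
Rn = R / Rw;  Gn = G / Gw;  Bn = B / Bw;
Rn(find(Rn > 1)) = 1;  Gn(find(Gn > 1)) = 1;  Bn(find(Bn > 1)) = 1;   // clip so values stay in [0, 1]
wpa = img;  wpa(:,:,1) = Rn;  wpa(:,:,2) = Gn;  wpa(:,:,3) = Bn;

// GWA: divide each channel by its own average (the "gray world" assumption)
Rn = R / mean(R);  Gn = G / mean(G);  Bn = B / mean(B);
Rn(find(Rn > 1)) = 1;  Gn(find(Gn > 1)) = 1;  Bn(find(Bn > 1)) = 1;
gwa = img;  gwa(:,:,1) = Rn;  gwa(:,:,2) = Gn;  gwa(:,:,3) = Bn;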

For this activity, the two algorithms are applied to correct the white balance of the images taken under incandescent, fluorescent, daylight and cloudy illumination conditions.

The images below are incorrectly white balanced; the camera phone settings used are incandescent, fluorescent, daylight, and cloudy.

After using WPA and GWA, the following images are obtained.



Figure 2. Processing of the unbalanced images using the White Patch and Gray World algorithms.

In general, it can be observed that the white balance of the original images has been corrected: the intensities of the colors have been adjusted such that higher image quality is obtained. However, among the four white balance settings, the image taken under the cloudy setting is white balanced with the poorest quality. Although the white mat, which appears slightly bluish in the original image, has been reconstructed, it is still darker and the true white color is not obtained. This could be attributed to the choice of the "white patch" and the normalizing constant. It is also observed that the images reconstructed using GWA are darker than those from WPA. This may be improved by changing the normalizing constants used.

The next image shows objects with the same hues taken under incandescent setting.


Figure 3. Processing of an unbalanced image of objects having the same hue.

Again, it can be seen that the image is enhanced: the high intensity of blue light that dominates the picture is lessened.

Upon reconstructing the images using WPA, I had some difficulty choosing the correct white patch to yield enhanced images. This affects the overall image quality because the white patch is used as the divider of the RGB channels. Based on this, I would say that GWA is better than WPA because with this algorithm the quality of the reconstructed image depends only on the averages of R, G, and B. The values can also be modified such that if the quotient of an image layer over its RGB average is greater than 1, it is set to 1, so that the values remain normalized.

For this activity, I give myself a grade of 10 since all the objectives are met. I would like to thank Jica Monsanto for capturing some of the images while I arrange the object to be photographed and Mr. Combinido for commenting on my code. Thanks guys!



References:

1. Color Image Processing, Applied Physics 186 Handout.

2. http://www.cambridgeincolour.com/tutorials/white-balance.htm

3. http://www.ronbigelow.com/articles/white/white_balance.htm


Morphological Operation



In image processing, it is very important to capture the full detail of an image. In some applications, such as image normalization and eye feature extraction, morphological operations are essential [1]. Morphology refers to the form, structure, shape, size, texture, and phase distribution of objects in images [2].

Since an image can be defined as an amplitude function over a collection of either continuous or discrete coordinates [3], it contains pixels that need to be distinguished from the background. These pixels, which occupy the region of interest, can be separated using different morphological operations such as erosion and dilation.

Dilation of A by B is denoted as

A ⊕ B = { z | (B̂)z ∩ A ≠ ∅ },

which involves all z's that are translations of the reflected B that, when intersected with A, give a non-empty set. B is called the structuring element. In general, dilation causes an object to expand or elongate following the shape of the structuring element.

On the other hand, erosion is defined as

A ⊖ B = { z | (B)z ⊆ A },

which involves all z's such that B translated by z is contained in A. The general effect of erosion is to shrink or reduce the image by the shape of the structuring element B [4].

In this activity, dilation and erosion are performed on different binary images, namely a square (50 × 50), a triangle (base = 50, height = 30), a circle (radius 25), a hollow square (60 × 60, edges 4 pixels thick), and a plus sign (8 pixels thick and 50 pixels long for each line).
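Below is a minimal Scilab sketch of how one of these test images and the structuring elements can be generated and then eroded and dilated; the canvas size and the cross dimensions are assumptions, and erode and dilate are assumed to come from the SIP toolbox.

// Sketch: build a binary test shape and apply erosion and dilation
shape = zeros(100, 100);
shape(26:75, 26:75) = 1;            // 50 x 50 filled square on a black background

// Structuring elements used in the activity (the cross size is a guess)
se_sq = ones(4, 4);                 // 4 x 4 square of ones
se_h  = ones(2, 4);                 // 2 x 4 matrix of ones
se_v  = ones(4, 2);                 // 4 x 2 matrix of ones
se_x  = zeros(5, 5);
se_x(3, :) = 1;   se_x(:, 3) = 1;   // cross-shaped structuring element

dil = dilate(shape, se_sq);         // the square expands on all sides
ero = erode(shape, se_sq);          // the square shrinks on all sides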

The different structuring elements used are shown below.



The succeeding figures show what happens after eroding and dilating the images with the structuring elements.




Figure 1. Dilation (first row) and erosion (second row) of the circle. The first image of each row shows the original image.



Figure 2. Dilation (first row) and erosion (second row) of the square. The first image of each row shows the original image.



Figure 3. Dilation (first row) and erosion (second row) of the triangle. The first image of each row shows the original image.



Figure 4. Dilation (first row) and erosion (second row) of the hollow square. The first image of each row shows the original image.



Figure 5. Dilation (first row) and erosion (second row) of the cross. The first image of each row shows the original image.

It can be observed from the figures above that the statements regarding the general effects of dilation and erosion are satisfied. In the case of dilation, the image expands depending on the structuring element used. Dilation using the 4 x 4 square matrix of ones as the structuring element results in uniform expansion of the image (white region) on all sides. Using the 2 x 4 and 4 x 2 matrices of ones enlarges the image on the vertical and horizontal sides, respectively. Finally, using the cross as the structuring element results in expansion of the image on all sides except at the corners.

In the case of erosion, the same thing happens, only that instead of expansion or the addition of pixels, shrinking or the elimination of pixels at the sides of the images occurs. Looking at the effect of erosion on the hollow square, the 2 x 4 structuring element reduces the number of pixels on the left and right sides of the image, while the 4 x 2 structuring element shrinks the top and bottom sides. It is also interesting to note that only the four pixels at the corners of the hollow square remain after eroding it with the cross structuring element.

My predictions of the images obtained after applying erosion and dilation with the four structuring elements above were correct only for the dilation of the images with the first three structuring elements. I had a hard time imagining the output of the dilation and erosion of the images with the cross as the structuring element.

After exploring and applying the "skel" function of Scilab to the five images, I arrived at the outputs below.


Figure 6. Resulting images after applying "skel" function.

For this activity, I give myself a grade of 9 because I failed to predict the dilation and erosion of the images using the cross as the structuring element.

References:
1. http://www.ph.tn.tudelft.nl/Courses/FIP/noframes/fip-Techniqu.html
2. http://en.wikipedia.org/wiki/Morphology
3. http://inperc.com/wiki/index.php?title=Special:Whatlinkshere/Dilation_and_erosion
4. “Morphological operations”, Applied Physics 186 Activity 8 Manual.