Friday, October 9, 2009

Activity 17 Photometric Stereo

From our previous activities, we know that the brightness of an object depends strongly on the amount of light it receives from the source and on the reflectance of the object's surface.

Consider the image shown above. The variable P denotes a point on the surface, ρ(P) is the reflectance, S(P) is the vector from P to the source, n(P) is the normal vector at P, and r(P) is the distance from P to the source. For a point source, the brightness falls off as 1/r², and this is most evident for a nearby point source, where the brightness is given by:
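In the usual formulation this is

$$B(P) = \rho(P)\,\frac{\mathbf{n}(P)\cdot\mathbf{S}(P)}{r(P)^2}.$$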

If the light source is at infinity, the light wave approaching the object appears similar to a plane wave, and the brightness becomes:
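In the same notation,

$$B(P) = \rho(P)\,\mathbf{n}(P)\cdot\mathbf{S},$$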

where S is now the same for every point on the surface. Finally, for a line source, the brightness falls off as 1/r.

Given a light source shone on an object from different positions, we can reconstruct the 3D shape of the object's surface. The technique employed for this problem is photometric stereo: we capture multiple images of the surface under different source positions and use these images to estimate the vectors normal to the surface.


In this activity, we reconstruct the 3D shape of an object illuminated by a very distant source. We assume that the brightness of the object is proportional to the intensity I captured by the camera for each source position V; these are represented by the images and the matrix shown below.



We substitute the I and V values into the equation:
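Assuming each row of I and V corresponds to one source position, the relation is

$$I = V\,g,$$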


where g is the product of the reflectance and the normal vector. We solve for this variable in order to determine the surface normal vector n, using the next two equations:
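In the least-squares formulation these are

$$g = (V^T V)^{-1} V^T I, \qquad \hat{\mathbf{n}} = \frac{g}{|g|}.$$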


Since the surface normals are related to the partial derivatives of the surface elevation f, we can integrate these derivatives to compute the elevation. The equations below give the elevation at any point (u, v).
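In the usual formulation these are

$$\frac{\partial f}{\partial x} = -\frac{n_x}{n_z}, \qquad \frac{\partial f}{\partial y} = -\frac{n_y}{n_z},$$

$$f(u,v) = \int_0^u \frac{\partial f}{\partial x}\,dx + \int_0^v \frac{\partial f}{\partial y}\,dy.$$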

Following the steps above, f(u,v) was calculated and plotted to arrive at the 3D shape of the object displayed below.
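For readers who want to retrace the computation, here is a minimal NumPy sketch of the whole pipeline. It assumes I stacks the flattened images row-wise and V holds the corresponding source directions; this is my own illustration, not the code used in the activity.

```python
import numpy as np

def photometric_stereo(I, V, shape):
    """Recover the surface elevation f(u, v).
    I : (num_sources, num_pixels) array of flattened images
    V : (num_sources, 3) array of source directions
    shape : (rows, cols) of one image"""
    # Solve I = V g for g by least squares: g = (V^T V)^{-1} V^T I
    g, *_ = np.linalg.lstsq(V, I, rcond=None)
    # Surface normal: n = g / |g|
    n = g / (np.linalg.norm(g, axis=0) + 1e-12)
    # Partial derivatives of the elevation: f_x = -nx/nz, f_y = -ny/nz
    fx = (-n[0] / (n[2] + 1e-12)).reshape(shape)
    fy = (-n[1] / (n[2] + 1e-12)).reshape(shape)
    # Cumulative sums approximate the line integrals along x and y,
    # giving the elevation f(u, v)
    return np.cumsum(fx, axis=1) + np.cumsum(fy, axis=0)
```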


The main objective of this activity, the reconstruction of the 3D plot of the object, was fulfilled, so I am giving myself a grade of 10.


Reference:
"Photometric Stereo",Activity 17 Manual in AP 186.

Wednesday, September 23, 2009

Activity 16 Neural Networks

In this activity, pattern recognition is employed by using neural networks...

A neural network is a computational model that mimics the behavior of neurons in the brain [1]. It is more advantageous than linear discriminant analysis because it can also handle nonlinear systems. It is able to model nonlinear functions of many variables, which makes it useful in several applications such as the detection of medical phenomena, stock market prediction, credit assignment, monitoring the condition of machinery, and engine management [2].

To implement the neural network model, I considered 10 sample objects consisting of blue and red dinosaurs, as shown in Figure 1. Four features, namely the red chromaticity value, green chromaticity value, blue chromaticity value, and area, were utilized for classification. The area values of the objects were normalized with respect to the highest value so that the features are on a comparable scale. The first five rows of each column correspond to the blue dinosaurs while the last five rows describe the feature vectors of the red dinosaurs. The matrix below shows the arrangement of the training set and the feature vectors. The first, second, third, and fourth columns are designated to the red chromaticity value, green chromaticity value, blue chromaticity value, and normalized area, respectively.


Figure 1. Image used for neural network

For this case, I chose the red dinosaurs as my desired class, so the desired output is written such that the class of blue dinosaurs is set to 0 and that of red dinosaurs to 1. By adopting the neural network code of Mr. Jeric Tugaff posted in Mr. Cole’s blog [3], the results shown in Figures 2 and 3 were obtained. The number of iterations (T) and the learning rate (LR) were varied and their effects on the accuracy of recognition were observed. T was varied from 100 to 500 in increments of 100, and it was found that more accurate pattern recognition is acquired for higher T; that is, the blue dinosaurs yielded values nearest to 0 and the red dinosaurs values nearest to 1. Thus, T = 500 yielded the most accurate recognition within the range chosen, and it was used when determining the effect of LR on the accuracy of recognition (see Figure 3). LR was varied from 0.1 to 2, and more accurate pattern recognition is attained as LR is increased. Within this range, LR = 2 yielded the highest accuracy, with the values for the red dinosaurs closest to 1 and those for the blue dinosaurs closest to 0. However, it cannot be concluded that T = 500 and LR = 2 are the optimal parameters, since we can still extend the ranges and try more combinations of LR and T.
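To make the training loop concrete, here is a minimal Python sketch of such a network. The single hidden layer, its size, and the random initialization are my own assumptions rather than details of the adopted code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, d, T=500, LR=2.0, hidden=4, seed=0):
    """Train a small feedforward network by plain gradient descent.
    X : (10, 4) feature matrix (r, g, b chromaticity, normalized area)
    d : (10, 1) desired output, 0 for blue dinosaurs and 1 for red
    T : number of iterations; LR : learning rate (the two knobs varied above)"""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))  # input -> hidden
    W2 = rng.normal(scale=0.5, size=(hidden, 1))           # hidden -> output
    for _ in range(T):
        # Forward pass through the two sigmoid layers
        h = sigmoid(X @ W1)
        y = sigmoid(h @ W2)
        # Backpropagate the squared error (sigmoid' = y(1 - y))
        d2 = (y - d) * y * (1 - y)
        d1 = (d2 @ W2.T) * h * (1 - h)
        W2 -= LR * (h.T @ d2)
        W1 -= LR * (X.T @ d1)
    return W1, W2
```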


Figure 2. Results obtained for different T


Figure 3. Results obtained for different LR

Since I was able to meet the objective of the activity, I give myself a grade of 10. I would like to acknowledge Master and Orly for helping me improve my results.

References:

1. "Neural Networks", Activity 16 Manual.

2. "Neural Networks", Statsoft Inc., http://www.statsoft.com/textbook/stneunet.html#apps.

3. http://cole-ap186.blogspot.com/



Saturday, September 12, 2009

Activity 15 Probabilistic Classification



Pattern recognition was performed in the previous activity using the Euclidean distance. Now, we are going to employ Linear Discriminant Analysis (LDA) for classifying members of a certain group. The image from the previous activity is reused for this purpose and is shown in Figure 1. LDA assumes that groups can be separated by a linear combination of the features used to describe the members.

Figure 1. Image used for pattern recognition using LDA

Using the object features, each member is assigned to the group with the highest conditional probability. This is known as Bayes' rule, which minimizes the total error of classification (TEC). Mathematically, an object is assigned to group i where
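In standard form, x is assigned to group i if

$$P(i \mid x) > P(j \mid x) \quad \text{for all } j \neq i.$$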

Thus, the probability that the member belongs to group i given a set of data x is determined.

To implement LDA, we form a matrix x whose rows are the objects and whose columns are the features. For this activity, I used the same objects (blue and red dinosaurs) and the same features as in the previous activity, that is, r_value and area. From this matrix, the two classes of objects are separated into x1 and x2 with their corresponding features. To visualize this, refer to the equation below.


From x1 and x2, the mean feature vectors μ1 and μ2 and the global mean vector are determined, and these are used to get the mean-corrected data x1o and x2o. From these values, the pooled covariance matrix is calculated using the equation
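Following the LDA tutorial cited below, this has the form

$$C(r,s) = \frac{1}{n}\sum_{i=1}^{g} n_i\, c_i(r,s),$$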

where n is the total number of samples, n_i is the number of samples in group i, g is the number of groups, and c_i(r,s) is the covariance of each group. Finally, the discriminant function is solved using the following formula:
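In the same notation, the discriminant for group i is

$$f_i = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1} \mu_i^T + \ln(p_i),$$

where x_k is the feature vector of object k and p_i is the prior probability of group i.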

An object is assigned to the class whose discriminant value f_i is higher. Here, f1 corresponds to the class of red dinosaurs and f2 to that of the blue dinosaurs. Table 1 shows the results after performing LDA.
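As a concrete sketch, the computation fits in a few lines of Python; this follows the formulas above and is my own illustration, not the code used in the activity.

```python
import numpy as np

def lda_discriminants(x1, x2, samples):
    """Discriminant values f1, f2 for each row of `samples`,
    following the formulas above (Teknomo tutorial notation)."""
    n1, n2 = len(x1), len(x2)
    n = n1 + n2
    mu1, mu2 = x1.mean(axis=0), x2.mean(axis=0)
    mu = (n1 * mu1 + n2 * mu2) / n                  # global mean vector
    x1o, x2o = x1 - mu, x2 - mu                     # mean-corrected data
    C = (x1o.T @ x1o + x2o.T @ x2o) / n             # pooled covariance
    Cinv = np.linalg.inv(C)
    p1, p2 = n1 / n, n2 / n                         # prior probabilities
    def f(mu_i, p_i):
        return samples @ Cinv @ mu_i - 0.5 * mu_i @ Cinv @ mu_i + np.log(p_i)
    # Assign each sample to the class with the larger discriminant value
    return f(mu1, p1), f(mu2, p2)
```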



Table 1. Results obtained after applying LDA
The percentage of correct prediction of the classification is 100%.
In this activity, I was able to apply LDA to classify objects belonging to a certain class. Since I met the objective, I give myself a grade of 10.

References:
1. http://people.revoledu.com/kardi/tutorial/LDA/
2. http://www.craftkitsandsupplies.com/images/Foam_Shapes/Foam_Dinosaur_Shapes.jpg

Activity 14 Pattern Recognition

In image processing, it is important to distinguish sets of classes which share common features such as size, shape, and color. Such a set of features is known as a pattern, and the process of determining the set of features that will enable the separation of a set into classes is called pattern recognition. In this activity, our task is to apply pattern recognition to images. Careful choice of the class features is important since it helps simplify the problem.

Mathematically, patterns may be arranged into ordered sets such as feature vectors. If we let ωj, where j = 1, 2, 3, …, W, be a set of classes, with W the total number of classes, we can define a representative of class ωj through its mean feature vector:
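$$m_j = \frac{1}{N_j}\sum_{x \in \omega_j} x, \qquad j = 1, 2, \ldots, W,$$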

where xj is the set of all feature vectors in class ωj and Nj is the number of samples in ωj. Class membership is determined by assigning an unknown feature vector x to the class whose mean is nearest to it. The Euclidean distance is used for this purpose and is equal to:
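$$D_j(x) = \|x - m_j\|, \qquad j = 1, 2, \ldots, W,$$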

where $\|y\| = (y^T y)^{1/2}$ is the Euclidean norm [1].
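This minimum-distance rule is short enough to write out; the class means below are hypothetical values for illustration, not the measured ones.

```python
import numpy as np

def classify(x, means):
    """Assign feature vector x to the class whose mean is nearest
    in the Euclidean sense; `means` lists the class mean vectors."""
    return int(np.argmin([np.linalg.norm(x - m) for m in means]))

# Hypothetical class means for the two features used here (r_value, area)
m_blue = np.array([0.30, 0.85])
m_red = np.array([0.55, 0.60])
print(classify(np.array([0.52, 0.63]), [m_blue, m_red]))  # -> 1 (red)
```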

The image [2] chosen for this activity is shown in Figure 1. As seen from the figure, the members of the entire set are dinosaurs of different types. From these dinosaurs, two classes are chosen and are shown in Figure 2. The figure consists of 5 blue dinosaurs and 5 red dinosaurs. The first three pictures of each row served as the training set while the last two are the test objects. The classes are differentiated from each other in terms of color and area. For the first feature, the RGB values of each training object are determined. Then, the normalized red chromaticity (denoted as r_value) and the mean r_value of the two classes are calculated. For the second feature, i.e., the area, the training objects are first converted to binary images (with proper thresholding). Using the code for area calculation from Activity 2, the area of each training object and the mean area of each class are obtained. Table 1 shows a summary of the r_value and area of each training object.


Figure 1. Image used for pattern recognition [2]

Figure 2. Selected classes from image



Table 1. r_value and area of the training set.

To clearly see the separation between the two classes using the selected features, the r_value versus area of each training object is plotted in Figure 3. Notice that the separation between the classes is large enough to classify the objects, which signifies that the chosen features are reliable for pattern recognition.


Figure 3. Plot of r_values versus area of the training objects

For the test objects, the r_value and area are also calculated. Using the Euclidean distance, each test object is assigned to its nearest class. The computed values and the resulting classifications are shown in Table 2, and the plot of r_value vs. area is shown in Figure 4.

Table 2. Tabulated r_value and area of the test objects and their classifications

Figure 4. Plot of r_values vs. area of the test objects

From this activity, I was able to perform pattern recognition by supplying the right features of each class and predicting the class to which the test objects belong. I would like to thank Kaye Vergel for a good discussion.

Since I am able to meet the objective of the activity, I give myself a grade of 10.



References:

1. "Pattern Recognition", Activity 14 Manual in AP 186.

2. http://www.craftkitsandsupplies.com/images/Foam_Shapes/Foam_Dinosaur_Shapes.jpg


Friday, August 7, 2009

Preprocessing Text

Extracting information from handwritten text is one of the problems in image processing. Several steps are usually needed to get the desired information, such as the removal of embedded lines and the separation of the text. In this activity, image processing techniques from previous activities are applied to recognize handwriting in a given text document.

Step by step procedure
  • Download the text document and crop a portion that shows text together with the horizontal lines (see Figure 1).
  • Make a filter mask by taking the Fourier transform of the cropped image and modifying it in GIMP to remove the white vertical band, which corresponds to the horizontal lines in the image (see Figure 2). Filter the image with this mask in the Fourier domain, as in the sketch after this list.
  • Binarize the resulting image and apply morphological operations to separate the letters of the text (see Figure 3).
  • Using bwlabel, label the connected components. Ideally, each letter should have its own label; however, this was not achieved here. The text in the cropped image is written in cursive, which makes letter separation harder.
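A minimal sketch of the Fourier-domain filtering step, assuming the mask drawn in GIMP is loaded as a 0/1 array of the same size as the image; this is my own illustration, not the code used here.

```python
import numpy as np

def remove_lines(img, mask):
    """Multiply the centered FFT of the image by the 0/1 filter mask
    (horizontal lines in the image show up as a vertical band of
    frequencies, which the mask blocks), then invert the transform."""
    F = np.fft.fftshift(np.fft.fft2(img))
    return np.abs(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```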


Figure 1. Original image and a cropped portion.


Figure 2. Filter mask used


Figure 3. Binarized text of the cropped image.


Figure 4. Bwlabel of the image.

The procedure above is repeated for the next task, which is to detect where the word "description" is found in the document. The first image in the figure below shows the word "description" drawn in Paint. The size of this image must be equal to the size of the document. It is then binarized, with a threshold value similar to that used in the first part of the activity. Using the correlation function, the recognition of the word "description" is achieved. The red enclosures in the last image below show the portions where the word "description" appears. For further improvement, it is recommended that the word "description" in the first column be placed exactly at the center.
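For reference, here is a minimal sketch of the correlation step via the FFT, assuming both images are already binarized; this is my own illustration, not the original code.

```python
import numpy as np

def correlate(img, template):
    """Correlation via the FFT: corr = IFFT(FFT(img) * conj(FFT(template))).
    Peaks in the output mark locations where the template matches."""
    F = np.fft.fft2(img)
    G = np.fft.fft2(template, s=img.shape)  # zero-pad template to image size
    c = np.abs(np.fft.ifft2(F * np.conj(G)))
    return c / c.max()                      # normalize so peaks approach 1
```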


In this activity, I am giving myself a grade of 9 because I did not get the most desirable results.
I would like to thank Winsome Rara and Thirdy Buno for giving useful suggestions in the activity.

Figure 5. Output image after applying the correlation function to recognize the word "description" in the text document.


Color Image Segmentation

In past activities, the separation of the region of interest (ROI) from the background image has been done using binary operations. This proved to be a useful segmentation tool.

However, there are instances wherein the graylevels of the image overlap with those of the background. As an alternative technique, color can be used to segment the ROI from the background. In fact, color is used to segment skin regions in face and hand recognition, land cover in remote sensing, and cells in microscopy.

Considering images of 3D objects, it can be observed that different pixel locations within the object show the same color in different shades. To a good approximation, these shading variations are seen as differing brightness levels of the same color. Thus, the color space can be represented by parameters that separate the brightness and chromaticity information. This color space is referred to as the normalized chromaticity coordinates (NCC).

The NCC are expressed as the ratio of the individual RGB values to the sum of the RGB values of the image. Mathematically, they are defined as:
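$$I = R + G + B, \qquad r = \frac{R}{I}, \quad g = \frac{G}{I}, \quad b = \frac{B}{I},$$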

where R, G, and B are the red, green, and blue color values of the image.

From the equations above, the values of r, g, and b lie between 0 and 1. Also, we do not need to keep b since it can be obtained from r and g (b = 1 - r - g). This means that we can compress the 3D color information into 2D, which makes the processing simpler. The figure below shows the r-g color space.

Two techniques that use color to segment images are the parametric and non-parametric methods. In the parametric technique, the probability that a pixel belongs to the color distribution of interest is determined. This is done through the following steps:

  • From the image of the 3D object, crop an ROI and determine its r and g values (see the equations above).
  • Calculate the means μr and μg and the standard deviations σr and σg.
  • Evaluate the Gaussian probability shown after this list. Take note that the r in the equation is the r value of an image pixel. This probability function is used to tag whether a pixel belongs to the ROI or not.
  • Evaluate the same equation for the g values of the image.
  • Multiply the probabilities p(r) and p(g).
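The per-channel probability is the Gaussian fitted to the patch statistics:

$$p(r) = \frac{1}{\sigma_r\sqrt{2\pi}}\exp\!\left(-\frac{(r-\mu_r)^2}{2\sigma_r^2}\right),$$

and likewise for p(g).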

In the non-parametric technique, the 2D color histogram of the ROI is used to tag the membership of the pixels. Note that the histogram, when normalized, is the probability distribution function of the color. To segment the image using the non-parametric technique, the following steps are taken:

  • From the 3D object, crop the ROI and determine its r and g values.
  • Get the 2D chromaticity histogram of the ROI. Sample code for the 2D histogram is provided in the manual.
  • Segment the image using backprojection: determine the r and g values of the image, find the position of each pixel in the histogram, and replace the pixel value with the histogram value at that position. A sketch of both techniques is given after this list.
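Both techniques fit in a short Python sketch. The bin count and the use of NumPy's histogram routine are my own choices, not the manual's sample code.

```python
import numpy as np

def ncc(img):
    # r and g chromaticities; b is redundant since r + g + b = 1
    s = img.sum(axis=2) + 1e-12
    return img[:, :, 0] / s, img[:, :, 1] / s

def gaussian(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def parametric(img, patch):
    """Joint probability p(r)p(g) per pixel, Gaussians fit to the patch."""
    r, g = ncc(img)
    pr, pg = ncc(patch)
    return (gaussian(r, pr.mean(), pr.std()) *
            gaussian(g, pg.mean(), pg.std()))

def nonparametric(img, patch, bins=32):
    """Histogram backprojection: look up each image pixel's (r, g) bin
    in the patch's 2D chromaticity histogram."""
    r, g = ncc(img)
    pr, pg = ncc(patch)
    hist, redges, gedges = np.histogram2d(
        pr.ravel(), pg.ravel(), bins=bins, range=[[0, 1], [0, 1]], density=True)
    ri = np.clip(np.digitize(r, redges) - 1, 0, bins - 1)
    gi = np.clip(np.digitize(g, gedges) - 1, 0, bins - 1)
    return hist[ri, gi]
```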

The following figures show the segmentation of different images using parametric and non-parametric techniques.


Figure 2. Segmentation of the image using parametric (2nd column) and non-parametric (3rd column) technique.

Figure 3. Segmentation of the images using a different color patch for the parametric (2nd column) and non-parametric (3rd column) techniques.


The results show that the segmentation of the images depends mainly on the patch color that we choose and on the number of bins of the histogram. The bright regions in the reconstructed images correspond to pixels whose values are close to those of the patch. Comparing the two techniques, the segmentation is better when the non-parametric technique is used, and a proper choice of the number of bins yields good segmentation.


In this activity, I am giving myself a grade of 10 since the segmentation using the two techniques has been carried out and yielded desirable results.


I would like to thank Jica Monsanto and Winsome Rara for helping me debug the code.


References:

1. Applied 186 Activity 12 Handout

2. http://blog.howdesign.com/content/binary/fwy.jpg

3. http://images.google.com.ph/imgres?imgurl=http://www.archimuse.com/mw2003/papers/rowe/rowefig1.jpg&imgrefurl=http://www.archimuse.com/mw2003/papers/rowe/rowe.html&usg=__mD2GEzAUoSQ5kK0tr5HFl4xoXpk=&h=288&w=322&sz=43&hl=tl&start=50&sig2=l-1KWXaLREY5nWbYBlEkXA&um=1&tbnid=VsWVB6-IgKvt4M:&tbnh=106&tbnw=118&prev=/images%3Fq%3D3D%2Bobject%2B%252B%2Bball%26ndsp%3D18%26hl%3Dtl%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN%26start%3D36%26um%3D1&ei=mXZ8SuSaGYqYkQXCydyNAw