Final project, character recognition


Character recognition using cross correlation in Matlab

Page 1: Final project, character recognition

Vinay Varghese

Introduction:

This report describes the work done to extract the serial number from a South African bank note and to store it digitally as text. The steps involved require basic knowledge of image processing as well as skills in working with Matlab and Python. All work in this report was done using Python 2.7.6 and Matlab 2012b.

General remark:

Most of the work in this report can very easily be done using built-in functions already present in Matlab. However, I have done as much as I could with my own code and manually written functions.

Cropping of the Image:

Figure1: A digital image of a fifty rand bank note along with its dimensions in pixels

Ideally, only the segment of the digital image containing the serial code should be fed into the image-to-text converter that is to be made.

From looking at this example of a scanned note, one can see all the dimensions of critical sections.

The code to crop the note was written by looking at the ratio of the required segment to the dimensions of the whole note. The note was cropped by 10% of its width to the right, by around 40% of its total height from the top, and finally by about 5% of its total height from the bottom.
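To make the ratio-based crop concrete, here is a minimal Python sketch. It is illustrative only and is not the code from the report. It assumes the 10% offset is measured from the left edge and that the crop keeps everything up to the right edge; the function names `crop_box` and `crop` are hypothetical, and the image is represented as a plain list of pixel rows.

```python
def crop_box(width, height, left_frac=0.10, top_frac=0.40, bottom_frac=0.05):
    """Return (left, upper, right, lower) pixel bounds for the serial-number strip."""
    left = int(round(width * left_frac))                # assumption: offset from the left edge
    upper = int(round(height * top_frac))               # skip roughly the top 40% of the note
    lower = height - int(round(height * bottom_frac))   # drop roughly the bottom 5%
    return (left, upper, width, lower)                  # assumption: keep up to the right edge


def crop(image, box):
    """Crop a grayscale image stored as a list of rows (lists of pixel values)."""
    left, upper, right, lower = box
    return [row[left:right] for row in image[upper:lower]]


# Dummy 40x20 "note" whose pixel value encodes its own position.
note = [[y * 100 + x for x in range(40)] for y in range(20)]
box = crop_box(40, 20)
strip = crop(note, box)
```

The same `(left, upper, right, lower)` tuple is the box format that Pillow's `Image.crop` accepts, so the computed bounds could be passed to it directly when working with a real scan.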

The code required to do this was implemented in Python and is shown below.

The above code manually crops out the segment with the serial code and stores it as an image named “croppedimage.jpg”.

Below is the output of the cropping code.

Figure2: The cropped segment containing the serial code

This image was then passed to the next function, the filter.

Filtering of the Image:

Although there are many built-in functions available in Matlab for image filtering, I decided to write my own filtering code in Python. The two filters I managed to code were the median filter and the mean filter.

The purpose of the filter is to try removing the background noise in the image and to end up with just the serial code.

The two filters built will be discussed below:

1) The median Filter

The principle of the median filter is to run through the image pixels, replacing each pixel with the median of the neighbouring pixels. This results in some of the noise in the image disappearing.
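As an illustration of this principle (a sketch, not the code from the report), a median filter can be written in plain Python with nested for loops. The function name `median_filter` is hypothetical, and clipping the window at the image border is an assumption, since the report does not say how edges were handled.

```python
def median_filter(image, size=9):
    """Apply a size x size median filter to a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    half = size // 2
    output = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(height):
        for x in range(width):
            window = []
            # Collect the neighbourhood, clipped at the image border (an assumption).
            for j in range(max(0, y - half), min(height, y + half + 1)):
                for i in range(max(0, x - half), min(width, x + half + 1)):
                    window.append(image[j][i])
            window.sort()
            output[y][x] = window[len(window) // 2]  # middle value of the sorted window
    return output


# A flat grey patch with one bright noise pixel: the filter removes the outlier.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
cleaned = median_filter(noisy, size=3)
```

Because an isolated outlier never reaches the middle of the sorted window, the single bright pixel is replaced by the surrounding grey value.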

For my median filter, I made a square matrix and ran it through the entire image. As the matrix moves through the image, the code takes the median of the pixels it covers and places it in the image. Two median filters were made, a 9x9 median filter and a 15x15 median filter. The filter was written in Python using for loops to run through the image. The code for just the 9x9 median filter is shown below:

The output of this median filter when the input is the cropped note is shown below:

Figure3: The output of a 9x9 median filter

Figure4: The output of a 15x15 median filter

As can be seen above, the output of the median filters is not good enough. The 9x9 filter results in an image where the text has lost some of its quality, and the 15x15 filter results in an unusable image.

2) The mean Filter

The mean filter works in a similar manner to the median filter described above. It was made using 9x9 matrices and 15x15 matrices. The filter goes through all the pixels in the image and replaces each one with the average of its neighbouring pixels. For this reason it is reasonable to expect that the smaller the matrix used, the more defined the picture will be. At the same time, if the chosen matrix is too small, very little of the background noise will be filtered out.
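The effect of the matrix size can be seen in a small Python sketch. This is illustrative only and is not the code from the report; the name `mean_filter`, the integer averaging, and the clipping of the window at the image border are all assumptions.

```python
def mean_filter(image, size=3):
    """Apply a size x size mean (box) filter to a grayscale image (list of rows)."""
    height, width = len(image), len(image[0])
    half = size // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0, 0
            # Sum the neighbourhood, clipped at the image border (an assumption).
            for j in range(max(0, y - half), min(height, y + half + 1)):
                for i in range(max(0, x - half), min(width, x + half + 1)):
                    total += image[j][i]
                    count += 1
            output[y][x] = total // count  # integer mean of the window
    return output


# A single bright pixel on a black background, blurred with two window sizes.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 90
blur3 = mean_filter(spike, 3)
blur5 = mean_filter(spike, 5)
```

In this toy case the 3x3 window keeps more of the bright pixel at its centre than the 5x5 window does, which matches the expectation that larger matrices blur detail more strongly.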

The code to implement this mean filter was written in Python 2.7.6. It is shown below:

Below are the images from the mean filter:

Figure5: The output of the 15x15 mean filter

Figure6: The output of the 9x9 mean filter

Figure7: The output of a 3x3 mean filter.

As you can see, the 15x15 mean filter blurs out too much of the image. The 9x9 output has text that can be recognized by eye but still does not look very clear. Finally, the 3x3 filter results in a clear, sharp image. The one downside of the 3x3 output is that some of the noise is still present in the background. From here on, the image-to-text program is coded on the assumption that the output of the 3x3 mean filter is used.

Text recognition:

For the text recognition section, the coding will be done using Matlab 2012b. This is because Matlab is far faster than Python and offers more image processing modules.

There are several methods that can be used for text recognition. For the purpose of detecting serial numbers on bank notes I have chosen the cross correlation method using a template. This was chosen because it seems like the most reasonable option considering my limited knowledge in image processing.

Creating a template:

The first step in detecting letters with this method is to make a template. The template is the set of images that the program compares against when it tries to recognize text. In this case the template is a series of images containing the possible characters in a serial code. The possible characters on a South African bank note are “ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890”.

Note that the process of creating a template is shortened by the realization that only capital letters can be found on the notes.

Below are some examples of the images present in the template:

Figure8: Template image of the letter A

Figure9: Template image of the letter S

Figure9: Template image of the number 2

Cross correlation:

The code now needs to match up the figures in the template folder with patterns in the segment containing the serial code. For this it is necessary that the figures in the template have the same orientation as the image containing the serial code.

Cross correlation compares an image in the template with the original image and checks for matches. In other words, the code searches for a segment of the image that matches an image in the templates folder.
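For reference, the normalized cross correlation of an image f with a template t, in the form given in the normxcorr2 documentation listed in the references, can be written as follows. Here \bar t is the mean of the template and \bar f_{u,v} is the mean of the image in the region under the template when it is shifted to (u, v).

```latex
\gamma(u,v) =
\frac{\sum_{x,y}\bigl[f(x,y)-\bar f_{u,v}\bigr]\bigl[t(x-u,\,y-v)-\bar t\bigr]}
     {\sqrt{\sum_{x,y}\bigl[f(x,y)-\bar f_{u,v}\bigr]^{2}\;\sum_{x,y}\bigl[t(x-u,\,y-v)-\bar t\bigr]^{2}}}
```

The value lies between -1 and 1, and a value close to 1 means that the region under the template closely matches the template.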

The cross correlation function will be used in Matlab and it will produce a graph that will show to what extent the image in the templates folder is a match to the segment in the cropped and filtered bank note.

The code performs this cross correlation for all images in the templates folder to map the serial code. The images in the templates folder were made using each letter from the already filtered bank note.
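To show what this comparison computes, here is a minimal pure-Python sketch of normalized cross correlation. It is not the Matlab code used in the report and not a reimplementation of normxcorr2: it only scores positions where the template fits entirely inside the image, whereas normxcorr2 returns a larger, padded result. The names `ncc_map`, `image` and `template` are hypothetical.

```python
import math


def ncc_map(image, template):
    """Normalized cross correlation at every position where the template fully fits."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    n = th * tw
    t_mean = sum(sum(row) for row in template) / float(n)
    t_dev = [[v - t_mean for v in row] for row in template]
    t_norm = math.sqrt(sum(d * d for row in t_dev for d in row))
    scores = []
    for v in range(ih - th + 1):
        row_scores = []
        for u in range(iw - tw + 1):
            patch = [image[v + j][u:u + tw] for j in range(th)]
            p_mean = sum(sum(r) for r in patch) / float(n)
            num, p_sq = 0.0, 0.0
            for j in range(th):
                for i in range(tw):
                    d = patch[j][i] - p_mean
                    num += d * t_dev[j][i]
                    p_sq += d * d
            denom = math.sqrt(p_sq) * t_norm
            # Flat patches have zero variance; score them as "no match".
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


# Toy example: the 2x2 template is hidden at row 1, column 2 of the image.
template = [[0, 9],
            [9, 0]]
image = [[1, 1, 1, 1, 1],
         [1, 1, 0, 9, 1],
         [1, 1, 9, 0, 1]]
scores = ncc_map(image, template)
best = max((s, v, u) for v, row in enumerate(scores) for u, s in enumerate(row))
```

The highest score marks the position of the best match, and its column index plays the role of the x position that is used later to order the characters.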

Below is the code used to cross correlate. It will be used just to detect the number 4 for now:

This code has the following outputs:

Figure10: Image showing the detected character “4”

Figure11: Graph showing correlation. Peaks represent the match to the template image used.

One can see from above that the ‘4’ has been detected in the image using the template of the character. The graph produced shows the amount of correlation. The location of the peak is related to the location of the character found in the filtered image. This fact is of significant use: the locations of the peaks can be used to put the detected characters in order and compile the serial code as text. This will be done in a later step.

This form of template matching needs to be done for every character ranging from A to Z and 0 to 9. In this way all characters present in the filtered image will be detected.

From Image to Text:

One of the final things that needs to be done is to convert the template matches into text. After tedious testing, I concluded that when a template matches a letter on the bank note, the correlation value is around 0.8 or more. I therefore made a function in Matlab that looks for peaks above the threshold value of 0.8. The letter that corresponds to the template, as well as the peak's x position, is stored in variables in Matlab.
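The threshold step can be sketched in Python as follows. This is an illustration, not the Matlab function from the report. It assumes the correlation result has already been reduced to one value per x position (for example, the largest value in each column), and the name `peaks_above` and the sample values are hypothetical.

```python
def peaks_above(scores, threshold=0.8):
    """Return (x, score) for every local maximum at or above the threshold."""
    peaks = []
    for x, s in enumerate(scores):
        left = scores[x - 1] if x > 0 else float("-inf")
        right = scores[x + 1] if x + 1 < len(scores) else float("-inf")
        # Strictly greater than the left neighbour so a flat top is reported once.
        if s >= threshold and s > left and s >= right:
            peaks.append((x, s))
    return peaks


# Made-up correlation profile with three matches of the same template.
profile = [0.1, 0.3, 0.92, 0.4, 0.2, 0.85, 0.85, 0.3, 0.6, 0.95]
found = peaks_above(profile)
```

Because every peak above the threshold is returned, a character that occurs several times in the serial code yields one x position per occurrence.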

The code in Figure12 looks for the letter A and finds its correlation value as well as its x position. The outputs are shown below:

Figure12: Code that correlates the letter A from template and finds its x position

The variable above named ‘xbegin’ is very important, as it gives the letter's horizontal position. By arranging the ‘xbegin’ values of all the letters from smallest to largest, the program automatically puts all detected characters in order.

This arranging of all “xbegin” values was done by placing them in an array and then using a sort function to put them in order.

There are some problems that could arise from the above text recognition code. The main problem arises when a character is repeated in the serial code. This is solved by the use of threshold detection: the code looks for all peaks above the threshold value, so it finds all three “9” characters, as seen in the example code. The second problem is that the code will not find the space between the second-to-last and last characters of the serial code. In my opinion this is not a problem that needs solving, so I have left it as it is.

Below is the array of values that represent the x positions of the letters. The output is from Matlab.

Figure13: Array with all x positions of letters from the template.

Now that the x values have been acquired, the next step is to arrange them from smallest to largest. This then represents the order of the letters. The arranged values are shown below.

Figure14: Array representing x positions of detected letters in an arranged manner from left to right.

Now all that is needed to complete this task is to match up these x positions to the letters found from the template.

This is done by a simple code section in Matlab. The code checks whether a letter's position corresponds to the first value of the array named “arrayofxvaluessorted”. If it does, then the first letter is the one from that template image. If not, the code tries the next letter, and so on. This is done using for loops and if statements.

Figure15: Code that checks whether B is the first letter; if it is, B is stored in the array.

The code above is used together with for loops in the actual code. If, for example, the first letter isn't B, then it will test C, then D, and so on. In this manner it goes through every possible character and places the matches in order in an array. In the end the array is simply converted into a string and displayed.
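The ordering idea can also be sketched compactly in Python. This is an illustration rather than the Matlab code from the report, and it takes a slightly different route: each detected character is kept paired with its x position, so a single sort does the matching. The name `assemble_serial` and the detections are made up for the example.

```python
def assemble_serial(detections):
    """Order (character, x position) pairs from left to right and join the characters."""
    ordered = sorted(detections, key=lambda pair: pair[1])
    return "".join(char for char, _ in ordered)


# Hypothetical detections; a repeated character simply appears once per peak.
detections = [("9", 120), ("A", 10), ("9", 80), ("B", 35), ("4", 60)]
serial = assemble_serial(detections)
```

Keeping the character and its position together avoids the repeated comparisons against the sorted array, at the cost of departing from the step-by-step approach described above.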

The final output of the example code is shown below:

Figure16: Final output of the program.

This is the end of the character recognition code. As one can see above, the text was correctly recognized and printed as a string. This is as far as I went because I am working through this project individually and not in a group. The code worked moderately well for most of the notes I tested. However, it did not work for the image with the note in the wrong orientation. This problem could be solved in a number of ways. The simplest is this: if no text is recognized by Matlab, it is fairly safe to assume that the note is in the wrong orientation. I will use the rotate function in the Python section of my code to put the digital bank note back in the expected orientation. With that simple step, all notes can be detected irrespective of their orientation.
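The retry idea can be sketched in Python as follows. This is illustrative only: it assumes that "wrong orientation" means the note is upside down (rotated by 180 degrees), and `recognise` stands in for whatever recognition routine is used. Both function names are hypothetical.

```python
def rotate_180(image):
    """Rotate a grayscale image (list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def recognise_with_retry(image, recognise):
    """Run the recogniser and, if it finds nothing, retry on the rotated image."""
    text = recognise(image)
    if not text:
        text = recognise(rotate_180(image))
    return text


# Stand-in recogniser: "reads" the image only when the marker pixel is top-left.
def fake_recognise(image):
    return "OK" if image[0][0] == 1 else ""


upside_down = [[0, 0],
               [0, 1]]
result = recognise_with_retry(upside_down, fake_recognise)
```

Notes scanned sideways would need 90 degree rotations as well, which this sketch does not cover.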

For the sake of completeness, I decided to also input my filtered image into an external image-to-text module in Python. This module is named Tesseract.

Tesseract is a module that is available in Python and is easily implemented. I have just added this section so that I can compare the performance of Tesseract vs the performance of the code that I have made.

The output is shown below:

Figure17: The output of my Python code used in conjunction with Tesseract.

As one can see, the program converts the filtered image into text with zero errors. This is a fantastic result and is quite clearly better than the image-to-text script that I made with Matlab and Python. The running time is unfortunately a bit slow. This is most likely because the entire code runs in Python 2.7.6, whose speed cannot be compared to that of Matlab.

Throughout this report, steps have been documented to show what was done to implement an image to text recognition program. Additionally, the alternate method of using an external module to do this task is also shown to make a comparison.

Comments:

This code could have been greatly simplified by using the built-in filters in Matlab instead of coding them manually in Python. However, for the sake of doing every step of the text recognition process manually, I made my own filters. The Matlab filters would most likely have performed better than the Python mean and median filters that I created.

References

http://www.mathworks.com/help/images/ref/normxcorr2.html