Face recognition and Detection by PCA Algorithm

Table of Contents

1. Introduction
2. Objectives
3. Tools/Environment Used
4. Analysis
5. Design
5.1 Mathematical Background
5.2 PCA Algorithm
6. Testing
7. Snapshots
8. Conclusion
9. Future Enhancements
10. References

1. Introduction

Humans are very good at recognizing faces and complex patterns, and even the passage of time does not affect this capability. It would therefore help if computers became as robust as humans at face recognition. A face recognition system can help in many ways:

* Checking for criminal records.
* Enhancing security by using surveillance cameras in conjunction with a face recognition system.
* Finding lost children by using the images received from cameras fitted at public places.
* Knowing in advance if a VIP is entering a hotel.
* Detecting a criminal at a public place.
* Comparing an entity with a set of entities in different areas of science.
* Pattern recognition.

This project is a step towards developing a face recognition system that can recognize static images. It can be modified to work with dynamic images: the dynamic images received from the camera can first be converted into static ones, and then the same procedure can be applied to them. In that case, however, many other things have to be considered, such as the distance between the camera and the person, the magnification factor and the view (top, side, front).

3. Tools/Environment Used

Software Requirements:

Operating System : Windows operating system

Hardware Requirements:

Processor : Pentium processor of 400MHz or higher.

RAM : Minimum 64MB primary memory.

Hard disk : Minimum 1GB hard disk space.

Monitor : Preferably color monitor (16 bit color) and above.

Web camera.

Compact Disk drive.

A keyboard and a mouse.

4. Analysis

Modules

Add Image/Registration

Image Capture

Login

Eigenface Computation

Identification

A module is a small part of our project, and modules play a very important role both in the project and in the code. In software engineering terms a module is a small part of a system, whereas in programming terms it is a small part of the program, in some cases a single function, that the main program is built from.

Dividing the software into modules makes it easy to understand what system we are developing and what its main uses are. During the project we create many modules and finally combine them to form the system.

Module Description

Add Image/Registration

Add Image is the module concerned with adding the image of a person along with that person's user id for login. Images are captured from the web camera and stored in the system. During registration four images are captured. Each image is stored four times, because the comparison algorithm requires a minimum of sixteen images.

Image Capture Module

This module captures an image using the web camera. It runs as a separate thread so that the rest of the system does not hang while capturing. It is used by both the login module and the registration module.

Login

This module's function is to compare the captured image with the images stored in the system. It uses the Eigenface computation defined in the next module for the comparison.

Eigenface Computation

This module is used to compute the "face space" used for face recognition. The recognition is

actually being carried out in the FaceBundle object, but the preparation of such object requires

doing lots of computations. The steps are:

* Compute an average face.

* Build a covariance matrix.

* Compute eigenvalues and eigenvectors.

* Select only the sixteen largest eigenvalues (and their corresponding eigenvectors).

* Compute the faces using our eigenvectors

* Compute eigenspace for our given images.

Identification

This module takes the image from the module above and compares it with the images already in the database. If any image matches, a success message is shown to the user.

Registration: Create Username, Register your image, Saving the username & image for future use.

Login: Enter Username, Enter Password (image), Identifying the image, Processing the Request; True: Authentication Provided, False: Authentication Declined.

[Figure: Flow Diagram. Boxes: Start; Action (Login or Register); Enter Login Id; Capture Image; Compare; Store; Success; Success Message; Failure Message.]

[Figure: Stages of face recognition. Face location detection, feature extraction, facial image classification.]

5. Design

5.1 Mathematical Background

This section illustrates the mathematical concepts that are the backbone of Principal Component Analysis. It is less important to remember the exact mechanics of a mathematical technique than it is to understand the intuition behind it. The topics are covered independently of each other and examples are given.

Variance, covariance, the covariance matrix, and eigenvectors and eigenvalues are the basis of the design algorithm.

a. Variance

The variance is a measure of the spread of data. Statisticians are usually concerned with taking a

sample of a population. To use election polls as an example, the population is all the people in

the country, whereas a sample is a subset of the population that the statisticians measure. The

great thing about statistics is that by only measuring a sample of the population, we can work out

what is most likely to be the measurement if we used the entire population.

Let's take an example:


X = [1 2 4 6 12 25 45 68 67 65 98]

We can use the symbol X to refer to this entire set of numbers. To refer to an individual number in the data set, we use a subscript on the symbol X; for example, X_3 is the third number, 4. There are a number of things that we can calculate about a data set. For example, we can calculate the mean of the sample, given by the formula:

mean = sum of all numbers / total no. of numbers

Unfortunately, the mean doesn't tell us a lot about the data except for a sort of middle point. For

example, these two data sets have exactly the same mean (10), but are obviously quite different:

[0 8 12 20] and [8 9 11 12]

So what is different about these two sets? It is the spread of the data. The variance is a measure of how spread out the data is, and it is closely related to the standard deviation (SD): the variance is simply the SD squared.

The SD is roughly "the average distance from the mean of the data set to a point". To calculate it, compute the squares of the distances from each data point to the mean of the set, add them all up, divide by n-1, and take the positive square root. As a formula:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} $$

where \bar{X} is the mean of the set and n is the number of data points. The variance is s^2.
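The calculation can be checked on the two example sets with a minimal sketch. Python is used here only for brevity (the project itself is written in Java), and the function names are illustrative rather than taken from the project code.

```python
import math

def mean(values):
    # mean = sum of all numbers / total no. of numbers
    return sum(values) / len(values)

def sample_variance(values):
    # Sum of squared distances from the mean, divided by n - 1.
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def sample_sd(values):
    # The standard deviation is the positive square root of the variance.
    return math.sqrt(sample_variance(values))

wide = [0, 8, 12, 20]
narrow = [8, 9, 11, 12]

# Both sets have the same mean, but the first is far more spread out.
print(mean(wide), mean(narrow))
print(sample_sd(wide), sample_sd(narrow))
```

Both sets report a mean of 10, while the SD of the first set comes out several times larger than that of the second, which is exactly the difference the mean alone cannot show.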

b. Covariance

Variance and SD are purely 1-dimensional. Data sets like this could be: the heights of all the people in the room, the marks for the last CSC378 exam, etc. However, many data sets have more than one dimension, and the aim of the statistical analysis of these data sets is usually to see if there is any relationship between the dimensions. For example, we might have as our data set both the height of all the students in a class and the mark they received for a paper. We could then perform statistical analysis to see if the height of a student has any effect on their mark. It is useful to have a measure of how much the dimensions vary from the mean with respect to each other.

Covariance is such a measure. It is always measured between 2 dimensions. If we calculate the covariance between one dimension and itself, we get the variance. So if we had a three-dimensional data set (x, y, z), we could measure the covariance between the x and y dimensions, the x and z dimensions, and the y and z dimensions. Measuring the covariance between x and x, or y and y, or z and z would give us the variance of the x, y and z dimensions respectively.

The formula for covariance is very similar to the formula for variance:

$$ \mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} $$

How does this work? Let’s use some example data. Imagine we have gone into the world and

collected some 2-dimensional data, say we have asked a bunch of students how many hours in

total that they spent studying CSC309, and the mark that they received. So we have two

dimensions, the first is the H dimension, the hours studied, and the second is the M dimension,

the mark received.

So what does the covariance between H and M tell us? The exact value is not as important as its sign (i.e. positive or negative). If the value is positive, it indicates that both dimensions increase together, meaning that, in general, as the number of hours of study increased, so did the final mark.

If the value is negative, then as one dimension increases the other decreases. If we had ended up with a negative covariance, it would mean the opposite: as the number of hours of study increased, the final mark decreased.

In the last case, if the covariance is zero, it indicates that the two dimensions are uncorrelated: there is no linear relationship between them. (Zero covariance alone does not guarantee that they are fully independent.)
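A short sketch of the covariance calculation for the hours/marks example follows. The five (H, M) pairs are made-up illustration data, not measurements from the report, and Python is used only for brevity (the project itself is written in Java).

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    # Sum of products of paired deviations from the two means, divided by n - 1.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: hours studied (H) and mark received (M) for five students.
hours = [9, 15, 25, 14, 10]
marks = [39, 56, 93, 61, 50]

cov_hm = covariance(hours, marks)
print(cov_hm)                      # positive: marks rise with hours studied
print(covariance(hours, hours))    # covariance of H with itself is the variance of H
```

Only the sign of `cov_hm` is interpreted here; its magnitude depends on the units of both dimensions.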

c. The covariance Matrix

A useful way to get all the possible covariance values between all the different dimensions is to calculate them all and put them in a matrix. As an example, we will make up the covariance matrix for an imaginary 3-dimensional data set, using the usual dimensions x, y and z. The covariance matrix then has 3 rows and 3 columns, and the values are:

$$ C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix} $$

Points to note: down the main diagonal, the covariance value is between one of the dimensions and itself, so these entries are the variances of the dimensions. The other point is that since cov(a,b) = cov(b,a), the matrix is symmetrical about the main diagonal.
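The two properties noted above (variances on the diagonal, symmetry) can be seen in a small sketch. The x, y, z values are invented for illustration, and Python is used only for brevity (the project itself is written in Java).

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def covariance_matrix(dimensions):
    # Entry (i, j) is the covariance between dimension i and dimension j.
    return [[covariance(a, b) for b in dimensions] for a in dimensions]

# Hypothetical 3-dimensional data set: four observations of x, y and z.
x = [1.0, 2.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 6.0]
z = [9.0, 7.0, 4.0, 0.0]

matrix = covariance_matrix([x, y, z])
for row in matrix:
    print(row)
```

For real image data this matrix becomes very large, which is why the PCA section works with a smaller matrix instead of building it directly.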

d. Eigenvectors and Eigenvalues

If we multiply a square matrix by a vector, we get another vector that is transformed from its original position. It is from the nature of this transformation that eigenvectors arise. Imagine a transformation matrix that, when multiplied on the left, reflects vectors in the line y=x. Then a vector that lies on the line y=x is its own reflection. This vector (and all multiples of it, because it doesn't matter how long the vector is) is an eigenvector of that transformation matrix. Eigenvectors can only be found for square matrices, and not every square matrix has (real) eigenvectors. An n x n matrix that does have eigenvectors can have up to n linearly independent ones; a symmetric n x n matrix, such as a covariance matrix, always has n. Another property of eigenvectors is that even if we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result. This is because scaling a vector only makes it longer; it does not change its direction.

Lastly, all the eigenvectors of a symmetric matrix, such as a covariance matrix, can be chosen to be perpendicular, i.e. at right angles to each other, no matter how many dimensions there are. Another word for perpendicular, in math talk, is orthogonal. This is important because it means that we can express the data in terms of these perpendicular eigenvectors, instead of expressing it in terms of the x and y axes. Every eigenvector has a value associated with it, which is called its eigenvalue. Principal eigenvectors are those that have the highest eigenvalues associated with them.
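The reflection example can be verified directly. The sketch below uses the 2 x 2 matrix that reflects vectors in the line y=x; the helper names are illustrative, and Python is used only for brevity (the project itself is written in Java).

```python
def mat_vec(matrix, vector):
    # Multiply a matrix (list of rows) by a column vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_eigenvector(matrix, vector, eigenvalue):
    # v is an eigenvector with eigenvalue k when (matrix * v) equals k * v.
    return mat_vec(matrix, vector) == [eigenvalue * v for v in vector]

reflect = [[0, 1],
           [1, 0]]   # swaps the x and y coordinates: reflection in the line y = x

print(is_eigenvector(reflect, [1, 1], 1))    # a vector on the line is unchanged
print(is_eigenvector(reflect, [5, 5], 1))    # scaling it does not change that
print(is_eigenvector(reflect, [1, -1], -1))  # a vector across the line is flipped
print(dot([1, 1], [1, -1]))                  # the two eigenvectors are orthogonal
```

The matrix is symmetric, so its two eigenvectors are perpendicular, matching the property described above.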

5.2 PCA Algorithm

a. Eigen faces Approach

The approach is to extract the relevant information in a face image [the principal components] and encode that information in a suitable data structure. For recognition, we take the sample image, encode it in the same way and compare it with the set of encoded images. In mathematical terms, we want to find the eigenvectors and eigenvalues of a covariance matrix of images, where one image is just a single point in a high-dimensional space of n * n dimensions, n * n being the dimensions of an image. There can be many eigenvectors for a covariance matrix, but very few of them are the principal ones. Each eigenvector accounts for a different amount of the variation among the face images, but we are only interested in the principal eigenvectors, because these account for substantial variations among a bunch of images. They show the most significant relationships between the data dimensions.

The eigenvectors with the highest eigenvalues are the principal components of the image set. We may lose some information if we ignore the components of lesser significance, but if their eigenvalues are small we won't lose much. Using this set of eigenvectors we can construct eigenfaces.

b. Finding EigenFaces

(1) Collect a bunch [say 15] of sample face images. The dimensions of all images should be the same. An image can be stored in an array of n*n dimensions [\Gamma_i], which can be considered as an image vector. The training set is then

$$ \{ \Gamma_i \mid i = 1, \ldots, M \} $$

where M is the number of images.

(2) Find the average image of the bunch of images:

$$ \Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i $$

(3) Find the deviated images [avg - img1, avg - img2, ..., avg - imgM]:

$$ \Phi_i = \Psi - \Gamma_i $$

(4) Calculate the covariance matrix:

$$ C = A A^T $$

where A = [\Phi_1 \Phi_2 ... \Phi_M] is the difference image matrix, with one deviated image per column.

The problem with this approach is that we may not be able to complete this operation for a bunch of images, because the covariance matrix will be huge. For example, for images of dimension 256 * 256, the covariance matrix will consist of [256 * 256] rows and the same number of columns, i.e. 65,536 by 65,536 entries. It is very hard, or may be practically impossible, to store that matrix, and computing it requires considerable computational resources.

To solve this problem we can first compute the much smaller M by M matrix L, where A is the difference image matrix (one deviated image per column) and M is the number of images:

$$ L = A^T A $$

and then find the eigenvectors [v] related to it:

$$ L v_l = \mu_l v_l $$

The eigenvectors of the covariance matrix C = A A^T can then be found by

$$ u_l = A v_l $$

where u_l are the eigenvectors of C (see http://individual.utoronto.ca/rav/FR/cov.htm). This works because multiplying both sides of L v_l = \mu_l v_l by A gives A A^T (A v_l) = \mu_l (A v_l).

(5) Using these eigenvectors, we can construct eigenfaces. We are interested in the eigenvectors with high eigenvalues, so eigenvectors whose eigenvalue is below a threshold can be dropped. We keep only those eigenfaces that correspond to the highest eigenvalues. This set of images is called the face space. To do this in Java, we used the Colt linear algebra package. These are the steps involved in the implementation:

i) Find L = A^T A [from 4]. Convert it into a DenseDoubleMatrix2D by using the Colt matrix class.

ii) Find the eigenvectors associated with it by using the class:

cern.colt.matrix.linalg.EigenvalueDecomposition

This will be an M by M [M = number of training images] matrix.

iii) By multiplying that with 'A' [the difference image matrix] we get the actual eigenvector matrix [U] of the covariance of 'A'. It will be X by M [where X is the total number of pixels in an image], with one eigenface per column.
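The small-matrix shortcut in steps (4) and (5) can be illustrated on a toy example. This is a sketch under stated assumptions, not the project's code: it is written in Python rather than Java, it replaces Colt's EigenvalueDecomposition with a simple power iteration that finds only the single largest eigenvector, and the three 4-pixel "images" are invented so the numbers stay readable.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(matrix, vector):
    return [dot(row, vector) for row in matrix]

def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    vector = [1.0] + [0.0] * (len(matrix) - 1)  # assumed not orthogonal to the answer
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(dot(product, product))
        vector = [p / norm for p in product]
    return dot(vector, mat_vec(matrix, vector)), vector

# Hypothetical training set: M = 3 images of X = 4 pixels each.
images = [[10, 20, 30, 40], [20, 30, 10, 40], [30, 10, 20, 10]]
M, X = len(images), len(images[0])

average = [sum(img[p] for img in images) / M for p in range(X)]
# Deviated images, average minus image as in step (3); these are the columns of A.
deviated = [[average[p] - img[p] for p in range(X)] for img in images]

# L = A^T A is only M x M: entry (i, j) is the dot product of deviated images i and j.
L = [[dot(di, dj) for dj in deviated] for di in deviated]
eigenvalue, v = dominant_eigenpair(L)

# u = A v: a weighted sum of the deviated images, then scaled to unit length.
u = [sum(v[k] * deviated[k][p] for k in range(M)) for p in range(X)]
length = math.sqrt(dot(u, u))
eigenface = [value / length for value in u]

# Check on this tiny case only: build the X x X matrix C = A A^T and confirm C u = eigenvalue * u.
C = [[sum(d[r] * d[c] for d in deviated) for c in range(X)] for r in range(X)]
residual = max(abs(cu - eigenvalue * e) for cu, e in zip(mat_vec(C, eigenface), eigenface))
print(eigenvalue, eigenface, residual)
```

The full covariance matrix C is built here only to verify the result; with real 256 * 256 images that check would be impractical, which is the reason for working with L in the first place.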

c. Classifying Face Images

The eigenfaces derived in the previous section seem adequate for describing face images under very controlled conditions, so we decided to investigate their usefulness as a tool for face recognition. Since accurate reconstruction of the image is not a requirement, a smaller number of eigenfaces is sufficient for the identification process. Identification thus becomes a pattern recognition task.

Algorithm:

1. Convert the test image into a matrix so that all pixels of the test image are stored in a matrix of 256*256 [rows] by 1 [column] size.

2. Find the weights associated with each training image. This operation can simply be performed by

Weight Matrix = TransposeOf (EigenVector-of-CovarianceMatrix) * DifferenceImageMatrix.

This matrix will be of size N by N, where N is the total number of face images. Each entry in a column then represents the weight of that particular image with respect to a particular eigenvector.

3. Project the test image into "face space". This is the same operation as defined above, but here we are projecting a single image and hence we get a matrix of size N [rows] by 1 [column]. Let's call this matrix the 'TestProjection' matrix. Its k-th entry is the weight of the test image with respect to the k-th eigenvector, for k = 1, 2, ..., N, where N is the total number of training images.

4. Find the difference between each element of the TestProjection matrix and the corresponding element of each column of the Weight matrix. We get a new matrix of N [rows] by N [columns].

5. Find the 2-norm of each column of the matrix derived above. This gives a matrix of 1 [row] by N [columns]. Find the minimum of the column values. If it is within some threshold value, return that column number. That number represents the image number: the test image is nearest to that particular image from the set of training images. If the minimum value is above the threshold value, the test image can be considered a new image which is not in our training image set. It can then be stored in our training image set by applying the same procedure [mentioned in section 5.2]. So the system is a kind of learning system which automatically increases its knowledge if it encounters an unknown image [one which it could not identify].
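The classification steps can be sketched as a nearest-neighbour search in face space. Everything concrete below is an assumption made for illustration: the two 4-pixel "eigenfaces", the images, the threshold of 5.0 and the function names are invented, and Python is used instead of the project's Java.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(eigenfaces, average, image):
    """Weights of one image: each eigenface dotted with (average - image)."""
    difference = [a - p for a, p in zip(average, image)]
    return [dot(face, difference) for face in eigenfaces]

def nearest_training_image(test_weights, training_weights, threshold):
    """Return (index, distance) of the closest training image, or (None, distance)."""
    distances = [
        math.sqrt(sum((t - w) ** 2 for t, w in zip(test_weights, weights)))
        for weights in training_weights
    ]
    best = min(range(len(distances)), key=lambda i: distances[i])
    if distances[best] <= threshold:
        return best, distances[best]
    return None, distances[best]  # treated as a new, unknown face

# Hypothetical face space: two orthonormal "eigenfaces" over 4-pixel images.
eigenfaces = [[1, 0, 0, 0], [0, 1, 0, 0]]
training = [[10, 20, 30, 40], [20, 30, 10, 40], [30, 10, 20, 10]]
average = [20, 20, 20, 30]

training_weights = [project(eigenfaces, average, img) for img in training]

known = project(eigenfaces, average, [11, 21, 30, 40])    # close to training image 0
unknown = project(eigenfaces, average, [20, 20, 20, 30])  # far from every training image

print(nearest_training_image(known, training_weights, threshold=5.0))
print(nearest_training_image(unknown, training_weights, threshold=5.0))
```

Choosing the threshold is the delicate part in practice: too small and registered users are rejected, too large and unknown faces are matched to the nearest registered one.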

6. Testing

Introduction

Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. The increasing visibility of software as a system element and the attendant costs associated with software failure are motivating forces for well-planned, thorough testing. It is not unusual for a software development organization to spend between 30 and 40 percent of total project effort on testing. The system testing strategy for this system integrates test case design techniques into a well-planned series of steps that result in the successful construction of the software. It also provides a road map for the developer, the quality assurance organization and the customer: a road map that describes the steps to be conducted as part of testing, when these steps are planned and then undertaken, and how much effort, time and resources will be required.

The test provisions are as follows.

System testing

Software Testing: Once the coding is completed according to the requirements, we have to test the quality of the software. Although the purpose of testing is to uncover errors in the software, it also demonstrates that software functions appear to be working as per the specifications and that performance requirements appear to have been met. In addition, data collected as testing is conducted provide a good indication of software reliability and some indication of software quality as a whole. To assure the software quality we conduct both White Box Testing and Black Box Testing.

White Box Testing:

White Box Testing is a test case design method that uses the control structure of the procedural design to derive test cases. As we are using a non-procedural language, there is very little scope for White Box Testing. Wherever it was necessary, the control structures were tested, and all of them passed with very few errors.

Black Box Testing:

Black Box Testing focuses on the functional requirements of the software. It enables us to derive sets of input conditions that will fully exercise all functional requirements of a program. Black Box Testing finds almost all errors: it finds interface errors, errors in accessing the database and some performance errors. In Black Box Testing we mainly use two techniques: Equivalence Partitioning and Boundary Value Analysis.

Equivalence Partitions:

In this method we divide the input domain of a program into classes of data from which test cases are derived. An equivalence class represents a set of valid or invalid states for an input condition, which may be a specific value, a range, a set of related values or a Boolean condition.

The equivalence classes are:

• Input condition requires a specific value: two classes, specific or non-specific.

• Input condition requires a range: two classes, in range or out of range.

• Input condition specifies a member of a set: two classes, belongs to the set or does not belong to the set.

• Input condition is Boolean: two classes, valid or invalid Boolean condition.

Boundary Values Analysis:

A number of errors usually occur at the boundaries of the input domain. In this technique a selection of test cases is exercised using boundary values, i.e. values around the boundaries. Using the above two techniques, we eliminated almost all errors from the software and checked numerous test values for each and every input. The results were satisfactory.

Flow of Testing

System testing is designed to uncover weaknesses that were not detected in the earlier tests. The total system is tested for recovery and fallback after various major failures to ensure that no data are lost. An acceptance test is done to check the validity and reliability of the system. The philosophy behind the testing is to find errors in the project.

Many test cases were designed with this in mind. The flow of testing is as follows.

Code Testing

This strategy examines the code of the program. At this stage only the syntax of the code is tested, and syntax errors are corrected to ensure that the code is free of them.

Unit Testing:

The first level of testing is called unit testing. Here different modules are tested against the specifications produced during the design of the modules. Unit testing is done to test the working of individual modules with test oracles. It comprises a set of tests performed by an individual programmer prior to integration of the units into a larger system. A program unit is small enough that the programmer who developed it can test it in great detail. Unit testing focuses first on the modules to locate errors. These errors are verified and corrected so that the unit fits properly into the project.

System Testing

The next level of testing is system testing and acceptance testing. This testing is done to check whether the system meets its requirements and to examine the external behavior of the system. System testing involves two kinds of activities:

Integration testing

Acceptance testing

Integration Testing

The next level of testing is called Integration Testing. Here many tested modules are combined into subsystems, which are then tested. Test case data is prepared to check the control flow of all the modules and to cover as many inputs to the program as possible. Situations such as how the modules behave when no data is entered in a text box are also tested. This testing strategy dictates the order in which modules must be available, and exerts a strong influence on the order in which the modules must be written, debugged and unit tested. In integration testing, all the modules / units on which unit testing has been performed are integrated together and tested.

Acceptance Testing:

This testing is performed finally by user to demonstrate that the implemented system satisfies its

requirements. The user gives various inputs to get required outputs.

Specification Testing:

Specification testing is done to check whether the program does what it should do and how it behaves under various conditions. Combinations of conditions are submitted for processing in the system, and it is checked whether any overlaps occur during the processing.

Testing Objectives:

The following are the testing objectives:

• Testing is a process of executing a program with the intent of finding an error.

• A good test case is one that has a high probability of finding an as yet undiscovered error.

• A successful test is one that uncovers an as yet undiscovered error.

The above objectives imply a dramatic change in viewpoint. They run counter to the commonly held view that a successful test is one in which no errors are found. Our objective is to design tests that systematically uncover different classes of errors and do so with a minimum amount of time and effort. If testing is conducted successfully, it will uncover errors in the software. As a secondary benefit, testing demonstrates that software functions appear to be working according to specification and that performance requirements appear to have been met. In addition, data collected as testing is conducted provide a good indication of software reliability. Testing cannot show the absence of defects; it can only show that software errors are present. It is important to keep this statement in mind as testing is being conducted.

Testing principles:

Before applying methods to design effective test cases, a software engineer must

understand the basic principles that guide software testing.

• All tests should be traceable to customer requirements.

• Tests should be planned long before testing begins.

• Testing should begin “in the small” and progress towards testing “in the large”.

• Exhaustive testing is not possible.

Test Plan:

A test plan is a document that contains a complete set of test cases for a system, along with other information about the testing process. The test plan should be written long before the testing starts.

The test plan identifies:

1. A task set to be applied as testing commences.

2. The work products to be produced as each testing task is executed.

3. The manner in which the results of testing are evaluated, recorded and reused when regression testing is conducted.

In some cases the test plan is integrated with the project plan. In others the test plan is a separate document. The test report is a record of the testing performed. It enables the acquirer to assess the testing and its results.

Test cases

Test cases for login page

Sl no | Task                                      | Expected result           | Obtained result | Remarks
1     | Using valid username and password (image) | Successful authentication | As expected     | Success
2     | Using invalid username                    | Authentication failed     | As expected     | Invalid user name
3     | Using invalid password (image)            | Authentication failed     | As expected     | Username and password are not correct
4     | Without giving username and password      | Authentication failed     | As expected     | Please enter user name and password
5     | Username without password                 | Authentication failed     | As expected     | Password cannot be empty
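The five login test cases correspond to five equivalence classes of input, which can be expressed as a small sketch. The function `login_remark`, its parameters and the order in which the checks are made are assumptions for illustration; only the remark strings come from the test cases. Python is used for brevity (the project itself is written in Java).

```python
def login_remark(username, image_captured, registered_users, image_matches):
    """Map one login attempt to the remark for its equivalence class (hypothetical helper)."""
    if not username and not image_captured:
        return "Please enter user name and password"
    if not image_captured:
        return "Password cannot be empty"
    if username not in registered_users:
        return "Invalid user name"
    if not image_matches:
        return "Username and password are not correct"
    return "Success"

registered = {"alice"}  # hypothetical registered user

# One representative input per equivalence class, in the order of the test cases.
print(login_remark("alice", True, registered, True))
print(login_remark("bob", True, registered, False))
print(login_remark("alice", True, registered, False))
print(login_remark("", False, registered, False))
print(login_remark("alice", False, registered, False))
```

One representative per class is enough under equivalence partitioning, because every other input in the same class is expected to produce the same remark.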

Test cases for registration page

Sl no | Task                              | Expected result              | Obtained result | Remarks
1     | Capture four images and register  | Registration success         | As expected     | Success
2     | Capture three images and register | Should not allow to register | As expected     | Register button is disabled if less than four images are captured.
3     | Without giving port number        | Connection failed            | As expected     | Please specify port number
4     | Without selecting IP              | Connection failed            | As expected     | IP has to be selected
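Registration test cases 1 and 2 sit on either side of a boundary (three versus four captured images), so they are natural boundary value tests. The sketch below assumes a hypothetical helper `register_button_enabled`; the behaviour for more than four images is not stated in the test cases and is an assumption here. Python is used for brevity (the project itself is written in Java).

```python
REQUIRED_IMAGES = 4  # four captured images are needed before registration is allowed

def register_button_enabled(captured_count):
    # Assumption: the button stays enabled if more than four images were captured.
    return captured_count >= REQUIRED_IMAGES

# Boundary values: just below the boundary, on it, and the empty case.
for count in (0, REQUIRED_IMAGES - 1, REQUIRED_IMAGES):
    print(count, register_button_enabled(count))
```

Testing exactly at three and four images is what catches an off-by-one mistake such as writing `>` instead of `>=`.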

7. Snapshots

Layout

The layout contains two sections. The left section is used for placing the web camera window. The right section is used to show the images captured for login and registration.

Web camera Window

This is a separate window which is created using a separate thread.

Register Window

The four images captured during registration are shown.

Login Screen

The image captured for login is shown in this window. The success message is shown as below.

Login screen (login failure).

Login Screen and Web camera

The web camera and the image captured during login are as shown below.

8. Conclusion

1. The user is authenticated not only with the username but also with the image of the user.

2. For the processing, some of the lines on the face are used so that the image can be identified from different angles.

3. The image processing is good enough to provide security for the website.

9. Future Enhancements

1. The project can be enhanced for processing 3D images.

2. Authentication can be implemented by capturing video clip of a person.

3. This can also be used to process the signatures of a person for providing the authentication.

4. We can also use this in real-time applications.

5. Authentication can be embedded into a web application, which would be an added advantage for providing login for websites.

10. References

Websites

http://www.imageprocessingplace.com/

http://www.graphicsmagick.org/

http://www.imagemagick.org/

http://www.mediacy.com/

Books

Digital Image Processing Projects - Rs tech Technology

Image processing by Micheal Pedilla
