Pattern Recognition Letters 24 (2003) 2153–2157
Face recognition using independent component analysis and support vector machines
O. Déniz *, M. Castrillón, M. Hernández
Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (IUSIANI), Universidad de Las Palmas de Gran Canaria, Edificio Central del Parque Científico-Tecnológico, Campus Universitario de Tafira, 35017 Las Palmas, Spain
Abstract
Support vector machines (SVM) and independent component analysis (ICA) are two powerful and relatively recent
techniques. SVMs are classifiers which have demonstrated high generalization capabilities in many different tasks,
including the object recognition problem. ICA is a feature extraction technique which can be considered a general-
ization of principal component analysis (PCA). ICA has been mainly used on the problem of blind signal separation. In
this paper we combine these two techniques for the face recognition problem. Experiments were made on two different
face databases, achieving very high recognition rates. As the results using the combination PCA/SVM were not very far
from those obtained with ICA/SVM, our experiments suggest that SVMs are relatively insensitive to the representation
space. Thus as the training time for ICA is much larger than that of PCA, this result indicates that the best practical
combination is PCA with SVM.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Face recognition; Support vector machines; Independent component analysis
1. Introduction and motivation
Recognition is a fundamental ability of any vi-
sual system. It refers to the process of labelling an
object as belonging to a certain object class. The
human face is no doubt one of the most important
objects that our visual system has to deal with.
Work partially funded by DGUI-Gobierno Autónomo de Canarias PI2000/042 and ULPGC UNI2002/16 research projects.
* Corresponding author.
E-mail address: [email protected] (O. Déniz).
doi:10.1016/S0167-8655(03)00081-3
Humans are able to recognize others in a wide
domain of circumstances, and this capability is
crucial for human–human interaction.
The face recognition problem has attracted
much research effort in recent years. Although it
has proven to be a very difficult task even for
frontal faces, certain algorithms can perform well under constrained conditions. The wide array of
possible applications of face recognition has led
to a continuous search for more precise algo-
rithms and techniques. Some authors have put
emphasis on feature extraction techniques, which
lead to a representation space, while others de-
scribed improvements in the classification stage.
The most prominent work was (Turk and Pent-
land, 1991), which introduced the eigenfaces
method, widely used as a reference. Many other
techniques and systems have been proposed, both
for recognition and verification, see (Fromherz,
1998; Chellappa et al., 1995; Samal and Iyengar, 1992) for a detailed review. Face recognition
internet resources can be found at Kruizinga
(1998). Also, recent systems are beginning to
make use of video sequences instead of using only
one shot. The performance of these systems,
however, is bound by the error rates obtained
with individual images.
Recent advances in statistical learning theory, and in particular the introduction of support vector
machines (Vapnik, 1995) (SVM) have made it
possible to obtain very high accuracies for the ob-
ject recognition problem. Independent component
analysis (ICA) (Bell and Sejnowski, 1995) is also a
relatively recent technique which has been mainly
applied to blind signal separation, though it has been
successfully applied to the face recognition problem too. ICA is a feature extraction technique, while
SVMs are classifiers. Both techniques have
shown promising results in diverse applications.
Prompted by that fact, in this paper we combine
them in order to test whether lower error rates can
be achieved for the face recognition problem.
This paper is organized as follows. In the next
two sections we give a brief introduction to ICA and SVM. In Section 4 we describe the experi-
ments carried out. Section 5 concludes by pre-
senting future directions of research.
Fig. 1. Some original (left), PCA (center) and ICA (right) basis images for the Yale Face Database (see Section 4).
2. Independent component analysis
ICA is a technique for extracting statistically
independent variables from a mixture of them
(Bell and Sejnowski, 1995). ICA has been successfully applied to many different problems such
as MEG and EEG data analysis (Vigário, 1997; Vigário et al., 1998; Makeig et al., 1996), finding
hidden factors in financial data (Kiviluoto and
Oja, 1998; Back and Weigend, 1998) and face
recognition (see Hyvärinen and Oja, 1999 for an
introduction and applications).
The ICA technique aims to find a linear transform for the input data using a basis as statistically
independent as possible. Thus, ICA can be con-
sidered a generalization of principal component
analysis (PCA). PCA tries to obtain a representa-
tion of the inputs based on uncorrelated variables,
whereas ICA provides a representation based on
statistically independent variables. The basis im-
ages obtained with ICA are more local than those obtained with PCA, which suggests that they can
lead to more precise representations. Fig. 1 shows
the difference between PCA and ICA basis images.
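As an illustration of this difference, both bases can be extracted from a set of flattened face vectors; the sketch below uses scikit-learn's PCA and FastICA (an assumption; the paper used the FastICA MATLAB package), with random data standing in for real face images.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
faces = rng.random((100, 32 * 32))   # stand-in for 100 flattened 32x32 face images

# PCA basis: orthogonal, global basis images ordered by explained variance
pca = PCA(n_components=20).fit(faces)
pca_basis = pca.components_          # shape (20, 1024), rows are basis images

# ICA basis: statistically independent, typically more local basis images
ica = FastICA(n_components=20, random_state=0, max_iter=1000).fit(faces)
ica_basis = ica.components_          # shape (20, 1024), rows are basis images
```

Each row of `pca_basis` or `ica_basis` can be reshaped to 32 × 32 and displayed to reproduce a comparison like Fig. 1.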
In the context of face recognition, ICA has been
shown to produce better results than those obtained with PCA; see Liu and Wechsler (1999) for
a comparison. As opposed to PCA, ICA does not
provide an intrinsic order for the representation coefficients of the face images. In (Bartlett and
Sejnowski, 1997) the best results were obtained
with an order based on the ratio of between-class to
within-class variance for each coefficient:
sis images for the Yale Face Database (see Section 4).
O. D�eeniz et al. / Pattern Recognition Letters 24 (2003) 2153–2157 2155
r = \frac{\sigma_{between}}{\sigma_{within}} = \frac{\sum_j (\bar{x}_j - \bar{x})^2}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2},    (1)

where \sigma_{between} is the variance of the j class means and \sigma_{within} is the sum of the variances within each class.
3. Support vector machines
We only give here a brief presentation of the
basic concepts needed. The reader is referred to
Burges (1998) for a more detailed introduction, and to the SVM application list (2003) for a list of applications of SVMs. SVMs are based on structural
risk minimization. The risk, i.e. the expectation of the
test error for the trained machine, is
represented as R(\alpha), \alpha being the parameters of the
trained machine. Let l be the number of training
patterns and 0 \le \eta \le 1. Then, with probability
1 - \eta the following bound on the expected risk holds (Vapnik, 1995):
R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h(\log(2l/h) + 1) - \log(\eta/4)}{l}},    (2)

R_{emp}(\alpha) being the empirical risk, which is the mean error on the training set, and h the VC dimension. SVMs try to minimize the second term of (2),
for a fixed empirical risk.

For the linearly separable case, SVM provides
the optimal hyperplane that separates the training
patterns. The optimal hyperplane maximizes the
sum of the distances to the closest positive and
negative training patterns. This sum is called the margin. In order to weight the cost of misclassification
an additional parameter is introduced. For the non-
linear case, the training patterns are mapped onto a high-dimensional space using a kernel function. In
this space the decision boundary is linear. The most
commonly used kernel functions are polynomial,
exponential and sigmoidal functions.
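The concepts above can be sketched with a small two-class example; the code below uses scikit-learn's SVC (an assumption, not the implementation used in the paper), with a degree-2 polynomial kernel and a large C playing the role of the misclassification-weight parameter.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly non-trivial but separable classes in 2-D
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])

# Polynomial kernel of degree 2; large C penalizes misclassification heavily
clf = SVC(kernel="poly", degree=2, C=1000.0)
clf.fit(X, y)

# New points near each cluster should recover the corresponding class
print(clf.predict(np.array([[0.1, 0.0], [1.0, 0.8]])))
```

The `kernel` argument can be swapped for `"rbf"` (the Gaussian case in the experiments of Section 4) or `"sigmoid"` without changing the rest of the code.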
4. Experiments
In order to establish the performance of ICA/
SVM, in comparison with other schemes, we car-
ried out experiments on two independent face
databases, the Yale Face Database (Belhumeur
et al., 1997), and a randomly chosen subset of the
AR face set (Martinez and Benavente, 1998). The
Yale Face Database contains 165 images (11 per
individual), with changes in facial expression, occlusion, and illumination conditions. From the AR
Face Database we used 300 face images (12 per
individual), with changes in facial expression and
illumination conditions, and images taken in two
sessions two weeks apart.
All the results were obtained using 2-fold (AR)
and 5-fold (Yale) cross-validation and varying the
number of coefficients used in the range 1–N, N being the number of training images. ICA coefficients were ordered according to the ratio r explained in Section 2. All the images were previ-
ously converted to 256 gray levels and histogram
equalization was applied. The background in the
Yale images was manually removed with a rect-
angle. For the images of the AR database a more
plausible normalization was accomplished. Besides histogram equalization, we also performed geo-
metric normalization. The images were firstly
cropped with an ellipse, thus removing hair and
shoulders. The eyes and mouth were located
manually and then the images were shifted both in
x and y and warped in order to have the eyes and
mouth in the same place for all the images.
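The photometric part of this preprocessing (256 grey levels plus histogram equalization) can be sketched in plain NumPy; the paper does not name a library, so the implementation below is an assumption.

```python
import numpy as np

def equalize(img):
    """Histogram-equalize an 8-bit greyscale image (values 0-255)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                       # first non-zero CDF value
    # Map each grey level so the output histogram is roughly flat
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

# Low-contrast synthetic image using only grey levels 0-127
img = np.tile(np.arange(0, 128, dtype=np.uint8), (64, 2))
out = equalize(img)
print(img.max(), out.max())  # 127 255: equalization stretches the range
```

Geometric normalization (ellipse cropping and eye/mouth alignment) would follow this step and depends on the manually located landmarks.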
The ICA algorithm we used in our experiments was FastICA (The FastICA MATLAB package,
2003). FastICA provides rapid convergence and
estimates the independent components by maxi-
mizing a measure of independence among the es-
timated original components. The SVM classifier,
as introduced in Section 3, is a 2-class classifier.
Therefore, we had to adapt it to our multiclass
problem. There are two options: using N SVMs (N being the number of classes), separating one class
from the rest, or using NðN � 1Þ=2 SVMs, one for
each pair of classes. As the accuracies between
these two approaches are almost the same
(Chapelle et al., 1999), we chose the first option,
which is less complex. The SVM algorithm used in
the experiments presented problems of conver-
gence when the input coefficients had a relatively high magnitude. This may be alleviated by dividing the input coefficients by a constant value
Fig. 2. Cumulative rank error and classification error as a function of the number of coefficients for the AR face set.
Table 1
Recognition rates obtained for Yale and AR images, using the Nearest Mean Classifier (NMC) and SVM (%)

Face set   Feature       NMC (Euclidean   SVM (p = 1)   SVM (p = 2)   SVM (p = 3)   SVM (Gaussian)
           extraction    distance)
Yale       PCA           92.73            98.79         98.79         98.79         99.39
           ICA           95.76            99.39         99.39         99.39         99.39
AR         PCA           48.33            92            91.67         91            92.67
           ICA           70.33            93.33         93.33         92.67         94

SVM was used only with polynomial (degrees p = 1, 2 and 3) and Gaussian kernels (varying the kernel parameter σ). These kernels were
chosen for the experiments because they showed a better numeric behavior.
(Thorsten Joachims, personal communication).

The results obtained are shown in Table 1. For
SVM, a value of 1000 was used as misclassification
weight. The last column represents the best results
obtained varying σ. Note that the combination
PCA-NMC corresponds to the eigenfaces method.
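The NMC baseline in Table 1 assigns each test vector to the class whose training mean is closest in Euclidean distance; a minimal sketch (names and data illustrative) follows.

```python
import numpy as np

def nmc_fit(X, y):
    """Return the class labels and the per-class mean vectors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nmc_predict(X, classes, means):
    """Assign each row of X to the class with the nearest mean."""
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Toy data: two clusters of representation coefficients
X = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]])
y = np.array([0, 0, 1, 1])
classes, means = nmc_fit(X, y)
print(nmc_predict(np.array([[0.1, 0.1], [3.0, 2.9]]), classes, means))  # [0 1]
```

Applied to PCA coefficients this is exactly the eigenfaces pipeline mentioned above.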
For the Yale Database, there is no clear differ-
ence between ICA/SVM and PCA/SVM. We pos-
tulate that this is due to the fact that the classification error is too close to zero, which does
not allow for differences to be seen clearly. As for
the AR images, although the best absolute results
are obtained with ICA and SVM, the performance
is not far from that obtained with PCA and SVM.
This is consistent with the results reported in
(Jonsson et al., 1999, 2002), which suggested that
SVM is relatively insensitive to the representation space. Note, however, that here we do not claim
that SVMs are insensitive to the representation space in general, but only to the choice between PCA and ICA. Fig. 2 represents the cumulative rank
error and classification error as a function of the
number of coefficients for the AR face set, which
confirms our hypothesis.
5. Conclusions and future work
We obtained experimental results showing that
very high recognition rates can be achieved using
ICA/SVM, although PCA/SVM also gave good
results. Thus, evidence was given for the fact that
SVMs are relatively insensitive to the representa-
tion space, which is in accordance with the results
reported in (Jonsson et al., 1999, 2002). This gives more importance to the trade-off between cost and performance, as ICA is not as fast a feature extraction technique as PCA. SVM itself is a slow
technique (in the learning stage), which adds to
this.
In future work we plan to use powerful tech-
niques such as SVM in the dynamic face recogni-
tion problem, the objective being the recognition
of face sequences. The robustness of a face recognition system can be improved using temporal
context and the many images that can be easily
obtained with standard video cameras.
Acknowledgements
The authors would like to thank Dr. Marian Stewart Bartlett for her interesting comments on
ICA. Thanks are also due to Prof. Dr. Robert
Duin, David Tax, Thorsten Joachims and Prof.
José Javier Lorenzo Navarro for valuable advice
during the writing of this paper. Author supported
by graduate grant D260/54066308-R of Universi-
dad de Las Palmas de Gran Canaria.
References
Back, A.D., Weigend, A.S., 1998. A first application of
independent component analysis to extracting structure
from stock returns. Int. J. Neural Systems 8 (4), 473–
484.
Bartlett, M.S., Sejnowski, T.J., 1997. Independent components
of face images: a representation for face recognition. In:
Proc. of the 4th Annual Joint Symposium on Neural
Computation, Pasadena, CA.
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997.
Eigenfaces vs. fisherfaces: recognition using class specific
linear projection. IEEE Trans. Pattern Anal. Machine
Intell. 19 (7), 711–720.
Bell, A., Sejnowski, T., 1995. An information maximization
approach to blind separation and blind deconvolution.
Neural Comput. 7, 1129–1159.
Burges, C.J.C., 1998. A tutorial on support vector machines for
pattern recognition. Data Min. Knowl. Disc. 2 (2), 121–
167.
Chapelle, O., Haffner, P., Vapnik, V., 1999. SVMs for histo-
gram-based image classification. IEEE Trans. Neural Net-
works 10 (5), 1055–1065.
Chellappa, R., Wilson, C., Sirohey, S., 1995. Human and
machine recognition of faces: a survey. Proc. IEEE 83 (5),
705–740.
Fromherz, T., 1998. Face recognition: a summary of 1995–
1997. Technical Report TR-98-027, International Computer
Science Institute, Berkeley.
Hyvärinen, A., Oja, E., 1999. Independent Component Anal-
ysis: A Tutorial. Available from <http://www.cis.
hut.fi/~aapo/papers/IJCNN99_tutorialweb/>.
Jonsson, K., Kittler, J., Li, Y.P., Matas, J., 1999. Support
vector machines for face authentication. In: Proc. of the
British Machine Vision Conference, pp. 543–553.
Jonsson, K., Kittler, J., Li, Y.P., Matas, J., 2002. Support
vector machines for face authentication. Image Vision
Comput. 20, 269–275.
Kiviluoto, K., Oja, E., 1998. Independent component analysis
for parallel financial time series. In: Proc. ICONIP'98, vol.
2, Tokyo, Japan, pp. 895–898.
Kruizinga, P., 1998. Face recognition home page.
Liu, C., Wechsler, H., 1999. Comparative assessment of
independent component analysis (ICA) for face recognition.
In: Second International Conference on Audio and Video-
based Biometric Person Authentication.
Makeig, S., Bell, A.J., Jung, T.P., Sejnowski, T.J., 1996.
Independent component analysis for electroencephalographic
data. In: Advances in Neural Information Processing
Systems 8. MIT Press, pp. 145–151.
Martinez, A.M., Benavente, R., 1998. The AR face database.
Technical Report 24, CVC.
Samal, A., Iyengar, P.A., 1992. Automatic recognition and
analysis of human faces and facial expressions: a survey.
Pattern Recognition 25 (1).
SVM application list. Available from <http://www.clopinet.com/isabelle/Projects/SVM/applist.html>.
The FastICA MATLAB package. Available from <http://www.cis.hut.fi/projects/ica/fastica/>.
Turk, M.A., Pentland, A.P., 1991. Eigenfaces for recognition.
J. Cognitive Neurosci. 3 (1), 71–86. Available from <ftp://
whitechapel.media.mit.edu/pub/images/>.
Vapnik, V., 1995. The Nature of Statistical Learning Theory.
Springer, NY.
Vigário, R., 1997. Extraction of ocular artifacts from EEG
using independent component analysis. Electroenceph. Clin.
Neurophysiol. 103 (3), 395–404.
Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R., Oja, E.,
1998. Independent component analysis for identification of
artifacts in magnetoencephalographic recordings. In: Ad-
vances in Neural Information Processing Systems 10. MIT
Press, pp. 229–235.