Pattern Recognition Letters 24 (2003) 2153–2157
Face recognition using independent component analysis and support vector machines
O. Déniz *, M. Castrillón, M. Hernández
Instituto Universitario de Sistemas Inteligentes y Aplicaciones Numéricas en Ingeniería (IUSIANI), Universidad de Las Palmas de Gran Canaria, Edificio Central del Parque Científico-Tecnológico, Campus Universitario de Tafira, 35017 Las Palmas, Spain
Abstract
Support vector machines (SVM) and independent component analysis (ICA) are two powerful and relatively recent
techniques. SVMs are classifiers which have demonstrated high generalization capabilities in many different tasks,
including the object recognition problem. ICA is a feature extraction technique which can be considered a general-
ization of principal component analysis (PCA). ICA has been mainly used on the problem of blind signal separation. In
this paper we combine these two techniques for the face recognition problem. Experiments were made on two different
face databases, achieving very high recognition rates. As the results using the combination PCA/SVM were not very far
from those obtained with ICA/SVM, our experiments suggest that SVMs are relatively insensitive to the representation
space. Thus as the training time for ICA is much larger than that of PCA, this result indicates that the best practical
combination is PCA with SVM.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Face recognition; Support vector machines; Independent component analysis
1. Introduction and motivation
Recognition is a fundamental ability of any vi-
sual system. It refers to the process of labelling an
object as belonging to a certain object class. The
human face is no doubt one of the most important
objects that our visual system has to deal with.
Work partially funded by DGUI-Gobierno Autónomo de Canarias PI2000/042 and ULPGC UNI2002/16 research projects.
* Corresponding author.
E-mail address: [email protected] (O. Déniz).
doi:10.1016/S0167-8655(03)00081-3
Humans are able to recognize others in a wide
domain of circumstances, and this capability is
crucial for human–human interaction.
The face recognition problem has attracted
much research effort in recent years. Although it
has proven to be a very difficult task even for
frontal faces, certain algorithms can perform well under constrained conditions. The wide array of
possible applications of face recognition has led
to a continuous search for more precise algo-
rithms and techniques. Some authors have put
emphasis on feature extraction techniques, which
lead to a representation space, while others de-
scribed improvements in the classification stage.
The most prominent work was (Turk and Pent-
land, 1991), which introduced the eigenfaces
method, widely used as a reference. Many other
techniques and systems have been proposed, both
for recognition and verification, see (Fromherz,
1998; Chellappa et al., 1995; Samal and Iyengar, 1992) for a detailed review. Face recognition
internet resources can be found at Kruizinga
(1998). Also, recent systems are beginning to
make use of video sequences instead of using only
one shot. The performance of these systems,
however, is bound by the error rates obtained
with individual images.
Recent advances in statistical learning theory, and in particular the introduction of support vector
machines (Vapnik, 1995) (SVM) have made it
possible to obtain very high accuracies for the ob-
ject recognition problem. Independent component
analysis (ICA) (Bell and Sejnowski, 1995) is also a
relatively recent technique which has been mainly
applied to blind signal separation, though it has been
successfully applied to the face recognition problem too. ICA is a feature extraction technique, while
SVMs are classifiers. Both techniques have
shown promising results in diverse applications.
Prompted by that fact, in this paper we combine
them in order to test whether lower error rates can
be achieved for the face recognition problem.
This paper is organized as follows. In the next
two sections we give a brief introduction to ICA and SVM. In Section 4 we describe the experi-
ments carried out. Section 5 concludes by pre-
senting future directions of research.
Fig. 1. Some original (left), PCA (center) and ICA (right) basis images for the Yale Face Database (see Section 4).
2. Independent component analysis
ICA is a technique for extracting statistically
independent variables from a mixture of them
(Bell and Sejnowski, 1995). ICA has been successfully applied to many different problems such
as MEG and EEG data analysis (Vigário, 1997; Vigário et al., 1998; Makeig et al., 1996), finding
hidden factors in financial data (Kiviluoto and
Oja, 1998; Back and Weigend, 1998) and face
recognition (see Hyvärinen and Oja, 1999 for an
introduction and applications).
The ICA technique aims to find a linear transform for the input data using a basis as statistically
independent as possible. Thus, ICA can be con-
sidered a generalization of principal component
analysis (PCA). PCA tries to obtain a representa-
tion of the inputs based on uncorrelated variables,
whereas ICA provides a representation based on
statistically independent variables. The basis im-
ages obtained with ICA are more local than those obtained with PCA, which suggests that they can
lead to more precise representations. Fig. 1 shows
the difference between PCA and ICA basis images.
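As an illustration of this difference, both bases can be extracted from a set of flattened face vectors; the sketch below uses scikit-learn's PCA and FastICA (an assumption; the paper used the FastICA MATLAB package), with random data standing in for real face images.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
faces = rng.random((100, 32 * 32))   # stand-in for 100 flattened 32x32 face images

# PCA basis: orthogonal, global basis images ordered by explained variance
pca = PCA(n_components=20).fit(faces)
pca_basis = pca.components_          # shape (20, 1024), rows are basis images

# ICA basis: statistically independent, typically more local basis images
ica = FastICA(n_components=20, random_state=0, max_iter=1000).fit(faces)
ica_basis = ica.components_          # shape (20, 1024), rows are basis images
```

Each row of `pca_basis` or `ica_basis` can be reshaped to 32 × 32 and displayed to reproduce a comparison like Fig. 1.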
In the context of face recognition, ICA has been
shown to produce better results than those obtained with PCA; see Liu and Wechsler (1999) for
a comparison. As opposed to PCA, ICA does not
provide an intrinsic order for the representation coefficients of the face images. In (Bartlett and
Sejnowski, 1997) the best results were obtained
with an order based on the ratio of between-class to
within-class variance for each coefficient:
sis images for the Yale Face Database (see Section 4).
O. D�eeniz et al. / Pattern Recognition Letters 24 (2003) 2153–2157 2155
r = \frac{\sigma_{between}}{\sigma_{within}} = \frac{\sum_j (\bar{x}_j - \bar{x})^2}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2},    (1)

where \sigma_{between} is the variance of the j class means and \sigma_{within} is the sum of the variances within each class.
3. Support vector machines
We only give here a brief presentation of the
basic concepts needed. The reader is referred to
Burges (1998) for a more detailed introduction, and to the SVM application list (2003) for a list of applications of SVMs. SVMs are based on structural
risk minimization. The risk, i.e. the expectation of the
test error for the trained machine, is
represented as R(\alpha), \alpha being the parameters of the
trained machine. Let l be the number of training
patterns and 0 \le \eta \le 1. Then, with probability
1 - \eta the following bound on the expected risk holds (Vapnik, 1995):
R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h(\log(2l/h) + 1) - \log(\eta/4)}{l}},    (2)

R_{emp}(\alpha) being the empirical risk, which is the mean error on the training set, and h the VC dimension. SVMs try to minimize the second term of (2),
for a fixed empirical risk.

For the linearly separable case, SVM provides
the optimal hyperplane that separates the training
patterns. The optimal hyperplane maximizes the
sum of the distances to the closest positive and
negative training patterns. This sum is called the margin. In order to weight the cost of misclassification
an additional parameter is introduced. For the non-
linear case, the training patterns are mapped onto a high-dimensional space using a kernel function. In
this space the decision boundary is linear. The most
commonly used kernel functions are polynomial,
exponential and sigmoidal functions.
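The concepts above can be sketched with a small two-class example; the code below uses scikit-learn's SVC (an assumption, not the implementation used in the paper), with a degree-2 polynomial kernel and a large C playing the role of the misclassification-weight parameter.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly non-trivial but separable classes in 2-D
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([0, 0, 1, 1])

# Polynomial kernel of degree 2; large C penalizes misclassification heavily
clf = SVC(kernel="poly", degree=2, C=1000.0)
clf.fit(X, y)

# New points near each cluster should recover the corresponding class
print(clf.predict(np.array([[0.1, 0.0], [1.0, 0.8]])))
```

The `kernel` argument can be swapped for `"rbf"` (the Gaussian case in the experiments of Section 4) or `"sigmoid"` without changing the rest of the code.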
4. Experiments
In order to establish the performance of ICA/
SVM, in comparison with other schemes, we car-
ried out experiments on two independent face
databases, the Yale Face Database (Belhumeur
et al., 1997), and a randomly chosen subset of the
AR face set (Martinez and Benavente, 1998). The
Yale Face Database contains 165 images (11 per
individual), with changes in facial expression, occlusion, and illumination conditions. From the AR
Face Database we used 300 face images (12 per
individual), with changes in facial expression and
illumination conditions, and images taken in two
sessions two weeks apart.
All the results were obtained using 2-fold (AR)
and 5-fold (Yale) cross-validation and varying the
number of coefficients used in the range 1–N, N being the number of training images. ICA coefficients were ordered according to the ratio r explained in Section 2. All the images were previ-
ously converted to 256 gray levels and histogram
equalization was applied. The background in the
Yale images was manually removed with a rect-
angle. For the images of the AR database a more
plausible normalization was accomplished. Besides histogram equalization, we also performed geo-
metric normalization. The images were firstly
cropped with an ellipse, thus removing hair and
shoulders. The eyes and mouth were located
manually and then the images were shifted both in
x and y and warped in order to have the eyes and
mouth in the same place for all the images.
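The photometric part of this preprocessing (256 grey levels plus histogram equalization) can be sketched in plain NumPy; the paper does not name a library, so the implementation below is an assumption.

```python
import numpy as np

def equalize(img):
    """Histogram-equalize an 8-bit greyscale image (values 0-255)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]                       # first non-zero CDF value
    # Map each grey level so the output histogram is roughly flat
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

# Low-contrast synthetic image using only grey levels 0-127
img = np.tile(np.arange(0, 128, dtype=np.uint8), (64, 2))
out = equalize(img)
print(img.max(), out.max())  # 127 255: equalization stretches the range
```

Geometric normalization (ellipse cropping and eye/mouth alignment) would follow this step and depends on the manually located landmarks.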
The ICA algorithm we used in our experiments was FastICA (The FastICA MATLAB package,
2003). FastICA provides rapid convergence and
estimates the independent components by maxi-
mizing a measure of independence among the es-
timated original components. The SVM classifier,
as introduced in Section 3, is a 2-class classifier.
Therefore, we had to adapt it to our multiclass
problem. There are two options: using N SVMs (N being the number of classes), separating one class
from the rest, or using NðN � 1Þ=2 SVMs, one for
each pair of classes. As the accuracies between
these two approaches are almost the same
(Chapelle et al., 1999), we chose the first option,
which is less complex. The SVM algorithm used in
the experiments presented problems of conver-
gence when the input coefficients had a relatively high magnitude. This may be alleviated by dividing the input coefficients by a constant value
Fig. 2. Cumulative rank error and classification error as a function of the number of coefficients for the AR face set.
Table 1
Recognition rates obtained for Yale and AR images, using the Nearest Mean Classifier (NMC) and SVM (%)

Face set   Feature       NMC (Euclidean   SVM (p = 1)   SVM (p = 2)   SVM (p = 3)   SVM (Gaussian)
           extraction    distance)
Yale       PCA           92.73            98.79         98.79         98.79         99.39
           ICA           95.76            99.39         99.39         99.39         99.39
AR         PCA           48.33            92            91.67         91            92.67
           ICA           70.33            93.33         93.33         92.67         94

SVM was used only with polynomial (degrees p = 1, 2 and 3) and Gaussian kernels (varying the kernel parameter σ). These kernels were
chosen for the experiments because they showed a better numeric behavior.
(Thorsten Joachims, personal communication).

The results obtained are shown in Table 1. For
SVM, a value of 1000 was used as misclassification
weight. The last column represents the best results
obtained varying σ. Note that the combination
PCA-NMC corresponds to the eigenfaces method.
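The NMC baseline in Table 1 assigns each test vector to the class whose training mean is closest in Euclidean distance; a minimal sketch (names and data illustrative) follows.

```python
import numpy as np

def nmc_fit(X, y):
    """Return the class labels and the per-class mean vectors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nmc_predict(X, classes, means):
    """Assign each row of X to the class with the nearest mean."""
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

# Toy data: two clusters of representation coefficients
X = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0], [3.2, 3.0]])
y = np.array([0, 0, 1, 1])
classes, means = nmc_fit(X, y)
print(nmc_predict(np.array([[0.1, 0.1], [3.0, 2.9]]), classes, means))  # [0 1]
```

Applied to PCA coefficients this is exactly the eigenfaces pipeline mentioned above.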
For the Yale Database, there is no clear differ-
ence between ICA/SVM and PCA/SVM. We pos-
tulate that this is due to the fact that the classification error is too close to zero, which does
not allow for differences to be seen clearly. As for
the AR images, although the best absolute results
are obtained with ICA and SVM, the performance
is not far from that obtained with PCA and SVM.
This is consistent with the results reported in
(Jonsson et al., 1999, 2002), which suggested that
SVM is relatively insensitive to the representation space. Note, however, that here we do not claim
that SVMs are insensitive to the representation space in general, but only to the choice between PCA and ICA. Fig. 2 represents the cumulative rank
error and classification error as a function of the
number of coefficients for the AR face set, which
confirms our hypothesis.
5. Conclusions and future work
We obtained experimental results showing that
very high recognition rates can be achieved using
ICA/SVM, although PCA/SVM also gave good
results. Thus, evidence was given for the fact that
SVMs are relatively insensitive to the representa-
tion space, which is in accordance with the results
reported in (Jonsson et al., 1999, 2002). This gives more importance to the trade-off between cost and performance, as ICA is not as fast a feature extraction technique as PCA. SVM itself is a slow
technique (in the learning stage), which adds to
this.
In future work we plan to use powerful tech-
niques such as SVM in the dynamic face recogni-
tion problem, the objective being the recognition
of face sequences. The robustness of a face recognition system can be improved using temporal
context and the many images that can be easily
obtained with standard video cameras.
Acknowledgements
The authors would like to thank Dr. Marian Stewart Bartlett for her interesting comments on
ICA. Thanks are also due to Prof. Dr. Robert
Duin, David Tax, Thorsten Joachims and Prof.
José Javier Lorenzo Navarro for valuable advice
during the writing of this paper. Author supported
by graduate grant D260/54066308-R of Universi-
dad de Las Palmas de Gran Canaria.
References
Back, A.D., Weigend, A.S., 1998. A first application of
independent component analysis to extracting structure
from stock returns. Int. J. Neural Systems 8 (4), 473–
484.
Bartlett, M.S., Sejnowski, T.J., 1997. Independent components
of face images: a representation for face recognition. In:
Proc. of the 4th Annual Joint Symposium on Neural
Computation, Pasadena, CA.
Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J., 1997.
Eigenfaces vs. fisherfaces: recognition using class specific
linear projection. IEEE Trans. Pattern Anal. Machine
Intell. 19 (7), 711–720.
Bell, A., Sejnowski, T., 1995. An information maximization
approach to blind separation and blind deconvolution.
Neural Comput. 7, 1129–1159.
Burges, C.J.C., 1998. A tutorial on support vector machines for
pattern recognition. Data Min. Knowl. Disc. 2 (2), 121–
167.
Chapelle, O., Haffner, P., Vapnik, V., 1999. SVMs for histo-
gram-based image classification. IEEE Trans. Neural Net-
works 10 (5), 1055–1065.
Chellappa, R., Wilson, C., Sirohey, S., 1995. Human and
machine recognition of faces: a survey. Proc. IEEE 83 (5),
705–740.
Fromherz, T., 1998. Face recognition: a summary of 1995–
1997. Technical Report TR-98-027, International Computer
Science Institute, Berkeley.
Hyvärinen, A., Oja, E., 1999. Independent Component Anal-
ysis: A Tutorial. Available from <http://www.cis.
hut.fi/~aapo/papers/IJCNN99_tutorialweb/>.
Jonsson, K., Kittler, J., Li, Y.P., Matas, J., 1999. Support
vector machines for face authentication. In: Proc. of the
British Machine Vision Conference, pp. 543–553.
Jonsson, K., Kittler, J., Li, Y.P., Matas, J., 2002. Support
vector machines for face authentication. Image Vision
Comput. 20, 269–275.
Kiviluoto, K., Oja, E., 1998. Independent component analysis
for parallel financial time series. In: Proc. ICONIP'98, vol.
2, Tokyo, Japan, pp. 895–898.
Kruizinga, P., 1998. Face recognition home page.
Liu, C., Wechsler, H., 1999. Comparative assessment of
independent component analysis (ICA) for face recognition.
In: Second International Conference on Audio and Video-
based Biometric Person Authentication.
Makeig, S., Bell, A.J., Jung, T.P., Sejnowski, T.J., 1996.
Independent component analysis for electroencephalographic
data. In: Advances in Neural Information Processing
Systems 8. MIT Press, pp. 145–151.
Martinez, A.M., Benavente, R., 1998. The AR face database.
Technical Report 24, CVC.
Samal, A., Iyengar, P.A., 1992. Automatic recognition and
analysis of human faces and facial expressions: a survey.
Pattern Recognition 25 (1).
SVM application list. Available from <http://www.clopinet.com/isabelle/Projects/SVM/applist.html>.
The FastICA MATLAB package. Available from <http://www.cis.hut.fi/projects/ica/fastica/>.
Turk, M.A., Pentland, A.P., 1991. Eigenfaces for recognition.
J. Cognitive Neurosci. 3 (1), 71–86. Available from <ftp://
whitechapel.media.mit.edu/pub/images/>.
Vapnik, V., 1995. The Nature of Statistical Learning Theory.
Springer, NY.
Vigário, R., 1997. Extraction of ocular artifacts from EEG
using independent component analysis. Electroenceph. Clin.
Neurophysiol. 103 (3), 395–404.
Vigário, R., Jousmäki, V., Hämäläinen, M., Hari, R., Oja, E.,
1998. Independent component analysis for identification of
artifacts in magnetoencephalographic recordings. In: Ad-
vances in Neural Information Processing Systems 10. MIT
Press, pp. 229–235.