
Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, July 1997

…for reconstruction from a low dimensional basis, they may not be optimal from a discrimination standpoint.

We should point out that Fisher's Linear Discriminant is a classical technique in pattern recognition [4], first developed by Robert Fisher in 1936 for taxonomic classification [5]. Depending upon the features being used, it has been applied in different ways in computer vision and even in face recognition. Cheng et al. presented a method that used Fisher's discriminant for face recognition, where features were obtained by a polar quantization of the shape [10]. Baker and Nayar have developed a theory of pattern rejection which is based on a two class linear discriminant [11]. Contemporaneous with our work [12], Cui et al. applied Fisher's discriminant (using different terminology, they call it the Most Discriminating Feature, MDF) in a method for recognizing hand gestures [13]. Though no implementation is reported, they also suggest that the method can be applied to face recognition under variable illumination.

In the sections to follow, we compare four methods for face recognition under variation in lighting and facial expression: correlation, a variant of the linear subspace method suggested by [3], the Eigenface method [6], [7], [8], and the Fisherface method developed here. The comparisons are done using both a subset of the Harvard Database (330 images) [14], [15] and a database created at Yale (160 images). In tests on both databases, the Fisherface method had lower error rates than any of the other three methods. Yet, no claim is made about the relative performance of these algorithms on much larger databases.

We should also point out that we have made no attempt to deal with variation in pose. An appearance-based method such as ours can be extended to handle limited pose variation using either a multiple-view representation, such as Pentland et al.'s view-based Eigenspace [16] or Murase and Nayar's appearance manifolds [17]. Other approaches to face recognition that accommodate pose variation include [18], [19], [20]. Furthermore, we assume that the face has been located and aligned within the image, as there are numerous methods for finding faces in scenes [21], [22], [20], [23], [24], [25], [7].

    2 METHODS

The problem can be simply stated: Given a set of face images labeled with the person's identity (the learning set) and an unlabeled set of face images from the same group of people (the test set), identify each person in the test images.

In this section, we examine four pattern classification techniques for solving the face recognition problem, comparing methods that have become quite popular in the face recognition literature, namely correlation [26] and Eigenface methods [6], [7], [8], with alternative methods developed by the authors. We approach this problem within the pattern classification paradigm, considering each of the pixel values in a sample image as a coordinate in a high-dimensional space (the image space).

2.1 Correlation

Perhaps the simplest classification scheme is a nearest neighbor classifier in the image space [26]. Under this scheme, an image in the test set is recognized (classified) by assigning to it the label of the closest point in the learning set, where distances are measured in the image space. If all of the images are normalized to have zero mean and unit variance, then this procedure is equivalent to choosing the image in the learning set that best correlates with the test image. Because of the normalization process, the result is independent of light source intensity and the effects of a video camera's automatic gain control.
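The scheme is easy to state concretely. Below is a minimal numpy sketch of normalized correlation with a nearest neighbor rule; the helper names, array shapes, and label handling are our own assumptions rather than anything specified in the paper.

```python
import numpy as np

def normalize(images):
    # Flatten each image and normalize to zero mean and unit variance,
    # removing light source intensity and camera gain effects.
    X = images.reshape(len(images), -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    X /= X.std(axis=1, keepdims=True)
    return X

def correlation_classify(train_images, train_labels, test_images):
    # Nearest neighbor in image space; after normalization, minimizing
    # Euclidean distance is equivalent to maximizing correlation.
    A = normalize(train_images)          # N_train x n
    B = normalize(test_images)           # N_test x n
    dists = ((B[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(train_labels)[dists.argmin(axis=1)]
```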

This procedure, which subsequently is referred to as correlation, has several well-known disadvantages. First, if the images in the learning set and test set are gathered under varying lighting conditions, then the corresponding points in the image space may not be tightly clustered. So, in order for this method to work reliably under variations in lighting, we would need a learning set which densely sampled the continuum of possible lighting conditions. Second, correlation is computationally expensive. For recognition, we must correlate the image of the test face with each image in the learning set; in an effort to reduce the computation time, implementors [27] of the algorithm described in [26] developed special purpose VLSI hardware. Third, it requires large amounts of storage: the learning set must contain numerous images of each person.

    2.2 Eigenfaces

As correlation methods are computationally expensive and require great amounts of storage, it is natural to pursue dimensionality reduction schemes. A technique now commonly used for dimensionality reduction in computer vision, particularly in face recognition, is principal components analysis (PCA) [14], [17], [6], [7], [8]. PCA techniques, also known as Karhunen-Loève methods, choose a dimensionality-reducing linear projection that maximizes the scatter of all projected samples.

More formally, let us consider a set of $N$ sample images $\{x_1, x_2, \ldots, x_N\}$ taking values in an $n$-dimensional image space, and assume that each image belongs to one of $c$ classes $\{X_1, X_2, \ldots, X_c\}$. Let us also consider a linear transformation mapping the original $n$-dimensional image space into an $m$-dimensional feature space, where $m < n$. The new feature vectors $y_k \in \mathbb{R}^m$ are defined by the following linear transformation:

$$y_k = W^T x_k, \qquad k = 1, 2, \ldots, N \tag{1}$$

where $W \in \mathbb{R}^{n \times m}$ is a matrix with orthonormal columns. If the total scatter matrix $S_T$ is defined as

$$S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T$$

where $N$ is the number of sample images and $\mu \in \mathbb{R}^n$ is the mean image of all samples, then after applying the linear transformation $W^T$, the scatter of the transformed feature vectors $\{y_1, y_2, \ldots, y_N\}$ is $W^T S_T W$. In PCA, the projection $W_{opt}$ is chosen to maximize the determinant of the total scatter matrix of the projected samples, i.e.,


$$W_{opt} = \arg\max_W \left| W^T S_T W \right| = [\,w_1\ w_2\ \cdots\ w_m\,] \tag{2}$$

where $\{w_i \mid i = 1, 2, \ldots, m\}$ is the set of $n$-dimensional eigenvectors of $S_T$ corresponding to the $m$ largest eigenvalues.

Since these eigenvectors have the same dimension as the original images, they are referred to as Eigenpictures in [6] and Eigenfaces in [7], [8]. If classification is performed using a nearest neighbor classifier in the reduced feature space and $m$ is chosen to be the number of images $N$ in the training set, then the Eigenface method is equivalent to the correlation method of the previous section.
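As a concrete illustration, here is a minimal numpy sketch of the Eigenface projection in Eqs. (1) and (2); the SVD shortcut and the function name are our own choices, not the authors' implementation.

```python
import numpy as np

def pca_projection(X, m):
    # Eigenface projection: the columns of W are the m eigenvectors of the
    # total scatter matrix S_T with the largest eigenvalues, as in Eq. (2).
    mu = X.mean(axis=0)
    Xc = X - mu                          # N x n, centered samples
    # The eigenvectors of S_T = Xc^T Xc are the right singular vectors of Xc;
    # the SVD avoids forming the huge n x n scatter matrix explicitly.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:m].T                         # n x m, orthonormal columns
    return W, mu

# Feature vectors: y_k = W^T (x_k - mu), one per training image.
```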

A drawback of this approach is that the scatter being maximized is due not only to the between-class scatter that is useful for classification, but also to the within-class scatter that, for classification purposes, is unwanted information. Recall the comment by Moses et al. [9]: Much of the variation from one image to the next is due to illumination changes. Thus if PCA is presented with images of faces under varying illumination, the projection matrix $W_{opt}$ will contain principal components (i.e., Eigenfaces) which retain, in the projected feature space, the variation due to lighting. Consequently, the points in the projected space will not be well clustered, and worse, the classes may be smeared together.

It has been suggested that by discarding the three most significant principal components, the variation due to lighting is reduced. The hope is that if the first principal components capture the variation due to lighting, then better clustering of projected samples is achieved by ignoring them. Yet, it is unlikely that the first several principal components correspond solely to variation in lighting; as a consequence, information that is useful for discrimination may be lost.
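A hedged sketch of that variant, assuming the same numpy setup as above: discarding the first three right singular vectors drops the (putatively lighting-dominated) leading principal components.

```python
import numpy as np

def pca_projection_skip3(X, m):
    # Drop the first three principal components (assumed lighting-dominated)
    # and keep the next m, e.g., components four through thirteen for m = 10.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return Vt[3:3 + m].T, mu
```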

    2.3 Linear Subspaces

Both correlation and the Eigenface method are expected to suffer under variation in lighting direction. Neither method exploits the observation that for a Lambertian surface without shadowing, the images of a particular face lie in a 3D linear subspace.

Consider a point $p$ on a Lambertian surface illuminated by a point light source at infinity. Let $s \in \mathbb{R}^3$ be a column vector signifying the product of the light source intensity with the unit vector for the light source direction. When the surface is viewed by a camera, the resulting image intensity of the point $p$ is given by

$$E(p) = a(p)\, n(p)^T s \tag{3}$$

where $n(p)$ is the unit inward normal vector to the surface at the point $p$, and $a(p)$ is the albedo of the surface at $p$ [28]. This shows that the image intensity of the point $p$ is linear on $s \in \mathbb{R}^3$. Therefore, in the absence of shadowing, given three images of a Lambertian surface from the same viewpoint taken under three known, linearly independent light source directions, the albedo and surface normal can be recovered; this is the well-known method of photometric stereo [29], [30]. Alternatively, one can reconstruct the image of the surface under an arbitrary lighting direction by a linear combination of the three original images; see [3].

For classification, this fact has great importance: It shows that, for a fixed viewpoint, the images of a Lambertian surface lie in a 3D linear subspace of the high-dimensional image space. This observation suggests a simple classification algorithm to recognize Lambertian surfaces, insensitive to a wide range of lighting conditions.

For each face, use three or more images taken under different lighting directions to construct a 3D basis for the linear subspace. Note that the three basis vectors have the same dimensionality as the training images and can be thought of as basis images. To perform recognition, we simply compute the distance of a new image to each linear subspace and choose the face corresponding to the shortest distance. We call this recognition scheme the Linear Subspace method. We should point out that this method is a variant of the photometric alignment method proposed in [3], and is a special case of the more elaborate recognition method described in [15]. Subsequently, Nayar and Murase have exploited the apparent linearity of lighting to augment their appearance manifold [31].
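A minimal sketch of this method under the stated assumptions (numpy, QR factorization for the basis, residual norm after orthogonal projection as the distance); the names and shapes are illustrative only, not the authors' code.

```python
import numpy as np

def subspace_basis(images):
    # Orthonormal basis for the linear subspace spanned by three (or more)
    # images of one face taken under independent lighting directions.
    A = images.reshape(len(images), -1).T.astype(float)   # n x 3
    Q, _ = np.linalg.qr(A)
    return Q                                              # orthonormal columns

def linear_subspace_classify(bases, labels, image):
    # Assign the test image to the face whose subspace is closest,
    # measured by the residual norm after orthogonal projection.
    x = image.ravel().astype(float)
    residuals = [np.linalg.norm(x - Q @ (Q.T @ x)) for Q in bases]
    return labels[int(np.argmin(residuals))]
```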

If there is no noise or shadowing, the Linear Subspace algorithm would achieve error-free classification under any lighting conditions, provided the surfaces obey the Lambertian reflectance model. Nevertheless, there are several compelling reasons to look elsewhere. First, due to self-shadowing, specularities, and facial expressions, some regions in images of the face have variability that does not agree with the linear subspace model. Given enough images of faces, we should be able to learn which regions are good for recognition and which regions are not. Second, to recognize a test image, we must measure the distance to the linear subspace for each person. While this is an improvement over a correlation scheme that needs a large number of images to represent the variability of each class, it is computationally expensive. Finally, from a storage standpoint, the Linear Subspace algorithm must keep three images in memory for every person.

    2.4 Fisherfaces

The previous algorithm takes advantage of the fact that, under admittedly idealized conditions, the variation within class lies in a linear subspace of the image space. Hence, the classes are convex, and, therefore, linearly separable. One can perform dimensionality reduction using linear projection and still preserve linear separability. This is a strong argument in favor of using linear methods for dimensionality reduction in the face recognition problem, at least when one seeks insensitivity to lighting conditions.

Since the learning set is labeled, it makes sense to use this information to build a more reliable method for reducing the dimensionality of the feature space. Here we argue that using class specific linear methods for dimensionality reduction and simple classifiers in the reduced feature space, one may get better recognition rates than with either the Linear Subspace method or the Eigenface method. Fisher's Linear Discriminant (FLD) [5] is an example of a class specific method, in the sense that it tries to shape the scatter in order to make it more reliable for classification. This method selects $W$ in (1) in such a way that the ratio of the between-class scatter and the within-class scatter is maximized.

Let the between-class scatter matrix be defined as

$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T$$

and the within-class scatter matrix be defined as

$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T$$

where $\mu_i$ is the mean image of class $X_i$, and $N_i$ is the number of samples in class $X_i$. If $S_W$ is nonsingular, the optimal projection $W_{opt}$ is chosen as the matrix with orthonormal columns which maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples, i.e.,

$$W_{opt} = \arg\max_W \frac{\left| W^T S_B W \right|}{\left| W^T S_W W \right|} = [\,w_1\ w_2\ \cdots\ w_m\,] \tag{4}$$

where $\{w_i \mid i = 1, 2, \ldots, m\}$ is the set of generalized eigenvectors of $S_B$ and $S_W$ corresponding to the $m$ largest generalized eigenvalues $\{\lambda_i \mid i = 1, 2, \ldots, m\}$, i.e.,

$$S_B w_i = \lambda_i S_W w_i, \qquad i = 1, 2, \ldots, m.$$

Note that there are at most $c - 1$ nonzero generalized eigenvalues, and so an upper bound on $m$ is $c - 1$, where $c$ is the number of classes. See [4].

To illustrate the benefits of class specific linear projection, we constructed a low dimensional analogue to the classification problem in which the samples from each class lie near a linear subspace. Fig. 2 is a comparison of PCA and FLD for a two-class problem in which the samples from each class are randomly perturbed in a direction perpendicular to a linear subspace. For this example, $N = 20$, $n = 2$, and $m = 1$. So, the samples from each class lie near a line passing through the origin in the 2D feature space. Both PCA and FLD have been used to project the points from 2D down to 1D. Comparing the two projections in the figure, PCA actually smears the classes together so that they are no longer linearly separable in the projected space. It is clear that, although PCA achieves larger total scatter, FLD achieves greater between-class scatter, and, consequently, classification is simplified.
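For illustration, here is a small numpy/scipy sketch of the FLD criterion in Eq. (4); it forms $S_B$ and $S_W$ explicitly and solves the generalized symmetric eigenproblem, which is only practical once the dimension has been reduced (see the Fisherface construction below). The function name and interface are our own assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def fld(X, y, m):
    # Fisher's Linear Discriminant, Eq. (4): the columns of W are the
    # generalized eigenvectors of (S_B, S_W) with the m largest eigenvalues.
    # Assumes S_W is nonsingular (guaranteed by the Fisherface construction).
    classes = np.unique(y)
    mu = X.mean(axis=0)
    n = X.shape[1]
    S_B = np.zeros((n, n))
    S_W = np.zeros((n, n))
    for c in classes:
        Xi = X[y == c]
        mi = Xi.mean(axis=0)
        S_B += len(Xi) * np.outer(mi - mu, mi - mu)
        S_W += (Xi - mi).T @ (Xi - mi)
    # scipy's eigh solves the generalized problem S_B w = lambda S_W w.
    evals, evecs = eigh(S_B, S_W)
    order = np.argsort(evals)[::-1][:m]   # at most c - 1 are nonzero
    return evecs[:, order]
```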

In the face recognition problem, one is confronted with the difficulty that the within-class scatter matrix $S_W \in \mathbb{R}^{n \times n}$ is always singular. This stems from the fact that the rank of $S_W$ is at most $N - c$, and, in general, the number of images in the learning set $N$ is much smaller than the number of pixels in each image $n$. This means that it is possible to choose the matrix $W$ such that the within-class scatter of the projected samples can be made exactly zero.

In order to overcome the complication of a singular $S_W$, we propose an alternative to the criterion in (4). This method, which we call Fisherfaces, avoids this problem by projecting the image set to a lower dimensional space so that the resulting within-class scatter matrix $S_W$ is nonsingular. This is achieved by using PCA to reduce the dimension of the feature space to $N - c$, and then applying the standard FLD defined by (4) to reduce the dimension to $c - 1$. More formally, $W_{opt}$ is given by

$$W_{opt}^T = W_{fld}^T\, W_{pca}^T \tag{5}$$

where

$$W_{pca} = \arg\max_W \left| W^T S_T W \right|$$

$$W_{fld} = \arg\max_W \frac{\left| W^T W_{pca}^T S_B W_{pca} W \right|}{\left| W^T W_{pca}^T S_W W_{pca} W \right|}$$

Note that the optimization for $W_{pca}$ is performed over $n \times (N - c)$ matrices with orthonormal columns, while the optimization for $W_{fld}$ is performed over $(N - c) \times m$ matrices with orthonormal columns. In computing $W_{pca}$, we have thrown away only the smallest $c - 1$ principal components.
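Putting the two stages together, the following is a sketch of Eq. (5) that reuses the hypothetical pca_projection and fld helpers sketched earlier; the interface is our own assumption, not the authors' implementation.

```python
import numpy as np

def fisherfaces(X, y):
    # Two-stage projection of Eq. (5): PCA down to N - c dimensions so that
    # the projected S_W is nonsingular, then FLD down to c - 1 dimensions.
    N = len(X)
    c = len(np.unique(y))
    W_pca, mu = pca_projection(X, N - c)   # n x (N - c)
    Z = (X - mu) @ W_pca                   # samples in the PCA subspace
    W_fld = fld(Z, np.asarray(y), c - 1)   # (N - c) x (c - 1)
    return W_pca @ W_fld, mu               # overall n x (c - 1) projection
```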

There are certainly other ways of reducing the within-class scatter while preserving between-class scatter. For example, a second method which we are currently investigating chooses $W$ to maximize the between-class scatter of the projected samples after having first reduced the within-class scatter. Taken to an extreme, we can maximize the between-class scatter of the projected samples subject to the constraint that the within-class scatter is zero, i.e.,

$$W_{opt} = \arg\max_{W \in \mathcal{W}} \left| W^T S_B W \right| \tag{6}$$

where $\mathcal{W}$ is the set of $n \times m$ matrices with orthonormal columns contained in the kernel of $S_W$.

Fig. 2. A comparison of principal component analysis (PCA) and Fisher's linear discriminant (FLD) for a two class problem where data for each class lies near a linear subspace.


    3 EXPERIMENTAL RESULTS

In this section, we present and discuss each of the aforementioned face recognition techniques using two different databases. Because of the specific hypotheses that we wanted to test about the relative performance of the considered algorithms, many of the standard databases were inappropriate. So, we have used a database from the Harvard Robotics Laboratory in which lighting has been systematically varied. Secondly, we have constructed a database at Yale that includes variation in both facial expression and lighting.¹

1. The Yale database is available for download from http://cvc.yale.edu.

    3.1 Variation in Lighting

The first experiment was designed to test the hypothesis that under variable illumination, face recognition algorithms will perform better if they exploit the fact that images of a Lambertian surface lie in a linear subspace. More specifically, the recognition error rates for all four algorithms described in Section 2 are compared using an image database constructed by Hallinan at the Harvard Robotics Laboratory [14], [15]. In each image in this database, a subject held his/her head steady while being illuminated by a dominant light source. The space of light source directions, which can be parameterized by spherical angles, was then sampled in 15° increments. See Fig. 3. From this database, we used 330 images of five people (66 of each). We extracted five subsets to quantify the effects of varying lighting. Sample images from each subset are shown in Fig. 4.

Subset 1 contains 30 images for which both the longitudinal and latitudinal angles of light source direction are within 15° of the camera axis, including the lighting direction coincident with the camera's optical axis.

Subset 2 contains 45 images for which the greater of the longitudinal and latitudinal angles of light source direction are 30° from the camera axis.

Subset 3 contains 65 images for which the greater of the longitudinal and latitudinal angles of light source direction are 45° from the camera axis.

Subset 4 contains 85 images for which the greater of the longitudinal and latitudinal angles of light source direction are 60° from the camera axis.

Subset 5 contains 105 images for which the greater of the longitudinal and latitudinal angles of light source direction are 75° from the camera axis.

Fig. 3. The highlighted lines of longitude and latitude indicate the light source directions for Subsets 1 through 5. Each intersection of a longitudinal and latitudinal line on the right side of the illustration has a corresponding image in the database.

Fig. 4. Example images from each subset of the Harvard Database used to test the four algorithms.

For all experiments, classification was performed using a nearest neighbor classifier.


All training images of an individual were projected into the feature space. The images were cropped within the face so that the contour of the head was excluded.² For the Eigenface and correlation tests, the images were normalized to have zero mean and unit variance, as this improved the performance of these methods. For the Eigenface method, results are shown when ten principal components were used. Since it has been suggested that the first three principal components are primarily due to lighting variation and that recognition rates can be improved by eliminating them, error rates are also presented using principal components four through thirteen.

2. We have observed that the error rates are reduced for all methods when the contour is included and the subject is in front of a uniform background. However, all methods performed worse when the background varies.

We performed two experiments on the Harvard Database: extrapolation and interpolation. In the extrapolation experiment, each method was trained on samples from Subset 1 and then tested using samples from Subsets 1, 2, and 3.³ Since there are 30 images in the training set, correlation is equivalent to the Eigenface method using 29 principal components. Fig. 5 shows the result from this experiment.

3. To test the methods with an image from Subset 1, that image was removed from the training set, i.e., we employed the leaving-one-out strategy [4].

In the interpolation experiment, each method was trained on Subsets 1 and 5 and then tested on Subsets 2, 3, and 4. Fig. 6 shows the result from this experiment.

These two experiments reveal a number of interesting points:

1) All of the algorithms perform perfectly when lighting is nearly frontal. However, as lighting is moved off axis, there is a significant performance difference between the two class-specific methods and the Eigenface method.

2) It has also been noted that the Eigenface method is equivalent to correlation when the number of Eigenfaces equals the size of the training set [17], and since performance increases with the dimension of the eigenspace, the Eigenface method should do no better than correlation [26]. This is empirically demonstrated as well.

3) In the Eigenface method, removing the first three principal components results in better performance under variable lighting conditions.

4) While the Linear Subspace method has error rates that are competitive with the Fisherface method, it requires storing more than three times as much information and takes three times as long.

5) The Fisherface method had error rates lower than the Eigenface method and required less computation time.

Extrapolating from Subset 1

    Method                 Reduced    Error Rate (%)
                           Space      Subset 1   Subset 2   Subset 3
    Eigenface              4          0.0        31.1       47.7
    Eigenface              10         0.0        4.4        41.5
    Eigenface w/o 1st 3    4          0.0        13.3       41.5
    Eigenface w/o 1st 3    10         0.0        4.4        27.7
    Correlation            29         0.0        0.0        33.9
    Linear Subspace        15         0.0        4.4        9.2
    Fisherface             4          0.0        0.0        4.6

Fig. 5. Extrapolation: When each of the methods is trained on images with near frontal illumination (Subset 1), the graph and corresponding table show the relative performance under extreme light source conditions.

3.2 Variation in Facial Expression, Eye Wear, and Lighting

Using a second database constructed at the Yale Center for Computational Vision and Control, we designed tests to determine how the methods compared under a different range of conditions. For sixteen subjects, ten images were acquired during one session in front of a simple background. Subjects included females and males (some with facial hair), and some wore glasses. Fig. 7 shows ten images of one subject. The first image was taken under ambient lighting in a neutral facial expression and the person wore glasses.


In the second image, the glasses were removed. If the person normally wore glasses, those were used; if not, a random pair was borrowed. Images 3-5 were acquired by illuminating the face in a neutral expression with a Luxo lamp in three positions. The last five images were acquired under ambient lighting with different expressions (happy, sad, winking, sleepy, and surprised). For the Eigenface and correlation tests, the images were normalized to have zero mean and unit variance, as this improved the performance of these methods. The images were manually centered and cropped to two different scales: The larger images included the full face and part of the background, while the closely cropped ones included internal structures such as the brow, eyes, nose, mouth, and chin, but did not extend to the occluding contour.

In this test, error rates were determined by the leaving-one-out strategy [4]: To classify an image of a person, that image was removed from the data set and the dimensionality reduction matrix $W$ was computed. All images in the database, excluding the test image, were then projected down into the reduced space to be used for classification. Recognition was performed using a nearest neighbor classifier. Note that for this test, each person in the learning set is represented by the projection of ten images, except for the test person, who is represented by only nine.
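A schematic of this protocol, in our own framing: the fit and classify arguments are hypothetical callables (e.g., fit could be the fisherfaces sketch above and classify a nearest neighbor rule in the projected space).

```python
import numpy as np

def leave_one_out_error(X, y, fit, classify):
    # Leaving-one-out [4]: hold out one image, recompute the projection on
    # the remaining images, classify the held-out image, and average errors.
    X, y = np.asarray(X), np.asarray(y)
    errors = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = fit(X[keep], y[keep])
        errors += int(classify(model, X[keep], y[keep], X[i]) != y[i])
    return errors / len(X)
```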

Interpolating between Subsets 1 and 5

    Method                 Reduced    Error Rate (%)
                           Space      Subset 2   Subset 3   Subset 4
    Eigenface              4          53.3       75.4       52.9
    Eigenface              10         11.11      33.9       20.0
    Eigenface w/o 1st 3    4          31.11      60.0       29.4
    Eigenface w/o 1st 3    10         6.7        20.0       12.9
    Correlation            129        0.0        21.54      7.1
    Linear Subspace        15         0.0        1.5        0.0
    Fisherface             4          0.0        0.0        1.2

Fig. 6. Interpolation: When each of the methods is trained on images from both near frontal and extreme lighting (Subsets 1 and 5), the graph and corresponding table show the relative performance under intermediate lighting conditions.

In general, the performance of the Eigenface method varies with the number of principal components.

Fig. 7. The Yale database contains 160 frontal face images covering 16 individuals taken under 10 different conditions: a normal image under ambient lighting, one with or without glasses, three images taken with different point light sources, and five different facial expressions.


Thus, before comparing the Linear Subspace and Fisherface methods with the Eigenface method, we first performed an experiment to determine the number of principal components yielding the lowest error rate. Fig. 8 shows a plot of error rate vs. the number of principal components, for the closely cropped set, when the initial three principal components were retained and when they were dropped.

The relative performance of the algorithms is self-evident in Fig. 9. The Fisherface method had error rates that were better than half that of any other method. It seems that the Fisherface method chooses the set of projections which performs well over a range of lighting variation, facial expression variation, and presence of glasses.

Note that the Linear Subspace method fared comparatively worse in this experiment than in the lighting experiments in the previous section. Because of variation in facial expression, the images no longer lie in a linear subspace. Since the Fisherface method tends to discount those portions of the image that are not significant for recognizing an individual, the resulting projections $W$ tend to mask the regions of the face that are highly variable. For example, the area around the mouth is discounted, since it varies quite a bit for different facial expressions.

Fig. 8. As demonstrated on the Yale Database, the variation in performance of the Eigenface method depends on the number of principal components retained. Dropping the first three appears to improve performance.

Leaving-One-Out of Yale Database

    Method                 Reduced    Error Rate (%)
                           Space      Close Crop   Full Face
    Eigenface              30         24.4         19.4
    Eigenface w/o 1st 3    30         15.3         10.8
    Correlation            160        23.9         20.0
    Linear Subspace        48         21.6         15.6
    Fisherface             15         7.3          0.6

Fig. 9. The graph and corresponding table show the relative performance of the algorithms when applied to the Yale Database, which contains variation in facial expression and lighting.


On the other hand, the nose, cheeks, and brow are stable over the within-class variation and are more significant for recognition. Thus, we conjecture that Fisherface methods, which tend to reduce within-class scatter for all classes, should produce projection directions that are also good for recognizing other faces besides the ones in the training set.

All of the algorithms performed better on the images of the full face. Note that there is a dramatic improvement in the Fisherface method, where the error rate was reduced from 7.3 percent to 0.6 percent. When the method is trained on the entire face, the pixels corresponding to the occluding contour of the face are chosen as good features for discriminating between individuals, i.e., the overall shape of the face is a powerful feature in face identification. As a practical note, however, it is expected that recognition rates would have been much lower for the full face images if the background or hair styles had varied, and may even have been worse than the closely cropped images.

    3.3 Glasses Recognition

When using class specific projection methods, the learning set can be divided into classes in different manners. For example, rather than selecting the classes to be individual people, the set of images can be divided into two classes: wearing glasses and not wearing glasses. With only two classes, the images can be projected to a line using the Fisherface methods. Using PCA, the choice of the Eigenfaces is independent of the class definition.
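As a usage sketch, the two-class case reduces to a projection onto a single line; here X, y, and x_test are placeholder arrays of our own, and fisherfaces is the hypothetical helper sketched in Section 2.4.

```python
import numpy as np

# Two-class Fisherface: with c = 2, images project to a single line (m = 1).
# X: training images as rows; y: 0/1 glasses labels; x_test: a query image.
W, mu = fisherfaces(X, y)                     # W is n x 1
train_proj = (X - mu) @ W                     # one scalar per training image
test_proj = (x_test.ravel() - mu) @ W
pred = y[np.abs(train_proj - test_proj).ravel().argmin()]   # 1-NN on the line
```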

In this experiment, the data set contained 36 images from a superset of the Yale Database, half with glasses. The recognition rates were obtained by cross validation, i.e., to classify the images of each person, all images of that person were removed from the database before the projection matrix $W$ was computed. Table 1 presents the error rates for two different methods.

TABLE 1
COMPARATIVE RECOGNITION ERROR RATES FOR GLASSES/NO GLASSES RECOGNITION USING THE YALE DATABASE

    Glasses Recognition

    Method        Reduced Space   Error Rate (%)
    PCA           10              52.6
    Fisherface    1               5.3

PCA had recognition rates near chance, since, in most cases, it classified both images with and without glasses to the same class. On the other hand, the Fisherface methods can be viewed as deriving a template which is suited for finding glasses and ignoring other characteristics of the face. This conjecture is supported by observing the Fisherface in Fig. 10 corresponding to the projection matrix $W$. Naturally, it is expected that the same techniques could be applied to identifying facial expressions, where the set of training images is divided into classes based on the facial expression.

Fig. 10. The left image is an image from the Yale Database of a person wearing glasses. The right image is the Fisherface used for determining if a person is wearing glasses.

    4 CONCLUSION

    The experiments suggest a number of conclusions:

1) All methods perform well if presented with an image in the test set which is similar to an image in the training set.

2) The Fisherface method appears to be the best at extrapolating and interpolating over variation in lighting, although the Linear Subspace method is a close second.

3) Removing the largest three principal components does improve the performance of the Eigenface method in the presence of lighting variation, but does not achieve error rates as low as some of the other methods described here.

4) In the limit, as more principal components are used in the Eigenface method, performance approaches that of correlation. Similarly, when the first three principal components have been removed, performance improves as the dimensionality of the feature space is increased. Note, however, that performance seems to level off at about 45 principal components. Sirovitch and Kirby found a similar point of diminishing returns when using Eigenfaces to represent face images [6].

5) The Fisherface method appears to be the best at simultaneously handling variation in lighting and expression. As expected, the Linear Subspace method suffers when confronted with variation in facial expression.

Even with this extensive experimentation, interesting questions remain: How well does the Fisherface method extend to large databases? Can variation in lighting conditions be accommodated if some of the individuals are only observed under one lighting condition?

Additionally, current face detection methods are likely to break down under extreme lighting conditions such as Subsets 4 and 5 in Fig. 4, and so new detection methods are needed to support the algorithms presented in this paper. Finally, when shadowing dominates, performance degrades for all of the presented recognition methods, and techniques that either model or mask the shadowed regions may be needed. We are currently investigating models for representing the set of images of an object under all possible illumination conditions, and have shown that the set of n-pixel images of an object of any shape and with an arbitrary reflectance function, seen under all possible illumination conditions, forms a convex cone in $\mathbb{R}^n$ [32]. Furthermore, and most relevant to this paper, it appears that this convex illumination cone lies close to a low-dimensional linear subspace [14].

ACKNOWLEDGMENTS

P.N. Belhumeur was supported by ARO grant DAAH04-95-1-0494.

J.P. Hespanha was supported by the U.S. National Science Foundation Grant ECS-9206021, AFOSR Grant F49620-94-1-0181, and ARO Grant DAAH04-95-1-0114. D.J. Kriegman was supported by the U.S. National Science Foundation under an NYI, IRI-9257990, and by ONR N00014-93-1-0305. The authors would like to thank Peter Hallinan for providing the Harvard Database, and Alan Yuille and David Mumford for many useful discussions.

    REFERENCES

[1] R. Chellappa, C. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proc. IEEE, vol. 83, no. 5, pp. 705-740, 1995.
[2] A. Samal and P. Iyengar, "Automatic Recognition and Analysis of Human Faces and Facial Expressions: A Survey," Pattern Recognition, vol. 25, pp. 65-77, 1992.
[3] A. Shashua, "Geometry and Photometry in 3D Visual Recognition," PhD thesis, Massachusetts Institute of Technology, 1992.
[4] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[5] R.A. Fisher, "The Use of Multiple Measures in Taxonomic Problems," Ann. Eugenics, vol. 7, pp. 179-188, 1936.
[6] L. Sirovitch and M. Kirby, "Low-Dimensional Procedure for the Characterization of Human Faces," J. Optical Soc. of Am. A, vol. 2, pp. 519-524, 1987.
[7] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, no. 1, 1991.
[8] M. Turk and A. Pentland, "Face Recognition Using Eigenfaces," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1991, pp. 586-591.
[9] Y. Moses, Y. Adini, and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," European Conf. Computer Vision, 1994, pp. 286-296.
[10] Y. Cheng, K. Liu, J. Yang, Y. Zhuang, and N. Gu, "Human Face Recognition Method Based on the Statistical Model of Small Sample Size," SPIE Proc. Intelligent Robots and Computer Vision X: Algorithms and Technology, 1991, pp. 85-95.
[11] S. Baker and S.K. Nayar, "Pattern Rejection," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1996, pp. 544-549.
[12] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," European Conf. Computer Vision, 1996, pp. 45-58.
[13] Y. Cui, D. Swets, and J. Weng, "Learning-Based Hand Sign Recognition Using SHOSLIF-M," Int'l Conf. Computer Vision, 1995, pp. 631-636.
[14] P. Hallinan, "A Low-Dimensional Representation of Human Faces for Arbitrary Lighting Conditions," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, pp. 995-999.
[15] P. Hallinan, "A Deformable Model for Face Recognition Under Arbitrary Lighting Conditions," PhD thesis, Harvard Univ., 1995.
[16] A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, pp. 84-91.
[17] H. Murase and S. Nayar, "Visual Learning and Recognition of 3-D Objects from Appearance," Int'l J. Computer Vision, vol. 14, pp. 5-24, 1995.
[18] D. Beymer, "Face Recognition Under Varying Pose," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, pp. 756-761.
[19] A. Gee and R. Cipolla, "Determining the Gaze of Faces in Images," Image and Vision Computing, vol. 12, pp. 639-648, 1994.
[20] A. Lanitis, C.J. Taylor, and T.F. Cootes, "A Unified Approach to Coding and Interpreting Face Images," Int'l Conf. Computer Vision, 1995, pp. 368-373.
[21] Q. Chen, H. Wu, and M. Yachida, "Face Detection by Fuzzy Pattern Matching," Int'l Conf. Computer Vision, 1995, pp. 591-596.
[22] I. Craw, D. Tock, and A. Bennet, "Finding Face Features," Proc. European Conf. Computer Vision, 1992, pp. 92-96.
[23] T. Leung, M. Burl, and P. Perona, "Finding Faces in Cluttered Scenes Using Labeled Random Graph Matching," Int'l Conf. Computer Vision, 1995, pp. 637-644.
[24] K. Matsuno, C.W. Lee, S. Kimura, and S. Tsuji, "Automatic Recognition of Human Facial Expressions," Int'l Conf. Computer Vision, 1995, pp. 352-359.
[25] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Detection," Int'l Conf. Computer Vision, 1995, pp. 786-793.
[26] R. Brunelli and T. Poggio, "Face Recognition: Features vs. Templates," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, no. 10, pp. 1,042-1,053, Oct. 1993.
[27] J.M. Gilbert and W. Yang, "A Real-Time Face Recognition System Using Custom VLSI Hardware," Proc. IEEE Workshop on Computer Architectures for Machine Perception, 1993, pp. 58-66.
[28] B.K.P. Horn, Computer Vision. Cambridge, Mass.: MIT Press, 1986.
[29] W.M. Silver, "Determining Shape and Reflectance Using Multiple Images," PhD thesis, Massachusetts Institute of Technology, 1980.
[30] R.J. Woodham, "Analysing Images of Curved Surfaces," Artificial Intelligence, vol. 17, pp. 117-140, 1981.
[31] S. Nayar and H. Murase, "Dimensionality of Illumination in Appearance Matching," IEEE Conf. Robotics and Automation, 1996.
[32] P.N. Belhumeur and D.J. Kriegman, "What is the Set of Images of an Object under all Possible Lighting Conditions?," IEEE Proc. Conf. Computer Vision and Pattern Recognition, 1996.

Peter N. Belhumeur received his ScB degree (with highest honors) in computer and information engineering from Brown University in 1985. He received an SM and PhD from Harvard University in 1991 and 1993, respectively, where he studied under a Harvard Fellowship. In 1994, he spent a half-year as a Postdoctoral Fellow at the University of Cambridge's Sir Isaac Newton Institute for Mathematical Sciences.

Currently, he is an assistant professor of electrical engineering at Yale University. He teaches courses on signals and systems, pattern and object recognition, and computer vision. He is a member of the Yale Center for Computational Vision and Control and a member of the Army Research Center for Imaging Science. He has published over twenty papers on image processing and computational vision. He is a recipient of a U.S. National Science Foundation Career Award; he has been awarded a Yale University Junior Faculty Fellowship for the Natural Sciences; and he won the IEEE Best Paper Award for his work on characterizing the set of images of an object under variable illumination.

João P. Hespanha received the Licenciatura and MS degrees in electrical and computer engineering from Instituto Superior Técnico, Lisbon, Portugal, in 1991 and 1993, respectively, and a second MS degree in electrical engineering from Yale University, New Haven, Connecticut, in 1994, where he is currently pursuing the PhD degree in engineering and applied science.

From 1987 to 1989, he was a research assistant at Instituto de Engenharia de Sistemas e Computadores (INESC) in Lisbon, Portugal, and, from 1989 to 1990, was an instructor at Fundo para o Desenvolvimento Tecnológico (FUNDTEC) in the areas of electronics, computer science, and robotics. From 1992 to 1993, he was a working partner in Sociedade de Projectos em Sistemas e Computadores, Lda., also in Lisbon. His research interests include nonlinear control, both robust and adaptive; hybrid systems; switching control; and the application of vision to robotics.

David J. Kriegman received the BSE degree (summa cum laude) in electrical engineering and computer science in 1983 from Princeton University, where he was awarded the Charles Ira Young Award for electrical engineering research. He received the MS degree in 1984 and PhD in 1989 in electrical engineering from Stanford University, where he studied under a Hertz Foundation Fellowship.

Currently, he is an associate professor at the Center for Computational Vision and Control in the Departments of Electrical Engineering and Computer Science at Yale University, and was awarded a U.S. National Science Foundation Young Investigator Award in 1992. His paper on characterizing the set of images of an object under variable illumination received the best paper award at the 1996 IEEE Conference on Computer Vision and Pattern Recognition. He has published over sixty papers on object representation and recognition, illumination modeling, geometry of curves and surfaces, structure from motion, robot planning, and mobile robot navigation.