Post on 08-Apr-2018
8/6/2019 Eshan Content
1/29
Describable Visual Attributes for Face Verification and Image Search
Dept. Of CSE, AMC Engineering College 1
CHAPTER 1
INTRODUCTION
Pedanius Dioscorides wrote perhaps the earliest known field guide for botanists, giving pictures
and written descriptions of nearly 600 plant species and showing how each could be found and
identified. This work laid out the rules of modern taxonomy. Works such as this have in common
an effort to teach the reader how to identify a plant or animal by describable aspects of its visual
appearance.
While the use of describable visual attributes for identification has been around since
antiquity, it has not been the focus of work by researchers in computer vision and related
disciplines. Existing methods work by extracting low-level features from images, such as pixel
values, gradient directions, histograms of oriented gradients [6], SIFT [7], etc., which are then
used to directly train classifiers for identification or detection.
In contrast, the method used in this paper first learns, from low-level image features,
intermediate representations in which images are labeled with an extensive list of describable
visual attributes. Although these attributes could clearly be useful in a variety of domains (such
as object recognition, species identification, architectural description, action recognition, etc.), we
focus solely on faces in this paper. These face attributes can range from simple demographic
information such as gender, age, or ethnicity; to physical characteristics of a face such as nose
size, mouth shape, or eyebrow thickness; and even to environmental aspects such as lighting
conditions, facial expression, or image quality.
The attributes are used to train classifiers which can then identify those attributes in new
images. The classifier outputs can in turn be used to identify faces and to search through large
image collections, and they also seem promising for many other tasks such as image exploration
or automatic description generation. The attributes can be combined to produce descriptions
at multiple levels, including object categories, objects, or even instances of objects.
CHAPTER 2
IMAGE ATTRIBUTES
2.1 Why might one need these attributes?
• Visual attributes, much like words, can be composed, offering tremendous flexibility
and efficiency. Attributes can be combined to produce descriptions at multiple levels,
including object categories, objects, or even instances of objects. For example, one can
describe "white male" at the category level (a set of people), or "white male, brown hair,
green eyes, scar on forehead" at the object level (a specific person), or add "smiling,
lit from above, seen from left" to the previous for an instance of the object (a particular
image of a person). Refer to Fig. 1.1.
• Moreover, attributes are generalizable; one can learn a set of attributes from large
image collections and then apply them in almost arbitrary combinations to novel
images, objects, or categories. Perhaps most importantly, these attributes can be
chosen to align with the domain-appropriate vocabulary that people have
developed over time for describing different types of objects. For faces, this
includes descriptions at the coarsest level (such as gender and age), more subtle
aspects (such as expressions and the shape of face parts), and highly face-specific
marks (such as moles and scars).
• While describable visual attributes are one of the most natural ways of describing
faces, a person's appearance can also be described in terms of the similarity of a
part of their face to the same part of another individual's. For example, someone's
mouth might be like Angelina Jolie's, or their nose like Brad Pitt's. Dissimilarities
also provide useful information, e.g., her eyes are not like Jennifer Aniston's.
We call these "similes".
2.2 Uses of trained classifiers

Two major uses of classifiers trained on describable visual attributes and similes are face
verification and image search.
2.2.1 Face Verification

Face verification is the problem of determining whether two faces are of the same individual.
What makes this problem difficult is the enormous variability in the manner in which an
individual's face presents itself to a camera: not only might the pose differ, but so might the
expression and hairstyle. Making matters worse, at least for researchers in biometrics, is that
the illumination direction, camera type, focus, resolution, and image compression are all almost
certain to vary as well. These manifold differences in images of the same person have confounded
methods for automatic face recognition and verification, often limiting the reliability of automatic
algorithms to the domain of more controlled settings with cooperative subjects. We approach the
unconstrained face verification problem (with non-cooperative subjects) by comparing faces using
our attribute and simile classifier outputs, instead of low-level features directly. Fig. 1.1 shows the
outputs of various attribute classifiers for (a) two images of the same person and (b) images
of two different people. Note that in (a), most attribute values are in strong agreement, despite the
changes in pose, illumination, and expression, while in (b), the values are almost perfectly
contrasting. By training a classifier that uses these labels as inputs for face verification, we
achieve close to state-of-the-art performance on the Labeled Faces in the Wild (LFW) dataset [8],
at 85.54% accuracy.
2.2.2 Image Search

Another application of describable visual attributes is image search. The ability of current search
engines to find images based on facial appearance is limited to images with text annotations. Yet
there are many problems with annotation-based image search:

• the manual labeling of images is time-consuming;
• the annotations are often incorrect or misleading, as they may refer to other content on a
webpage;
• and finally, the vast majority of images are simply not annotated.
Figs. 1.2a and 1.2b show the results of the query "smiling asian men with glasses"
using a conventional image search engine (Yahoo Image Search, as of November 2010) and our
search engine, respectively. The difference in the quality of the search results is clearly visible.
Yahoo's reliance on text annotations causes it to find some images that have no relevance to the
query, while our system returns only the images that match the query. In addition, many of the
correct results on Yahoo point to stock photography websites, which can afford to manually label
their images with keywords, but only because they have collections of a limited size, and they
label only the coarsest attributes. Clearly, this approach does not scale.
Fig. 1.1 An attribute classifier can be trained to recognize the presence or absence of a
describable visual attribute. The responses of several such attribute classifiers are shown for (a)
two images of the same person and (b) two images of different individuals. In (a), notice how
most attribute values are in strong agreement, despite the changes in pose, illumination,
expression, and image quality. Conversely, in (b), the values differ completely despite the
similarity in these same environmental aspects. We train a verification classifier on these outputs
to perform face verification, achieving 85.54% accuracy on the Labeled Faces in the Wild (LFW)
benchmark [8], comparable to the state-of-the-art.
Fig. 1.2 Results for the query "smiling Asian men with glasses" using (a) the Yahoo image
search engine (as of November 2010) and (b) our face search engine. Conventional image search
engines rely on text annotations, such as file metadata, manual labels, or surrounding text, which
are often incorrect, ambiguous, or missing. In contrast, we use attribute classifiers to
automatically label images with faces in them, and store these labels in a database. At search
time, only this database needs to be queried, and results are returned instantaneously. The
attribute-based search results are much more relevant to the query.
CHAPTER 3
OVERVIEW OF RELEVANT WORK
3.1 Attribute Classification
Prior research on attribute classification has focused mostly on gender and ethnicity
classification. Early works used neural networks to perform gender classification on small
datasets. The Fisherfaces work [2] showed that linear discriminant analysis could be used for
simple attribute classification such as glasses/no glasses. Later, Moghaddam and Yang [3] used
Support Vector Machines (SVMs) trained on small face-prints to classify the gender of a face,
showing good results. The works of Shakhnarovich et al. and of Baluja and Rowley used
AdaBoost [4] to select a linear combination of weak classifiers, allowing for almost real-time
classification of face attributes, with results in the latter case again demonstrated on the FERET
database. These methods differ in their choice of weak classifiers: the former uses the Haar-like
features of the Viola-Jones face detector [5], while the latter uses simple pixel comparison
operators. In a more general setting, Ferrari and Zisserman [6] described a probabilistic approach
for learning simple attributes such as colors and stripes.
In contrast to these approaches, which try to find relations across different categories, we
concentrate on finding relations between objects in a single category: faces. Faces have many
advantages compared to generic object categories:

• There is a well-established and consistent reference frame to use for aligning images;
• differentiating objects is conceptually simple (e.g., it is unclear whether two cars of the
same model should be considered the same object or not, whereas no such difficulty
exists for two faces);
• most attributes can be shared across all people (unlike, e.g., "4-legged", "gothic", or
"dual-exhaust", which are applicable to animals, architecture, and automobiles,
respectively, but not to each other).

All of these benefits make it possible for us to train more reliable and useful classifiers,
and to demonstrate results comparable to the state-of-the-art.
CHAPTER 4
BUILDING THE IMAGE DATASET
Internet services have made collecting and labeling image data easy in the following ways:

• Large internet photo-sharing sites such as flickr.com and picasa.com are growing
exponentially and host billions of public images, some with textual annotations and
comments. In addition, search engines such as Google Images allow searching for images
of particular people (albeit not perfectly).
• Efficient marketplaces for online labor, such as Amazon's Mechanical Turk (MTurk),
make it possible to label thousands of images easily and with very low overhead.

By exploiting both of these trends, we can create a large dataset of real-world images with
attribute and identity labels, as shown in Fig. 4.1 and described next.
Fig. 4.1 Creating labeled image datasets: Our system downloads images from the internet. These
images span many sources of variability, including pose, illumination, expression, cameras, and
environment. Next, faces and fiducial points are detected using a commercial detector and stored
in the Face Database. A subset of these faces is submitted to the Amazon Mechanical Turk
service, where they are labeled with attributes or identity, which are used to create the datasets.
4.1 Collecting Face Images

4.1.1 Procedure

• Using a variety of online sources, including search engines such as Yahoo Images and
photo-sharing websites such as flickr.com, we collect face images. Depending on the type
of data needed, one can either search for particular people's names (to build a dataset
labeled by identity) or for default image filenames assigned by digital cameras (to use for
labeling with attributes).
• Next, we apply the OKAO face detector [40] to the downloaded images to extract faces.
This detector also returns the pose angles of each face, as well as the locations of six
fiducial points: the corners of both eyes and the corners of the mouth. These fiducial
points are used to align faces to a canonical pose.

The 3.1 million aligned faces collected using this procedure comprise the Columbia Face
Database.
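The alignment step can be sketched as a least-squares fit of an affine transform that maps the detected fiducial points onto canonical positions. The canonical coordinates and the synthetic "detected" points below are illustrative assumptions, not values from the actual OKAO-based pipeline:

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform A mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding fiducial points
    (N >= 3, e.g. four eye corners and two mouth corners).
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = src.shape[0]
    # Homogeneous coordinates: each row is [x, y, 1]
    X = np.hstack([src, np.ones((n, 1))])
    # Solve X @ M ~= dst for the 3x2 matrix M, return as 2x3
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M.T

# Hypothetical canonical fiducial positions (pixels) for six points:
canonical = np.array([[30, 40], [60, 40], [90, 40],
                      [120, 40], [45, 100], [105, 100]], dtype=float)
# "Detected" points: a scaled and shifted copy, for illustration
detected = canonical * 1.5 + np.array([10.0, 20.0])

A = estimate_affine(detected, canonical)
# Applying A to the detected points should recover the canonical layout
warped = np.hstack([detected, np.ones((6, 1))]) @ A.T
print(np.allclose(warped, canonical))  # True for an exactly affine deformation
```

In practice the fitted transform would be applied to the whole image (e.g., by warping pixels), not just to the fiducial points.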
4.1.2 Conclusion

The following empirical conclusions can be drawn from the dataset of images:

• From the statistics of the randomly-named images, it appears that a significant fraction
of them contain faces (25.7%), and on average, each image contains 0.5 faces. Thus, it is
clear that faces are ubiquitous and an important case to understand.
• The dataset is a true real-world dataset, with completely uncontrolled environments and
unknown cameras and lighting conditions, unlike existing image datasets.
4.2 Collecting attribute and identity labels

For labeling images in the Columbia Face Database, the Amazon Mechanical Turk (MTurk)
service is used. This service matches workers to online jobs created by requesters, who can
optionally set quality controls such as requiring confirmation of results by multiple workers,
filters on minimum worker experience, etc. We submitted 110,000 attribute labeling jobs showing
30 images to 3 workers per job, presenting a total of over 10 million images to users. The jobs
asked workers to select face images which exhibited a specified attribute.
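The report does not state how the three workers' votes per image are reconciled; one common quality control is a simple majority vote, sketched below. The function name and the min_agreement threshold are illustrative assumptions:

```python
from collections import Counter

def aggregate_labels(votes_per_image, min_agreement=2):
    """Majority-vote aggregation of binary attribute votes.

    votes_per_image: dict mapping image id -> list of worker votes
    (True = attribute present). Images whose majority does not reach
    min_agreement votes are left unlabeled (returned as None).
    """
    labels = {}
    for img, votes in votes_per_image.items():
        counts = Counter(votes)
        label, n = counts.most_common(1)[0]
        labels[img] = label if n >= min_agreement else None
    return labels

votes = {"img1": [True, True, False],    # 2 of 3 agree -> True
         "img2": [False, False, False],  # unanimous -> False
         "img3": [True]}                 # only one vote -> unlabeled
print(aggregate_labels(votes))  # {'img1': True, 'img2': False, 'img3': None}
```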
4.3 The FaceTracer Dataset

The FaceTracer dataset is a subset of the Columbia Face Database, and it includes attribute
labels. Each of the 15,000 faces in the dataset has a variety of metadata and fiducial points
marked. The attributes labeled include demographic information such as age and race, facial
features like mustaches and hair color, and other attributes such as expression, environment,
etc. There are 5,000 labels in all.
4.4 The PubFig Dataset

The PubFig dataset is a complement to the LFW dataset [8]. It consists of 58,797 images of 200
public figures. The larger number of images per person (as compared to LFW) allows one to
construct subsets of the data across different poses, lighting conditions, and expressions for
further study. Fig. 4.2(c) shows the variation present in all the images of a single individual. In
addition, this dataset is well-suited for recognition experiments.
Fig 4.2(a) PubFig Development set (60 individuals)
Fig 4.2(b) PubFig Evaluation set (140 individuals)
Fig 4.2(c) All 170 images of Steve Martin
Fig. 4.2 The PubFig dataset consists of 58,797 images of 200 public figures (celebrities and
politicians), partitioned into (a) a development set of 60 individuals and (b) an evaluation set of
140 individuals. Below each thumbnail is shown the number of photos of that person. There is no
overlap in either identity or image between the development set and any dataset that we evaluate
on, including Labeled Faces in the Wild (LFW) [8]. The immense variability in appearance
captured by PubFig can be seen in (c), which shows all 170 images of Steve Martin (a celebrity).
CHAPTER 5
TRAINING THE CLASSIFIER
Given a particular describable visual attribute, say, gender, how can one train a classifier for
the attribute? Attributes can be thought of as functions a_i that map images I to real values
a_i(I). Large positive values of a_i(I) indicate the presence or strength of the ith attribute,
while negative values indicate its absence.
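As a minimal illustration of this sign convention, consider linear classifiers a_i(I) = w_i . phi(I) + b_i over a low-level feature vector phi(I) (the report trains SVMs, whose decision values behave the same way; all names and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear attribute classifiers: a_i(I) = w_i . phi(I) + b_i,
# where phi(I) is a low-level feature vector extracted from image I.
W = rng.normal(size=(3, 8))   # 3 attributes, 8-dimensional features
b = rng.normal(size=3)

def attribute_values(phi):
    """Signed attribute scores: positive = present, negative = absent."""
    return W @ phi + b

phi = rng.normal(size=8)       # stand-in for features of one image
a = attribute_values(phi)
present = a > 0                # boolean decisions; magnitude = strength
print(a.shape, present.shape)  # (3,) (3,)
```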
5.1 Training Architecture

An overview of the attribute training architecture is shown in Fig. 5.1. The key idea is to leverage
the many efficient and effective low-level features that have been developed by the computer
vision community, choosing amongst a large set of them to find the ones suited for learning a
particular attribute. This process should ideally be done in a generic, application- and domain-
independent way, but with the ability to take advantage of domain-specific knowledge where
available.
Fig. 5.1 Overview of the attribute training architecture. Given a set of labeled positive and
negative training images, low-level feature vectors are extracted using a large pool of low-level
feature options. An automatic, iterative selection process then picks the best set of features for
correctly classifying the input data. The outputs are the selected features and the trained attribute
classifier.
For the domain of faces, this knowledge consists of an affine alignment procedure and the
use of low-level features which have proven to be very useful in a number of leading vision
techniques, especially for faces. The alignment takes advantage of the fact that all faces have a
common structure (i.e., two eyes, a nose, a mouth, etc.) and that we have fiducial point
detections available from a face detector [40].
5.2 Low-Level Features

Face images are first aligned using an affine transformation. A set of k low-level feature
extractors f_j is applied to an aligned input image I to form a feature set F(I):

F(I) = {f_1(I), ..., f_k(I)}

We describe each extractor f_j in terms of four choices:

• the region of the face to extract features from;
• the type of pixel data to use;
• the kind of normalization to apply to the data;
• the level of aggregation to use.

The complete set of our 10 regions is shown in Fig. 5.2. The regions correspond to functional
parts of a face, such as the nose, mouth, etc. From each region, one can extract different types of
information, as categorized in Table 5.1. The types of pixel data to extract include various color
spaces (RGB, HSV) as well as edge magnitudes and orientations. To remove lighting effects and
better generalize across a limited number of training images, one can optionally normalize these
extracted values.

Finally, one can aggregate the normalized values over the region rather than simply
concatenating them. This can be as simple as using only the mean and variance, or include more
information by computing a histogram of values over the region. A complete feature type is
created by choosing a region from Fig. 5.2 and one entry from each column of Table 5.1. (Of
course, not all possible combinations are valid.)
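A single region/pixel-type/normalization/aggregation combination might be sketched as follows. The option names, the histogram bin count, and the region format are illustrative assumptions; the report does not specify the extractors at this level of detail:

```python
import numpy as np

def extract_feature(img, region, pixel_type="intensity",
                    normalization="mean", aggregation="hist"):
    """One region / pixel-type / normalization / aggregation combination.

    img: (H, W, 3) RGB float array in [0, 1]; region: (y0, y1, x0, x1).
    The option names are stand-ins for the choices in Table 5.1.
    """
    y0, y1, x0, x1 = region
    patch = img[y0:y1, x0:x1]

    # Pixel value type
    if pixel_type == "intensity":
        vals = patch.mean(axis=2)            # grayscale intensity
    elif pixel_type == "rgb":
        vals = patch
    else:
        raise ValueError(pixel_type)

    # Normalization (to reduce lighting effects)
    if normalization == "mean":
        vals = vals - vals.mean()
    elif normalization == "energy":
        vals = vals / (np.linalg.norm(vals) + 1e-8)

    # Aggregation over the region
    if aggregation == "hist":
        feat, _ = np.histogram(vals, bins=16)
        return feat.astype(float)
    if aggregation == "mean_var":
        return np.array([vals.mean(), vals.var()])
    return vals.ravel()                      # no aggregation: concatenate

img = np.random.rand(100, 100, 3)
f = extract_feature(img, region=(40, 60, 30, 70))
print(f.shape)  # (16,)
```

The full pool F(I) would be produced by looping this over all valid combinations of region and Table 5.1 options.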
Fig. 5.2 The face regions used for automatic feature selection are shown here on an affine-aligned
face image. There is (a) one region for the whole face, and (b) nine regions corresponding to
functional parts of the face, such as the mouth, eyes, nose, etc. Regions are large enough to
contain the face part across changes in pose, small errors in alignment, and differences between
individuals. The regions are manually defined, once, in the affine-aligned coordinate system, and
can then be used automatically for all aligned input faces.
Table 5.1 Feature type options. A complete feature type is constructed by first converting the
pixels in a given region (see Fig. 5.2) to one of the pixel value types from the first column, then
applying one of the normalizations from the second column, and finally aggregating these values
into the output feature vector using one of the options from the last column.
5.3 Attribute Classifiers

Attribute classifiers C_i are built using a supervised learning approach. Training requires a set of
labeled positive and negative images for each attribute, examples of which are shown in Fig. 5.3.
The goal is to build a classifier that best classifies this training data by choosing an appropriate
Pixel Value Types    Normalizations          Aggregation
RGB                  None                    None
HSV                  Mean Normalization      Histogram
Image Intensity      Energy Normalization    Mean/Variance
Edge Magnitude
Edge Orientation
subset of the feature set F(I) described in the previous section. We do this iteratively using
forward feature selection. In each iteration, we first train several individual classifiers on the
current set of features in the output set, concatenated with a single region-feature combination.
Each classifier's performance is evaluated using cross-validation. The features used in the
classifier with the highest cross-validation accuracy are added to the output set. We continue
adding features until the accuracy stops improving. For computational reasons, we drop the
lowest-scoring 70% of features at each round, but always keep at least 10 features.
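The selection loop just described can be sketched as follows, with a pluggable cv_score callback standing in for training and cross-validating a real classifier. The toy scoring function at the bottom is purely illustrative:

```python
def forward_select(candidates, cv_score, drop_frac=0.7, min_keep=10):
    """Greedy forward feature selection (a sketch of the process above).

    candidates: list of feature-type names; cv_score(selected) returns the
    cross-validation accuracy of a classifier trained on that feature set.
    """
    selected, best = [], 0.0
    pool = list(candidates)
    while pool:
        # Score each candidate appended to the current selection
        scored = sorted(((cv_score(selected + [c]), c) for c in pool),
                        reverse=True)
        top_acc, top_feat = scored[0]
        if top_acc <= best:
            break                          # accuracy stopped improving
        best = top_acc
        selected.append(top_feat)
        # Drop the lowest-scoring candidates (drop_frac of the remainder),
        # but always keep at least min_keep of them for the next round.
        rest = [c for _, c in scored[1:]]
        keep = max(min_keep, int(len(rest) * (1 - drop_frac)))
        pool = rest[:keep]
    return selected, best

# Toy score: rewards features named "good*", saturating at 1.0
def toy_score(feats):
    return min(1.0, 0.4 * sum(f.startswith("good") for f in feats))

sel, acc = forward_select(["good1", "bad1", "good2", "bad2"], toy_score)
print(sel, acc)  # selects the two "good" features, then stops
```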
While the design of the classifier architecture is flexible enough to handle a large variety of
attributes, it is important to ensure that we have not sacrificed accuracy in the process. We
therefore compare our approach to three previous state-of-the-art methods for attribute
classification: full-face SVMs using brightness-normalized pixel values, AdaBoost using Haar-
like features, and AdaBoost using pixel comparison features. Results are shown in Table 5.2.

Using the Columbia Face Database and the learning procedure just described, a total of
73 attribute classifiers have been trained. Their cross-validation accuracies are shown in Table
5.3, and typically range from 80% to 90% (random performance would be 50% for each
attribute).
Fig. 5.3 Training data for the attribute classifiers consists of face images that match the given
attribute label (positive examples) and those that don't (negative examples). Shown here are a
few of the training images used for four different attributes. Final classifier accuracies for all 73
attributes are shown in Table 5.3.
Table 5.2 Comparison of attribute classification performance for the gender and smiling
attributes.

Table 5.3 Cross-validation accuracies of the 73 attribute classifiers.
5.4 Simile Classifiers
Simile classifiers measure the similarity of a part of a person's face to the same part on a set of
reference people. We use the 60 individuals from the development set of PubFig as the reference
people. The left part of Fig. 5.4 shows examples of four regions selected from two reference
people as positive examples. On the right are negative examples, which are simply the same
region extracted from other individuals' images.
Fig. 5.4 Each simile classifier is trained using several images of a specific reference person,
limited to a small face region such as the eyes, nose, or mouth. We show here three positive and
three negative examples each, for four regions on two of the reference people used to train these
classifiers.
Two points about this design are worth noting:

• The individuals chosen as reference people do not appear in LFW or other benchmarks
on which we produce results.
• We train simile classifiers to recognize similarity to part of a reference person's face in
many images, not similarity to a single image.

For each reference person, we train support vector machines to distinguish a region (e.g.,
eyebrows, eyes, nose, mouth) on their face from the same region on other faces.
CHAPTER 6
FACE VERIFICATION
Existing methods of face verification, when determining whether two faces are of the same
person, often make mistakes that would seem to be avoidable: men being confused for women,
young people for old, Asians for Caucasians, etc. On the other hand, small changes in pose,
expression, or lighting can cause two otherwise similar images of the same person to be
misclassified by an algorithm as different. Based on this observation, it is hypothesized that the
attribute and simile classifiers could avoid such mistakes.
6.1 Training a Verification Classifier
Fig. 6.1 illustrates how attribute-based face verification is performed on a new pair of input
images. In order to decide whether two face images I1 and I2 show the same person, one can train
a verification classifier V that compares attribute vectors C(I1) and C(I2) and returns v(I1, I2),
the verification decision. These vectors are constructed by concatenating the results of n different
attribute and/or simile classifiers.

To build V, let us make some observations about the particular form of our classifiers:

• Values C_i(I1) and C_i(I2) from the ith classifier should be similar if the images are of
the same individual, and different otherwise.
• Classifier values are raw outputs of binary classifiers, where the objective function tries
to separate examples around 0.

Let a_i = C_i(I1) and b_i = C_i(I2) be the outputs of the ith trait classifier for each face
(1 ≤ i ≤ n). One would like to combine these values in such a way that our second-stage
verification classifier V can make sense of the data. This means creating values that behave
consistently when the two inputs are of the same individual, and differently otherwise. From
these observations, we see that both the absolute difference |a_i - b_i| and the product a_i·b_i
carry the desired information. Putting both terms together yields the tuple p_i:

p_i = {|a_i - b_i|, a_i·b_i} (2)
The concatenation of these tuples for all n attribute/simile classifier outputs forms the
input to the verification classifier V:

v(I1, I2) = V{(p_1, . . . , p_n)} (3)

Training V requires pairs of positive examples (two images of the same person) and
negative examples (images of two different people).
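Equations (2) and (3) translate directly into code. A minimal sketch of building the verifier's input vector from the classifier outputs of two faces (the numeric outputs below are made up for illustration):

```python
import numpy as np

def verification_features(a, b):
    """Build the tuples p_i = (|a_i - b_i|, a_i * b_i) from Eq. (2)
    and concatenate them into the input vector for V, as in Eq. (3).

    a, b: length-n vectors of attribute/simile classifier outputs
    for the two face images I1 and I2.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.column_stack([np.abs(a - b), a * b]).ravel()

# Same person: classifier outputs agree in sign and rough magnitude
same = verification_features([1.2, -0.8, 2.0], [1.0, -0.9, 1.7])
# Different people: classifier outputs disagree in sign
diff = verification_features([1.2, -0.8, 2.0], [-1.1, 0.9, -1.5])
print(same)  # small |a - b| entries, positive products
print(diff)  # large |a - b| entries, negative products
```

The verification classifier V would then be trained on such 2n-dimensional vectors from labeled same/different pairs.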
Fig. 6.1 The face verification pipeline. A pair of input images is run through a face and fiducial
detector [40], and the fiducials are then used to align both faces to a canonical coordinate system.
The aligned face images are fed to each of our attribute and simile classifiers individually to
obtain a set of attribute values. Finally, these values are compared using a verification classifier
to make the output determination, which is returned along with the distance to the decision
boundary. The entire process is fully automatic.
6.2 Experimental Setup
Face verification experiments are performed on the Labeled Faces in the Wild (LFW)
benchmark [8] and also on the PubFig benchmark. For each computational experiment, a set of
pairs of face images is presented for training, and a second set of pairs is presented for testing.
6.3 Attribute Classifier Results on LFW
Fig. 6.2 shows results on LFW for our attribute classifiers (red line), simile classifiers (blue line),
and a hybrid of the two (green line), along with several other methods (dotted lines). The highest
accuracy of 85.54% is comparable to the 86.83% accuracy of the current state-of-the-art method
on LFW. The small bump in performance from combining the attribute and simile classifiers
suggests that while they contain much of the same kind of information, there are still some
interesting differences. This can be better seen in Fig. 6.2, where similes do better in the
low-false-positive regime, but attributes do better in the high-detection-rate regime.
Fig. 6.2 Face verification performance on LFW of our attribute classifiers, simile classifiers, and
a hybrid of the two, shown in solid red, blue, and green, respectively. Dashed lines are existing
methods.
6.4 Human Attribute Labels on LFW
It is interesting to consider how well attribute classifiers could potentially do. There are several
reasons to believe that our results are only first steps towards this ultimate goal:

• We have currently trained 73 attribute classifiers. Adding more attributes, especially
fine-scale ones such as the presence and location of highly discriminative facial features
including moles, scars, and tattoos, should greatly improve performance.
• Of the 73 attributes, many are not discriminative for verification. For example, facial
expression, scene illumination, and image quality are all unlikely to aid in verification.
There is also a severe imbalance in LFW of many basic attributes such as gender and
age, which reduces the expected benefit of using these attributes for verification.
• The attribute functions were trained as binary classifiers rather than as continuous
regressors. While we use the distance to the separation boundary as a measure of the
degree of the attribute, using regression may improve results.
Fig. 6.3 shows a comparison of face verification performance on LFW using either these
human attribute labels (blue line) or our automatically-computed classifier outputs (red line), for
increasing numbers of attributes. In both cases, the labels are fed to the verification classifier V
and training proceeds identically, as described earlier. The set of attributes used for each
corresponding point on the graphs was chosen manually (and was identical for both).
Verification results using the human attribute labels reach 91.86% accuracy with 18 attributes,
significantly outperforming our computed labels at 81.57% for the same 18 attributes. Moreover,
the drop in error rates from computational to human labels actually increases with more
attributes, suggesting that adding more attributes could further improve accuracies.
Fig. 6.3 Comparison of face verification performance on LFW using human attribute labels
(blue line) vs. automatically computed classifier outputs (red line).
6.5 Human Verification on LFW
The high accuracies obtained in the previous section lead to a natural question: How well do
people perform on the verification task itself? While many algorithms for automatic face
verification have been designed and evaluated on LFW, there are no published results about how
well people perform on this benchmark.

Results are shown in Fig. 6.4. Using the original LFW images (red curve), people achieve
99.20% accuracy, essentially perfect. The task was then made tougher by cropping the images,
leaving only the face visible (including at least the eyes, nose, and mouth, and possibly parts of
the hair, ears, and neck). This experiment measures how much people are helped by the context
(sports shot, interview, press conference, etc.), background (some images of individuals were
taken with the same background), and hair (although sometimes it is partially visible). The
results (blue curve) show that performance drops to 97.53%, a tripling of the error rate.
Fig. 6.4 Face verification performance on LFW by humans.
6.6 Attribute Classifier Results on PubFig
Face verification is performed on 20,000 pairs of images of 140 people, divided into 10 cross-
validation folds with mutually disjoint sets of 14 people each. These people are separate from
the 60 people in the development set of PubFig, which were used for training the simile
classifiers. The performance of the attribute classifiers on this benchmark is shown in Fig. 6.5,
and it is indeed much lower than on LFW, with an accuracy of 78.65%.
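The fold construction described above (140 people split into 10 mutually disjoint groups of 14, with pairs drawn only within a group) can be sketched as follows; the person names and the even same/different split are illustrative assumptions.

```python
import random

# Sketch of the evaluation protocol described above: 140 people split
# into 10 folds of 14 mutually disjoint people, with verification pairs
# drawn only within a fold. Names and the 50/50 same vs. different
# split are illustrative assumptions.
random.seed(0)
people = [f"person_{i}" for i in range(140)]
random.shuffle(people)
folds = [people[i * 14:(i + 1) * 14] for i in range(10)]

def sample_pairs(fold, n=2000):
    """Draw n pairs from one fold, alternating same/different."""
    pairs = []
    for k in range(n):
        if k % 2 == 0:
            p = random.choice(fold)          # same-person pair
            pairs.append((p, p, True))
        else:
            p, q = random.sample(fold, 2)    # different-person pair
            pairs.append((p, q, False))
    return pairs

benchmark = [sample_pairs(f) for f in folds]  # 10 x 2,000 = 20,000 pairs
print(len(folds), sum(len(b) for b in benchmark))
```

Keeping the people in different folds disjoint ensures that no identity seen during training of a fold's classifier ever appears in that fold's test pairs.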
Fig. 6.5 Face verification results on the PubFig evaluation benchmark using the attribute
classifiers.
CHAPTER 7
FACE SEARCH
Image search engines are currently dependent on textual metadata. This data can be in the form
of filenames, manual annotations, or surrounding text. However, for the vast majority of images
on the internet (and in people's private collections), this data is often ambiguous, incorrect, or
simply not present. This presents a great opportunity to use attribute classifiers on images with
faces, thereby making them searchable. To facilitate fast searches on a large collection of
images, all images are labeled in an offline process using the attribute classifiers. The resulting
attribute labels are stored for fast online searches using the FaceTracer engine.
The FaceTracer engine uses simple text-based queries as inputs, since these are both familiar
and accessible to most internet users, and correspond well to describable visual attributes.
Search results are ranked by confidence, so that the most relevant images are shown first. The
engine uses the computed distance to the classifier decision boundary as a measure of
confidence. For searches with multiple query terms, the confidences of the different attribute
labels are combined such that the final ranking shows images in decreasing order of relevance
to all search terms. To prevent high confidences for one attribute from dominating the search
results, we first convert the confidences into probabilities by fitting Gaussian distributions to
attribute scores computed on a held-out set of positive and negative examples, and then use the
product of the probabilities as the sort criterion. This ensures that the images with high
confidences for all attributes are shown first.
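The ranking scheme above can be sketched as follows. The attribute names, decision values, and Gaussian parameters are all invented for illustration; a Gaussian CDF stands in for the held-out calibration step the text describes.

```python
import math

# Sketch of the multi-attribute ranking described above. Assumed
# setup: for each attribute we have the signed distance to the SVM
# decision boundary per image, plus (mu, sigma) fit to scores on a
# held-out set. All numbers here are invented.
def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Per-attribute calibration parameters (hypothetical).
calibration = {"smiling": (1.0, 0.8), "outdoors": (0.5, 1.2)}

# Decision values for three images on the two queried attributes.
images = {
    "img_a": {"smiling": 2.1, "outdoors": 1.4},
    "img_b": {"smiling": 3.5, "outdoors": -0.2},
    "img_c": {"smiling": 0.9, "outdoors": 2.0},
}

def score(attrs):
    # Product of per-attribute probabilities, so no single attribute's
    # raw confidence can dominate the ranking.
    p = 1.0
    for name, value in attrs.items():
        mu, sigma = calibration[name]
        p *= gaussian_cdf(value, mu, sigma)
    return p

ranking = sorted(images, key=lambda k: score(images[k]), reverse=True)
print(ranking)
```

Note how img_b, despite the highest raw "smiling" score, ranks last: its probability for "outdoors" is low, and the product penalizes weakness on any one query term.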
Example queries on the search engine are shown in Figs. 7.1a and 7.1b. The returned
results are all highly relevant. Fig. 7.1b additionally demonstrates two other interesting things:
• It was run on a personalized dataset of images from a single user, showing that this method
can be applied to specialized image collections as well as general ones.
• It shows that we can learn useful things about an image using just the appearance of the
faces within it, in this case determining whether the image was taken indoors or outdoors.
This attribute-based search engine can be used in many other applications, replacing or
augmenting existing tools. In law enforcement, eyewitnesses to crimes could use this system to
quickly narrow a list of possible suspects and then identify the actual criminal from the reduced
list, saving time and increasing the chances of finding the right person. On the internet, our face
search engine is a perfect match for social networking websites such as Facebook, which
contain large numbers of images of people. Additionally, the community aspect of these
websites would allow for collaborative creation of new attributes. Finally, people could use our
system to more easily organize and manage their own personal photo collections. For example,
searches for blurry or other poor-quality images can be used to find and remove all such images
from the collection.
CHAPTER 8
CONCLUSION
These seem to be promising first steps in a new direction, and there are many avenues to
explore. The experiments with human attribute labeling suggest that adding more attributes and
improving the attribute training process could yield great benefits for face verification. Another
direction to explore is how best to combine attribute and simile classifiers with low-level image
cues. Finally, an open question is how attributes can be applied to domains other than faces. It
seems that for reliable and accurate attribute training, analogues to the detection and alignment
process must be found.
The set of attributes used in this work was chosen in an ad hoc way; how to select them
dynamically in a more principled manner is an interesting topic to consider. In particular, a
system with a user in the loop could be used to suggest new attributes. Thanks to Amazon
Mechanical Turk, such a system would be easy to set up and could operate autonomously.
REFERENCES
Books:
[1] Bahram Javidi, Image Recognition and Classification, Marcel Dekker, 5(2), 2001.
[2] Takeo Kanade, Anil K. Jain, Nalini Kanta Ratha, Audio- and Video-Based Biometric Person
Authentication, International Association for Pattern Recognition, 2005.
[3] Shigeo Abe, Support Vector Machines for Pattern Classification, Series in Machine
Perception and Artificial Intelligence, Springer, pp. 891-895, 2005.
[4] De-Shuang Huang, Advanced Intelligent Computing: Theories and Applications, Springer,
2010.
[5] Cha Zhang, Zhengyou Zhang, Face Detection and Adaptation, Morgan & Claypool
Publishers, 2011.
Links:
[6] Histogram of Oriented Gradients, http://portal.acm.org/citation.cfm?id=1069007
[7] SIFT, http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
IEEE:
[8] Hieu V. Nguyen and Li Bai, Cosine Similarity Metric Learning, Computer Vision - ACCV
2010: 10th Asian Conference on Computer Vision, 2010.
[9] Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar, Describable
Visual Attributes for Face Verification and Image Search, IEEE Transactions on Pattern
Analysis and Machine Intelligence.