Page 1 CVPR Workshop: Statistical Analysis in Computer Vision A Statistical Assessment of Subject...

CVPR Workshop: Statistical Analysis in Computer Vision

A Statistical Assessment of Subject Factors in the PCA Recognition of Human Subjects

Geof Givens*, J. Ross Beveridge, Bruce A. Draper & David Bolme

Computer Science, Colorado State University
*Statistics, Colorado State University


June 22, 2003


Some Human Subjects Are Harder to Recognize than Others. Why?


Better Formulate the Question

• Choose an algorithm
– Something standard and simple.
– PCA using our publicly released version.

• Choose a distance measure
– Yambor Angle is good and similar to Moon-Phillips FERET.
– Much better than standards such as L2.

• Define & collect covariates
– Few covariates were collected initially with FERET images.
– One person at CSU, David Bolme, scored all images.

• Big questions
– What to measure?
– Covariates in isolation or together?


Some Basics: Image Preprocessing

Refinement of NIST preprocessing used in FERET.

• Integer to float conversion
– Converts 256 gray levels to single-precision floats.

• Geometric normalization
– Aligns human-chosen eye coordinates.

• Masking
– Crop with elliptical mask, leaving only the face visible.

• Histogram equalization
– Histogram equalizes unmasked pixels: 256 levels.

• Pixel normalization
– Shift and scale pixel values so the mean pixel value is zero and the standard deviation over all pixels is one.
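The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration with NumPy, not the CSU implementation: the function names `elliptical_mask` and `preprocess` are hypothetical, the geometric normalization step is assumed to have been done already, and the final normalization is assumed to be computed over the face pixels inside the mask.

```python
import numpy as np

def elliptical_mask(height, width):
    """Boolean mask that is True inside an ellipse centered on the image."""
    rows, cols = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return ((rows - cy) / (height / 2.0)) ** 2 + ((cols - cx) / (width / 2.0)) ** 2 <= 1.0

def preprocess(image_u8, mask):
    """Float conversion, masking, histogram equalization, pixel normalization."""
    img = image_u8.astype(np.float32)            # integer to float conversion
    face = img[mask]                             # keep only pixels inside the mask
    # Histogram equalization over the face pixels, using 256 levels.
    hist, _ = np.histogram(face, bins=256, range=(0, 256))
    cdf = np.cumsum(hist) / face.size
    face = cdf[face.astype(np.intp)] * 255.0
    # Pixel normalization (assumed here to use face pixels only):
    # zero mean and unit standard deviation.
    face = (face - face.mean()) / face.std()
    out = np.zeros_like(img)                     # pixels outside the mask stay zero
    out[mask] = face
    return out

# Synthetic stand-in for an already eye-aligned 8-bit face image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(30, 26), dtype=np.uint8)
mask = elliptical_mask(30, 26)
normalized = preprocess(image, mask)
```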


More Basics: Standard PCA Algorithm

[Figure: training images define the eigenspace (training); images are projected into PCA space and compared to produce a distance matrix (testing).]

Remove in/out training issue by training on all images.
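As a rough sketch of this pipeline, the snippet below trains PCA on all images and then builds a distance matrix from the projections. The slides do not define the Yambor Angle, so the distance used here (negative cosine between eigenvalue-whitened projections) is only an assumed stand-in, and the function names and random data are hypothetical.

```python
import numpy as np

def train_pca(images, n_components):
    """images: (n_images, n_pixels). Returns mean, basis (n_pixels, k), eigenvalues (k,)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # SVD of the centered data yields the eigenvectors of the covariance matrix.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T
    eigenvalues = singular_values[:n_components] ** 2 / (len(images) - 1)
    return mean, basis, eigenvalues

def distance_matrix(images, mean, basis, eigenvalues):
    """Assumed stand-in measure: negative cosine between whitened PCA projections."""
    coeffs = (images - mean) @ basis
    whitened = coeffs / np.sqrt(eigenvalues)
    unit = whitened / np.linalg.norm(whitened, axis=1, keepdims=True)
    return -(unit @ unit.T)

# Synthetic data: 20 "images" of 50 pixels each, used for both training and testing.
rng = np.random.default_rng(0)
images = rng.normal(size=(20, 50))
mean, basis, eigenvalues = train_pca(images, n_components=5)
distances = distance_matrix(images, mean, basis, eigenvalues)
```

Training and testing on the same image set mirrors the choice on the slide: no image is ever "outside" the training set, so that factor cannot influence the distances.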


What Covariates?

[Figure: example face images annotated with covariates: Glasses, Bangs, Facial Hair, Mouth, Smiling?, Eyes, Age, Gender, Race.]


FERET Subject/Image Covariates

Fixed Per Subject
Age: Young, Old
Gender: Male, Female
Race: White, Black, Asian, Other
Skin: Clear, Other

Fixed Per Image
Bangs: No, Yes
Expression: Neutral, Other
Eyes: Open, Other
Facial Hair: No, Yes
Makeup: No, Yes
Mouth: Closed, Other


The Final Set of Subject Covariates


Collecting the Covariates


What to Measure? Recognition Rate on Partitioned Data

• Measure recognition rate for partitioned images
– Partition images by covariate, e.g. male versus female.
– Compare recognition rate on different sets.

• Good
– Answers a very specific version of the question.
– Recognition rate is a standard performance measure.

• Bad
– Fails to adjust, i.e. control for, other covariates.
– Recognition rate is for a probe set, not a single subject.
– Hidden dependence on gallery image set.
– Statistical significance is hard to interpret.


What to measure? Intrapersonal Image Pair Distance

• Measure distance (similarity) between images
– For two images of one subject, closer is better.

• Good
– Measure is independent of other subjects, probe sets, etc.
– Measure is continuous, so analysis of variance is appropriate.
– Linear model accounts for all covariates at one time.
– Standard tests of statistical significance apply.

• Bad
– Connection between distance and recognition rate is indirect.
– Supplemental analysis required to establish linkage, made through an intermediate measure: recognition rank.
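To make the two measures concrete, here is a small sketch of how an intrapersonal distance and a recognition rank could be read off a distance matrix. The matrix values and the helper `recognition_rank` are made up for illustration; the rank definition (one plus the number of other images closer to the probe than its true mate) is an assumption about the usual convention.

```python
import numpy as np

def recognition_rank(distances, probe, mate):
    """Rank of `mate` among all other images when sorted by distance to `probe`."""
    row = distances[probe].astype(float).copy()
    row[probe] = np.inf                       # a probe never matches itself
    return 1 + int(np.sum(row < row[mate]))

# Toy symmetric distance matrix for four images; images 0 and 1 are the same subject.
distances = np.array([
    [ 0.00, -0.90, -0.20, -0.50],
    [-0.90,  0.00, -0.30, -0.95],
    [-0.20, -0.30,  0.00, -0.10],
    [-0.50, -0.95, -0.10,  0.00],
])

intrapersonal_distance = distances[0, 1]      # smaller (more negative) is better
rank_from_image_0 = recognition_rank(distances, probe=0, mate=1)
rank_from_image_1 = recognition_rank(distances, probe=1, mate=0)
```

In this toy matrix the same intrapersonal distance gives rank 1 from image 0 but rank 2 from image 1, because image 3 happens to lie closer to image 1. That is the sense in which rank depends on the other subjects while the pair distance does not.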


A Glimpse of Distance and Rank Data


Image Pairs for Three Example Subjects Shown Above

Recognition Rank = 1, Distance = -299*
Recognition Rank = 1, Distance = -206*
Recognition Rank = 1, Distance = -110*

* Distance x 100,000


Linear Model Relating Distance to Subject Covariates

$Y_i$ = Distance metric for image pair $i$.

$X_i$ = Human covariate factors for image pair $i$.

$\beta$ = Parameters quantifying factor effects.

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \epsilon_i$

with $\epsilon_i \sim$ iid $\mathrm{Normal}(0, \sigma^2)$
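A linear model of this form can be fit by ordinary least squares. The sketch below uses synthetic data with two hypothetical 0/1 covariate indicators and made-up effect sizes; it is meant only to show the mechanics of the fit and of $R^2$, not to reproduce the study's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical dummy-coded covariates, e.g. X1 = "glasses always on", X2 = "age old".
X1 = rng.integers(0, 2, size=n)
X2 = rng.integers(0, 2, size=n)

# Synthetic response built from known (made-up) effects plus iid normal noise.
true_beta = np.array([-2.0, -0.5, 0.3])
design = np.column_stack([np.ones(n), X1, X2])
y = design @ true_beta + rng.normal(0.0, 0.2, size=n)

# Least-squares estimate of (beta_0, beta_1, beta_2).
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# R^2: fraction of the variation in y explained by the covariates.
residuals = y - design @ beta_hat
r_squared = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
```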


Comments on ANOVA

• 1,072 subject image pairs
– Each image pair corresponds to a unique person.

• Both images from the same day
– Delay between images is known to make recognition harder.

• Subjects did not add/remove glasses
– Pilot studies included this case; it is harder.

• ANOVA yielded $R^2 = 0.39$
– Covariates explain 39% of observed variation.
– Notable given 75% of subjects can be recognized at rank 1.


FERET Subject Covariates Summary of Results

[Figure: bar chart of Change in Similarity Measure relative to the Base-Case, with the axis running from -50% to 50% and the two directions labeled Harder to Recognize and Easier to Recognize. Covariate levels shown:]

Glasses: Off, Always On
Age: Young, Old
Eyes: Open, Open/Closed, Always Closed
Expression: Neutral, Always Non-neutral, Changes
Race: White, Asian, African-Amer., Other
Facial Hair: No Facial Hair, Always Facial Hair, Changes
Gender: Male, Female
Makeup: No Makeup, Always Makeup, Changes
Mouth: Closed, Always Open, Changes
Bangs: No Bangs, Always Bangs, Change
Skin: Clear, Not Clear


Two Possible Reservations

• Unbalanced training biases conclusions?

“Asians are closer to each other than whites because the algorithm was trained on very few of them. The unbalanced training means less well represented sets of samples are not as well separated.”

• Choice of Y = distance renders results irrelevant?

“We care about recognition rank, or rank-k recognition rate. I don’t know whether your response variable is strongly related to this. Even if it is related overall, does the relationship hold for specific groups?”


Plot of Distance versus the Log of Recognition Rank

[Figure: scatter plot of Log of Recognition Rank (vertical axis, -0.5 to 3.5) against Distance (x 100,000) (horizontal axis, -350 to 0).]

845 out of 1,072 subject image pairs are closest matches: i.e. Rank is 1.

Visual inspection suggests strong relationship between distance and rank.


Modeling the Relationship Between Distance and Recognition Rank

$Y_i$ = Was the $i$th image pair matched at rank 1?

(i.e. $Y_i = 1$ if $R_i = 0$ and otherwise $Y_i = 0$)

$X_i$ = Distance metric for image pair $i$.

$\beta$ = Parameters quantifying the relationship.

$g(E[Y_i \mid X_i]) = X_i'\beta = \beta_0 + \beta_1 X_i$

$Y_i \mid X_i \sim f(Y_i \mid X_i)$ independently

Now: $g(z) = \log(z/(1-z))$, $f(Y_i \mid X_i) = \mathrm{Bernoulli}(E[Y_i \mid X_i])$
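The logistic model above can be fit by maximum likelihood. The following is a minimal sketch using Newton's method on synthetic data; the function `fit_logistic`, the standardized "distance" values, and the true coefficients are all made up for illustration and are unrelated to the coefficient reported in the study.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood fit of logit(P(y = 1)) = b0 + b1 * x by Newton's method."""
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))       # current fitted probabilities
        gradient = design.T @ (y - p)                     # score vector
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)  # Newton update
    return beta

# Synthetic data: larger x (distance) makes a rank 1 match less likely.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=1000)
true_beta = np.array([1.5, -2.0])
p_true = 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))
y = (rng.random(1000) < p_true).astype(float)             # 1 if matched at rank 1

beta_hat = fit_logistic(x, y)
```

A negative fitted slope `beta_hat[1]` corresponds to the probability of a rank 1 match falling as distance grows.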


Logistic Regression of Rank Indicator on Distance

[Figure: logistic regression fit of the rank indicator (vertical axis "Zi - One if Rank is One", -0.2 to 1.2) against Distance (x 100,000) (horizontal axis, -350 to 0).]

The probability of a rank 1 match decreases sharply with increasing distance; $\beta_1$ is -10585.3 with p-value = 0.049.

Similar when grouped for race, gender, age, skin and glasses.


Supplemental Balanced Training Experiment for Race.

Distribution of race in the 1,072 subjects is not balanced: 720 White, 143 Asian, 121 Black, 88 Other.

Follow-up experiment replicates the first, but with carefully balanced training.

To test Race: Glasses = Off, Eyes = Open, Skin = Clear; balance Age.

Experiment  Test   Compare  Young  Old  Total Images  PCA Dim.  Confirm  p-value
1           Asian  White    89     6    380           78        √        0.0104
2           Black  White    78     16   376           102       √        0.0064
3           Other  White    62     6    272           75        √        0.0249

Young and Old counts apply to both the test group and the comparison group.


Supplemental Balanced Training Experiments for Other Covariates

To test Age: Glasses = Off, Eyes = Open, Race = White; balance Skin.

         Clear  Other
Old      131    57
Young    131    57

Total images: 752. PCA dim.: 130.

To test Skin: Glasses = Off, Eyes = Open, Race = White; balance Age.

         Old  Young
Clear    72   57
Other    72   57

Total images: 516.

To test Glasses: Eyes = Open, Race = White, Skin = Clear; balance Age.

         Old  Young
Off      14   16
On       14   16

Total images: 120.

The slide also lists PCA dimensions 117 and 50 for the Skin and Glasses experiments, experiment numbers 4, 5 and 6, and confirming p-values 0.0001, 0.0122 and 0.0005, with all three experiments marked as confirmed (√).


Sampling of Related Work

Nicholas Furl, P. Jonathon Phillips and Alice J. O’Toole. Face recognition algorithms and the other-race effect: computational mechanisms for a developmental contact hypothesis.

PCA-based study, recall-style experiment; looked at White vs. Asian distinction in FERET; training bias like that of humans.

Ralph Gross, Jianbo Shi and Jeffrey F. Cohn. Quo vadis face recognition?: The current state of the art in face recognition, 2002.

Partitioned probe set of 1,119 subjects: 87.6 vs. 93.7 recognition rate for females vs. males. Used FaceIt and not FERET data. Found men easier.

P. Jonathon Phillips, Patrick Grother, Ross J. Micheals, Duane M. Blackburn, Elham Tabassi and Mike Bone. Face Recognition Vendor Test 2002, Evaluation Report.

Large data set, eight systems, CMC and ROC analysis. Gender test partitioned probe set and found men easier.


More on the Gender Effect, or Lack of Gender Effect.

From FRVT 2002 Evaluation Report

These results are based upon simple partition of probe set.


What Happens if We Replicate Simpler Partition on Gender?

• Fit our data with a simple one-way ANOVA LM on Gender.

• Analogous to partition of male vs. female probes.

• Result
– “Statistically significant” gender effect.
– Male subjects about 13% easier: p < 0.0001

• But
– We already know this result is wrong!
– Full analysis with all our covariates shows no gender effect.

• This exercise illustrates a classic mistake: it is far too much to hope that a ‘sample of convenience’ will be balanced with respect to every confounding variable. What might be confounding?
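The mistake described above is easy to reproduce on synthetic data. In the sketch below, the response depends only on a hypothetical confounder that happens to be more common among males; the slides do not say which covariate plays this role in FERET, so the variable, its frequencies, and the effect size are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
male = rng.integers(0, 2, size=n)
# Hypothetical confounder: present for 70% of males but only 20% of females.
confounder = (rng.random(n) < np.where(male == 1, 0.7, 0.2)).astype(float)
# The synthetic response depends on the confounder only, not on gender.
y = -1.0 * confounder + rng.normal(0.0, 0.5, size=n)

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

gender_only = ols([male], y)[1]                  # one-way model on gender
gender_adjusted = ols([male, confounder], y)[1]  # model that also includes the confounder
```

The one-way model attributes a sizable effect to gender, while the adjusted model puts the gender coefficient near zero, which is the same pattern as the partition-only versus full-covariate analyses on the slide.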


Conclusions

• Most comprehensive study of its kind to date
– i.e. most covariates considered together.

• Interesting discoveries
– Glasses aid recognition (but don’t take them off).
– Non-white subjects easier, despite smaller sample.
– Lack of gender effect, why?

• Was there enough data?
– Yes, compares favorably to standards for statistical analysis.

• Hubris
– All studies have limitations.
– Linkage between distance and recognition performance.

• Future work
– Analogous experiment on portion of FRVT 2002 data.
– Image covariate analysis.


A Little Extra: The Best and Worst of FERET

Best Subjects (Closest Pairs)

Worst Subjects (Most Distant Pairs)
