Transcript of "Joint Optimization of Cascaded Classifiers for Computer Aided Detection" by M. Dundar and J. Bi, presented by Andrey Kolobov and Brandon Lucia

Page 1: "Joint Optimization of Cascaded Classifiers for Computer Aided Detection" by M. Dundar and J. Bi

Presented by Andrey Kolobov and Brandon Lucia

Page 2: Talk Outline

• Paper summary
• Problem statement
• Overview of cascade classifiers and current approaches for training them
• High-level idea of the AND-OR framework
• Gory math behind the AND-OR framework
• Experimental results
• Discussion

Page 3: Paper Summary

• Proposes a procedure for offline joint learning of cascade classifiers
• The resulting classifier is tested on polyp detection in computed tomography images
• The resulting cascade classifier is more accurate than a cascaded AdaBoost, on par with an SVM, and faster than either

Page 4: Problem Statement

• Polyp detection in a CT image
• Methodology:
  1. Identify candidate structures (subwindows)
  2. Compute features of the candidate structures
  3. Classify the candidates

Page 5: Cascade Classifiers

Page 6: A Digression to Previous Work...

Why use cascades in the first place?

We motivate their use with Paul Viola and Michael Jones' 2004 work on detecting faces.

(Also, this is more vision-related.)

Page 7: Viola & Jones Face Detection Work

• Used cascaded classifiers to detect faces
• To show why cascades are useful, they evaluated one big (200-feature) classifier vs. a cascade of ten 20-feature classifiers
  – 5,000 faces, 10,000 non-faces
  – Stage n was trained on the faces + the false positives of stage n-1
  – The monolithic classifier was trained on the union of all the sets used to train each stage of the cascade

Page 8: Viola & Jones Face Detection Work

• Monolithic vs. cascade: similar accuracy
• The cascade is ~10 times faster
  – FPs are eliminated early; later stages don't think about them

Page 9: Viola & Jones Face Detection Work

• In a small experiment, even the big classifier works
  – The paper claims the big classifier only works with ~10,000 ("maybe ~100,000") negative examples
  – The cascaded version sifts through hundreds of millions of negatives, since many are pared off at each stage

Page 10: Viola & Jones Face Detection Work

• In a bigger experiment, they show that their 38-stage cascade approach is 600 times faster than previous work (Schneiderman & Kanade, 2000)
  – Take that, previous work.
• The point: cascades eliminate candidates early on, so later stages with higher complexity have to evaluate fewer candidates

Page 11: Cascade Classifiers

• Advantages over monolithic classifiers: faster learning and smaller computation time
• Key insight: a small number of features can reject a large number of false candidates (see the inference sketch below)
• To train stage C_i, use the set T_i of examples that passed the previous stages
• A low false negative rate is critical at all stages
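
To make the early-rejection idea concrete, here is a minimal inference sketch for a cascade of linear stage classifiers. It is illustrative only: the function and variable names, the per-stage feature subvectors, and the zero decision threshold are all assumptions, not details from the paper.

```python
import numpy as np

def cascade_predict(stages, x_subvectors):
    """Run one candidate through the cascade, rejecting on the first "no".

    stages:       list of stage weight vectors (alpha_1, ..., alpha_K)
    x_subvectors: the candidate's feature subvectors (x^1, ..., x^K)
    """
    for alpha, x_k in zip(stages, x_subvectors):
        if alpha @ x_k < 0:      # this stage says "no"
            return False         # rejected early: costlier stages never run
    return True                  # passed every stage (the AND of all "yes"es)
```

A false candidate rejected at stage 1 costs only one cheap dot product, which is where the speedup comes from.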

Page 12: Example: Cascade Training with AdaBoost

• Given: a set H = {h_1, …, h_M} of single-feature classifiers
• Goal: construct a cascade C_1, …, C_N s.t. C_i is a weighted sum of an m_i-subset of H, with m_i << M
• Train C_i with AdaBoost to select an m_i-subset, using the examples that passed through the previous cascade stages (a training sketch follows)
• The optimal m_i and N are very hard to find, so they are set empirically: each stage is trained to achieve some target false positive rate FP_i and/or false negative rate FN_i
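
A rough sketch of this greedy recipe, using scikit-learn's `AdaBoostClassifier` (whose default weak learner is a depth-1 stump, i.e., a single-feature classifier). The stage sizes, the label convention (+1 for candidates to keep), and the omission of per-stage threshold tuning for FP_i/FN_i are simplifications on my part, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_adaboost_cascade(X, y, stage_sizes):
    """Greedy stage-wise training: stage k gets m_k weak learners and is
    fit only on the examples that passed stages 1..k-1."""
    stages = []
    idx = np.arange(len(y))                         # examples still in play
    for m_k in stage_sizes:                         # m_k: weak learners in stage k
        clf = AdaBoostClassifier(n_estimators=m_k)  # weighted sum of m_k stumps
        clf.fit(X[idx], y[idx])
        stages.append(clf)
        idx = idx[clf.predict(X[idx]) == 1]         # only "yes"es reach stage k+1
    return stages
```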

Page 13: Drawbacks of Modern Cascade Training

• Greedy: the classifier at stage i is optimal for that stage, but not globally.
• Drawback specific to AdaBoost cascade training: the computational complexity of the features is ignored, leading to inefficiency in the early stages.

Page 14: Proposition: AND-OR Training

• Each stage is trained for optimal system performance, not stage performance
• All examples are used for training every stage
• Stage classifier complexity increases down the cascade
• Parameters of different stage classifiers may be adjusted throughout the training process, depending on how the other classifiers are doing
• Motivation: a negative example is classified correctly iff it is rejected by at least one stage (OR); a positive one must pass all stages (AND)

Page 15: Proposition: AND-OR Training

Page 16: Review of Hyperplane Classifiers with Hinge Loss

Hinge loss: if the hinge loss is 0, the example is classified correctly with margin to spare; if it is positive, the margin constraint is violated (and a loss above 1 means an outright misclassification).

Goal: minimize the total hinge loss.

J(α) = Φ(α) + ∑_{i=1}^{L} w_i (1 − α^T y_i x_i)_+ *

Equivalently, as a constrained minimization:

min J(α) = Φ(α) + ∑_{i=1}^{L} w_i E_i
s.t. E_i ≥ 1 − α^T y_i x_i, E_i ≥ 0

* there are L pairs {x_i, y_i} of training data
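
As a sanity check, the objective on this slide is easy to write down in numpy. The ridge penalty standing in for the unspecified regularizer Φ is my assumption, as are the names.

```python
import numpy as np

def hinge_losses(alpha, X, y):
    """Per-example hinge loss (1 - alpha^T y_i x_i)_+ for a linear classifier."""
    return np.maximum(0.0, 1.0 - y * (X @ alpha))

def J(alpha, X, y, w, lam):
    """J(alpha) = Phi(alpha) + sum_i w_i * hinge_i, with Phi taken to be
    a ridge penalty lam * ||alpha||^2 for this sketch."""
    return lam * (alpha @ alpha) + w @ hinge_losses(alpha, X, y)
```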

Page 17: Sequential Cascaded Classifiers

So all of that pertains to one classifier on its own. Now the cascaded version, with K classifiers (each x_i gets split into K non-overlapping subvectors x_i^1, …, x_i^K):

– Subvectors are ordered by computational complexity

For stage k:

J(α_k) = Φ(α_k) + ∑_{i ∈ T_{k−1}} w_i (1 − α_k^T y_i x_i^k)_+

where T_{k−1} is the set of "yes"es from the previous classifiers.

Cut as many "no"s as possible; leave the rest for the rest of the α's (a training-loop sketch follows).
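
The sequential training loop this slide describes might look like the following sketch. `train_stage` is a stand-in for any routine that minimizes the single-stage objective above (e.g., an off-the-shelf linear SVM solver); the zero acceptance threshold is again an assumption.

```python
import numpy as np

def train_sequential_cascade(subvectors, y, w, train_stage):
    """Greedy sequential training: stage k is fit only on T_{k-1}, the set
    of examples accepted ("yes") by all earlier stages.

    subvectors: list of K arrays; subvectors[k][i] is x_i^k
    """
    alphas = []
    T = np.arange(len(y))                  # T_0: every candidate
    for Xk in subvectors:                  # cheap subvectors come first
        alpha_k = train_stage(Xk[T], y[T], w[T])
        alphas.append(alpha_k)
        T = T[Xk[T] @ alpha_k >= 0]        # this stage's "yes"es form T_k
    return alphas
```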

Page 18: AND-OR Cascaded Classifiers

Everything at once, with the same starting point for each stage:

J(α_1, …, α_K) = ∑_{k=1}^{K} Φ_k(α_k) + ∑_{k=1}^{K} ∑_{i ∈ F} (1 − α_k^T y_i x_i^k)_+ + ∑_{i ∈ T} max(0, (1 − α_1^T y_i x_i^1), …, (1 − α_K^T y_i x_i^K))

Page 19: AND-OR Cascaded Classifiers

The same cost function, with the terms labeled: the Φ_k(α_k) are regularization terms; the double sum over the false candidates F is the "OR" term (every stage is pushed to reject a negative, since one rejection suffices); the max over stages for each true candidate in T is the "AND" term (a positive is penalized by its worst stage, since it must pass them all).

J(α_1, …, α_K) = ∑_{k=1}^{K} Φ_k(α_k) + ∑_{k=1}^{K} ∑_{i ∈ F} (1 − α_k^T y_i x_i^k)_+ + ∑_{i ∈ T} max(0, (1 − α_1^T y_i x_i^1), …, (1 − α_K^T y_i x_i^K))

This is the training cost function (a numpy sketch follows).
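
A direct numpy transcription of this cost, useful for checking progress during training. The example weights are omitted, as on the slide, and the ridge regularizer and names are assumptions.

```python
import numpy as np

def and_or_cost(alphas, subvectors, y, is_true, lam):
    """Joint AND-OR objective: alphas are the K stage weight vectors,
    subvectors[k] holds every x_i^k, and is_true marks the set T (polyps)."""
    # K x L matrix of per-stage hinge losses (1 - alpha_k^T y_i x_i^k)_+
    hinges = np.stack([np.maximum(0.0, 1.0 - y * (Xk @ a))
                       for a, Xk in zip(alphas, subvectors)])
    or_term = hinges[:, ~is_true].sum()              # F: every stage should reject
    and_term = hinges[:, is_true].max(axis=0).sum()  # T: the worst stage counts
    reg = lam * sum(a @ a for a in alphas)           # assumed ridge Phi_k
    return reg + or_term + and_term
```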

Page 20: Optimizing Cascaded Classifiers

A new minimization, similar to the first one. For each k in 1..K, fix all α_j, j ≠ k, and solve:

min Φ_k(α_k) + ∑_{i ∈ F} w_i E_i + ∑_{i ∈ T} E_i
s.t. E_i ≥ 1 − α_k^T y_i x_i^k, E_i ≥ 0 for all i
     E_i ≥ max(0, (1 − α_1^T y_i x_i^1), …, (1 − α_{k−1}^T y_i x_i^{k−1}), (1 − α_{k+1}^T y_i x_i^{k+1}), …, (1 − α_K^T y_i x_i^K)) for i ∈ T

All the α's but α_k are fixed, not variable, so this is easy (a linear or quadratic programming solver, depending on Φ_k).

This subproblem is convex (a cvxpy sketch follows).
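
With all the other α's frozen, the inner max over the fixed stages is just a constant per example, so the subproblem can be handed to an off-the-shelf convex solver. A sketch with cvxpy, where the ridge Φ_k and all the names are my assumptions:

```python
import numpy as np
import cvxpy as cp

def solve_stage_k(Xk, y, other_hinges, is_true, w, lam):
    """Convex subproblem for stage k, all other alphas fixed.

    other_hinges[i] = max(0, hinge of every fixed stage j != k) -- a constant.
    """
    L, d = Xk.shape
    alpha, E = cp.Variable(d), cp.Variable(L)
    pos, neg = np.flatnonzero(is_true), np.flatnonzero(~is_true)
    cons = [E >= 1 - cp.multiply(y, Xk @ alpha),   # E_i >= 1 - alpha_k^T y_i x_i^k
            E >= 0,
            E[pos] >= other_hinges[pos]]           # the extra floor for i in T
    obj = lam * cp.sum_squares(alpha) \
        + cp.sum(cp.multiply(w[neg], E[neg])) + cp.sum(E[pos])
    cp.Problem(cp.Minimize(obj), cons).solve()
    return alpha.value
```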

Page 21: Cyclic Optimization Algorithm

0. Initialize each α_k to α_k^0 using an initial training dataset.
1. For each k: fix all α_j, j ≠ k, and minimize the equation on the previous slide to obtain α_k^c.
2. Compute J^c = J(α_1, …, α_K) using each new α_k^c instead of α_k^{c−1}.
3. If J^c has stopped improving enough, or we've run long enough, stop; otherwise go back to step 1 (a sketch combining the earlier pieces follows).
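
Putting the pieces together, the cyclic loop might look as below, reusing `solve_stage_k` and `and_or_cost` from the earlier sketches. The zero initialization in step 0 and the tolerance/iteration caps are placeholders; the slide only says the α_k^0 come from an initial training dataset.

```python
import numpy as np

def cyclic_optimize(subvectors, y, is_true, w, lam, max_iters=50, tol=1e-4):
    """Coordinate descent over stages: re-solve each convex subproblem with
    the other stages fixed, until J stops improving (or we run long enough)."""
    alphas = [np.zeros(Xk.shape[1]) for Xk in subvectors]   # step 0 (placeholder)
    J_prev = np.inf
    for _ in range(max_iters):
        for k, Xk in enumerate(subvectors):                 # step 1: sweep stages
            fixed = [np.maximum(0.0, 1.0 - y * (Xj @ alphas[j]))
                     for j, Xj in enumerate(subvectors) if j != k]
            alphas[k] = solve_stage_k(Xk, y, np.max(fixed, axis=0),
                                      is_true, w, lam)
        J_cur = and_or_cost(alphas, subvectors, y, is_true, lam)  # step 2
        if J_prev - J_cur < tol:                            # step 3: J has stalled
            break
        J_prev = J_cur
    return alphas
```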

Page 22: Convergence Analysis

They show that the algorithm is globally convergent to the set of sub-optimal (locally optimal) solutions, using magic and a theorem by two people named Fiorot and Huard.

• Idea behind the proof: since we fix all variables but one and minimize, the sequence of objective values is decreasing, with a lower bound of zero, so it converges to a local minimum or to 0. Because we threshold the improvement and limit the iterations, we could hit a flat spot or just fall short.

Page 23: Evaluation

• Application: polyp detection in computed tomography images. Important because polyps are an early stage of cancer.
• Training set: 338 volumes (images), 88 polyps, 137.3 false positives per volume
• Test set: 396 volumes, 106 polyps, 139.4 false positives per volume
• Goal: reduce the false positives per volume (FP/vol) to 0-5

Page 24: Evaluation (cont'd)

• 46 features per candidate object
• Comparison of an AdaBoost-trained cascade, a single-stage SVM, and an AND-OR-trained cascade
• The features were split into 3 sets, in increasing order of complexity
• A 3-stage AND-OR classifier was built
• The AdaBoost classifier was built in 3 phases, where phase n used feature sets 1 through n for training

Page 25: Evaluation (cont'd)

• Sensitivity thresholds for AdaBoost: 0 missed polyps for phase 1, 2 for phase 2, and 1 for phase 3
• The number of stages in each phase was picked to satisfy these thresholds

Page 26: Results

Page 27: Results (cont'd)

• Cascade AdaBoost: 118 CPU seconds
• Cascade AND-OR: 81 CPU seconds
• SVM: 286 CPU seconds
• Cascade AND-OR is as accurate as the SVM and 30% better than cascade AdaBoost

Page 28: Discussion

• Other methods of solving the optimization
• How to assign the cascade parameters
  – How are the feature subvectors sorted? By a heuristic? What exactly does "ordered by computational complexity" mean?
• The speed increase was the more prominent result, but it is hardly mentioned next to the discussion of false positives.
• Would it generally be better to use parallel cascades?