1 Image Parsing: Unifying Segmentation, Detection, and Recognition Shai Bagon Oren Boiman.

Image Parsing: Unifying Segmentation, Detection, and Recognition

Shai Bagon, Oren Boiman

2

Image Understanding

• A long standing goal of Computer Vision

• Consists of understanding:
– Objects and visual patterns
– Context
– State / Actions of objects
– Relations between objects
– Physical layout
– Etc.

A picture is worth a thousand words…

3

Natural Language Understanding

• Very far from being solved
• Even NL parsing (syntax) is problematic
• Ambiguities require high level (semantic) knowledge

4

Image Parsing

• Decomposition to constituent visual patterns
– Edge Detection
– Segmentation
– Object Recognition

5

Image Parsing Framework

Generic Framework: I → S

Low-Level Tasks: Segmentation, Edge Detection

High-Level Tasks: Object Recognition, Classification

6

Inference: I → S

Top-down (Generative)
• Constellation, Star-Model etc.
• $P(S \mid I) \propto P(I \mid S)\,P(S)$
+ Consistent Solutions
- Slow

Bottom-up (Discriminative)
• SVM, Boosting, Neural Nets etc.
• $q(w_j \mid \mathrm{Tests}_j(I))$
+ Fast
- Possibly Inconsistent

Approach used in “Image Parsing”

7

Coming up next…

• Define a (Monstrous) Generative model for Image Parsing

• How to perform s-l-o-w inference on such models (MCMC)

• How to accelerate inference using bottom-up cues (DDMCMC)

$P(S \mid I) \propto P(I \mid S)\,P(S)$

8

Image Parsing Generative Model

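– No. of regions $K$
– Region Shapes $L_i$ and Types $\zeta_i$
– Region Parameters $\Theta_i$

Prior: $P(S)$

Likelihood: $P(I \mid S) = \prod_{i=1}^{K} P(I_{R_i} \mid L_i, \zeta_i, \Theta_i)$

[Diagram: I → S, with the prior terms annotated as "Uniform" or "exp"]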

10

Generic Regions

• Constant up to Gaussian noise: $gl \sim N(\mu, \sigma^2)$

• Gray level histogram: $gl \sim (h_1, \ldots, h_G)$

• Quadratic form: $gl(x, y) \sim N(\mu(x, y), \sigma^2)$, with $\mu(x, y) = ax^2 + bxy + cy^2 + dx + ey + f$

11

Faces

• Use a PCA model (Eigen-faces)
• Estimate covariance $\Sigma$ and principal components $V_1, \ldots, V_n$
• $F \sim N(c_1 V_1 + \ldots + c_n V_n, \Sigma)$

12

Text region shapes

• Use Spline templates
• Allow Affine transformation
• Allow small deformations of control points
• Shading intensity model

13

Problem Formulation

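• Now we can compute $P(S \mid I) \propto P(I \mid S)\,P(S)$, where

$P(I \mid S) = \prod_{i=1}^{K} p(I_{R_i} \mid L_i, \zeta_i, \Theta_i)$

$P(S) = p(K) \prod_{i=1}^{K} p(L_i)\,p(\zeta_i \mid L_i)\,p(\Theta_i \mid \zeta_i)$

• We’d like to optimize $\arg\max_{S} P(S \mid I)$

• over the space of parse graphs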

15

Optimizing P(S|I) is not easy…

• Hybrid State Space: Continuous & Discrete
• Enormous number of local maxima
• Graphical model structure is not predetermined

Rules out gradient methods

Rules out Belief propagation

16

Optimize by Sampling!

• Monte Carlo Principle
– Use random samples to optimize!
– Let’s say we’re given N samples from P(S|I): S_1,…,S_N
– Compute P(S_i|I)
• Given S_i it is easy to compute P(S_i|I)
– Choose the best S_i!

17

Detour: Sampling methods

• How to sample from (very) complex probability space
• Sampling algorithm
• Why is Markov Chained in Monte Carlo?

18

Example

• Sample from $p(x)$

[Equation: the density $p(x)$, a sum of several terms; the individual terms are not recoverable from this transcript]

19

Markov Chain

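• A sequence of Random Variables: $X_t \in \{s_1, s_2, s_3\}$, $t = 1, 2, \ldots$

• Markov property: $p(x_{t+1} \mid x_1, \ldots, x_t) = p(x_{t+1} \mid x_t)$
Given the present, the future is independent of the past

• Transition: $p_{t+1} = p_t K$, with $K_{i,j} = K(s_i \to s_j)$

$K = \begin{pmatrix} 0 & 1 & 0 \\ 0 & .1 & .9 \\ .6 & .4 & 0 \end{pmatrix}$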

20

Markov Chain – cont.

• Under certain conditions MC converges to a unique distribution: $p_1 K^t \to \hat{p}$ as $t \to \infty$

• Stationary distribution: the first (left) eigenvector of K, with eigenvalue 1: $\hat{p} K = \hat{p}$
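The convergence claim can be checked numerically on a small chain. The following is a minimal sketch, assuming NumPy and assuming that the 3×3 transition matrix of the Markov Chain slide reads row by row as (0, 1, 0), (0, .1, .9), (.6, .4, 0); the function name `stationary_distribution` is illustrative and not part of the original slides.

```python
import numpy as np

# Assumed row-stochastic transition matrix: K[i, j] = K(s_i -> s_j).
K = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.1, 0.9],
    [0.6, 0.4, 0.0],
])

def stationary_distribution(K, p0, n_steps=200):
    """Repeatedly apply p_{t+1} = p_t K, starting from the distribution p0."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_steps):
        p = p @ K
    return p

# Start with all mass on s_1; any other starting distribution gives the same limit.
p_hat = stationary_distribution(K, [1.0, 0.0, 0.0])
print(np.round(p_hat, 4))
print(np.allclose(p_hat @ K, p_hat))  # p_hat is left unchanged by one more step
```

Plain repeated multiplication is used here instead of an eigen-decomposition because it mirrors how the chain itself evolves.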

21

Markov Chain Monte Carlo

• Reminder: $p_1 K^t \to \hat{p}$, where $\hat{p} K = \hat{p}$
• Had we wanted a sample from $\hat{p}$: take the value of $X_t$, $t \to \infty$
• How to make our $p$ the stationary distribution of MC ($p = \hat{p}$)?
• How to guarantee convergence?

22

Markov Chain convergence

• Irreducibility:
– The walk can reach any state starting at any state
• Non-periodicity
– Stationary distribution cannot depend on t

23

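How to make p(x) Stationary

• Detailed Balance: $p(x)$ is the stationary distribution, $p(x) = \hat{p}$, if
$p(x)\,K(x \to x^*) = p(x^*)\,K(x^* \to x)$
(forward step on the left, backward step on the right, the same distribution p(.) on both sides)

• Written as matrix product: $p = pK$. Summing the detailed balance equation over $x$:
$\sum_{x} p(x)\,K(x \to x^*) = p(x^*) \sum_{x} K(x^* \to x) = p(x^*)$
($p(x^*)$ is independent of $x$, and the transition probabilities sum to 1)

• Sufficient condition to converge to p(x)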

24

Kernel Selection

• Detailed Balance requires Kernel: $K(x \to x^*)$
• Metropolis-Hastings Kernel:
– Proposal: where to go next, $q(x^* \mid x)$
– Acceptance: should we go, $A(x, x^*) = \min\left\{1, \frac{p(x^*)\,q(x \mid x^*)}{p(x)\,q(x^* \mid x)}\right\}$
• MH Kernel provides detailed balance

Among the ten most influential algorithms in science and engineering
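That the MH kernel provides detailed balance can be verified on a small discrete state space. This is a minimal sketch, assuming NumPy; the target `p`, the uniform proposal `q`, and the helper `mh_kernel` are illustrative choices, not taken from the slides.

```python
import numpy as np

def mh_kernel(p, q):
    """Build the Metropolis-Hastings transition matrix for target p and proposal q.

    p: target probabilities over n states (all positive).
    q: proposal matrix, q[i, j] = q(s_j | s_i), each row summing to 1.
    """
    n = len(p)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and q[i, j] > 0:
                accept = min(1.0, (p[j] * q[j, i]) / (p[i] * q[i, j]))
                K[i, j] = q[i, j] * accept
        # Rejected proposals (and proposals of s_i itself) keep the chain at s_i.
        K[i, i] = 1.0 - K[i].sum()
    return K

p = np.array([0.2, 0.5, 0.3])        # hypothetical target distribution
q = np.full((3, 3), 1.0 / 3.0)       # hypothetical uniform proposal
K = mh_kernel(p, q)

flow = p[:, None] * K                # flow[i, j] = p(s_i) K(s_i -> s_j)
print(np.allclose(flow, flow.T))     # detailed balance: forward flow equals backward flow
print(np.allclose(p @ K, p))         # hence p is stationary for K
```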

25

Metropolis Hastings

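• Sample $x^* \sim q(x^* \mid x_t)$

• Compute acceptance probability $A(x_t, x^*) = \min\left\{1, \frac{p(x^*)\,q(x_t \mid x^*)}{p(x_t)\,q(x^* \mid x_t)}\right\}$

• If rand < A, $x_{t+1} = x^*$

• Else, $x_{t+1} = x_t$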
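The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenters' code: the two-component Gaussian mixture `target`, the unit-variance random-walk proposal, and all function names are hypothetical.

```python
import math
import random

def metropolis_hastings(p, sample_q, q_density, x0, n_steps, rng):
    """Run n_steps of Metropolis-Hastings and return the visited states.

    p(x): target density, known only up to a constant.
    sample_q(x, rng): draws x* ~ q(x* | x).
    q_density(x_new, x_old): evaluates q(x_new | x_old), up to a constant.
    """
    x = x0
    chain = []
    for _ in range(n_steps):
        x_star = sample_q(x, rng)
        ratio = (p(x_star) * q_density(x, x_star)) / (p(x) * q_density(x_star, x))
        A = min(1.0, ratio)
        if rng.random() < A:
            x = x_star           # accept: x_{t+1} = x*
        chain.append(x)          # on rejection x_{t+1} = x_t
    return chain

def target(x):
    # Hypothetical unnormalized target: mixture of N(-2, 1) and N(2, 1).
    return 0.3 * math.exp(-0.5 * (x + 2.0) ** 2) + 0.7 * math.exp(-0.5 * (x - 2.0) ** 2)

def sample_q(x, rng):
    return rng.gauss(x, 1.0)     # random-walk proposal centered at the current state

def q_density(x_new, x_old):
    # N(x_old, 1) density without its constant, which cancels in the ratio.
    return math.exp(-0.5 * (x_new - x_old) ** 2)

chain = metropolis_hastings(target, sample_q, q_density,
                            x0=0.0, n_steps=50000, rng=random.Random(0))
print(sum(chain) / len(chain))   # sample mean; the mixture's true mean is 0.8
```

Only ratios of `p` appear, so the target never needs to be normalized.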

26

Can we use any q(.) ?

1. Easy to sample from:
– we sample from q(.) instead of p(.)

27

Can we use any q(.) ?

2. Supports p(x): $p(x) > 0 \Rightarrow q(x) > 0$

[Plot: p(x) and q(x)]

28

Can we use any q(.) ?

3. Explores p(x) wisely:
– Too narrow q(.): q(x*|x) ~ N(x, .1)
– Too wide q(.): q(x*|x) ~ N(0,20)

[Plot: p(x) and q(x)]

29

Can we use any q(.) ?

1. Easy to sample from:
– we sample from q(.) instead of p(.)

2. Supports p(x)
– $p(x) > 0 \Rightarrow q(x) > 0$

3. Explores p(x) wisely:
– q(.) too narrow
– q(.) too wide -> low acceptance

• The best q(.) is p(.) – but we can’t sample p(.) directly.
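The effect of the proposal width on acceptance can be measured directly. The sketch below assumes a standard normal target and a symmetric random-walk proposal N(x, step) for both widths (the slides use an independent N(0,20) for the wide case); `acceptance_rate` and the step sizes are illustrative.

```python
import math
import random

def acceptance_rate(log_p, step, n_steps=20000, seed=0):
    """Fraction of accepted moves for random-walk Metropolis with proposal N(x, step)."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        x_star = rng.gauss(x, step)
        # The proposal is symmetric, so q cancels and A = min(1, p(x*) / p(x)).
        log_ratio = log_p(x_star) - log_p(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = x_star, accepted + 1
    return accepted / n_steps

def log_p(x):
    return -0.5 * x * x          # hypothetical target: standard normal, unnormalized

narrow = acceptance_rate(log_p, step=0.1)
wide = acceptance_rate(log_p, step=20.0)
print(narrow, wide)
```

A narrow proposal is accepted almost always but moves in tiny steps, while a wide proposal is rejected most of the time; a high acceptance rate alone therefore does not indicate good exploration.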

30

Combining Kernels

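• Suppose we have $K_i(x \to x^*)$, $i = 1, \ldots, m$, each satisfying detailed balance with the same $p(x)$:

$p(x)\,K_i(x \to x^*) = p(x^*)\,K_i(x^* \to x)$

• Then $K = \sum_i w_i K_i(x \to x^*)$, with weights $w_i \ge 0$ summing to 1, also satisfies detailed balance.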

31

Combining MH Kernels

• The same applies to Metropolis Hastings Kernels:
– Combining MH Kernels with different proposals
– MC will converge to $p(x)$

32

Example Revisited

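• Proposal distribution: $q(x^* \mid x)$, a normal distribution centered at $x$ [variance garbled in this transcript]
• Acceptance: $A(x, x^*) = \min\left\{1, \frac{p(x^*)\,q(x \mid x^*)}{p(x)\,q(x^* \mid x)}\right\}$, with $p$ written out as the sum of its terms

Given x it is easy to compute p(x)

Normalization factor cancels out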

33

Example – cont.

34

MAP Estimation

• Converge to $\arg\max_x p(x)$
• Simulated Annealing: $p_i(x) \propto p(x)^{1/T_i}$
– explore less
– exploit more!
• As $T_i \to 0$ the density is peaked at the global maxima
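Annealing amounts to running Metropolis on $p(x)^{1/T_i}$ while the temperature is lowered. The following is a minimal sketch under assumptions that are not in the slides: a geometric cooling schedule, a symmetric random-walk proposal, and a hypothetical bimodal target whose global maximum is near x = 2.

```python
import math
import random

def simulated_annealing(log_p, x0, n_steps, step, t_start, t_end, rng):
    """Random-walk Metropolis on p(x)^(1/T_i), returning the best state visited."""
    x, best = x0, x0
    for i in range(n_steps):
        # Geometric schedule from t_start down to t_end (an assumed choice).
        T = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        x_star = rng.gauss(x, step)
        # log of [p(x*) / p(x)]^(1/T); the symmetric proposal cancels.
        log_ratio = (log_p(x_star) - log_p(x)) / T
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = x_star
        if log_p(x) > log_p(best):
            best = x
    return best

def log_p(x):
    # Hypothetical target: 0.3 N(-2, 1) + 0.7 N(2, 1), evaluated stably in log space.
    a = math.log(0.3) - 0.5 * (x + 2.0) ** 2
    b = math.log(0.7) - 0.5 * (x - 2.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# Start in the weaker mode at x = -2 and let the chain find the stronger one.
best = simulated_annealing(log_p, x0=-2.0, n_steps=5000, step=1.0,
                           t_start=5.0, t_end=0.01, rng=random.Random(0))
print(best)
```

At high temperature the flattened density lets the chain cross between modes; at low temperature it only accepts improvements.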

35

Annealing - example

• As $T_i \to 0$ the density is peaked at the global maxima

36

Model Selection

• Dimensionality variation in our space
– Varying number of regions
– Varying types of explanations per region

• Cannot directly compare density of different states!

37

Jump across dimensions

• Pair-wise common measure

38

Reversible Jumps

• Common measure
– Sample extensions u and u* s.t. dim(u)+dim(x) = dim(u*)+dim(x*)
– Use common dimension for comparison using invertible deterministic functions h and h’: $h(x, u) = h'(x^*, u^*)$
– Explicitly allow reversible jumps x* ↔ x

39

MCMC Summary

• Sample p(x) using Markov Chain
• Proposal q(x*|x)
– Supports p(x)
– Guides the sampling
• Detailed balance
– MH Kernel ensures convergence to p(x)
• Reversible Jumps
– Comparing across models and dimensions

40

If you want to make a new sample,

You should first learn how to propose.

Acceptance is random

Eventually you’ll get trapped in endless chains

until you become stationary.

Some say it is better to do reversible jumps between models.

MCMC – Take home message

41

Back to image parsing

• A state is a parse tree
• Moves between possible parses of the image
– Varying number of regions
– Different region types: Text, Face and Generic
– Varying number of parameters

42

• Birth / Death of a Face / Text

• Split / Merge of a generic region

• Model switching for a region

• Region boundary evolution

MCMC Moves

43

Moves -> Kernel

• Birth / Death of a Face / Text

• Split / Merge of a generic region

• Model switching for a region

• Region boundary evolution

MCMC Moves

44

Moves -> Kernel

[Diagram]
Moves: Text Birth, Text Death, Face Birth, Face Death, Split Region, Merge Region, Model Switching, Boundary Evolution
Sub-kernels: Text Sub-Kernel, Face Sub-Kernel, Generic Sub-Kernel
Combined kernel: $K(S^* \mid S; I)$

Dimensionality change: must allow reversible jump

45

Using bottom-up cues

• So far we haven’t stated the proposal probabilities q(.)

• If q(.) is uninformed of the image, convergence can be painfully slow

• Solution: use the image to propose moves

Face birth kernel

46

Data Driven MCMC

• Define proposal probabilities q(x*|x;I)

• The proposal probabilities will depend on discriminative tests
– Face detection
– Text detection
– Edge detection
– Parameter clustering

• Generative model with Discriminative proposals

47

Face/Text Detection

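• Bottom-up cues: AdaBoost
– hard classification: $H_{\mathrm{Ada}}(\mathrm{Tst}(I)) = \operatorname{sign}\left(\sum_i \alpha_i h_i(I)\right) = \operatorname{sign}\langle \alpha, \mathrm{Tst}(I) \rangle$
– Estimate posterior instead: $q(l \mid \mathrm{Tst}(I)) \propto e^{\,l \langle \alpha, \mathrm{Tst}(I) \rangle}$, $l \in \{+1, -1\}$
– Run on sliding windows at several scales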
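The difference between the hard decision and the soft posterior can be illustrated as follows. This is a sketch that assumes the standard logistic reading of an AdaBoost score H, namely q(l = +1 | I) = e^{2H} / (1 + e^{2H}); the weights, weak-classifier outputs, and function names are made up for the example.

```python
import math

def adaboost_score(alphas, weak_outputs):
    """Strong-classifier score H: weighted sum of weak outputs h_i in {-1, +1}."""
    return sum(a * h for a, h in zip(alphas, weak_outputs))

def adaboost_posterior(score):
    """Soft estimate q(l = +1 | I) = e^{2H} / (1 + e^{2H}), a logistic function of H."""
    return 1.0 / (1.0 + math.exp(-2.0 * score))

alphas = [0.8, 0.5, 0.3]          # hypothetical weak-classifier weights
weak_outputs = [+1, -1, +1]       # hypothetical weak decisions on one window
score = adaboost_score(alphas, weak_outputs)

hard_label = 1 if score > 0 else -1        # hard classification: sign(H)
soft_prob = adaboost_posterior(score)      # usable as a proposal weight
print(hard_label, soft_prob)
```

The soft value keeps weak detections available as low-probability proposals instead of discarding them at a threshold.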

48

Edge Map

• Canny edge detection at several scales

• Only these edges for split / merge

49

Parameters clustering

• Estimate likely parameter settings in the image

• Cluster using Mean-Shift

50

How to propose?

• q(S*|S,I) should approximate p(S*|I)

• Choose one sub-kernel at random
– (e.g., create face)

• Use bottom-up cues to generate proposals: S1,S2,…

• Weight proposal according to p(Si|I)

• Sample from discrete distribution
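The last three steps (generate candidates, weight them, sample from the resulting discrete distribution) can be sketched as below. The candidate names and their posterior values are placeholders; in the actual system the candidates would come from the bottom-up detectors.

```python
import random

def propose(candidates, posterior, rng):
    """Pick one candidate state with probability proportional to posterior(candidate).

    Returns the chosen candidate and its proposal probability, which is needed
    later in the Metropolis-Hastings acceptance ratio.
    """
    weights = [posterior(s) for s in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], probs[idx]

# Hypothetical candidates S1, S2, S3 with made-up values of p(Si | I).
scores = {"face_at_A": 0.6, "face_at_B": 0.3, "face_at_C": 0.1}
candidates = list(scores)

state, prob = propose(candidates, scores.get, random.Random(0))
print(state, prob)
```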

51

Generic region – split/merge

• Split/merge according to edge map
• Dimensionality change – reversible

S ↔ S’

52

Generic region – split/merge

• Splitting k into i,j: S_k -> S_ij

• Proposals are weighted: $w_{\mathrm{split}} = \frac{P(S_{ij} \mid I)}{P(S_k \mid I)} = \frac{P(I \mid S_{ij})\,P(S_{ij})}{P(I \mid S_k)\,P(S_k)}$

• Normalize weight to probabilities
• Sample

54

Faces sub-kernel

• Adding a face: S->S’
• Take AdaBoost proposals
• Compute weights w_i=P(S’|I)/P(S|I)
• Normalize weights to probability
• Sample
• Reversible kernel – add/remove face kernel

55

Accept / Reject

• We have the proposal q(S’|S;I)
• Check Metropolis Hastings acceptance:

$\min\left\{1, \frac{q(S \mid S'; I)\,p(S' \mid I)}{q(S' \mid S; I)\,p(S \mid I)}\right\}$

56

Full diagram

[Diagram]
Discriminative (from the Input Image): Text Detection, Face Detection, Edge Detection, Parameter Clustering
Generative: combined kernel $K(S^* \mid S; I)$
Sub-kernels: Text Sub-Kernel, Face Sub-Kernel, Generic Sub-Kernel
Moves: Text Birth, Text Death, Face Birth, Face Death, Split Region, Merge Region, Model Switching, Boundary Evolution

57

Results

58

Results

59

Results

60

Results

61

Results

62

Limitations

• Scaling to a large number of objects
– Algorithm design complexity
– Convergence speed
– Dealing with complex objects

• Good Synthesis / Detection but not so good segmentation

63

Extensions

64

Extensions

65

Extensions

66

Summary

• Image Parsing
– Decomposition to constituent visual patterns

• Top-down Generative Model for Parse Graphs

• Optimization using DDMCMC
– MCMC
– Discriminative bottom-up proposals

67

References

• Zhuowen Tu, Xiangrong Chen, Alan L. Yuille, Song-Chun Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. International Journal of Computer Vision, 2005.

• Z. Tu and S. Zhu. Image Segmentation by DDMCMC. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002.

• Zhuowen Tu, Xiangrong Chen, A.L. Yuille and S.C. Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. IEEE International Conference on Computer Vision, 2003.

• C. Andrieu, N. de Freitas, A. Doucet and M. Jordan. An introduction to MCMC for machine learning. Machine Learning, vol. 50, pp. 5-43, Jan.-Feb. 2003.

68

Backups

70

Example

• Compute posterior for a simple GMM: $p(M \mid x) \propto p(x \mid M)\,p(M)$
– Given one X, what component of the mixture generated it?
– Exhaustive search – What if larger space?

71

Example revisited

74

Binarization

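• Extracting text boundaries
• Adaptive thresholding: $\mathrm{Thr} = \mathrm{mean}(\mathrm{Window}) \pm 0.2\,\mathrm{std}(\mathrm{Window})$ [sign and coefficient partly garbled in this transcript]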
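A local mean and standard deviation threshold of this kind can be sketched with NumPy. The form below, threshold = mean + k·std with k = -0.2 (a Niblack-style rule), is an assumption: the transcript preserves only the words "mean", "std" and a digit, so the sign, coefficient, and window size are illustrative.

```python
import numpy as np

def adaptive_threshold(image, window=15, k=-0.2):
    """Binarize with a per-pixel threshold mean(window) + k * std(window)."""
    img = np.asarray(image, dtype=float)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    # One window x window neighborhood per pixel of the original image.
    patches = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    thr = patches.mean(axis=(-2, -1)) + k * patches.std(axis=(-2, -1))
    return img < thr   # True where the pixel is darker than its local threshold

# Toy example: a dark vertical stroke on a bright background.
image = np.full((20, 20), 200.0)
image[:, 9:11] = 50.0
mask = adaptive_threshold(image)
print(mask[10].astype(int))
```

Because the threshold follows the local statistics, a stroke is separated from its background even if overall brightness varies across the image.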

75

What’s so special about Text?

• Information lies in boundary
– AdaBoost: suggests region
– Adaptive binarization: boundary refinement

76

Model selection

• Union of model subspaces

• How can we compare densities across dimensions?

[Plot: union (∪) of model subspaces of different dimensions; axes range from -5 to 5]

77

Parameter clustering

• Each cluster in the parameter set induces a saliency map

[Figure panels: Shading, Gray level]

78

Generic region – split/merge

• Splitting k into i,j or merging i,j into k
• Suggestions are weighted

[Equations: the weights $w_{\mathrm{split}}$ and $w_{\mathrm{merge}}$; the expressions are not recoverable from this transcript. Their factors are annotated as Region Affinity, Shape Prior, Parameter Clustering, Current Region Probability, and Current Parameters Probability]

79

Switching node’s attributes

• No dimensionality change
• Weighting the proposals by

$w_i^{\mathrm{change}} = \frac{q(L_i')\,q(\zeta_i', \Theta_i')}{p(I_{R_i} \mid L_i, \zeta_i, \Theta_i)\,p(L_i, \zeta_i, \Theta_i)}$

80

Boundary Evolution Kernel

• Does not change dimensionality
• For two adjacent regions:
– Log likelihood ratio: $\log \frac{p(I(v);\,\Theta_i)}{p(I(v);\,\Theta_j)}$
– Changes in area
– Boundary curvature
– Deviation from control points (text)
– Brownian noise