1

Image Parsing: Unifying Segmentation, Detection, and Recognition

Shai Bagon, Oren Boiman

2

Image Understanding

• A long-standing goal of Computer Vision

• Consists of understanding:
– Objects and visual patterns
– Context
– State / Actions of objects
– Relations between objects
– Physical layout
– Etc.

A picture is worth a thousand words…

3

Natural Language Understanding

• Very far from being solved
• Even NL parsing (syntax) is problematic
• Ambiguities require high-level (semantic) knowledge

4

Image Parsing

• Decomposition to constituent visual patterns
– Edge Detection
– Segmentation
– Object Recognition

5

Image Parsing Framework

Low-Level Tasks: Segmentation, Edge Detection

High-Level Tasks: Object Recognition, Classification

Generic Framework: from the image I to its parse S

6

Inference:

Top-down (Generative): $P(S \mid I) \propto P(I \mid S)\,P(S)$
Constellation, Star-Model etc.
+ Consistent Solutions
- Slow

Bottom-up (Discriminative): $q(s_j \mid \mathrm{Test}_j(I))$
SVM, Boosting, Neural Nets etc.
+ Fast
- Possibly Inconsistent

Approach used in “Image Parsing”

I S

7

Coming up next…

• Define a (Monstrous) Generative model for Image Parsing

• How to perform s-l-o-w inference on such models (MCMC)

• How to accelerate inference using bottom-up cues (DDMCMC)

$P(S \mid I) \propto P(I \mid S)\,P(S)$

8

Image Parsing Generative Model

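Prior $P(S)$:
– No. of regions $K$
– Region shapes $L_i$ and types $\zeta_i$
– Region parameters $\Theta_i$

Likelihood: $P(I \mid S) = \prod_{i=1}^{K} P(I_{R_i} \mid L_i, \zeta_i, \Theta_i)$

[Slide annotations on the model terms: Uniform, Uniform, exp, exp]

I S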

10

Generic Regions

Constant up to Gaussian noise: $gl \sim N(\mu, \sigma^2)$

Gray level histogram: $gl \sim (h_1, \ldots, h_G)$

Quadratic form: $gl(x,y) \sim N(\mu(x,y), \sigma^2)$, with $\mu(x,y) = ax^2 + bxy + cy^2 + dx + ey + f$

11

Faces

• Use a PCA model (Eigen-faces): $F \sim N(c_1 V_1 + \ldots + c_n V_n, \Sigma)$
• Estimate the covariance $\Sigma$ and the principal components $V_1, \ldots, V_n$

12

Text region shapes

• Use spline templates
• Allow affine transformation
• Allow small deformations of control points
• Shading intensity model

13

Problem Formulation

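• Now we can compute $P(S \mid I) \propto P(I \mid S)\,P(S)$, with

$P(I \mid S) = \prod_{i=1}^{K} p(I_{R_i} \mid L_i, \zeta_i, \Theta_i)$

$P(S) = p(K) \prod_{i=1}^{K} p(L_i)\, p(\zeta_i \mid L_i)\, p(\Theta_i \mid \zeta_i)$

• We’d like to optimize $\arg\max_S P(S \mid I)$

• over the space of parse graphs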

15

Optimizing P(S|I) is not easy…

• Hybrid state space: continuous & discrete
• Enormous number of local maxima
– Rules out gradient methods
• Graphical model structure is not pre-determined
– Rules out belief propagation

16

Optimize by Sampling!

• Monte Carlo Principle
– Use random samples to optimize!
– Let’s say we’re given N samples from P(S|I): S1,…,SN
– Compute P(Si|I) (given Si, it is easy to compute P(Si|I))
– Choose the best Si!
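The "choose the best sample" step can be sketched in a few lines. This is only an illustration: `log_unnormalized_posterior` is a hypothetical stand-in for log P(I|S) + log P(S), the state is a single number rather than a parse graph, and the samples are drawn directly from a Gaussian instead of coming from an MCMC run.

```python
import random

def log_unnormalized_posterior(s):
    # Hypothetical stand-in for log P(I|S) + log P(S); peaked at s = 2.
    return -0.5 * (s - 2.0) ** 2

def best_sample(samples, log_post):
    """Return the sample with the highest (unnormalized) log-posterior."""
    return max(samples, key=log_post)

rng = random.Random(0)
# Stand-in for the samples S_1..S_N; a real system would get them from MCMC.
samples = [rng.gauss(2.0, 1.0) for _ in range(1000)]
s_best = best_sample(samples, log_unnormalized_posterior)
print("best sample:", s_best)
```

Only ratios or rankings of P(Si|I) are needed here, so the normalization constant of the posterior never has to be computed.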

17

Detour: Sampling methods

• How to sample from a (very) complex probability space
• Sampling algorithm
• Why is Markov Chained in Monte Carlo?

18

Example

• Sample from a one-dimensional density $p(x)$

[Equation: definition of p(x) as a sum of component terms]

x

19

Markov Chain

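• A sequence of random variables: $X_t \in \{s_1, s_2, s_3\},\ t = 1, 2, \ldots$

• Markov property: $p(x_{t+1} \mid x_1, \ldots, x_t) = p(x_{t+1} \mid x_t)$
– Given the present, the future is independent of the past

• Transition: $p_{t+1} = p_t K$, with $K_{i,j} = K(s_i \to s_j)$, for example

$K = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0.1 & 0.9 \\ 0.6 & 0.4 & 0 \end{pmatrix}$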

20

Markov Chain – cont.

• Under certain conditions the MC converges to a unique distribution

• Stationary distribution: first eigenvector of K, $\hat{p} K = \hat{p}$ (eigenvalue 1)
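Convergence to the stationary distribution can be checked numerically by applying the transition repeatedly. The sketch below assumes the row-vector convention p_{t+1} = p_t K and uses the 3-state matrix with rows (0, 1, 0), (0, 0.1, 0.9), (0.6, 0.4, 0), which is how the matrix on the Markov Chain slide reads; the function names are illustrative.

```python
def step(p, K):
    """One transition p_{t+1} = p_t K (p is a row vector, rows of K sum to 1)."""
    n = len(K)
    return [sum(p[i] * K[i][j] for i in range(n)) for j in range(n)]

def stationary(K, p0, iters=200):
    """Apply the transition repeatedly, starting from the distribution p0."""
    p = list(p0)
    for _ in range(iters):
        p = step(p, K)
    return p

K = [[0.0, 1.0, 0.0],
     [0.0, 0.1, 0.9],
     [0.6, 0.4, 0.0]]

p_hat = stationary(K, [1.0, 0.0, 0.0])
p_hat_other_start = stationary(K, [0.0, 0.0, 1.0])
print(p_hat)
```

Starting from a different initial distribution gives the same limit, which is the "unique distribution" the slide refers to.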

21

Markov Chain Monte Carlo

• Reminder: $\hat{p} K = \hat{p}$
• Had we wanted a sample from $\hat{p}$: take the value of $X_t$ for large $t$, since $p_1 K^t \to \hat{p}$
• How to make our $p$ the stationary distribution of the MC, $p = \hat{p}$?
• How to guarantee convergence?

22

Markov Chain convergence

• Irreducibility:
– The walk can reach any state starting at any state

• Non-periodicity:
– Stationary distribution cannot depend on t

23

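How to make p(x) Stationary

• Detailed Balance: $p(x)\,K(x \to x^*) = p(x^*)\,K(x^* \to x)$
– Forward step on the left, backward step on the right, with the same distribution p(.)

• Sufficient condition to converge to p(x) (stationary distribution). Summing over x:
$\sum_{x} p(x)\,K(x \to x^*) = p(x^*) \sum_{x} K(x^* \to x) = p(x^*)$
– $p(x^*)$ is independent of $x$, and the probabilities sum to 1

• Written as matrix product: $pK = p$, hence $p = \hat{p}$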

24

Kernel Selection

• Detailed Balance requires a Kernel: $K(x \to x^*)$
• Metropolis-Hastings Kernel:
– Proposal: where to go next, $q(x^* \mid x)$
– Acceptance: should we go, $A(x, x^*) = \min\left(1, \frac{p(x^*)\,q(x \mid x^*)}{p(x)\,q(x^* \mid x)}\right)$

• MH Kernel provides detailed balance

Among the ten most influential algorithms in science and engineering
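That the Metropolis-Hastings kernel satisfies detailed balance can be verified numerically on a small discrete state space. The sketch below is an illustration, not part of the original slides: the target `p`, the asymmetric proposal matrix `q`, and the helper name `mh_kernel` are all made up for the example.

```python
def mh_kernel(p, q):
    """Build the MH transition matrix K[x][y] from a target p and a proposal q[x][y]."""
    n = len(p)
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y != x and q[x][y] > 0.0:
                accept = min(1.0, (p[y] * q[y][x]) / (p[x] * q[x][y]))
                K[x][y] = q[x][y] * accept
        # Rejected proposals (and self-proposals) keep the chain at x.
        K[x][x] = 1.0 - sum(K[x])
    return K

p = [0.2, 0.5, 0.3]                  # hypothetical target distribution
q = [[0.0, 0.7, 0.3],                # hypothetical asymmetric proposal
     [0.5, 0.0, 0.5],
     [0.4, 0.6, 0.0]]
K = mh_kernel(p, q)

n = len(p)
max_violation = max(abs(p[x] * K[x][y] - p[y] * K[y][x])
                    for x in range(n) for y in range(n))
p_next = [sum(p[x] * K[x][y] for x in range(n)) for y in range(n)]
print("largest detailed-balance violation:", max_violation)
```

The second check, `p_next`, confirms the consequence stated on the previous slide: one step of the kernel leaves p unchanged.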

25

Metropolis Hastings

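• Sample $x^* \sim q(x^* \mid x_t)$

• Compute acceptance probability $A(x_t, x^*) = \min\left(1, \frac{p(x^*)\,q(x_t \mid x^*)}{p(x_t)\,q(x^* \mid x_t)}\right)$

• If rand < A, $x_{t+1} = x^*$

• Else, $x_{t+1} = x_t$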
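The Metropolis-Hastings steps can be sketched as a short sampler. This is a minimal sketch under stated assumptions: the target `p_tilde` is a hypothetical unnormalized two-mode mixture (not the density from the Example slide), and the proposal is a symmetric Gaussian random walk, so the q terms cancel in the acceptance ratio.

```python
import math
import random

def p_tilde(x):
    # Hypothetical unnormalized target: Gaussian bumps at -2 (weight 0.3) and +3 (weight 0.7).
    return 0.3 * math.exp(-0.5 * (x + 2.0) ** 2) + 0.7 * math.exp(-0.5 * (x - 3.0) ** 2)

def metropolis_hastings(target, x0, n_steps, sigma, rng):
    """Random-walk MH: propose x* ~ N(x_t, sigma^2), accept with probability A."""
    x = x0
    chain = []
    for _ in range(n_steps):
        x_star = rng.gauss(x, sigma)
        # Symmetric proposal, so A = min(1, p(x*) / p(x_t)).
        a = min(1.0, target(x_star) / target(x))
        if rng.random() < a:
            x = x_star          # accept: x_{t+1} = x*
        chain.append(x)         # on reject, x_{t+1} = x_t is recorded again
    return chain

rng = random.Random(0)
chain = metropolis_hastings(p_tilde, 0.0, 50000, 2.0, rng)
kept = chain[2000:]             # discard an initial burn-in segment
mean_estimate = sum(kept) / len(kept)
print("estimated mean:", mean_estimate)
```

For this particular target the exact mean is 0.3·(-2) + 0.7·3 = 1.5, which gives a quick sanity check on the chain.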

26

Can we use any q(.) ?

1. Easy to sample from:
– we sample from q(.) instead of p(.)

27


2. Supports p(x): $p(x) > 0 \Rightarrow q(x) > 0$

[Figure: p(x) and q(x)]

28


3. Explores p(x) wisely:
– Too narrow q(.): q(x*|x) ~ N(x, .1)
– Too wide q(.): q(x*|x) ~ N(0,20)

[Figure: p(x) and q(x)]

29


1. Easy to sample from:
– we sample from q(.) instead of p(.)

2. Supports p(x):
– $p(x) > 0 \Rightarrow q(x) > 0$

3. Explores p(x) wisely:
– q(.) too narrow -> slow exploration
– q(.) too wide -> low acceptance

• The best q(.) is p(.), but we can’t sample p(.) directly.

30

Combining Kernels

• Suppose we have $K_i(x \to x^*),\ i = 1, \ldots, m$, each satisfying detailed balance with the same $p(x)$:

$p(x)\,K_i(x \to x^*) = p(x^*)\,K_i(x^* \to x)$

• Then $K = \sum_i w_i K_i(x \to x^*)$ also satisfies detailed balance.
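The claim follows in one line from linearity. The derivation below assumes the mixing weights satisfy $w_i \ge 0$ and $\sum_i w_i = 1$, so that the combined $K$ is itself a valid transition kernel; that assumption is not stated on the slide.

```latex
\begin{aligned}
p(x)\,K(x \to x^*)
  &= \sum_i w_i \, p(x)\,K_i(x \to x^*) \\
  &= \sum_i w_i \, p(x^*)\,K_i(x^* \to x)
     && \text{(detailed balance of each } K_i\text{)} \\
  &= p(x^*)\,K(x^* \to x).
\end{aligned}
```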

31

Combining MH Kernels

• The same applies to Metropolis Hastings Kernels:
– Combining MH Kernels with different proposals
– MC will converge to $p(x)$

32

Example Revisited

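• Proposal distribution: $q(x^* \mid x) \sim N(x, \sigma^2)$
• Acceptance: $A = \min\left(1, \frac{p(x^*)\,q(x \mid x^*)}{p(x)\,q(x^* \mid x)}\right)$, with $p$ evaluated from its components $N$, $L$ and $C$

Given x, it is easy to compute p(x)

Normalization factor cancels out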

33

Example – cont.

34

MAP Estimation

• Converge to $\arg\max_x p(x)$
• Simulated Annealing: $p_i(x) \propto p(x)^{1/T_i}$
– explore less
– exploit more!

• As $T_i \to 0$ the density is peaked at the global maxima
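Simulated annealing only changes the acceptance ratio of the sampler: the target at step i is p(x)^(1/T_i), so the log-ratio is divided by the current temperature. The sketch below assumes a geometric cooling schedule and a hypothetical two-mode target whose global maximum is near x = 3; neither is specified on the slides.

```python
import math
import random

def log_p(x):
    # Hypothetical unnormalized log-density: bumps at -2 (weight 0.3) and +3 (weight 0.7).
    a = math.log(0.3) - 0.5 * (x + 2.0) ** 2
    b = math.log(0.7) - 0.5 * (x - 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def temperature(k, n_steps, t_start, t_end):
    """Geometric cooling from t_start (k = 0) down to t_end (k = n_steps - 1)."""
    return t_start * (t_end / t_start) ** (k / (n_steps - 1))

def anneal(log_target, x0, n_steps, sigma, t_start, t_end, rng):
    """MH on p(x)^(1/T) with T decreasing; returns the best state visited."""
    x, best = x0, x0
    for k in range(n_steps):
        T = temperature(k, n_steps, t_start, t_end)
        x_star = rng.gauss(x, sigma)
        log_a = (log_target(x_star) - log_target(x)) / T
        if log_a >= 0.0 or rng.random() < math.exp(log_a):
            x = x_star
        if log_target(x) > log_target(best):
            best = x
    return best

rng = random.Random(0)
x_map = anneal(log_p, 0.0, 20000, 2.0, 5.0, 0.01, rng)
print("approximate argmax:", x_map)
```

At high temperature the chain moves freely between both modes (explore); as T shrinks, downhill moves are accepted less and less often (exploit).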

35

Annealing - example

• As $T_i \to 0$ the density is peaked at the global maxima

36

Model Selection

• Dimensionality variation in our space
– Varying number of regions
– Varying types of explanations per region

• Cannot directly compare density of different states!

37

Jump across dimensions

• Pair-wise common measure

38

Reversible Jumps

• Common measure
– Sample extensions u and u* s.t. dim(u)+dim(x) = dim(u*)+dim(x*)
– Use common dimension for comparison using invertible deterministic functions h and h’: $h(x, u) = h'(x^*, u^*)$
– Explicitly allow reversible jumps x* ↔ x

39

MCMC Summary

• Sample p(x) using a Markov Chain

• Proposal q(x*|x)
– Supports p(x)
– Guides the sampling

• Detailed balance
– MH Kernel ensures convergence to p(x)

• Reversible Jumps
– Comparing across models and dimensions

40

MCMC – Take home message

If you want to make a new sample,
You should first learn how to propose.
Acceptance is random
Eventually you’ll get trapped in endless chains
until you become stationary.
Some say it is better to do reversible jumps between models.

41

Back to image parsing

• A state is a parse tree
• Moves between possible parses of the image
– Varying number of regions
– Different region types: Text, Face and Generic
– Varying number of parameters

42

MCMC Moves

• Birth / Death of a Face / Text

• Split / Merge of a generic region

• Model switching for a region

• Region boundary evolution

44

Moves -> Kernel

Moves: Text Birth, Text Death, Face Birth, Face Death, Split Region, Merge Region, Model Switching, Boundary Evolution

Sub-kernels: Text Sub-Kernel, Face Sub-Kernel, Generic Sub-Kernel

Combined kernel: $K(S^* \mid S; I)$

Dimensionality change: must allow reversible jump

45

Using bottom-up cues

• So far we haven’t stated the proposal probabilities q(.)

• If q(.) is uninformed of the image, convergence can be painfully slow

• Solution: use the image to propose moves

Face birth kernel

46

Data Driven MCMC

• Define proposal probabilities q(x*|x;I)

• The proposal probabilities will depend on discriminative tests
– Face detection
– Text detection
– Edge detection
– Parameter clustering

• Generative model with Discriminative proposals

47

Face/Text Detection

• Bottom-up cues: AdaBoost
– Hard classification: $\mathrm{sign}\big(H_{Ada}(Tst(I))\big)$, with $H = \sum_i \alpha_i h_i$

– Estimate posterior instead: $q(l \mid I) \propto e^{\,l\,H_{Ada}(Tst(I))}$, $l \in \{+1, -1\}$

– Run on sliding windows at several scales
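Turning an AdaBoost score into a soft posterior, rather than a hard sign, can be sketched as follows. The sketch assumes the common logistic form q(l | I) = exp(l·H) / (exp(H) + exp(-H)) for labels l in {+1, -1}; the weak-classifier weights and outputs are made-up numbers, and the function names are illustrative.

```python
import math

def strong_score(alphas, weak_outputs):
    """H = sum_i alpha_i * h_i, with each weak output h_i in {-1, +1}."""
    return sum(a * h for a, h in zip(alphas, weak_outputs))

def posterior(label, score):
    """exp(l*H) / (exp(H) + exp(-H)), rewritten as a logistic in 2*l*H."""
    return 1.0 / (1.0 + math.exp(-2.0 * label * score))

alphas = [0.8, 0.5, 0.3]        # hypothetical weak-classifier weights
weak_outputs = [+1, -1, +1]     # hypothetical weak-classifier votes on one window
H = strong_score(alphas, weak_outputs)

hard_label = 1 if H >= 0 else -1
q_face = posterior(+1, H)
q_not_face = posterior(-1, H)
print(hard_label, q_face, q_not_face)
```

The soft value is what makes the detector usable as a proposal: windows with a weak but positive score still get a non-zero chance of being proposed.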

48

Edge Map

• Canny edge detection at several scales

• Only these edges are used for split / merge

49

Parameters clustering

• Estimate likely parameter settings in the image

• Cluster using Mean-Shift

50

How to propose?

• q(S*|S,I) should approximate p(S*|I)

• Choose one sub-kernel at random
– (e.g., create face)

• Use bottom-up cues to generate proposals: S1,S2,…

• Weight proposals according to p(Si|I)

• Sample from discrete distribution
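The last two steps, weighting candidates and sampling one of them, amount to normalizing the weights and drawing from a discrete distribution. The sketch below uses made-up weights for three candidate proposals; in the actual system the weights would come from p(Si|I).

```python
import random

def normalize(weights):
    """Turn non-negative weights into probabilities that sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(probs, rng):
    """Draw an index from a discrete distribution by inverting its CDF."""
    u = rng.random()
    cumulative = 0.0
    for i, pr in enumerate(probs):
        cumulative += pr
        if u < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

weights = [0.2, 3.0, 0.8]          # hypothetical weights for proposals S1, S2, S3
probs = normalize(weights)

rng = random.Random(0)
draws = [sample_index(probs, rng) for _ in range(10000)]
freq_s2 = draws.count(1) / len(draws)
print(probs, freq_s2)
```

The probability assigned to the chosen candidate is also the forward proposal probability q that enters the acceptance ratio later.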

51

Generic region – split/merge

• Split/merge according to edge map
• Dimensionality change – reversible

S S’

52

Generic region – split/merge

• Splitting k into i,j: $S_k \to S_{ij}$

• Proposals are weighted: $w_{split} = \frac{P(S_{ij} \mid I)}{P(S_k \mid I)} = \frac{P(I \mid S_{ij})\,P(S_{ij})}{P(I \mid S_k)\,P(S_k)}$

• Normalize weights to probabilities
• Sample

54

Faces sub-kernel

• Adding a face: S->S’
• Take AdaBoost proposals
• Compute weights wi=P(S’|I)/P(S|I)
• Normalize weights to probabilities
• Sample

• Reversible kernel – add/remove face kernel

55

Accept / Reject

• We have the proposal q(S’|S;I)
• Check Metropolis Hastings acceptance: $\min\left(1, \frac{q(S \mid S'; I)\,p(S' \mid I)}{q(S' \mid S; I)\,p(S \mid I)}\right)$
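In practice the posteriors involved are products of many small terms, so the acceptance test is usually evaluated in log space. The sketch below is an assumption about implementation style, not something stated on the slides; the function and argument names are illustrative.

```python
import math
import random

def log_acceptance(log_p_new, log_p_old, log_q_forward, log_q_backward):
    """log of min(1, q(S|S';I) p(S'|I) / (q(S'|S;I) p(S|I))).

    log_q_forward is log q(S'|S;I); log_q_backward is log q(S|S';I).
    Posteriors may be unnormalized, since the constant cancels in the ratio.
    """
    return min(0.0, log_q_backward + log_p_new - log_q_forward - log_p_old)

def accept(log_alpha, rng):
    """Accept the move with probability exp(log_alpha)."""
    return rng.random() < math.exp(log_alpha)

# A move to a more probable state with symmetric proposals is always accepted.
uphill = log_acceptance(-10.0, -12.0, -1.0, -1.0)
# The reverse move is accepted with probability exp(-2).
downhill = log_acceptance(-12.0, -10.0, -1.0, -1.0)
# Equal posteriors, but the backward proposal is half as likely as the forward one.
asymmetric = log_acceptance(-10.0, -10.0, math.log(0.5), math.log(0.25))

rng = random.Random(0)
print(uphill, downhill, asymmetric, accept(uphill, rng))
```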

56

Full diagram

Generative: combined kernel $K(S^* \mid S; I)$
– Moves: Text Birth, Text Death, Face Birth, Face Death, Split Region, Merge Region, Model Switching, Boundary Evolution
– Sub-kernels: Text Sub-Kernel, Face Sub-Kernel, Generic Sub-Kernel

Discriminative (from the Input Image): Text Detection, Face Detection, Edge Detection, Parameter Clustering

57

Results

58

Results

59

Results

60

Results

61

Results

62

Limitations

• Scaling to a large number of objects
– Algorithm design complexity
– Convergence speed
– Dealing with complex objects

• Good synthesis / detection, but not so good segmentation

63

Extensions

64

Extensions

65

Extensions

66

Summary

• Image Parsing
– Decomposition to constituent visual patterns

• Top-down Generative Model for Parse Graphs

• Optimization using DDMCMC
– MCMC
– Discriminative bottom-up proposals

67

References

• Z. Tu, X. Chen, A. L. Yuille and S.-C. Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. International Journal of Computer Vision, 2005.

• Z. Tu and S.-C. Zhu. Image Segmentation by DDMCMC. IEEE Trans. Pattern Analysis and Machine Intelligence, 2002.

• Z. Tu, X. Chen, A. L. Yuille and S.-C. Zhu. Image Parsing: Unifying Segmentation, Detection, and Recognition. IEEE International Conference on Computer Vision, 2003.

• C. Andrieu, N. de Freitas, A. Doucet and M. Jordan. An Introduction to MCMC for Machine Learning. Machine Learning, vol. 50, pp. 5-43, Jan.-Feb. 2003.

68

Backups

70

Example

• Compute posterior for a simple GMM:
– Given one X, what component of the mixture generated it?
– Exhaustive search: $p(M \mid x) \propto p(x \mid M)\,p(M)$
– What if larger space?
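For a small mixture the exhaustive search is just an enumeration over components. The sketch below assumes a one-dimensional Gaussian mixture with made-up weights, means, and standard deviations; the function names are illustrative.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def component_posterior(x, weights, means, stds):
    """p(M | x) proportional to p(x | M) p(M), normalized by enumerating every M."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

weights = [0.5, 0.5]    # hypothetical mixing weights p(M)
means = [0.0, 4.0]      # hypothetical component means
stds = [1.0, 1.0]       # hypothetical component standard deviations

post_midpoint = component_posterior(2.0, weights, means, stds)
post_at_zero = component_posterior(0.0, weights, means, stds)
print(post_midpoint, post_at_zero)
```

Enumeration costs one likelihood evaluation per component, which is why it stops being an option once the space of explanations is as large as the space of parse graphs.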

71

Example revisited

74

Binarization

• Extracting text boundaries
• Adaptive thresholding: the threshold Thr is computed per window from mean(Window) and .2·std(Window)

75

What’s so special about Text?

• Information lies in boundary
– AdaBoost: suggests region
– Adaptive binarization: boundary refinement

76

Model selection

• Union of model subspaces

• How can we compare densities across dimensions?

[Figure: 3-D plot, each axis ranging from -5 to 5]

77

Parameter clustering

• Each cluster in the parameter set induces a saliency map

[Figure panels: Shading, Gray level]

78

Generic region – split/merge

• Splitting k into i,j or merging i,j into k
• Suggestions are weighted by $w_{split}$ (k into i,j) and $w_{merge}$ (i,j into k); each weight is a ratio whose terms are labelled on the slide as:
– Region affinity
– Shape prior
– Parameter clustering
– Current region probability
– Current parameters probability

79

Switching node’s attributes

• No dimensionality change
• Weighting the proposals by $w_i^{change} = \frac{q(L_i')\,q(\zeta_i', \Theta_i')}{p(I \mid L_i, \zeta_i, \Theta_i)\,p(L_i, \zeta_i, \Theta_i)}$

80

Boundary Evolution Kernel

• Does not change dimensionality
• For two adjacent regions:
– Log likelihood ratio: $\log \frac{p(I(v); \Theta_i)}{p(I(v); \Theta_j)}$
– Changes in area
– Boundary curvature
– Deviation from control points (text)
– Brownian noise
