Lecture 17: Gaussian Mixture Models and Expectation-Maximization (Machine Learning).
Gaussian Mixture Models and Expectation-Maximization Algorithm
2
The RGB Domain
A regular image
3
The RGB Domain
Image pixels in RGB space
4
Pixel Clusters
Suppose we cluster the points into 2 clusters
5
Pixel Clusters
The result in image space
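As a rough illustration (not from the lecture), the sketch below clusters the pixels of an image in RGB space and maps the cluster labels back to image coordinates; scikit-learn's KMeans is used purely as a stand-in for the clustering step.

# Sketch: cluster image pixels in RGB space, then map labels back to image space.
# KMeans is only a stand-in for the clustering discussed in the lecture.
import numpy as np
from sklearn.cluster import KMeans

def cluster_pixels(image, k=2):
    """image: (H, W, 3) array; returns an (H, W) array of cluster labels."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.float64)      # points in RGB space
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(pixels)
    return labels.reshape(h, w)                           # back to image space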
6
Normal Distribution (1D Gaussian)
f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

\mu = mean, \sigma = standard deviation
7
2D Gaussians

f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)

d = 2, x = random data point (2D vector), \mu = mean value (2D vector), \Sigma = covariance matrix (2×2 matrix)
The same equation holds for a 3D Gaussian
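A minimal NumPy sketch of evaluating this density (the function name and structure are my own, not from the slides):

# Sketch: the d-dimensional Gaussian density from the slide.
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """x, mu: (d,) vectors; sigma: (d, d) covariance matrix."""
    d = x.shape[0]
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
    mahal = diff @ np.linalg.solve(sigma, diff)            # (x-mu)^T Sigma^{-1} (x-mu)
    return np.exp(-0.5 * mahal) / norm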
8
2D Gaussians

f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma)}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)
9
Exploring Covariance Matrix
x_i = random vector (w_i, h_i)^T

\Sigma = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T =
\begin{pmatrix} \sigma_w^2 & \mathrm{cov}(w,h) \\ \mathrm{cov}(h,w) & \sigma_h^2 \end{pmatrix}
(data assumed mean-centered)

\Sigma is symmetric, so it has an eigendecomposition (SVD): \Sigma = V D V^T, with eigenvalues \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d
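A small NumPy sketch of this computation (the toy data and variable names are assumptions; the mean is subtracted before forming \Sigma):

# Sketch: empirical covariance of 2D points (w, h) and its eigendecomposition.
import numpy as np

points = np.random.randn(1000, 2) @ np.array([[3.0, 1.0], [0.0, 1.0]])  # toy (w, h) data
centered = points - points.mean(axis=0)
sigma = centered.T @ centered / len(points)        # 2x2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(sigma)           # Sigma = V D V^T (Sigma is symmetric)
order = np.argsort(eigvals)[::-1]                  # sort so lambda_1 >= lambda_2
eigvals, eigvecs = eigvals[order], eigvecs[:, order]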
10
Covariance Matrix Geometry
\Sigma = V D V^T

(figure: the equal-density ellipse, with axes a and b along the eigenvectors v_1 and v_2, and lengths determined by the eigenvalues \lambda_1 and \lambda_2)
11
3D Gaussians
x_i = random vector (r_i, g_i, b_i)^T

\Sigma = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^T =
\begin{pmatrix}
\sigma_r^2 & \mathrm{cov}(r,g) & \mathrm{cov}(r,b) \\
\mathrm{cov}(g,r) & \sigma_g^2 & \mathrm{cov}(g,b) \\
\mathrm{cov}(b,r) & \mathrm{cov}(b,g) & \sigma_b^2
\end{pmatrix}
12
GMMs – Gaussian Mixture Models
Suppose we have 1000 data points in 2D space (w,h)
13
GMMs – Gaussian Mixture Models
Assume each data point is normally distributed. Obviously, there are 5 underlying Gaussians.
14
The GMM assumption
There are K components (Gaussians). Each component j is specified by three parameters: weight \alpha_j, mean \mu_j, and covariance matrix \Sigma_j. The total density function is:

f(x) = \sum_{j=1}^{K} \alpha_j \frac{1}{(2\pi)^{d/2}\sqrt{\det(\Sigma_j)}} \exp\left(-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)\right)

\Theta = \{\alpha_j, \mu_j, \Sigma_j\}_{j=1}^{K}

0 \le \alpha_j \le 1, \quad \sum_{j=1}^{K} \alpha_j = 1 \quad (\alpha_j = weight of component j)
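A minimal sketch of evaluating the mixture density, assuming the gaussian_pdf helper from the earlier sketch:

# Sketch: the mixture density f(x) = sum_j alpha_j * N(x; mu_j, Sigma_j).
import numpy as np

def gmm_pdf(x, alphas, mus, sigmas):
    """alphas: (K,) weights summing to 1; mus: (K, d); sigmas: (K, d, d)."""
    return sum(a * gaussian_pdf(x, mu, s) for a, mu, s in zip(alphas, mus, sigmas))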
15
The EM algorithm (Dempster, Laird and Rubin, 1977)
(figure panels: raw data; fitted GMMs, K = 6; total density function)
16
EM Basics
Objective: given N data points, find the maximum likelihood estimate of \Theta:

\Theta^* = \arg\max_{\Theta} f(x_1, \dots, x_N \mid \Theta)

Algorithm:
1. Guess an initial \Theta
2. Perform E-step (expectation): based on \Theta, associate each data point with a specific gaussian
3. Perform M-step (maximization): based on the data-point clustering, re-estimate \Theta to maximize the likelihood
4. Repeat 2-3 until convergence (typically tens of iterations)
17
EM Details
E-step (estimate the probability that point t is associated with gaussian j):

w_{t,j} = \frac{\alpha_j\, f(x_t \mid \mu_j, \Sigma_j)}{\sum_{i=1}^{K} \alpha_i\, f(x_t \mid \mu_i, \Sigma_i)}, \quad j = 1,\dots,K, \quad t = 1,\dots,N

M-step (estimate new parameters):

\alpha_j^{new} = \frac{1}{N}\sum_{t=1}^{N} w_{t,j}

\mu_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\, x_t}{\sum_{t=1}^{N} w_{t,j}}

\Sigma_j^{new} = \frac{\sum_{t=1}^{N} w_{t,j}\,(x_t - \mu_j^{new})(x_t - \mu_j^{new})^T}{\sum_{t=1}^{N} w_{t,j}}
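A sketch of one EM iteration implementing these formulas in NumPy; it reuses the hypothetical gaussian_pdf helper from above, and a practical implementation would work in log-space for numerical stability:

# Sketch of a single EM iteration, following the E- and M-step formulas above.
# X: (N, d) data; alphas: (K,); mus: (K, d); sigmas: (K, d, d).
import numpy as np

def em_step(X, alphas, mus, sigmas):
    N, K = X.shape[0], alphas.shape[0]

    # E-step: w[t, j] = alpha_j f(x_t | mu_j, Sigma_j) / sum_i alpha_i f(x_t | mu_i, Sigma_i)
    w = np.empty((N, K))
    for j in range(K):
        w[:, j] = alphas[j] * np.array([gaussian_pdf(x, mus[j], sigmas[j]) for x in X])
    w /= w.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means and covariances from the responsibilities
    Nj = w.sum(axis=0)                                     # effective counts per gaussian
    new_alphas = Nj / N
    new_mus = (w.T @ X) / Nj[:, None]
    new_sigmas = np.empty_like(sigmas)
    for j in range(K):
        diff = X - new_mus[j]
        new_sigmas[j] = (w[:, j, None] * diff).T @ diff / Nj[j]
    return new_alphas, new_mus, new_sigmas, w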
18
EM Example
(figure: Gaussian j, data point t; blue denotes w_{t,j})
19-25
EM Example
(figure sequence continuing the EM example)
26
Back to Clustering
We want to label "close" pixels with the same label. Proposed metric: label pixels from the same gaussian with the same label. Label according to maximum probability:

label(t) = \arg\max_{j}\, w_{t,j}

Number of labels = K
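In code, this labeling rule is a single argmax over the responsibility matrix (a sketch, assuming w was computed as in the E-step above):

# Sketch: label each data point (pixel) by the gaussian with the highest responsibility.
import numpy as np

# w: (N, K) responsibilities from the E-step
labels = np.argmax(w, axis=1)      # label(t) = argmax_j w[t, j]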
Graph-Cut Optimization
28
Motivation for Graph-Cuts
Let’s recall the car example
29
Motivation for Graph-Cuts
Suppose we have two clusters in color space. Each pixel is colored by its associated gaussian index.
30
A Problem: Noise
Why? Pixel labeling is done independently for each pixel, ignoring the spatial relationships between pixels!
Motivation for Graph-Cuts
31
Formalizing a New Labeling Problem

Previous model for labeling:

f_p = label(p) = \arg\max_{j}\, w_{p,j}, \quad j = 1,\dots,K \ (gaussians), \quad p \in image\ pixels

A new model for labeling. Minimize E:

E(f) = E_{data}(f) + \lambda \cdot E_{smooth}(f)

f = labeling function, assigns a label f_p to each pixel p
E_data = data term
E_smooth = smoothness term
\lambda = a free parameter
32
The Energy Function

E(f) = E_{data}(f) + \lambda \cdot E_{smooth}(f)

Labels set: { j = 1,…,K }

E_data: penalize disagreement between a pixel and the GMM

E_{data}(f) = \sum_{p \in Pixels} D_p(f_p), \quad D_p(f_p) = 1 - w_{p,f_p}

E_smooth: penalize disagreement between two neighboring pixels, unless they lie across a natural edge in the image

E_{smooth}(f) = \sum_{(p,q)\ neighbors} V_{p,q}(f_p, f_q), \quad
V_{p,q}(f_p, f_q) = \begin{cases} 0 & f_p = f_q \\ \frac{1}{dist(p,q)} & otherwise \end{cases}

dist(p,q) = normalized color-distance between p and q
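A hedged NumPy sketch of evaluating this energy for a given labeling, using 4-connected neighbors; the array shapes and the epsilon guards are my own choices, not part of the slides:

# Sketch: evaluate E(f) = E_data(f) + lambda * E_smooth(f) for a given labeling.
import numpy as np

def energy(labels, w, image, lam=1.0):
    """labels: (H, W) ints; w: (H, W, K) responsibilities; image: (H, W, 3) floats."""
    H, W = labels.shape

    # Data term: 1 - w[p, f_p] for every pixel p
    e_data = np.sum(1.0 - np.take_along_axis(w, labels[..., None], axis=2))

    # Smoothness term: penalize label changes, less so across strong color edges
    e_smooth = 0.0
    for dy, dx in [(0, 1), (1, 0)]:                        # right and down neighbors
        p = labels[:H - dy, :W - dx]
        q = labels[dy:, dx:]
        diff = np.linalg.norm(image[:H - dy, :W - dx] - image[dy:, dx:], axis=2)
        dist = diff / (diff.max() + 1e-9)                  # normalized color distance
        e_smooth += np.sum((p != q) / (dist + 1e-9))
    return e_data + lam * e_smooth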
33
Minimizing the Energy

Solving min(E) is NP-hard. It is possible to approximate the solution using iterative methods. Graph-cut based methods approximate the global solution (up to a constant factor) in polynomial time.

Read: "Fast Approximate Energy Minimization via Graph Cuts", Y. Boykov, O. Veksler and R. Zabih, PAMI 2001
34
α-expansion moves

When using iterative methods, in each iteration some of the pixels change their labels.

Given a label α, a move from partition P (labeling f) to a new partition P' (labeling f') is called an α-expansion move if:

P_\alpha \subset P'_\alpha \quad and \quad P'_l \subset P_l \ \ for\ every\ label\ l \neq \alpha

(figure: current labeling; one-pixel move; α-β-swap move; α-expansion move)
35
Algorithm for Minimizing E(f)
1. Start with an arbitrary labeling f
2. Set success = 0
3. For each label j:
   3.1 Find f' = argmin(E(f')) among f' within one α-expansion of f
   3.2 If E(f') < E(f), set f = f' and success = 1
4. If (success == 1) go to 2
5. Return f
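A sketch of this outer loop in Python; expansion_move is a hypothetical placeholder for the graph-cut step described on the following slides:

# Sketch of the loop above; expansion_move() stands in for the graph-cut step
# that finds argmin E(f') within one alpha-expansion of f.
def minimize_energy(f, all_labels, energy, expansion_move):
    success = True
    while success:
        success = False
        for alpha in all_labels:
            f_new = expansion_move(f, alpha)       # best labeling one alpha-expansion away
            if energy(f_new) < energy(f):
                f, success = f_new, True
    return f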
How to find argmin(E(f’)) ?
36
A Reminder: min-cut / max-flow
Given two terminal nodes α and β in G = (V, E), a cut is a set of edges C ⊆ E that separates α from β in G' = (V, E \ C). Also, no proper subset of C separates α from β in G'.
The cost of a cut is defined as the sum of all the edge weights in the cut. The minimum-cut of G is the cut C with the lowest cost.
The minimum-cut problem is solvable in practically linear time.
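For illustration only (not part of the lecture), here is a tiny s-t min-cut computed with networkx; the toy graph and capacities are made up:

# Toy illustration of an s-t min-cut with networkx; each undirected edge is
# modeled as two directed arcs with the same capacity.
import networkx as nx

G = nx.DiGraph()
for u, v, cap in [("alpha", "p1", 2.0), ("p1", "p2", 1.0), ("p2", "not_alpha", 3.0)]:
    G.add_edge(u, v, capacity=cap)
    G.add_edge(v, u, capacity=cap)

cut_value, (reachable, non_reachable) = nx.minimum_cut(G, "alpha", "not_alpha")
print(cut_value)   # cost of the minimum cut separating the two terminals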
37
Finding the Optimal Expansion Move
Problem:
Find f’ = argmin(E(f’)) among f’ within one α-expansion of f
Solution:
Translate the problem to a min-cut problem on an appropriately defined graph.
38
Graph Structure for Optimal Expansion Move
f'_p = \alpha \ \ if\ t^{\alpha}_p \in C, \qquad f'_p = f_p \ \ if\ t^{\bar\alpha}_p \in C

(figure: terminal α, terminal not(α), and the cut C)

There is a 1-1 correspondence between cuts and labelings, and the minimum cut yields the labeling f for which E(f) is minimized!
39
Each pixel gets a node
A Closer Look
P1 P2 Pα
40
Add auxiliary nodes between pixels with different labels
A Closer Look
P1 P2 Pα
41
Add two terminal nodes for α and not(α)
A Closer Look
P1 P2 Pα
42-46
A Closer Look
(figure sequence: the graph over pixel nodes P1, P2, Pα)
47
Implementation Notes
The neighborhood system can be 4-connected pixels, 8-connected, or even larger.
Lambda (λ) controls the ratio between the data term and the smoothness term.
Solving min(E) is simpler and possible in polynomial time if only two labels are involved (see "Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images", Y. Boykov and M.-P. Jolly, 2001).
There is a ready-to-use package for solving max-flow (see http://www.cs.cornell.edu/People/vnk/software/maxflow-v2.2.src.tar.gz).
Final Project: Optimized Color Transfer
www.cs.tau.ac.il/~gamliela/color_transfer_project/color_transfer_project.htm