Machine Learning


Transcript of Machine Learning

Page 1: Machine Learning

Machine Learning

Devdatt Dubhashi, Department of Computer Science and Engineering

Chalmers University, Gothenburg, Sweden.

LP3 2007

Page 2: Machine Learning


Outline

1 k-Means Clustering

2 Mixtures of Gaussians and EM Algorithm



Page 4: Machine Learning

Clustering

Data set {x_1, ..., x_N} of N observations of a random d-dimensional Euclidean variable x.

Goal is to partition the data set into K clusters (K known).

Intuitively, the points within a cluster should be “close” to each other compared to points outside the cluster.

Page 5: Machine Learning

Cluster centers and assignments

Find a set of centers µ_k, k ∈ [K].

Assign each data point to one of the centers so as to minimize the sum of the squares of the distances to the assigned centers.

Page 6: Machine Learning

Assignment and Distortion

Introduce binary indicator variables

r_{n,k} := 1 if x_n is assigned to µ_k, and 0 otherwise.

Minimize the distortion measure

J := ∑_{n∈[N]} ∑_{k∈[K]} r_{n,k} ||x_n − µ_k||².
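For reference, a minimal NumPy sketch of this distortion measure; the function name distortion and the array shapes are assumptions made for illustration.

import numpy as np

def distortion(X, mu, r):
    """J = sum_n sum_k r[n, k] * ||x_n - mu_k||^2.

    X  : (N, D) data points
    mu : (K, D) cluster centers
    r  : (N, K) binary indicators, r[n, k] = 1 iff x_n is assigned to mu_k
    """
    # Squared distance of every point to every center, shape (N, K).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Keep only the distances to the assigned centers and sum them up.
    return float((r * sq_dists).sum())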

Page 7: Machine Learning

Two Step Optimization

Start with some initial values of µ_k. The basic iteration consists of two steps, repeated until convergence.

E Minimize J wrt r_{n,k} keeping µ_k fixed.

M Minimize J wrt µ_k keeping r_{n,k} fixed.

Page 8: Machine Learning

Two Step Optimization: E Step

Minimize J wrt r_{n,k} keeping µ_k fixed:

r_{n,k} := 1 if k = argmin_j ||x_n − µ_j||², and 0 otherwise.

Page 9: Machine Learning

Two Step Optimization: M Step

Minimize J wrt µ_k keeping r_{n,k} fixed: J is a quadratic function of µ_k, so setting the derivative to zero gives

∑_{n∈[N]} r_{n,k} (x_n − µ_k) = 0,

hence

µ_k = ∑_n r_{n,k} x_n / ∑_n r_{n,k}.

In words: set µ_k to be the mean of the points assigned to cluster k, hence the name K-means algorithm.
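Taken together, the two steps amount to the following minimal NumPy sketch; the initialization (K randomly chosen data points as centers) and names such as kmeans are illustrative assumptions, not from the slides.

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Alternate the assignment (E) and mean-update (M) steps on (N, D) data X."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, size=K, replace=False)].copy()   # initial centers

    for _ in range(n_iter):
        # E step: assign every point to its nearest center.
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)
        assign = sq_dists.argmin(axis=1)                                 # (N,)

        # M step: move each center to the mean of its assigned points.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                           else mu[k]                    # leave empty clusters alone
                           for k in range(K)])
        if np.allclose(new_mu, mu):                       # converged
            break
        mu = new_mu
    return mu, assign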

Page 10: Machine Learning

K-Means Algorithm Analysis

Since J decreases at each iteration, convergence is guaranteed.

But it may converge to a local rather than a global optimum.

Page 11: Machine Learning

K-Means Algorithm: Example

[Figure: panels (a)–(b); scatter plots with both axes from −2 to 2 showing successive K-means iterations.]

Page 12: Machine Learning

K-Means Algorithm: Example

[Figure: panels (c)–(d); both axes from −2 to 2.]

Page 13: Machine Learning

K-Means Algorithm: Example

[Figure: panels (e)–(f); both axes from −2 to 2.]

Page 14: Machine Learning

K-Means Algorithm: Example

[Figure: panels (g)–(h); both axes from −2 to 2.]

Page 15: Machine Learning

K-Means Algorithm: Example

[Figure: panel (i); both axes from −2 to 2.]

Page 16: Machine Learning

K-Means and Image Segmentation

Image segmentation problem: partition an image into regions of homogeneous visual appearance, corresponding to objects or parts of objects.

Each pixel is a 3-dim point corresponding to the intensities of the red, blue and green channels.

Perform K-means and redraw the image, replacing each pixel by the corresponding center µ_k (see the sketch below).
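A sketch of that recipe, assuming the image is available as an (H, W, 3) array of RGB values and reusing a kmeans routine like the one sketched earlier; all names are illustrative.

import numpy as np

def segment_image(img, K):
    """Quantize an (H, W, 3) RGB image to K colours with K-means."""
    H, W, _ = img.shape
    pixels = img.reshape(-1, 3).astype(float)    # each pixel becomes a 3-dim point
    mu, assign = kmeans(pixels, K)               # cluster in RGB space
    # Redraw the image, replacing each pixel by its cluster center mu_k.
    return mu[assign].reshape(H, W, 3).astype(img.dtype)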

Page 17: Machine Learning

K-Means Algorithm: Example

[Figure: segmentation results.]

Page 18: Machine Learning

K-Means Algorithm: Example

[Figure: original image.]

Page 19: Machine Learning

K-Means Algorithm: Example

Page 20: Machine Learning

K-Means Algorithm: Example

Page 21: Machine Learning

K-Means and Data Compression

Lossy, as opposed to lossless, compression: we accept some errors in reconstruction in return for a higher rate of compression.

Instead of storing all N data points, store only the identity of the assigned cluster for each point, and the cluster centers.

Significant savings provided K ≪ N.

Each data point is approximated by its nearest center µ_k: the code-book vectors.

New data is compressed by finding the nearest center and storing only the label k of the corresponding cluster.

This scheme is called vector quantization.

Page 22: Machine Learning

K-Means and Data Compression: Example

Suppose the original image has N pixels comprising {R, G, B} values, each stored with 8-bit precision. Then the total space required is 24N bits.

If instead we first do K-means and transmit only the label of the corresponding cluster for each pixel, this takes ⌈log₂ K⌉ bits per pixel, for a total of N⌈log₂ K⌉ bits.

We also need to transmit the K code-book vectors, which needs another 24K bits.

In the example, the original image has 240 × 180 = 43,200 pixels, requiring 24 × 43,200 = 1,036,800 bits.

The compressed images require 43,248 (K = 2), 86,472 (K = 3) and 173,040 (K = 10) bits.
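These numbers follow from the formula N⌈log₂ K⌉ + 24K bits; a quick check in plain Python using the slide's values:

from math import ceil, log2

N = 240 * 180                          # 43,200 pixels
print(24 * N)                          # 1,036,800 bits for the original image
for K in (2, 3, 10):
    bits = N * ceil(log2(K)) + 24 * K  # pixel labels plus code-book vectors
    print(K, bits)                     # 43,248 / 86,472 / 173,040 bits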

Page 23: Machine Learning

Mixtures of Gaussians: Motivation

Pure Gaussian distributions have limitations when it comes to modelling real-life data.

Example: “Old Faithful” eruption durations.

The data forms two dominant clumps.

A single Gaussian cannot model this data well.

A linear superposition of two Gaussians does much better.

Page 24: Machine Learning

Old Faithful Eruptions

[Figure: Old Faithful data, two panels; horizontal axis 1–6, vertical axis 40–100.]

Page 25: Machine Learning

Mixtures of Gaussians: Modelling

A linear combination of Gaussians can give rise to complex distributions.

By using a sufficient number of Gaussians, and adjusting their means and covariances as well as the linear combination coefficients, one can model almost any continuous density to arbitrary accuracy.

[Figure: a mixture density p(x) over x.]

Page 26: Machine Learning

Mixtures of Gaussians: Definition

Superposition of Gaussians of the form

p(x) := ∑_{k∈[K]} π_k N(x | µ_k, Σ_k).

Each Gaussian density N(x | µ_k, Σ_k) is a component of the mixture, with its own mean and covariance.

The parameters π_k are mixing coefficients and satisfy 0 ≤ π_k ≤ 1 and ∑_k π_k = 1.
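A minimal sketch of evaluating such a mixture density with SciPy; the parameter layout (parallel lists of π_k, µ_k and Σ_k) is an assumption made for illustration.

from scipy.stats import multivariate_normal

def mixture_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))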

Page 27: Machine Learning

Mixtures of Gaussians: Definition

[Figure: three mixture components with mixing coefficients 0.5, 0.3 and 0.2; panels (a)–(b), both axes 0–1.]

Page 28: Machine Learning

Mixtures of Gaussians: Definition

[Figure: the same three components, panel (a); both axes 0–1.]

Page 29: Machine Learning

Equivalent Definition: Latent Variable

Introduce a latent binary variable z such that exactly one component z_k is 1 and the rest are zeros, with p(z_k = 1) = π_k. This variable identifies the component. Given z, the conditional distribution is

p(x | z_k = 1) = N(x | µ_k, Σ_k).

Inverting this using Bayes' rule,

γ(z_k) := p(z_k = 1 | x) = p(z_k = 1) p(x | z_k = 1) / ∑_j p(z_j = 1) p(x | z_j = 1) = π_k N(x | µ_k, Σ_k) / ∑_j π_j N(x | µ_j, Σ_j)

is the posterior probability, or responsibility, that component k takes for the observation x.
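In code, the responsibility of each component is just this Bayes-rule ratio; a sketch using the same illustrative parameter layout as before.

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(x, pis, mus, Sigmas):
    """gamma_k = pi_k N(x | mu_k, Sigma_k) / sum_j pi_j N(x | mu_j, Sigma_j)."""
    weighted = np.array([pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
                         for pi, mu, Sigma in zip(pis, mus, Sigmas)])
    return weighted / weighted.sum()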

Page 30: Machine Learning

Mixtures and Responsibilities

[Figure: panels (a)–(b); both axes 0–1.]

Page 31: Machine Learning

Mixtures and Responsibilities

[Figure: panels (b)–(c); both axes 0–1.]

Page 32: Machine Learning

Learning Mixtures

Suppose we have a data set of observations represented by an N × D matrix X := {x_1, ..., x_N}, and we want to model it as a mixture of K Gaussians.

We need to find the mixing coefficients π_k and the parameters of the component models, µ_k and Σ_k.

Page 33: Machine Learning

Learning Mixtures: The Means

Start with the log-likelihood function:

ln p(X | π, µ, Σ) = ∑_{n∈[N]} ln ( ∑_{k∈[K]} π_k N(x_n | µ_k, Σ_k) ).

Setting the derivative wrt µ_k to zero, and assuming Σ_k is invertible, gives

µ_k = (1/N_k) ∑_{n∈[N]} γ(z_{n,k}) x_n,

where N_k := ∑_{n∈[N]} γ(z_{n,k}).

Page 34: Machine Learning

Learning Mixtures: The Means

Interpret N_k as the “effective number of points” assigned to cluster k.

Note that the mean µ_k for the k-th Gaussian component is given by a weighted mean of all the points in the data set.

The weighting factor for data point x_n is given by the posterior probability, or responsibility, of component k for generating x_n.

Page 35: Machine Learning

Learning Mixtures: The Covariances

Setting the derivative wrt Σ_k to zero, and assuming Σ_k is invertible, gives

Σ_k = (1/N_k) ∑_{n∈[N]} γ(z_{n,k}) (x_n − µ_k)(x_n − µ_k)^T,

which is the same as the single-Gaussian solution, but with each term weighted by the corresponding posterior probability.

Page 36: Machine Learning

Learning Mixtures: Mixing Coefficients

Setting the derivative wrt π_k to zero, and taking into account that ∑_k π_k = 1 (Lagrange multipliers!), gives

π_k = N_k / N.

The mixing coefficient for the k-th component is the average responsibility that the component takes for explaining the data set.

Page 37: Machine Learning

Learning Mixtures: EM Algorithm

1 Initialize the means, covariances and mixing coefficients, and repeat:

2 E Step: Evaluate the responsibilities using the current parameters:

γ(z_{n,k}) = π_k N(x_n | µ_k, Σ_k) / ∑_j π_j N(x_n | µ_j, Σ_j).

3 M Step: Re-estimate the parameters using the current responsibilities:

µ_k^new = (1/N_k) ∑_n γ(z_{n,k}) x_n

Σ_k^new = (1/N_k) ∑_n γ(z_{n,k}) (x_n − µ_k^new)(x_n − µ_k^new)^T

π_k^new = N_k / N,

where N_k := ∑_n γ(z_{n,k}).
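Putting the E and M steps together, a compact NumPy/SciPy sketch of the whole loop; the initialization, the fixed iteration count, and the small ridge added to the covariances (only there to keep them invertible) are simplifying assumptions.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a K-component Gaussian mixture on (N, D) data X."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mus = X[rng.choice(N, size=K, replace=False)].copy()   # means start at random points
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)
    pis = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E step: responsibilities gamma[n, k] for every point and component.
        gamma = np.column_stack([pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
                                 for pi, mu, Sigma in zip(pis, mus, Sigmas)])
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters from the current responsibilities.
        Nk = gamma.sum(axis=0)                              # effective number of points
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return pis, mus, Sigmas, gamma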

Page 38: Machine Learning

EM Algorithm: Example

[Figure: panels (a)–(b); both axes from −2 to 2.]

Page 39: Machine Learning

EM Algorithm: Example

[Figure: panels (c)–(d); both axes from −2 to 2.]

Page 40: Machine Learning

EM Algorithm: Example

[Figure: panels (e)–(f); both axes from −2 to 2.]

Page 41: Machine Learning

EM vs K-Means

K-means performs a hard assignment of data points to clusters, i.e. each data point is assigned to a unique cluster.

The EM algorithm makes a soft assignment based on posterior probabilities.

K-means can be derived as a limit of the EM algorithm applied to a particular instance of Gaussian mixtures (see the sketch below).
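One way to see the last point concretely (a hedged sketch, not from the slides): fix every component covariance to ε·I with equal mixing coefficients and shrink ε; the E-step responsibilities then collapse towards hard 0/1 values, which is exactly the K-means assignment rule.

import numpy as np
from scipy.stats import multivariate_normal

x = np.array([0.4, 0.0])                                 # a point between two centers
mus = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]

for eps in (1.0, 0.1, 0.01):
    cov = eps * np.eye(2)                                # shared covariance eps * I
    w = np.array([multivariate_normal.pdf(x, mean=mu, cov=cov) for mu in mus])
    print(eps, w / w.sum())   # responsibilities harden towards [1, 0] as eps shrinks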