MACHINE LEARNING
8. Clustering

Transcript of lecture slides, based on E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press (V1.1).

Page 1: MACHINE LEARNING 8. Clustering

Page 2: Motivation

Classification problem:
- Need P(C|x); by Bayes' rule, it can be computed from p(x|C)
- Need to estimate p(x|C) from the data
- Assume a model (e.g., normal distribution) known up to its parameters
- Compute estimators (ML, MAP) for the parameters from the data

Regression:
- Need to estimate the joint p(x, r); by Bayes' rule, it can be computed from p(r|x)
- Assume a model known up to its parameters (e.g., linear)
- Compute the parameters from the data (e.g., least squares)

Page 3: Motivation

We cannot always assume that the data came from a single distribution/model.

Nonparametric methods: assume no model; compute the probability of new data directly from the old data.

Semi-parametric/mixture models: assume the data came from an unknown mixture of known models.

Page 4: Motivation

Optical character recognition:
- There are two ways to write '7' (with or without the horizontal bar)
- We cannot assume a single distribution; it is a mixture of an unknown number of templates

Compared to classification:
- The number of classes is known
- Each training sample has a class label
- Supervised learning

Page 5: Mixture Densities

$$p(\mathbf{x}) = \sum_{i=1}^{k} p(\mathbf{x} \mid \mathcal{G}_i)\, P(\mathcal{G}_i)$$

where G_i are the components (groups/clusters), P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.

Gaussian mixture: p(x | G_i) ~ N(μ_i, Σ_i), with parameters Φ = {P(G_i), μ_i, Σ_i} for i = 1, ..., k, estimated from an unlabeled sample X = {x^t}_t (unsupervised learning).
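As a minimal sketch of evaluating such a mixture density, assuming NumPy and SciPy are available (the two-component parameters below are purely illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, priors, means, covs):
    """p(x) = sum_i P(G_i) * p(x | G_i) for a Gaussian mixture."""
    return sum(P * multivariate_normal.pdf(x, mean=mu, cov=S)
               for P, mu, S in zip(priors, means, covs))

# Illustrative two-component mixture in 2-D
priors = [0.3, 0.7]
means = [np.zeros(2), 3.0 * np.ones(2)]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_density(np.array([1.0, 1.0]), priors, means, covs))
```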

Page 6: Example: Color quantization

Image: each pixel is represented by a 24-bit color.
- Colors come from different distributions (e.g., sky, grass)
- We have no label telling us whether a pixel is sky or grass
- We want to use only 256 palette colors and still represent the image as closely as possible to the original
- Quantizing uniformly (assigning a single color to each of the 2^24 / 256 intervals) wastes palette entries on rarely occurring intervals

Page 7: Quantization

Sample (pixels): X = {x^t}_t. k reference vectors (the palette): m_1, ..., m_k. Reference vectors are also called codebook vectors or code words.

Select the (nearest) reference vector for each pixel:

$$\|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\|,
\qquad
b_i^t = \begin{cases} 1 & \text{if } \|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\| \\ 0 & \text{otherwise} \end{cases}$$

This compresses the image. The reconstruction error is

$$E\big(\{\mathbf{m}_i\}_{i=1}^{k} \mid \mathcal{X}\big) = \sum_t \sum_i b_i^t \, \|\mathbf{x}^t - \mathbf{m}_i\|^2$$
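A minimal NumPy sketch of this assignment and its reconstruction error, assuming the codebook M has already been chosen (e.g., by the k-means procedure sketched later); the function names are mine:

```python
import numpy as np

def encode(X, M):
    """Assign each sample to its nearest reference (codebook) vector.
    X: (N, d) samples, M: (k, d) codebook. Returns labels and reconstruction error."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)   # (N, k) squared distances
    labels = d2.argmin(axis=1)                                 # index i with b_i^t = 1
    error = d2[np.arange(len(X)), labels].sum()                # sum_t sum_i b_i^t ||x^t - m_i||^2
    return labels, error

def decode(labels, M):
    """Reconstruct each sample by the codebook vector it was assigned to."""
    return M[labels]
```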

Page 8: Encoding/Decoding

$$b_i^t = \begin{cases} 1 & \text{if } \|\mathbf{x}^t - \mathbf{m}_i\| = \min_j \|\mathbf{x}^t - \mathbf{m}_j\| \\ 0 & \text{otherwise} \end{cases}$$

(The encoder maps each x^t to the index of its nearest codebook vector; the decoder replaces that index by m_i.)

Page 9: K-means clustering

Minimize the reconstruction error

$$E\big(\{\mathbf{m}_i\}_{i=1}^{k} \mid \mathcal{X}\big) = \sum_t \sum_i b_i^t \, \|\mathbf{x}^t - \mathbf{m}_i\|^2$$

Take the derivative with respect to m_i and set it to zero: each reference vector is the mean of all the instances it represents,

$$\mathbf{m}_i = \frac{\sum_t b_i^t \, \mathbf{x}^t}{\sum_t b_i^t}$$

Page 10: K-Means clustering

Iterative procedure for finding the reference vectors (sketched in code below):
- Start with random reference vectors
- Estimate the labels b
- Re-compute the reference vectors as means
- Continue until convergence
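A minimal NumPy sketch of this loop; the initialization by sampling k data points and the fixed iteration cap are my assumptions, not specified in the slides:

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=np.random.default_rng(0)):
    """Basic k-means: X is (N, d). Returns the (k, d) reference vectors and the labels."""
    M = X[rng.choice(len(X), size=k, replace=False)]            # random initial reference vectors
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)                              # estimate labels b_i^t
        new_M = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else M[i]
                          for i in range(k)])                   # re-compute reference vectors as means
        if np.allclose(new_M, M):                               # continue until convergence
            break
        M = new_M
    return M, labels
```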

Page 11: k-means Clustering

Page 12:

Page 13: Expectation Maximization (EM): Motivation

Data came from several distributions; assume each distribution is known up to its parameters.

If we knew which distribution each data instance came from, we could use parametric estimation.

Introduce unobservable (latent) variables that indicate the source distribution, and run an iterative process:
- Estimate the latent variables from the data and the current estimates of the distribution parameters
- Use the current values of the latent variables to refine the parameter estimates

Page 14: EM

Log likelihood of the mixture model:

$$\mathcal{L}(\Phi \mid \mathcal{X}) = \log \prod_t p(\mathbf{x}^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(\mathbf{x}^t \mid \mathcal{G}_i)\, P(\mathcal{G}_i)$$

Assume hidden variables Z which, when known, make the optimization much simpler:
- Complete likelihood, Lc(Φ | X, Z), in terms of X and Z
- Incomplete likelihood, L(Φ | X), in terms of X alone
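A minimal NumPy/SciPy sketch of this incomplete-data log likelihood for a Gaussian mixture (the helper name and the log-sum-exp trick for numerical stability are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, priors, means, covs):
    """L(Phi | X) = sum_t log sum_i P(G_i) p(x^t | G_i)."""
    # log of P(G_i) * p(x^t | G_i) for every sample t and component i: shape (N, k)
    log_terms = np.stack([np.log(P) + multivariate_normal.logpdf(X, mean=mu, cov=S)
                          for P, mu, S in zip(priors, means, covs)], axis=1)
    return logsumexp(log_terms, axis=1).sum()
```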

Page 15: Latent Variables

Z is unknown, so we cannot compute the complete likelihood Lc(Φ | X, Z), but we can compute its expected value. E-step:

$$Q(\Phi \mid \Phi^{l}) = E\big[\mathcal{L}_C(\Phi \mid \mathcal{X}, \mathcal{Z}) \mid \mathcal{X}, \Phi^{l}\big]$$

Page 16: E- and M-steps

E-step:

$$Q(\Phi \mid \Phi^{l}) = E\big[\mathcal{L}_C(\Phi \mid \mathcal{X}, \mathcal{Z}) \mid \mathcal{X}, \Phi^{l}\big]$$

M-step:

$$\Phi^{l+1} = \arg\max_{\Phi} \, Q(\Phi \mid \Phi^{l})$$

Iterate the two steps:
1. E-step: estimate z given X and the current Φ
2. M-step: find the new Φ' given z, X, and the old Φ

Page 17: Example

The data came from a mixture of Gaussians. Maximize the likelihood assuming we knew the latent "indicator variables".

E-step: expected values of the indicator variables (see the formula below).
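The expected indicator value is the posterior probability of the component given the instance; this is the standard EM result for Gaussian mixtures, written here in the notation of the surrounding slides:

$$h_i^t \equiv E\big[z_i^t \mid \mathcal{X}, \Phi^{l}\big] = P(\mathcal{G}_i \mid \mathbf{x}^t, \Phi^{l}) = \frac{P(\mathcal{G}_i)\, p(\mathbf{x}^t \mid \mathcal{G}_i, \Phi^{l})}{\sum_{j=1}^{k} P(\mathcal{G}_j)\, p(\mathbf{x}^t \mid \mathcal{G}_j, \Phi^{l})}$$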

Page 18:

(Figure annotation: P(G1 | x) = h1 = 0.5.)

Page 19: EM for Gaussian mixtures

Assume all groups/clusters are Gaussians that are multivariate but uncorrelated, with the same shared variance, and "harden" the indicators:
- EM: expected values are between 0 and 1
- k-means: 0 or 1
Under these assumptions the procedure is the same as k-means. A sketch of EM for a general Gaussian mixture follows.
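A minimal NumPy sketch of EM for a general Gaussian mixture (full covariances, random-responsibility initialization, a fixed iteration count, and the small ridge added to the covariances are all my assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, rng=np.random.default_rng(0)):
    """EM for a k-component Gaussian mixture. X: (N, d). Returns priors, means, covs, h."""
    N, d = X.shape
    h = rng.random((N, k))                                   # soft indicators h_i^t, rows sum to 1
    h /= h.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: update P(G_i), mu_i, Sigma_i from the soft assignments h
        Nk = h.sum(axis=0)                                   # effective count per component
        priors = Nk / N
        means = (h.T @ X) / Nk[:, None]
        covs = []
        for i in range(k):
            diff = X - means[i]
            covs.append((h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d))
        # E-step: h_i^t = P(G_i) p(x^t|G_i) / sum_j P(G_j) p(x^t|G_j)
        dens = np.stack([priors[i] * multivariate_normal.pdf(X, mean=means[i], cov=covs[i])
                         for i in range(k)], axis=1)
        h = dens / dens.sum(axis=1, keepdims=True)
    return priors, means, covs, h
```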

Page 20: Dimensionality Reduction vs. Clustering

Dimensionality reduction methods find correlations between features and group features (e.g., age and income are correlated).

Clustering methods find similarities between instances and group instances (e.g., customers A and B are in the same cluster).

Page 21: Clustering: Usage for supervised learning

Describe the data in terms of clusters: represent all the data in a cluster by the cluster mean (or by the range of its attributes).

Map the data into a new space (preprocessing): with a d-dimensional original space and k clusters, use the indicator variables as the data representation; k may be larger than d. A sketch follows.
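A minimal sketch of this preprocessing step, using hard one-hot indicators over the reference vectors (soft EM responsibilities h would be the other natural choice; the helper name is mine):

```python
import numpy as np

def cluster_features(X, M):
    """Map each instance x^t to a k-dimensional indicator vector b^t over the reference vectors M."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    B = np.zeros((len(X), len(M)))
    B[np.arange(len(X)), labels] = 1.0        # b_i^t = 1 for the nearest cluster, 0 otherwise
    return B                                  # new (possibly k > d dimensional) representation
```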

Page 22: Mixture of Mixtures

In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:

$$p(\mathbf{x} \mid \mathcal{C}_i) = \sum_{j=1}^{k_i} p(\mathbf{x} \mid \mathcal{G}_{ij})\, P(\mathcal{G}_{ij})$$

$$p(\mathbf{x}) = \sum_{i=1}^{K} p(\mathbf{x} \mid \mathcal{C}_i)\, P(\mathcal{C}_i)$$

Page 23: Hierarchical Clustering

Probabilistic view:
- Fit a mixture model to the data
- Find codewords minimizing the reconstruction error

Hierarchical clustering:
- Group similar items together
- No specific model/distribution is assumed
- Items within a group are more similar to each other than to instances in different groups

Page 24: Hierarchical Clustering

Minkowski (Lp) distance (Euclidean for p = 2):

$$d_m(\mathbf{x}^r, \mathbf{x}^s) = \left[ \sum_{j=1}^{d} \left( x_j^r - x_j^s \right)^p \right]^{1/p}$$

City-block distance:

$$d_{cb}(\mathbf{x}^r, \mathbf{x}^s) = \sum_{j=1}^{d} \left| x_j^r - x_j^s \right|$$
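A minimal NumPy sketch of these two distances between instances x^r and x^s (the absolute value inside the Minkowski sum follows the usual convention):

```python
import numpy as np

def minkowski(xr, xs, p=2):
    """Minkowski (L_p) distance; p = 2 gives the Euclidean distance."""
    return float((np.abs(xr - xs) ** p).sum() ** (1.0 / p))

def city_block(xr, xs):
    """City-block (L_1) distance."""
    return float(np.abs(xr - xs).sum())
```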

Page 25: Agglomerative Clustering

Start with clusters each containing a single point; at each step, merge the most similar clusters (see the sketch below). Measures of similarity:
- Minimal distance (single link): distance between the closest points of the two groups
- Maximal distance (complete link): distance between the most distant points of the two groups
- Average distance: distance between the group centers
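A minimal, purely illustrative sketch of agglomerative clustering with a pluggable between-cluster distance (single link shown; a brute-force loop, not an efficient implementation):

```python
import numpy as np

def single_link(A, B):
    """Distance between the closest points of clusters A and B (single link)."""
    return min(np.linalg.norm(a - b) for a in A for b in B)

def agglomerative(X, n_clusters, link=single_link):
    """Merge the two closest clusters until n_clusters remain. X: (N, d)."""
    clusters = [[x] for x in X]                  # start: one point per cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest between-cluster distance
        pairs = [(link(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters
```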

Page 26: Example: Single-Link Clustering

(Figure: dendrogram.)

Page 27: Choosing k

Defined by the application, e.g., image quantization (the palette size).

Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears in the reconstruction error, log likelihood, or intergroup distances. A sketch of such a scan follows.
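A minimal sketch of scanning k and watching the reconstruction error flatten (reuses the kmeans helper sketched earlier; the synthetic data and the range of k are illustrative):

```python
import numpy as np

# Reconstruction error as a function of k; the k where the curve flattens is the "elbow".
X = np.random.default_rng(1).normal(size=(500, 3))   # illustrative data, e.g. pixel colors
for k in range(1, 11):
    M, labels = kmeans(X, k)                          # kmeans from the earlier sketch
    error = ((X - M[labels]) ** 2).sum()
    print(k, error)
```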