
CS6220: Data Mining Techniques

Instructor: Yizhou Sun (yzsun@ccs.neu.edu)

April 10, 2013

Chapter 11: Advanced Clustering Analysis

Chapter 10: Cluster Analysis: Basic Concepts and Methods

• Beyond K-Means

• K-means

• EM-algorithm

• Kernel K-means

• Clustering Graphs and Network Data

• Summary

2

Recall K-Means

• Objective function

• $J = \sum_{j=1}^{k} \sum_{i:\, C(i)=j} \|x_i - c_j\|^2$

• Total within-cluster variance

• Re-arrange the objective function

• $J = \sum_{j=1}^{k} \sum_{i} w_{ij} \|x_i - c_j\|^2$

• Where $w_{ij} = 1$ if $x_i$ belongs to cluster $j$; $w_{ij} = 0$ otherwise

• Looking for:

• The best assignment $w_{ij}$

• The best center $c_j$

3

Solution of K-Means

• Iterations

• Step 1: Fix centers $c_j$, find the assignment $w_{ij}$ that minimizes $J$

• => $w_{ij} = 1$ if $\|x_i - c_j\|^2$ is the smallest

• Step 2: Fix assignment $w_{ij}$, find the centers that minimize $J$

• => set the first derivative of $J$ to 0

• => $\frac{\partial J}{\partial c_j} = -2 \sum_{i} w_{ij}(x_i - c_j) = 0$

• => $c_j = \frac{\sum_i w_{ij} x_i}{\sum_i w_{ij}}$

• Note: $\sum_i w_{ij}$ is the total number of objects in cluster $j$

4
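To make the two alternating steps concrete, here is a minimal NumPy sketch of the iteration (an illustrative implementation, not code from the lecture; function and variable names are assumptions):

```python
# Minimal K-means sketch: alternate the assignment step (Step 1) and the
# center-update step (Step 2) until the centers stop moving.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # initial c_j
    for _ in range(n_iter):
        # Step 1: fix centers, assign each point to its closest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)                         # encodes w_ij
        # Step 2: fix assignments, recompute each center as the mean of
        # its assigned points (the closed-form minimizer of J)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```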

Limitations of K-Means

• K-means has problems when clusters are of differing:

• Sizes

• Densities

• Non-Spherical Shapes

11

Limitations of K-Means: Different Density and Size

12

Limitations of K-Means: Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

• Clustering methods discussed so far

• Every data object is assigned to exactly one cluster

• Some applications may need fuzzy or soft cluster assignment

• Ex.: an e-game could belong to both entertainment and software

• Methods: fuzzy clusters and probabilistic model-based clusters

• Fuzzy cluster: a fuzzy set $S$, $F_S: X \rightarrow [0, 1]$ (membership value between 0 and 1)

14

Probabilistic Model-Based Clustering

• Cluster analysis aims to find hidden categories

• A hidden category (i.e., probabilistic cluster) is a distribution over the data space, which can be mathematically represented using a probability density function (or distribution function)

• Ex.: categories for digital cameras sold: consumer line vs. professional line; density functions $f_1$, $f_2$ for $C_1$, $C_2$ are obtained by probabilistic clustering

• A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters, and conceptually each observed object is generated independently

• Our task: infer a set of $k$ probabilistic clusters that is most likely to generate $D$ using the above data generation process

15

Mixture Model-Based Clustering

• A set $C$ of $k$ probabilistic clusters $C_1, \ldots, C_k$ with probability density functions $f_1, \ldots, f_k$, respectively, and their probabilities $\omega_1, \ldots, \omega_k$

• Probability of an object $o$ being generated by cluster $C_j$: $\omega_j f_j(o)$

• Probability of $o$ being generated by the set of clusters $C$: $P(o \mid C) = \sum_{j=1}^{k} \omega_j f_j(o)$

• Since objects are assumed to be generated independently, for a data set $D = \{o_1, \ldots, o_n\}$ we have $P(D \mid C) = \prod_{i=1}^{n} P(o_i \mid C)$

• Task: find a set $C$ of $k$ probabilistic clusters such that $P(D \mid C)$ is maximized

16

The EM (Expectation Maximization) Algorithm

• The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models

• E-step: assigns objects to clusters according to the current fuzzy clustering or parameters of the probabilistic clusters

• $w_{ij}^t = p(z_i = j \mid \theta_j^t, x_i) \propto p(x_i \mid C_j^t, \theta_j^t)\, p(C_j^t)$

• M-step: finds the new clustering or parameters that minimize the sum of squared error (SSE) or maximize the expected likelihood

• Under univariate normal distribution assumptions:

• $\mu_j^{t+1} = \frac{\sum_i w_{ij}^t x_i}{\sum_i w_{ij}^t}$, $\quad \sigma_j^2 = \frac{\sum_i w_{ij}^t (x_i - c_j^t)^2}{\sum_i w_{ij}^t}$, $\quad p(C_j^t) \propto \sum_i w_{ij}^t$

• More about mixture models and EM algorithms: http://www.stat.cmu.edu/~cshalizi/350/lectures/29/lecture-29.pdf

17
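The E-step and M-step updates above can be turned directly into a short program. Below is a minimal sketch of EM for a univariate Gaussian mixture, assuming NumPy; it illustrates the updates on this slide and is not the lecture's own code:

```python
# Minimal EM sketch for a 1-D Gaussian mixture with k components.
import numpy as np

def em_gmm_1d(x, k, n_iter=100, seed=0):
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=k, replace=False)   # initial means mu_j
    var = np.full(k, x.var())                   # initial variances sigma_j^2
    pi = np.full(k, 1.0 / k)                    # initial priors p(C_j)
    for _ in range(n_iter):
        # E-step: responsibilities w_ij ∝ p(x_i | C_j) p(C_j), normalized over j
        lik = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        w = lik / lik.sum(axis=1, keepdims=True)
        # M-step: weighted-mean updates for mu_j, sigma_j^2 and p(C_j)
        nj = w.sum(axis=0)
        mu = (w * x[:, None]).sum(axis=0) / nj
        var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / nj + 1e-12
        pi = nj / n
    return w, mu, var, pi
```

Each iteration is guaranteed not to decrease the data likelihood; with a shared, shrinking spherical variance this procedure reduces to K-means, as the next slide notes.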

K-Means: Special Case of Gaussian Mixture Model

• When each Gaussian component has covariance matrix $\sigma^2 I$:

• Soft K-means

• When $\sigma^2 \rightarrow 0$:

• Soft assignment becomes hard assignment

18
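To see why the soft assignment hardens, here is a short derivation sketch, assuming equal mixture weights and spherical covariance $\sigma^2 I$ (a step the slide leaves implicit):

```latex
% Responsibilities under k equal-weight Gaussians N(c_j, sigma^2 I):
w_{ij} \;=\; \frac{\exp\!\left(-\|x_i - c_j\|^2 / 2\sigma^2\right)}
                  {\sum_{j'=1}^{k} \exp\!\left(-\|x_i - c_{j'}\|^2 / 2\sigma^2\right)}
\quad\xrightarrow{\ \sigma^2 \to 0\ }\quad
w_{ij} \;=\;
\begin{cases}
1 & \text{if } j = \arg\min_{j'} \|x_i - c_{j'}\|^2,\\
0 & \text{otherwise,}
\end{cases}
```

which is exactly the hard assignment rule of K-means from the earlier slide.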

Advantages and Disadvantages of Mixture Models

• Strengths

• Mixture models are more general than partitioning

• Clusters can be characterized by a small number of parameters

• The results may satisfy the statistical assumptions of the generative models

• Weaknesses

• Converges to a local optimum (overcome by running multiple times with random initialization)

• Computationally expensive if the number of distributions is large or the data set contains very few observed data points

• Needs large data sets

• Hard to estimate the number of clusters

19

Kernel K-Means

• How to cluster the following data?

• A non-linear map $\phi: R^n \rightarrow F$

• Maps a data point into a higher/infinite dimensional space

• $x \rightarrow \phi(x)$

• Dot product matrix $K_{ij}$

• $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$

20

Solution of Kernel K-Means

• Objective function under the new feature space

• $J = \sum_{j=1}^{k} \sum_{i} w_{ij} \|\phi(x_i) - c_j\|^2$

• Algorithm

• By fixing assignment $w_{ij}$:

• $c_j = \frac{\sum_i w_{ij} \phi(x_i)}{\sum_i w_{ij}}$

• In the assignment step, assign the data points to the closest center:

• $d(x_i, c_j) = \left\| \phi(x_i) - \frac{\sum_{i'} w_{i'j} \phi(x_{i'})}{\sum_{i'} w_{i'j}} \right\|^2 = \phi(x_i) \cdot \phi(x_i) - \frac{2 \sum_{i'} w_{i'j}\, \phi(x_i) \cdot \phi(x_{i'})}{\sum_{i'} w_{i'j}} + \frac{\sum_{i'} \sum_{l} w_{i'j} w_{lj}\, \phi(x_{i'}) \cdot \phi(x_l)}{\left(\sum_{i'} w_{i'j}\right)^2}$

21

Note: we do not really need to know $\phi(x)$, but only $K_{ij}$.
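Because the distance expansion above uses only dot products, one assignment step can be computed directly from the kernel matrix $K$. Here is a minimal NumPy sketch (illustrative, with assumed names; it also assumes every cluster is non-empty):

```python
# One kernel K-means assignment step computed purely from the kernel
# matrix K (no explicit phi(x)). The phi(x_i).phi(x_i) term is the same
# for every cluster j, so it can be dropped from the argmin.
import numpy as np

def kernel_kmeans_assign(K, labels, k):
    n = K.shape[0]
    W = np.zeros((n, k))
    W[np.arange(n), labels] = 1.0            # indicator matrix w_ij
    nj = W.sum(axis=0)                        # cluster sizes (assumed > 0)
    term2 = 2.0 * (K @ W) / nj                # 2 * sum_{i' in C_j} K[i, i'] / |C_j|
    term3 = np.einsum('aj,ab,bj->j', W, K, W) / nj ** 2   # pairwise cluster term
    return (-term2 + term3).argmin(axis=1)    # new cluster label for each point
```

Iterating this step with a fixed kernel matrix until the labels stop changing gives the full kernel K-means algorithm.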

Advantages and Disadvantages of Kernel K-Means

• Advantages

• The algorithm is able to identify non-linear structures

• Disadvantages

• The number of cluster centers needs to be predefined

• The algorithm is complex in nature, and its time complexity is large

• References

• Kernel k-means and Spectral Clustering, by Max Welling

• Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis

• An Introduction to Kernel Methods, by Colin Campbell

22

Chapter 10: Cluster Analysis: Basic Concepts and Methods

• Beyond K-Means

• K-means

• EM-algorithm for Mixture Models

• Kernel K-means

• Clustering Graphs and Network Data

• Summary

23

Clustering Graphs and Network Data

• Applications

• Bi-partite graphs, e.g., customers and products, authors and conferences

• Web search engines, e.g., click-through graphs and Web graphs

• Social networks, friendship/coauthor graphs

24

Clustering books about politics [Newman, 2006]

Algorithms:

• Graph clustering methods

• Density-based clustering: SCAN (Xu et al., KDD'2007)

• Spectral clustering

• Modularity-based approach

• Probabilistic approach

• Nonnegative matrix factorization

• …

25

SCAN: Density-Based Clustering of Networks

• How many clusters?

• What size should they be?

• What is the best partitioning?

• Should some points be segregated?

26

An Example Network

Application: given simply information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?

A Social Network Model

• Cliques, hubs and outliers

• Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group

• Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups

• Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group

• The Neighborhood of a Vertex

27

Define $\Gamma(v)$ as the immediate neighborhood of a vertex $v$ (i.e., the set of people that an individual knows).

Structure Similarity

• The desired features tend to be captured by a measure we call structural similarity

• Structural similarity is large for members of a clique and small for hubs and outliers

• $\sigma(v, w) = \frac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$

28

Structural Connectivity [1]

• $\varepsilon$-Neighborhood: $N_\varepsilon(v) = \{ w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \}$

• Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$

• Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$

• Structure reachable: transitive closure of direct structure reachability

• Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V: REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$

[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases"

29

Structure-Connected Clusters

• Structure-connected cluster $C$

• Connectivity: $\forall v, w \in C: CONNECT_{\varepsilon,\mu}(v, w)$

• Maximality: $\forall v, w \in V: v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$

• Hubs

• Do not belong to any cluster

• Bridge to many clusters

• Outliers

• Do not belong to any cluster

• Connect to fewer clusters

[Figure: example network highlighting a hub and an outlier]

30
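The definitions above translate almost directly into code. Here is a minimal Python sketch of the structural similarity, $\varepsilon$-neighborhood, and core tests, using an adjacency-set representation (the names and the closed-neighborhood convention $v \in \Gamma(v)$ are assumptions for illustration, not the paper's implementation):

```python
# Structural similarity and core check for SCAN-style clustering.
import math

def gamma(adj, v):
    """Closed neighborhood Gamma(v): v together with its neighbors."""
    return adj[v] | {v}

def sigma(adj, v, w):
    """Structural similarity sigma(v, w)."""
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    """N_eps(v): members of Gamma(v) with similarity >= eps."""
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def is_core(adj, v, eps, mu):
    """CORE_{eps,mu}(v): at least mu vertices in its eps-neighborhood."""
    return len(eps_neighborhood(adj, v, eps)) >= mu

# Tiny example: adj maps each vertex to the set of its neighbors
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sigma(adj, 0, 1), is_core(adj, 0, eps=0.7, mu=2))   # 1.0 True
```

A full SCAN implementation would then expand clusters from core vertices via direct structure reachability and label the leftover vertices as hubs or outliers, which is what the following slides walk through.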

[Slides 31–43: step-by-step run of the SCAN algorithm on an example network with vertices 0–13, using parameters μ = 2 and ε = 0.7. Each step computes structural similarities on the edges around the current vertex (values such as 0.63, 0.75, 0.67, 0.82, 0.73, 0.51, 0.68 appear on the edges), grows clusters from core vertices, and finally labels the remaining vertices as hubs or outliers.]

Running Time

• Running time = O(|E|)

• For sparse networks, = O(|V|)

[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)

44

Spectral Clustering

• Reference: ICDM'09 Tutorial by Chris Ding

• Example:

• Clustering Supreme Court justices according to their voting behavior

45

Example (Continued)

46

Spectral Graph Partition

• Min-Cut

• Minimize the number of cut edges

47
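As a concrete illustration of the min-cut idea, the standard spectral relaxation partitions a graph by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). A minimal NumPy sketch, assumed here for illustration rather than taken from the tutorial:

```python
# Spectral bisection: split a graph by the sign of the Fiedler vector.
import numpy as np

def spectral_bisection(A):
    """A: symmetric adjacency matrix. Returns a 0/1 partition label per node."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                      # unnormalized Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of 2nd-smallest eigenvalue
    return (fiedler >= 0).astype(int)       # sign pattern defines the two-way cut

# Two triangles joined by a single edge: the cut should separate them
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(spectral_bisection(A))                # e.g. [0 0 0 1 1 1] (labels may swap)
```

Minimizing the cut subject to balance constraints leads to the ratio-cut and normalized-cut objectives discussed on the following slides.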

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

• A Tutorial on Spectral Clustering, by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf

51

Chapter 10: Cluster Analysis: Basic Concepts and Methods

• Beyond K-Means

• K-means

• EM-algorithm

• Kernel K-means

• Clustering Graphs and Network Data

• Summary

52

Summary

• Generalizing K-Means

• Mixture Model, EM-Algorithm, Kernel K-Means

• Clustering Graph and Network Data

• SCAN: a density-based algorithm

• Spectral clustering

53

Announcement

• HW 3 due tomorrow

• Course project due next week

• Submit final report, data, code (with readme), evaluation forms

• Make an appointment with me to explain your project

• I will ask questions according to your report

• Final Exam

• 4/22, 3 hours, in class, covering the whole semester with different weights

• You can bring two A4 cheat sheets: one for content before the midterm and the other for content after the midterm

• Interested in research?

• My research area: information/social network mining

54

Page 2: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

2

Recall K-Means

bull Objective function

bull 119869 = ||119909119894 minus 119888119895||2

119862 119894 =119895119896119895=1

bull Total within-cluster variance

bull Re-arrange the objective function

bull 119869 = 119908119894119895||119909119894 minus 119888119895||2

119894119896119895=1

bull Where 119908119894119895 = 1 119894119891 119909119894 119887119890119897119900119899119892119904 119905119900 119888119897119906119904119905119890119903 119895 119908119894119895 = 0 119900119905ℎ119890119903119908119894119904119890

bull Looking for

bull The best assignment 119908119894119895

bull The best center 119888119895

3

Solution of K-Means

bull Iterations

bull Step 1 Fix centers 119888119895 find assignment 119908119894119895 that minimizes 119869

bull =gt 119908119894119895 = 1 119894119891 ||119909119894 minus 119888119895||2 is the smallest

bull Step 2 Fix assignment 119908119894119895 find centers that minimize 119869

bull =gt first derivative of 119869 = 0

bull =gt 120597119869

120597119888119895= minus2 119908119894119895(119909119894 minus 119888119895) =119894

119896119895=1 0

bull =gt119888119895 = 119908119894119895119909119894119894

119908119894119895119894

bull Note 119908119894119895119894 is the total number of objects in cluster j

4

Limitations of K-Means

bull K-means has problems when clusters are of differing

bull Sizes

bull Densities

bull Non-Spherical Shapes

11

Limitations of K-Means Different Density and Size

12

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 3: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Recall K-Means

bull Objective function

bull 119869 = ||119909119894 minus 119888119895||2

119862 119894 =119895119896119895=1

bull Total within-cluster variance

bull Re-arrange the objective function

bull 119869 = 119908119894119895||119909119894 minus 119888119895||2

119894119896119895=1

bull Where 119908119894119895 = 1 119894119891 119909119894 119887119890119897119900119899119892119904 119905119900 119888119897119906119904119905119890119903 119895 119908119894119895 = 0 119900119905ℎ119890119903119908119894119904119890

bull Looking for

bull The best assignment 119908119894119895

bull The best center 119888119895

3

Solution of K-Means

bull Iterations

bull Step 1 Fix centers 119888119895 find assignment 119908119894119895 that minimizes 119869

bull =gt 119908119894119895 = 1 119894119891 ||119909119894 minus 119888119895||2 is the smallest

bull Step 2 Fix assignment 119908119894119895 find centers that minimize 119869

bull =gt first derivative of 119869 = 0

bull =gt 120597119869

120597119888119895= minus2 119908119894119895(119909119894 minus 119888119895) =119894

119896119895=1 0

bull =gt119888119895 = 119908119894119895119909119894119894

119908119894119895119894

bull Note 119908119894119895119894 is the total number of objects in cluster j

4

Limitations of K-Means

bull K-means has problems when clusters are of differing

bull Sizes

bull Densities

bull Non-Spherical Shapes

11

Limitations of K-Means Different Density and Size

12

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 4: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Solution of K-Means

bull Iterations

bull Step 1 Fix centers 119888119895 find assignment 119908119894119895 that minimizes 119869

bull =gt 119908119894119895 = 1 119894119891 ||119909119894 minus 119888119895||2 is the smallest

bull Step 2 Fix assignment 119908119894119895 find centers that minimize 119869

bull =gt first derivative of 119869 = 0

bull =gt 120597119869

120597119888119895= minus2 119908119894119895(119909119894 minus 119888119895) =119894

119896119895=1 0

bull =gt119888119895 = 119908119894119895119909119894119894

119908119894119895119894

bull Note 119908119894119895119894 is the total number of objects in cluster j

4

Limitations of K-Means

bull K-means has problems when clusters are of differing

bull Sizes

bull Densities

bull Non-Spherical Shapes

11

Limitations of K-Means Different Density and Size

12

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 5: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Limitations of K-Means

bull K-means has problems when clusters are of differing

bull Sizes

bull Densities

bull Non-Spherical Shapes

11

Limitations of K-Means Different Density and Size

12

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application: Given only information about who associates with whom, could one identify clusters of individuals with common interests or special relationships (families, cliques, terrorist cells)?

A Social Network Model

• Cliques, hubs and outliers
  • Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group
  • Individuals who are hubs know many people in different groups but belong to no single group; politicians, for example, bridge multiple groups
  • Individuals who are outliers reside at the margins of society; hermits, for example, know few people and belong to no group
• The Neighborhood of a Vertex
  • Define Γ(v) as the immediate neighborhood of a vertex v (i.e., the set of people that an individual knows)

27

Structure Similarity

• The desired features tend to be captured by a measure we call Structural Similarity:

  $\sigma(v, w) = \dfrac{|\Gamma(v) \cap \Gamma(w)|}{\sqrt{|\Gamma(v)|\,|\Gamma(w)|}}$

• Structural similarity is large for members of a clique and small for hubs and outliers (a short computational sketch follows)

28
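A minimal sketch of this measure, assuming a networkx graph and, as in the SCAN paper, taking Γ(x) to be the closed neighborhood (x together with its neighbors); the toy graph is made up for illustration:

    import networkx as nx

    def structural_similarity(G, v, w):
        """sigma(v, w) = |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|)."""
        gv = set(G[v]) | {v}          # Gamma(v): v together with its neighbors
        gw = set(G[w]) | {w}
        return len(gv & gw) / (len(gv) * len(gw)) ** 0.5

    # Tiny example: a triangle plus one pendant vertex.
    G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(structural_similarity(G, 1, 2))   # high: 1 and 2 share their whole clique
    print(structural_similarity(G, 3, 4))   # lower: 4 only hangs off the clique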

Structural Connectivity [1]

• ε-Neighborhood: $N_\varepsilon(v) = \{\, w \in \Gamma(v) \mid \sigma(v, w) \ge \varepsilon \,\}$
• Core: $CORE_{\varepsilon,\mu}(v) \Leftrightarrow |N_\varepsilon(v)| \ge \mu$
• Direct structure reachable: $DirREACH_{\varepsilon,\mu}(v, w) \Leftrightarrow CORE_{\varepsilon,\mu}(v) \wedge w \in N_\varepsilon(v)$
• Structure reachable: transitive closure of direct structure reachability
• Structure connected: $CONNECT_{\varepsilon,\mu}(v, w) \Leftrightarrow \exists u \in V : REACH_{\varepsilon,\mu}(u, v) \wedge REACH_{\varepsilon,\mu}(u, w)$

[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases"

29

Structure-Connected Clusters

• Structure-connected cluster C
  • Connectivity: $\forall v, w \in C : CONNECT_{\varepsilon,\mu}(v, w)$
  • Maximality: $\forall v, w \in V : v \in C \wedge REACH_{\varepsilon,\mu}(v, w) \Rightarrow w \in C$
• Hubs
  • Do not belong to any cluster
  • Bridge to many clusters
• Outliers
  • Do not belong to any cluster
  • Connect to fewer clusters

(Figure: example network with a hub and an outlier highlighted.)

30

Algorithm (slides 31–43)

(Figures: a step-by-step run of SCAN on the 14-node example network (nodes 0–13) with parameters μ = 2 and ε = 0.7. At each step the structural similarity between the current vertex and its neighbors is computed; the slides show values such as 0.63, 0.75, 0.67, 0.82, 0.73, 0.51 and 0.68. Vertices whose ε-neighborhood reaches size μ become cores and grow structure-connected clusters, and the remaining vertices are finally labeled as hubs or outliers. A simplified code sketch of this procedure follows.)

43
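Below is a hedged, simplified sketch of the procedure illustrated above, implementing the ε-neighborhood, core, and reachability definitions from slides 29–30. It is not the authors' implementation; it assumes a networkx-style graph and restates the structural similarity from slide 28:

    import networkx as nx

    def sigma(G, v, w):
        # Structural similarity from slide 28, with Gamma(x) = {x} plus x's neighbors.
        gv, gw = set(G[v]) | {v}, set(G[w]) | {w}
        return len(gv & gw) / (len(gv) * len(gw)) ** 0.5

    def scan(G, eps=0.7, mu=2):
        """Return (cluster label per node, hubs, outliers) for a simplified SCAN."""
        eps_nbrs = {v: {w for w in G[v] if sigma(G, v, w) >= eps} for v in G}
        cluster = {v: None for v in G}
        cid = 0
        for v in G:
            if cluster[v] is not None or len(eps_nbrs[v]) < mu:
                continue                      # already clustered, or not a core
            cid += 1                          # grow a new cluster from core v
            cluster[v] = cid
            frontier = [v]
            while frontier:
                u = frontier.pop()
                if len(eps_nbrs[u]) < mu:
                    continue                  # u is a border vertex, do not expand
                for w in eps_nbrs[u]:         # w is directly structure-reachable from u
                    if cluster[w] is None:
                        cluster[w] = cid
                        frontier.append(w)
        unclassified = [v for v in G if cluster[v] is None]
        hubs = [v for v in unclassified
                if len({cluster[w] for w in G[v] if cluster[w] is not None}) >= 2]
        outliers = [v for v in unclassified if v not in hubs]
        return cluster, hubs, outliers

On a network like the one in the figures, with ε = 0.7 and μ = 2, this yields the dense groups as clusters and leaves bridge vertices as hubs and peripheral vertices as outliers; the exact labels depend on the example's edge list.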

Running Time

• Running time = O(|E|)
• For sparse networks: O(|V|)

[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)

44

Spectral Clustering

• Reference: ICDM'09 Tutorial by Chris Ding
• Example
  • Clustering Supreme Court justices according to their voting behavior

45

Example (Continued)

46

Spectral Graph Partition

• Min-Cut
  • Minimize the # of cut edges (a code sketch of the spectral relaxation follows slide 50)

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50
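The min-cut objective and its constrained variants are usually made tractable by a spectral relaxation: relax the cluster-indicator vector to be real-valued, and the optimum becomes an eigenvector of the graph Laplacian. A minimal NumPy sketch under that interpretation (an illustrative choice, not code from the tutorial slides); the unnormalized Laplacian and the median split are the simplest variant, while normalized-cut objectives use D^{-1/2}(D - A)D^{-1/2} instead:

    import numpy as np

    def spectral_bipartition(A):
        """Two-way partition from the relaxed balanced min-cut objective.
        A: symmetric adjacency (or affinity) matrix; returns a 0/1 label per node."""
        d = A.sum(axis=1)
        L = np.diag(d) - A                 # unnormalized graph Laplacian L = D - A
        vals, vecs = np.linalg.eigh(L)     # eigenvalues/vectors in ascending order
        fiedler = vecs[:, 1]               # 2nd-smallest eigenvector (Fiedler vector)
        return (fiedler > np.median(fiedler)).astype(int)  # balanced split of the relaxed indicator

    # Example: two 4-cliques joined by a single edge are split along that edge.
    A = np.zeros((8, 8))
    A[:4, :4] = 1
    A[4:, 4:] = 1
    np.fill_diagonal(A, 0)
    A[3, 4] = A[4, 3] = 1
    print(spectral_bipartition(A))         # e.g. [0 0 0 0 1 1 1 1]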

Other References

• "A Tutorial on Spectral Clustering" by U. von Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf

51

Chapter 10. Cluster Analysis: Basic Concepts and Methods

• Beyond K-Means
  • K-means
  • EM-algorithm
  • Kernel K-means
• Clustering Graphs and Network Data
• Summary

52

Summary

• Generalizing K-Means
  • Mixture Model; EM-Algorithm; Kernel K-Means
• Clustering Graph and Networked Data
  • SCAN: density-based algorithm
  • Spectral clustering

53

Announcement

• HW 3 due tomorrow
• Course project due next week
  • Submit final report, data, code (with readme), evaluation forms
  • Make an appointment with me to explain your project
    • I will ask questions according to your report
• Final Exam
  • 4/22, 3 hours, in class; covers the whole semester with different weights
  • You can bring two A4 cheat sheets, one for content before the midterm and the other for content after the midterm
• Interested in research?
  • My research area: information/social network mining

54

Page 6: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Limitations of K-Means Different Density and Size

12

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 7: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Limitations of K-Means Non-Spherical Shapes

13

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 8: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Fuzzy Set and Fuzzy Cluster

bull Clustering methods discussed so far

bull Every data object is assigned to exactly one cluster

bull Some applications may need for fuzzy or soft cluster assignment

bull Ex An e-game could belong to both entertainment

and software

bull Methods fuzzy clusters and probabilistic model-based clusters

bull Fuzzy cluster A fuzzy set S FS X rarr [0 1] (value between 0 and 1)

14

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 9: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Probabilistic Model-Based Clustering

bull Cluster analysis is to find hidden categories

bull A hidden category (ie probabilistic cluster) is a distribution over the data space which can be mathematically represented using a probability density function (or distribution function)

Ex categories for digital cameras sold

consumer line vs professional line

density functions f1 f2 for C1 C2

obtained by probabilistic clustering

A mixture model assumes that a set of observed objects is a mixture

of instances from multiple probabilistic clusters and conceptually

each observed object is generated independently

Our task infer a set of k probabilistic clusters that is mostly likely to

generate D using the above data generation process

15

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 10: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

16

Mixture Model-Based Clustering

bull A set C of k probabilistic clusters C1 hellipCk with probability density functions f1

hellip fk respectively and their probabilities ω1 hellip ωk

bull Probability of an object o generated by cluster Cj is

bull Probability of o generated by the set of cluster C is

Since objects are assumed to be generated

independently for a data set D = o1 hellip on we have

Task Find a set C of k probabilistic clusters st P(D|C) is maximized

The EM (Expectation Maximization) Algorithm

bull The (EM) algorithm A framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models bull E-step assigns objects to clusters according to the current fuzzy

clustering or parameters of probabilistic clusters

bull 119908119894119895119905 = 119901 119911119894 = 119895 120579119895

119905 119909119894 prop 119901 119909119894 119862119895119905 120579119895

119905 119901(119862119895119905)

bull M-step finds the new clustering or parameters that minimize the sum of squared error (SSE) or the expected likelihood bull Under uni-variant normal distribution assumptions

bull 120583119895119905+1 =

119908119894119895119905 119909119894119894

119908119894119895119905

119894 120590119895

2 = 119908119894119895

119905 119909119894minus1198881198951199052

119894

119908119894119895119905

119894 119901 119862119895

119905 prop 119908119894119895119905

119894

bull More about mixture model and EM algorithms httpwwwstatcmuedu~cshalizi350lectures29lecture-29pdf

17

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 12: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

K-Means Special Case of Gaussian Mixture Model

bull When each Gaussian component with covariance matrix 1205902119868

bull Soft K-means

bull When 1205902 rarr 0

bull Soft assignment becomes hard assignment

18

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 13: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Advantages and Disadvantages of Mixture Models

bull Strength

bull Mixture models are more general than partitioning

bull Clusters can be characterized by a small number of parameters

bull The results may satisfy the statistical assumptions of the generative models

bull Weakness

bull Converge to local optimal (overcome run multi-times w random

initialization)

bull Computationally expensive if the number of distributions is large or the

data set contains very few observed data points

bull Need large data sets

bull Hard to estimate the number of clusters

19

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 14: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Kernel K-Means

bull How to cluster the following data

bull A non-linear map 120601119877119899 rarr 119865

bull Map a data point into a higherinfinite dimensional space

bull 119909 rarr 120601 119909

bull Dot product matrix 119870119894119895

bull 119870119894119895 =lt 120601 119909119894 120601(119909119895) gt

20

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 15: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Solution of Kernel K-Means

bull Objective function under new feature space

bull 119869 = 119908119894119895||120601(119909119894) minus 119888119895||2

119894119896119895=1

bull Algorithm

bull By fixing assignment 119908119894119895

bull 119888119895 = 119908119894119895119894 120601(119909119894) 119908119894119895119894

bull In the assignment step assign the data points to the closest

center

bull 119889 119909119894 119888119895 = 120601 119909119894 minus 119908119894prime119895119894prime 120601 119909119894prime

119908119894prime119895119894prime

2

=

120601 119909119894 sdot 120601 119909119894 minus 2 119908

119894prime119895119894prime120601 119909119894 sdot120601 119909

119894prime

119908119894prime119895119894prime+ 119908

119894prime119895119908119897119895120601 119909

119894primesdot120601(119909119897)119897119894prime

( 119908119894prime119895)^2 119894prime

21

Do not really need to know 120601 119909 119887119906119905 119900119899119897119910 119870119894119895

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 16: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Advatanges and Disadvantages of Kernel K-Means

bull Advantages

bull Algorithm is able to identify the non-linear structures

bull Disadvantages

bull Number of cluster centers need to be predefined

bull Algorithm is complex in nature and time complexity is large

bull References

bull Kernel k-means and Spectral Clustering by Max Welling

bull Kernel k-means Spectral Clustering and Normalized Cut by

Inderjit S Dhillon Yuqiang Guan and Brian Kulis

bull An Introduction to kernel methods by Colin Campbell

22

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 17: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm for Mixture Models

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

23

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 18: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Clustering Graphs and Network Data

bull Applications

bull Bi-partite graphs eg customers and products authors and conferences

bull Web search engines eg click through graphs and Web graphs

bull Social networks friendshipcoauthor graphs

24

Clustering books about politics [Newman 2006]

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model

bull Cliques hubs and outliers

bull Individuals in a tight social group or clique know many of the same

people regardless of the size of the group

bull Individuals who are hubs know many people in different groups but belong

to no single group Politicians for example bridge multiple groups

bull Individuals who are outliers reside at the margins of society Hermits for

example know few people and belong to no group

bull The Neighborhood of a Vertex

27

v

Define () as the immediate

neighborhood of a vertex (ie the set

of people that an individual knows )

Structure Similarity

bull The desired features tend to be captured by a measure we

call Structural Similarity

bull Structural similarity is large for members of a clique and small

for hubs and outliers

|)(||)(|

|)()(|)(

wv

wvwv

28

v

Structural Connectivity [1]

bull -Neighborhood

bull Core

bull Direct structure reachable

bull Structure reachable transitive closure of direct structure

reachability

bull Structure connected

)(|)()( wvvwvN

|)(|)( vNvCORE

)()()( vNwvCOREwvDirRECH

)()()( wuRECHvuRECHVuwvCONNECT

[1] M Ester H P Kriegel J Sander amp X Xu (KDD96) ldquoA Density-Based

Algorithm for Discovering Clusters in Large Spatial Databases

29

Structure-Connected Clusters

bull Structure-connected cluster C

bull Connectivity

bull Maximality

bull Hubs

bull Not belong to any cluster

bull Bridge to many clusters

bull Outliers

bull Not belong to any cluster

bull Connect to less clusters

)( wvCONNECTCwv

CwwvREACHCvVwv )(

hub

outlier

30

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

31

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

063

32

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

075

067

082

33

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

34

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

067

35

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

073

073

073

36

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

37

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

38

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

068

39

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

051

40

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

41

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07 051

051

068

42

13

9

10

11

7

8

12

6

4

0

1 5

2

3

Algorithm

= 2

= 07

43

Running Time

bull Running time = O(|E|) bull For sparse networks = O(|V|)

[2] A Clauset M E J Newman amp C Moore Phys Rev E 70 066111 (2004) 44

Spectral Clustering

bull Reference ICDMrsquo09 Tutorial by Chris Ding

bull Example

bull Clustering supreme court justices according to their voting

behavior

45

Example Continue

46

Spectral Graph Partition

bull Min-Cut

bull Minimize the of cut of edges

47

Objective Function

48

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 19: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Algorithms bull Graph clustering methods

bull Density-based clustering SCAN (Xu et al KDDrsquo2007)

bull Spectral clustering

bull Modularity-based approach

bull Probabilistic approach

bull Nonnegative matrix factorization

bull hellip

25

SCAN Density-Based Clustering of Networks

bull How many clusters

bull What size should they be

bull What is the best partitioning

bull Should some points be segregated

26

An Example Network

Application Given simply information of who associates with whom

could one identify clusters of individuals with common interests or

special relationships (families cliques terrorist cells)

A Social Network Model
• Cliques, hubs and outliers
  • Individuals in a tight social group, or clique, know many of the same people, regardless of the size of the group.
  • Individuals who are hubs know many people in different groups but belong to no single group. Politicians, for example, bridge multiple groups.
  • Individuals who are outliers reside at the margins of society. Hermits, for example, know few people and belong to no group.
• The Neighborhood of a Vertex

27

Define Γ(v) as the immediate neighborhood of a vertex v (i.e., the set of people that an individual knows).

Structure Similarity
• The desired features tend to be captured by a measure we call Structural Similarity:

  σ(v, w) = |Γ(v) ∩ Γ(w)| / √(|Γ(v)| · |Γ(w)|)

• Structural similarity is large for members of a clique and small for hubs and outliers.

28

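To make the measure concrete, here is a minimal Python sketch (illustrative only, not code from the lecture; the adjacency dict and the toy graph are assumptions made up for this note). Following the SCAN paper, Γ(v) is taken to include v itself.

import math

def gamma(adj, v):
    # Closed neighborhood: v together with its direct neighbors (SCAN's Gamma(v)).
    return adj[v] | {v}

def sigma(adj, v, w):
    # Structural similarity: |Gamma(v) & Gamma(w)| / sqrt(|Gamma(v)| * |Gamma(w)|).
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

# Toy undirected graph as an adjacency dict (hypothetical, not the slides' example network).
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
print(round(sigma(adj, 0, 1), 2))   # 1.0  -- two members of the tight group {0, 1, 2, 3}
print(round(sigma(adj, 3, 4), 2))   # 0.52 -- group member vs. the vertex that bridges outward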

Structural Connectivity [1]
• ε-Neighborhood: N_ε(v) = { w ∈ Γ(v) | σ(v, w) ≥ ε }
• Core: CORE_ε,μ(v) ⇔ |N_ε(v)| ≥ μ
• Direct structure reachable: DirREACH_ε,μ(v, w) ⇔ CORE_ε,μ(v) ∧ w ∈ N_ε(v)
• Structure reachable: transitive closure of direct structure reachability
• Structure connected: CONNECT_ε,μ(v, w) ⇔ ∃ u ∈ V : REACH_ε,μ(u, v) ∧ REACH_ε,μ(u, w)

[1] M. Ester, H.-P. Kriegel, J. Sander & X. Xu (KDD'96), "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases"

29
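These predicates are easy to state in code. A short sketch under the same assumptions as the similarity sketch above (the two helpers are repeated so the block runs on its own; ε and μ are passed in as parameters):

import math

def gamma(adj, v):
    return adj[v] | {v}

def sigma(adj, v, w):
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    # N_eps(v) = { w in Gamma(v) : sigma(v, w) >= eps }
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def is_core(adj, v, eps, mu):
    # CORE_eps,mu(v)  <=>  |N_eps(v)| >= mu
    return len(eps_neighborhood(adj, v, eps)) >= mu

def directly_reachable(adj, v, w, eps, mu):
    # DirREACH_eps,mu(v, w)  <=>  CORE_eps,mu(v) and w in N_eps(v)
    return is_core(adj, v, eps, mu) and w in eps_neighborhood(adj, v, eps)

Structure reachability is then the transitive closure of directly_reachable; the cluster-growing sketch after the algorithm walkthrough below realizes it with a breadth-first expansion from core vertices instead of computing the closure explicitly.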

Structure-Connected Clusters
• Structure-connected cluster C
  • Connectivity: ∀ v, w ∈ C : CONNECT_ε,μ(v, w)
  • Maximality: ∀ v, w ∈ V : v ∈ C ∧ REACH_ε,μ(v, w) ⇒ w ∈ C
• Hubs
  • Do not belong to any cluster
  • Bridge to many clusters
• Outliers
  • Do not belong to any cluster
  • Connect to fewer clusters

30

Algorithm

[Slides 31-43 step through SCAN on the example network of 14 vertices (labeled 0-13) with parameter values 2 and 0.7 (μ = 2 and ε = 0.7 in the notation above); the figures themselves are not preserved in this transcript. Each step computes the structural similarities of the current vertex to its neighbors (values such as 0.63, 0.75, 0.67, 0.82, 0.73, 0.68 and 0.51 appear on the slides), checks the ε-neighborhood and core condition, and grows clusters from core vertices, leaving the remaining vertices to be labeled hubs or outliers.]

43
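Putting the definitions of the last few slides together, here is a compact, self-contained Python sketch of the SCAN procedure (illustrative only: the graph below is a made-up assumption, and for brevity neighborhoods are recomputed on demand rather than once per edge as an efficient implementation would do). Clusters are grown from core vertices by breadth-first expansion over direct structure reachability; unassigned vertices that touch two or more clusters become hubs, the rest become outliers.

import math
from collections import deque

def gamma(adj, v):
    return adj[v] | {v}

def sigma(adj, v, w):
    gv, gw = gamma(adj, v), gamma(adj, w)
    return len(gv & gw) / math.sqrt(len(gv) * len(gw))

def eps_neighborhood(adj, v, eps):
    return {w for w in gamma(adj, v) if sigma(adj, v, w) >= eps}

def scan(adj, eps=0.7, mu=2):
    """Toy SCAN: returns (list of clusters, hubs, outliers)."""
    cluster_of, clusters = {}, []
    for v in adj:
        if v in cluster_of or len(eps_neighborhood(adj, v, eps)) < mu:
            continue                       # already clustered, or not a core vertex
        cid = len(clusters)                # v is an unclassified core: open a new cluster
        clusters.append(set())
        queue = deque([v])
        while queue:
            u = queue.popleft()
            n_eps = eps_neighborhood(adj, u, eps)
            if len(n_eps) < mu:
                continue                   # u joined the cluster but is not a core: do not expand through it
            for w in n_eps:                # every such w is directly structure-reachable from the core u
                if w not in cluster_of:
                    cluster_of[w] = cid
                    clusters[cid].add(w)
                    queue.append(w)
    hubs, outliers = set(), set()
    for v in adj:
        if v in cluster_of:
            continue
        touched = {cluster_of[w] for w in adj[v] if w in cluster_of}
        (hubs if len(touched) > 1 else outliers).add(v)
    return clusters, hubs, outliers

# Hypothetical network: two tight groups, vertex 6 bridging them, vertex 7 dangling off one group.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 6, 7},
       4: {5, 6, 8}, 5: {4, 8}, 6: {3, 4}, 7: {3}, 8: {4, 5}}
print(scan(adj, eps=0.7, mu=2))   # clusters {0,1,2,3} and {4,5,8}; hub {6}; outlier {7}

With ε = 0.7 and μ = 2 (the parameter values used in the slides' walkthrough), the bridge vertex is similar enough to neither group to join it, so it ends up as a hub, while the dangling vertex becomes an outlier.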

Running Time
• Running time = O(|E|)
• For sparse networks: O(|V|)

[2] A. Clauset, M. E. J. Newman & C. Moore, Phys. Rev. E 70, 066111 (2004)

44

Spectral Clustering
• Reference: ICDM'09 Tutorial by Chris Ding
• Example
  • Clustering supreme court justices according to their voting behavior

45

Example (Continued)

46

Spectral Graph Partition
• Min-Cut
  • Minimize the # of cut edges

47

Objective Function

48
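The equation on this slide did not survive the transcript. As a reconstruction from the standard spectral-partitioning literature (see the tutorial listed under Other References below), and not necessarily the slide's exact notation, the min-cut objective for a weighted graph with edge weights w_ij and a 2-way partition (A, B) is

  cut(A, B) = Σ_{i∈A, j∈B} w_ij,

i.e., the total weight of the edges that the partition severs.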

Minimum Cut with Constraints

49

New Objective Functions

50
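The objectives typically introduced at this point in the spectral-clustering story (not necessarily the exact ones on the slide) normalize the cut by cluster size or volume, RatioCut(A, B) = cut(A, B) · (1/|A| + 1/|B|) and Ncut(A, B) = cut(A, B) · (1/vol(A) + 1/vol(B)), which prevents the trivial solution of splitting off a single vertex; their relaxations are solved with eigenvectors of a graph Laplacian. A small illustrative sketch (assumptions: NumPy is available; unnormalized Laplacian and a 2-way split by the sign of the Fiedler vector, the standard relaxation described in the von Luxburg tutorial cited below, not necessarily the variant used in the lecture):

import numpy as np

def fiedler_partition(W):
    """2-way spectral split of a graph given its symmetric weight matrix W."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                 # unnormalized graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)        # eigenvectors, eigenvalues in ascending order
    fiedler = vecs[:, 1]               # eigenvector of the second-smallest eigenvalue
    return fiedler >= 0                # the sign pattern gives the two sides of the cut

# Two triangles joined by one weak edge (hypothetical example).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                # the bridge edge the cut should sever
print(fiedler_partition(W))            # e.g. [ True  True  True False False False] (sign may flip)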

Other References
• A Tutorial on Spectral Clustering, by U. Luxburg: http://www.kyb.mpg.de/fileadmin/user_upload/files/publications/attachments/Luxburg07_tutorial_4488%5B0%5D.pdf

51

Chapter 10. Cluster Analysis: Basic Concepts and Methods
• Beyond K-Means
  • K-means
  • EM-algorithm
  • Kernel K-means
• Clustering Graphs and Network Data
• Summary

52

Summary
• Generalizing K-Means
  • Mixture Model, EM-Algorithm, Kernel K-Means
• Clustering Graph and Networked Data
  • SCAN: density-based algorithm
  • Spectral clustering

53

Announcement
• HW 3 due tomorrow
• Course project due next week
  • Submit final report, data, code (with readme), evaluation forms
  • Make an appointment with me to explain your project
  • I will ask questions according to your report
• Final Exam
  • 4/22, 3 hours, in class; covers the whole semester with different weights
  • You can bring two A4 cheat sheets, one for content before the midterm and the other for content after the midterm
• Interested in research?
  • My research area: information/social network mining

54


bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 43: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Minimum Cut with Constraints

49

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 44: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

New Objective Functions

50

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 45: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Other References

bull A Tutorial on Spectral Clustering by U Luxburg httpwwwkybmpgdefileadminuser_uploadfilespublicationsattachmentsLuxburg07_tutorial_44885B05Dpdf

51

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 46: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Chapter 10 Cluster Analysis Basic Concepts and Methods

bull Beyond K-Means

bull K-means

bull EM-algorithm

bull Kernel K-means

bull Clustering Graphs and Network Data

bull Summary

52

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 47: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Summary

bull Generalizing K-Means

bull Mixture Model EM-Algorithm Kernel K-Means

bull Clustering Graph and Networked Data

bull SCAN density-based algorithm

bull Spectral clustering

53

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54

Page 48: CS6220: Data Mining Techniques - UCLAweb.cs.ucla.edu/~yzsun/classes/2013Spring_CS6220/slides/... · 2013-04-10 · Probabilistic Model-Based Clustering • Cluster analysis is to

Announcement

bull HW 3 due tomorrow

bull Course project due next week bull Submit final report data code (with readme) evaluation forms

bull Make appointment with me to explain your project

bull I will ask questions according to your report

bull Final Exam bull 422 3 hours in class cover the whole semester with different

weights

bull You can bring two A4 cheating sheets one for content before midterm and the other for content after midterm

bull Interested in research bull My research area Informationsocial network mining

54