Clustering: Hierarchical Clustering and K-Means Clustering
Machine Learning 10-601B, Seyoung Kim

Transcript of the lecture slides (10601b/slides/clustering.pdf)

Page 1:

Clustering: Hierarchical Clustering and K-Means Clustering

Machine Learning 10-601B

Seyoung Kim

Many of these slides are derived from William Cohen, Ziv Bar-Joseph, Eric Xing. Thanks!

Page 2:

Two Classes of Learning Problems

• Supervised learning = learning from labeled data, where class labels (in classification) or output values (regression) are given
  – Train data: (X, Y) for inputs X and labels Y

• Unsupervised learning = learning from unlabeled, unannotated data
  – Train data: X for unlabeled data – we do not have a teacher that provides examples with their labels

Page 3:

• Organizing data into clusters such that there is
  • high intra-cluster similarity
  • low inter-cluster similarity

• Informally, finding natural groupings among objects.

• Why do we want to do that?

• Any REAL application?

Page 4:

Examples

• People

• Images

• Language

• Species

Page 5:

Unsupervised learning

• Clustering methods
  – Non-probabilistic method
    • Hierarchical clustering
    • K-means algorithm
  – Probabilistic method
    • Mixture model

• We will also discuss dimensionality reduction, another unsupervised learning method, later in the course

Page 6:

"The quality or state of being similar; likeness; resemblance; as, a similarity of features."
– Webster's Dictionary

Similarity is hard to define, but… "We know it when we see it"

The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.

Page 7:

Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).

[Figure: two gene expression profiles (gene1, gene2) with example distance values 0.23, 3, 342.7]

Page 8:

What properties should a distance measure have?

• D(A,B) = D(B,A)              Symmetry

• D(A,A) = 0                   Constancy of Self-Similarity

• D(A,B) = 0 iff A = B         Positivity (Separation)

• D(A,B) ≤ D(A,C) + D(B,C)     Triangular Inequality
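These four properties can be spot-checked numerically. The snippet below is a minimal sketch (not from the slides) that verifies each axiom for the Euclidean distance introduced a couple of slides later; the helper name `euclidean` and the random test vectors are ours for illustration, assuming NumPy is available.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.sqrt(np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 5))        # three random objects with p = 5 features

assert np.isclose(euclidean(A, B), euclidean(B, A))            # symmetry
assert np.isclose(euclidean(A, A), 0.0)                        # constancy of self-similarity
assert euclidean(A, B) > 0                                     # positivity (A != B here)
assert euclidean(A, B) <= euclidean(A, C) + euclidean(B, C)    # triangular inequality
```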

Page 9:

Intuitions behind desirable distance measure properties

• D(A,B) = D(B,A)    Symmetry
  – Otherwise you could claim "Alex looks like Bob, but Bob looks nothing like Alex."

• D(A,A) = 0    Constancy of Self-Similarity
  – Otherwise you could claim "Alex looks more like Bob, than Bob does."

• D(A,B) = 0 iff A = B    Positivity (Separation)
  – Otherwise there are objects in your world that are different, but you cannot tell apart.

• D(A,B) ≤ D(A,C) + D(B,C)    Triangular Inequality
  – Otherwise you could claim "Alex is very like Bob, and Alex is very like Carl, but Bob is very unlike Carl."

Page 10:

Distance Measures

• Suppose two objects x and y both have p features.

• Euclidean distance:

  $d(x, y) = \left( \sum_{i=1}^{p} | x_i - y_i |^2 \right)^{1/2}$

• Correlation coefficient
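As a concrete illustration (not from the slides), the snippet below computes both measures for two made-up expression profiles; the helper names and the numbers are ours, assuming NumPy is available.

```python
import numpy as np

def euclidean_distance(x, y):
    """d(x, y) = ( sum_i |x_i - y_i|^2 )^(1/2) over the p features."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.sum((x - y) ** 2))

def correlation_similarity(x, y):
    """Pearson correlation coefficient; near 1 when the two profiles have the same shape."""
    return np.corrcoef(x, y)[0, 1]

# Two toy expression profiles measured at p = 5 time points (made-up values).
gene_a = [1.0, 2.0, 3.0, 2.5, 4.0]
gene_b = [2.0, 4.0, 6.0, 5.0, 8.0]   # same shape as gene_a, but twice the magnitude

print(euclidean_distance(gene_a, gene_b))      # fairly large: the absolute levels differ
print(correlation_similarity(gene_a, gene_b))  # 1.0: the profiles are perfectly correlated
```

This is why correlation is often preferred for expression data: two genes can have very different absolute levels yet nearly identical profiles over time, as the next slide illustrates.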

Page 11:

Similarity Measures: Correlation Coefficient

[Figure: expression levels of Gene A and Gene B over time, illustrating similarity measured by the correlation of their profiles]

Page 12:

• Hierarchical algorithms: Create a hierarchical decomposition of the set of objects using some criterion (bottom up or top down)

• Partitional algorithms: Construct various partitions and then evaluate them by some criterion (top down)

Page 13:

(How-to) Hierarchical Clustering

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
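For a hands-on feel (not part of the slides), the agglomerative procedure can be run with SciPy's hierarchical-clustering routines; the toy points below are made up, and the linkage `method` corresponds to the cluster-distance rules discussed a few slides later.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy data: 6 points in 2-D (made-up values).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [5.2, 4.9], [9.0, 0.5]])

D = pdist(X, metric="euclidean")   # condensed pairwise-distance matrix (the starting point below)
Z = linkage(D, method="single")    # bottom-up merges; "complete" and "average" also work

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 flat clusters
print(labels)
```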

Page 14:

      O1   O2   O3   O4   O5
O1     0    8    8    7    7
O2          0    2    4    4
O3               0    3    3
O4                    0    1
O5                         0

D(  ,  ) = 8
D(  ,  ) = 1

We begin with a distance matrix which contains the distances between every pair of objects in our database.

Page 15:

… Consider all possible merges…

Choose the best

Page 16:

… Consider all possible merges…

Choose the best

Consider all possible merges… …

Choose the best

Page 17:

… Consider all possible merges…

Choose the best

Consider all possible merges… …

Choose the best

Consider all possible merges…

Choose the best …

Page 18:

… Consider all possible merges…

Choose the best

Consider all possible merges… …

Choose the best

Consider all possible merges…

Choose the best …

But how do we compute distances between clusters rather than objects?

Page 19:

Computing distance between clusters: Single Link

• cluster distance = distance of the two closest members, one in each cluster

- Potentially long and skinny clusters

Page 20:

Computing distance between clusters: Complete Link

• cluster distance = distance of two farthest members

+ tight clusters

Page 21:

Computing distance between clusters: Average Link

• cluster distance = average distance of all pairs

+ the most widely used measure
+ robust against noise
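To make the three rules concrete, here is a small sketch (not from the slides) of cluster-to-cluster distance under single, complete, and average linkage; the function `cluster_distance` and the toy clusters are ours for illustration, assuming NumPy is available.

```python
import numpy as np

def cluster_distance(C1, C2, link="single"):
    """Distance between two clusters (arrays of points) under a chosen linkage rule."""
    C1, C2 = np.asarray(C1, float), np.asarray(C2, float)
    # All pairwise Euclidean distances between members of C1 and members of C2.
    pairwise = np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=-1)
    if link == "single":      # distance of the two closest members
        return pairwise.min()
    if link == "complete":    # distance of the two farthest members
        return pairwise.max()
    if link == "average":     # average distance over all cross-cluster pairs
        return pairwise.mean()
    raise ValueError(f"unknown linkage: {link}")

A = [[0, 0], [0, 1]]          # toy cluster 1
B = [[3, 0], [5, 0]]          # toy cluster 2
for link in ("single", "complete", "average"):
    print(link, round(cluster_distance(A, B, link), 2))
```

Single link takes the minimum over all cross-cluster pairs, complete link the maximum, and average link the mean; this is why single link can chain into long, skinny clusters while complete link favors tight ones.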

Page 22:

Example: single link

[Figure: five data points labeled 1–5, to be clustered by successive single-link merges]

Page 23:

Example: single link

[Figure: the next single-link merge among points 1–5; the dendrogram grows by one level]

Page 24:

Example: single link

[Figure: the next single-link merge among points 1–5; the dendrogram grows by one level]

Page 25:

Example: single link

[Figure: the final single-link merges among points 1–5 and the completed dendrogram]

Page 26:

[Figure: dendrograms produced by average linkage and by single linkage on the same data]

Height represents distance between objects / clusters
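A quick way to reproduce this kind of side-by-side comparison is sketched below, under the assumption that SciPy and matplotlib are available; the 1-D toy data is made up.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Toy 1-D data (made up): two tight groups plus one stray point.
X = np.array([[1.0], [1.2], [3.0], [3.3], [8.0]])

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, method in zip(axes, ["average", "single"]):
    dendrogram(linkage(X, method=method), ax=ax)
    ax.set_title(f"{method} linkage")
    ax.set_ylabel("merge height (cluster distance)")
plt.tight_layout()
plt.show()
```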

Page 27:

Summary of Hierarchical Clustering Methods

• No need to specify the number of clusters in advance.

• Hierarchical structure maps nicely onto human intuition for some domains.

• They do not scale well: time complexity of at least O(n²), where n is the total number of objects.

• Like any heuristic search algorithm, local optima are a problem.

• Interpretation of results is (very) subjective.

Page 28:

In some cases we can determine the "correct" number of clusters. However, things are rarely this clear cut, unfortunately.

But what are the clusters?

Page 29:

Outlier

The single isolated branch is suggestive of a data point that is very different from all others.

Page 30:

Example: clustering genes

• Microarrays measure the activities of all genes in different conditions

• Group genes that perform the same function

• Clustering genes can help determine new functions for unknown genes

• An early "killer application" in this area
  – The most cited (>12,000) paper in PNAS!

Page 31:

Partitioning Algorithms

• Partitioning method: Construct a partition of n objects into a set of K clusters
  – Given: a set of objects and the number K
  – Find: a partition of K clusters that optimizes the chosen partitioning criterion

• Globally optimal: exhaustively enumerate all partitions

• Effective heuristic methods: K-means algorithms
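Exhaustive enumeration is hopeless because the number of ways to split n objects into K non-empty clusters is the Stirling number of the second kind, which grows explosively. A small sketch (our own illustration, not from the slides) using the standard inclusion-exclusion formula:

```python
from math import comb, factorial

def stirling2(n, k):
    """Number of ways to partition n objects into k non-empty clusters."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1)) // factorial(k)

print(stirling2(4, 2))    # 7: small cases are easy to enumerate
print(stirling2(100, 3))  # astronomically many partitions, hence heuristics like K-means
```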

Page 32:

• Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.

• Since the output is only one set of clusters, the user has to specify the desired number of clusters K.

Page 33:

[Figure: data points with three cluster centers k1, k2, k3]

Re-assign and move centers, until no objects change their cluster membership

Page 34:

K-Means Clustering Algorithm

Find the cluster means

Re-assign samples to clusters

Iterate until convergence

Page 35:

K-means Clustering: Step 1

Page 36:

K-means Clustering: Step 2

Page 37:

K-means Clustering: Step 3

Page 38:

K-means Clustering: Step 4

Page 39:

K-means Clustering: Step 5

Page 40:

K-Means Algorithm

1. Decide on a value for K.

2. Initialize the K cluster centers randomly, if necessary.

3. Decide the class memberships of the N objects by assigning them to the nearest cluster centroid (a.k.a. the center of gravity or mean).

4. Re-estimate the K cluster centers, assuming the memberships found above are correct.

5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to 3.
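A compact sketch of these five steps in NumPy follows; it is our own illustration (not the course's reference code), and the toy Gaussian blobs are made up.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means following the five steps above (a sketch)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize the K cluster centers as K randomly chosen data points.
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Step 3: assign each object to the nearest cluster center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        new_labels = dists.argmin(axis=1)
        # Step 5: if no object changed membership, stop.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of the objects assigned to it.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers

# Toy data (made up): three Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0.0, 4.0, 8.0)])
labels, centers = kmeans(X, K=3)
print(np.round(centers, 2))
```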

Page 41:

[Figure: scatter plot of the example data points, with both axes running from 1 to 10]

Page 42:

[Figure: the data assigned to a single cluster (k = 1)]

When k = 1, the objective function is 873.0

$\mathrm{Obj} = \sum_{k} \sum_{i \in \text{cluster } k} \| x_i - \mu_k \|_2^2$

Page 43:

[Figure: the data partitioned into two clusters (k = 2)]

When k = 2, the objective function is 173.1

$\mathrm{Obj} = \sum_{k} \sum_{i \in \text{cluster } k} \| x_i - \mu_k \|_2^2$

Page 44:

[Figure: the data partitioned into three clusters (k = 3)]

When k = 3, the objective function is 133.6

$\mathrm{Obj} = \sum_{k} \sum_{i \in \text{cluster } k} \| x_i - \mu_k \|_2^2$

Page 45:

[Plot: objective function value (y-axis, 0 to 1000) versus k (x-axis, 1 to 6)]

We can plot the objective function values for k = 1 to 6…

The abrupt change at k = 2 is highly suggestive of two clusters in the data. This technique for determining the number of clusters is known as "knee finding" or "elbow finding".

Note that the results are not always as clear cut as in this toy example.
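As a sketch of elbow finding in practice (assuming scikit-learn is available; the two-blob toy data is made up), one can run K-means for a range of k and inspect the objective, which scikit-learn exposes as `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (made up): two well-separated blobs, so the elbow should appear at k = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the objective above: the sum of squared distances to each point's assigned center.
    print(k, round(km.inertia_, 1))
```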

Page 46:

What you should know

• Clustering as an unsupervised learning method

• What are the different types of clustering algorithms

• What are the assumptions we are making for each, and what can we get from them

• Unsolved issues: number of clusters, initialization, etc.