Post on 31-Mar-2015
June 11, 2005 Interface / CSNA Meeting St. Louis 1
Estimating the Cluster Treeof a Density
Werner StuetzleRebecca NugentDepartment of StatisticsUniversity of Washington
June 11, 2005 Interface / CSNA Meeting St. Louis 2
June 11, 2005 Interface / CSNA Meeting St. Louis 3
June 11, 2005 Interface / CSNA Meeting St. Louis 4
Groups K-means with k = 2
June 11, 2005 Interface / CSNA Meeting St. Louis 5
June 11, 2005 Interface / CSNA Meeting St. Louis 6
• Detect that there are 5 or 6 distinct groups.
• Assign group labels to observations.
40 45 50 55
74
76
78
80
82
84
June 11, 2005 Interface / CSNA Meeting St. Louis 7
June 11, 2005 Interface / CSNA Meeting St. Louis 8
June 11, 2005 Interface / CSNA Meeting St. Louis 9
-2 0 2 4 6 8 100
10
02
00
30
0
Feature histogram
June 11, 2005 Interface / CSNA Meeting St. Louis 10
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
(a) (b) (c)
(d) (e) (f)
June 11, 2005 Interface / CSNA Meeting St. Louis 11
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
(a) (b) (c)
(d) (e) (f)
June 11, 2005 Interface / CSNA Meeting St. Louis 12
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 13
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 14
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 15
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 16
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 17
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 18
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 19
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 20
0 1 2 3 4
-10
12
34
rs = 1
rs = 5
rs = 2
June 11, 2005 Interface / CSNA Meeting St. Louis 21
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 22
Note large tree fragments in bottom left panel
-2 0 2 4 6
-3-2
-10
12
3
Weakly bimodal data
-2 0 2 4 6
-3-2
-10
12
3
MST for weakly bimodal data
-2 0 2 4 6
-3-2
-10
12
3
MST after removal of longest edges
0 20 40 60 80
01
02
03
04
0
runts .bimodal2[runts.bimodal2 > 1]
Histogram of runt sizes
June 11, 2005 Interface / CSNA Meeting St. Louis 23
-2 -1 0 1 2 3
-3-2
-10
12
3
Unimodal data
-2 -1 0 1 2 3
-3-2
-10
12
3
MST for unimodal data
-2 -1 0 1 2 3
-3-2
-10
12
3
MST after removal of longest edges
0 20 40 60 80
01
02
03
04
0
runts .unimodal[runts.unimodal > 1]
Histogram of runt sizes
June 11, 2005 Interface / CSNA Meeting St. Louis 24
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 25
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 26
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 27
leve
l
10
12
14
16
A 8
A 7 A 9
A 8
A 2
A 3A 3
A 5A 6
June 11, 2005 Interface / CSNA Meeting St. Louis 28
June 11, 2005 Interface / CSNA Meeting St. Louis 29
cc1
cc2
-4 -2 0 2 4 6
-4-2
02
4
11
1
111
1
1
11
11
1
1
1
1
1
1
1
1
1
1
1
1
1
22
22
22
2
2
2
2
2
2
2
2
2222
2 2
2 22
2
2
2
2
222
2
2
2
2222
2
2
22
2
2
2
2 22 22
22
2
2
2
2
2
33
3
3
33
3
3
3
3
33
3
3
3
3
3
33
33
3
3
3
3
3
3
3
33
33 3
3
33
3
3
3
3
33
333
33
3
3
3
3
3
3 33
3
33
3
33
3 3
3
3
3
3333
3 33
3
3
3
3
33
3
33
3
33
33
3
3
3
33
3 33
3
3
333
33
33
3
33
3
3
3
3
3
3
3 33
3
33
3
3
3
3
33
3
3
333
3
3
3
3 33
3
3
3
33
33
33 3
33
3
33
3
3
3
3
33
33
33
33
3
3
3
3
33
3
33
3
3
33
33
3
33
3
3
3
33
33
3 3 3
3
333
3
3
3
3
3
3 3
3
3
3
3
4
44
4
4
44
4
4
4
4
44
44
4
4
4
4
4
4
4
44
4
4
44
4
44
4
44
4 4
Areas 1-4, projected on crimcords
June 11, 2005 Interface / CSNA Meeting St. Louis 30
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 31
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 32
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 33
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 34
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 35
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 36
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 37
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 38
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 39
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 40
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 41
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 42
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 43
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 44
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 45
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 46
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 47
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 48
0 1 2 3 4
-10
12
34
rs = 1
rs = 5
rs = 2
June 11, 2005 Interface / CSNA Meeting St. Louis 49
Runt Pruning algorithm:
generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}
0 1 2 3 4
-10
12
34
rs = 1
rs = 5
rs = 2
June 11, 2005 Interface / CSNA Meeting St. Louis 50
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 51
Note large tree fragments in bottom left panel
-2 0 2 4 6
-3-2
-10
12
3
Weakly bimodal data
-2 0 2 4 6
-3-2
-10
12
3
MST for weakly bimodal data
-2 0 2 4 6
-3-2
-10
12
3
MST after removal of longest edges
0 20 40 60 80
01
02
03
04
0
runts .bimodal2[runts.bimodal2 > 1]
Histogram of runt sizes
June 11, 2005 Interface / CSNA Meeting St. Louis 52
-2 -1 0 1 2 3
-3-2
-10
12
3
Unimodal data
-2 -1 0 1 2 3
-3-2
-10
12
3
MST for unimodal data
-2 -1 0 1 2 3
-3-2
-10
12
3
MST after removal of longest edges
0 20 40 60 80
01
02
03
04
0
runts .unimodal[runts.unimodal > 1]
Histogram of runt sizes
June 11, 2005 Interface / CSNA Meeting St. Louis 53
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 54
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 55
leve
l
10
12
14
16
A 8
A 7 A 9
A 8
A 2
A 3A 3
A 5A 6
June 11, 2005 Interface / CSNA Meeting St. Louis 56
June 11, 2005 Interface / CSNA Meeting St. Louis 57
cc1
cc2
-4 -2 0 2 4 6
-4-2
02
4
11
1
111
1
1
11
11
1
1
1
1
1
1
1
1
1
1
1
1
1
22
22
22
2
2
2
2
2
2
2
2
2222
2 2
2 22
2
2
2
2
222
2
2
2
2222
2
2
22
2
2
2
2 22 22
22
2
2
2
2
2
33
3
3
33
3
3
3
3
33
3
3
3
3
3
33
33
3
3
3
3
3
3
3
33
33 3
3
33
3
3
3
3
33
333
33
3
3
3
3
3
3 33
3
33
3
33
3 3
3
3
3
3333
3 33
3
3
3
3
33
3
33
3
33
33
3
3
3
33
3 33
3
3
333
33
33
3
33
3
3
3
3
3
3
3 33
3
33
3
3
3
3
33
3
3
333
3
3
3
3 33
3
3
3
33
33
33 3
33
3
33
3
3
3
3
33
33
33
33
3
3
3
3
33
3
33
3
3
33
33
3
33
3
3
3
33
33
3 3 3
3
333
3
3
3
3
3
3 3
3
3
3
3
4
44
4
4
44
4
4
4
4
44
44
4
4
4
4
4
4
4
44
4
4
44
4
44
4
44
4 4
Areas 1-4, projected on crimcords
June 11, 2005 Interface / CSNA Meeting St. Louis 58
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 59
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 60
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 61
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 62
June 11, 2005 Interface / CSNA Meeting St. Louis 63
ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).
Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).
June 11, 2005 Interface / CSNA Meeting St. Louis 64
June 11, 2005 Interface / CSNA Meeting St. Louis 65
June 11, 2005 Interface / CSNA Meeting St. Louis 66
June 11, 2005 Interface / CSNA Meeting St. Louis 67
June 11, 2005 Interface / CSNA Meeting St. Louis 68
-10 -5 0 5 10
-50
51
0
PC1
PC
2
June 11, 2005 Interface / CSNA Meeting St. Louis 69
-10 -5 0 5 10
-50
51
0
PC1
PC
2
-10 -5 0 5 10
-50
51
0
PC1
PC
2
June 11, 2005 Interface / CSNA Meeting St. Louis 70
June 11, 2005 Interface / CSNA Meeting St. Louis 71
Runt Pruning algorithm:
generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}
0 1 2 3 4
-10
12
34
rs = 1
rs = 5
rs = 2