June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner...

Post on 31-Mar-2015

212 views 0 download

Tags:

Transcript of June 11, 2005Interface / CSNA Meeting St. Louis1 Estimating the Cluster Tree of a Density Werner...

June 11, 2005 Interface / CSNA Meeting St. Louis 1

Estimating the Cluster Treeof a Density

Werner StuetzleRebecca NugentDepartment of StatisticsUniversity of Washington

June 11, 2005 Interface / CSNA Meeting St. Louis 2

June 11, 2005 Interface / CSNA Meeting St. Louis 3

June 11, 2005 Interface / CSNA Meeting St. Louis 4

Groups K-means with k = 2

June 11, 2005 Interface / CSNA Meeting St. Louis 5

June 11, 2005 Interface / CSNA Meeting St. Louis 6

• Detect that there are 5 or 6 distinct groups.

• Assign group labels to observations.

40 45 50 55

74

76

78

80

82

84

June 11, 2005 Interface / CSNA Meeting St. Louis 7

June 11, 2005 Interface / CSNA Meeting St. Louis 8

June 11, 2005 Interface / CSNA Meeting St. Louis 9

-2 0 2 4 6 8 100

10

02

00

30

0

Feature histogram

June 11, 2005 Interface / CSNA Meeting St. Louis 10

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

(a) (b) (c)

(d) (e) (f)

June 11, 2005 Interface / CSNA Meeting St. Louis 11

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

(a) (b) (c)

(d) (e) (f)

June 11, 2005 Interface / CSNA Meeting St. Louis 12

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 13

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 14

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 15

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 16

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 17

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 18

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 19

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 20

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

June 11, 2005 Interface / CSNA Meeting St. Louis 21

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 22

Note large tree fragments in bottom left panel

-2 0 2 4 6

-3-2

-10

12

3

Weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST for weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .bimodal2[runts.bimodal2 > 1]

Histogram of runt sizes

June 11, 2005 Interface / CSNA Meeting St. Louis 23

-2 -1 0 1 2 3

-3-2

-10

12

3

Unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST for unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .unimodal[runts.unimodal > 1]

Histogram of runt sizes

June 11, 2005 Interface / CSNA Meeting St. Louis 24

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 25

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 26

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 27

leve

l

10

12

14

16

A 8

A 7 A 9

A 8

A 2

A 3A 3

A 5A 6

June 11, 2005 Interface / CSNA Meeting St. Louis 28

June 11, 2005 Interface / CSNA Meeting St. Louis 29

cc1

cc2

-4 -2 0 2 4 6

-4-2

02

4

11

1

111

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

22

22

22

2

2

2

2

2

2

2

2

2222

2 2

2 22

2

2

2

2

222

2

2

2

2222

2

2

22

2

2

2

2 22 22

22

2

2

2

2

2

33

3

3

33

3

3

3

3

33

3

3

3

3

3

33

33

3

3

3

3

3

3

3

33

33 3

3

33

3

3

3

3

33

333

33

3

3

3

3

3

3 33

3

33

3

33

3 3

3

3

3

3333

3 33

3

3

3

3

33

3

33

3

33

33

3

3

3

33

3 33

3

3

333

33

33

3

33

3

3

3

3

3

3

3 33

3

33

3

3

3

3

33

3

3

333

3

3

3

3 33

3

3

3

33

33

33 3

33

3

33

3

3

3

3

33

33

33

33

3

3

3

3

33

3

33

3

3

33

33

3

33

3

3

3

33

33

3 3 3

3

333

3

3

3

3

3

3 3

3

3

3

3

4

44

4

4

44

4

4

4

4

44

44

4

4

4

4

4

4

4

44

4

4

44

4

44

4

44

4 4

Areas 1-4, projected on crimcords

June 11, 2005 Interface / CSNA Meeting St. Louis 30

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 31

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 32

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 33

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 34

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 35

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 36

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 37

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 38

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 39

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 40

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 41

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 42

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 43

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 44

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 45

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 46

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 47

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 48

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

June 11, 2005 Interface / CSNA Meeting St. Louis 49

Runt Pruning algorithm:

generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2

June 11, 2005 Interface / CSNA Meeting St. Louis 50

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 51

Note large tree fragments in bottom left panel

-2 0 2 4 6

-3-2

-10

12

3

Weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST for weakly bimodal data

-2 0 2 4 6

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .bimodal2[runts.bimodal2 > 1]

Histogram of runt sizes

June 11, 2005 Interface / CSNA Meeting St. Louis 52

-2 -1 0 1 2 3

-3-2

-10

12

3

Unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST for unimodal data

-2 -1 0 1 2 3

-3-2

-10

12

3

MST after removal of longest edges

0 20 40 60 80

01

02

03

04

0

runts .unimodal[runts.unimodal > 1]

Histogram of runt sizes

June 11, 2005 Interface / CSNA Meeting St. Louis 53

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 54

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 55

leve

l

10

12

14

16

A 8

A 7 A 9

A 8

A 2

A 3A 3

A 5A 6

June 11, 2005 Interface / CSNA Meeting St. Louis 56

June 11, 2005 Interface / CSNA Meeting St. Louis 57

cc1

cc2

-4 -2 0 2 4 6

-4-2

02

4

11

1

111

1

1

11

11

1

1

1

1

1

1

1

1

1

1

1

1

1

22

22

22

2

2

2

2

2

2

2

2

2222

2 2

2 22

2

2

2

2

222

2

2

2

2222

2

2

22

2

2

2

2 22 22

22

2

2

2

2

2

33

3

3

33

3

3

3

3

33

3

3

3

3

3

33

33

3

3

3

3

3

3

3

33

33 3

3

33

3

3

3

3

33

333

33

3

3

3

3

3

3 33

3

33

3

33

3 3

3

3

3

3333

3 33

3

3

3

3

33

3

33

3

33

33

3

3

3

33

3 33

3

3

333

33

33

3

33

3

3

3

3

3

3

3 33

3

33

3

3

3

3

33

3

3

333

3

3

3

3 33

3

3

3

33

33

33 3

33

3

33

3

3

3

3

33

33

33

33

3

3

3

3

33

3

33

3

3

33

33

3

33

3

3

3

33

33

3 3 3

3

333

3

3

3

3

3

3 3

3

3

3

3

4

44

4

4

44

4

4

4

4

44

44

4

4

4

4

4

4

4

44

4

4

44

4

44

4

44

4 4

Areas 1-4, projected on crimcords

June 11, 2005 Interface / CSNA Meeting St. Louis 58

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 59

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 60

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 61

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 62

June 11, 2005 Interface / CSNA Meeting St. Louis 63

ClusteringasastatisticalproblemAssumethatfeaturevectorsx1;:::;xniid»p(x).DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.ThereforethetotalcollectionofsetscanbearrangedintoamodetreeLeavesofmodetreecorrespondtonodesofp(x).

Considerfeaturevectorsx1;:::;xnasarandomsamplefromsome(innite)population.Letp(x)bethepopulationdensity.(Think"histogramwithinnitesimallysmallbins")DenelevelsetL(;p)byL(;p)=fxjp(x)gLetL1(;p);L2(;p);:::betheconnectedcomponentsofL(;p).If2>1thenforanyi;jeither²Li(2;p)½Lj(1;p),or²theyaredisjoint.Thereforethetotalcollectionofsetscanbearrangedintoatree,theclustertreeofthedensity.Leavesofclustertreecorrespondtomodesofp(x).

June 11, 2005 Interface / CSNA Meeting St. Louis 64

June 11, 2005 Interface / CSNA Meeting St. Louis 65

June 11, 2005 Interface / CSNA Meeting St. Louis 66

June 11, 2005 Interface / CSNA Meeting St. Louis 67

June 11, 2005 Interface / CSNA Meeting St. Louis 68

-10 -5 0 5 10

-50

51

0

PC1

PC

2

June 11, 2005 Interface / CSNA Meeting St. Louis 69

-10 -5 0 5 10

-50

51

0

PC1

PC

2

-10 -5 0 5 10

-50

51

0

PC1

PC

2

June 11, 2005 Interface / CSNA Meeting St. Louis 70

June 11, 2005 Interface / CSNA Meeting St. Louis 71

Runt Pruning algorithm:

generate_cluster_tree_node (mst, runt_size_threshold) { node = new_cluster_tree_node node.leftson = node.rightson = NULL node.obs = leaves (mst) cut_edge = longest_edge_with_large_runt_size (mst, runt_size_threshold) if (cut_edge) { node.leftson = generate_cluster_tree_node (left_subtree (cut_edge), runt_size_threshold) node.rightson = generate_cluster_tree_node (right_subtree (cut_edge), runt_size_threshold) } return (node)}

0 1 2 3 4

-10

12

34

rs = 1

rs = 5

rs = 2