David M. Blei - Computer Science Department at Princeton University


Transcript of David M. Blei - Computer Science Department at Princeton University


Hierarchical clustering

David M. Blei

COS424, Princeton University

February 28, 2008


Hierarchical clustering

• Hierarchical clustering is a widely used data analysis tool.

• The idea is to build a binary tree of the data that successively merges similar groups of points.

• Visualizing this tree provides a useful summary of the data.


Hierarchical clustering vs. k-means

• Recall that k-means or k-medoids requires:

• A number of clusters k
• An initial assignment of data to clusters
• A distance measure between data, d(x_n, x_m)

• Hierarchical clustering only requires a measure of similarity between groups of data points.
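For contrast, here is a minimal sketch of the inputs each method needs. It assumes scikit-learn and SciPy are available; the toy data, the choice k = 3, and average linkage are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))  # toy data: 20 points in 2 dimensions

# k-means needs k up front, an initialization (n_init restarts here),
# and a distance between data points (Euclidean, implicit in the objective).
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical clustering needs only a rule for intergroup similarity
# (the `method` argument); no k and no initial assignment.
Z = linkage(X, method="average")                     # full merge tree
cut_labels = fcluster(Z, t=3, criterion="maxclust")  # a cut can come later
```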


Agglomerative clustering

• We will talk about agglomerative clustering.

• Algorithm:

1 Place each data point into its own singleton group
2 Repeat: iteratively merge the two closest groups
3 Until: all the data are merged into a single cluster
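A from-scratch sketch of these three steps, assuming Euclidean distance and single linkage between groups (both illustrative choices); the cubic loop is for exposition, not efficiency.

```python
import numpy as np

def agglomerate(X):
    """Return the merge sequence as (group_a, group_b, distance) triples."""
    groups = [[i] for i in range(len(X))]   # 1. every point starts as a singleton
    merges = []
    while len(groups) > 1:                  # 3. stop when one cluster remains
        best = None
        for a in range(len(groups)):        # 2. find the two closest groups
            for b in range(a + 1, len(groups)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((groups[a], groups[b], d))
        groups[a] = groups[a] + groups[b]   # merge the closest pair of groups
        del groups[b]                       # (b > a, so index a is unaffected)
    return merges

rng = np.random.default_rng(0)
print(agglomerate(rng.normal(size=(5, 2))))
```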


Example

[Figure: scatter plots of the example data (axes V1 and V2), showing first the raw data and then iterations 001 through 024 of agglomerative clustering as the closest groups are successively merged.]


Agglomerative clustering

• Each level of the resulting tree is a segmentation of the data

• The algorithm results in a sequence of groupings

• It is up to the user to choose a “natural” clustering from this sequence.
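To make the sequence concrete, here is a sketch (assuming SciPy; the data and cut levels are illustrative) that cuts the merge tree at several levels, yielding one segmentation per cut:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 2))      # illustrative data
Z = linkage(X, method="average")  # the full sequence of merges

# Each cut is one segmentation from the sequence; which one is
# "natural" is the user's call.
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, labels)
```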


Dendrogram

• Agglomerative clustering is monotonic

• The similarity between merged clusters is monotone decreasing with the level of the merge.

• Dendrogram: plot each merge at the (negative) similarity between the two merged groups.

• Provides an interpretable visualization of the algorithm and data.

• Useful summarization tool, and part of why hierarchical clustering is popular.
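One way to produce such a plot, as a sketch assuming SciPy and matplotlib are available (the two-group toy data are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(10, 2)),   # two loose groups,
               rng.normal(5, 1, size=(10, 2))])  # purely illustrative

Z = linkage(X, method="complete")  # merge heights are monotone increasing
dendrogram(Z)                      # each merge is drawn at its height
plt.ylabel("Height")
plt.show()
```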


Dendrogram of example data

[Figure: cluster dendrogram of the example data, computed with hclust(*, "complete") on dist(x); the vertical axis ("Height") shows the level of each merge.]

Groups that merge at high values relative to the merger values of their subgroups are candidates for natural clusters. (Tibshirani et al., 2001)


Group similarity

• Given a distance measure between points, the user has many choices for how to define intergroup similarity.

• Three most popular choices:

• Single-linkage: the similarity of the closest pair

    d_SL(G, H) = min_{i ∈ G, j ∈ H} d_ij

• Complete linkage: the similarity of the furthest pair

    d_CL(G, H) = max_{i ∈ G, j ∈ H} d_ij

• Group average: the average similarity between groups

    d_GA(G, H) = (1 / (N_G N_H)) Σ_{i ∈ G} Σ_{j ∈ H} d_ij
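The three rules transcribe directly into code. A sketch, assuming a precomputed pairwise-distance matrix D and groups given as lists of row indices (the function names and toy data are illustrative):

```python
import numpy as np

def single_linkage(D, G, H):
    return D[np.ix_(G, H)].min()   # d_SL: the closest pair across groups

def complete_linkage(D, G, H):
    return D[np.ix_(G, H)].max()   # d_CL: the furthest pair across groups

def group_average(D, G, H):
    return D[np.ix_(G, H)].mean()  # d_GA: mean over all N_G * N_H pairs

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean distances
print(single_linkage(D, [0, 1, 2], [3, 4, 5]))
```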


Properties of intergroup similarity

• Single linkage can produce “chaining,” where a sequence of close observations in different groups causes early merges of those groups.

• Complete linkage has the opposite problem: it might not merge close groups because of outlier members that are far apart.

• Group average represents a natural compromise, but it depends on the scale of the similarities. Applying a monotone transformation to the similarities can change the results.
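A small sketch of the scale-dependence point (the data and groups are illustrative): squaring the distances is a monotone transformation, so comparisons based on a min or max (single and complete linkage) are unchanged, while the group average can come to prefer a different merge.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances

G, H1, H2 = [0, 1], [2, 3, 4], [5, 6, 7]  # illustrative groups

def avg(D, A, B):
    return D[np.ix_(A, B)].mean()

# min/max comparisons are invariant under the monotone map d -> d**2,
# but the two average-linkage comparisons below may disagree:
print(avg(D, G, H1) < avg(D, G, H2))
print(avg(D**2, G, H1) < avg(D**2, G, H2))
```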


Caveats

• Hierarchical clustering should be treated with caution.

• Different decisions about group similarities can lead to vastly different dendrograms.

• The algorithm imposes a hierarchical structure on the data, even data for which such structure is not appropriate.


Examples

• “Repeated Observation of Breast Tumor Subtypes in Independent Gene Expression Data Sets” (Sorlie et al., 2003)

• Hierarchical clustering of gene expression data led to new theories.

• Later, these theories were tested in the lab.


Examples

• “The Balance of Roger de Piles” (Studdert-Kennedy and Davenport, 1974)

• Roger de Piles rated 57 paintings along different dimensions.

• These authors cluster them using different methods, including hierarchical clustering.

• They discuss the different clusters. (They are art critics.)


Good: They are cautious. “The value of this analysis...will depend on any interesting speculation it may provoke.”


Examples

• “Similarity Grouping of Australian Universities” (Stanley and Reynolds, 1994)

• Use hierarchical clustering on Australian universities.

• Use features such as:

• # of staff in different departments
• entry scores
• funding
• evaluations


One dendrogram

[Figure: a dendrogram of the universities data.]


Another dendrogram. Notice that it’s different.

[Figure: a second dendrogram of the universities data; its grouping differs from the first.]


• Split values: they notice that there is no kink and conclude that there is no cluster structure in Australian universities.

• Good: cautious interpretation of the clustering, and analysis based on multiple subsets of the features.

• Bad: their conclusion, that Australian universities cannot be clustered, ignores all the algorithmic choices that were made.


Examples

• “Comovement of International Equity Markets: A Taxonomic Approach” (Panton et al., 1976)

• Data: weekly rates of return for stocks in twelve countries.

• Run agglomerative clustering year by year.

• Interpret the structure and examine its stability over different time periods.


Examples

Good: Cautious. “This study is only descriptive...A logical subsequent research area is to explain observed structural properties and the causes of structural change.”
