7.MV - Cluster
8/13/2019 7.MV - Cluster
1/15
December 25, 2013
Application of Multivariate Statistical Methods in Marketing Research
Industrial Statistics
MS3001 Advanced Marketing Research
Faculty of Science, University of Colombo
Session 4
Cluster Analysis
-
Illustration
I need to identify groups of target consumers who are similar in
buying habits, demographic characteristics, or psychographics.
Can districts of Sri Lanka be grouped based on demographics,
socio-cultural parameters, agricultural operations, extent of
development in infrastructure etc?
Cluster Analysis
-
Cluster Analysis
In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables.
Cluster Analysis groups objects based on a set of variables.
The groups should be relatively homogeneous within and heterogeneous across.
A range of clustering procedures is available:
  Hierarchical: in the divisive (top-down) form, each cluster (starting with the whole dataset) is divided into two, then divided again, and so on. The agglomerative (bottom-up) form works in the opposite direction, merging clusters step by step.
  K-Means: the number of clusters is chosen in advance by the researcher.
-
What is Cluster Analysis?
Cluster: a collection of data objects
  Similar to the objects in the same cluster (intraclass similarity)
  Dissimilar to the objects in other clusters (interclass dissimilarity)
Cluster analysis: a statistical method for grouping a set of data objects into clusters
A good clustering method produces high-quality clusters with high intraclass similarity and low interclass similarity
Clustering is unsupervised classification
  It can be used as a stand-alone tool or as a preprocessing step for other algorithms
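One way to make "high intraclass similarity and low interclass similarity" concrete is to compare average distances within clusters against average distances between clusters. The sketch below is only an illustration: the toy data, the use of Euclidean distance as the (dis)similarity measure, and the function names are all assumptions, not part of the original slides.

```python
from itertools import combinations
from math import dist

def average_within_distance(clusters):
    """Mean distance over all pairs of objects that share a cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)

def average_between_distance(clusters):
    """Mean distance over all pairs of objects in different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    ]
    return sum(pairs) / len(pairs)

# Hypothetical toy data, already split into two clusters.
clusters = [
    [(1, 1), (1, 2), (2, 1)],
    [(8, 8), (8, 9), (9, 8)],
]
within = average_within_distance(clusters)
between = average_between_distance(clusters)
print(f"within: {within:.2f}, between: {between:.2f}")
```

For a good clustering, the within-cluster average should be clearly smaller than the between-cluster average, as it is for this toy data.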
-
Group objects according to their similarity
Cluster: a set of objects that are similar to each other and separated from the other objects.
Example: green/red data points were generated from two different normal distributions.
-
K-Means Clustering
The meaning of K-means
Why it is called K-means clustering: K points are used to represent the clustering result; each point corresponds to the centre (mean) of a cluster
Each data point is assigned to the cluster with the closest centre point
The number K must be specified
Basic algorithm
-
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in the following steps:
  1. Partition the objects into k non-empty subsets
  2. Arbitrarily choose k points as initial centers
  3. Assign each object to the cluster with the nearest seed point (center)
  4. Calculate the mean of each cluster and update the seed point
  5. Go back to Step 3; stop when no assignment changes
The basic step of k-means clustering is simple. Iterate until stable (= no object moves to another group):
  Determine the centroid coordinates
  Determine the distance of each object to the centroids
  Group the objects based on minimum distance
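The iteration described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: squared Euclidean distance, random choice of k data points as initial centers, the toy data, and the names `k_means`, `squared_distance` and `mean_point` are all chosen here for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Arbitrarily choose k objects as the initial centers (assumption:
    # sampled from the data without replacement).
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Stop when no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = mean_point(members)
    return labels, centers

# Hypothetical toy data: two well-separated groups in the plane.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centers = k_means(data, k=2)
print(labels)
print(centers)
```

Which cluster receives label 0 or 1 depends on the random initial centers, so only the grouping itself is meaningful, not the label values.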
-
The K-Means Clustering Results Example
[Figure: five scatter-plot panels, both axes running from 0 to 10, illustrating k-means with K=2. Panel labels: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.]
-
Weaknesses of the K-Means Method
Unable to handle noisy data and outliers
  Very large or very small values could skew the mean
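A small numeric illustration of how one extreme value skews a cluster mean. The spend figures below are invented for this example only.

```python
from statistics import mean

# Hypothetical one-dimensional cluster of monthly spend values.
cluster = [10, 11, 12, 13, 14]
# The same cluster with a single extreme value added.
with_outlier = cluster + [100]

center_before = mean(cluster)
center_after = mean(with_outlier)

print(center_before)            # mean of the original five values
print(round(center_after, 2))   # mean after adding the outlier
```

With the outlier included, the mean becomes larger than every one of the original five values, so the cluster center no longer sits among the typical members of the cluster.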
-
Hierarchical Clustering
Start with every data point in a separate cluster
Keep merging the most similar pairs of data points/clusters until we have one big cluster left
This is called a bottom-up or agglomerative method
-
Hierarchical Clustering (cont.)
This produces a binary tree or dendrogram
The final cluster is the root and each data item is a leaf
The height of the bars indicates how close the items are
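The bottom-up merging procedure, and the merge heights that a dendrogram displays, can be sketched as follows. The slides do not say how the similarity between two clusters is measured, so this sketch assumes single linkage (distance between the closest pair of points) with Euclidean distance; the toy data and function names are likewise assumptions.

```python
from itertools import combinations
from math import dist

def single_linkage(a, b):
    """Distance between two clusters: that of their closest pair of points."""
    return min(dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    # Start with every data point in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        height = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        # i < j, so delete index j first to keep index i valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges

# Hypothetical toy data: two tight pairs and one isolated point.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = agglomerative(data)
for left, right, height in merges:
    print(left, "+", right, "at height", round(height, 3))
```

Each recorded merge corresponds to one bar in the dendrogram: a low height means the merged items are close together, and the last merge (the root) joins everything into one cluster.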
-
Hierarchical Clustering Demo
-
Strengths & Weaknesses of Hierarchical Clustering Methods
Major advantages
  Conceptually very simple
  Easy to implement
  Most commonly used technique
-
Applications
Market segmentation is usually conducted using some form of cluster analysis to divide people into segments
Other methods such as latent class models or archetypal analysis are sometimes used instead
It is also possible to cluster other items such as products/SKUs, image attributes, brands