Statistics for Marketing & Consumer Research
Copyright © 2008 - Mario Mazzocchi
Cluster Analysis
(from Chapter 12)
Cluster analysis
• It is a class of techniques used to classify cases into groups that are
  • relatively homogeneous within themselves and
  • heterogeneous between each other
• These groups are called clusters
Market segmentation
• Cluster analysis is especially useful for market segmentation
• Segmenting a market means dividing its potential consumers into separate sub-sets where
  • Consumers in the same group are similar with respect to a given set of characteristics
  • Consumers belonging to different groups are dissimilar with respect to the same set of characteristics
• This allows one to calibrate the marketing mix differently according to the target consumer group
Other uses of cluster analysis
• Clustering similar brands or products according to their characteristics allows one to identify competitors, potential market opportunities and available niches.
• Data reduction
  • Factor analysis and principal component analysis reduce the number of variables.
  • Cluster analysis reduces the number of observations, by grouping them into homogeneous clusters.
• Maps that simultaneously profile consumers and products, market opportunities and preferences, as in preference or perceptual mapping.
Steps to conduct a cluster analysis
• Select a distance measure
• Select a clustering algorithm
• Define the distance between two clusters
• Determine the number of clusters
• Validate the analysis
Distance measures for individual observations
• To measure the similarity between two observations, a distance measure is needed.
• Multiple variables require an aggregate distance measure.
• The best-known measure of distance is the Euclidean distance, which is the concept we use in everyday life for spatial coordinates.
Examples of distances
Euclidean distance

$$D_{ij} = \sqrt{\sum_{k=1}^{n} \left(x_{ki} - x_{kj}\right)^2}$$

City-block (Manhattan) distance

$$D_{ij} = \sum_{k=1}^{n} \left|x_{ki} - x_{kj}\right|$$

where $D_{ij}$ is the distance between cases $i$ and $j$, $x_{kj}$ is the value of variable $x_k$ for case $j$, and the sum runs over the $n$ variables.

[Figure: two points, A and B, illustrating the Euclidean and the city-block distance]

Problems:
• Different measurement units = different weights
• Correlation between variables (double counting)
Solution: standardization, rescaling, principal component analysis
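The two distance measures and the standardization remedy can be illustrated with a short Python sketch. It is only an illustration: the function names, the choice of z-scores with the population standard deviation as the "standardization", and the three example cases are assumptions, not part of the original slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    """City-block (Manhattan) distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def standardize(cases):
    """Rescale every variable to zero mean and unit standard deviation.

    Assumption: z-scores with the population standard deviation; a variable
    that is constant across all cases would cause a division by zero.
    """
    columns = list(zip(*cases))
    means = [sum(col) / len(col) for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(case, means, sds)] for case in cases]


# Hypothetical cases: (age in years, yearly spend in euros).
cases = [(25, 1000.0), (45, 1200.0), (30, 3000.0)]

# On the raw data the spend variable dominates because of its larger scale.
print(euclidean(cases[0], cases[1]), city_block(cases[0], cases[1]))

# After standardization both variables contribute on a comparable scale.
z = standardize(cases)
print(euclidean(z[0], z[1]), city_block(z[0], z[1]))
```

Standardization addresses the unit problem only; correlated variables still count the same information twice, which is where principal component analysis comes in.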
Clustering procedures
• Hierarchical procedures
  • Agglomerative (start from n clusters to get to 1 cluster)
  • Divisive (start from 1 cluster to get to n clusters)
• Non-hierarchical procedures
  • K-means clustering (knowledge of the number of clusters (c) is required).
Distance between clusters
• Algorithms vary according to the way the distance between two clusters is defined.
• The most common algorithms for hierarchical methods include
  • single linkage method
  • complete linkage method
  • average linkage method
  • Ward algorithm
  • centroid method
Linkage methods
• Single linkage method (nearest neighbour): the distance between two clusters is the minimum of all possible distances between observations belonging to the two clusters.
• Complete linkage method (furthest neighbour): the distance between two clusters is the maximum distance between observations belonging to the two clusters.
• Average linkage method: the distance between two clusters is the average of all distances between observations in the two clusters.
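The three linkage definitions differ only in how they summarise the set of pairwise distances between two clusters. The following minimal Python sketch makes that explicit; the function names and the two small example clusters are hypothetical, and Euclidean distance between observations is assumed.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(cluster_a, cluster_b):
    """All distances between an observation in one cluster and one in the other."""
    return [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Nearest neighbour: the minimum pairwise distance."""
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Furthest neighbour: the maximum pairwise distance."""
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """The mean of all pairwise distances."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Two hypothetical clusters of two observations each.
cluster_a = [(0, 0), (0, 1)]
cluster_b = [(3, 0), (4, 0)]

print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

For any pair of clusters the single linkage distance is never larger than the average, which in turn is never larger than the complete linkage distance.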
Hierarchical vs. non-hierarchical methods
Hierarchical methods
• No prior decision about the number of clusters is required
• Problems when data contain a high level of error
• Can be very slow; preferable with small data sets
• Initial decisions are more influential (one-step only)
• At each step they require computation of the full proximity matrix

Non-hierarchical methods
• Faster, more reliable, work with large data sets
• Need to specify the number of clusters
• Need to set the initial seeds
• Only the distances to the cluster seeds need to be computed in each iteration
The number of clusters c
• Two alternatives
  • Determined by the analysis
  • Fixed by the researchers
• In segmentation studies, c represents the number of potential separate segments.
• Preferable approach: “let the data speak”
  • Hierarchical approach, with the optimal partition identified through statistical tests (stopping rule for the algorithm)
  • However, the detection of the optimal number of clusters is subject to a high degree of uncertainty
• If the research objectives allow the number of clusters to be chosen rather than estimated, non-hierarchical methods are the way to go.
Example: fixed number of clusters
• A retailer wants to identify several shopping profiles in order to activate new and targeted retail outlets
• The budget only allows him to open three types of outlets
• A partition into three clusters follows naturally, although it is not necessarily the optimal one.
• Fixed number of clusters and a non-hierarchical (k-means) approach
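A k-means run with the number of clusters fixed at three, as in the retailer example, might be sketched as follows. This is a bare-bones illustration, not the procedure used in SPSS: the shopper data, the variable meanings, the choice of initial seeds and the function names are all hypothetical, and the variables are left unstandardized to keep the example short.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Centroid of a list of observations (variable-by-variable mean)."""
    return tuple(sum(col) / len(col) for col in zip(*points))


def k_means(cases, seeds, max_iter=100):
    """Assign each case to its nearest centroid, then update the centroids.

    Stops when the assignment no longer changes or after max_iter passes.
    """
    centroids = [tuple(seed) for seed in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: euclidean(case, centroids[c]))
            for case in cases
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [case for case, a in zip(cases, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return assignment, centroids


# Hypothetical shoppers: (store visits per month, average basket in euros).
shoppers = [(1, 20), (2, 22), (8, 21), (9, 19), (4, 80), (5, 85)]

# The budget fixes c = 3; the initial seeds are picked by hand here.
seeds = [shoppers[0], shoppers[2], shoppers[4]]

assignment, centroids = k_means(shoppers, seeds)
print(assignment)
print(centroids)
```

Only the distances between cases and the three centroids are computed in each pass, which is why the approach scales to large data sets; the result can depend on the initial seeds.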
Determining the optimal number of clusters from hierarchical methods (in SPSS)
• Agglomeration schedule
• Icicle plot
• Dendrogram
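To show what an agglomeration schedule contains, the sketch below runs a small agglomerative clustering in plain Python and records each merge with the distance at which it happens. It does not reproduce the SPSS output: single linkage, the example cases, the function names and the "largest jump in merging distance" heuristic for reading the schedule are all assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomeration_schedule(cases):
    """Single-linkage agglomerative clustering.

    Returns one tuple per stage:
    (clusters remaining after the merge, first cluster id, second cluster id,
     distance at which the two clusters were merged).
    """
    clusters = {i: [case] for i, case in enumerate(cases)}
    schedule = []
    while len(clusters) > 1:
        (a, b), dist = min(
            (
                ((i, j), min(euclidean(p, q) for p in clusters[i] for q in clusters[j]))
                for i, j in combinations(sorted(clusters), 2)
            ),
            key=lambda item: item[1],
        )
        clusters[a] = clusters[a] + clusters.pop(b)
        schedule.append((len(clusters), a, b, dist))
    return schedule


def suggest_number_of_clusters(schedule):
    """Heuristic (an assumption, not a formal test): stop just before the
    largest increase in merging distance. Needs at least two merges."""
    distances = [stage[3] for stage in schedule]
    jumps = [later - earlier for earlier, later in zip(distances, distances[1:])]
    last_stage_before_jump = jumps.index(max(jumps))
    return schedule[last_stage_before_jump][0]


# Hypothetical cases forming three visibly separate groups.
cases = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0)]

schedule = agglomeration_schedule(cases)
for stage in schedule:
    print(stage)
print(suggest_number_of_clusters(schedule))
```

A dendrogram displays the same information graphically: the merges are drawn as joins placed at the height of the merging distance, so a large jump appears as a long vertical gap.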