Customer Segmentation using Clustering
Uploaded by: dessy-amirudin
Category: Data & Analytics
Confidential. © Stream Intelligence Ltd. All rights reserved.
Introduction to Clustering
Agenda
1 Introduction: Business Case
2 Clustering
3 Hierarchical Clustering
4 K-means Clustering
1 Business Case
Business Case – Predicting Successful Music Production
[Figure: four clusters labeled Cluster music A, Cluster music B, Cluster music C and Cluster music D]
• The target is to appear in Billboard's weekly top 40
• The cost per single can be up to 300K USD
• Music Intelligence Solution uses clustering to predict whether a song will be accepted by the market
• Increases the success rate from 1 out of 10 to 8 out of 10
2 Clustering
Statistical Learning Categorization
Statistical Learning
• Unsupervised Learning: Clustering
• Supervised Learning: Predictive Model
Clustering
• The process of grouping a set of physical or abstract objects into clusters (for example, customers or products)
• A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters
• Similarity is calculated from the distance between points
• A common distance measure is the Euclidean distance
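As a small illustration of the distance measure named above, the R sketch below computes the Euclidean distance between two points, once by hand and once with the built-in `dist()` function. The two points and their feature names are made up for illustration and are not taken from the slides' data.

```r
# Two hypothetical customers described by two numeric features
# (values are made up for illustration).
a <- c(age = 25, spend = 40)
b <- c(age = 28, spend = 44)

# Euclidean distance by hand: square root of the summed squared differences.
manual <- sqrt(sum((a - b)^2))

# Same distance with base R's dist(), whose default method is "euclidean".
builtin <- as.numeric(dist(rbind(a, b)))

print(manual)   # 5 (differences of 3 and 4 give a 3-4-5 triangle)
print(builtin)  # 5
```

The smaller the distance, the more similar the two objects are considered to be.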
3 Hierarchical Clustering
Hierarchical Clustering
• Start with each data point in its own cluster
• Repeatedly combine the two nearest clusters (for example, using Euclidean distance and centroid linkage)
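The two steps above can be sketched with base R's `hclust()`. This is a minimal illustration on made-up points, not the deck's exercise script. Centroid linkage is chosen here only because the slide mentions centroids, and it is applied to squared distances as the R documentation for `hclust` suggests.

```r
# Hypothetical 2-D points forming two visually separate groups.
pts <- rbind(c(1, 1), c(1.2, 0.9), c(0.8, 1.1),
             c(5, 5), c(5.1, 4.8), c(4.9, 5.2))

# Pairwise Euclidean distances between all points.
d <- dist(pts, method = "euclidean")

# Agglomerative clustering: every point starts as its own cluster and
# the two nearest clusters are merged at each step.
# Centroid linkage is conventionally run on squared distances.
hc <- hclust(d^2, method = "centroid")

# Each row of hc$merge records one merge; n points need n - 1 merges.
print(hc$merge)

# Cut the tree so that two clusters remain.
groups <- cutree(hc, k = 2)
print(groups)
```

Other linkage methods (for example `"complete"` or `"average"`) can be passed through the same `method` argument.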
Let's Practice
• The data for this exercise was downloaded from www.movielens.org
• Open “clustering_movie.R”
• The movies in the dataset are categorized as belonging to different genres:
a. Action
b. Comedy
c. Sci-Fi
d. etc.
Dendrogram
The height of each join represents the distance between the points or clusters being merged
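To make the meaning of the heights concrete, here is a small R sketch on four made-up points. It assumes complete linkage (the largest pairwise distance between two clusters), which is the default of `hclust()`; the deck's own script may use a different method.

```r
# Four hypothetical points: two tight pairs that are far apart.
pts <- rbind(c(0, 0), c(0, 1), c(10, 0), c(10, 1))

# Complete linkage: cluster distance = largest pairwise point distance.
hc <- hclust(dist(pts), method = "complete")

# One height per merge, in the order the merges happened.
print(hc$height)

# Draw the dendrogram; the vertical axis shows the merge heights.
plot(hc)

# Cutting the tree at a height between the small and the large merges
# leaves the two tight pairs as separate clusters.
groups <- cutree(hc, h = 5)
print(groups)
```

A large jump between consecutive heights suggests a natural place to cut the tree.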
Finding Meaningful Clusters
• To see which cluster has the largest share of action movies, use this command:
tapply(movies$Action, clusterGroups, mean)
• Exercise: Can you find the characteristics of each cluster? Hints:
- Add the cluster as one of the variables in the data
- Load the dplyr library
- Use the aggregate and summarise functions
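One possible way to approach the exercise is sketched here in base R with `aggregate()`. The tiny `movies` data frame and the `clusterGroups` vector are hypothetical stand-ins that only reuse the names from the command above; the real objects come from the exercise script. With dplyr loaded, `group_by()` followed by `summarise()` would be an alternative.

```r
# Hypothetical stand-in for the movies data: 0/1 genre indicators.
movies <- data.frame(
  Action = c(1, 1, 0, 0, 1, 0),
  Comedy = c(0, 0, 1, 1, 0, 1),
  SciFi  = c(1, 0, 0, 0, 1, 0)
)

# Hypothetical cluster assignment, in the form cutree() would return.
clusterGroups <- c(1, 1, 2, 2, 1, 2)

# Share of action movies per cluster (the command from the slide).
print(tapply(movies$Action, clusterGroups, mean))

# Add the cluster as a variable, then average every genre per cluster.
movies$cluster <- clusterGroups
profile <- aggregate(. ~ cluster, data = movies, FUN = mean)
print(profile)
```

Each row of `profile` is one cluster, and each column shows the share of movies in that cluster belonging to the genre.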
Common scenario
Movie   Action   Romance   Rating   Revenue (in USD)
A       1        1         5        200
B       0        1         4        150
C       0        0         3        50
D       1        1         4        120

Tips:
- Normalize the data, so that variables on a large scale (such as revenue) do not dominate the distance
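The effect of normalization can be checked on the four movies in the table. The sketch below uses z-score standardization via `scale()`; this is one common choice and an assumption here, since the slide does not say which normalization to use.

```r
# The four movies from the table above.
movies <- data.frame(
  Action  = c(1, 0, 0, 1),
  Romance = c(1, 1, 0, 1),
  Rating  = c(5, 4, 3, 4),
  Revenue = c(200, 150, 50, 120),
  row.names = c("A", "B", "C", "D")
)

# Raw distances are driven almost entirely by the Revenue column.
raw_dist <- dist(movies)
print(round(raw_dist, 2))

# Standardize each column to mean 0 and standard deviation 1.
scaled <- scale(movies)

# After scaling, every variable contributes on a comparable scale.
scaled_dist <- dist(scaled)
print(round(scaled_dist, 2))
```

For example, the raw distance between A and B is about 50, of which the revenue difference alone accounts for 50.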
4 K-means Clustering
K-Means Clustering
Objective: High-Level Description
1. Group the data into k clusters by:
a. Determining the k centroids
b. Assigning each data point to the nearest centroid
2. The algorithm works by iterating between these two stages until the cluster assignments converge
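In R, this procedure is available as the built-in `kmeans()` function. The sketch below runs it on simulated data; the data, the seed and the choice of `nstart = 25` are assumptions made for illustration only.

```r
# Simulated data: three hypothetical groups of 20 points each.
set.seed(42)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# Fit k-means with k = 3. nstart = 25 repeats the algorithm from 25
# random starting configurations and keeps the best result.
fit <- kmeans(x, centers = 3, nstart = 25)

# The k centroids found by the algorithm.
print(fit$centers)

# The cluster assigned to each data point, summarized as counts.
print(table(fit$cluster))
```

`fit$cluster` can be attached to the original data in the same way as the hierarchical cluster labels.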
K-Means Illustrations
Suppose k = 3
Iteration = 0
1. Start with random positions of centroids.
Iteration = 1
2. Assign each data point to the closest centroid.
3. Move the centroids to the center of their assigned points (recalculating C).
Iteration = 3
4. Iterate until the cost is minimal.
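The four steps can be written out directly in a few lines of R. This is a minimal sketch for teaching purposes, not the implementation behind R's `kmeans()`. The function name `simple_kmeans`, its arguments, and the choice to start from randomly selected data points are all assumptions of this sketch.

```r
# Minimal k-means sketch. x must be a numeric matrix (rows = points).
simple_kmeans <- function(x, k, max_iter = 100) {
  # Step 1: start with k randomly chosen data points as centroids.
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- rep(0L, nrow(x))

  for (iter in seq_len(max_iter)) {
    # Step 2: assign each point to its closest centroid.
    # d[i, j] is the squared distance from point i to centroid j.
    d <- sapply(seq_len(k), function(j) {
      centre <- matrix(centroids[j, ], nrow(x), ncol(x), byrow = TRUE)
      rowSums((x - centre)^2)
    })
    new_assignment <- max.col(-d, ties.method = "first")

    # Step 4: stop once the assignments no longer change.
    if (all(new_assignment == assignment)) break
    assignment <- new_assignment

    # Step 3: move each centroid to the mean of its assigned points.
    # A centroid with no points is left where it is.
    for (j in seq_len(k)) {
      members <- x[assignment == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }

  # Cost: total squared distance from each point to its centroid.
  cost <- sum((x - centroids[assignment, , drop = FALSE])^2)
  list(centroids = centroids, cluster = assignment,
       cost = cost, iterations = iter)
}

# Usage on two hypothetical, well-separated groups of 10 points each.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.2), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.2), ncol = 2))
res <- simple_kmeans(x, k = 2)
print(res$centroids)
print(table(res$cluster))
```

Because the starting centroids are random, different seeds can lead to different final clusters on less clearly separated data.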
What can potentially go wrong?
Optimum Number of Clusters: Illustration
TSS = Total Sum of Squared Errors
K = Number of clusters
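A common way to use these two quantities is to plot TSS against K and look for the point where the curve flattens. The R sketch below does this on simulated data. It assumes that the TSS on the slide corresponds to `tot.withinss` (the total within-cluster sum of squares) returned by `kmeans()`; the data and seed are made up.

```r
# Simulated data: three hypothetical groups of 30 points each.
set.seed(123)
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.4), ncol = 2)
)

# Fit k-means for K = 1..6 and record the total within-cluster
# sum of squares for each K.
ks <- 1:6
tss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss
})
print(round(tss, 2))

# Plot TSS against K; the bend in the curve suggests a value of K.
plot(ks, tss, type = "b",
     xlab = "K (number of clusters)",
     ylab = "TSS (total within-cluster sum of squares)")
```

TSS keeps decreasing as K grows, so the aim is to find where adding another cluster stops giving a large reduction, not to minimize TSS itself.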
Let's Practice
• We will use the credit card profile data (cc-profile.csv)
• Open “segmenting_customer.R”
Exercise:
• What is the optimum number of clusters?
• Please describe the characteristics of each segment. Do you think the segments are meaningful?