
April 25, 2019

Tom Kolde, FCAS, MAAA

Linda Brobeck, FCAS, MAAA

Clustering 101 for Insurance Applications

1

About the Presenters

• Linda Brobeck, FCAS, MAAA – Director & Consulting Actuary – San Francisco, CA

• Tom Kolde, FCAS, MAAA – Consulting Actuary – Chicago, Illinois

2

Agenda

• Supervised vs. Unsupervised Learning

• Clustering Algorithms Overview

– Hierarchical Clustering

– K-Means

• Clustering Application Examples

3

Supervised Vs. Unsupervised Machine Learning

Machine Learning

• Supervised – Predictive; Target Variable; Task Driven (Regression, Classification)

• Unsupervised – Descriptive; No Target Variable; Data Driven (Clustering, Pattern Discovery, Dimension Reduction)

• Reinforcement – Algorithm Learns to React

4

Polling Question #1

What types of unsupervised learning have you used in the past? (Select all that apply)

A. Principal Component Analysis
B. Clustering
C. Neural Networks
D. Other
E. None… YET

5

Types of Clustering

Clustering Algorithms

• Connectivity – Hierarchical (Agglomerative, Divisive)

• Centroid – K-Means, Fuzzy C-Means, K-Medoids

• Distribution – Expectation Maximization

• Density – OPTICS, DBSCAN

6

• Additional types of cluster models

– Neural models

– Principal component analysis

• Hard vs. Soft (Fuzzy) clustering

• Finer distinctions

– Strict partitioning (with or without outliers)

– Overlapping

Other Clustering Options

7

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

7 Clusters

[Figure: scatter of points A–G shown as 7 clusters]

8

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

6 Clusters

[Figure: scatter of points A–G shown as 6 clusters]

9

Euclidean Distance

A = (x1, y1)

B = (x2, y2)

d = √((x2 − x1)² + (y1 − y2)²)

[Figure: right triangle with legs x2 − x1 and y1 − y2]

10

Distance Matrix

Data Points:

       x      y
a    4.0    5.0
b    6.0   10.0
c    6.3    5.2
d    6.4    4.7
e    9.0    5.4
f   10.0    5.2
g   10.2    5.0

Euclidean Distances:

       a      b      c      d      e      f
b   5.39
c   2.31   4.81
d   2.42   5.32   0.51
e   5.02   5.49   2.71   2.69
f   6.00   6.25   3.70   3.63   1.02
g   6.20   6.53   3.91   3.81   1.26   0.28

11
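The distance matrix above can be reproduced with a few lines of Python. This sketch is not part of the original deck; the point coordinates are taken directly from the slide.

```python
import math

# Data points from the slide: label -> (x, y)
points = {
    "a": (4.0, 5.0), "b": (6.0, 10.0), "c": (6.3, 5.2), "d": (6.4, 4.7),
    "e": (9.0, 5.4), "f": (10.0, 5.2), "g": (10.2, 5.0),
}

def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

# Print the lower-triangular distance matrix shown on the slide
labels = sorted(points)
for i, li in enumerate(labels):
    for lj in labels[:i]:
        print(f"d({lj},{li}) = {euclidean(points[lj], points[li]):.2f}")
```

Rounding to two decimals reproduces the slide's entries, e.g. d(a,b) = 5.39 and d(f,g) = 0.28.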

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

6 Clusters

[Figure: scatter of points A–G shown as 6 clusters]

12

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

5 Clusters

[Figure: scatter of points A–G shown as 5 clusters]

13

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

4 Clusters

[Figure: scatter of points A–G shown as 4 clusters]

14

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

3 Clusters

[Figure: scatter of points A–G shown as 3 clusters]

15

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

2 Clusters

[Figure: scatter of points A–G shown as 2 clusters]

16

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

1 Cluster

[Figure: scatter of points A–G shown as 1 cluster]

17

• Bottom Up - Agglomerative

Hierarchical Clustering (HCA)

[Figure: scatter of points A–G with the full dendrogram; leaf order B, A, C, D, E, F, G]

18

Hierarchical Algorithm

• Advantages

– Easy to understand

– Flexible

• Disadvantages

– Not easily computable for large data sets

– Sensitive to outliers

19
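As an illustration (not from the deck), the walkthrough above can be reproduced with SciPy's agglomerative clustering on the seven points a–g from the distance-matrix slide. The linkage choice here (single linkage, merging the closest pair at each step) is an assumption on my part.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The seven points a-g from the distance-matrix slide
X = np.array([[4.0, 5.0], [6.0, 10.0], [6.3, 5.2], [6.4, 4.7],
              [9.0, 5.4], [10.0, 5.2], [10.2, 5.0]])

# Agglomerative (bottom-up) clustering: single linkage merges the
# closest pair of clusters at each step
Z = linkage(X, method="single", metric="euclidean")

# Cut the tree at 3 clusters, matching the "3 Clusters" slide
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

With these points the three-cluster cut groups {a, c, d}, leaves b on its own, and groups {e, f, g}, consistent with the distances in the matrix.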

• Partition-based clustering method

• Relatively simple to understand & program

• K-means Algorithm:

1. Start with a random set of k cluster seeds

2. For each data point, calculate the distance to each cluster seed and assign the point to the closest seed

3. Once all data points have been assigned, calculate the centroid of each cluster

4. Reassign each data point to the nearest cluster centroid (instead of the initial seed)

5. For each new cluster, recalculate the centroid

6. Repeat steps 4-5 until convergence

Introduction to the K-Means Algorithm

20
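A minimal from-scratch sketch of the six steps above. This is my illustration, not the presenters' code; the data and seed handling are invented for the example.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means following the slide's steps: seed, assign, re-center."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # Step 1: random seeds
    for _ in range(iters):
        # Steps 2/4: assign each point to its nearest seed/centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Steps 3/5: recompute each centroid as its cluster's mean
        new = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts
               else centroids[i] for i, pts in enumerate(clusters)]
        if new == centroids:                       # Step 6: convergence
            break
        centroids = new
    return centroids, clusters

pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.8), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # two groups of three points
```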

• Our example begins with 95 data points

K-Means Cluster Analysis

[Figure: scatter plot of the 95 data points, X vs. Y]

21

• Next we assume the data has 3 clusters and randomly generate initial seed centroids for each

K-Means Cluster Analysis

[Figure: the 95 data points with three randomly placed seeds (Seed 1, Seed 2, Seed 3)]

22

• Each data point is assigned to the closest seed for its initial cluster

K-Means Cluster Analysis

[Figure: points colored by initial assignment to Clusters 1–3 around Seeds 1–3]

23

• New centroids are calculated for each cluster

K-Means Cluster Analysis

[Figure: Centroids 1–3 computed from the initial clusters, shown alongside Seeds 1–3]

24

• Data points are assigned to the nearest centroid

K-Means Cluster Analysis

[Figure: points reassigned to Clusters 1–3 based on the new centroids]

25

• New centroids are calculated for the data points within each cluster

K-Means Cluster Analysis

[Figure: updated centroids for Clusters 1–3]

26

• The process continues with data points being assigned to the nearest cluster centroid until convergence

K-Means Cluster Analysis

[Figure: the final converged clusters]

27

Advantages/Disadvantages of K-Means Algorithm

• Advantage

– Computationally simple

• Disadvantages

– Number of clusters k must be pre-selected

– Results may not be repeatable when using randomly selected seed centroids

• K-Medians Algorithm

– Less sensitive to outliers

– More processing time (to sort dataset)

28
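In practice, a library implementation addresses both disadvantages directly: k is passed explicitly, and fixing the random seed makes the result repeatable. A hedged sketch with scikit-learn, using synthetic data standing in for the 95-point example:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Two synthetic blobs standing in for the 95-point example on the slides
X = np.vstack([rng.normal((20, 30), 5, (50, 2)),
               rng.normal((100, 70), 5, (45, 2))])

# n_init re-runs the algorithm from several random seedings and keeps the
# best result; a fixed random_state makes the output repeatable run to run
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(np.bincount(km.labels_))  # cluster sizes
```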

• Variable reduction for modeling

• Territory analysis for ratemaking

Clustering Applications

29

• 44 Macro Economic Variables

– Unemployment (current, long-term, local, state, countrywide, changes over time)

– Housing Prices (changes over time, local, state, countrywide)

– Treasury Rates (short-term, long-term, yield curve slope, etc.)

– GDP (change over time, duration negative or positive, ratios)

• Correlation Matrix

• PROC VARCLUS in SAS (Oblique Centroid Component Cluster Analysis)

• Variable selection for each cluster

Variable Reduction Example

30
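PROC VARCLUS itself is SAS-only. As a rough open-source analogue (my illustration, not the presenters' method), one can hierarchically cluster variables on correlation distance 1 − |r|, which groups variables that move together, similar in spirit to the variable clustering described above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Synthetic stand-in data: three columns that move together (like the
# treasury-rate series) plus one unrelated column
rng = np.random.default_rng(1)
base = rng.normal(size=200)
data = np.column_stack(
    [base + rng.normal(scale=0.1, size=200) for _ in range(3)]
    + [rng.normal(size=200)])

# Distance between variables = 1 - |correlation|
corr = np.corrcoef(data, rowvar=False)
dist = 1 - np.abs(corr)
np.fill_diagonal(dist, 0.0)

# Cluster the VARIABLES (not the observations) and cut at 2 groups
Z = linkage(squareform(dist, checks=False), method="average")
groups = fcluster(Z, t=2, criterion="maxclust")
print(groups)  # the three correlated columns share a group
```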

Variable Reduction Example

• Treas rate 30 yr • UE 1prior MSA • UE 1prior ST • UE 1prior CW • UE 3prior MSA • UE 3prior ST • UE 3prior CW • UE rel ST • UE rel MSA • UE rel CW • UE ST • UE MSA • UE CW • Yield Curve Slope • GDP current • GDP Prior • GDP dur neg • GDP dur pos • GDP recession • GDP ratio • GDP ratio 1YR • GDP ratio 2YR

• UE 10 yr MSA • UE 10 yr CW • UE 10 yr ST • UE Delta ST • UE Delta MSA • UE Delta CW • Fixed 30 YR rate • House Price Apprec 2YR ST • House Price Apprec 2YR MSA • House Price Apprec 2YR CW • Home Price Index ST • Home Price Index MSA • Home Price Index CW • Treas rate 3 mo • Treas rate 6 mo • Treas rate 1 yr • Treas rate 2 yr • Treas rate 3 yr • Treas rate 5 yr • Treas rate 7 yr • Treas rate 10 yr • Treas rate 20 yr

31

Portion of the Correlation Matrix

Treasury rates:    3 mo     6 mo     1 yr     2 yr     3 yr     5 yr     7 yr    10 yr    20 yr    30 yr

Treas rate 3 mo 1 0.99828 0.99324 0.9728 0.94592 0.88626 0.83375 0.78125 0.21737 0.64186

Treas rate 6 mo 0.99828 1 0.9976 0.97972 0.95364 0.89353 0.83958 0.78627 0.21897 0.64229

Treas rate 1 yr 0.99324 0.9976 1 0.99018 0.96911 0.91428 0.86236 0.8095 0.22962 0.66596

Treas rate 2 yr 0.9728 0.97972 0.99018 1 0.99336 0.9569 0.91471 0.86526 0.26694 0.72931

Treas rate 3 yr 0.94592 0.95364 0.96911 0.99336 1 0.98265 0.95119 0.90702 0.29029 0.78043

Treas rate 5 yr 0.88626 0.89353 0.91428 0.9569 0.98265 1 0.99115 0.96453 0.32941 0.86757

Treas rate 7 yr 0.83375 0.83958 0.86236 0.91471 0.95119 0.99115 1 0.98911 0.34906 0.91986

Treas rate 10 yr 0.78125 0.78627 0.8095 0.86526 0.90702 0.96453 0.98911 1 0.36967 0.96373

Treas rate 20 yr 0.21737 0.21897 0.22962 0.26694 0.29029 0.32941 0.34906 0.36967 1 0.39463

Treas rate 30 yr 0.64186 0.64229 0.66596 0.72931 0.78043 0.86757 0.91986 0.96373 0.39463 1

32

VARCLUS output

Number of   Total Variation       Proportion of Variation   Minimum Proportion       Minimum R-squared   Maximum 1-R**2 Ratio
Clusters    Explained by Clusters Explained by Clusters     Explained by a Cluster   for a Variable      for a Variable

1 2.0213 0.0459 0.0459 0

2 11.8449 0.2692 0.1105 0 2.3283

3 17.6347 0.4008 0.1212 0 2.0271

4 23.8405 0.5418 0.16 0.0126 1.8387

5 27.7395 0.6304 0.3053 0.0825 1.5727

6 30.1739 0.6858 0.4645 0.1161 1.3948

7 31.5827 0.7178 0.571 0.1292 1.3087

8 32.4495 0.7375 0.5919 0.1292 1.5582

9 33.4705 0.7607 0.6476 0.1292 1.5582

10 35.7483 0.8125 0.71 0.1655 1.5582

11 36.3604 0.8264 0.7369 0.1655 1.5582

12 37.0867 0.8429 0.7459 0.1655 1.5582

13 37.9171 0.8618 0.7898 0.1655 1.5582

33


36

VARCLUS OUTPUT

10 Cluster Solution – R-squared with:

                                   Own       Next      1-R**2
                                   Cluster   Closest   Ratio
Cluster 9    Fixed 30 YR rate      0.9003    0.7565    0.4093
             Treas rate 3 mo       0.8434    0.3789    0.2522
             Treas rate 6 mo       0.852     0.3861    0.2412
             Treas rate 1 yr       0.8793    0.4136    0.2059
             Treas rate 2 yr       0.9346    0.4757    0.1247
             Treas rate 3 yr       0.9624    0.5218    0.0787
             Treas rate 5 yr       0.9746    0.6031    0.0639
             Treas rate 7 yr       0.9512    0.6584    0.1427
             Treas rate 10 yr      0.9107    0.7231    0.3223
             Treas rate 20 yr      0.1655    0.1063    0.9338
             Treas rate 30 yr      0.7495    0.7659    1.0701
Cluster 10   UE Delta ST           0.9249    0.2696    0.1028
             UE Delta CW           0.9249    0.3729    0.1197

1-R**2 Ratio = (1 − R²_own) / (1 − R²_nearest)

37

• Calculated the correlation matrix to be used in VARCLUS

• Selected the number of clusters based on the proportion of variation explained by clusters and the minimum R-squared for a variable within the cluster

• Selected the variable with the smallest 1-R² ratio to represent each cluster:

– 5 year treasury rate
– Prior quarter countrywide unemployment rate
– Prior quarter MSA unemployment rate
– Ratio of current GDP to 2 years prior
– Current GDP
– GDP recession indicator
– State home price index
– MSA home price index
– Duration of positive GDP growth
– Change in unemployment rate by state

Summary of Variable Reduction Clustering

38

• Deriving territory definitions is a common application of cluster analysis in ratemaking

• Goals:

– Loss experience by territory should be actuarially credible

– Balance homogeneity of loss experience within territory while producing a manageable number of territories

– Contiguous territories

• Solution = Hierarchical clustering using Ward’s method with contiguity constraint

Introduction to Territorial Clustering

39

• Each square below represents a zip code in our hypothetical State X

Introduction to Territorial Cluster Analysis

[Figure: map of State X zip-code squares, with West Town, North Center, Star City, Central City, South Shore City, and Old Town marked]

40

• Step 1 – Determine raw pure premium by zip code

Introduction to Territorial Cluster Analysis

[Figure: State X map shaded by raw pure premium, from lower to higher PP]

41

• Not every zip code is fully credible

• Spatial smoothing allows us to obtain credible results by zip code

Engineering Credible Loss Experience by Zip Code

[Figure: State X map shaded by pure premium]

42

• Determine the credibility for a single zip code

Spatial Smoothing

Credibility = Z0
Pure Premium = PP0

[Figure: a single zip code highlighted on the State X map]

43

• Determine the pure premium and credibility for the area including the surrounding zip codes

Spatial Smoothing

Credibility = Z1
Pure Premium = PP1

[Figure: the zip code and its first ring of surrounding zip codes highlighted]

44

• Determine the pure premium and credibility for the wider area including a second ring of surrounding zip codes

Spatial Smoothing

Credibility = Z2
Pure Premium = PP2

[Figure: the zip code and two rings of surrounding zip codes highlighted]

45

• Smoothed PP =

PP0 x Z0 + PP1 x (Z1-Z0) + PP2 x (Z2-Z1) + PPState x (1-Z2)

Spatial Smoothing

[Figure: State X map shaded by pure premium]

46
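The blending formula above is straightforward to compute. A small sketch with hypothetical credibility and pure premium values (the numbers are invented for illustration):

```python
def smoothed_pp(pp0, z0, pp1, z1, pp2, z2, pp_state):
    """Credibility-weighted blend from the slide: each widening ring
    contributes the increment of credibility it adds, and the statewide
    pure premium fills whatever credibility remains."""
    return (pp0 * z0
            + pp1 * (z1 - z0)
            + pp2 * (z2 - z1)
            + pp_state * (1 - z2))

# Hypothetical zip code: 30% credible on its own, 70% credible with one
# ring of neighbors, 90% with two rings; statewide PP fills the rest
print(smoothed_pp(pp0=420, z0=0.30, pp1=390, z1=0.70,
                  pp2=370, z2=0.90, pp_state=350))
```

Note the four weights Z0, (Z1 − Z0), (Z2 − Z1), and (1 − Z2) always sum to 1, so the result is a true weighted average.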

• Spatial Smoothing helps uncover patterns hidden within the loss experience

Spatial Smoothing

[Figure: side-by-side State X maps, raw pure premium vs. smoothed pure premium, shaded from lower to higher PP]

47

• Ward’s method seeks to minimize the variance of data characteristics within each cluster

• In territorial cluster analysis this means minimizing the within cluster variance of loss experience metrics, such as frequency or pure premium

• In this case, frequency/pure premium is not viewed as a target variable but rather as a risk characteristic of a zip code

Ward’s Method

48

• The variance measure for combining clusters is the within-cluster sum of squared deviations between each data object and its cluster mean:

– Within-cluster sum of squares: ESS = Σ_i Σ_j (X_ij − X̄_i.)²

– Between-cluster sum of squares: BSS = Σ_i Σ_j (X̄_i. − X̄..)²

– Total sum of squares: TSS = Σ_i Σ_j (X_ij − X̄..)²

– TSS = ESS + BSS

Ward’s Method

49
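The decomposition TSS = ESS + BSS can be verified numerically on a toy example (the values are invented, standing in for zip-code pure premiums):

```python
import numpy as np

# Two small clusters of one-dimensional "pure premium" values
clusters = [np.array([100.0, 110.0, 120.0]), np.array([300.0, 310.0])]
all_x = np.concatenate(clusters)
grand = all_x.mean()

ess = sum(((c - c.mean()) ** 2).sum() for c in clusters)       # within
bss = sum(len(c) * (c.mean() - grand) ** 2 for c in clusters)  # between
tss = ((all_x - grand) ** 2).sum()                             # total

print(ess, bss, tss)  # tss equals ess + bss
```

Because TSS is fixed for a given data set, minimizing the within-cluster ESS (Ward's criterion) is equivalent to maximizing the between-cluster BSS.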

• Begin with each zip code as its own cluster (N=600)

• Evaluate each pair of contiguous zip codes to determine the within-cluster variance

• The pair of zip codes that is most similar (i.e., produces the smallest within-cluster variance) is merged to form a cluster

• Next, the clusters from the 1st iteration (N-1=599) are evaluated to find the pair with the minimum within-cluster variance. This pair is combined to form the second cluster

• The process continues until all zip codes are grouped into a single cluster

Territorial Cluster Analysis Using Ward’s Method

50
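Ward's method with a contiguity constraint is available out of the box in scikit-learn via a connectivity matrix. The grid below is a toy stand-in for State X's zip codes (my illustration, not the presenters' code):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 4x4 grid of "zip codes" with a smoothed pure premium in each cell;
# left half is cheap, right half is expensive
pp = np.array([[100, 105, 300, 310],
               [102, 108, 305, 315],
               [ 98, 103, 298, 312],
               [101, 107, 302, 308]], dtype=float)

n_rows, n_cols = pp.shape
n = n_rows * n_cols

# Contiguity constraint: a cell may only merge with its N/S/E/W neighbors
conn = np.zeros((n, n), dtype=int)
for r in range(n_rows):
    for c in range(n_cols):
        i = r * n_cols + c
        for dr, dc in ((1, 0), (0, 1)):
            rr, cc = r + dr, c + dc
            if rr < n_rows and cc < n_cols:
                j = rr * n_cols + cc
                conn[i, j] = conn[j, i] = 1

model = AgglomerativeClustering(n_clusters=2, linkage="ward",
                                connectivity=conn)
labels = model.fit_predict(pp.reshape(-1, 1))
print(labels.reshape(n_rows, n_cols))  # left and right halves split cleanly
```

With the connectivity matrix in place, every resulting territory is guaranteed to be a contiguous block of cells.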

[Figure: State X map with the most similar pair of contiguous zip codes highlighted]

• The highlighted pair of zip codes produce the smallest within-cluster variance of any pair of contiguous zip codes

Territorial Cluster Analysis Using Ward’s Method

51

[Figure: State X map partway through the merging process]

• The process continues combining zip codes into clusters until all zip codes are combined into a single cluster

Territorial Cluster Analysis Using Ward’s Method

52

[Figure: State X map with clusters combined further]

• The process continues combining zip codes into clusters until all zip codes are combined into a single cluster

Territorial Cluster Analysis Using Ward’s Method

53

[Figure: percentage of total variance explained by within-cluster variance (ESS/TSS) vs. number of territories, 1 to 30]

• Ward’s Method does not explicitly optimize the number of territories but it can provide insight into the percentage of total variance explained by the within-cluster variance

• A common metric used for this evaluation is ESS/TSS

Determining the Number of Territories

54

[Figure: the same ESS/TSS curve, annotated at 12 and 22 territories]

• Ward’s Method does not explicitly optimize the number of territories but it can provide insight into the percentage of total variance explained by the within-cluster variance

• A common metric used for this evaluation is ESS/TSS

Determining the Number of Territories

15.2% of the total variance is explained by the within-cluster variance at 12 territories

10.2% of the total variance is explained by the within-cluster variance at 22 territories

55

Territorial Cluster Analysis Results

[Figure: State X map with each zip code labeled by its assigned territory number (territories 2 through 12), city locations marked]

56

Territory Boundaries Overlaid Against Smoothed PP

[Figure: territory boundaries overlaid on the smoothed pure premium map of State X, shaded from lower to higher PP]

57

• The results of the cluster analysis should be evaluated for:

– Reasonability

– Underwriting and competitive considerations

– Regulatory constraints

• The territories can be used in the context of GLMs or other supervised learning analyses to determine appropriate rating factors and/or to further refine the territories.

Further Considerations

58

• Cluster analysis is a broad field with many possibilities for further exploration

• K-means and hierarchical clustering methods give the practitioner a starting point for expanding their knowledge

• Software packages offer out-of-the-box clustering procedures, but sophisticated applications may require custom programming to introduce actuarial considerations and constraints

Summary

59

Questions

60

Join Us for the Next APEX Webinar

61

• We’d like your feedback and suggestions

• Please complete our survey

• For copies of this APEX presentation

• Visit the Resource Knowledge Center at Pinnacleactuaries.com

Final notes

62

Commitment Beyond Numbers

Thank You for Your Time and Attention

Tom Kolde

tkolde@pinnacleactuaries.com

Linda Brobeck

lbrobeck@pinnacleactuaries.com