Basic Modeling Strategy...• Overall Strategy 7.Combine Models CAS Predictive Modeling Oct 2006 GLM...
GLM II: Basic Modeling Strategy
CAS Predictive Modeling Special Interest Seminar, Oct 2006
Geoff Werner – Senior Consultant – EMB America
Basic Modeling Session

OUTLINE
Background
Overall Modeling Strategy
Basic Predictive Modeling Steps
1. Get clean data
2. Select an initial error structure, link function, and model structure
3. Test error structure/link function
4. Preliminary investigation
5. Build predictive models iteratively
6. Validate final predictive model
7. Combine models, if modeling frequency and severity
Summary
PURPOSE: To discuss basic modeling strategies and techniques for building appropriate GLMs
• Background
• Overall Strategy
• Modeling Steps
1. Get Data
2. Initial Selections
3. Test Error/Link
4. Preliminary Investigation
5. Build Models
6. Validate Models
7. Combine Models
• Summary
Purpose of Predictive Modeling
To predict a response variable using a series of explanatory variables (or rating factors)
Dependent/Response: Losses, Claims, Retention
Independent/Predictors: Age, Accidents, Limit, Convictions, Territory, Credit Score
Weights: Claims, Exposures, Premium
Statistical Model
Model Results: Parameters, Validation Statistics
The same techniques apply regardless of what is being modeled; this session focuses on risk modeling, as it is the most common application.
Generalized Linear Models (GLMs)
Multivariate method that considers all factors simultaneously
y = h(Linear Combination of Rating Factors) + Error
Response Variable = Systematic Component + Random Component
g = h^{-1} is called the LINK function
Combination of rating factors is the model structure
Error reflects underlying process
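As a minimal sketch of the structure above, assuming a log link (so h is the exponential function) and parameter values invented purely for illustration, the systematic component for a single risk could be computed as follows. The factor names and numbers are hypothetical and are not taken from the seminar.

```python
import math

# Hypothetical parameters on the linear-predictor scale (illustration only).
intercept = math.log(0.10)              # base frequency of 10%
params = {
    ("age", "young"): 0.40,
    ("age", "adult"): 0.00,             # base level
    ("territory", "urban"): 0.25,
    ("territory", "rural"): 0.00,       # base level
}

def linear_predictor(risk):
    """Sum the intercept and the parameter for each rating factor level."""
    return intercept + sum(params[(factor, level)] for factor, level in risk.items())

def predict(risk):
    """Apply h, the inverse of the (assumed) log link, to the linear predictor."""
    return math.exp(linear_predictor(risk))

young_urban = predict({"age": "young", "territory": "urban"})
base_risk = predict({"age": "adult", "territory": "rural"})
```

The observed response for a risk would then be this fitted value plus a random error drawn from the chosen error structure.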
GLM Building Blocks: Error Structure
y = h(Linear Combination of Rating Factors) + Error
Reflects the variability of the underlying process and can be any distribution within the exponential family, for example:
- Gamma consistent with severity modeling, may want to try Inverse Gaussian
- Poisson consistent with frequency modeling
- Tweedie consistent with pure premium modeling
- Normal useful for a variety of applications
[Figures: density plots of Severity, Pure Premium, and Loss Ratio; histogram of claim Frequency (0 to 4 claims)]
GLM Building Blocks: Model Structure
y = h(Linear Combination of Rating Factors) + Error
Include variables that are predictive, exclude those that are not
– Gender may not have major impact on theft severity
Simplify some rating factors, if full inclusion is not necessary
– Some levels within a particular predictor may be grouped together (e.g., 50-54 year olds)
– A curve may replicate the signal (e.g., amount of insurance)
Complicate model if the relationship between levels of one variable depends on another characteristic
– The difference between males and females depends on age
GLM Building Blocks: Link Function
y = h(Linear Combination of Rating Factors) + Error
Link function (g = h^{-1}) chosen based on how the factors are related to produce the best signal:
- Log: variables related multiplicatively (e.g., risk modeling)
- Identity: variables related additively (e.g., risk modeling)
- Logit: retention or risk modeling
- Reciprocal: canonical link for gamma distribution
- Mixed: additive/multiplicative rating algorithms
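The link functions listed above can be sketched as pairs (g, h), where h undoes g. This is an illustrative sketch only; the dictionary name and helper are hypothetical, and the mixed additive/multiplicative case is left out because the slide does not define it.

```python
import math

# Each entry maps a link name to (g, h), where h is the inverse of g.
LINKS = {
    "log": (math.log, math.exp),
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
}

def round_trip(name, mu):
    """Map a mean to the linear-predictor scale and back again."""
    g, h = LINKS[name]
    return h(g(mu))

# With a log link, adding parameters on the linear-predictor scale
# multiplies the corresponding relativities on the response scale.
additive = math.exp(0.2 + 0.3)
multiplicative = math.exp(0.2) * math.exp(0.3)
```

The last two lines show why the log link suits multiplicative rating plans: a sum of parameters becomes a product of factors.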
Overall Modeling Strategy Questions
Should you model loss ratios?
Should you model frequency and severity separately by coverage/peril or model in the aggregate?
Should you only model current rating variables?
Should You Model Loss Ratios?
Some companies model loss ratios
- May find it difficult to obtain exposures
- Do not want to pull all of the data, so assume using loss ratios will “adjust” for excluded variables
- Habit formed when performing traditional analysis
Theoretical and practical disadvantages to loss ratio modeling
- On-level calculations
- No defined error distribution
- Difficult to distinguish noise from pattern
- If changes made, models cannot be reused
Loss Ratio Modeling: On-Level Calculations
When modeling using loss ratios, premiums should be put on-level to adjust for changes during or after the historical period
- Rate changes
- Underwriting changes
Not sufficient to use an average on-level approach (e.g., parallelogram method) when changes impact classes differently
Instead, put premiums on-level at the granular level (e.g., extension of exposures)
- Can be time consuming
- Data may not be readily available
Depending on the type and magnitude of the changes, failure to put premiums on level can result in serious under- and over-predictions
Pure premiums use exposures so this is a non-issue
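To illustrate why an average on-level factor can mislead when a change hits classes differently, here is a small sketch on an invented two-class book. All figures are hypothetical, and for simplicity the book-wide factor is derived from the granular result rather than estimated with the parallelogram method.

```python
# Hypothetical book: two classes, 100 exposures per class per year, two years.
exposures = {"A": [100, 100], "B": [100, 100]}
rates_charged = {"A": [500.0, 500.0], "B": [500.0, 600.0]}  # class B took +20% in year 2
current_rate = {"A": 500.0, "B": 600.0}
losses = {"A": 60_000.0, "B": 84_000.0}

collected = {c: sum(e * r for e, r in zip(exposures[c], rates_charged[c]))
             for c in exposures}

# Granular approach (extension of exposures): re-rate every exposure at the current rate.
granular = {c: sum(exposures[c]) * current_rate[c] for c in exposures}

# Average approach: one book-wide on-level factor applied to every class.
avg_factor = sum(granular.values()) / sum(collected.values())
average = {c: collected[c] * avg_factor for c in exposures}

lr_granular = {c: losses[c] / granular[c] for c in exposures}
lr_average = {c: losses[c] / average[c] for c in exposures}
```

In this example the average factor inflates class A's premium and understates class B's, so the class A loss ratio looks better and the class B loss ratio looks worse than the granular calculation indicates.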
Loss Ratio Modeling: Defined Error Structure
When modeling frequency and severity, there are generally accepted distributions
- Gamma considered a standard for severity modeling
- Poisson considered a standard for frequency modeling
[Figures: density plot of Severity; histogram of claim Frequency (0 to 4 claims)]
What is the typical distribution for loss ratios?
- There is no generally accepted standard
- The distribution will vary by company, line, and over time
Loss Ratio Modeling: Discerning Patterns
When viewing frequency and severity data separately, it is easy to discern patterns from the noise
[Figures: Raw Frequency by Age of Driver; Smoothed Frequency by Age of Driver (frequency and exposures (000s) by age)]
With loss ratios, it is difficult or impossible to distinguish pattern from noise
[Figure: Raw Loss Ratio by Age of Driver (loss ratio and exposures (000s) by age)]
Loss Ratio Modeling: Re-usability
Loss ratio modeling
- Modeling losses/premiums, thus it is imperative that premiums be put on-level
- If a review results in changes
  • All of the loss ratios will change
  • The relationships between levels of factors may change as well
- Models built in last review will be inappropriate
Pure Premium modeling
- Modeling does not involve premium, thus unnecessary to put premiums on level
- If a review results in changes
  • The frequencies, severities, pure premiums will not change
  • The relationships between levels will be unaffected
- Models built in last review may still be appropriate
Granular or Combined Modeling?
Some are tempted to model raw pure premiums or combined coverages/perils, presumably to save time
As with traditional analysis (e.g., selecting loss trends), it is preferable to analyze at the granular level
By-Peril or All Perils
- Different loss trends by peril can mask results
- Perils have different size of loss distributions
- Predictors affect perils differently (e.g., theft device)
- Highly variable perils mask stable perils
Freq/Severity or Pure Premium
- Different frequency and severity trends can mask results
- Frequency and severity have defined error structures
- Predictors impact frequency and severity differently (e.g., limit)
- Severity trends mask frequency signal
If necessary, consider Tweedie and Joint Modeling macros
Use All Available Data?
[Diagram: Raw Historical Data, then Pure Modeling, then Modeled Data, then Constraint Modeling, then Constrained Indications]
Pure Modeling
- Use all data to remove “noise” and find signal
- For example, geodemographic data may be more predictive than current territory
Companies may limit number of variables reviewed. For example, companies may mistakenly exclude
- Variables not allowed by regulation or not currently used
- Variables not being changed with current review
- Underwriting variables
Constraint Modeling
- Convert modeled results into usable indications
- Incorporate restrictions
  • Systems
  • Regulatory
  • Competitive
Predictive Modeling Overall Strategy
Avoid modeling loss ratios
Build frequency and severity models by coverage/cause of loss
Use all available data to find the best signal
[Diagram: CW Historical Data feeds CW Predictive Models]
- Claim Counts, Exposures, and Characteristics by Coverage/COL feed Frequency Models by Coverage/COL
- Loss $, Claim Counts, and Characteristics by Coverage/COL feed Severity Models by Coverage/COL
- Frequency and Severity Models produce Modeled Pure Premiums by Coverage/COL
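A minimal sketch of how modeled frequency and severity might be combined into a modeled pure premium for one coverage, assuming both models use a log link so that relativities multiply. The base values and relativities below are invented for illustration.

```python
# Hypothetical fitted values for one coverage (not from the seminar).
base_frequency = 0.08          # claims per exposure at base levels
base_severity = 2_500.0        # average loss per claim at base levels
freq_rel = {"urban": 1.30, "rural": 1.00}
sev_rel = {"urban": 0.95, "rural": 1.00}

def pure_premium(territory):
    """Modeled pure premium = modeled frequency x modeled severity."""
    frequency = base_frequency * freq_rel[territory]
    severity = base_severity * sev_rel[territory]
    return frequency * severity

# Pure premium relativities relative to the base territory.
pp_rel = {t: pure_premium(t) / pure_premium("rural") for t in freq_rel}
```

Here the urban pure premium relativity is the product of its frequency and severity relativities, which is one reason a factor can matter for pure premium even when its frequency and severity effects point in opposite directions.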
Basic Modeling Steps
1. Gather necessary internal and external data
2. Select initial error structure, link function, and model structure
3. Perform basic diagnostic tests to become familiar with data
4. Validate initial selections for error structure and link function
5. Build predictive models
- Add/exclude variables
- Group levels
- Include variates
- Add interactions
6. Perform tests to validate the models built
7. Combine granular models, if necessary
[Cycle diagram: 1 Get Clean Data; 2 Make Initial Selections; 3 Test Error Structure & Link Function; 4 Prelim Invest; 5 Build Model Structure; 6 Validate Models; 7 Combine Models]
Get Clean Data
Good project results start with good data
- Internal data
- External data
Data remains the number 1 issue
- Null records or bad data, especially for variables not used in rating
- Poor linkage between losses and policy characteristics
- Too much pre-banding of data
- No mapping of old groupings into new groupings
- For auto, no linkage between operator, vehicle, and policy characteristics
- Inconsistency between variables (e.g., 30 year olds living in a retirement community)
Key: spend the right amount of time on data acquisition!
- Typically 50% of first review
- Some issues cannot be resolved; impact on analysis depends on the type and extent of the problem
Initial Selections
Use generally accepted standards as starting point for link functions and error structures

Observed Response    Most Appropriate Link Function    Most Appropriate Error Structure    Variance Function
Claim Frequency      Log                               Poisson                             µ
Claim Severity       Log                               Gamma                               µ^2
Risk Premium         Log                               Gamma or Tweedie                    µ^T
Retention Rate       Logit                             Binomial                            µ(1-µ)
Claim Severity       Log                               Inverse Gaussian                    µ^3
Conversion Rate      Logit                             Binomial                            µ(1-µ)
--                   --                                Normal                              µ^0

Reasonable starting point for model structure
– All or all known important factors
– Prior model (last year or other related peril)
– Forward regression model
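The variance functions in the table can be written down directly. The sketch below is illustrative; the function and dictionary names are hypothetical, and the Tweedie power is left as a parameter because the slide only shows it as T (for pure premium work it is commonly taken between 1 and 2).

```python
def power_variance(p):
    """Return V(mu) = mu ** p, the variance function of a power-variance family."""
    return lambda mu: mu ** p

VARIANCE_FUNCTIONS = {
    "Normal": power_variance(0),             # constant variance
    "Poisson": power_variance(1),            # variance proportional to the mean
    "Gamma": power_variance(2),              # constant coefficient of variation
    "Inverse Gaussian": power_variance(3),
    "Binomial": lambda mu: mu * (1.0 - mu),
}

def tweedie_variance(power):
    """Tweedie variance function mu ** power for a user-chosen power."""
    return power_variance(power)
```

A higher power means the variance grows faster with the fitted mean, which is the property the residual plots in the next step are used to check.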
Test Error Structure/Link Function: Distribution Analysis
Examine plots of the data (e.g., size of loss distribution)
[Figure: density plot of Severity]
- Consistent with gamma
[Figure: histogram of claim Frequency (0 to 4 claims)]
- Consistent with Poisson
[Figure: density plot of Pure Premium]
- Consistent with Tweedie
Test Error Structure/Link Function: Macro Residual Analysis
Plot of all residuals tests selected error structure/link function
- Fanning out suggests power of variance function is too low
- Elliptical pattern is ideal
- Two concentrations suggest two perils: split or use joint modeling
[Figure: Normal Error Structure/Log Link, studentized standardized deviance residuals vs. transformed fitted value]
[Figure: Gamma Error/Log Link, studentized standardized deviance residuals vs. transformed fitted value]
[Figure: studentized standardized deviance residuals vs. fitted value]
Preliminary Investigation
Traditional statistics and simple graphs provide “quick” feel
One-way v. GLM
Standard Error Graphs
- Highlights what others within your company “know”
- Quickly highlight trends in your data
[Figure: One-Way vs. GLM relativities by age, with exposure]
[Figure: Rating Area, showing model prediction at base levels, model prediction ± 2 standard errors, and reference prediction at base levels, with exposure]
Preliminary Investigation
Statistics (e.g., Cramér’s V) can identify correlated variables
Identifies independent variables that will have an effect on each other
[Figure: Exposure Distribution (Vehicle Age X NCD), NCD (0) to NCD (4+) by vehicle age 0 to 15+]
Low Correlation (.025)
- Distribution of number of years claims free about the same for each vehicle age
[Figure: Exposure Distribution (Age X NCD), NCD (0) to NCD (4+) by age 16 to 70+]
High Correlation (.253)
- Older drivers are more likely to be claim-free
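For reference, Cramér's V can be computed from a two-way exposure table as the square root of chi-squared divided by n times (min(rows, columns) - 1). The sketch below uses the standard formula on small invented tables; it assumes every row and column has a non-zero total.

```python
import math

def cramers_v(table):
    """Cramér's V for a two-way table of exposures or counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected cell value if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical tables: one with identical row distributions, one fully dependent.
independent = cramers_v([[10, 20], [30, 60]])
dependent = cramers_v([[50, 0], [0, 50]])
```

Values near 0 indicate that the distribution of one variable is about the same across levels of the other, while values near 1 indicate that one variable almost determines the other.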
Building the “Best” Model
To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience
[Figure: spectrum of Model Complexity (Number of Parameters), from Overall Mean to 1 parameter for each observation, with the “Best” Models in between]
UNDERFIT (toward Overall Mean)
- Predictive
- Poor explanatory power
OVERFIT (toward 1 parameter for each observation)
- Poor predictive power
- Explains history
Building the “Best” Model
Modeling is an iterative process
Review Model
Simplify
- Exclude
- Group
- Variate
Complicate
- Include
- Interactions
How does the analyst decide the “Best” Model?
- Parameters/standard errors
- Consistency of patterns over time or random data sets
- Type III statistical tests (e.g., χ² tests)
- Judgment (e.g., do the trends make sense)
Building the “Best” Model
Modeling is an iterative process
Add/Exclude: does the independent variable have predictive power that warrants including it in the model?
Build Models: Include/Exclude Factors
Parameters/standard errors tell importance of factors and “confidence” in estimates
[Figure: Rescaled Predicted Values - Driver Restrictions, showing model prediction at base levels and model prediction ± 2 standard errors, with exposure %]

Name                Value      Standard Error    Standard Error (%)    Exp(Value)
Any                 0.0174     0.04183           240.8                 1.0175
Any >25             0.0212     0.04349           205.4                 1.0214
Named >50          -0.0961     0.08120            84.5                 0.9084
Named 25-50         0.0357     0.02194            61.4                 1.0364
Insured Only
Insured & Spouse    0.0255     0.01272            49.8                 1.0259
Named <25          -0.0446     0.02663            59.7                 0.9564

- Graph of parameters/standard errors and “horizontal line test” identifies importance of a factor
- If all the parameters are essentially the same or have large standard errors, the factor may not be important
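The derived columns in the table can be approximately reproduced from the Value and Standard Error columns. This sketch assumes a log link (so Exp(Value) is a multiplicative relativity) and that Standard Error (%) is the standard error divided by the absolute parameter value; results differ slightly from the slide in the last digit because the inputs are rounded. The check on the ± 2 standard error band is an added illustration, not a test named in the source.

```python
import math

# (level, parameter value, standard error) as shown on the slide;
# "Insured Only" has no estimate on the slide and is omitted here.
levels = [
    ("Any", 0.0174, 0.04183),
    ("Any >25", 0.0212, 0.04349),
    ("Named >50", -0.0961, 0.08120),
    ("Named 25-50", 0.0357, 0.02194),
    ("Insured & Spouse", 0.0255, 0.01272),
    ("Named <25", -0.0446, 0.02663),
]

def summarize(value, std_error):
    se_pct = 100.0 * std_error / abs(value)            # Standard Error (%)
    relativity = math.exp(value)                       # Exp(Value)
    band_excludes_zero = abs(value) > 2.0 * std_error  # +/- 2 SE band misses 0
    return se_pct, relativity, band_excludes_zero

summary = {name: summarize(value, se) for name, value, se in levels}
distinct_levels = [name for name, result in summary.items() if result[2]]
```

With these inputs only one level has a band that excludes zero, and only barely, which is consistent with the slide's point that a factor whose parameters are all similar or carry large standard errors may not be important.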
Build Models: Include/Exclude Factors
Examine consistency over time or random parts of data
Parameter/Standard Errors
- Main effects graph may indicate a questionable estimate
[Figure: Rescaled Predicted Values - Driver Restrictions, showing model prediction at base levels and model prediction ± 2 standard errors, with exposure %]
Time Testing
- By testing the pattern over time, can see if the same thing happens each year
[Figure: Linear Predictor by Driver Restrictions for Time (1994), Time (1995), and Time (1996), with exposure %]
Build Models: Include/Exclude Factors
Goodness of fit tests (e.g., Chi-Squared) can be used to determine the explanatory power of a variable

Model                 With          Without
Deviance              8,906.4414    8,909.6226
Degrees of Freedom    18,469        18,475
Scale Parameter       0.4822        0.4823
Chi Square Test       78.6%

Chi-Squared
- Null hypothesis is that the models with and without the factor are the same

Score     H0        Indicated Model
<5%       Reject    More Complex: Include Factor
5%-30%    ???       ???
>30%      Accept    Simpler: Exclude Factor
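As a worked check of the table above, the sketch below compares the change in deviance with a chi-squared distribution whose degrees of freedom equal the number of parameters dropped. Using the raw deviance change on 6 degrees of freedom gives roughly 78.6%, in line with the slide; whether the original software applied any scaling is not stated in the source, so treat this as an arithmetic illustration. The closed-form tail probability used here is valid only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)**i / i!
        total += term
    return math.exp(-half) * total

deviance_with, df_with = 8906.4414, 18469
deviance_without, df_without = 8909.6226, 18475

statistic = deviance_without - deviance_with   # change in deviance
extra_params = df_without - df_with            # 6 parameters for the factor
p_value = chi2_sf_even_df(statistic, extra_params)
```

A score this far above 30% falls in the "accept H0" row, so the simpler model without the factor would be indicated. For arbitrary degrees of freedom, a library routine such as `scipy.stats.chi2.sf` could replace the closed form.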
Building the “Best” Model
Modeling is an iterative process
Group: should some of the levels of a given variable be combined?
Build Models: Group Rating Levels
Parameters/standard errors tell importance of varying estimates for each level
[Figure: Age Predicted Values by Policyholder Age, with + SE and - SE bands and exposure]

Name     Value      Standard Error    Standard Error (%)    Weight    Exp(Value)
Lt 17   -0.2872     0.40047             139.4                    3    0.7504
17       0.1597     0.06488              40.6                  162    1.1731
18       0.1838     0.05642              30.7                  211    1.2018
19       0.0915     0.07222              78.9                  106    1.0958
20       0.1506     0.07009              46.6                  111    1.1625
21       0.1254     0.05478              43.7                  195    1.1336
22       0.1364     0.05916              43.4                  156    1.1462
23       0.1038     0.03476              33.5                  587    1.1094
24       0.1022     0.03559              34.8                  539    1.1076
25       0.0979     0.03288              33.6                  602    1.1029
26       0.1207     0.03098              25.7                  700    1.1283
27      -0.0015     0.02947           1,929.7                  795    0.9985
28       0.0221     0.02635             119.0                1,004    1.0224
29       0.0345     0.02611              75.7                  983    1.0351
30      -0.0021     0.02925           1,396.1                  711    0.9979
31-32    0.0291     0.02059              70.8                1,952    1.0295
33-35    0.0079     0.01941             244.6                2,294    1.0080
36-40                                                        2,953
41-45   -0.0103     0.02110             204.5                1,769    0.9897

- Similar parameters or “plateaus” indicate potential groups
- Look for low weights
- Group levels with
  • Base level
  • Neighboring classes
Build Models: Group Rating Levels
Standard errors discussed earlier identify levels that should be grouped with the base class
Standard error of the parameter differences identifies non-base levels that may be grouped

         Lt 17    17       18       19         20       21         22
Lt 17
17        90.4
18        85.6    308.9
19       107.2    132.7     91.2
20        92.7    995.9    255.1      161.6
21        97.8    236.1    127.0      254.7    332.7
22        95.4    362.2    163.9      199.5    620.3      685.0
23       102.6    124.2     76.9      618.2    158.1      273.1    193.0
24       103.1    122.4     76.6      719.3    154.6      259.0    186.9
25       104.2    112.5     71.7    1,182.8    140.8      217.5    165.4
26        98.4    176.5     96.1      258.8    246.0    1,250.8    399.8
27       140.4     42.3     32.4       80.8     48.0       45.9     45.2
28       129.6     48.8     36.4      106.9     56.1       55.3     53.7
29       124.6     53.7     39.5      130.3     62.0       62.9     60.3
30       140.7     42.4     32.5       80.6     48.0       46.1     45.5
31-32    126.6     50.0     36.8      116.4     58.0       57.3     55.5
33-35    135.7     43.0     32.3       86.7     49.3       46.9     46.3
36-40    139.4     40.6     30.7       78.9     46.6       43.7     43.4
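The standard error of a difference between two parameters depends on their covariance as well as on their individual standard errors. The sketch below applies the standard formula to the age 17 and age 18 estimates from the parameter table; the covariance is not given in the source, so the value used is purely hypothetical and the result is not meant to reproduce the matrix entry exactly.

```python
import math

def se_of_difference(se_i, se_j, cov_ij):
    """Standard error of (b_i - b_j) from the two standard errors and their covariance."""
    return math.sqrt(se_i ** 2 + se_j ** 2 - 2.0 * cov_ij)

# Age 17 and age 18 estimates from the parameter table.
b17, se17 = 0.1597, 0.06488
b18, se18 = 0.1838, 0.05642
cov_17_18 = 0.0009   # hypothetical: the covariance is NOT given in the source

se_diff = se_of_difference(se17, se18, cov_17_18)
se_diff_pct = 100.0 * se_diff / abs(b17 - b18)
```

A large percentage means the difference between the two levels is small relative to its uncertainty, which is the signal that the two non-base levels are candidates for grouping.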
Build Models: Group Rating Levels
Explore if “indicated” groupings are consistent over time or random parts of the data
- View of consistency without groupings
- View of consistency with groupings
[Figure: Age by Year (94, 95, 96) by Policyholder Age, with exposure by year]
[Figure: Grouped Age by Policyholder Age for Time (1994), Time (1995), and Time (1996), with exposure by year]
[Figure: Age, Grouped vs. Ungrouped, by Policyholder Age, with exposure]
Build Models: Group Rating Levels
Goodness of fit tests (e.g., Chi-Squared) can be used to determine the explanatory power of a variable

Model                 With          Without
Deviance              8,906.4414    8,909.6226
Degrees of Freedom    18,469        18,475
Scale Parameter       0.4822        0.4823
Chi Square Test       78.6%

Chi-Squared
- Null hypothesis is that the models with and without the grouping are the same

Score     H0        Indicated Model
<5%       Reject    More Complex: Without Grouping
5%-30%    ???       ???
>30%      Accept    Simpler: With Grouping
Building the “Best” Model
Modeling is an iterative process
Variate: can the signal for a given variable be represented well by a curve?
Build Models: Add Variates
Curves can be applied to continuous variables, but not categorical variables
- Continuous variables have a numerical relationship between the different levels

                    Categorical         Continuous
Homeowners          Type of HO Alarm    Amount of Insurance
Auto                Vehicle Usage       Age of Driver
Commercial Lines    Occupation          Income
Retention           Gender              Premium change
Geography           Territory           Latitude/longitude
Build Models: Add Variates
View parameters and standard errors for sensibility of variate
[Figure: Cost of Car, parameters by vehicle group 1 to 22, with exposure]
- Standard error of parameter differences can identify smooth progression of parameters
- Variates can be very helpful at smoothing out non-sensible results

Standard error of parameter differences; rows are Vehicle Groups (1) to (15), columns are Vehicle Groups (1) to (10)
Group    (1)      (2)     (3)      (4)      (5)        (6)      (7)     (8)      (9)     (10)
(1)
(2)       52.9
(3)       74.8    88.5
(4)       93.6    59.0    133.8
(5)      123.8    22.4     21.0    20.6
(6)       86.9    19.8     17.5    16.5      123.1
(7)      129.3    22.4     20.8    20.0    1,051.2    105.6
(8)       61.8    16.5     13.0    10.9       46.2     76.9    41.1
(9)       56.6    16.0     12.8    10.9       39.9     59.0    35.9    170.1
(10)      42.4    14.7     12.2    11.1       27.6     33.6    25.8     43.3    55.5
(11)      34.3    13.2     11.0    10.0       21.0     23.9    19.9     26.9    31.1    76.6
(12)      20.1     9.4      7.5     6.7       10.7     11.2    10.2     10.8    11.6    16.7
(13)      23.0     9.9      7.5     6.5       11.4     12.0    10.8     11.3    12.5    20.3
(14)      15.9     7.7      5.7     4.8        7.5      7.5     7.0      6.7     7.2    10.2
(15)      24.3    10.0      7.3     5.9       11.3     11.8    10.5     10.4    11.7    21.2
Build Models: Add Variates
Check consistency of curve over time or random parts of the dataset
- After choosing the curve that fits the data
- Check to see the consistency of that curve fit to different parts of the data
[Figure: Vehicle Group, 3rd Degree Curve vs. Unsimplified, with exposure]
[Figure: Vehicle Group curve for Time (1994), Time (1995), and Time (1996), with exposure by year]
Build Models: Add Variates
Goodness of fit tests (e.g., Chi-Squared) can be used to determine the appropriateness of a variate

Model                 No Curve      Curve
Deviance              8,906.4460    9,020.2270
Degrees of Freedom    18,469        18,487
Scale Parameter       0.4822        0.4879
Chi Square Test       0.0%

Chi-Squared
- Null hypothesis is that the models with and without the variate are the same

Score     H0        Indicated Model
<5%       Reject    More Complex: No Curve
5%-30%    ???       ???
>30%      Accept    Simpler: With Curve
Build Models: Add Variates
Variates tend not to perform as well with regard to Type III testing
If variates are not fitting the data well, the modeler can increase the responsiveness
- Increase the power of the variate
- Create multiple variates
- Use combination of groupings and variates
- Fit splines
[Figure: Rescaled Predicted Values - Policyholder Age (16 to 70+): 3rd degree variate does not fit well]
[Figure: Rescaled Predicted Values - Policyholder Age (16 to 70+): using two variates improves fit, but still some serious issues]
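As a rough illustration of the first two options, the sketch below builds the design-matrix columns for a polynomial variate of a chosen power and for two linear variates that meet at a break point. The function names, the centering and scaling, and the break point at age 25 are assumptions made for the example, not choices taken from the slides.

```python
def polynomial_variate(x, degree, center=0.0, scale=1.0):
    """Columns [z, z**2, ..., z**degree] for z = (x - center) / scale.

    Raising `degree` is one way to "increase the power of the variate".
    """
    z = (x - center) / scale
    return [z ** power for power in range(1, degree + 1)]

def split_variates(x, knot):
    """Two linear variates meeting at `knot`: the first moves only below it, the second only above."""
    return [min(x, knot) - knot, max(x, knot) - knot]

def linear_predictor(columns, coefficients, intercept=0.0):
    """Intercept plus the sum of column times coefficient."""
    return intercept + sum(c * b for c, b in zip(columns, coefficients))

# Hypothetical usage: a 3rd degree variate and a two-variate alternative for age 18
cubic_columns = polynomial_variate(18, degree=3, center=16, scale=4)
two_variate_columns = split_variates(18, knot=25)
```

Because each of the two variates is flat on one side of the break point, the fitted slope can differ between younger and older ages, which is the extra responsiveness a single low-order polynomial lacks.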
Building the “Best” Model
Modeling is an iterative process
Interaction: does the effect of one variable vary by level of another variable?
Simplify
- Exclude
- Group
- Variate
Complicate
- Include
- Interaction
Review Model
Build Models: Include Interactions
Relationship between levels of one variable may vary by different levels of another variable (e.g., response correlation)
Simple Model: Age + Gender
Relationship between males and females is constant at each age.
Full Interaction Model: Age + Gender + Age.Gender
Relationship between males and females changes at different ages.
[Figure: two panels of Predicted Values by Policyholder Age (16 to 70+), Policyholder Sex (Male) vs. Policyholder Sex (Female)]
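A small numeric sketch of the difference between the two structures, assuming a log link so that parameters add on the log scale and relativities multiply. All parameter values below are hypothetical and chosen only to make the pattern visible; Male is taken as the base level.

```python
import math

# Hypothetical log-scale parameters, for illustration only (not from the slides)
age_effect = {"17": 0.60, "25": 0.20, "40": 0.00}
female_effect = -0.10                                        # Gender term
age_female_effect = {"17": -0.50, "25": -0.10, "40": 0.00}   # Age.Gender terms

def predicted(base, age, sex, with_interaction):
    """Predicted value under a log link: exp(sum of the applicable parameters)."""
    eta = math.log(base) + age_effect[age]
    if sex == "F":
        eta += female_effect
        if with_interaction:
            eta += age_female_effect[age]
    return math.exp(eta)

def female_to_male_ratio(age, with_interaction, base=0.1):
    """Female prediction divided by male prediction at one age."""
    return (predicted(base, age, "F", with_interaction)
            / predicted(base, age, "M", with_interaction))

simple_ratios = [female_to_male_ratio(a, False) for a in age_effect]
interaction_ratios = [female_to_male_ratio(a, True) for a in age_effect]
```

In the simple model every ratio equals exp(female_effect); with the Age.Gender terms included, the ratio is free to differ at each age.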
Build Models: Identify Potential Interactions
Patterns of actual results will highlight potential interactions
[Figure: Actual Frequencies (Gender x Vehicle Age); x-axis Vehicle Age (0 to 15+); series Female, Male, Female Exp, Male Exp]
- Actual frequencies support that the relationship between male and female is basically constant for each vehicle age
[Figure: Actual Frequencies: Age by Gender; x-axis Policyholder Age (16 to 70+); series Female, Male, Female Exp, Male Exp]
- Actual frequencies show that the relationship between male and female is very different for youthfuls and adults
Build Models: Include Interactions
View parameters and standard errors
- Graphically
- In tabular format
[Figure: Predicted Values: Female by Age; x-axis Policyholder Age (16 to 70+); series Female, + SE, - SE, Exposure]
[Figure: Predicted Values: Male by Age; x-axis Policyholder Age (16 to 70+); series Male, + SE, - SE, Exposure]

Interaction Term    Value      Standard Error    Standard Error (%)    Weight
Female.16           -1.0235    0.78776           77.0                  13,761
Female.17           -0.6174    0.24463           39.6                  185,915
Female.18           -0.3981    0.11267           28.3                  739,500
Female.19           -0.3382    0.07265           21.5                  2,362,139
Female.20           -0.2112    0.06333           30.0                  4,081,775
Female.21           -0.1384    0.05947           43.0                  5,163,074
Female.22           -0.1467    0.05704           38.9                  6,055,119
Female.23           -0.0782    0.05703           73.0                  6,763,300
Female.24           -0.1536    0.05706           37.1                  6,300,270
Female.25           -0.0972    0.05906           60.7                  4,927,417
Female.26           -0.0431    0.06031           139.9                 4,269,244
Female.27           0.0544     0.06364           116.9                 3,672,472
Female.28           -0.0727    0.06477           89.1                  3,438,810
Female.29           -0.0483    0.06761           140.0                 2,970,306
Female.30           -0.0254    0.06693           263.3                 3,027,278
Female.31           -0.0318    0.06849           215.1                 2,724,535
Female.32           0.0033     0.07270           2,175.0               2,329,283
Female.33-35        -0.1597    0.07709           48.3                  1,967,739
Female.36-39        -0.0376    0.07947           211.3                 1,670,130
Female.40-44        0.0467     0.05185           111.1                 6,166,191
Female.45-49        0.0297     0.05174           174.3                 6,877,522
Female.50-54        0.0325     0.05973           183.8                 3,957,251
Female.55-59        -0.0264    0.07412           281.0                 1,998,839
Female.60-64        0.0228     0.09824           431.3                 959,502
Female.65-69        -0.0168    0.13252           787.8                 528,632
Female.70+          0.1593     0.12038           75.6                  602,694
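The Standard Error (%) column in the interaction table appears to be the standard error divided by the absolute parameter value, and the sketch below reproduces it for a few rows. Two things are assumptions for illustration: the conversion of each parameter to a multiplicative relativity via exp(value), which presumes a log link, and the 50% cut-off used to flag a term as well determined. The function names are hypothetical.

```python
import math

# (term, value, standard error) copied from rows of the interaction table
interaction_terms = [
    ("Female.16", -1.0235, 0.78776),
    ("Female.17", -0.6174, 0.24463),
    ("Female.18", -0.3981, 0.11267),
    ("Female.19", -0.3382, 0.07265),
    ("Female.26", -0.0431, 0.06031),
]

def standard_error_pct(value, std_error):
    """Standard error as a percentage of the absolute parameter value."""
    return abs(std_error / value) * 100.0

def summarize(terms, max_se_pct=50.0):
    """One summary row per term; max_se_pct is an illustrative threshold only."""
    rows = []
    for name, value, std_error in terms:
        pct = standard_error_pct(value, std_error)
        rows.append({
            "term": name,
            "relativity": math.exp(value),      # assumes a log link
            "se_pct": round(pct, 1),
            "well_determined": pct <= max_se_pct,
        })
    return rows

summary = summarize(interaction_terms)
```

Terms such as Female.17 to Female.19 come out with a small standard error relative to the parameter, while Female.26 does not, which matches the pattern in the table: the interaction is clearest at the youthful ages.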
Build Models: Include Interactions
Explore if “indicated” interaction is consistent over time or random parts of the data
[Figure: three panels, Age by Gender (Most Recent Year), Age by Gender, and Age by Gender (Oldest Year); x-axis Policyholder Age (16 to 70+); series Policyholder Sex (Female), Policyholder Sex (Male), Policyholder Sex (Female) Exposure, Policyholder Sex (Male) Exposure]
Build Models: Include Interactions
Goodness of fit tests (e.g., Chi-Squared) can be used to determine the explanatory power of an interaction
Chi-Squared
- Null hypothesis is that the models with and without the interaction are the same

Model                 Simple Model    W/ Interaction
Deviance              224,667.0000    224,771.0000
Degrees of Freedom    83              109
Scale Parameter       1.1615          1.1655
Chi Square Test       0.0%

Score      H0        Indicated Model
<5%        Reject    More Complex: With Interaction
>30%       Accept    Simpler: Without Interaction
5%-30%     ???       ???
Build Models: Simplify Interactions
Complex relationships can be simplified using curves, groups, etc.
- Simplify the age curve (i.e., male curve since male is base level)
- Simplify the relationship between males and females
[Figure: four panels by Policyholder Age (16 to 70+): Rescaled Predicted Values [LossYear (1996)]; Age and Gender (Male at Base); Age and Gender Predicted Values; Predicted Values. Annotations: Curve; Ages Grouped; Males same as Females; Male/Female Relativity same for 21-24; Male/Female Relativity varies by Age]
Building the “Best” Model
Modeling is an iterative process
Once models have been built, it is essential to validate them
Simplify
- Exclude
- Group
- Variate
Complicate
- Include
- Interaction
Review Model
Validate Model: Residual Analysis
Re-check residuals to ensure appropriate shape
- Is the contour plot symmetric?
- Does the Box-Whisker show symmetry across levels?
- Are fitted results reasonable?
[Figure: Gamma Error/Log Link (Studentized Standardized Deviance Residuals) vs. Transformed Fitted Value]
[Figure: Studentized Standardized Deviance Residuals by Policyholder Age]
Validate Model: Gains Curves
Compare predictiveness of models
[Figure: Gains Curve; x-axis Cumulative Exposure, y-axis Cumulative Fitted Value; series Reference, Model 1, Model 2]
Model 1 Gini coefficient = 0.1607
Model 2 Gini coefficient = 0.1226
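One common way to turn a gains curve into a Gini coefficient is sketched below. It is an assumption that this matches the calculation behind the figures quoted above: records are sorted from highest to lowest predicted rate, the cumulative share of a chosen value (fitted or actual) is plotted against the cumulative share of exposure, and the coefficient is twice the area between that curve and the diagonal reference line. The function name and sample data are hypothetical.

```python
def gini_from_gains(exposure, predicted_rate, value):
    """Gini coefficient of a gains curve built from parallel lists.

    exposure:        exposure per record
    predicted_rate:  model prediction used only for ordering (highest first)
    value:           quantity accumulated on the y-axis (e.g., fitted or actual claims)
    """
    order = sorted(range(len(exposure)), key=lambda i: predicted_rate[i], reverse=True)
    total_exposure = sum(exposure)
    total_value = sum(value)
    area = 0.0
    cum_exposure = cum_value = 0.0
    for i in order:
        next_exposure = cum_exposure + exposure[i] / total_exposure
        next_value = cum_value + value[i] / total_value
        # Trapezoid between consecutive points of the gains curve
        area += (next_exposure - cum_exposure) * (cum_value + next_value) / 2.0
        cum_exposure, cum_value = next_exposure, next_value
    return 2.0 * (area - 0.5)

# Made-up data: four records of equal exposure
exposure = [1.0, 1.0, 1.0, 1.0]
rates = [0.4, 0.3, 0.2, 0.1]
discriminating = gini_from_gains(exposure, rates, [4.0, 3.0, 2.0, 1.0])
flat = gini_from_gains(exposure, rates, [1.0, 1.0, 1.0, 1.0])
```

A model that cannot separate high from low risks traces the reference line and scores zero; the more the curve bows above the line, the larger the coefficient, which is how Model 1 and Model 2 are being compared.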
Validate Model: Hold-out Samples
Hold-out samples are effective at validating a model
- Determine estimates based on part of dataset
- Use estimates to predict other part of dataset
[Diagram: Data, Split Data (Test/Training); Train Data, Build Models; Test Data, Compare Predictions to Actuals]
Predictions should be close to actuals for populated cells
Larger companies may consider 3 splits
1. Build models
2. Fit parameters
3. Validate models/parameters
Smaller companies may consider a sampling approach
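A minimal sketch of the hold-out idea, using only the Python standard library. The record layout (dictionaries with "exposure" and "claims" keys), the 30% test fraction, the fixed seed, and the function names are all assumptions for illustration; `predict` stands for whatever model was built on the training part.

```python
import random

def split_holdout(records, test_fraction=0.3, seed=2006):
    """Randomly split records into (train, test); the seed makes the split repeatable."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def actual_vs_predicted(test_records, predict, cell_key):
    """Total exposure, actual claims, and predicted claims for each cell of the test data."""
    cells = {}
    for rec in test_records:
        cell = cells.setdefault(
            cell_key(rec), {"exposure": 0.0, "actual": 0.0, "predicted": 0.0}
        )
        cell["exposure"] += rec["exposure"]
        cell["actual"] += rec["claims"]
        cell["predicted"] += predict(rec) * rec["exposure"]
    return cells

# Made-up records: claims are exactly 10% of exposure in every age band
records = [
    {"id": i, "age_band": "young" if i % 2 else "adult", "exposure": 2.0, "claims": 0.2}
    for i in range(10)
]
train, test = split_holdout(records)
comparison = actual_vs_predicted(test, lambda rec: 0.1, lambda rec: rec["age_band"])
```

Carrying the exposure per cell alongside the comparison makes it easy to restrict attention to populated cells, where predictions should be close to actuals.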
Combine Predictive Models
[Diagram: CW Historical Data feeds Frequency Models by Coverage/COL (Claim Counts, Exposures, Characteristics) and Severity Models by Coverage/COL (Loss $, Claim Counts, Characteristics); these produce Modeled Pure Premiums by Coverage/COL and CW Predictive Models]
Once signal determined, can implement business restrictions
- Split variables into rating and underwriting
- Incorporate parameter restrictions (e.g., cap relativities)
- Incorporate structural restrictions (e.g., convert to mixed additive/multiplicative structure)
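The combination step can be sketched as follows, assuming the modeled pure premium for each coverage or cause of loss is its predicted frequency multiplied by its predicted severity, and that the total is the sum across coverages. The coverage names, the numbers, and the capping helper are hypothetical illustrations; the cap shows one simple form a parameter restriction could take.

```python
def pure_premium(frequency, severity):
    """Modeled pure premium for one coverage/cause of loss: claims per exposure times cost per claim."""
    return frequency * severity

def total_pure_premium(predictions_by_coverage):
    """Sum of frequency times severity across coverages/causes of loss."""
    return sum(
        pure_premium(p["frequency"], p["severity"])
        for p in predictions_by_coverage.values()
    )

def cap_relativity(relativity, lower, upper):
    """Illustrative parameter restriction: keep a relativity within [lower, upper]."""
    return max(lower, min(upper, relativity))

# Made-up predictions for one risk under two hypothetical coverages
risk = {
    "coverage_a": {"frequency": 0.05, "severity": 4000.0},
    "coverage_b": {"frequency": 0.10, "severity": 1500.0},
}
combined = total_pure_premium(risk)
capped = cap_relativity(3.2, lower=0.5, upper=2.5)
```

Keeping the restriction as a separate step after the models are combined reflects the order on the slide: find the signal first, then apply business constraints.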
Summary
GLMs can be a powerful modeling tool with significant advantages over traditional techniques
Regardless of what is being modeled, the goal is to remove the “noise” and find the “signal” in the data
When modeling risk, it is ideal to
- Model frequency and severity separately
- Model by coverage or cause of loss
- Use all available data and worry about constraints later
Modeling is a multi-step iterative process requiring the modeler to use statistical and practical tests and apply judgment
[Diagram: modeling cycle]
1. Get Clean Data
2. Make Initial Selections
3. Test Error Structure & Link Function
4. Prelim Invest
5. Build Model Structure
6. Validate Models
7. Combine Models
GLM III will cover:
Testing the link function
The Tweedie distribution
Splines-theory and practice
Reference models
Aliasing/near-aliasing
Combining models across claim types
Restricted models
Model validation
Thanks for coming. If you would like a copy of these slides:
Give me your name/email after the session
Call me at 210.826.2878
Email me at [email protected]