CAS Predictive Modeling Seminar Evaluating Predictive Models


Glenn Meyers, ISO Innovative Analytics

October 5, 2006


Choosing Models

• Predicting losses for individual insurance policies involves:
  – Millions of policy records
  – Hundreds (or thousands) of variables

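• There are a number of models that provide good predictions:
  – Generalized linear models (GLM), generalized additive models (GAM), classification and regression trees (CART), multivariate adaptive regression splines (MARS), neural nets, etc.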

• Business objectives influence choice of model


The Modeling Process

• The modeling process involves dimension reduction techniques:
  – Clustering, principal components, factor analysis
  – Building submodels and using predicted values as input into a higher-level model

• The modeling cycle:
  1. Build model with training data
  2. Evaluate model with test data
  3. Identify improvements in models and data
  4. Go back to Step 1
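The cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the talk's method: the synthetic records, the three territories and the credibility constant `k` (the thing being "improved" on each pass) are all hypothetical, and mean squared error stands in for whatever evaluation measure is actually used.

```python
import random
from statistics import mean

TERRITORIES = "ABC"

def make_records(n, seed):
    """Hypothetical (territory, loss) records with a synthetic territory effect."""
    rng = random.Random(seed)
    true_mean = {"A": 80.0, "B": 100.0, "C": 130.0}
    records = []
    for _ in range(n):
        t = rng.choice(TERRITORIES)
        records.append((t, rng.expovariate(1 / true_mean[t])))
    return records

def fit(train, k):
    """Step 1: build a model (credibility-weighted territory means) from training data."""
    overall = mean(loss for _, loss in train)
    model = {}
    for t in TERRITORIES:
        losses = [loss for terr, loss in train if terr == t]
        z = len(losses) / (len(losses) + k)  # credibility weight; k is a hypothetical knob
        model[t] = z * mean(losses) + (1 - z) * overall
    return model

def evaluate(model, test):
    """Step 2: score the model on test data (mean squared error; lower is better)."""
    return mean((loss - model[t]) ** 2 for t, loss in test)

train, test = make_records(2000, seed=1), make_records(2000, seed=2)

# Steps 3 and 4: try a change, go back to step 1, keep whichever version tests best.
scores = {k: evaluate(fit(train, k), test) for k in (0, 10, 100, 1000)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

Note that every pass through this loop consults the test data when deciding which version to keep.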


Hidden Parameters

• Classic model building methods correct for the number of parameters using “degrees of freedom.”

• The model exploration process “eats up degrees of freedom” in ways that cannot be captured by formal model adjustments.

• In essence, the “test” data gets merged into the “training” data.
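A small simulation makes the point concrete. It is a hypothetical sketch, not from the talk: features and outcomes are independent noise, so no candidate model has any real skill, yet picking the best of many candidates on the test data makes the winner look predictive. Scoring the same winner on a fresh holdout sample exposes the selection effect. Pearson correlation is used here only as a convenient stand-in score.

```python
import random
from statistics import mean

def corr(xs, ys):
    """Pearson correlation between predictions and outcomes."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2006)
N_RECORDS, N_FEATURES, N_CANDIDATES = 500, 20, 100

def make_sample():
    # Synthetic records: features and outcomes are independent, so there is no real signal.
    X = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_RECORDS)]
    y = [rng.gauss(0, 1) for _ in range(N_RECORDS)]
    return X, y

def predict(w, X):
    """Linear score for each record under weight vector w."""
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]

test_X, test_y = make_sample()
hold_X, hold_y = make_sample()

# "Model exploration": try many candidate weightings, keep whichever scores best on test data.
candidates = [[rng.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_CANDIDATES)]
best = max(candidates, key=lambda w: corr(predict(w, test_X), test_y))

print("test corr:   ", round(corr(predict(best, test_X), test_y), 3))  # inflated by selection
print("holdout corr:", round(corr(predict(best, hold_X), hold_y), 3))  # typically near zero
```

The selection step is the "hidden parameter": it never appears in a degrees-of-freedom count, but it has been fitted to the test data.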


What Is Significant?

• Statistical packages will often identify improvements that are “statistically significant” but not “practically significant.”

• This talk is about determining when a model identifies “practically significant” improvements.

• Illustrate how to do this on a real example.
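One way to see the gap between the two kinds of significance is a two-sample z test on mean loss cost. The numbers below are hypothetical, not from the talk: two estimates that differ by only 0.5%, with an assumed per-record standard deviation of 300. With millions of records the difference is "statistically significant"; with ten thousand it is not, although its practical size is identical.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n):
    """z statistic and two-sided p-value for a difference in means (equal sd and n per group)."""
    se = sd * math.sqrt(2 / n)
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z, p

# Hypothetical inputs: a 0.5% difference in mean loss cost.
z, p = two_sample_z(mean_a=100.5, mean_b=100.0, sd=300.0, n=5_000_000)
z_small, p_small = two_sample_z(mean_a=100.5, mean_b=100.0, sd=300.0, n=10_000)
print(f"n = 5,000,000: z = {z:.2f}, p = {p:.4f}")
print(f"n = 10,000:    z = {z_small:.2f}, p = {p_small:.4f}")
```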


The Example: A Personal Auto Model Under Development

Preliminary Results
• Input: address of insured vehicle
• Output: address-specific loss cost for a fixed risk profile:
  – 30-year-old driver, single car, no SDIP points
  – 500 deductible or 25/50/25 policy limits
  – Symbol 8, model year 2006
  – etc.

• Model derived from over 1,200 variables reflecting weather, traffic, demographic, topographical and economic conditions.


Difference Between Address Specific and ISO Territory Loss Cost


Differences Abound: Some Questions to Ask

• Can the model output be used to improve insurer underwriting results?

• Are the results statistically significant?

Define ELI:

$$\text{Expected Loss Index (ELI)} = \frac{\text{Address Specific Loss Cost}}{\text{ISO Territory Loss Cost}}$$
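The definition translates directly into code. The loss costs below are hypothetical; an ELI above 1 means the address is expected to cost more than the ISO territory loss cost, and an ELI below 1 means it is expected to cost less.

```python
def expected_loss_index(address_loss_cost, iso_territory_loss_cost):
    """ELI = address-specific loss cost / ISO territory loss cost."""
    if iso_territory_loss_cost <= 0:
        raise ValueError("ISO territory loss cost must be positive")
    return address_loss_cost / iso_territory_loss_cost

# Hypothetical loss costs for three addresses in the same ISO territory.
territory_loss_cost = 400.0
for address_loss_cost in (300.0, 400.0, 560.0):
    print(address_loss_cost, expected_loss_index(address_loss_cost, territory_loss_cost))
```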


Use Expected Loss Index for Risk Selection

Expected Loss Index        Loss Ratio (%)
Less than 75%              69.7
Between 75% and 100%       85.8
Between 100% and 125%      109.7
Greater than 125%          159.5

Loss ratio denominator = full ISO loss cost
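A table like this one can be produced by bucketing policy records on ELI and dividing summed losses by summed full ISO loss cost within each bucket. The sketch below assumes each record is an (ELI, full ISO loss cost, actual loss) tuple; which side of each band edge is inclusive is an assumption, and the four-record portfolio is made up for illustration.

```python
from collections import defaultdict

# Band edges follow the table; treating each upper edge as exclusive is an assumption.
BANDS = [(0.75, "Less than 75%"), (1.00, "Between 75% and 100%"),
         (1.25, "Between 100% and 125%"), (float("inf"), "Greater than 125%")]

def band_of(eli):
    """Label of the first band whose upper edge exceeds eli."""
    for upper, label in BANDS:
        if eli < upper:
            return label

def loss_ratio_by_band(records):
    """records: iterable of (eli, iso_loss_cost, actual_loss). Returns loss ratio (%) per band."""
    loss = defaultdict(float)
    cost = defaultdict(float)
    for eli, iso_loss_cost, actual_loss in records:
        band = band_of(eli)
        loss[band] += actual_loss
        cost[band] += iso_loss_cost  # denominator: full ISO loss cost
    return {label: 100 * loss[label] / cost[label] for _, label in BANDS if cost[label] > 0}

# Tiny hypothetical portfolio, for illustration only.
records = [(0.6, 500.0, 300.0), (0.9, 400.0, 380.0), (1.1, 450.0, 500.0), (1.4, 300.0, 480.0)]
print(loss_ratio_by_band(records))
```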


Propose a Standard Way of Evaluating Lift – The Gini Index

• Originally proposed by Corrado Gini in 1912

• Most often used to measure income and/or wealth inequality
  – Search for “Gini” in wikipedia.org

• In insurance underwriting, we want to evaluate systematic methods of finding “loss” inequality.


Gini Index

• Look at the set of policy records below the cutoff point, ELI < 1.

• This set of records accounts for 59% of total ISO (full) loss cost.

• This set of records accounts for 48% of total loss.

• 1 − 48/59 → 19% reduction in loss ratio.
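The cutoff calculation can be sketched as follows, again assuming (ELI, full ISO loss cost, actual loss) records. Because the ratio of loss share to cost share equals the loss ratio of the selected records divided by the overall loss ratio, one minus that ratio is the relative loss-ratio reduction. The two-record portfolio is hypothetical and built only to reproduce the slide's 59% and 48% shares.

```python
def cutoff_effect(records, cutoff=1.0):
    """For records with ELI < cutoff, return (share of full ISO loss cost, share of actual
    loss, relative loss-ratio reduction 1 - loss_share / cost_share)."""
    total_cost = sum(cost for _, cost, _ in records)
    total_loss = sum(loss for _, _, loss in records)
    below = [(cost, loss) for eli, cost, loss in records if eli < cutoff]
    cost_share = sum(cost for cost, _ in below) / total_cost
    loss_share = sum(loss for _, loss in below) / total_loss
    return cost_share, loss_share, 1 - loss_share / cost_share

# Hypothetical records chosen so the shares match the slide (59% of cost, 48% of loss).
toy = [(0.8, 59.0, 48.0), (1.2, 41.0, 52.0)]
cost_share, loss_share, reduction = cutoff_effect(toy)
print(f"{cost_share:.0%} of cost, {loss_share:.0%} of loss, {reduction:.0%} reduction")
```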


Gini Index

• Do this calculation for other cutoff points.

• The results make up what we call the Lorenz curve.
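Repeating the cutoff calculation at every possible cutoff amounts to sorting records by ascending ELI and accumulating both shares. This sketch assumes (ELI, full ISO loss cost, actual loss) tuples and treats each record as its own cutoff point; how the talk handled tied ELI values is not stated.

```python
def lorenz_curve(records):
    """Points (cumulative share of ISO loss cost, cumulative share of actual loss),
    with records ordered by ascending ELI; each point is one cutoff."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    points = [(0.0, 0.0)]
    cum_cost = cum_loss = 0.0
    for _, cost, loss in ordered:
        cum_cost += cost
        cum_loss += loss
        points.append((cum_cost / total_cost, cum_loss / total_loss))
    return points

# Hypothetical two-record portfolio; the interior point matches the slide's (59%, 48%).
points = lorenz_curve([(1.2, 41.0, 52.0), (0.8, 59.0, 48.0)])
print(points)
```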


Gini Index

• If ELI is random, the Lorenz curve will be on the diagonal line.

• The Gini index is the percentage of the area under the “random” line that is above the Lorenz curve.

• A higher Gini index means a better predictive model.
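Since the area under the diagonal is 1/2, the definition above reduces to Gini = (1/2 − A) / (1/2) = 1 − 2A, where A is the area under the Lorenz curve. A minimal sketch, assuming the curve is given as a list of (x, y) points from (0, 0) to (1, 1) and using the trapezoid rule for A:

```python
def gini_from_lorenz(points):
    """Gini index = area between the diagonal and the Lorenz curve divided by the
    area under the diagonal (1/2), i.e. 1 - 2 * (area under the Lorenz curve)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid rule
    return 1 - 2 * area

print(gini_from_lorenz([(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]))    # random ELI: on the diagonal
print(gini_from_lorenz([(0.0, 0.0), (0.59, 0.48), (1.0, 1.0)]))  # single cutoff from the slide
```

With only the one cutoff point from the earlier slide the result is about 0.11; a real Lorenz curve has many more points.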


A Gini Index Thought Experiment

• If we had the ability to predict who will have losses, what would the Gini index be?

• It would approach 100% if only one risk had all the losses.
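The thought experiment can be checked numerically. Here a hypothetical "perfect" model uses each risk's actual loss as its index, and every risk carries the same ISO loss cost. With one risk out of n holding all losses, the Gini index is 1 − 1/n, which approaches 100% as n grows; spreading losses over 10% of risks gives 90%.

```python
def gini(records):
    """records: (predicted_index, iso_loss_cost, actual_loss).
    Gini = 1 - 2 * area under the Lorenz curve (records sorted by ascending index)."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    area = y = 0.0
    for _, cost, loss in ordered:
        dx, dy = cost / total_cost, loss / total_loss
        area += dx * (2 * y + dy) / 2  # trapezoid between consecutive cutoffs
        y += dy
    return 1 - 2 * area

def perfect_foresight(losses):
    # Hypothetical perfect model: index = actual loss, equal ISO loss cost of 1 per risk.
    return [(loss, 1.0, loss) for loss in losses]

n = 1000
one_claim = [0.0] * (n - 1) + [5000.0]    # one risk has all the losses
ten_percent = [0.0] * 900 + [50.0] * 100  # 10% of risks share the losses equally
print(gini(perfect_foresight(one_claim)))
print(gini(perfect_foresight(ten_percent)))
```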


Bodily Injury


Property Damage


Collision


Statistical Significance

• How much random fluctuation is in the Gini index calculation?

• Use bootstrapping to evaluate:
  – Take a random sample of records, with replacement.
  – Calculate the Gini index for the sample.
  – Repeat 250 times.

• Plot a histogram of the results.
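The bootstrap steps can be sketched as below. Assumptions not stated in the talk: each resample has the same size as the original data, records are (ELI, full ISO loss cost, actual loss) tuples, and the portfolio is synthetic, with a 30% claim frequency and expected loss proportional to ELI. A text histogram stands in for the plot.

```python
import random
from collections import Counter
from statistics import mean, stdev

def gini(records):
    """records: (eli, iso_loss_cost, actual_loss). Gini = 1 - 2 * area under the Lorenz curve."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    area = y = 0.0
    for _, cost, loss in ordered:
        dx, dy = cost / total_cost, loss / total_loss
        area += dx * (2 * y + dy) / 2  # trapezoid between consecutive cutoffs
        y += dy
    return 1 - 2 * area

def bootstrap_gini(records, n_boot=250, seed=0):
    """Resample the records with replacement n_boot times; return each sample's Gini index."""
    rng = random.Random(seed)
    return [gini(rng.choices(records, k=len(records))) for _ in range(n_boot)]

# Hypothetical portfolio: equal ISO loss cost, 30% claim frequency, mean loss proportional to ELI.
rng = random.Random(42)
records = []
for _ in range(2000):
    eli = rng.uniform(0.5, 1.5)
    loss = rng.expovariate(1 / (100.0 * eli)) if rng.random() < 0.3 else 0.0
    records.append((eli, 100.0, loss))

ginis = bootstrap_gini(records)
print(f"mean {mean(ginis):.3f}, std dev {stdev(ginis):.3f}")

# Text histogram in place of a plot (bins of width 0.01).
counts = Counter(round(g, 2) for g in ginis)
for b in sorted(counts):
    print(f"{b:5.2f} {'#' * counts[b]}")
```

The spread of the histogram is the random fluctuation in the Gini index; a model whose Gini barely exceeds a competitor's, relative to that spread, has not shown a reliable improvement.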


Bootstrap Results


Summary

• Standard tests of statistical significance are suspect.

  – Informal model selection process
  – Statistical vs. practical significance

• Propose Gini index as a test of practical significance.

• Divide data into three samples:
  1. Training – used to fit models
  2. Test – used to evaluate fits
  3. Holdout – “final” evaluation
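A random three-way partition is straightforward. The 60/20/20 proportions and the simple unstratified shuffle in this sketch are assumptions, not recommendations from the talk; the holdout sample is meant to be scored once, after model selection on the test sample is finished.

```python
import random

def three_way_split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition records into training, test and holdout samples.
    The default 60/20/20 proportions are an assumption."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_test = int(fractions[1] * len(shuffled))
    train = shuffled[:n_train]                   # used to fit models
    test = shuffled[n_train:n_train + n_test]    # used to evaluate fits
    holdout = shuffled[n_train + n_test:]        # reserved for the "final" evaluation
    return train, test, holdout

train, test, holdout = three_way_split(range(10_000))
print(len(train), len(test), len(holdout))
```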
