Cutting Edge Tools for Pricing and Underwriting Seminar · 2013-04-02 · 4/2/2013 3 Background...
Cutting Edge Tools for Pricing and Underwriting Seminar
Needles in Haystacks – Reduction Techniques to Find Information in Data
Casualty Actuarial Society
by Serhat Guven
Fall 2011
© 2011 Towers Watson. All rights reserved.
Agenda
Background
Main Effects
Interactions
Residual Analysis
Other Alternatives
towerswatson.com © 2011 Towers Watson. All rights reserved. Proprietary and Confidential. For Towers Watson and Towers Watson client use only.
Background
Goals of predictive modeling
Produce a sensible model that explains recent historical experience and is likely to be predictive of future experience
1. Separate the random components from the systematic components of the estimator
Response Variable = Systematic Component + Unsystematic Component
Signal: Function of the Rating Factors/Predictors
Noise: Reflects stochastic process
2. Balance predictive power and explanatory effects
[Figure: Model Complexity (number of parameters), running from Overall mean through Best Models to One parameter per observation]
Underfit: Predictive, Poor explanatory power
Overfit: Poor predictive power, Explains history
Background
Goals of predictive modeling
Predict a response variable using a series of explanatory variables
Explanatory variables: Age, Accidents, Limit, Convictions, Region, Credit score
Predictive Model
Response variables: Losses, Claims, Retention
Parameters
Validation Statistics
Background
Larger data storage capabilities and greater access to external data enhance the ability to identify the right rate for the risk
Many external factors have been found to be predictive of frequency and/or loss severity. Here are a few for auto…
Major Convictions
Territory
Traffic Density
Garaged
Age
Minor Convictions
Gender
Method of Payment
Background
Component vs. Combined Modeling
Raw loss ratio modeling
Raw pure premium modeling
Standard risk collective
COMPONENT MODELS
Frequency: Poisson/Negative Binomial
Severity: Gamma
COMBINE
Frequency x Severity
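A minimal sketch of the combine step, assuming both component models are multiplicative (base level times one relativity per rating factor). The base levels, factor names and relativities below are hypothetical and only illustrate how Frequency x Severity yields a pure premium.

```python
# Hypothetical multiplicative component models: base level times relativities.
FREQ_BASE = 0.08     # assumed claims per exposure for the base risk
SEV_BASE = 2500.0    # assumed average claim size for the base risk

FREQ_REL = {"age": {"youthful": 1.60, "adult": 1.00},
            "territory": {"urban": 1.25, "rural": 0.90}}
SEV_REL = {"age": {"youthful": 1.10, "adult": 1.00},
           "territory": {"urban": 0.95, "rural": 1.05}}


def predict(base, relativities, risk):
    """Multiply the base level by the relativity of each rating factor."""
    value = base
    for factor, level in risk.items():
        value *= relativities[factor][level]
    return value


def pure_premium(risk):
    """Combine step: expected frequency times expected severity."""
    return predict(FREQ_BASE, FREQ_REL, risk) * predict(SEV_BASE, SEV_REL, risk)


risk = {"age": "youthful", "territory": "urban"}
print(round(pure_premium(risk), 2))
```

The alternate collectives on the following slides extend the same product with a Coverage Propensity term.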
Background
Component vs. Combined Modeling
Alternate collective
COMPONENT MODELS
Event Frequency: Poisson/Negative Binomial
Coverage Propensity: Binomial
Severity: Gamma
COMBINE
Event Frequency x Coverage Propensity x Severity
Challenge is dealing with low frequency coverages/perils
Further alternatives are to replace frequency concepts with probability
Background
Component vs. Combined Modeling
Further alternatives replace frequency with probability
COMPONENT MODELS
Event Probability: Binomial
Coverage Propensity: Binomial
Severity: Gamma
COMBINE
Event Probability x Coverage Propensity x Severity
Moving away from defining a variance function
Background
Modeling is an iterative process
[Diagram: Model, Review Model, Simplify, Complicate]
How does the analyst decide which factors are most valuable?
Parameters/standard errors
Consistency of patterns over time or random data sets
Type III statistical tests (e.g., chi-square tests)
Judgment (e.g., do the trends make sense)
Focus of the section is on analysis of data NOT gathering and collecting
Background
Greater availability of data requires a more efficient approach to analysis and selection
Brute force still has value
Aligning with statistical theory – working smarter NOT harder
Different strategies employed
Main effects
Interactions
Residual Analysis – there is still signal left
Other Alternatives
Main Effects Identification
Classical Mining
Useful source of information
Good for Pattern Recognition
Quick and easy
Methodological flaws
Limited by data quantity
No statistical framework
Main Effects Identification
Studying distributional biases
Identify potential duplication of predictors
Identify potential aliasing
Issues
Curse of dimensionality
Defining correlation
Main Effects Identification
Principal Components
Transform correlated predictors into primary effects
Issues
Interpretability
Categorical Effects
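As a rough illustration of the transform, the sketch below runs principal components on two correlated numeric predictors using the closed-form eigen-decomposition of their 2x2 covariance matrix. The data and the function names are hypothetical; a real analysis would use many predictors and a linear algebra library.

```python
import math

# Hypothetical data: two correlated predictors (e.g. vehicle length and weight).
x1 = [4.1, 4.5, 4.8, 5.0, 5.3, 4.3, 4.9, 5.1]
x2 = [1.1, 1.3, 1.5, 1.6, 1.9, 1.2, 1.5, 1.7]


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def covariance(u, v):
    """Sample covariance of two series that are already centered."""
    return sum(a * b for a, b in zip(u, v)) / (len(u) - 1)


def principal_components(x1, x2):
    """Return (eigenvalues, scores) for a two-predictor PCA on centered data."""
    c1, c2 = center(x1), center(x2)
    a, b, c = covariance(c1, c1), covariance(c1, c2), covariance(c2, c2)
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mid = (a + c) / 2.0
    half_gap = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    eigenvalues = [mid + half_gap, mid - half_gap]
    scores = []
    for lam in eigenvalues:
        # (b, lam - a) is an eigenvector whenever b != 0; scale to unit length.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vx, vy = vx / norm, vy / norm
        scores.append([vx * p + vy * q for p, q in zip(c1, c2)])
    return eigenvalues, scores


eigenvalues, scores = principal_components(x1, x2)
print("share of variance in first component:", eigenvalues[0] / sum(eigenvalues))
```

The component scores are uncorrelated by construction, but each is a blend of the original predictors, which is the interpretability issue noted above.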
Main Effects Identification
Stepwise Regression
Forward regression
            Iteration 1   Iteration 2   …   Model Stable
Base Line   A             A + M             A + M + C + …
            A + B         A + M + B
            A + C         A + M + C
            A + D         A + M + D
            …             …
Selection   M             C                 NONE
Useful search routine when dealing with large number of related factors
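The forward search can be sketched as follows, under simplifying assumptions: an ordinary least-squares fit stands in for the GLM, the base line is an intercept only, and AIC is the selection test. The simulated factors B, C, D and M and all function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def aic(columns, y):
    """AIC (up to a constant) of a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(design, y))
    return n * math.log(rss / n) + 2 * p


def forward_select(candidates, y):
    """Add, one at a time, the factor that most improves AIC; stop when none does."""
    selected, best = [], aic([], y)
    while True:
        trials = {name: aic([candidates[s] for s in selected + [name]], y)
                  for name in candidates if name not in selected}
        if not trials:
            return selected
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            return selected  # model stable: selection is NONE
        selected.append(name)
        best = trials[name]


random.seed(0)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)] for name in "BCDM"}
y = [2.0 * m + 1.0 * c + random.gauss(0, 0.5) for m, c in zip(data["M"], data["C"])]
print(forward_select(data, y))
```

Each iteration refits one model per remaining candidate, which is why the search becomes expensive as the number of factors grows.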
Main Effects Identification
Stepwise Regression
Backward Elimination
            Iteration 1     Iteration 2   …   Model Stable
Base Line   A + B + C + D   A + B + D         A + D
            A + C + D       A + B
            A + B + D       A + D
            A + B + C
Removal     C               B                 NONE
More appropriate to judge variable importance of a more complete model
Full main effects models are difficult to fit
Main Effects Identification
Stepwise Regression
Adaptive Regression
            Iteration 1             Iteration 2
            Forward    Backward     Forward      Backward
Base Line   A          A + M        A + M        A + M + C
            A + B      A            A + M + B    A + M
            A + C                   A + M + C    A + C
            A + D                   A + M + D
            …                       …
Selection   M          NONE         C            NONE
Full set of permutations developed
Computationally intensive
Main Effects Identification
Stepwise Regression Issues
Computationally intensive
  - Need strong hardware and fast software to accomplish in a meaningful way
Selection of test for statistical significance
  - Chi-square and F
    - Violating normality assumptions
    - Tendency to overfit
  - AIC Family
    - Parameter penalization concerns
    - Produce factor selection similar to that of a decision tree
Selected factors need more rigorous testing
Main Effects Identification
Balance
Aggregate average observed vs. aggregate average fitted
Tricky to deal with distributional biases
Often requires more rigor for confirmation
Main Effects Identification
Decision Trees
Factor selection
Factor creation
Model localization
[Figure: decision tree with 19 terminal nodes. Root Node 1 splits on AA_POL_DUR (N = 566098); lower splitters are PREMIUM_INVITED, PREMIUM_RATIO, CONTENTS_HISTORY$, PAYMENT_METHOD2$, WI_AVG_123_RATIO, AA_POL_DUR and VEHICLE_AGE_RN]
Main Effects Identification
Decision Trees Issues
Tendency to overfit
Performs ‘better’ as a classification tree rather than a regression tree
Main Effects Identification
Even after items have been identified additional testing is needed
Parameter/Standard Errors
Time Testing
Type III
Model                With         Without
Deviance             8,906.44     8,909.62
Degrees of Freedom   18,469.00    18,475.00
Scale Parameter      0.48         0.48
Chi-Square Test      0.79
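One common form of the Type III comparison can be sketched as below: the drop in deviance from adding the factor, divided by the scale parameter, is referred to a chi-square distribution whose degrees of freedom equal the number of parameters added. This is an assumed formulation rather than the exact calculation behind the slide, and the inputs in the example are illustrative, not the figures from the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable via the incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, half = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= half / (a + k)
        total += term
    lower = total * math.exp(-half + a * math.log(half) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def type_iii_p_value(dev_without, dev_with, df_without, df_with, scale):
    """Compare nested models: scaled deviance drop against the parameters added."""
    statistic = (dev_without - dev_with) / scale
    return chi2_sf(statistic, df_without - df_with)


# Hypothetical inputs: the factor adds 4 parameters and lowers deviance by 15.
p_value = type_iii_p_value(1250.0, 1235.0, 9990, 9986, 1.2)
print(round(p_value, 4))
```

A small p-value suggests the factor explains more than noise; a large one suggests the simpler model without it is adequate.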
Interaction Identification
Stepwise Regression
Need to specify direction, test, and stopping conditions
Computationally inefficient
Interaction Identification
Balance
Approach is to compare aggregate average observed values vs. aggregate average fitted values
Imbalance suggests need for interaction
Interaction Identification
Balance
Observed value: $y_i = \frac{\text{Claims}_i}{\text{Exposures}_i}$
Fitted value (assumes simple model structure): $\hat{y}_i = \mu_i = h(x_i \beta)$
Weighted average observed value and weighted average fitted values for Class k:
$A_k = \frac{\sum_{i \in k} \text{Exposures}_i \times y_i}{\sum_{i \in k} \text{Exposures}_i}$
$E_k = \frac{\sum_{i \in k} \text{Exposures}_i \times \hat{y}_i}{\sum_{i \in k} \text{Exposures}_i}$
Interaction Identification
Balance
For each combination of cells for two rating factors derive the following:
           Male    Female
Youthful   D_YM    D_YF
Adult      D_AM    D_AF
Mature     D_MM    D_MF
Seniors    D_SM    D_SF
Such that: $D_k = \frac{\text{Exposures}_k \times (A_k - E_k)^2}{E_k}$
Then: $Q_{\text{Age} \times \text{Gender}} = \sum_k D_k$
Follows a chi squared distribution with (n-1)*(m-1) degrees of freedom
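The balance statistic for one pair of factors can be sketched as follows, reading the slide as: for each cell k, D_k is the cell exposure times the squared gap between average observed (A_k) and average fitted (E_k), divided by E_k, and Q is the sum over cells. The record layout and the sample figures are hypothetical.

```python
from collections import defaultdict


def balance_statistic(records):
    """records: iterable of (age, gender, exposure, observed, fitted) per risk.

    Returns (D by cell, Q, degrees of freedom) for the Age x Gender table.
    """
    expo = defaultdict(float)
    obs = defaultdict(float)
    fit = defaultdict(float)
    for age, gender, exposure, y, y_hat in records:
        cell = (age, gender)
        expo[cell] += exposure
        obs[cell] += exposure * y        # numerator of A_k
        fit[cell] += exposure * y_hat    # numerator of E_k
    d = {}
    for cell in expo:
        a_k = obs[cell] / expo[cell]     # weighted average observed
        e_k = fit[cell] / expo[cell]     # weighted average fitted
        d[cell] = expo[cell] * (a_k - e_k) ** 2 / e_k
    q = sum(d.values())
    n = len({cell[0] for cell in expo})  # levels of the first factor
    m = len({cell[1] for cell in expo})  # levels of the second factor
    return d, q, (n - 1) * (m - 1)


# Hypothetical frequencies: fitted values come from a main-effects-only model.
records = [
    ("Youthful", "Male", 100.0, 0.20, 0.16),
    ("Youthful", "Female", 100.0, 0.10, 0.12),
    ("Adult", "Male", 200.0, 0.08, 0.10),
    ("Adult", "Female", 200.0, 0.07, 0.06),
]
d, q, dof = balance_statistic(records)
print(q, dof)
```

Running the same function over every pair of rating factors and converting each Q to a p-value gives a ranking of candidate interactions.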
Interaction Identification
Balance
Chi squared test then run for every two way combination
Framework allows for ranking of potential constructs
Rank   Factor 1              Factor 2              Chi Test
1      Driving Restriction   Age                   0.0000
2      Age                   Gender                0.0000
3      Driving Restriction   NCD                   0.0000
4      NCD                   Gender                0.0000
5      NCD                   Age                   0.0001
6      Protected NCD         Gender                0.0002
7      Driving Restriction   Gender                0.0004
8      Driving Restriction   Protected NCD         0.0006
9      LossYear              Driving Restriction   0.0008
10     LossYear              Gender                0.0132
11     Vehicle Age           NCD                   0.0176
12     Driving Restriction   Vehicle Category      0.0195
13     LossYear              Protected NCD         0.0425
14     Vehicle Category      Gender                0.0670
15     Rating Area           Gender                0.1185
…      …                     …                     …
Interaction Identification
Balance
Advantages
Can quickly identify areas in the model where interactions are needed
“Exponential” effect of distributional biases can magnify the importance of one interaction structure vs. another
Disadvantages
Sensitive to noise of severity
Limited guidance as to simplification
Interaction Identification
Decision Trees
What would an interaction look like in a decision tree?
[Figure, left hand tree: splitter on factor A, then a splitter on factor B down each branch, then a splitter on factor C down each of the four branches]
[Figure, right hand tree: splitter on factor A, then splitters on factors B and C, then splitters on factors D, E, F and G]
The left hand tree has similar structures down each branch and so is unlikely to indicate interactions
The right hand tree has different structures depending on which branch is traversed. This might have interactions.
Interaction Identification
Decision Trees
Will not guarantee interactions
Provides additional clues as to what interactions to test.
[Figure: splitter on factor A, then splitters on factors B and C, then splitters on factors D, E, F and G]
This tree may have the following interactions
A x B, A x C, A x D, A x E, A x F,
A x G, B x D, B x E, C x F, C x G,
A x B x D, A x B x E,
A x C x F, A x C x G
The list of candidate interactions can grow quickly!
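The candidate list can be generated mechanically: any set of two or more factors met along one root-to-leaf path is a candidate. The sketch below assumes a tree in which A splits first, B and C sit beneath A, D and E beneath B, and F and G beneath C; the dictionary encoding is hypothetical and assumes each factor appears only once in the tree.

```python
from itertools import combinations

# Hypothetical encoding: each splitter maps to the splitters directly beneath it.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}


def paths(tree, node, prefix=()):
    """Yield every root-to-leaf sequence of splitting factors."""
    prefix = prefix + (node,)
    children = tree.get(node, [])
    if not children:
        yield prefix
    for child in children:
        yield from paths(tree, child, prefix)


def candidate_interactions(tree, root):
    """Collect every set of two or more factors that share a path."""
    found = set()
    for path in paths(tree, root):
        factors = sorted(set(path))
        for size in range(2, len(factors) + 1):
            found.update(combinations(factors, size))
    return sorted(found, key=lambda combo: (len(combo), combo))


for combo in candidate_interactions(TREE, "A"):
    print(" x ".join(combo))
```

For this seven-factor tree the routine returns ten two-way and four three-way candidates; deeper trees add higher-order terms, which is why the list grows quickly.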
Interaction Identification
Decision Trees
Case study: auto renewals model
Balance test identified 23 interactions
Tree identified cross holdings and tenure interaction not from the balance test
[Figure: decision tree with 19 terminal nodes. Root Node 1 splits on AA_POL_DUR (N = 566098); Node 2 splits on AA_POL_DUR (N = 234309) and Node 13 on PREMIUM_INVITED (N = 331789); lower splitters are CONTENTS_HISTORY$, PREMIUM_RATIO, PREMIUM_INVITED, WI_AVG_123_RATIO, PAYMENT_METHOD2$, AA_POL_DUR and VEHICLE_AGE_RN]
Interaction Identification
Decision Trees
Advantages
Quickly identify potential n-way interactions
Suggests areas of localization
Disadvantages
Growth in complexity
Better performance when response is structured as a discrete (i.e. binomial/multinomial) construct
Limited guidance as to simplification
Interaction Identification
Saddles
Quadrant saddle: revisiting a simple main effect model
[Figure: 3D surface chart of the main effect model; horizontal axes 1 to 39 and S1 to S37, vertical axis from -400 to 1200]
Interaction Identification
Saddles
Quadrant saddle: interaction terms twist the paper
[Figure: 3D surface chart of the model with interaction terms; horizontal axes 1 to 39 and S1 to S37, vertical axis from -800 to 800]
Interaction Identification
Transforming predictors into single parameter variates
[Figures: Response plotted against Factor Levels]
Interaction Identification
Case study: vehicle value x rating area
Full interaction is very noisy
Different quadrants to be tested
Focus on higher valued vehicles in certain areas
Systematically study different twists
Interaction Identification
Saddles
Advantages
Identifies and simplifies interactions via the transformation
Useful to find local 3, 4, and 5 way interactions of frequency data
Useful in finding interactions on severity data
Allow for interactions on high dimensional factors
Disadvantages
Difficulties in dealing with interactions where main effects are not part of the model
Difficulties in dealing with interactions with low dimensionality
Residual Analysis
Residual Analysis
General approach
[Figure: Parametric Signal built from Factors A to F; Leftover Signal studied with Non parametric Analysis; cells numbered 1 to 20 arranged by Expected Cost from Low $ to High $]
Residual Analysis
Super factors
After a GLM model is constructed use decision trees to model the residuals to see if any pattern exists
If a pattern is discovered, go back to the model structure and incorporate the findings
Test to see if model structure was inadvertently over-simplified
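A minimal sketch of the idea: search the residuals (observed minus fitted) for the single split that most reduces squared error. A full decision tree repeats this search recursively on each branch. The factor names, data and residual pattern are hypothetical.

```python
def best_split(rows, residuals, factors):
    """Find the factor and threshold whose split most reduces squared residual error."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    base = sse(residuals)
    best = (0.0, None, None)  # (reduction in squared error, factor, threshold)
    for factor in factors:
        # Skip the largest level so the right-hand branch is never empty.
        for threshold in sorted({row[factor] for row in rows})[:-1]:
            left = [r for row, r in zip(rows, residuals) if row[factor] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[factor] > threshold]
            gain = base - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, factor, threshold)
    return best


# Hypothetical GLM residuals that still carry a vehicle age pattern.
rows = [{"driver_age": d, "vehicle_age": v}
        for d in (20, 40, 60) for v in (2, 6, 12, 16)]
residuals = [0.3 if row["vehicle_age"] > 10 else -0.3 for row in rows]

gain, factor, threshold = best_split(rows, residuals, ["driver_age", "vehicle_age"])
print(factor, threshold)
```

A split with a material gain points to structure the GLM has missed, for example a factor that was grouped too coarsely.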
Residual Analysis
Principle of Locality
Risks that share similar characteristics should have similar experience
Adjacency Smoothing
Distance Adjacency
Residual Analysis
Case Study – Vehicle Similarity
Instead of using latitude/longitude to build relationships, use vehicle dimensions
Residual Analysis
Diagnostics:
Smoothing and diagnostics used to identify signal in the vehicle residual
[Figure: QQ Plot: P-Values vs Uniform; x-axis Uniform, y-axis P-Values; series P-Values Smoothing 1 (20), Smoothing 2 (40), Smoothing 3 (60), Smoothing 4 (80), Smoothing 5 (100), P-Values Time Period Selector Node 1, and Uniform Line]
QQ plot on smoothed residual indicates potential signal
If no signal, then all estimated cdfs would be above uniform cdf
[Figure: Smoothing 4 (80); x-axis Smoothing 4 (80) Band from 0 to 22, with Node Weight on a second axis]
Consistency tests further support signal identification
Both time and data consistency tests should be performed
Residual Analysis
Advantages
Identifies complex patterns that parametric model could not
Can provide significant lift to model structure
Disadvantages
Overfit potential is significant
Interpretability
Other Alternatives
Noise Reduction
Use the data to solve for parameters
[Diagram: Data Input, Sample, Model, Solve for Parameters]
Other Alternatives
Noise Reduction
Tested against a holdout
[Diagram: Data Input split into Sample and Hold Out; Model, Solve for Parameters on the Sample; Hold Out Test]
Practical Issues
What is a good fit?
What do you do if there is not a good fit?
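One way to make "good fit" concrete is to solve for parameters on the sample only and then score the hold out. The sketch below compares an overall-mean model with a one-parameter-per-class model by hold out mean squared error; the simulated data, the 70/30 split and the error measure are all assumptions chosen for illustration.

```python
import random

random.seed(1)
# Hypothetical data: (class, response) pairs with a real class effect plus noise.
LEVELS = {"low": 100.0, "high": 150.0}
data = [(c, LEVELS[c] + random.gauss(0, 30)) for c in LEVELS for _ in range(500)]
random.shuffle(data)
sample, holdout = data[:700], data[700:]


def fit_class_means(rows):
    """Solve for parameters: one mean per class, using the sample only."""
    sums, counts = {}, {}
    for c, y in rows:
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def mean_squared_error(predict, rows):
    """Average squared gap between the response and the model prediction."""
    return sum((y - predict(c)) ** 2 for c, y in rows) / len(rows)


overall = sum(y for _, y in sample) / len(sample)
by_class = fit_class_means(sample)

mse_overall = mean_squared_error(lambda c: overall, holdout)
mse_by_class = mean_squared_error(lambda c: by_class[c], holdout)
print(mse_overall, mse_by_class)
```

If the model structure is then revised because of the hold out result, the hold out is no longer independent of the model.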
Other Alternatives
Noise Reduction
Tested against a holdout
[Diagram: Data Input, Sample, Hold Out, Model, Hold Out Test, with a Circular Reference]
Changes in model create an issue
Other Alternatives
Noise Reduction
Use holdout WITHOUT using it
[Diagram: Data Input, Sample, Hold Out, Model, Hold Out Test, Noise Reduced Model, Solve for Parameters]
Other Alternatives
• Non parametric
– Neural Networks,
– Genetic Algorithms,
– Decision Trees, etc.
Summary
Many tools can be used to find signal
Patterns can be found with new factors or interactions between existing factors
Underfitting and Overfitting are two sides of the same problem
Statistics allow us to work smarter NOT harder
Contact Details
Serhat Guven
Towers Watson
Senior Consultant
200 Concord Plaza Suite 420
San Antonio, TX 78216
210.826.2878