By Torna Omar Soro Modeling risk of accidents in the Property and Casualty Insurance Industry.


Page 1:

CS697

By Torna Omar Soro

Modeling Risk of Accidents in the Property and Casualty Insurance Industry

Page 2: Features

DEPENDENT VARIABLE: Number of claims (0, 1, 2)

ATTRIBUTES:
1. Total number of vehicles on a policy
2. Total number of drivers on a policy
3. Anti-theft
4. Driver with training (1 if the driver has training, 0 otherwise)
5. Age of the oldest driver
6. Age of the youngest driver
7. Territory (driver's location)
   ◦ Cost of territory (numerical value) (take the log)
8. SDIP: the Safe Driver Insurance Plan
9. Credit score flag (1 if the driver has a credit score, 0 otherwise)
10. Credit score (numerical value) (take the log)
11. Business source (1 if it is a book transfer, 0 if walk-in)
12. Group insurance flag (1 if the driver is from a group insurance, 0 otherwise)
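A minimal sketch of how one policy record could be mapped to the twelve attributes, with the two log transforms applied. The record layout and field names (`n_vehicles`, `territory_cost`, `credit_score`, and so on) are hypothetical, as is the choice to code a missing credit score as 0 on the log scale; the slide does not say how missing scores were handled.

```python
import math


def build_features(policy: dict) -> list:
    """Map one policy record (hypothetical field names) to the 12 attributes."""
    has_credit = policy.get("credit_score") is not None
    return [
        float(policy["n_vehicles"]),            # 1. vehicles on the policy
        float(policy["n_drivers"]),             # 2. drivers on the policy
        float(policy["anti_theft"]),            # 3. anti-theft indicator
        float(policy["driver_training"]),       # 4. 1 if driver has training
        float(policy["oldest_driver_age"]),     # 5. age of the oldest driver
        float(policy["youngest_driver_age"]),   # 6. age of the youngest driver
        math.log(policy["territory_cost"]),     # 7. log of the territory cost
        float(policy["sdip"]),                  # 8. Safe Driver Insurance Plan
        1.0 if has_credit else 0.0,             # 9. credit score flag
        # 10. log credit score; coding a missing score as 0 is an assumption
        math.log(policy["credit_score"]) if has_credit else 0.0,
        float(policy["book_transfer"]),         # 11. 1 book transfer, 0 walk-in
        float(policy["group_insurance"]),       # 12. group insurance flag
    ]
```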

Page 3: Distribution of Claims

Page 4: Dependent Variable: Distribution

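Page 5: Count Data: Poisson Regression

$$\Pr(y = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}, \qquad r = 0, 1, 2, \ldots, \qquad r! = r(r-1)(r-2)\cdots(1)$$

where $\lambda$ is the expected value (mean) of $y$. An unusual property: the mean equals the variance,

$$E(y) = \operatorname{var}(y) = \lambda.$$

For observation $i$ with covariates $x_{i1}, \ldots, x_{ik}$, $y_i \sim \text{Poisson}(\lambda_i)$ with

$$\lambda_i = e^{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}} > 0, \qquad \log \lambda_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}.$$

This model can be estimated by maximum likelihood, based on

$$f(y_i \mid x_i) = \frac{e^{-\lambda_i}\lambda_i^{y_i}}{y_i!}.$$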
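A minimal pure-Python sketch of the maximum likelihood step for a single covariate, using Newton-Raphson on the log-likelihood $\ell(\beta) = \sum_i [y_i \eta_i - e^{\eta_i} - \log y_i!]$ with $\eta_i = \beta_0 + \beta_1 x_i$. The function names and toy data are illustrative, not from the study; in practice a GLM routine with a Poisson family would do this for all covariates at once.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """Pr(y = r) = exp(-lam) * lam**r / r!"""
    return math.exp(-lam) * lam**r / math.factorial(r)


def fit_poisson(x, y, iters=25):
    """Newton-Raphson MLE for log(lambda_i) = b0 + b1 * x_i (one covariate)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + information^{-1} @ score (2x2 inverse).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data with a binary covariate: the group means are 0.5 (x = 0) and
# 2.0 (x = 1), so the MLE is b0 = log(0.5) and b1 = log(2.0 / 0.5) = log(4).
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 0, 1, 2, 1, 3, 2]
b0, b1 = fit_poisson(x, y)
```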

Page 6: Poisson Regression: Issue

The Poisson assumption E(y) = Var(y) often fails: overdispersion (Var(y) > E(y)) or underdispersion (Var(y) < E(y)).

While overdispersion does not bias the coefficients, it does lead to underestimated standard errors.

Overdispersion also implies that the conventional Poisson maximum likelihood estimates are not efficient.

One reason for overdispersion: excess zeros.
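A common diagnostic for this issue is the Pearson dispersion statistic $\hat{\phi} = \frac{1}{n - p}\sum_i (y_i - \hat{\mu}_i)^2/\hat{\mu}_i$, where $p$ is the number of estimated coefficients: values well above 1 suggest overdispersion and values well below 1 suggest underdispersion. The sketch assumes fitted means from a Poisson model are available; the toy data (many zeros plus a few large counts) use an intercept-only fit and are made up.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom (n - p)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Intercept-only Poisson fit: every fitted mean equals the sample mean (1.0 here).
y = [0, 0, 0, 0, 0, 0, 0, 3, 4, 3]
mu = [sum(y) / len(y)] * len(y)
phi = pearson_dispersion(y, mu, n_params=1)  # 24 / 9, about 2.67: overdispersed
```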

Page 7: Poisson Regression Issue: Excess Zeros: Example 1

State wildlife biologists want to model how many fish are being caught by fishermen at a state park.

Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught.

Some visitors do not fish, but there is no data on whether a person fished or not.

Some visitors who did fish did not catch any fish, and the visitors who did not fish at all add further zeros, so the data contain excess zeros.

Page 8: Poisson Regression Issue: Excess Zeros: Example 2

Property and Casualty Insurance: the excess zeros may come from different sources:

1. Censored data create more zeros.

2. Some drivers drive little or only occasionally; they prefer taking public transportation.
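A quick check for excess zeros, sketched under the simplifying assumption of an intercept-only Poisson model: compare the observed share of zero-claim policies with $e^{-\bar{y}}$, the share a Poisson with the same mean would predict. With covariates, the expected share would instead average $e^{-\hat{\mu}_i}$ over the fitted means. The claim counts below are made up for illustration.

```python
import math


def zero_check(y):
    """Observed share of zeros vs. the share implied by a Poisson with the same mean."""
    observed = sum(1 for v in y if v == 0) / len(y)
    expected = math.exp(-sum(y) / len(y))  # Pr(y = 0) = exp(-lambda), lambda = mean(y)
    return observed, expected


# Hypothetical claim counts (0, 1 or 2 claims per policy), not the study's data.
claims = [0] * 90 + [1] * 6 + [2] * 4
observed, expected = zero_check(claims)
# observed = 0.90 vs. expected = exp(-0.14), about 0.869: more zeros than Poisson predicts.
```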

Page 9: Solution 1: Negative Binomial

It is a generalization of the Poisson model that allows for correction of overdispersion.

A disturbance term $\varepsilon_i$ is included in the model, which accounts for the overdispersion. Conditional on $\varepsilon_i$:

$$\Pr(y_i = r) = \frac{e^{-\lambda_i}\lambda_i^{r}}{r!}, \qquad y_i \sim \text{Poisson}(\lambda_i),$$

$$\log \lambda_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i.$$

$\exp(\varepsilon_i) > 0$ has a standard gamma distribution (mean 1, variance $\alpha$), where $\alpha$ is a constant (Poisson: $\alpha = 0$).
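Integrating the gamma disturbance out of the Poisson model gives the negative binomial distribution. The sketch below uses the common NB2 parameterization, in which the mean is $\mu$ and the variance is $\mu + \alpha\mu^2$, so $\alpha \to 0$ recovers the Poisson. That the slides used NB2 rather than another variant is an assumption, and the parameter values are illustrative.

```python
import math


def nb2_pmf(r: int, mu: float, alpha: float) -> float:
    """NB2 pmf with mean mu and variance mu + alpha * mu**2 (alpha > 0)."""
    size = 1.0 / alpha  # gamma shape parameter, 1 / alpha
    log_p = (
        math.lgamma(r + size) - math.lgamma(size) - math.lgamma(r + 1)
        - size * math.log1p(mu / size)  # = size * log(size / (size + mu))
        + r * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


mu, alpha = 0.5, 0.8
probs = [nb2_pmf(r, mu, alpha) for r in range(200)]
mean = sum(r * p for r, p in enumerate(probs))
var = sum((r - mean) ** 2 * p for r, p in enumerate(probs))
# mean is about 0.5 and var about 0.5 + 0.8 * 0.5**2 = 0.7
```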

Page 10: Solution 2: Zero-Inflated Poisson (ZIP)

An alternate response to modeling overdispersion:

Some zeros result from fishing and not catching any fish. In the case of insurance, the zeros may result from driving and not causing accidents.

Some zeros result from not fishing at all. In the case of insurance, some zeros result from driving very little. Censored data may also result in more zeros.

Zero-inflated models allow one to model each process separately.

Page 11: Solution 2: Zero-Inflated Poisson

The ZIP model has two parts: a Poisson count model and a logit model for predicting the excess zeros.

$$y_i \sim 0 \ \text{with probability } q_i, \qquad y_i \sim \text{Poisson}(\lambda_i) \ \text{with probability } 1 - q_i.$$

A logit model for $q_i$ is used together with a count model for $\lambda_i$, both driven by the covariates $X_i$.
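A minimal sketch of the ZIP probabilities: $\Pr(y_i = 0) = q_i + (1 - q_i)e^{-\lambda_i}$ and $\Pr(y_i = r) = (1 - q_i)e^{-\lambda_i}\lambda_i^{r}/r!$ for $r \ge 1$, with $q_i$ obtained from the logit part through the inverse logit. The linear-predictor value and $\lambda$ are hypothetical numbers, not estimates from the study.

```python
import math


def inverse_logit(z: float) -> float:
    """q_i = 1 / (1 + exp(-z_i)), the zero-inflation probability from the logit part."""
    return 1.0 / (1.0 + math.exp(-z))


def zip_pmf(r: int, lam: float, q: float) -> float:
    """ZIP: a structural zero with probability q, otherwise a Poisson(lam) draw."""
    poisson = math.exp(-lam) * lam**r / math.factorial(r)
    return q + (1 - q) * poisson if r == 0 else (1 - q) * poisson


q = inverse_logit(-1.0)  # hypothetical logit linear predictor
lam = 0.3                # hypothetical Poisson mean for the count part
probs = [zip_pmf(r, lam, q) for r in range(30)]
mean = sum(r * p for r, p in enumerate(probs))  # equals (1 - q) * lam
```

Compared with a plain Poisson that has the same mean $(1 - q)\lambda$, the ZIP puts more mass on zero, which is exactly the excess-zero pattern it is meant to capture.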

Page 12: Solution 2: Zero-Inflated Poisson

Page 13: Solution 2: ZIP (Summary)

Page 14: Poisson Results

Page 15: Poisson Results

Deviance = 19119 and DF = 55595. The degrees of freedom (DF) are large relative to the deviance (55595 > 19119; deviance/DF ≈ 0.34), which indicates underdispersion (less variation than the Poisson model assumes). Consequences:

Overestimated standard errors

MLE not efficient

Inadequate fit of the Poisson model

Next step: estimation of the negative binomial and zero-inflated Poisson models
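A sketch of the Poisson deviance, $D = 2\sum_i [y_i \log(y_i/\mu_i) - (y_i - \mu_i)]$ with the convention $0 \cdot \log 0 = 0$, and of the deviance/DF ratio for the numbers on this slide. The fitted means $\mu_i$ would come from the Poisson fit; the helper name is illustrative.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y * log(y / mu) - (y - mu)], with y * log(y / mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


# Ratio reported on this slide: deviance / residual degrees of freedom.
ratio = 19119 / 55595  # about 0.344, well below 1
```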

Page 16: Negative Binomial

Page 17: Compare Poisson and NB

Quality of fit:

Model                     Deviance   AIC (Akaike Information Criterion)
Poisson                   19119      25703
Negative Binomial (NB)    16222      25581

AIC (NB) < AIC (Poisson), so NB is better.

$\text{AIC} = 2k - 2\ln(L)$, where $k$ = number of parameters in the model and $L$ = maximum value of the likelihood function.

The preferred model is the one with the minimum AIC.

AIC rewards goodness of fit and imposes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting.
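The comparison on this slide can be reproduced directly from the reported AIC values. The slide does not report $k$ or $\ln(L)$ for either model, so the `aic` helper is shown only as the formula, with made-up inputs in any example call.

```python
def aic(log_likelihood: float, k: int) -> float:
    """AIC = 2k - 2 ln(L), with ln(L) the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood


# AIC values reported on this slide; the preferred model has the minimum AIC.
reported = {"Poisson": 25703, "Negative Binomial": 25581}
best = min(reported, key=reported.get)
gap = reported["Poisson"] - reported["Negative Binomial"]  # 122 in favour of NB
```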

Page 18: Zero-Inflated Poisson

Page 19: Zero-Inflated Poisson (Cont.)

Page 20: Poisson Model vs ZIP: Vuong Test

The Vuong test is a likelihood-ratio-based test for selecting between two non-nested models, such as the Poisson and ZIP models.
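A sketch of the (uncorrected) Vuong statistic, assuming the per-observation log-likelihoods $\log f(y_i \mid x_i)$ of the two fitted models are already available: with $m_i$ their difference for observation $i$, $z = \sqrt{n}\,\bar{m}/s_m$ is compared with a standard normal. Large positive $z$ favours the first model and large negative $z$ the second. This version uses the population standard deviation and applies no AIC/BIC-type correction; some software also reports corrected variants.

```python
import math
from statistics import NormalDist, fmean, pstdev


def vuong(loglik_a, loglik_b):
    """Vuong z-statistic and two-sided p-value from per-observation log-likelihoods.

    Positive z favours model A (e.g. ZIP); negative z favours model B (e.g. Poisson).
    """
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    z = math.sqrt(len(m)) * fmean(m) / pstdev(m)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided
```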

Page 21: Conditional Random Field

Given our unbalanced data, we cannot use an SVM. A conditional random field (CRF) is a particular case of log-linear models.

Page 22: Log-Linear

A log-linear model can be written as:

$$p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{Z(x, w)}$$

where the partition function is

$$Z(x, w) = \sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right).$$

$F_j(x, y)$ is called a feature function.

Given $x$, the label predicted by the model is:

$$\hat{y} = \arg\max_y p(y \mid x; w) = \arg\max_y \sum_j w_j F_j(x, y).$$

Special case: Poisson:

$$\Pr(y = r) = \frac{e^{-\lambda}\lambda^{r}}{r!}.$$
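A minimal sketch of the prediction rule, with feature functions passed in as plain Python callables. To connect with the Poisson special case, it uses two hypothetical features, $F_1(x, y) = y$ and $F_2(x, y) = -\log y!$, with weights $w_1 = \log\lambda$ and $w_2 = 1$, so that $p(y \mid x; w) \propto \lambda^{y}/y!$. Truncating the label set at 59 for the sum in $Z$ is an approximation.

```python
import math


def loglinear_predict(x, labels, features, w):
    """Return p(y | x; w) for every label y, and the argmax label y_hat."""
    # Score of each label: sum_j w_j * F_j(x, y).
    scores = {y: sum(wj * Fj(x, y) for wj, Fj in zip(w, features)) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    Z = sum(math.exp(s - top) for s in scores.values())  # partition function / exp(top)
    probs = {y: math.exp(s - top) / Z for y, s in scores.items()}
    y_hat = max(scores, key=scores.get)  # argmax_y sum_j w_j F_j(x, y)
    return probs, y_hat


# Poisson as a log-linear model (hypothetical features; x is unused here):
# w_1 * F_1 + w_2 * F_2 = y * log(lam) - log(y!), so p(y) is proportional to lam**y / y!.
lam = 2.5
features = [lambda x, y: y, lambda x, y: -math.lgamma(y + 1)]
w = [math.log(lam), 1.0]
probs, y_hat = loglinear_predict(None, range(60), features, w)
```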

Page 23: Log-Linear: Feature

A feature function is any mapping $F_j : X \times Y \to \mathbb{R}$.

Page 24: Conditional Random Fields (CRF)

Linear-chain CRF, a multilevel case: given a sentence, each word can be tagged as a noun, verb, adjective, preposition, etc.

$$p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{Z(x, w)}$$

Each $F_j$ is a sum along the sentence, from $i = 1$ to $i = n$, where $n$ is the length of the sentence:

$$F_j(x, y) = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, x, i).$$
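The sketch below evaluates linear-chain CRF probabilities for a three-word sentence, computing each $F_j$ as the sum of a local feature $f_j(y_{i-1}, y_i, x, i)$ over positions and computing $Z(x, w)$ by enumerating all tag sequences. The sentence, tag set, feature functions, weights, and the `START` tag used for $y_0$ are hypothetical. Brute-force enumeration is only feasible for toy inputs; practical implementations compute $Z$ with the forward algorithm.

```python
import itertools
import math


def chain_feature(f, x, y):
    """F_j(x, y) = sum_{i=1..n} f_j(y_{i-1}, y_i, x, i), with y_0 a START tag."""
    tags = ["START"] + list(y)  # tags[i] holds y_i for i = 1..n
    return sum(f(tags[i - 1], tags[i], x, i) for i in range(1, len(tags)))


def crf_prob(x, y, tagset, feats, w):
    """p(y | x; w), with Z(x, w) computed by enumerating every tag sequence."""
    def score(seq):
        return sum(wj * chain_feature(fj, x, seq) for wj, fj in zip(w, feats))
    Z = sum(math.exp(score(s)) for s in itertools.product(tagset, repeat=len(x)))
    return math.exp(score(y)) / Z


# Hypothetical sentence, tag set, local features f_j(prev, cur, x, i) and weights.
x = ["the", "dog", "runs"]
tagset = ["DET", "NOUN", "VERB"]
feats = [
    lambda prev, cur, x, i: 1.0 if x[i - 1] == "the" and cur == "DET" else 0.0,
    lambda prev, cur, x, i: 1.0 if prev == "DET" and cur == "NOUN" else 0.0,
    lambda prev, cur, x, i: 1.0 if prev == "NOUN" and cur == "VERB" else 0.0,
]
w = [2.0, 1.5, 1.5]
p = crf_prob(x, ("DET", "NOUN", "VERB"), tagset, feats, w)
```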

Page 25: Application: Packages

1. CRF++
2. CRFSGD (stochastic gradient descent)
3. MALLET (UMass Amherst)

Page 26: Application of CRF to Count Model

The current CRF may not be best suited for a model where the response variable is a count, because of the way the feature functions are built. The feature functions describe the interactions between response variables and covariates.

I found an extension of the CRF (below) that can be applied to count models. I may try this one.

Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu (UT Austin; UT Austin; Rice University; Baylor College of Medicine): “Conditional Random Fields via Univariate Exponential Families,” 2013 (Neural Information Processing Systems Foundation).

Page 27: Application of CRF to Count Model

They introduced a “novel subclass of CRFs”, derived by imposing node-wise conditional distributions of response variables, conditioned on the rest of the responses and the covariates, as arising from univariate exponential families.

This allows them to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions.

Page 28: References

http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm

http://videolectures.net/cikm08_elkan_llmacrf/

http://nips.cc/Conferences/2013/Program/event.php?ID=38