DATA SCIENCE FOR RISK MANAGEMENT - RIMS Handouts/RIMS 16/TLT001/TLT001_TLT0…
Types of Data
              Structured             Unstructured
Internal      • Claims history       • Adjuster notes
              • Safety data          • Surveillance videos
External      • Financial data       • News reports
              • Labor statistics     • Social media text
Example - Commercial Fleet Telematics Data
• Seatbelt use
• Braking
• Driver
• Passengers
• Speed
• Left turns
• Acceleration
• Route
• Mileage
Data Science Techniques for Risk Management
• Association rules
• Clustering
• Classification
• Regression
• Text mining
• Social network analysis
Training Data – WC Claims Fraud
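Name      Age   Body Part Previously Injured   Attorney Involvement   Witness   Fraudulent Claim
Anna      35    Y                              Y                      N         Y
Carlos    42    N                              N                      Y         N
David     53    N                              N                      N         N
Jason     27    Y                              Y                      N         Y
Sonia     32    N                              Y                      Y         N

Instances: one row per claim (Anna through Sonia).
Attributes: Age, Body Part Previously Injured, Attorney Involvement, Witness.
Class Label: Fraudulent Claim.

New Instance
Gregory   45    Y                              Y                      Y         ?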
Information Gain from Various Attributes
[Figure: bar chart of information gain (scale 0 to 1) for the attributes Prev. Injured Body Part, Age, Attorney Involvement, Days to Report Claim, Day of Week, and Witness]
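To make the idea behind the chart concrete, here is a minimal sketch that computes information gain (entropy of the class label minus the weighted entropy remaining after a split) on the five training instances from the WC claims table. It assumes Age is binned at 40, the threshold that appears in the classification tree, and the field names are hypothetical. With only five rows, the numbers are illustrative and are not the values plotted in the chart.

```python
from collections import Counter
from math import log2

# The five training instances from the slide (field names are hypothetical).
TRAINING = [
    {"name": "Anna",   "age": 35, "prev_injured": "Y", "attorney": "Y", "witness": "N", "fraud": "Y"},
    {"name": "Carlos", "age": 42, "prev_injured": "N", "attorney": "N", "witness": "Y", "fraud": "N"},
    {"name": "David",  "age": 53, "prev_injured": "N", "attorney": "N", "witness": "N", "fraud": "N"},
    {"name": "Jason",  "age": 27, "prev_injured": "Y", "attorney": "Y", "witness": "N", "fraud": "Y"},
    {"name": "Sonia",  "age": 32, "prev_injured": "N", "attorney": "Y", "witness": "Y", "fraud": "N"},
]

# Assumption: bin the numeric Age attribute at 40, the threshold used in the tree.
for row in TRAINING:
    row["age_under_40"] = "Y" if row["age"] < 40 else "N"


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target="fraud"):
    """Class entropy minus the size-weighted entropy of each group after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


if __name__ == "__main__":
    for attr in ["prev_injured", "attorney", "witness", "age_under_40"]:
        print(f"{attr}: {information_gain(TRAINING, attr):.3f}")
```

On these five rows, Prev. Injured Body Part separates fraudulent from non-fraudulent claims perfectly, so its gain equals the full class entropy (about 0.971 bits); Attorney Involvement, Witness, and the Age split each leave one mixed group and score about 0.420. A tree-building algorithm would therefore choose Prev. Injured Body Part as the first split, which is also the root of the tree on the next slide.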
Classification Tree – WC Claim Fraud
[Figure: classification tree; recoverable splits and leaves shown below]
Previously Injured Body Part?
   Yes → Attorney Involvement?
      Yes → Day of Week?
         Monday → Witness?
            No → Prob. Fraud = .80
            Yes
         Other than Monday
      No
   No → Age?
      < 40 → Number of Medical Visits?
         < 3 → Days to Report Claim?
            < 1 → Prob. Fraud = .02
            > 1
         > 3
      ≥ 40
Classification as a Set of Rules
If (body part previously injured) AND (an attorney is involved) AND (day of week is Monday) AND (no witness) THEN Class = Fraud Likely – Refer for Further Investigation
If (body part not previously injured) AND (age less than 40) AND (number of medical visits less than 3) AND (claim reported within 1 day) THEN Class = Fraud Highly Unlikely
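Each rule is one root-to-leaf path of the tree, so the rule set translates directly into conditional logic. A minimal Python sketch, assuming hypothetical field names for the claim attributes and reading "reported within 1 day" as the tree's "< 1" branch; claims matching neither rule get a placeholder label, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    # Hypothetical field names chosen for this sketch.
    body_part_previously_injured: bool
    attorney_involved: bool
    day_of_week: str        # e.g. "Monday"
    witness: bool
    age: int
    medical_visits: int
    days_to_report: float


def classify(claim: Claim) -> str:
    """Apply the two rules from the slide; return a placeholder when neither fires."""
    # Rule 1: the path that ends at the Prob. Fraud = .80 leaf.
    if (claim.body_part_previously_injured and claim.attorney_involved
            and claim.day_of_week == "Monday" and not claim.witness):
        return "Fraud Likely - Refer for Further Investigation"
    # Rule 2: the path that ends at the Prob. Fraud = .02 leaf ("< 1" day to report).
    if (not claim.body_part_previously_injured and claim.age < 40
            and claim.medical_visits < 3 and claim.days_to_report < 1):
        return "Fraud Highly Unlikely"
    return "No rule fired"


if __name__ == "__main__":
    print(classify(Claim(True, True, "Monday", False, 45, 2, 3)))
```

For example, a claim like Gregory's (body part previously injured, attorney involved, but a witness present) fires neither rule, so the rules alone leave it unclassified.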
Evaluating a Predictive Model
[Figure: % correctly identified (y-axis, 0%–120%) plotted against % of test sample (x-axis, 0%–100%), comparing the model with a baseline model; the point (20%, 40%) is marked]
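Reading the chart as a cumulative gains chart (an interpretation based on its axes), the marked point (20%, 40%) says the model captures 40% of the fraudulent claims in the top-scored 20% of the test sample, where a random baseline would capture 20%: a lift of 2. The sketch below computes gain and lift from model scores; the scores and labels are made up so that the toy data reproduces that point.

```python
# Made-up model scores and true labels (1 = fraudulent) for a 10-claim test sample.
SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
LABELS = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]


def cumulative_gain(scores, labels, fraction):
    """Share of all positives captured in the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = round(len(ranked) * fraction)
    captured = sum(label for _, label in ranked[:k])
    return captured / sum(labels)


def lift(scores, labels, fraction):
    """Gain relative to the baseline, which captures `fraction` of positives by chance."""
    return cumulative_gain(scores, labels, fraction) / fraction


if __name__ == "__main__":
    for f in (0.2, 0.5, 1.0):
        print(f"{f:.0%} of sample -> {cumulative_gain(SCORES, LABELS, f):.0%} of frauds, "
              f"lift {lift(SCORES, LABELS, f):.2f}")
```

The baseline is the diagonal where gain equals the fraction of the sample examined; any curve above it means the model ranks fraudulent claims ahead of legitimate ones more often than chance.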
Text Mining – Adjusters’ Notes
Training Data
Claim   Co-morbidity   Commute > 50 mi.   Current Prescriptions   Provider A   Class Label
001     1              1                  0                       1            1
002     1              1                  1                       0            0
003     0              0                  0                       0            0
004     0              0                  1                       1            0
005     1              1                  0                       1            1

New Instance
2237    1              1                  1                       0            ?
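The table assumes each adjuster note has already been reduced to binary flags. One simple way to do that is keyword and pattern matching; the sketch below is a minimal version, and every pattern in it is a hypothetical stand-in rather than the vocabulary behind the slide. The sample note is invented so that its flags match the row for claim 2237.

```python
import re

# Hypothetical patterns; a real system would use a curated, validated vocabulary.
CO_MORBIDITY = re.compile(r"\b(?:diabet\w*|hypertension|obes\w*|co-?morbid\w*)\b")
PRESCRIPTION = re.compile(r"\b(?:prescri\w*|rx|medication\w*)\b")
PROVIDER_A = re.compile(r"\bprovider a\b")
# Captures the first number after a "commute"-type word, followed by "mi" or "miles".
COMMUTE = re.compile(r"\bcommut\w*\D*?(\d+)\s*(?:mi|miles)\b")


def note_to_features(note: str) -> dict[str, int]:
    """Turn one free-text adjuster note into the binary flags used in the table."""
    text = note.lower()
    commute = COMMUTE.search(text)
    return {
        "co_morbidity": int(bool(CO_MORBIDITY.search(text))),
        "commute_over_50_mi": int(commute is not None and int(commute.group(1)) > 50),
        "current_prescriptions": int(bool(PRESCRIPTION.search(text))),
        "provider_a": int(bool(PROVIDER_A.search(text))),
    }


if __name__ == "__main__":
    note = ("Claimant reports hypertension and a commute of 75 miles; "
            "currently taking prescribed muscle relaxants. Seen at City Clinic.")
    print(note_to_features(note))
```

Once every note is a row of flags, the text-mining problem becomes the same structured classification task as the WC claims fraud example, and the same techniques (information gain, classification trees, rules) apply.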