CAS Individual Claim Simulator
Validation Report
ReservePrism
April 2018
Table of Contents
1. Background
2. Validation Method
3. Fitting
   3.1 Distribution Fitting
   3.2 Copula Fitting
   3.3 Exposure Index
   3.4 Report Lag Impact on Frequency
4. Simulation
   4.1 Open Claim Loss Development
   4.2 Claim Reopenness
   4.3 Distribution
   4.4 Copula
   4.5 Exposure Index
   4.6 LAE
   4.7 Deductible and Limit
Appendix. Simulation Module Test R Code
1. Background
This document describes the validation of the CAS Individual Claim Simulator by the development team. The team tested the general reasonability of fitting and simulation results as well as individual modeling choices. Test data was generated by ReservePrism, a validated commercial software package for loss fitting and simulation. Blind tests were used: the tester did not know the assumptions used in data generation and had to try different distributions and copulas.
While the developers performed many tests and believe in the correctness of the simulator, small errors may still exist. The developers and the sponsor make no guarantee that the simulator is error-free or will meet users' specific purposes.
2. Validation Method
The validation tests focus on two areas: the fitting function and the simulation function. For the fitting function, the goal is to confirm that the best-fitted distribution and copula are consistent with the assumptions used in test data generation. Test data is fed into the fitting module of the simulator, and the fitted results are then compared to the assumptions used in data generation.
Figure 1. Fitting Module Validation Process
For the simulation function, the goal is to make sure the distributions and copulas of simulated individual claims follow the simulation assumptions. Simulated data from the CAS Individual Claim Simulator are
analyzed using R programs to derive the distribution and copula. The maximum likelihood method (MLE) is used, and the results are compared to the simulation assumptions.
Figure 2. Simulation Module Validation Process
3. Fitting
The fitting module tries to find the best distribution or copula fit based on claim data. In this test, the test data is simulated from ReservePrism, a commercial claim simulation software package. The assumptions used in test data generation were kept secret from the tester. Claim data for two business lines are simulated: Home and Auto. The following assumptions are used in test data generation.
Table 1. Test Data Assumption
Business Line                    Home                                  Auto
Claim Type                       Dwelling                              PD
Annual Frequency                 Poisson (λ = 1200)                    Poisson (λ = 2000)
Exposure Index                   Level                                 8% Annual Increase
Report Lag                       Weibull (shape=9.5, scale=800)        Exponential (λ=0.0109589)
Settlement Lag                   Weibull (shape=6.5, scale=180)        Exponential (λ=0.002739726)
Severity                         Lognormal (μ=12.1664, σ=0.5326)       Lognormal (μ=9, σ=0.6)
Severity Index                   Level                                 Level
Correlation between Severity
and Settlement Lag               Frank (alpha=50)                      Independent
Deductible                       20000                                 1000
Limit                            Limit     Prob.                       Limit    Prob.
                                 200000    37.5%                       8000     20%
                                 300000    62.5%                       15000    30%
                                                                       20000    50%
Frequency Correlation            85%
The test data is then fed into the simulator for fitting using maximum likelihood estimation (MLE). The results are described below.
3.1 Distribution Fitting
Report lag, settlement lag, frequency, and severity are covered in distribution fitting. The fitting module tries a list of distributions to find the best fit based on criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
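The selection logic can be sketched in R with MASS::fitdistr; this is an illustrative stand-in for the simulator's internal fitting, run on synthetic report lags drawn from the Home-line assumption (Weibull, shape 9.5, scale 800):

```r
# Sketch of distribution selection by AIC (illustrative, not the simulator's code).
library(MASS)

set.seed(1)
lags <- rweibull(5000, shape = 9.5, scale = 800)  # synthetic Home-line report lags

# Fit candidate distributions by MLE and compare AIC.
fit_weibull   <- fitdistr(lags, "weibull")
fit_lognormal <- fitdistr(lags, "lognormal")
fit_normal    <- fitdistr(lags, "normal")

aics <- c(weibull   = AIC(fit_weibull),
          lognormal = AIC(fit_lognormal),
          normal    = AIC(fit_normal))
best <- names(which.min(aics))  # the true Weibull should win on this data
```

The same comparison over a longer candidate list, together with the KS/chi-square diagnostics, mirrors what Tables 2 through 5 report.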
Report Lag
Table 2 lists the validation result for report lag. The fitting module is able to find the correct distribution and estimated distribution parameters are close to the assumptions in Table 1.
Table 2. Report Lag Fitting Result
LoB   Distribution   Parameter                     Std. Dev.1       DoF2   KS Test3   p value   Log likelihood   AIC       BIC
Home  Normal         mean:128.845; sd:685.675      NA               9608   0.7        0         -75,717          151,439   151,453
      Lognormal      meanlog:6.62; sdlog:0.137     0.0014; 0.001    9608   0.07       0         -58,192          116,387   116,402
      Pareto         Fitting Unsuccessful
      Weibull        shape:9.276; scale:798.361    0.0737; 0.9247   9608   0.01       0.6       -57,366          114,735   114,750
      Gamma          shape:53.66; scale:14.096     0.7167; 0.189    9608   1          0         -57,980          115,963   115,977
      Uniform        Fitting Unsuccessful
      Exponential    Fitting Unsuccessful
Auto  Normal         mean:87.639; sd:86.405        0.5038; 0.3531   29410  0.16       0         -173,057         346,117   346,134
      Lognormal      meanlog:3.89; sdlog:1.277     0.0074; 0.0053   29410  0.08       0         -163,297         326,598   326,615
      Pareto         Fitting Unsuccessful
      Weibull        Fitting Unsuccessful
      Gamma          shape:0.993; scale:88.19      0.0072; 0.8238   29410  1          0         -160,921         321,846   321,863
      Uniform        Fitting Unsuccessful
      Exponential    rate:0.011                    1e-04            29411  0.01       0         -160,922         321,846   321,854
Notes:
1. Standard deviation of the parameter estimates.
2. Degrees of freedom: the number of data points minus the number of parameters.
3. Kolmogorov–Smirnov test statistic; its p value is listed in the next column.
Figure 1 and Figure 2 compare the data and the chosen distributions for report lag.
Figure 1. Report Lag Fitting for Home Line
Figure 2. Report Lag Fitting for Auto Line
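The KS statistics in Table 2 can be reproduced with base R's ks.test; a minimal sketch on synthetic data under the Home-line report lag assumption:

```r
# Illustrative KS check (synthetic data, not the simulator's code):
# compare simulated report lags against the assumed Weibull(shape = 9.5, scale = 800).
set.seed(2)
lags <- rweibull(5000, shape = 9.5, scale = 800)

# A small statistic (large p value) means the assumed distribution is not rejected.
ks <- ks.test(lags, "pweibull", shape = 9.5, scale = 800)
```

Note that when the parameters are estimated from the same data, the standard KS p value is optimistic; the report's tables should be read with that caveat.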
Settlement Lag
Table 3 lists the validation result for settlement lag. The fitting module is able to find the correct distribution and estimated distribution parameters are close to the assumptions in Table 1.
Table 3. Settlement Lag Fitting Result
LoB   Distribution   Parameter                     Std. Dev.1       DoF2   KS Test3   p value   Log likelihood   AIC       BIC
Home  Normal         mean:167.03; sd:29.992        0.3151; 0.221    9057   0.04       0         -43,714          87,432    87,446
      Lognormal      meanlog:5.098; sdlog:0.198    0.0021; 0.0015   9057   0.08       0         -44,383          88,769    88,783
      Pareto         Fitting Unsuccessful
      Weibull        shape:6.479; scale:179.004    0.0532; 0.3056   9057   0.01       0.11      -43,554          87,112    87,126
      Gamma          shape:27.457; scale:6.071     0.3994; 0.0891   9057   1          0         -44,091          88,187    88,201
      Uniform        Fitting Unsuccessful
      Exponential    rate:0.006                    1e-04            9058   0.45       0         -55,410          110,821   110,828
Auto  Normal         mean:77.587; sd:411.109       5.5655; 4.3357   25414  0.43       0         -187,547         375,098   375,114
      Lognormal      meanlog:5.166; sdlog:1.282    0.008; 0.0057    25414  0.08       0         -173,671         347,346   347,362
      Pareto         Fitting Unsuccessful
      Weibull        shape:1.005; scale:311.572    0.0049; 2.0458   25414  0.01       0.23      -171,332         342,669   342,685
      Gamma          shape:1.008; scale:309.914    0.0079; 3.1273   25414  1          0         -171,333         342,670   342,687
      Uniform        Fitting Unsuccessful
      Exponential    rate:0.003                    0                25415  0.01       0.03      -171,333         342,669   342,677
Notes:
1. Standard deviation of the parameter estimates.
2. Degrees of freedom: the number of data points minus the number of parameters.
3. Kolmogorov–Smirnov test statistic; its p value is listed in the next column.
Figure 3 and Figure 4 compare the data and the chosen distributions for settlement lag.
Figure 3. Settlement Lag Fitting for Home Line
Figure 4. Settlement Lag Fitting for Auto Line
Monthly Frequency
Table 4 lists the validation result for monthly frequency. The fitting module found that the negative binomial distribution is a slightly better fit than the Poisson distribution based on AIC and BIC, although the true assumption is a Poisson distribution. The fit for frequency is not as clean as for other variables such as report lag because the number of observations is much smaller: for 10 years of experience data, only 120 monthly frequency data points are available, versus thousands for the other variables. However, the estimated Poisson distribution parameters are close to the assumptions in Table 1.
Table 4. Monthly Frequency Fitting Result
LoB   Distribution         Parameter                      Std. Dev.1          DoF2   Chi-Sq Test3   p value   Log likelihood   AIC     BIC
Home  Poisson              lambda:101.067 4               0.9858              103    18,377         0         -439             881     884
      Negative Binomial    size:101.61; prob:0.501        27.6725; 0.0682     102    122            0         -424             851     857
      Geometric            prob:0.01                      9e-04               103    218            0         -585             1,171   1,174
Auto  Poisson              lambda:169.017 5               1.1868              119    12             0.38      -457             916     919
      Negative Binomial    size:2102.025; prob:0.926      260.2602; 0.0085    118    14             0.24      -459             922     927
      Geometric            prob:0.006                     5e-04               119    148            0         -736             1,474   1,477
Notes:
1. Standard deviation of the parameter estimates.
2. Degrees of freedom: the number of data points minus the number of parameters.
3. Chi-square test statistic; its p value is listed in the next column.
4. The annual frequency assumption is Poisson with λ = 1200, which is close to the monthly frequency estimate: 101.067 × 12 = 1,213.
5. The annual frequency assumption is Poisson with λ = 2000, which is close to the monthly frequency estimate: 169.017 × 12 = 2,028.
The comparison graphs (Figure 5 and Figure 6) show little difference between the Poisson fit and the negative binomial fit.
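The near-equivalence of the two fits on 120 monthly observations can be illustrated with a minimal sketch (synthetic counts under the Home-line assumption):

```r
# Illustrative frequency check (synthetic data, not the simulator's code):
# 120 monthly claim counts from Poisson(101.067), as in Table 4.
set.seed(3)
counts <- rpois(120, lambda = 101.067)

fit_pois   <- MASS::fitdistr(counts, "Poisson")
lambda_hat <- as.numeric(fit_pois$estimate["lambda"])

# For Poisson data the variance-to-mean ratio should be near 1; a negative
# binomial fit adds a dispersion parameter that this data barely needs, which
# is why AIC/BIC separate the two models only weakly at this sample size.
dispersion <- var(counts) / mean(counts)
```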
Figure 5. Monthly Frequency Fitting for Home Line
Poisson Distribution Negative Binomial Distribution
Figure 6. Monthly Frequency Fitting for Auto Line
Poisson Distribution Negative Binomial Distribution
Severity
Table 5 lists the validation result for severity. With the presence of deductibles and limits, the loss data is truncated. The underlying severity distribution before deductible and limit is derived using MLE. The AIC and BIC of the successfully fitted distributions are too close to support a decisive conclusion about the best distribution. However, the estimated lognormal parameters are close to the assumptions in Table 1.
Table 5. Severity Fitting Result
LoB   Distribution   Parameter                           Converge1                DoF2    Log likelihood   AIC       BIC
Home  Normal         mean:182872.53313; sd:69683.70117   successful convergence   9057    -113,510         227,025   245,139
      Lognormal      meanlog:12.05007; sdlog:0.41117     successful convergence   9057    -113,962         227,927   246,041
      Pareto         Fitting Unsuccessful
      Weibull        Fitting Unsuccessful
      Gamma          Fitting Unsuccessful
      Uniform        Fitting Unsuccessful
      Exponential    rate:1e-05                          successful convergence   9058    -117,875         235,753   244,810
Auto  Normal         mean:7582.14576; sd:5121.41155      successful convergence   25414   -247,315         494,633   545,461
      Lognormal      meanlog:8.92714; sdlog:0.52153      successful convergence   25414   -246,412         492,828   543,656
      Pareto         Fitting Unsuccessful
      Weibull        Fitting Unsuccessful
      Gamma          Fitting Unsuccessful
      Uniform        Fitting Unsuccessful
      Exponential    rate:0.00013                        successful convergence   25415   -252,438         504,877   530,291
Notes:
1. Convergence status of the fitting.
2. Degrees of freedom: the number of data points minus the number of parameters.
Figure 7 and Figure 8 compare the loss data with the truncated lognormal distributions for severity. Because the deductible and limit are not uniform across claims, an average deductible and limit is used to draw the truncated distribution. The averaging may produce visible discrepancies at both ends of the curves, but this is only a presentation artifact.
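The truncated-data MLE described above can be sketched in base R. This is an illustrative simplification (a single deductible, a single limit, synthetic lognormal losses), not the simulator's implementation:

```r
# Sketch: MLE of a lognormal severity when losses below the deductible are
# unobserved (left truncation) and losses are capped at deductible + limit
# (right censoring). Uses the Home-line deductible of 20000 and, for
# illustration only, a single limit of 200000.
set.seed(4)
d <- 20000; u <- d + 200000
x <- rlnorm(20000, meanlog = 12.05, sdlog = 0.411)
x <- x[x > d]                 # deductible: small losses never become claims
censored <- x >= u            # limit: large losses are only known to exceed u
obs <- pmin(x, u)

negloglik <- function(par) {
  mu <- par[1]; sigma <- exp(par[2])  # log-parametrize sigma to keep it positive
  p_trunc <- plnorm(d, mu, sigma, lower.tail = FALSE)   # P(loss > deductible)
  ll <- ifelse(censored,
               plnorm(u, mu, sigma, lower.tail = FALSE, log.p = TRUE),
               dlnorm(obs, mu, sigma, log = TRUE))
  -sum(ll - log(p_trunc))     # divide each contribution by the truncation prob.
}

fit       <- optim(c(11, log(0.5)), negloglik)
mu_hat    <- fit$par[1]
sigma_hat <- exp(fit$par[2])
```

With enough data the untruncated parameters are recovered closely, which is the behavior Table 5 reports.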
Figure 7. Severity Fitting for Home Line
Figure 8. Severity Fitting for Auto Line
3.2 Copula Fitting
The fitting module can estimate the relationship among severity, settlement lag, and report lag within each business line/claim type. It can also estimate the relationship of monthly frequency among business lines.
Copula among severity, report lag and settlement lag
Table 6 shows the fitting results for copula among severity and lags.
Table 6. Severity, Report Lag and Settlement Lag Copula Fitting Result
LoB   Copula    Parameter1                   Standard Deviation2         DoF3         Sn4       p value
Home  normal    0.9034; -0.0154; -0.0147     0.0025; 0.0106; 0.0101                   3.4615    0.8284
      clayton   0.4507                       0.0065                                   23.4856   0.0025
      gumbel    1.2397                       0.0048                                   27.1433   0.5597
      frank     2.0104                       0.0318                                   22.1095   0.0025
      joe       1.2767                       0.007                                    36.2521   0.4851
      t         0.9219; -0.0162; -0.0162     0.0019; 0.0119; 0.0123      7.729163     NA        NA
Auto  normal    0.003; 9e-04; -0.0162        0.0059; 0.0059; 0.0059                   NA        NA
      clayton   0                            NA                                       NA        NA
      gumbel    1                            0.0015                                   NA        NA
      frank     0                            NA                                       NA        NA
      joe       1                            0.002                                    NA        NA
      t         0.003; 9e-04; -0.0163        0.0059; 0.0059; 0.0059      363.7735     NA        NA
Notes:
1. Copula parameter. For the normal and t copulas, the first parameter is the correlation between severity and settlement lag, the second is between severity and report lag, and the third is between settlement lag and report lag.
2. Standard deviation of the parameter estimates.
3. Degrees of freedom: the number of data points minus the number of parameters.
4. Sn: Cramér–von Mises statistic; the p value of the Sn test is listed in the next column.
In test data generation, the Home line assumes a Frank copula with parameter 50 between severity and settlement lag, while report lag is assumed independent of both. Effectively, three copulas are used: one Frank copula and two independence copulas. Because the simulator fits the relationship among severity, report lag, and settlement lag jointly with a single copula, to be more consistent and comprehensive, a normal copula is found to be the most appropriate description. Report lag shows near-zero correlation with severity and settlement lag, consistent with the independence assumption. Loss size after deductible and limit and settlement lag have an estimated correlation of 90.3%; the statistical test's p value of 0.83 does not reject the hypothesis of a 90.3% correlation. Considering the impact of deductible and limit, this is close to the correlation (around 99%) implied by a Frank copula with parameter 50.
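As a rough cross-check of a fitted normal-copula correlation, the dependence can be estimated in base R without a copula package, using Spearman's rank correlation and the Gaussian-copula identity rho_S = (6/pi) * asin(rho/2); a sketch on synthetic data with the fitted values above:

```r
# Rough base-R dependence check (illustrative; the report's fits use full copula MLE).
# Simulate a normal (Gaussian) copula with rho = 0.9034 between severity and
# settlement lag, then recover rho from Spearman's rank correlation.
set.seed(5)
n   <- 10000
rho <- 0.9034
z1  <- rnorm(n)
z2  <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # correlated standard normals

# Margins don't affect rank correlation; transform to the fitted margins anyway.
severity <- qlnorm(pnorm(z1), meanlog = 12.05, sdlog = 0.411)
settle   <- qweibull(pnorm(z2), shape = 6.479, scale = 179.004)

rho_s   <- cor(severity, settle, method = "spearman")
rho_hat <- 2 * sin(pi * rho_s / 6)   # invert the Gaussian-copula identity
```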
The Auto line assumes severity, report lag, and settlement lag are independent of each other. The fitting result shows near-zero correlations as well.
Figure 9 and Figure 10 compare the test data with the chosen copula for each business line.
Figure 9. Copula Fitting for Home Line
Notes: Margin 1: Severity with lognormal distribution; Margin 2: Settlement Lag with Weibull distribution; Margin 3: Report Lag with Weibull distribution.
Figure 10. Copula Fitting for Auto Line
Notes: Margin 1: Severity with lognormal distribution; Margin 2: Settlement Lag with exponential distribution; Margin 3: Report Lag with exponential distribution.
Frequency Copula
Table 7 shows the copula fitting result for the monthly frequency between the two business lines. The assumption is that the frequencies follow a normal (Gaussian) copula with a correlation coefficient of 85%. The fitted correlation coefficient lies in the range [50.5%, 68.2%] (estimate plus or minus two standard errors). The discrepancy is likely caused by an insufficient number of data points: only 10 pairs of annual frequencies are generated from the assumed normal copula, and a longer history would likely reduce the discrepancy.
Table 7. Frequency Copula Fitting Result
Copula    Parameter1   Standard Deviation2   DoF3          Sn4      p value
normal    0.5936       0.0444                              0.1159   0.0025
clayton   0.7311       0.1827                              0.3237   0.0025
gumbel    1.7872       0.1134                              0.0664   0.0423
frank     4.6165       0.6222                              0.093    0.0025
joe       2.2788       0.1507                              0.0594   0.0672
t         0.6131       0.0626                4.74955242    NA       NA
3.3 Exposure Index
The impact of the exposure index needs to be considered in frequency distribution fitting. Frequency data is normalized by removing the impact of exposure changes over time.
The test data assumes an 8% annual increase in business volume over 10 years for the Auto line, with 2,000 expected claims in the first year. The fitting results indicate an expected 2,028 claims per year after removing the impact of the business volume change, indicating that the exposure index is appropriately reflected in the fitting module. The exposure index can incorporate not only business volume but also other cyclical patterns such as the underwriting cycle and seasonality.
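The normalization step can be illustrated with a minimal sketch (synthetic monthly counts under the Auto-line assumptions):

```r
# Sketch of exposure-index normalization (synthetic data): monthly Auto counts
# are generated with an 8% annual volume increase, then divided by the
# exposure index so that a level frequency remains for fitting.
set.seed(6)
lambda0 <- 2000 / 12              # monthly frequency in the first year
years   <- rep(0:9, each = 12)    # 10 years of monthly data
index   <- 1.08^years             # exposure index: 8% annual increase
counts  <- rpois(120, lambda0 * index)

normalized <- counts / index      # remove the exposure effect
lambda_hat <- mean(normalized)    # should be close to the base monthly frequency
```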
3.4 Report Lag Impact on Frequency
Because some business lines may have very long report lags, experience data may be heavily truncated: many IBNR claims are not yet observable for recent accident years. The test data is generated to have considerably long report lags: the Home line has an expected report lag of 759 days, and the Auto line an expected report lag of 91 days. The fitting module considers the possibility of IBNR claims when adjusting the frequency data. The frequency distribution fitting results in Section 3.1 show that the impact of report lag has been appropriately taken into account.
4. Simulation
To make sure the simulation module works properly, the data simulated by the CAS Individual Claim Simulator is tested against the simulation assumptions. Continuing from the fitting test, the two lines (Home and Auto) are simulated 100 times for open claim loss development, claim reopening, IBNR, and future claims. Table 8 lists the simulation assumptions.
Table 8. Simulation Assumptions
Business Line                 Home                                     Auto
Claim Type                    Dwelling                                 PD

Open Claim
Open Claim Loss Development:
  Home:
    Dev. Year   Mean Dev. Factor   Volatility
    0           1.2                0.051
    1           1.15               0.042
    2           1.1                0.041
    3           1.05               0.093
    4 & +       1                  0
  Auto:
    Dev. factor = exp(0.001 + 0.01 × dev. year + 0.008e)
    (dev. year: development year; e: random variable following a standard normal distribution)

Claim Reopen
Reopen Probability (same for both lines):
    Dev. Year   Prob.
    0           0.02
    1           0.015
    2           0.01
    3           0.005
    4 & +       0
Reopen Claim Loss Development (same for both lines):
    Dev. Year   Mean Dev. Factor   Volatility
    0           1.05               0.095
    1           1.1                0.084
    2           1.05               0.07
    3           1.06               0.078
    4           1.07               0.025
    5           1.08               0.079
    6           1.09               0.013
    7           1.06               0.053
    8 & +       1                  0
Reopen Lag                    Exponential (λ=0.005)                    Exponential (λ=0.005)
Resettlement Lag              Exponential (λ=0.01)                     Exponential (λ=0.005)

IBNR/Future Claim
Monthly Frequency             Poisson (λ=101.067)                      Negative Binomial (size=2102.025, prob=0.926)
Exposure Index                Level                                    8% Annual Increase
Report Lag                    Weibull (shape=9.276, scale=798.361)     Exponential (λ=0.011)
Settlement Lag                Weibull (shape=6.479, scale=179.004)     Weibull (shape=1.005, scale=311.572)
Severity                      Lognormal (μ=12.05007, σ=0.41117)        Lognormal (μ=8.92714, σ=0.52153)
Severity Index                Level                                    Level until 2017, 3% annual increase thereafter
Copula among Severity, Settlement Lag, and Report Lag (normal copula correlations):
  Home: Severity & Settlement Lag 0.9034; Severity & Report Lag -0.0154; Settlement Lag & Report Lag -0.0147
  Auto: Severity & Settlement Lag 0.003; Severity & Report Lag 0.009; Settlement Lag & Report Lag -0.0162
Deductible                    20000                                    1000
Limit                         200000 (37.5%), 300000 (62.5%)           8000 (20%), 15000 (30%), 20000 (50%)
LAE                           LAE = 5 + 0.01 × dev. year               LAE = log(1.05 + 0.01e)
                              + 0.005 × incurred loss + 5e
                              (dev. year: development year; e: random variable following a standard normal distribution)
Frequency Correlation         Normal copula (0.6131)
The remainder of this section summarizes the testing result, with R code performing the testing included in the appendix.
4.1 Open Claim Loss Development
Open claim loss development patterns derived from the simulated data match the simulation assumptions quite well.
For the Home line, a loss development factor table based on development year is assumed. For the Auto line, an exponential regression function is assumed. The testing result is shown in Table 9.
Table 9. Open Claim Loss Development Testing Result
Business Line   Assumption / Test Result
Home            Assumption:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           1.2                0.051
                  1           1.15               0.042
                  2           1.1                0.041
                  3           1.05               0.093
                  4 & +       1                  0
                Test Result:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           NA                 NA
                  1           1.1513             0.03985
                  2           1.1002             0.04177
                  3           NA                 NA
                  4 & +       NA                 NA
Auto            Assumption:
                  Dev. factor = exp(0.001 + 0.01 × dev. year + 0 × incurred loss + 0.008e)
                Test Result:
                  Dev. factor = exp(0.001001 + 0.01004 × dev. year - 1.8e-08 × incurred loss + 0.007991e)
                  Adjusted R²: 58.4%. The t tests on the intercept and the dev. year coefficient have p values less than 2.2e-16.
4.2 Claim Reopenness
For claim reopening, the simulated reopen probability, resettlement lag, and reopen loss development have been tested against the simulation assumptions, and they match well, as shown in Tables 10 through 12.
Reopen Probability
Table 10. Claim Reopen Probability Testing Result
Business Line   Assumption / Test Result
Home            Assumption:
                  Dev. Year   Prob.
                  0           0.02
                  1           0.015
                  2           0.01
                  3           0.005
                  4 & +       0
                Test Result:
                  Dev. Year   Prob.     Volatility
                  0           0.01959   0.0033
                  1           0.01480   0.0031
                  2           0.00898   0.0030
                  3           0.00536   0.0019
                  4 & +       0         0
Auto            Assumption:
                  Dev. Year   Prob.
                  0           0.02
                  1           0.015
                  2           0.01
                  3           0.005
                  4 & +       0
                Test Result:
                  Dev. Year   Prob.     Volatility
                  0           0.01901   0.0024
                  1           0.01517   0.0022
                  2           0.01069   0.0015
                  3           0.00520   0.0013
                  4 & +       0
Resettlement Lag
Table 11. Resettlement Lag Simulation Testing Result
Business Line              Home                                  Auto
Assumption                 Exponential (λ=0.01)                  Exponential (λ=0.005)
Test Result                Exponential (λ=0.00965)               Exponential (λ=0.00532)
                           Standard error: 0.000394              Standard error: 0.000121
K-S Test vs Assumption     Statistic: 0.0401                     Statistic: 0.0424
                           p-value: 0.3026                       p-value: 0.0034
Reopen Loss Development
Table 12. Claim Reopen Loss Development Testing Result
Business Line   Assumption / Test Result
Home            Assumption:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           1.05               0.095
                  1           1.1                0.084
                  2           1.05               0.07
                  3           1.06               0.078
                  4           1.07               0.025
                  5           1.08               0.079
                  6           1.09               0.013
                  7           1.06               0.053
                  8 & +       1                  0
                Test Result:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           NA                 NA
                  1           NA                 NA
                  2           1.0488             0.0529
                  3           1.0648             0.0610
                  4           1.0727             0.0241
                  5           1.0651             0.0656
                  6           1.0837             0.0101
                  7           NA                 NA
                  8 & +       NA                 NA
Auto            Assumption:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           1.05               0.095
                  1           1.1                0.084
                  2           1.05               0.07
                  3           1.06               0.078
                  4           1.07               0.025
                  5           1.08               0.079
                  6           1.09               0.013
                  7           1.06               0.053
                  8 & +       1                  0
                Test Result:
                  Dev. Year   Mean Dev. Factor   Volatility
                  0           1.0451             0.08103
                  1           1.0906             0.07433
                  2           1.0404             0.06836
                  3           1.0520             0.07981
                  4           1.0670             0.02344
                  5           1.0574             0.06514
                  6           1.0795             0.01384
                  7           1.0679             0.05988
                  8 & +       NA                 NA
Simulated reopen lags (from the last close date to the reopen date) cannot be compared directly to the simulation assumption because they are truncated at the evaluation date (2017-12-31). Reopen dates are instead checked to confirm they fall after the evaluation date. As expected, simulated reopen dates for long-closed claims lie very close to the evaluation date, because the reopen lag is assumed to have a mean of 200 days.
4.3 Distribution
The report lag, settlement lag, frequency, and severity distributions derived from the simulated data match the simulation assumptions quite well, as shown in Tables 13 through 16.
Report Lag
Table 13. Report Lag Simulation Testing Result
Business Line              Home                                      Auto
Assumption                 Weibull (shape=9.276, scale=798.361)      Exponential (λ=0.011)
Test Result                Weibull (shape=8.787, scale=768.0863)     Exponential (λ=0.0153)
                           Standard errors: 0.0623, 0.8348           Standard error: 0.00007
K-S Test vs Assumption     Statistic: 0.1311                         Statistic: 0.2787044
                           p-value: 0                                p-value: 0
Settlement Lag
Table 14. Settlement Lag Simulation Testing Result
Business Line              Home                                      Auto
Assumption                 Weibull (shape=6.479, scale=179.004)      Weibull (shape=1.005, scale=311.572)
Test Result                Weibull (shape=6.4828, scale=179.1017)    Weibull (shape=1.0032, scale=311.3343)
                           Standard errors: 0.0260, 0.1499           Standard errors: 0.0033, 1.3829
K-S Test vs Assumption     Statistic: 0.0106                         Statistic: 0.0032
                           p-value: 0.0004                           p-value: 0.6001
Frequency
Table 15. Frequency Simulation Testing Result
Business Line              Home                            Auto
Assumption                 Poisson (λ=101.067)             Negative Binomial (size=2102.025, prob=0.926)
                                                           Mean = 168, Variance = 181
Test Result                Poisson (λ=100.6617)            Negative Binomial (size=1165.012, prob=0.8747)
                           Standard error: 0.2896          Mean = 167, Variance = 191
Chi-Squared Test           Statistic: 13.827               Statistic: 31.63137
vs Assumption              p-value: 0.2427                 p-value: 0.0009
Severity
Table 16. Severity Simulation Testing Result
Business Line              Home                                      Auto
Assumption                 Lognormal (μ=12.05007, σ=0.41117)         Lognormal (μ=8.92714, σ=0.52153)
Test Result                Lognormal (μ=12.05186, σ=0.41016)         Lognormal (μ=8.926003, σ=0.5196)
                           Standard errors: 0.0021, 0.0015           Standard errors: 0.0022, 0.0016
K-S Test vs Assumption     Statistic: 0.0052                         Statistic: 0.0030
                           p-value: 0.2603                           p-value: 0.6879
4.4 Copula
Frequency Copula
It is assumed that the monthly frequency between the Home line and the Auto line follows a normal copula with a correlation coefficient of 61.31%. The simulated data exhibits a similar relationship: the fitted normal copula has a correlation coefficient of 63.7% with a standard error of 1.7%. A goodness-of-fit test based on the Cramér–von Mises statistic gives a test statistic of 0.0152 and a p value of 0.898, so the test does not reject the hypothesis that the simulated data has the assumed frequency relationship.
Severity, Settlement Lag and Report Lag
The testing results for the copula among severity, settlement lag, and report lag also show that the simulator reflects the copula assumptions properly.
Table 17. Severity, Settlement Lag and Report Lag Copula
Business Line   Home
Assumption (normal copula):
                           Severity   Settle. Lag   Report Lag
    Severity               1
    Settle. Lag            0.9034     1
    Report Lag             -0.0154    -0.0147       1
Test Result (normal copula):
                           Severity   Settle. Lag   Report Lag
    Severity               1
    Settle. Lag            0.90256    1
    Report Lag             -0.0330    -0.0343       1
Goodness-of-fit Test:      Statistic: 0.01412; p-value: 0.9776

Business Line   Auto
Assumption (normal copula):
                           Severity   Settle. Lag   Report Lag
    Severity               1
    Settle. Lag            0.003      1
    Report Lag             0.009      -0.0162       1
Test Result (normal copula):
                           Severity   Settle. Lag   Report Lag
    Severity               1
    Settle. Lag            0.00004    1
    Report Lag             -0.0094    -0.021        1
Goodness-of-fit Test:      Statistic: 0.01752; p-value: 0.9975
4.5 Exposure Index
The Home line assumes a level exposure index, meaning the business volume stays unchanged. The average exposure index based on the simulated data fluctuates around the level line, as shown in Figure 11.
Figure 11. Home Line Exposure Index Test
The Auto line assumes an 8% annual increase in exposure. The average exposure index based on simulated data indicates the same pattern, as shown in Figure 12.
Figure 12. Auto Line Exposure Index Test
4.6 LAE
In the simulation, the Home line is assumed to follow LAE = 5 + 0.01 × dev. year + 0.005 × incurred loss + 5e, where dev. year is the development year and e is a standard normal random variable. The testing result gives Home line LAE = 5.001 + 0.01361 × dev. year + 0.005 × incurred loss + 4.997e, with an adjusted R² of 99.97%.
The Auto line LAE is assumed to follow LAE = log(1.05 + 0.01e). The testing result gives Auto line LAE = log(1.05 + 0.00002 × dev. year + 0 × incurred loss + 0.01046e), which is very close to the assumption.
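The LAE test amounts to a linear regression on development year and incurred loss; a minimal sketch on synthetic Home-line data (variable names are illustrative):

```r
# Illustrative recovery of the Home-line LAE formula by regression (synthetic data):
# LAE = 5 + 0.01 * dev.year + 0.005 * incurred + 5 * e, with e standard normal.
set.seed(7)
n        <- 5000
dev_year <- sample(0:4, n, replace = TRUE)
incurred <- rlnorm(n, meanlog = 12.05, sdlog = 0.411)
lae      <- 5 + 0.01 * dev_year + 0.005 * incurred + 5 * rnorm(n)

fit   <- lm(lae ~ dev_year + incurred)
coefs <- coef(fit)
r2    <- summary(fit)$adj.r.squared
```

The incurred-loss coefficient and the intercept are recovered tightly, while the small dev. year coefficient is swamped by the noise term, which mirrors the 0.01361 estimate reported above.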
4.7 Deductible and Limit
The simulated data is tested to make sure that

Ultimate Loss = Min(Max(Deductible, Severity), Deductible + Limit) - Deductible

where severity is the loss before deductible and limit.
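The relationship can be checked with a small base R helper (the function name is illustrative), using the Auto-line deductible of 1000 and a limit of 8000:

```r
# Illustrative check of the deductible/limit relationship above.
ultimate_loss <- function(severity, deductible, limit) {
  pmin(pmax(deductible, severity), deductible + limit) - deductible
}

below  <- ultimate_loss(500,   1000, 8000)  # below deductible: no payment
mid    <- ultimate_loss(5000,  1000, 8000)  # in between: severity minus deductible
capped <- ultimate_loss(20000, 1000, 8000)  # above deductible + limit: capped at limit
```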
The distributions of deductible and limit derived from the simulated data match the simulation assumptions very well, as shown in Table 18.
Table 18. Deductible and Limit Simulation Testing Result
Business Line   Item         Assumption                        Test Result
Home            Deductible   20000                             20000
                Limit        200000 (37.5%), 300000 (62.5%)    200000 (37.5%), 300000 (62.5%)
Auto            Deductible   1000                              1000
                Limit        8000 (20%), 15000 (30%),          8000 (19.9%), 15000 (30.3%),
                             20000 (50%)                       20000 (49.8%)
In addition to the tests mentioned above, some basic checks of the simulated data have been conducted as well.
1. Report date >= Occurrence date
2. Settlement date >= Report date
3. Claim reopen date >= Settlement date (last close date)
4. Claim reopen date >= Evaluation date (2017-12-31)
5. Resettlement date >= Claim reopen date
6. IBNR report date > Evaluation date (2017-12-31)
7. Future claim occurrence date is between the evaluation date (2017-12-31) and the simulation end date (2018-12-31).
Given these satisfactory testing results, the simulator is judged to capture the simulation assumptions well.
Appendix. Simulation Module Test R Code
The following R code is used to test the simulation module as described in Section 4.
########################### Simulation Module Test ###########################

# Read in simulated data
setwd("C:/temp/CAS/test")
simdata <- read.csv("sim.csv")
simdata <- simdata[simdata$Sim < 11, ] # You may want to use the first 10 portfolio simulations to save time
gc()

########################### Basic Check ###########################
# Report date >= Occurrence date
sum(as.numeric(as.Date(simdata$reportDate) - as.Date(simdata$occurrenceDate)) < 0)
# Settlement date >= Report date
sum(as.numeric(as.Date(simdata$settlementDate) - as.Date(simdata$reportDate)) < 0)
# Reopen date >= Settlement date
fitdata <- simdata[!is.na(simdata$reopenDate), ]
sum(as.numeric(as.Date(fitdata$reopenDate) - as.Date(fitdata$settlementDate)) < 0)
# Reopen date >= Evaluation date (2017-12-31)
sum(as.numeric(as.Date(fitdata$reopenDate) - as.Date("2017-12-31")) < 0)
# Resettle date >= Reopen date
fitdata <- simdata[!is.na(simdata$reopenDate), ]
sum(as.numeric(as.Date(fitdata$resettleDate) - as.Date(fitdata$reopenDate)) < 0)
# IBNR Report date > Evaluation date (2017-12-31)
fitdata <- simdata[simdata$status == "IBNR", ]
sum(as.numeric(as.Date(fitdata$reportDate) - as.Date("2017-12-31")) < 0)
# UPR Occurrence date > Evaluation date (2017-12-31)
fitdata <- simdata[simdata$status == "UPR", ]
sum(as.numeric(as.Date(fitdata$occurrenceDate) - as.Date("2017-12-31")) < 0)
# UPR Occurrence date <= Future date (2018-12-31)
sum(as.numeric(as.Date(fitdata$occurrenceDate) - as.Date("2018-12-31")) > 0)
###########################Test Open Claim Loss Development#############################openc <- simdata[simdata[,"status"]=="OPEN",]###Development year at the valuation date where incurred losses are recordedopenc[,"devYears"] <- ceiling(as.numeric(as.Date("2017-12-31")-as.Date(openc$occurrenceDate))/365)###Development year at the settlement dateopenc[,"settleYears"] <- ceiling(as.numeric(as.Date(openc$settlementDate)-as.Date(openc$occurrenceDate))/365)###Cumulative development factors from valuation date to settlement dateopenc[,"cdf"] <- openc$ultimateLoss/openc$incurredLoss
###Function to calculate expected cumulative development factors
26
CumDevFac <- function(devYears,settleYears,meanDevFac){nDevFac<-pmin(length(meanDevFac),settleYears-1)n<-length(meanDevFac)result<-vector()for (i in c(1:length(nDevFac))) {
if (is.na(nDevFac[i]) == TRUE){result <- c(result,NA)
} else {if(devYears[i]==settleYears[i]){
result <- c(result,1)} else {
result <- c(result,prod(meanDevFac[pmin(devYears[i],nDevFac[i]):nDevFac[i]]))}
}}result
}
###Calculate expected cumulative development factors###This is for Home LinemeanDevFac <- c(1.2,1.15,1.1,1.05,1)#This is the assumed expected mean development factor.openc[,"excdf"] <- CumDevFac(openc$devYears,openc$settleYears,meanDevFac)exagg<-aggregate(excdf ~ devYears + settleYears, data = openc[openc[,"LoB"]=="Home",], mean)#This is the mean development factor from the simulated data.agg<-aggregate(cdf ~ devYears + settleYears, data = openc[openc[,"LoB"]=="Home",], mean)#This is the standard deviation of the development factor from the simulated data.aggsd<-aggregate(cdf ~ devYears + settleYears, data = openc[openc[,"LoB"]=="Home",], sd)
#This is the expected mean development factor from the simulated data.
openc[,"expectedcdf"] <- openc$expectedLoss/openc$incurredLoss
exeagg <- aggregate(expectedcdf ~ devYears + settleYears, data = openc[openc[,"LoB"]=="Home",], mean)
###This is for Auto Line
regdata <- openc[openc[,"LoB"]=="Auto",]
#Function to calculate severity index
getindex <- function(monthlyindex, startDate, dates) {
  years <- as.numeric(substr(as.character(dates),1,4))
  months <- as.numeric(substr(as.character(dates),6,7))
  startyear <- as.numeric(substr(as.character(startDate),1,4))
  startmonth <- as.numeric(substr(as.character(startDate),6,7))
  indices <- pmax(1, pmin(360, (years-startyear)*12+(months-startmonth)+1))
  monthlyindex[indices]
}
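getindex maps each date to a month offset from startDate, clamped to the 360-month horizon, and looks up the corresponding index value. A small hypothetical example:

```r
startDate <- as.Date("2008-01-01")
monthlyindex <- c(rep(1,120), cumprod(rep(1.03^(1/12), 240)))
getindex(monthlyindex, startDate, c("2008-03-15", "2018-01-31"))
#March 2008 is month 3 -> index 1 (flat period)
#January 2018 is month 121 -> 1.03^(1/12), the first trended month
```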
#Calculate severity index
startDate <- as.Date("2008-01-01")
#Flat for the first 120 months, then a 3% annual severity trend
monthlyindex <- c(rep(1,120), cumprod(rep(1.03^(1/12),240)))
severityindex <- getindex(monthlyindex, startDate, regdata$settlementDate)
#Expected cumulative development factor after detrending
regdata[,"expectedcdf"] <- regdata$expectedLoss/regdata$incurredLoss/severityindex
regdata <- regdata[,colnames(regdata) %in% c("cdf","devYears","settleYears","incurredLoss","osRatio","expectedcdf")]
regdata$cdf <- log(regdata$cdf/severityindex)
f <- as.formula(cdf ~ devYears + incurredLoss + osRatio)
exeagg <- aggregate(expectedcdf ~ devYears + settleYears, data = regdata, mean)
lm <- lm(f, data = regdata)
summary(lm)
###########################Test Reopen Probability#############################
###Home Line
closedata <- simdata[simdata$LoB == "Home" & simdata$status=="CLOSED", ]
#Close year at the valuation date
closelags <- as.numeric(as.Date("2017-12-31") - as.Date(closedata[,"settlementDate"]))
closedata[,"closeYears"] <- pmax(1, ceiling(closelags/365))
#Calculate reopen probability by close year for each simulation
reopendata <- closedata[!is.na(closedata$reopenDate), ]
agg <- aggregate(ClaimID ~ closeYears + Sim, data = closedata, length)
reopenagg <- aggregate(ClaimID ~ closeYears + Sim, data = reopendata, length)
agg$reopenProb <- 0
for (i in c(1:nrow(agg))){
  for (j in c(1:nrow(reopenagg))){
    if (agg[i,1] == reopenagg[j,1] & agg[i,2] == reopenagg[j,2]) {
      agg[i,4] = reopenagg[j,3]/agg[i,3]
    }
  }
}
meanprob <- aggregate(reopenProb ~ closeYears, data = agg, mean)
meanprob
sdprob <- aggregate(reopenProb ~ closeYears, data = agg, sd)
sdprob
###Auto Line
closedata <- simdata[simdata$LoB == "Auto" & simdata$status=="CLOSED", ]
#Close year at the valuation date
closelags <- as.numeric(as.Date("2017-12-31") - as.Date(closedata[,"settlementDate"]))
closedata[,"closeYears"] <- pmax(1, ceiling(closelags/365))
#Calculate reopen probability by close year for each simulation
reopendata <- closedata[!is.na(closedata$reopenDate), ]
agg <- aggregate(ClaimID ~ closeYears + Sim, data = closedata, length)
reopenagg <- aggregate(ClaimID ~ closeYears + Sim, data = reopendata, length)
agg$reopenProb <- 0
for (i in c(1:nrow(agg))){
  for (j in c(1:nrow(reopenagg))){
    if (agg[i,1] == reopenagg[j,1] & agg[i,2] == reopenagg[j,2]) {
      agg[i,4] = reopenagg[j,3]/agg[i,3]
    }
  }
}
meanprob <- aggregate(reopenProb ~ closeYears, data = agg, mean)
meanprob
sdprob <- aggregate(reopenProb ~ closeYears, data = agg, sd)
sdprob
###########################Test Resettlement Lag###############################
###Home Line
library(fitdistrplus) #provides fitdist()
#Get resettlement lag data
fitdata <- simdata[simdata$LoB == "Home" & simdata$status=="CLOSED" & !is.na(simdata$reopenDate), ]
resettlementLags <- as.numeric(as.Date(fitdata[,"resettleDate"])-as.Date(fitdata[,"reopenDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
resettlementLags <- ifelse(resettlementLags==0, runif(length(resettlementLags)), resettlementLags)
rm(fitdata)
gc()
#Resettlement lag distribution fitting
fit <- fitdist(resettlementLags, distr="exp", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- resettlementLags + max(abs(resettlementLags))*0.0001*runif(length(resettlementLags),0,1)
#k-s test against the assumed exponential rate
z <- ks.test(x, "pexp", 0.01)
z$statistic
z$p.value
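ks.test assumes a continuous null distribution and warns when the sample contains ties; the jitter used before each K-S test in this appendix breaks ties while perturbing each value by at most 0.01% of the largest observation. A minimal illustration with hypothetical lags (the values and the rate 0.1 are illustrative only):

```r
lags <- c(5, 5, 12, 30)  #hypothetical lags with a tie at 5
any(duplicated(lags))    #TRUE: ks.test would warn about ties
x <- lags + max(abs(lags))*0.0001*runif(length(lags), 0, 1)
any(duplicated(x))       #FALSE (with probability 1)
ks.test(x, "pexp", 0.1)
```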
###Auto Line
#Get resettlement lag data
fitdata <- simdata[simdata$LoB == "Auto" & simdata$status=="CLOSED" & !is.na(simdata$reopenDate), ]
resettlementLags <- as.numeric(as.Date(fitdata[,"resettleDate"])-as.Date(fitdata[,"reopenDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
resettlementLags <- ifelse(resettlementLags==0, runif(length(resettlementLags)), resettlementLags)
rm(fitdata)
gc()
#Resettlement lag distribution fitting
fit <- fitdist(resettlementLags, distr="exp", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- resettlementLags + max(abs(resettlementLags))*0.0001*runif(length(resettlementLags),0,1)
#k-s test
z <- ks.test(x, "pexp", 0.005)
z$statistic
z$p.value
###########################Test Reopen Claim Loss Development##########################
###Function to calculate expected cumulative development factors
CumDevFac <- function(devYears, settleYears, meanDevFac){
  nDevFac <- pmin(length(meanDevFac), settleYears-1)
  result <- vector()
  for (i in c(1:length(nDevFac))) {
    if (is.na(nDevFac[i])){
      result <- c(result, NA)
    } else {
      if (devYears[i]==settleYears[i]){
        result <- c(result, 1)
      } else {
        result <- c(result, prod(meanDevFac[pmin(devYears[i],nDevFac[i]):nDevFac[i]]))
      }
    }
  }
  result
}
###Home Line
#Get reopen claim data
fitdata <- simdata[simdata$LoB == "Home" & simdata$status=="CLOSED" & !is.na(simdata$reopenDate), ]
#Development year at the valuation date
fitdata[,"devYears"] <- ceiling(as.numeric(as.Date("2017-12-31")-as.Date(fitdata$occurrenceDate))/365)
#Development year at the resettlement date
fitdata[,"settleYears"] <- ceiling(as.numeric(as.Date(fitdata$resettleDate)-as.Date(fitdata$occurrenceDate))/365)
#Cumulative development factors from valuation date to resettlement date
fitdata[,"cdf"] <- fitdata$reopenLoss/fitdata$incurredLoss
#Calculate expected cumulative development factors
meanDevFac <- c(1.05,1.1,1.05,1.06,1.07,1.08,1.09,1.06,1) #This is the assumed expected mean development factor.
fitdata[,"excdf"] <- CumDevFac(fitdata$devYears, fitdata$settleYears, meanDevFac)
exagg <- aggregate(excdf ~ devYears + settleYears, data = fitdata, mean)
#This is the mean development factor from the simulated data.
agg <- aggregate(cdf ~ devYears + settleYears, data = fitdata, mean)
#This is the standard deviation of the development factor from the simulated data.
aggsd <- aggregate(cdf ~ devYears + settleYears, data = fitdata, sd)
#This is the expected mean development factor from the simulated data.
fitdata[,"expectedcdf"] <- fitdata$expectedLoss/fitdata$incurredLoss
exeagg <- aggregate(expectedcdf ~ devYears + settleYears, data = fitdata, mean)
###Auto Line
#Get reopen claim data
fitdata <- simdata[simdata$LoB == "Auto" & simdata$status=="CLOSED" & !is.na(simdata$reopenDate), ]
#Development year at the valuation date
fitdata[,"devYears"] <- ceiling(as.numeric(as.Date("2017-12-31")-as.Date(fitdata$occurrenceDate))/365)
#Development year at the resettlement date
fitdata[,"settleYears"] <- ceiling(as.numeric(as.Date(fitdata$resettleDate)-as.Date(fitdata$occurrenceDate))/365)
#Function to calculate severity index
getindex <- function(monthlyindex, startDate, dates) {
  years <- as.numeric(substr(as.character(dates),1,4))
  months <- as.numeric(substr(as.character(dates),6,7))
  startyear <- as.numeric(substr(as.character(startDate),1,4))
  startmonth <- as.numeric(substr(as.character(startDate),6,7))
  indices <- pmax(1, pmin(360, (years-startyear)*12+(months-startmonth)+1))
  monthlyindex[indices]
}
#Calculate severity index
startDate <- as.Date("2008-01-01")
monthlyindex <- c(rep(1,120), cumprod(rep(1.03^(1/12),240)))
severityindex <- getindex(monthlyindex, startDate, fitdata$resettleDate)
#Cumulative development factors from valuation date to resettlement date
fitdata[,"cdf"] <- fitdata$reopenLoss/fitdata$incurredLoss/severityindex
#Calculate expected cumulative development factors
meanDevFac <- c(1.05,1.1,1.05,1.06,1.07,1.08,1.09,1.06,1) #This is the assumed expected mean development factor.
fitdata[,"excdf"] <- CumDevFac(fitdata$devYears, fitdata$settleYears, meanDevFac)
exagg <- aggregate(excdf ~ devYears + settleYears, data = fitdata, mean)
#This is the mean development factor from the simulated data.
agg <- aggregate(cdf ~ devYears + settleYears, data = fitdata, mean)
#This is the standard deviation of the development factor from the simulated data.
aggsd <- aggregate(cdf ~ devYears + settleYears, data = fitdata, sd)
#This is the expected mean development factor from the simulated data.
fitdata[,"expectedcdf"] <- fitdata$expectedLoss/fitdata$incurredLoss/severityindex
exeagg <- aggregate(expectedcdf ~ devYears + settleYears, data = fitdata, mean)
###########################Test Report Lag#############################################
###Home Line
#Get report lag data
fitdata <- simdata[simdata$LoB == "Home" & simdata$status=="UPR", ]
reportLags <- as.numeric(as.Date(fitdata[,"reportDate"])-as.Date(fitdata[,"occurrenceDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
reportLags <- ifelse(reportLags==0, runif(length(reportLags)), reportLags)
rm(fitdata)
gc()
#Report lag distribution fitting
fit <- fitdist(reportLags, distr="weibull", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- reportLags + max(abs(reportLags))*0.0001*runif(length(reportLags),0,1)
#k-s test against the assumed Weibull shape and scale
z <- ks.test(x, "pweibull", 9.276, 798.361)
z$statistic
z$p.value
###Auto Line
#Get report lag data
fitdata <- simdata[simdata$LoB == "Auto" & simdata$status=="UPR", ]
reportLags <- as.numeric(as.Date(fitdata[,"reportDate"])-as.Date(fitdata[,"occurrenceDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
reportLags <- ifelse(reportLags==0, runif(length(reportLags)), reportLags)
rm(fitdata)
gc()
#Report lag distribution fitting
fit <- fitdist(reportLags, distr="exp", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- reportLags + max(abs(reportLags))*0.0001*runif(length(reportLags),0,1)
#k-s test
z <- ks.test(x, "pexp", 0.011)
z$statistic
z$p.value
###########################Test Settlement Lag#################################
###Home Line
#Get settlement lag data
fitdata <- simdata[simdata$LoB == "Home" & (simdata$status=="IBNR" | simdata$status=="UPR"), ]
settlementLags <- as.numeric(as.Date(fitdata[,"settlementDate"])-as.Date(fitdata[,"reportDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
settlementLags <- ifelse(settlementLags==0, runif(length(settlementLags)), settlementLags)
rm(fitdata)
gc()
#Settlement lag distribution fitting
fit <- fitdist(settlementLags, distr="weibull", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- settlementLags + max(abs(settlementLags))*0.0001*runif(length(settlementLags),0,1)
#k-s test
z <- ks.test(x, "pweibull", 6.479, 179.004)
z$statistic
z$p.value
###Auto Line
#Get settlement lag data
fitdata <- simdata[simdata$LoB == "Auto" & (simdata$status=="IBNR" | simdata$status=="UPR"), ]
settlementLags <- as.numeric(as.Date(fitdata[,"settlementDate"])-as.Date(fitdata[,"reportDate"]))
#Replace zero lags with a small positive value so the continuous fit is valid
settlementLags <- ifelse(settlementLags==0, runif(length(settlementLags)), settlementLags)
rm(fitdata)
gc()
#Settlement lag distribution fitting
fit <- fitdist(settlementLags, distr="weibull", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#Jitter to make sure there are no ties in the data for the k-s test
x <- settlementLags + max(abs(settlementLags))*0.0001*runif(length(settlementLags),0,1)
#k-s test
z <- ks.test(x, "pweibull", 1.005, 311.572)
z$statistic
z$p.value
###########################Test Severity Distribution##############################
###Home Line
#Get severity data
fitdata <- simdata[simdata$LoB == "Home" & (simdata$status=="IBNR" | simdata$status=="UPR"), ]
severity <- fitdata$totalLoss
rm(fitdata)
gc()
#Severity distribution fitting
fit <- fitdist(severity, distr="lnorm", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#This is to make sure there is no tie in the data for k-s test
x <- severity + max(abs(severity))*0.0001*runif(length(severity),0,1)
#k-s test
z <- ks.test(x, "plnorm", 12.05007, 0.41117)
z$statistic
z$p.value
###Auto Line
#Get severity data
fitdata <- simdata[simdata$LoB == "Auto" & (simdata$status=="IBNR" | simdata$status=="UPR"), ]
severity <- fitdata$totalLoss
#Remove severity trend
#Function to calculate severity index
getindex <- function(monthlyindex, startDate, dates) {
  years <- as.numeric(substr(as.character(dates),1,4))
  months <- as.numeric(substr(as.character(dates),6,7))
  startyear <- as.numeric(substr(as.character(startDate),1,4))
  startmonth <- as.numeric(substr(as.character(startDate),6,7))
  indices <- pmax(1, pmin(360, (years-startyear)*12+(months-startmonth)+1))
  monthlyindex[indices]
}
#Calculate severity index
startDate <- as.Date("2008-01-01")
monthlyindex <- c(rep(1,120), cumprod(rep(1.03^(1/12),240)))
severityindex <- getindex(monthlyindex, startDate, fitdata$settlementDate)
severity <- severity/severityindex
rm(fitdata)
gc()
#Severity distribution fitting
fit <- fitdist(severity, distr="lnorm", method="mle", discrete=FALSE)
summary(fit)
#Distribution fitting k-s test
#This is to make sure there is no tie in the data for k-s test
x <- severity + max(abs(severity))*0.0001*runif(length(severity),0,1)
#k-s test
z <- ks.test(x, "plnorm", 8.92714, 0.52153)
z$statistic
z$p.value
###########################Test Severity, Settlement Lag and Report Lag Copula#########################
###Home Line
#Get data (only future claims, as severity, settlement lag and report lag are all simulated)
fitdata <- simdata[simdata$LoB == "Home" & simdata$status == "UPR", ]
fitdata$reportLag <- as.numeric(as.Date(fitdata[,"reportDate"])-as.Date(fitdata[,"occurrenceDate"]))
fitdata$settlementLag <- as.numeric(as.Date(fitdata[,"settlementDate"])-as.Date(fitdata[,"reportDate"]))
#Remove severity trend
#Function to calculate severity index
getindex <- function(monthlyindex, startDate, dates) {
  years <- as.numeric(substr(as.character(dates),1,4))
  months <- as.numeric(substr(as.character(dates),6,7))
  startyear <- as.numeric(substr(as.character(startDate),1,4))
  startmonth <- as.numeric(substr(as.character(startDate),6,7))
  indices <- pmax(1, pmin(360, (years-startyear)*12+(months-startmonth)+1))
  monthlyindex[indices]
}
#Calculate severity index (flat index, since no severity trend is applied to the Home line)
startDate <- as.Date("2008-01-01")
monthlyindex <- c(rep(1,360))
severityindex <- getindex(monthlyindex, startDate, fitdata$settlementDate)
fitdata$severity <- fitdata$totalLoss/severityindex
#Keep and reorder the three variables
fitdata <- fitdata[,colnames(fitdata) %in% c("severity", "settlementLag", "reportLag")]
fitdata <- fitdata[,c(3,2,1)]
gc()
library(copula)
cop <- normalCopula(c(0,0,0), dim=3, dispstr="un")
u <- pobs(fitdata)
fitcop <- fitCopula(cop, u)
assumedcop <- normalCopula(c(0.9034,-0.0154,-0.0147), dim=3, dispstr="un")
gof <- gofCopula(assumedcop, u, N=200, simulation="mult", method="Sn", ties=FALSE, hideWarnings=TRUE)
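fitCopula estimates the correlation parameters from the pseudo-observations produced by pobs. A self-contained sketch on simulated data illustrates the round trip (the true parameter 0.7 and sample size are illustrative):

```r
library(copula)
set.seed(123)
truecop <- normalCopula(0.7, dim = 2)
usim <- rCopula(2000, truecop)             #sample from a known Gaussian copula
u <- pobs(usim)                            #pseudo-observations, as above
fitcop <- fitCopula(normalCopula(0, dim = 2), u)
coef(fitcop)                               #estimate should be close to 0.7
```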
###Auto Line
#Get data (only future claims, as severity, settlement lag and report lag are all simulated)
fitdata <- simdata[simdata$LoB == "Auto" & simdata$status == "UPR", ]
fitdata$reportLag <- as.numeric(as.Date(fitdata[,"reportDate"])-as.Date(fitdata[,"occurrenceDate"]))
fitdata$settlementLag <- as.numeric(as.Date(fitdata[,"settlementDate"])-as.Date(fitdata[,"reportDate"]))
#Remove severity trend
#Function to calculate severity index
getindex <- function(monthlyindex, startDate, dates) {
  years <- as.numeric(substr(as.character(dates),1,4))
  months <- as.numeric(substr(as.character(dates),6,7))
  startyear <- as.numeric(substr(as.character(startDate),1,4))
  startmonth <- as.numeric(substr(as.character(startDate),6,7))
  indices <- pmax(1, pmin(360, (years-startyear)*12+(months-startmonth)+1))
  monthlyindex[indices]
}
#Calculate severity index
startDate <- as.Date("2008-01-01")
monthlyindex <- c(rep(1,120), cumprod(rep(1.03^(1/12),240)))
severityindex <- getindex(monthlyindex, startDate, fitdata$settlementDate)
fitdata$severity <- fitdata$totalLoss/severityindex
#Keep and reorder the three variables
fitdata <- fitdata[,colnames(fitdata) %in% c("severity", "settlementLag", "reportLag")]
fitdata <- fitdata[,c(3,2,1)]
gc()
library(copula)
cop <- normalCopula(c(0,0,0), dim=3, dispstr="un")
u <- pobs(fitdata)
fitcop <- fitCopula(cop, u)
assumedcop <- normalCopula(c(0.003,0.0009,-0.0162), dim=3, dispstr="un")
u <- u[1:10000,] #Too much data; use only the first 10000 rows to stay within memory limits.
gof <- gofCopula(assumedcop, u, N=200, simulation="mult", method="Sn", ties=FALSE, hideWarnings=TRUE)
###########################Test Frequency Distribution############################
###Home Line
#Get frequency data. Future claims are used because closed and open claims are fixed based on claim data and will underestimate the volatility.
fitdata <- simdata[simdata$LoB == "Home" & simdata$status == "UPR", ]
fitdata$index <- (as.numeric(substr(as.character(fitdata$occurrenceDate),1,4))-2008)*12+as.numeric(substr(as.character(fitdata$occurrenceDate),6,7))
freqagg <- aggregate(ClaimID ~ index + Sim, data = fitdata, length)
frequency <- freqagg[,3]
rm(fitdata)
gc()
#Frequency distribution fitting
fit <- fitdist(frequency, distr="pois", method="mle", discrete=TRUE)
summary(fit)
#Distribution fitting chi-squared test
l <- 101.067 #Assumption
x <- frequency
m <- mean(x)
s <- sqrt(var(x))
#Breaks at multiples of the standard deviation around the mean
mybreak <- c(m-4*s, m-3*s, m-2*s, m-s, m-s/2, m-s/4, m, m+s/4, m+s/2, m+s, m+2*s, m+3*s, m+4*s)
mybreak <- mybreak[mybreak>=0]
mybreak <- unique(round(mybreak))
mycut <- cut(x, breaks = mybreak)
empirical <- as.vector(table(mycut))
mybreak2 <- mybreak[seq(2, length(mybreak), by=1)]
mybreak1 <- mybreak[seq(1, length(mybreak)-1, by=1)]
prob <- ppois(mybreak2, l) - ppois(mybreak1, l)
z <- chisq.test(empirical, p=prob, rescale.p=TRUE)
z$statistic
z$p.value
###Auto Line
#Get frequency data
fitdata <- simdata[simdata$LoB == "Auto" & simdata$status == "UPR", ]
fitdata$index <- (as.numeric(substr(as.character(fitdata$occurrenceDate),1,4))-2008)*12+as.numeric(substr(as.character(fitdata$occurrenceDate),6,7))
freqagg <- aggregate(ClaimID ~ index + Sim, data = fitdata, length)
#Remove frequency trend (exposure index)
freqagg[,3] <- freqagg[,3]/(1.08^(freqagg[,1]/12))
frequency <- round(freqagg[,3])
rm(fitdata)
gc()
#Frequency distribution fitting
fit <- fitdist(frequency, distr="nbinom", method="mle", discrete=TRUE, lower = c(0, 0))
summary(fit)
#Distribution fitting chi-squared test
size <- 2102.025 #Assumption
p <- 0.926 #Assumption
x <- frequency
m <- mean(x)
s <- sqrt(var(x))
mybreak <- c(m-4*s, m-3*s, m-2*s, m-s, m-s/2, m-s/4, m, m+s/4, m+s/2, m+s, m+2*s, m+3*s, m+4*s)
mybreak <- mybreak[mybreak>=0]
mybreak <- unique(round(mybreak))
mycut <- cut(x, breaks = mybreak)
empirical <- as.vector(table(mycut))
mybreak2 <- mybreak[seq(2, length(mybreak), by=1)]
mybreak1 <- mybreak[seq(1, length(mybreak)-1, by=1)]
prob <- pnbinom(mybreak2, size, p) - pnbinom(mybreak1, size, p)
z <- chisq.test(empirical, p=prob, rescale.p=TRUE)
z$statistic
z$p.value
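The detrending step above divides each monthly count by the cumulative 8% annual exposure trend before fitting the negative binomial. A hypothetical one-month check:

```r
raw <- 108; idx <- 12              #hypothetical claim count observed in month 12
detrended <- raw/(1.08^(idx/12))   #108/1.08 = 100
round(detrended)                   #rounded back to a count for the nbinom fit
```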
###########################Test Exposure Index########################################
###Home Line
#Get frequency data
fitdata <- simdata[simdata$LoB == "Home", ]
fitdata$index <- (as.numeric(substr(as.character(fitdata$occurrenceDate),1,4))-2008)*12+as.numeric(substr(as.character(fitdata$occurrenceDate),6,7))
freqagg <- aggregate(ClaimID ~ index + Sim, data = fitdata, length)
frequency <- aggregate(ClaimID ~ index, data = freqagg, mean)
rm(fitdata)
gc()
monthlyindex <- c(rep(1,360))
simindex <- c(rep(NA,360))
for (i in c(1:360)){
  for (j in c(1:nrow(frequency))){
    if (frequency[j,1]==i) {
      simindex[i] = frequency[j,2]/frequency[1,2]
    }
  }
}
plot(simindex[1:132], xlab="Month", ylab="Index")
lines(monthlyindex[1:132], col="green")
###Auto Line
#Get frequency data
fitdata <- simdata[simdata$LoB == "Auto", ]
fitdata$index <- (as.numeric(substr(as.character(fitdata$occurrenceDate),1,4))-2008)*12+as.numeric(substr(as.character(fitdata$occurrenceDate),6,7))
freqagg <- aggregate(ClaimID ~ index + Sim, data = fitdata, length)
frequency <- aggregate(ClaimID ~ index, data = freqagg, mean)
rm(fitdata)
gc()
monthlyindex <- cumprod(c(1, rep(1.08^(1/12),359)))
simindex <- c(rep(NA,360))
for (i in c(1:360)){
  for (j in c(1:nrow(frequency))){
    if (frequency[j,1]==i) {
      simindex[i] = frequency[j,2]/frequency[1,2]
    }
  }
}
plot(simindex[1:132], xlab="Month", ylab="Index")
lines(monthlyindex[1:132], col="green")
###########################Test Frequency Copula##################################
#Get frequency data (only future claims as open and closed claims are fixed, not simulated)
fitdata <- simdata[simdata$status == "UPR",]
fitdata$index <- (as.numeric(substr(as.character(fitdata$occurrenceDate),1,4))-2008)*12+as.numeric(substr(as.character(fitdata$occurrenceDate),6,7))
freqagg <- aggregate(ClaimID ~ index + Sim + LoB, data = fitdata, length)
home <- freqagg[freqagg$LoB=="Home",]
auto <- freqagg[freqagg$LoB=="Auto",]
home$auto <- ifelse(home$index==auto$index & home$Sim==auto$Sim, auto$ClaimID, NA)
#Remove Auto Frequency Trend
home$auto <- round(home$auto/(1.08^(home$index/12)))
fitdata <- home[,colnames(home) %in% c("ClaimID", "auto")]
gc()
library(copula)
cop <- normalCopula(c(0), dim=2, dispstr="un")
u <- pobs(fitdata)
fitcop <- fitCopula(cop, u)
assumedcop <- normalCopula(c(0.6131), dim=2, dispstr="un")
gof <- gofCopula(assumedcop, u, N=200, simulation="mult", method="Sn", ties=FALSE, hideWarnings=TRUE)
###########################Test Deductible and Limit#################################
#Check if ultimate loss is smaller than limit
ultimateLosses <- ifelse(is.na(simdata$resettleDate), simdata$ultimateLoss, simdata$reopenLoss)
sum(ultimateLosses > simdata$Limit)
#Check if ultimate loss = min(max(deductible, total loss), deductible + limit) - deductible
fitdata <- simdata[simdata$status == "IBNR" | simdata$status=="UPR", ]
summary(fitdata$ultimateLoss - (pmin(pmax(fitdata$Deductible,fitdata$totalLoss),fitdata$Deductible+fitdata$Limit)-fitdata$Deductible))
rm(fitdata)
gc()
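The policy-terms formula min(max(deductible, total loss), deductible + limit) − deductible has three regimes; a worked example with hypothetical terms:

```r
deductible <- 500; limit <- 10000          #hypothetical policy terms
totalLoss <- c(300, 4000, 12000)
pmin(pmax(deductible, totalLoss), deductible + limit) - deductible
#300 is below the deductible -> 0; 4000 -> 3500; 12000 is capped at the 10000 limit
```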
###Home Line
#Get deductible and limit data
deductibles <- simdata[simdata$LoB == "Home",]$Deductible
for (i in unique(deductibles)){
  if (length(deductibles[deductibles==i])/length(deductibles)>0.1){
    print(paste0(i, " ", round(length(deductibles[deductibles==i])/length(deductibles),3)))
  }
}
limits <- simdata[simdata$LoB == "Home",]$Limit
for (i in unique(limits)){
  if (length(limits[limits==i])/length(limits)>0.1){
    print(paste0(i, " ", round(length(limits[limits==i])/length(limits),3)))
  }
}
###Auto Line
#Get deductible and limit data
deductibles <- simdata[simdata$LoB == "Auto",]$Deductible
for (i in unique(deductibles)){
  if (length(deductibles[deductibles==i])/length(deductibles)>0.1){
    print(paste0(i, " ", round(length(deductibles[deductibles==i])/length(deductibles),3)))
  }
}
limits <- simdata[simdata$LoB == "Auto",]$Limit
for (i in unique(limits)){
  if (length(limits[limits==i])/length(limits)>0.1){
    print(paste0(i, " ", round(length(limits[limits==i])/length(limits),3)))
  }
}
###########################Test LAE#########################################################
fitdata <- simdata[simdata[,"status"]=="IBNR" | simdata[,"status"]=="UPR",]
###Development year at the report date
fitdata[,"devYears"] <- ceiling(as.numeric(as.Date(fitdata$reportDate)-as.Date(fitdata$occurrenceDate))/365)
###This is for Home Line
regdata <- fitdata[fitdata[,"LoB"]=="Home",]
regdata <- regdata[,colnames(regdata) %in% c("ultimateLAE","expectedLAE","devYears","incurredLoss","osRatio")]
f <- as.formula(ultimateLAE ~ devYears + incurredLoss + osRatio)
lm <- lm(f, data = regdata)
summary(lm)
f <- as.formula(expectedLAE ~ devYears + incurredLoss + osRatio)
lm <- lm(f, data = regdata)
summary(lm)
###This is for Auto Line
regdata <- fitdata[fitdata[,"LoB"]=="Auto",]
regdata <- regdata[,colnames(regdata) %in% c("ultimateLAE","expectedLAE","devYears","incurredLoss","osRatio")]
regdata$ultimateLAE <- exp(regdata$ultimateLAE)
regdata$expectedLAE <- exp(regdata$expectedLAE)
f <- as.formula(ultimateLAE ~ devYears + incurredLoss + osRatio)
lm <- lm(f, data = regdata)
summary(lm)
f <- as.formula(expectedLAE ~ devYears + incurredLoss + osRatio)
lm <- lm(f, data = regdata)
summary(lm)