Multivariate probit models for conditional claim-types

Insurance: Mathematics and Economics 44 (2009) 214–228


Gary Young a,∗, Emiliano A. Valdez b, Robert Kohn c

a Access Economics Pty Ltd, Level 1, 39 Brisbane Avenue, Barton, ACT 2600, Australia
b Department of Mathematics, College of Liberal Arts & Sciences, University of Connecticut, Storrs, CT 06269-3009, USA
c Schools of Economics and Finance, Australian School of Business, University of New South Wales, Sydney, 2052, Australia

ARTICLE INFO

Keywords: Correlation; Insurance; Multinomial logit

ABSTRACT

This paper considers statistical modeling of the types of claim in a portfolio of insurance policies. For some classes of insurance contracts, in a particular period, it is possible to have a record of whether or not there is a claim on the policy, the types of claims made on the policy, and the amount of claims arising from each of the types. A typical example is automobile insurance where, in the event of a claim, we are able to observe the amounts that arise from, say, injury to oneself, damage to one's own property, damage to a third party's property, and injury to a third party. Modeling the frequency and the severity components of the claims can be handled using traditional actuarial procedures. However, modeling the claim-type component is less well known, and in this paper we recommend analyzing the distribution of these claim-types using multivariate probit models, which can be viewed as latent variable threshold models for the analysis of multivariate binary data. A recent article by Valdez and Frees [Valdez, E.A., Frees, E.W., Longitudinal modeling of Singapore motor insurance. University of New South Wales and the University of Wisconsin-Madison. Working Paper. Dated 28 December 2005, available from: http://wwwdocs.fce.unsw.edu.au/actuarial/research/papers/2006/Valdez-Frees-2005.pdf] considered this decomposition to extend the traditional model by including the conditional claim-type component, and proposed the multinomial logit model to empirically estimate this component. However, it is well known in the literature that this type of model assumes independence across the different outcomes. We investigate the appropriateness of fitting a multivariate probit model to the conditional claim-type component in which the outcomes may in fact be correlated, with possible inclusion of important covariates. Our estimation results show, first, that when the outcomes are correlated, the multinomial logit model produces substantially different predictions relative to the true predictions; and second, through a simulation analysis, we find that even in ideal conditions under which the outcomes are independent, multinomial logit is still a poor approximation to the true underlying outcome probabilities relative to the multivariate probit model. The results of this paper serve to highlight the trade-off between tractability and flexibility when choosing the appropriate model.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction and previous literature

Consider a portfolio of insurance policies with $J$ possible claim-types. Denote $I_j$ to be the indicator variable that a contract from this portfolio had a claim of type $j$, where $j = 1, 2, \ldots, J$. Denote the vector

$$M = (I_1, I_2, \ldots, I_J)'$$

to be the vector of claim-types for this policy. One can think of the zero vector as the vector of claim-types that gives rise to the situation when there is no claim. Thus, there are $2^J$ possible combinations of claim-type vectors $M$. This random variable is clearly a discrete random variable with possible values describing the different possible combinations of $I_j$. In this paper, we restrict our considerations to observations in which a claim was made so that we simply have $2^J - 1$ claim-types. For our purposes, we call our observations conditional claim-type components.

∗ Corresponding author.
0167-6687/\$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.insmatheco.2008.11.004

As a simple illustration, consider a portfolio of automobile insurance policies where we can observe two possible claim-types: personal injury and damage to property. Denote the case $j = 1$ for personal injury and $j = 2$ for damage to property. Thus, there are 4 possible combinations of $M$: $(0, 0)$, which corresponds to the case where there is no claim; $(1, 0)$, which corresponds to the case where there is a claim due to personal injury only; $(0, 1)$, which corresponds to the case where there is a claim due to damage to property only; and $(1, 1)$, which corresponds to the case where there is a claim arising from personal injury as well as property damage. Our model formulation does not preclude us from considering the possible dependency between the occurrence of personal injury and the occurrence of damage to property. One would suspect some positive form of dependency. In the case of an automobile accident, when there is damage to property, the chances are high that there will also be personal injuries associated with it.

As is already well known in the actuarial literature, the aggregate loss distribution is obtained by its traditional decomposition into the frequency and severity components, where each component is then modeled separately. There has been almost no attempt in the actuarial literature to model the conditional claim-type component of the aggregate loss distribution. Recently, however, Valdez and Frees (2005), in a working paper, proposed a hierarchical model structure for the estimation and prediction of the aggregate loss distribution using a highly detailed, micro-level, automobile insurance dataset. To illustrate this decomposition, consider a risk class $i$ at calendar year $t$ for which the potential observable responses for observational unit $\{it\}$ consist of:

$K_{it}$: the indicator of a claim within a year;
$M_{it,j}$: the type of claim, available for each claim, $j = 1, \ldots, N_t$;
$A_{it,jk}$: the loss amount, available for each claim, $j = 1, \ldots, N_t$, and for each type of claim $k = 1, \ldots, m$;

where $N_t$ denotes the number of claims within the calendar year $t$.

In the notation used, $f(K)$ denotes the frequency component and is equal to the probability of having a claim (or no claim) in a given calendar year; $f(M|K)$ denotes the conditional claim-type component and is equal to the probability of having a claim-type of $M$, given $K$; and $f(A|K, M)$ denotes the conditional severity component, and is equal to the probability density of the claim vector $A$ given $K$ and $M$. Here, conditional on having observed a claim, the random variable $M$ describes the combination observed. Each combination observed is an $m$-tuplet of the form $(i_1, \ldots, i_m)$, where each $i_k$, for $k = 1, \ldots, m$, is equal to one if the $k$th type of claim is observed and zero otherwise. Thus, the $m$-tuplet $(1, 0, \ldots, 0)$ means that there was a claim with respect to the “first type of claim” only, and similarly, $(1, 1, \ldots, 1)$ means that all types of claims were observed.

Suppose that we have multinomial observations such that $M_i$ is the observation on unit (or individual) $i$, with $M_i$ taking $m$ possible values. We assume for now that the $M_i$ are generated by the multinomial logistic model

$$\Pr(M_i = j \mid x_i) = \frac{\exp(x_i'\beta_j)}{\sum_{k=1}^{m} \exp(x_i'\beta_k)}. \tag{1}$$
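As a minimal sketch of how Eq. (1) turns linear predictors into outcome probabilities, the following Python fragment evaluates the formula for a single unit. The function name `mnl_probabilities`, the covariate vector and the coefficient values are all made up for illustration and are not taken from the paper; one coefficient row is set to zero as a base outcome.

```python
import numpy as np


def mnl_probabilities(x, betas):
    """Outcome probabilities from Eq. (1) for one unit.

    x     : covariate vector, shape (p,)
    betas : coefficient matrix with one row per outcome, shape (m, p)
    """
    scores = betas @ x
    scores = scores - scores.max()  # shift for numerical stability; ratios are unchanged
    expd = np.exp(scores)
    return expd / expd.sum()


# Illustrative (made-up) inputs: an intercept plus one covariate, m = 3 outcomes.
x = np.array([1.0, 0.8])
betas = np.array([[0.0, 0.0],   # base outcome, coefficients fixed at zero
                  [0.5, -0.3],
                  [-1.0, 0.9]])
probs = mnl_probabilities(x, betas)
```

By construction the probabilities sum to one, and the ratio of any two of them, `probs[j] / probs[k]`, equals `exp(x @ (betas[j] - betas[k]))`, so it does not involve the coefficients of any third outcome.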

The work of McFadden (1974) shows that we can express Eq. (1) in latent variable form as follows. Let

$$U_{ij} = x_i'\beta_j + \varepsilon_{ij}, \quad j = 1, \ldots, m, \tag{2}$$

where the $\varepsilon_{ij}$ are independent and have type I extreme-value distribution. By McFadden (1974), $M_i = j$ if and only if $U_{ij} > U_{ik}$ for all $k \neq j$. See Amemiya (1985) and Maddala (1983) for a proof. As with any multinomial model, parameter identification is achieved by taking one of the $\beta_j$ as zero.

In the multinomial logit (MNL) framework under which the latent variables depend on covariate values that vary across individuals but not across alternatives, the assumption of independent and identically distributed $\varepsilon_{ij}$ for $j = 1, \ldots, m$ implies that the ratio of the probability of outcome $j$ to the probability of some other outcome $k$ is independent of every other alternative $l$, $l = 1, \ldots, m$, $l \neq j, k$. This property is known as the Independence of Irrelevant Alternatives (IIA). Valdez and Frees (2005) use a single covariate, gross premiums, which varies across individuals but not across different outcomes. Their results suggest that higher levels of premium are associated with higher probabilities of observing each of the remaining six claim-type combinations relative to the base outcome of $M = 2$, but only two of these coefficient estimates are precisely estimated at the traditional levels of significance. In the context of the IIA assumption, it is not difficult to envisage scenarios in which the stochastic component associated with observing a particular claim-type combination may in fact be correlated with that of another combination. In these scenarios, the MNL is inappropriate.

Often one's preferred probabilistic choice model stems not only from considerations which pertain largely to the type of data one has, but also from those of tractability and flexibility. In relation to the latter two considerations, McFadden (1981, p. 217) comments that, for a given parameterization of the chosen probabilistic choice model, there should be “sufficient flexibility to capture patterns of substitution between alternatives” and the chosen parameterization must also be computationally tractable. Thus, in the present context of modeling the claim-type combination, which manifests as a discrete and unordered categorical variable, these two considerations naturally point towards the use of the multinomial probit model, hereafter referred to as MNP, as a natural alternative to the MNL model. The formulation of the MNP model is similar to that of the MNL under the random utility framework, with the exception that the unobserved disturbances of each outcome, for $m$ mutually exclusive and exhaustive outcomes in the choice set, now have a multivariate Normal distribution with a zero mean vector and covariance matrix

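$$\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1J} \\ \vdots & \ddots & \vdots \\ \sigma_{J1} & \cdots & \sigma_J^2 \end{pmatrix}, \tag{3}$$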

instead of being independently and identically distributed according to the type I extreme value distribution under the MNL model. Here, $\sigma_{sj}$ is the correlation between $\varepsilon_s$ of outcome $s$ and $\varepsilon_j$ of outcome $j$; that is, the correlation between the unobservables influencing the utilities of outcomes $s$ and $j$. Formally, when expressed in utility differences, there are at most $m(m-1)/2$ free parameters in the covariance matrix which can be estimated (Daganzo, 1979). Thus, for a trinomial probit case where $m = 3$, for example, there are at most three elements of the covariance matrix which we can estimate; these elements correspond to the lower triangular (or upper triangular) off-diagonal elements in $\Sigma$, where the variances on the diagonal are normalized to one. It is clear from this formulation, then, that the MNP relaxes the stringent IIA property inherent in MNL and is therefore more flexible in its specification of the covariance matrix. However, the viability of the MNP as a framework for modeling multinomial choice has received much attention in the literature. Juxtaposed against this flexibility are issues of tractability in the MNP model. Weeks (1997) brings to light the trade-off between the tractability and flexibility considerations alluded to by McFadden (1981). Here, tractability considerations pertain to the evaluation of high dimensional Normal integrals at each iteration of the optimization procedure, which is particularly restrictive for a large number of alternatives.¹ However, with the development of a computationally practical simulated method of moments estimator, along with convenient reparameterization methods, this issue is now of much lesser concern. In addition, the key papers of Bunch (1991), Bolduc (1992) and Keane (1992) stress yet another practical problem with estimating the MNP: not all parameters in the MNP may be identified, and thus estimated, even if the data are well behaved. More specifically, parameter identification is, in the words of Keane (1992, p. 193), “tenuous” or “fragile” in the absence of exclusion restrictions. What this means is that the practical estimation of the MNP requires that certain covariates do not affect the utility levels of certain outcomes, despite there being no requirement for such restrictions under formal identification of MNP models. The implication here is that estimation of the MNP model is likely to be problematic if the outcomes are being explained by characteristics which vary for each individual but not across the different outcomes. The characteristics which we propose to use vary for each policyholder, but not across different claim-type combinations. These practical considerations form the motivation for using the multivariate probit model to estimate the claim-type component. As shall be seen, our choice of this model is also more parsimonious in its application to the insurance portfolio considered in the paper.

¹ Some authors consider more than three or four alternatives as “large”; see Maddala (1983, p. 63), for example.

The multivariate probit model has never been considered in the context of modeling motor insurance claims. In particular, considerations of an additional claim-type component of the aggregate loss distribution have been very limited, with Valdez and Frees (2005) providing a clear example of such with the use of the MNL discrete choice model. A related, but somewhat remote, example which considers the claim-type component is that of Pinquet (1998), who considers two types of claims: claims at fault and those not at fault. Pinquet (1998) models the frequency (claim count) component for each of these two types using a fixed effect Poisson model with covariates. In the general model with $q$ types of claims, the fixed effect is assumed to be time-invariant for each policyholder $i$ and each type of claim $j$, for $j = 1, \ldots, q$ (Pinquet, 1998, p. 208). Thus, for a portfolio of $p$ insurance policies at time $t$, the number of claims, $N$, associated with each claim-type $j$ is a random variable which follows a Poisson distribution with mean parameter

$$\lambda_{itj} = \exp(\theta_j' x_{itj} + u_{ij}),$$

specified as a function of “rating” factors observed at the fleet or individual vehicle level, and

$$N_{itj} \sim \mathrm{Poisson}(\lambda_{itj}), \quad i = 1, \ldots, p, \quad j = 1, \ldots, q, \quad t = 1, \ldots, T,$$

where $x_{itj}$ and $\theta_j$ are vectors of covariates and parameters, respectively, which differ for each claim-type $j$, and $u_{ij}$ is the time-invariant fixed effect which accounts for the heterogeneity in the distribution. For $q$ claim-types, then, the associated claims frequency is modeled from a $q$-variate Poisson distribution (Pinquet, 1998, p. 208).

The approach taken in this article to model the conditional claim-type component is different from Valdez and Frees (2005) because we do not assume independence of the disturbances arising from each claim-type. It is also different from Pinquet (1998) because we focus on modeling the joint probability of observing a particular claim-type combination, conditional on there being at least one claim. Furthermore, in this paper, we explore the modeling of the multivariate claim-types arising from automobile insurance policies. However, there are other possible situations and contexts in which the model formulation proposed in this paper can be applied. Take, for example, the case of modeling operational risks, which has received a lot of attention of late. Some classify operational losses into several types, including losses arising from process risk, people risk, system risk, business strategy risk and external environment risk; see Saunders et al. (2003). The traditional (actuarial) approach of modeling operational losses is to consider the frequency of loss and the severity of the loss. If the company is able to classify losses according to the various types, it may detect possible dependencies on the types of losses and answer the question: how do losses from one type of risk affect the losses arising from the other types?

For the empirical investigation section of this paper, we consider the claims experience data derived from vehicle insurance portfolios of general insurance companies in Singapore. The primary source of this data is the General Insurance Association of Singapore, an organization consisting of most of the general insurers in Singapore. The observations are from each policyholder over a period of nine years: January 1993 until December 2001. To provide focus, we restrict our considerations to policies from a single insurance company, with the unit of observation being a registered vehicle insured. We further break down these registered vehicles according to their exposure in each calendar year 1993–2001. Moreover, records from our databases show that when a claim payment is made, we can also identify the type of claim. For our data, there are three types: (1) claims for injury to a party other than the insured, (2) claims for damages to the insured including injury, property damage, fire and theft, and (3) claims for property damage to a party other than the insured.

This paper is organized as follows. Section 2 introduces the theoretical foundations and motivations of the multivariate probit model in analyzing potentially correlated multivariate outcomes. Here, we also provide a summary of a selection of literature which has used the multivariate probit model. Section 3 discusses the data used in the analysis and identifies its stylized features which motivate our preferred model of choice. We estimate a multivariate probit model for claim-type and discuss the important implications of the results. Section 4 provides the numerical results of our simulation which substantiate the conclusions reached in Section 5.

2. The formulation of the multivariate probit model

This section details the specification of the multivariate probit model, hereafter referred to as MVP, that is used to fit the distribution of different claim-types. To reiterate, conditional on there being at least one claim, we observe any of the $2^J - 1$ combinations of the $J$ different claim-types. We begin by first defining the notation consistent with that used in the introduction. Let $I_j^{\circ}$ denote the underlying latent response associated with the $j$th type of claim, for $j = 1, \ldots, J$, and $I_j$ denote the binary response outcome associated with the same type. Using the indicator function, $I_j$ is equal to one if there is a claim with respect to the $j$th type, and zero otherwise. Therefore, our MVP may be specified as a linear combination of deterministic and stochastic components as follows:

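$$
\begin{aligned}
I_1^{\circ} &= x'\beta_1 + \varepsilon_1, &&\text{for } I_1 = I_{\{I_1^{\circ} > 0\}} \\
I_2^{\circ} &= x'\beta_2 + \varepsilon_2, &&\text{for } I_2 = I_{\{I_2^{\circ} > 0\}} \\
&\;\;\vdots \\
I_J^{\circ} &= x'\beta_J + \varepsilon_J, &&\text{for } I_J = I_{\{I_J^{\circ} > 0\}}
\end{aligned}
\tag{4}
$$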

where $x = (1, x_1, \ldots, x_p)'$ is a vector of $p$ covariates which do not differ for each claim-type (the deterministic component) and $\beta_j = (\beta_{j0}, \beta_{j1}, \ldots, \beta_{jp})'$ is a corresponding vector of parameters, including an intercept, which we seek to estimate. Note that the observation subscript $i$ has been suppressed for notational convenience. The stochastic component, $\varepsilon_j$, may be thought of as consisting of those unobservable factors which explain the marginal probability of making a type $j$ claim. Each $\varepsilon_j$ is drawn from a $J$-variate Normal distribution with zero conditional mean and variance normalized to unity (for reasons of parameter identifiability), where $\varepsilon \sim N(0, \Sigma)$, and the covariance matrix $\Sigma$ is given by

$$\Sigma = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1J} \\ \rho_{21} & 1 & \cdots & \rho_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{J1} & \rho_{J2} & \cdots & 1 \end{pmatrix}. \tag{5}$$

Table 1
Sample space of the conditional claim-type variable $M$.

| M | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Binary triplet | (1, 0, 0) | (0, 1, 0) | (0, 0, 1) | (1, 1, 0) | (1, 0, 1) | (0, 1, 1) | (1, 1, 1) |
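The latent-variable formulation in Eqs. (4) and (5) can be illustrated by simulation. The sketch below draws correlated Normal disturbances, adds them to the linear predictors and thresholds the result at zero. The function name `simulate_mvp`, the sample size, the coefficient values and the correlations are all assumptions chosen for illustration; they are not estimates from the paper.

```python
import numpy as np


def simulate_mvp(X, betas, sigma, seed=0):
    """Draw binary claim indicators from the latent model in Eq. (4).

    X     : design matrix with a leading column of ones, shape (n, p + 1)
    betas : one coefficient row per claim-type, shape (J, p + 1)
    sigma : J x J correlation matrix as in Eq. (5)
    """
    rng = np.random.default_rng(seed)
    n, J = X.shape[0], betas.shape[0]
    eps = rng.multivariate_normal(np.zeros(J), sigma, size=n)  # eps ~ N(0, Sigma)
    latent = X @ betas.T + eps                                  # latent responses
    return (latent > 0).astype(int)                             # indicators I_j


# Made-up values for illustration: J = 3 claim-types and one covariate.
n = 20000
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.uniform(0.0, 2.0, size=n)])
betas = np.array([[-1.0, 0.1], [-1.2, 0.3], [1.0, -0.2]])
sigma = np.array([[1.0, 0.3, -0.5],
                  [0.3, 1.0, -0.6],
                  [-0.5, -0.6, 1.0]])
I = simulate_mvp(X, betas, sigma)
```

Each row of `I` is one simulated claim-type vector. Rows of all zeros can occur in such an unconditional simulation; they correspond to the no-claim case, and this sketch takes no position on how they should be treated when working conditionally on at least one claim.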

Of particular interest are the off-diagonal elements in the covariance matrix, $\rho_{sj}$, which represent the unobserved correlation between the stochastic components of the $s$th and the $j$th types of claim. Moreover, because of symmetry in covariances, we necessarily have $\rho_{sj} = \rho_{js}$. As we saw previously, this covariance matrix is similar to that of the MNP, except that the variances here are normalized to unity. We stress that our motivation for the joint estimation of correlated claim-types is not the potential gain in efficiency, but the ability to estimate the joint probabilities of the outcomes.

Note that in this formulation of the MVP model, we can derive marginal probabilities directly. For instance, the marginal probability of observing the $j$th type of claim can be expressed as

$$\Pr(I_j = 1) = \Phi(x'\beta_j), \quad \text{for } j = 1, \ldots, J, \tag{6}$$

where $\Phi(\cdot)$ denotes the cumulative distribution function of the standard Normal. Moreover, the joint probability of observing all possible types of claim comes from a $J$-variate standard Normal distribution

$$\Pr(I_1 = 1, \ldots, I_J = 1) = \Phi_J(x'\beta_1, \ldots, x'\beta_J; \Sigma), \tag{7}$$

where $\Sigma$ is the covariance matrix.

Multivariate correlated binary observations arise in numerous contexts. An oft-cited example in the literature on animal studies is that of the “litter effect”, where there is a greater tendency of likeness in individual responses within a litter relative to responses across different litters. Another example is that of the responses of different and separable physiological systems of an organism to the exposure of stimuli. It is perceived that the biological response of one physiological system to an injection of stimuli may be correlated with that of another physiological system. These examples point towards a central issue: that in an analysis of correlated quantal response data, one must account for the correlation structure between different levels of response if, a priori, there is a perceived possibility that these responses may in fact be correlated. This was the motivating theme behind the development of the multivariate probit model.

The seminal paper by Ashford and Sowden (1970) marks the development of the multivariate probit model. The authors generalized the univariate probit model for binary responses in consideration of a multi-level, vector-valued, response structure to different physiological systems of an organism. The quantal response of each system is manifested as an underlying continuous latent variable which is discretized subject to a threshold specification. Since this seminal paper, other works have applied the MVP in various contexts; see, for example, the works of Gibbons and Wilcox-Gök (1998) and Balia and Jones (2004).
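Returning to Eqs. (6) and (7), the probability of any claim-type combination, not only the all-ones vector, follows from the same $J$-variate Normal distribution after flipping the sign of each component that is not claimed. The sketch below is one way this could be computed, under stated assumptions: `cell_probability` relies on SciPy's numerical multivariate Normal CDF, and `ghk_probability` is a hand-written version of the GHK recursive simulator included only as a cross-check. Both function names and all numerical inputs are hypothetical, and this is not the routine used by any particular estimation package.

```python
from itertools import product

import numpy as np
from scipy.stats import multivariate_normal, norm


def cell_probability(xb, sigma, pattern):
    """Pr(I = pattern) under the MVP model, by numerical integration.

    xb      : linear predictors x'beta_j, shape (J,)
    sigma   : J x J correlation matrix of the disturbances
    pattern : 0/1 vector giving the claim-type combination
    """
    q = 2 * np.asarray(pattern) - 1           # +1 if the type is claimed, -1 if not
    upper = q * np.asarray(xb)                # sign-flipped integration limits
    cov = np.outer(q, q) * np.asarray(sigma)  # correlation matrix after sign flips
    return float(multivariate_normal.cdf(upper, mean=np.zeros(len(q)), cov=cov))


def ghk_probability(upper, cov, n_draws=5000, seed=0):
    """GHK simulation estimate of Pr(Z < upper) for Z ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)            # Z = chol @ eta, eta standard Normal
    J = len(upper)
    eta = np.zeros((n_draws, J))
    weight = np.ones(n_draws)
    for j in range(J):
        # Upper limit for eta_j implied by Z_j < upper_j and the earlier draws.
        limit = (upper[j] - eta[:, :j] @ chol[j, :j]) / chol[j, j]
        p = norm.cdf(limit)
        weight *= p                           # running product of conditional probabilities
        u = rng.uniform(size=n_draws)
        eta[:, j] = norm.ppf(u * p)           # draw from the Normal truncated above at limit
    return float(weight.mean())


# Made-up linear predictors and correlations for J = 3 (not estimates).
xb = np.array([-1.0, -1.2, 1.0])
sigma = np.array([[1.0, 0.3, -0.5],
                  [0.3, 1.0, -0.6],
                  [-0.5, -0.6, 1.0]])
probs = {s: cell_probability(xb, sigma, s) for s in product([0, 1], repeat=3)}

# Cross-check one cell, here (0, 0, 1), with the GHK simulator.
q = np.array([-1, -1, 1])
ghk = ghk_probability(q * xb, np.outer(q, q) * sigma)
```

The eight cell probabilities in `probs` include the all-zero (no claim) cell and sum to one up to integration error, and summing the cells in which a given type is claimed recovers the marginal probability in Eq. (6).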

3. Empirical investigation

3.1. Data characteristics

The data used in our empirical investigation to illustrate fitting the multivariate probit models have been sourced from the General Insurance Association of Singapore (GIA), an organization of all general insurers in the country, whose primary objectives are, among others, to “foster public confidence in and respect for the insurance industry” and to “establish a sound insurance structure and promotion of greater efficiency within the industry”. The motor insurance industry is the single largest class of insurance in the country, comprising approximately a third of the entire general insurance market. Each motor vehicle must have a valid insurance policy for it to be legally operated, where the minimum required is that of coverage against third party personal injury. Moreover, there are three major types of coverage available: third party; third party fire and theft; and comprehensive.

To give a general overview of the size of the data, the complete data set consists of over five million records from some forty-nine insurance companies. Each record is an observation corresponding to the claims experience of a registered motor vehicle in a given calendar year. The data span a period of ten calendar years, starting from 1 January 1993 until 31 December 2002. The length of a typical motor vehicle insurance policy is one year. Because insurance policies are very infrequently purchased at the start of the year, a peculiar feature of this data set is the treatment of the length of coverage as a fraction of a calendar year; this is known as “exposure”, and is the period (as a fraction of a calendar year) during which a policyholder had insurance coverage.

To provide focus, we restrict our consideration to the claims experience of automobile insurance policies of one randomly-selected general insurer in our dataset. See Valdez and Frees (2005, p. 3) for a full description of the dataset. The trivariate binary response vector consists of the following three types of claims:

(1) claims for personal injury to a party other than the insured(Injury);

(2) claims for damages to the insured, including personal injury,property damage, and fire and theft (Own); and

(3) claims for property damage to a party other than the insured(Property).

All possible claim-type combinations that we observe from this dataset are illustrated in Table 1. Furthermore, the probit model is now a trivariate probit model which we hereafter refer to as the TVP model.

Although the data consist of observations that included a vector of driver, policy and vehicle-specific characteristics, we find in our model investigation that the level of gross premiums, adjusted for the length of exposure, is the single most important covariate. Premium calculation principles and, in practice, experience rating schemes determine the level of premium as a function of the policyholder's historical claims experience, as well as various policy and vehicle-specific characteristics. Therefore, it is possible that including these covariates, in addition to premium, may give rise to collinearity. Table 2 provides summary statistics of premiums by claim-type combinations. From Table 2, we observe that, generally, higher average premiums are associated with multiple claim-types. In particular, the average premium associated with personal injury to a third party ($M = 1$) and that associated with an insured's own injuries and damages ($M = 2$) are higher than the average premium associated with third party property damage ($M = 3$). Those who claim with respect to all three types face the third highest average premium; this is an odd result which may simply be attributable to random variation in the data. Policyholders who claim with respect to “Injury” and “Own” damages ($M = 4$) face the highest average premium, which is followed by claims with respect to “Own” injuries only ($M = 2$). An important characteristic of this distribution is that, in any binary triplet, a claim with respect to “Own” injuries and “Property” damages entails a higher average premium, relative to the other claim-types.

Table 2
Summary statistics for premium sorted by claim-type.

| M | Binary triplet | Frequency | % | Mean | Std. dev. | Minimum | Maximum |
|---|---|---|---|---|---|---|---|
| 1 | (1, 0, 0) | 152 | 6.80 | 0.743 | 0.516 | 0.015 | 4.253 |
| 2 | (0, 1, 0) | 211 | 9.44 | 1.153 | 1.163 | 0.012 | 6.778 |
| 3 | (0, 0, 1) | 1673 | 74.82 | 0.719 | 0.513 | 0.007 | 6.688 |
| 4 | (1, 1, 0) | 30 | 1.34 | 1.192 | 1.318 | 0.039 | 4.570 |
| 5 | (1, 0, 1) | 119 | 5.32 | 0.856 | 0.841 | 0.006 | 5.026 |
| 6 | (0, 1, 1) | 33 | 1.48 | 0.953 | 1.240 | 0.080 | 6.457 |
| 7 | (1, 1, 1) | 18 | 0.81 | 1.132 | 0.949 | 0.023 | 4.045 |

The binary triplets represent the possible claim-type combinations. Each element in the triplet is a binary variable for a claim-type (Injury, Own, Property), which is equal to one if there was a claim with respect to that type and zero otherwise. Units are in thousands of Singaporean dollars.

We can also deduce from Table 2 the sample joint and marginal probabilities. The probability of observing “Property” damage is the highest (82.42%) among all types of claim, with the other two types having roughly the same marginal probabilities; the marginal probability of observing “Injury” is 14.27% while that of observing “Own” damages is 13.06%. The probability of observing all three types at once is 0.81%. From the table, we also find that the sample probability of observing only “Injury” claims is 6.80%, only “Own” damages is 9.44% and only “Property” damages is 74.82% (the highest among all three types of claim).

Although these empirical joint and marginal probabilities provide interesting results, the sample conditional probabilities in Table 3 provide a further indication of the existence of possible dependence between the types of claim. Consider for example the case of “Property” damages. The unconditional probability of a claim with “Property” damages is 0.82. However, among all accidents with “Injury” claims, the sample probability with “Property” damages is 0.43; among all accidents with “Own” damages, the probability would be only 0.18. This means that the probability of a claim with “Property” damages could be substantially reduced if there is additional information that another type of claim has occurred.

Table 3
Sample unconditional and conditional probabilities.

| | “Injury” | “Own” | “Property” |
|---|---|---|---|
| Prob(·) | 0.143 | 0.131 | 0.824 |
| Prob(· ∣ I = 1) | 1.000 | 0.150 | 0.429 |
| Prob(· ∣ O = 1) | 0.164 | 1.000 | 0.175 |
| Prob(· ∣ P = 1) | 0.074 | 0.028 | 1.000 |
| Prob(· ∣ I = 1, O = 1) | 1.000 | 1.000 | 0.375 |
| Prob(· ∣ I = 1, P = 1) | 1.000 | 0.131 | 1.000 |
| Prob(· ∣ O = 1, P = 1) | 0.353 | 1.000 | 1.000 |
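The sample marginal and conditional probabilities discussed above follow directly from the frequencies in Table 2. The short sketch below shows one way to recompute them; the helper `prob` and the variable names are introduced here for illustration only, while the frequencies are copied from Table 2.

```python
# Frequencies of each (Injury, Own, Property) combination, from Table 2.
freq = {
    (1, 0, 0): 152, (0, 1, 0): 211, (0, 0, 1): 1673,
    (1, 1, 0): 30, (1, 0, 1): 119, (0, 1, 1): 33, (1, 1, 1): 18,
}
total = sum(freq.values())


def prob(event, given=()):
    """Sample Pr(claim-type `event` occurs | every type in `given` occurs).

    `event` is an index (0 = Injury, 1 = Own, 2 = Property) and `given`
    is a tuple of indices whose indicators are conditioned to equal one.
    """
    cond = [(t, n) for t, n in freq.items() if all(t[g] == 1 for g in given)]
    denom = sum(n for _, n in cond)
    numer = sum(n for t, n in cond if t[event] == 1)
    return numer / denom


marginal_property = prob(2)            # unconditional Pr(Property)
property_given_injury = prob(2, (0,))  # Pr(Property | Injury)
property_given_own = prob(2, (1,))     # Pr(Property | Own)
```

Rounded to three decimals, these three quantities are 0.824, 0.429 and 0.175, matching the “Property” column of Table 3, and `total` equals the 2236 observations in the sample.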

3.2. Estimation and discussion of results

Estimation was carried out in Stata®. To estimate the TVP model, we used the command -mvprobit- coded by Cappellari and Jenkins (2003); the MNL models were estimated by maximum likelihood using -mlogit-. The multivariate probit model was estimated by the method of simulated maximum likelihood (SML), using a smooth recursive simulator, known as the GHK simulator, to evaluate multivariate Normal probabilities. The specific details of this algorithm are omitted here, but see Train (2003, pp. 126–37), Greene (2003, pp. 932–33) and the references cited therein. Furthermore, the SML estimator is consistent as the number of observations and the number of draws tend to infinity. Cappellari and Jenkins (2003) recommend that, so long as the number of draws, $R$, is greater than the square root of the sample size, $\sqrt{N}$, parameter estimates obtained through -mvprobit- are robust to different initial seed values. We adopt this “rule of thumb” in our estimation.

Under the multivariate probit framework, the variances of the disturbances are normalized to unity. In the context of our insurance data, however, we conjecture that the variation in the probability of observing a particular outcome may in fact vary with different levels of premium. For example, there may be less variation in the probability of observing “extreme outcomes” for lower levels of premium, as well as for higher levels of premium, but a larger variation in the probability of outcomes which fall in between. In the context of simple univariate non-linear models, such as probit and logit, heteroskedasticity causes parameter estimates to be inconsistent (Davidson and MacKinnon, 1984). A Lagrange multiplier (LM) test was proposed by Davidson and MacKinnon (1984) to test for heteroskedasticity of a known functional form, but the applicability of their testing procedure has not, to our knowledge, been verified in a multivariate context. As a preliminary analysis, we used the HETTEST procedure in Shazam to test for heteroskedasticity using the Davidson and MacKinnon (1984) LM2 test statistic.² We ran three separate univariate probit regressions on the policies, one for each claim-type, using Premium as the only covariate. In all three tests, the null hypothesis of homoskedasticity was strongly rejected in favor of heteroskedastic disturbances.³ However, it is important to note that these tests do not point definitively towards the presence of heteroskedasticity, but may, instead, pick up some other form of misspecification (Greene, 2003, p. 681). Furthermore, computation of a robust covariance matrix in this particular setting is unclear and will be inappropriate if the heteroskedasticity is of an unknown form. Finally, after fitting MNL models to the policies, the Small and Hsiao (1985) test was used to test the hypothesis that the IIA property holds.
In all of these tests, there were sufficient evidenceat the 1% significance level leading to unequivocal rejections of thehypothesis that the IIA property holds.The unit of observation in our analysis is a registered vehicle

insured. Tables 4 and 6 summarize the results of fitting our

2 The code for the HETTEST procedure may be downloaded from http://shazam.econ.ubc.ca/intro/logit3.htm.

3 The LM2 statistics from the two probit regressions of Injury on Premium were small relative to the critical value.



Table 4. Results of fitting the TVP model to the policies.

| Equation | Parameter | Coefficient estimate | Standard error | z-statistic | p-value |
|---|---|---|---|---|---|
| Injury | Premium | 0.109 | 0.0426 | 2.58 | 0.010 |
| Injury | Intercept | −1.150 | 0.0485 | −23.75 | 0.000 |
| Own | Premium | 0.330 | 0.0409 | 7.98 | 0.000 |
| Own | Intercept | −1.397 | 0.0495 | −28.08 | 0.000 |
| Property | Premium | −0.254 | 0.0379 | −6.72 | 0.000 |
| Property | Intercept | 1.189 | 0.0438 | 27.13 | 0.000 |
| Correlation coefficient | ρ̂21 | 0.274 | 0.0373 | 7.62 | 0.000 |
| Correlation coefficient | ρ̂31 | −0.615 | 0.0258 | −23.89 | 0.000 |
| Correlation coefficient | ρ̂32 | −0.844 | 0.0158 | −54.14 | 0.000 |

R = 100; log pseudolikelihood = −2265.3486; n = 2236.

Likelihood ratio test of H0: ρ21 = ρ31 = ρ32 = 0: χ²(3) = 1007.25, p-value = 0.000. R is the number of pseudo-random draws. Premium is in thousands of Singaporean dollars.

TVP and MNL models, respectively. As expected, Premium is an important predictor of claim-type; the coefficient estimates are both statistically and economically significant, as well as having the a priori expected sign. A somewhat surprising result is the negative marginal effect of Premium on the marginal probability of claiming with respect to “Property”, which we have defined as a claim with respect to third party property damage. To evaluate the marginal effect of Premium on each of the marginal probabilities, we calculate the linear prediction for each claim-type j, x′β̂j, and, using ∂E(yj|x)/∂x1 = φ(x′β̂j) × β̂j1, we average the marginal effects over the observations (Greene, 2003, p. 668). Here, β̂j1 is the coefficient estimate on Premium from the claim-type j equation, for j = 1, 2, 3, and φ(·) is the probability density function of a standard normal distribution with zero mean and unit variance. The average return to Premium for each claim-type is reported in Table 5. These marginal effects suggest that, for every $1000 increase in Premium, the marginal probabilities of making an “Injury” and an “Own” claim increase by approximately 2.5 and 6.8 percentage points, respectively. For “Property” claims, however, the same $1000 increase in Premium is associated with a fall in the marginal probability of approximately 6.2 percentage points.

The estimated correlation coefficients, ρ̂sj, between each of the three claim-types are statistically and economically significant. Surprisingly, the correlations between the disturbances of the “Injury” and “Property” equations, and of the “Own” and “Property” equations, are negative. This suggests that the unobservable factors which increase the probability of claiming with respect to “Injury”, for example, actually reduce the probability of claiming with respect to “Property”; a similar interpretation applies to the negative correlation between “Own” and “Property”. The positive correlation between the “Injury” and “Own” equations is intuitive. Here, unobservable factors which increase the probability of claiming with respect to “Injury” also increase the probability of claiming with respect to “Own”. One can think of latent driver characteristics, such as responsiveness or restlessness, as important factors which influence these probabilities. Furthermore, the null hypothesis of independence between the disturbances is strongly rejected by the likelihood ratio test, implying correlated binary responses between different claim-types.

The MNL results are summarized in Table 6, where, like the TVP, a single covariate was used to explain the conditional claim-type, M. Here, the omitted category is a claim with respect to “Property” damage only (M = 3). Again, we observe that Premium is an important predictor of claim-type; in all but one

Table 5. The average marginal effect of Premium on the marginal claim-type probabilities.

| Claim-type | Marginal effect |
|---|---|
| Injury | 0.025 |
| Own | 0.068 |
| Property | −0.062 |

combination (M = 1), the coefficient on Premium is statistically and economically significant, as well as having the a priori expected sign. The coefficient estimate on Premium is the marginal effect of Premium on the log of the ratio of probabilities; therefore, one can exponentiate the index function to produce a probability of a given outcome relative to the omitted category. To illustrate, a 100 Singaporean dollar increase in Premium is associated with an increase in the probability of making an “Injury” only claim (M = 1) of approximately 1.1% (= (e^{0.109/10} − 1) × 100%), relative to the probability of making a “Property” only claim (M = 3). In contrast, for two policies which differ by 100 Singaporean dollars, the more expensive policy is 6.42% (= (e^{0.622/10} − 1) × 100%) more likely to claim with respect to all three types of claim (M = 7), relative to the probability of claiming for “Property” damages only. The largest coefficient estimate on Premium is associated with claiming with respect to “Injury” and “Own” (M = 4). Here, a 100 Singaporean dollar increase in Premium is expected to increase the probability of claiming for “Injury” and “Own” by approximately 6.83% (= (e^{0.661/10} − 1) × 100%), relative to the omitted category.
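The two calculations used in this subsection can be sketched in a few lines of Python. This is only an illustration: the function names are ours, the MNL slopes are those reported in Table 6 for M = 4 and M = 7, the TVP coefficients are those in Table 4, and the short premium vector is hypothetical (the policy data are not reproduced here), so the averaged marginal effects it prints will not match Table 5.

```python
import math
from statistics import NormalDist


def relative_odds_change_pct(slope, delta_premium):
    """Percentage change in the probability of an outcome relative to the
    base outcome when Premium (in thousands) rises by delta_premium."""
    return (math.exp(slope * delta_premium) - 1.0) * 100.0


def average_marginal_effect(intercept, slope, premiums):
    """Average of phi(x'b) * b1 over the supplied premium values."""
    phi = NormalDist().pdf  # standard normal density
    return sum(phi(intercept + slope * x) * slope for x in premiums) / len(premiums)


# MNL slopes on Premium taken from Table 6 (base outcome M = 3).
mnl_slopes = {4: 0.661, 7: 0.622}
for m, slope in mnl_slopes.items():
    # A 100 dollar increase is 0.1 in units of thousands of dollars.
    print(m, round(relative_odds_change_pct(slope, 0.1), 2))

# TVP (intercept, slope) pairs from Table 4.
tvp = {"Injury": (-1.150, 0.109), "Own": (-1.397, 0.330), "Property": (1.189, -0.254)}
# Hypothetical premiums in thousands of dollars, for illustration only.
premiums = [0.5, 1.0, 1.5, 2.0, 3.0]
ame = {name: average_marginal_effect(a, b, premiums) for name, (a, b) in tvp.items()}
print(ame)
```

Because the density φ(·) is always positive, the sign of each averaged marginal effect is the sign of the corresponding Premium coefficient, whatever premium values are supplied.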

3.3. Comparing the results of the MNL and the MVP models

The individual and joint statistical significance of the correlation coefficients from the TVP is supporting evidence for correlated latent responses amongst different claim-types. As mentioned before, the MNL model assumes that the latent responses are independent between different claim-type combinations. The implication here is that the ratio of the probabilities of observing two different claim-types is not affected by the presence of other claim-type combinations in the same choice set (the IIA property). We use predicted probabilities of each possible outcome as the basis for comparing the MNL to the TVP models in the presence of cross-correlations in the latent responses. These probabilities are conditional on there being a claim.

The predicted probabilities of each outcome from the TVP model are computed from a simulation exercise, which draws from a trivariate Normal distribution using pseudo-random sequences derived from a standard Uniform density; see Cappellari and Jenkins (2006). To illustrate the complexity of evaluating these probabilities, suppose we wish to compute the predicted probability of observing a claim-type combination of the form (1, 0, 1); here, there is a claim with respect to “Injury” and “Property”, but none for “Own” damages. Now, if we let wj = qj x′β̂j and qj = 2Ij − 1 for j = 1, 2, 3, then the integral to be evaluated is

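$$
\Pr(I_1 = 1, I_2 = 0, I_3 = 1 \mid w_1, w_2, w_3) = \Phi_3(w_1, w_2, w_3; \rho_{21}, \rho_{31}, \rho_{32})
= \int_{-\infty}^{x'\hat{\beta}_1} \int_{-\infty}^{-x'\hat{\beta}_2} \int_{-\infty}^{x'\hat{\beta}_3} \phi_3(\varepsilon_1, \varepsilon_2, \varepsilon_3; \rho_{21}, \rho_{31}, \rho_{32}) \, d\varepsilon_3 \, d\varepsilon_2 \, d\varepsilon_1, \tag{8}
$$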

where Φ3(·) and φ3(·) denote a trivariate Normal distribution function and probability density function, respectively, and the upper support is the linear prediction or index value corresponding to each claim-type. Note that the sign on each upper support depends on whether the observed binary outcome is one or zero, so that it is positive if the observed outcome is one, and negative if the observed outcome is zero. This specification follows the


Table 6. Results of fitting the MNL model to the policies.

| M | Binary triplet | Intercept: coefficient estimate | Intercept: standard error | Intercept: p-value | Slope: coefficient estimate | Slope: standard error | Slope: p-value |
|---|---|---|---|---|---|---|---|
| 1 | (1, 0, 0) | −2.460 | 0.137 | 0.000 | 0.085 | 0.145 | 0.558 |
| 2 | (0, 1, 0) | −2.626 | 0.112 | 0.000 | 0.636 | 0.092 | 0.000 |
| 4 | (1, 1, 0) | −4.605 | 0.262 | 0.000 | 0.661 | 0.165 | 0.000 |
| 5 | (1, 0, 1) | −2.907 | 0.149 | 0.000 | 0.339 | 0.137 | 0.014 |
| 6 | (0, 1, 1) | −4.305 | 0.287 | 0.000 | 0.467 | 0.240 | 0.052 |
| 7 | (1, 1, 1) | −5.071 | 0.297 | 0.000 | 0.622 | 0.169 | 0.000 |

Log-pseudolikelihood = −2060.594; n = 2236; pseudo R² = 0.0171.

Note: The response variable is the conditional claim-type, M; the single covariate is Premium (in thousands of Singaporean dollars); and the omitted category is M = 3.

Table 7. Average predicted probabilities for each outcome.

| M | Outcome | TVP (p̂1M) | MNL (p̂2M) | ∆M | ((p̂1M − p̂2M)/p̂1M)% |
|---|---|---|---|---|---|
| 1 | (1, 0, 0) | 0.037 | 0.068 | −0.031 | −83.78 |
| 2 | (0, 1, 0) | 0.079 | 0.094 | −0.015 | −18.99 |
| 3 | (0, 0, 1) | 0.734 | 0.748 | −0.014 | −1.91 |
| 4 | (1, 1, 0) | 0.049 | 0.013 | 0.036 | 73.47 |
| 5 | (1, 0, 1) | 0.071 | 0.053 | 0.018 | 25.35 |
| 6 | (0, 1, 1) | 0.039 | 0.015 | 0.024 | 61.54 |
| 7 | (1, 1, 1) | 0.001 | 0.008 | −0.007 | −700.00 |

parameterization of the bivariate case considered in Greene (2003, p. 710). Table 7 summarizes the average predicted probabilities from the TVP and the MNL models for each possible outcome. For each observation in the relevant subsample, we computed the predicted probability of observing the given outcome, and then took the average of the probabilities for the same outcome.

Furthermore, we observe that the probability of claiming with respect to “Property” damage only (M = 3) is the highest, given that there is a claim. This result is not surprising, since it is by far the most frequently observed outcome in our sample; see Table 2. The last two columns in the table measure the differences between the two sets of predicted probabilities and the proportionate differences, respectively. Here, we define

∆M = p̂1M − p̂2M (9)

to be the difference between the predicted probabilities of the TVP and MNL models for outcome M. These differences are small for outcomes M = {2, 7} but relatively larger for outcome M = 4. In the proportionate differences column, the largest divergences are observed for outcomes M = {1, 7}. Fig. 1a through to g provide graphical representations of the distribution of the differences in predicted probabilities for each outcome across Premium. For outcome M = 1, there is an apparent fall in the differences between the probabilities for higher amounts of premium. In contrast, for outcomes M = {2, 4, 6, 7}, the differences increase over larger amounts of premium. There is no such monotonic pattern in the differences in probabilities for outcomes M = {3, 5}; for outcome M = 3, the differences fall for premium amounts up to $2500, but increase thereafter. Similarly, for outcome M = 5, differences fall up to premium amounts close to $1400, but increase thereafter. The least probable outcome is a claim with respect to all three types of damages (M = 7). A peculiar result is that, in spite of its slightly higher observed frequency relative to outcome M = 6, its predicted probability of occurrence is substantially smaller. Their predicted probabilities are substantially over-estimated, on average, by the MNL model.
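To make the orthant probability behind the TVP column of Table 7 concrete, the sketch below approximates the probability of a binary triplet at a single premium value with a crude frequency simulator: it draws correlated Normal disturbances, applies the latent threshold rule, and counts how often the triplet occurs. It is an illustration under our own assumptions and not the simulator used by -mvprobit-; the function names, the number of draws, the seed, and the premium value of 1.0 (thousand dollars) are all hypothetical, while the coefficients and correlations are those in Table 4.

```python
import math
import random

# (intercept, slope) for Injury, Own, Property, from Table 4.
BETA = [(-1.150, 0.109), (-1.397, 0.330), (1.189, -0.254)]
RHO21, RHO31, RHO32 = 0.274, -0.615, -0.844


def cholesky3(r21, r31, r32):
    """Lower-triangular Cholesky factor of a 3 x 3 correlation matrix."""
    l22 = math.sqrt(1.0 - r21 ** 2)
    l32 = (r32 - r31 * r21) / l22
    l33 = math.sqrt(1.0 - r31 ** 2 - l32 ** 2)
    return ((1.0, 0.0, 0.0), (r21, l22, 0.0), (r31, l32, l33))


def simulate_triplet_prob(premium, triplet, draws=5000, seed=1):
    """Share of simulated latent draws that produce the given triplet."""
    rng = random.Random(seed)
    chol = cholesky3(RHO21, RHO31, RHO32)
    index = [b0 + b1 * premium for b0, b1 in BETA]  # x'beta_j
    hits = 0
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        eps = [sum(chol[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        outcome = tuple(int(index[j] + eps[j] > 0) for j in range(3))
        hits += outcome == triplet
    return hits / draws


# Probability of an "Injury" and "Property" claim with no "Own" claim.
p_101 = simulate_triplet_prob(1.0, (1, 0, 1))
print(p_101)
```

Because every call reuses the same seed, each simulated draw falls in exactly one of the eight triplets, so the eight estimated probabilities at a given premium sum to one.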

4. Monte Carlo simulation

The estimation results reported in the previous section suggest that fitting the MNL model to correlated binary response data results in considerable divergences in the predicted probabilities relative to those produced by our TVP model. In particular, we saw that these divergences varied systematically over premiums for various claim-type combinations; for outcomes M = {2, 4, 6, 7}, the divergences tend to increase over higher amounts of premium, but not clearly so for the remaining outcomes. Moreover, the proportionate differences between the predicted probabilities are substantially higher for claim-type combinations in which there was more than one type of claim made, with the exception of outcome M = 1; see the last column of Table 7. The aim of this section is to substantiate these findings. Through a controlled experimental design, we carry out a number of Monte Carlo simulations, whereby we fit the MNL and TVP models of the specification found in Section 3 to multivariate response outcomes drawn from a randomly generated trivariate Normal distribution, using previously estimated coefficients as the true parameters. The idea is to investigate how well the MNL model fits when, by assumption, the true underlying model is an MVP. In addition, we investigate the extent of these divergences under various experimental values of the correlation coefficients in the trivariate Normal covariance matrix, which we denote, respectively, by {ΣF, ΣH, ΣZ}. Using the standard deviation of these divergences as the standard error of the differences in the predictions, our results provide strong evidence in support of the TVP model over the MNL model when the binary responses are potentially correlated.

4.1. The experimental design

We extract the vector of premium values from the policies under the same insurer and use this as the single covariate, which is consistent with our TVP and MNL models fitted to the policies in Section 3. There are 2236 premium amounts in this vector. For the purposes of our experiment, we assume that the underlying data generating process (DGP) for the latent disturbances of each claim-type follows a trivariate Normal distribution with zero mean vector and covariance matrix Σ as defined in Eq. (5). Our method of drawing from the multivariate Normal distribution follows that of Cappellari and Jenkins (2006, pp. 10–11). Here, we use coefficient estimates extracted from Table 4 as the a priori known true population parameters. For each claim-type j, the underlying model specification is given by

$$
I_j^{\circ} = x'\beta_j + \varepsilon_j, \quad \text{for } I_j = I_{\{I_j^{\circ} > 0\}}, \quad j = 1, 2, 3 \tag{10}
$$

where x = (1, Premium)′ is a vector of covariates which varies across policyholders, and

$$
\beta_1 = (-1.150, 0.109)', \quad \beta_2 = (-1.397, 0.330)', \quad \beta_3 = (1.189, -0.254)' \tag{11}
$$

are vectors of estimated parameters which we treat as the true population parameters of the underlying DGP. Finally, to complete the specification, we have ε ∼ N(0, ΣF), where the covariance matrix is

$$
\Sigma_F = \begin{bmatrix} 1 & & \\ 0.274 & 1 & \\ -0.615 & -0.844 & 1 \end{bmatrix}, \tag{12}
$$


Fig. 1. Differences in predicted probabilities for each outcome across premium. Panels: (a) M = 1; (b) M = 2; (c) M = 3; (d) M = 4; (e) M = 5; (f) M = 6; (g) M = 7.


and the subscript F denotes that the specification is of the full correlations. Note that the upper triangular elements have been omitted due to symmetry. Generating the trivariate realizations Ij is then straightforward by Eq. (10). In each replication, we have a 2236 × 3 matrix of random disturbances drawn from a trivariate Normal distribution. We use this sample to estimate the TVP and MNL models. In summary, our methodology can be decomposed into three steps:

(1) Using the trivariate probit parameter estimates from the fitted model as the “true” parameters, generate a simulated trivariate Normal distribution for the underlying latent responses for each claim-type j, with correlation structure defined by ρ̂21, ρ̂31 and ρ̂32. The observed response of each binary outcome, yj, is the realization of the underlying latent response for claim-type j. For each observation, we observe a claim-type combination in the form of a binary triplet and a corresponding premium amount.

(2) Estimate the TVP and MNL models using the random sample generated in step (1), with 2236 premium values as the single covariate. Note that observations for which the outcome is M = 0 have been excluded from estimation.

(3) Repeat steps (1) and (2) 100 times.
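Step (1) and the exclusion rule in step (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: the premium vector is a hypothetical stand-in generated from a Uniform distribution (the insurer's 2236 premiums are not reproduced here), the function names and seed are ours, and the estimation of the TVP and MNL models is deliberately left out. The coefficients, the correlation matrix and the coding of M follow Tables 4 and 7 and Eq. (12).

```python
import math
import random
from collections import Counter

# (intercept, slope) for Injury, Own, Property, from Table 4 / Eq. (11).
BETA = [(-1.150, 0.109), (-1.397, 0.330), (1.189, -0.254)]
# Full correlation specification, Eq. (12).
SIGMA_F = [[1.0, 0.274, -0.615],
           [0.274, 1.0, -0.844],
           [-0.615, -0.844, 1.0]]
# Coding of binary triplets into the outcome M (M = 0 means no claim).
M_CODE = {(0, 0, 0): 0, (1, 0, 0): 1, (0, 1, 0): 2, (0, 0, 1): 3,
          (1, 1, 0): 4, (1, 0, 1): 5, (0, 1, 1): 6, (1, 1, 1): 7}


def cholesky(a):
    """Lower-triangular factor L with L L' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def generate_replication(premiums, sigma, rng):
    """One replication: a list of (premium, M) pairs from the latent model."""
    low = cholesky(sigma)
    data = []
    for x in premiums:
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        eps = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
        triplet = tuple(int(b0 + b1 * x + e > 0) for (b0, b1), e in zip(BETA, eps))
        data.append((x, M_CODE[triplet]))
    return data


rng = random.Random(2009)
# Hypothetical premiums in thousands of dollars, for illustration only.
premiums = [rng.uniform(0.3, 3.0) for _ in range(2236)]
replication = generate_replication(premiums, SIGMA_F, rng)
# Step (2): drop M = 0 before estimating the TVP and MNL models.
estimation_sample = [(x, m) for x, m in replication if m != 0]
counts = Counter(m for _, m in estimation_sample)
print(sorted(counts.items()))
```

Step (3) would wrap the last four statements in a loop over 100 replications, holding the premium vector fixed and redrawing only the disturbances.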

Fig. 2 shows the distribution of the MNL coefficient estimates for each outcome M over the 100 replications. In each replication, M = 3 was treated as the base outcome, so that the coefficient estimate on Premium measures the return to Premium on the probability of observing outcomes M = {1, 2, 4, 5, 6, 7} relative to M = 3. Fig. 2a through to e show that the estimates for the intercept and the Premium coefficient converge to approximately similar values over the replications. For outcome M = 7, Fig. 2f shows a more sporadic pattern in the coefficient estimates over the replications. Similarly, Fig. 3 shows the distribution of the TVP estimates for each claim-type j over the 100 replications. Not surprisingly, these estimates are consistent with the true population parameters as specified under our experimental design.

4.2. Predicted probabilities

In each replication r, we simulated the trivariate Normal probabilities for each observation; these are stored under a new variable prob‘r’ in the simulated data set. These are the predicted probabilities corresponding to the TVP model. For each outcome M, we then averaged the predicted probabilities over Premium. This results in seven mean predicted probabilities, each corresponding to an outcome. The same methodology is applied to the MNL predicted probabilities, and these are stored in new variables M‘i’-1 through to M‘i’-7 in the data set. The differences between these two sets of predicted probabilities for each outcome are computed as follows. Define a variable ∆(r)M to be the difference between the mean TVP (p̂1M) and MNL (p̂2M) predicted probabilities for outcome M, so that at each replication r, we have

$$
\Delta_M^{(r)} = \hat{p}_{1M}^{(r)} - \hat{p}_{2M}^{(r)}, \quad r = 1, \ldots, 100. \tag{13}
$$

This same process is then repeated 100 times so that, for each outcome M, there are 100 ∆M’s. Our aim is to test the null hypothesis that, for the same outcome, the mean predicted probability produced by the TVP model is not different from that of the MNL model. This is equivalent to testing the null hypothesis that µM = 0, where µM is the population mean of the difference in the predictions. The alternative hypothesis is that the means of the differences are nonzero (µM ≠ 0), so that correlated binary responses should determine the choice of model. Here, we appeal to the central limit theorem (CLT) so that the difference in mean predicted probabilities (∆), for each outcome, over the 100 replications is approximately Normal. The implication here is that even if the individual replications are not Normally distributed, the mean of the replications will be approximately Normally distributed by the CLT. As an aside, Fig. 4 shows that the replications for outcomes one through to six are normally distributed, with the exception of ∆7. Quantile–Quantile (QQ) plots are used as a graphic representation of normality, so that if the observations are drawn from a normal distribution, then they should not deviate from the 45° reference line. For ∆7, there are clear deviations from this reference line, but its non-normality does not affect our inferences.

Finally, the difference in sample means⁴ is still itself a sample mean, so that the relevant test statistic for outcome M is

$$
\frac{\Delta_M - \mu_M}{\sigma_{\Delta_M} / \sqrt{n}} \sim N(0, 1), \tag{14}
$$

where ∆M is the difference between the mean predicted probabilities of the TVP and MNL models, n is the number of simulations run, and µM is the population mean. Eq. (14) holds only if the ∆’s are drawn from independent replications. Recall that, under our experimental design, the sample consists of 2236 observations in each replication. The premium amounts are the same in every replication, so the ∆’s can no longer be regarded as drawn from independent samples. To account for this dependence in the samples, we estimate the variance of ∆M, Var(∆M), whilst allowing for covariances between replications. The derivation of Var(∆M) is outlined below.

Assume that, for any outcome M, the variance of ∆(r)M is the same for each replication r, and that the covariance between ∆(r)M and ∆(s)M is the same for every pair of distinct replications r and s. Then, the variance of ∆M is derived as follows:

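$$
\operatorname{Var}(\Delta_M) = \frac{1}{n^2} \operatorname{Var}\left[ \sum_{r=1}^{n} \Delta_M^{(r)} \right]
= \frac{1}{n^2} \left[ n \operatorname{Var}\left( \Delta_M^{(r)} \right) + \binom{n}{2} \operatorname{Cov}\left( \Delta_M^{(r)}, \Delta_M^{(s)} \right) \right]
= \frac{1}{n} \left[ \operatorname{Var}\left( \Delta_M^{(r)} \right) + \left( \frac{n-1}{2} \right) \operatorname{Cov}\left( \Delta_M^{(r)}, \Delta_M^{(s)} \right) \right], \tag{15}
$$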

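where $\binom{n}{2} = \frac{n!}{2!(n-2)!}$ denotes the number of combinations of choosing any two different replications from n simulations. Both the variance and covariance terms in Eq. (15) can be estimated from the generated data, where

$$
\operatorname{Var}\left[ \Delta_M^{(r)} \right] = E\left[ \left( \Delta_M^{(r)} \right)^2 \right] - E\left[ \Delta_M^{(r)} \right]^2 \tag{16}
$$

is the variance of ∆M for each replication r, and

$$
\operatorname{Cov}\left[ \Delta_M^{(r)}, \Delta_M^{(s)} \right] = E\left[ \Delta_M^{(r)} \cdot \Delta_M^{(s)} \right] - E\left[ \Delta_M^{(r)} \right] \cdot E\left[ \Delta_M^{(s)} \right], \tag{17}
$$

is the covariance between replications r and s, for any r ≠ s.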
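As a numerical cross-check on the derivation, the sketch below computes the variance of a mean of n equally correlated terms by summing every entry of their n × n covariance matrix and dividing by n², and it implements the moment expressions in Eqs. (16) and (17). The function names and the input values are ours and purely illustrative. Note that the double sum visits each pair (r, s) with r ≠ s in both orders, so it contains n(n − 1) covariance terms; readers may wish to compare that count with the binomial coefficient appearing in Eq. (15).

```python
def moment_variance(xs):
    """Eq. (16): E[X^2] - E[X]^2 using sample moments."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(x * x for x in xs) / n - mean ** 2


def moment_covariance(xs, ys):
    """Eq. (17): E[XY] - E[X]E[Y] using sample moments."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


def var_of_mean_double_sum(var, cov, n):
    """Variance of the mean of n terms with common variance and covariance,
    obtained by summing all n * n entries of the covariance matrix."""
    total = 0.0
    for r in range(n):
        for s in range(n):
            total += var if r == s else cov
    return total / n ** 2


def var_of_mean_closed_form(var, cov, n):
    """Closed form implied by the double sum: (var + (n - 1) * cov) / n."""
    return (var + (n - 1) * cov) / n


# Illustrative (hypothetical) inputs, not estimates from the paper.
v, c, n = 2e-6, 1e-7, 100
print(var_of_mean_double_sum(v, c, n), var_of_mean_closed_form(v, c, n))
```

With zero covariance both functions reduce to the familiar var / n, which is the case covered by Eq. (14).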

4.3. Results and discussion

The results of our simulation exercise are summarized in Table 8 below. The square root of the estimated variance is the standard error of the difference in mean predictions, and corresponds to taking the square root of the variance derived in the previous subsection. These results collectively suggest that the implications of correlated disturbances from a multivariate outcome setting

4 That is, the difference between the mean TVP and MNL predicted probabilities.


Fig. 2. MNL coefficient estimates relative to the base outcome M = 3 over 100 simulations. Panels: (a) M = 1; (b) M = 2; (c) M = 4; (d) M = 5; (e) M = 6; (f) M = 7.


Fig. 3. TVP coefficient estimates over 100 simulations. Panels: (a) Injury equation; (b) Own equation; (c) Property equation; (d) Correlations: ρ̂21, ρ̂31, ρ̂32.

Table 8. Simulation results for the standard deviation of ∆M, with full correlations specification.

| ∆M | Mean | Variance | Standard error | Test statistic (a) | p-value (b) |
|---|---|---|---|---|---|
| ∆1 | −0.00667 | 9.52621 × 10⁻⁸ | 0.00031 | −21.52 | 0.000 |
| ∆2 | −0.00806 | 6.08742 × 10⁻⁶ | 0.00247 | −3.26 | 0.002 |
| ∆3 | −0.01194 | 5.64243 × 10⁻⁷ | 0.00075 | −15.92 | 0.000 |
| ∆4 | 0.00357 | 7.76953 × 10⁻⁶ | 0.00279 | 1.28 | 0.200 |
| ∆5 | 0.00433 | 3.28590 × 10⁻⁸ | 0.00018 | 24.06 | 0.000 |
| ∆6 | 0.00590 | 2.00991 × 10⁻⁷ | 0.00045 | 13.11 | 0.000 |
| ∆7 | −0.00019 | 1.07663 × 10⁻⁸ | 0.00010 | −1.90 | 0.058 |

(a) The test statistic is ∆M/√Var(∆M) ∼ N(0, 1).

(b) The p-value corresponds to the probability value of the relevant test statistic under a two-sided alternative.

should not be neglected; a finding of such correlation should drive one’s decision to use a model which allows for flexibility in the specification of the covariance matrix for the disturbances, as opposed to another which conveniently assumes independence. Again, this brings to light the aforementioned trade-off between flexibility and tractability; see Section 1 for a discussion. From our experimental design, in which we are able to specify a priori correlations in the underlying DGP, the means of the MNL predicted probabilities of six of the seven outcomes are statistically different from those of the TVP model. Specifically, there is strong evidence against the null hypothesis of no difference for outcomes M = {1, 2, 3, 5, 6} at the 1% significance level; for outcomes M = {4, 7}, however, the evidence is less convincing. Moreover, these results further substantiate those in Table 7. For an actuary seeking to model the conditional claim-type, the


Fig. 4. Quantile–Quantile plots for each ∆M. Panels: (a) ∆1; (b) ∆2; (c) ∆3; (d) ∆4.

implication is that the TVP model should be preferred over MNL when the outcomes are correlated.
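The test statistics and two-sided p-values in Table 8 follow directly from the reported means and variances. The sketch below recomputes them from the printed values using the standard Normal distribution; the function name is ours, and small differences from the printed test statistics are to be expected because the table appears to divide by standard errors that have already been rounded.

```python
import math
from statistics import NormalDist

# (mean, variance) of each Delta_M as printed in Table 8.
TABLE_8 = {
    1: (-0.00667, 9.52621e-8),
    2: (-0.00806, 6.08742e-6),
    3: (-0.01194, 5.64243e-7),
    4: (0.00357, 7.76953e-6),
    5: (0.00433, 3.28590e-8),
    6: (0.00590, 2.00991e-7),
    7: (-0.00019, 1.07663e-8),
}


def z_and_p(mean, variance):
    """Test statistic mean / sqrt(variance) and its two-sided Normal p-value."""
    z = mean / math.sqrt(variance)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


results = {m: z_and_p(mean, var) for m, (mean, var) in TABLE_8.items()}
for m, (z, p) in results.items():
    print(f"Delta_{m}: z = {z:.2f}, p = {p:.3f}")
```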

4.4. Experimental correlations

A benefit of our experimental design is the ability to specify a priori the true population parameters of the underlying DGP. In the previous section we saw that, under a specification of the covariance matrix in Eq. (12) with full correlations derived from our TVP results, differences between the TVP and MNL predicted probabilities were found to be statistically significant. In the discussion which follows, we experiment with two other specifications for the covariance matrix, namely, one where the correlations are halved and the other where they are set to zero (that is, the error disturbances in each claim-type equation are orthogonal). In both of these experiments, the simulation methodology is the same as before.

4.4.1. Half correlations

Suppose that each of the correlations between the claim-type disturbances is halved, so that instead of the covariance matrix in Eq. (12), we have

$$
\Sigma_H = \begin{bmatrix} 1 & & \\ 0.137 & 1 & \\ -0.308 & -0.422 & 1 \end{bmatrix}. \tag{18}
$$

Here, the direction of the correlations ρsj remains unchanged, so that unobservables affecting the marginal probability of an “Injury” claim remain positively correlated with unobservables affecting the marginal probability of an “Own” injuries and damages claim (and negatively correlated for the “Injury” and “Property” marginal probabilities, and for the “Own” and “Property” marginal probabilities). The results of the simulation are summarized in Table 9. These results suggest that, even if the population correlations are halved, there still exist statistically significant differences between the TVP and MNL predicted probabilities of the outcomes. In contrast to the results in Table 8, all ∆’s are statistically different from zero at the 5% significance level.
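The three experimental specifications differ only in a scaling of the off-diagonal elements, which can be sketched as follows. The helper names are ours; the check uses the leading principal minors of a 3 × 3 correlation matrix to confirm that each scaled matrix is still a valid (positive definite) covariance matrix from which Normal draws can be generated. The entries of Eq. (18) are the halved values reported to three decimal places.

```python
# Full correlations from Table 4 / Eq. (12).
FULL = {"rho21": 0.274, "rho31": -0.615, "rho32": -0.844}


def scale_correlations(rho, factor):
    """Multiply every correlation by the same factor (1, 0.5 or 0 here)."""
    return {name: value * factor for name, value in rho.items()}


def is_positive_definite(rho):
    """Sylvester's criterion for a 3 x 3 correlation matrix with unit diagonal."""
    r21, r31, r32 = rho["rho21"], rho["rho31"], rho["rho32"]
    minor2 = 1.0 - r21 ** 2
    det = 1.0 + 2.0 * r21 * r31 * r32 - r21 ** 2 - r31 ** 2 - r32 ** 2
    return minor2 > 0.0 and det > 0.0


half = scale_correlations(FULL, 0.5)   # off-diagonals of Eq. (18)
zero = scale_correlations(FULL, 0.0)   # identity matrix, zero correlations

for label, rho in (("full", FULL), ("half", half), ("zero", zero)):
    print(label, rho, is_positive_definite(rho))
```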

4.4.2. Zero correlations

The aim of this experiment is to determine whether the MNL model can be used to approximate the outcome probabilities under


Fig. 4. (continued) Panels: (e) ∆5; (f) ∆6; (g) ∆7.

Table 9. Simulation results for the standard deviation of ∆M, with half correlations specification.

Columns: ∆M; Mean; Variance; Standard error; Test statistic (a); p-value (b).

(a) The test statistic is ∆M/√Var(∆M) ∼ N(0, 1).


an artificial setup whereby the disturbances are orthogonal. If the MNL model can be used in place of the TVP model at all, one would expect that the ideal conditions under which MNL may be appropriate would be such that the disturbances are independent and hence uncorrelated. In the previous two setups, where we experimented with full and half correlation specifications for the covariance matrix, the predicted probabilities of the MNL model were found to be statistically different from those of the TVP model. These results were a priori expected, and serve to reinforce our choice of the TVP over MNL when the disturbances are correlated.

Suppose now that the correlation between each claim-type disturbance is zero in the underlying population, so that the estimation of the TVP model is equivalent to the estimation of three independent univariate probit models. Therefore, instead of using our TVP coefficient estimates as the true population parameters in the experimental design, we use those estimated by three separate


Table 10. Simulation results for the standard deviation of ∆M, with zero correlations specification.

Columns: ∆M; Mean; Variance; Standard error; Test statistic (a); p-value (b).

(a) The test statistic is ∆M/√Var(∆M) ∼ N(0, 1).


probit regressions with all the covariance terms set to zero, such that

$$
\Sigma_Z = I_3, \tag{19}
$$

where I3 is an identity matrix of dimension 3 × 3. Here, ΣZ is consonant with either the independent probit⁵ model or the MNL, depending on the distributional assumptions placed on the disturbances. The reason this follows is that both models assume that the disturbances are independent and identically distributed; the independent probit model assumes multivariate normality, whereas MNL assumes that the errors are type I extreme value. Where the parameters in the covariance matrix are allowed to vary freely, we end up with the MNP model discussed in Section 1.

The results of this simulation are summarized in Table 10. The results suggest that, even under ideal zero correlation conditions, the MNL fails to “correctly” estimate the predicted probabilities of each outcome; here, six out of seven ∆’s are statistically significant at the 10% level. There are two primary reasons for this finding. First, the TVP and MNL models are non-nested models and represent inherently different choice processes. In the TVP model, we have three distinct choices, where each choice is represented as a binary outcome in each of the three equations in the setup of Eq. (4). By contrast, MNL models one single choice over seven possible alternatives. Moreover, there are three sets of parameters as well as three additional covariance terms to be estimated in the TVP model; under MNL, there are six sets of parameters to be estimated, where the remaining choice is the normalized base outcome. Second, whilst the covariance structure assumed in Eq. (19) implies independent outcomes, the results of a multinomial choice model that is fitted to these generated data may not necessarily reflect this independence. This is illustrated under the MNP framework for three choices. Consider again the latent model specification in Eq. (4) for three choices. In a multinomial choice model, only differences in utility matter. Thus, normalizing the first choice yields

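$$
\begin{aligned}
I_2^{\circ} - I_1^{\circ} &= x'(\beta_2 - \beta_1) + \varepsilon_2 - \varepsilon_1 \\
I_3^{\circ} - I_1^{\circ} &= x'(\beta_3 - \beta_1) + \varepsilon_3 - \varepsilon_1,
\end{aligned} \tag{20}
$$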

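so that it is one dimension smaller than before. Now, following the parameterization in Hausman and Wise (1978), we define

$$
\eta_{21} = \varepsilon_2 - \varepsilon_1 \tag{21a}
$$

$$
\eta_{31} = \varepsilon_3 - \varepsilon_1, \tag{21b}
$$

where the joint distribution of ηj1, for j = 2, 3, is bivariate normal, with alternative-specific covariance matrix

$$
\Omega_1 = \begin{bmatrix} \sigma_1^2 + \sigma_2^2 - 2\sigma_{12} & \\ \sigma_1^2 - \sigma_{13} - \sigma_{12} + \sigma_{23} & \sigma_1^2 + \sigma_3^2 - 2\sigma_{13} \end{bmatrix}, \tag{22}
$$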

5 The term ‘independent probit’ as used in this context refers to the MNP setup under which the covariance matrix is of the specification in Eq. (19), and is different from the independent univariate probit models mentioned in the preceding paragraph. This usage is consistent with Hausman and Wise (1978, p. 412).

so that the probability that the first outcome is chosen is given by Pr(I◦2 − I◦1 > η21, I◦3 − I◦1 > η31). Here, Ω1 is the actual covariance matrix to be estimated under the MNP framework for a three-choice setting. Clearly, then, even if the covariance terms in the initial covariance matrix are set to zero (that is, σ12 = σ31 = σ32 = 0), the covariances in the estimated Ω1 may still be nonzero. This result extends to the current seven-choice setting of the MNL under consideration.
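The point about nonzero covariances after differencing can be checked directly. The sketch below forms the covariance matrix of (η21, η31) as D Σ D′, where D is the matrix that subtracts the first disturbance from the second and third; the function name and the matrix D are our notation for the differencing in Eqs. (21a) and (21b). With Σ equal to the identity matrix of Eq. (19), the off-diagonal element of the result equals σ1² = 1 rather than zero.

```python
# Differencing matrix: rows give eta21 = eps2 - eps1 and eta31 = eps3 - eps1.
D = [[-1.0, 1.0, 0.0],
     [-1.0, 0.0, 1.0]]


def differenced_covariance(sigma):
    """Covariance matrix of (eta21, eta31), computed as D * sigma * D'."""
    d_sigma = [[sum(D[i][k] * sigma[k][j] for k in range(3)) for j in range(3)]
               for i in range(2)]
    return [[sum(d_sigma[i][k] * D[j][k] for k in range(3)) for j in range(2)]
            for i in range(2)]


identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
omega_1 = differenced_covariance(identity)
print(omega_1)  # [[2.0, 1.0], [1.0, 2.0]]
```

The same function applied to any symmetric Σ reproduces the entries of Eq. (22) term by term, which gives a quick way to verify the expression.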

5. Summary and concluding remarks

The extension of the aggregate claims distribution to includethe conditional claim-type component provides an actuary withadditional information which is beneficial in two respects. First,it improves the accuracy in the prediction of future claims, andhence risks; and second, it facilitates a more equitable pricingof motor insurance contracts, whereby the amount of premiumto be charged can be determined commensurately with themost likely claim-type combination, in addition to other factors.Conditional on there being at least one claim, an actuary is ableto predict the precise combination of claim-types, given specificpolicy, driver and vehicle-specific characteristics. Our use of thegeneralmultivariate probitmodel is not limited tomotor insurancecontracts and can be generalized to other forms of insurancecontracts inwhichmultiple different claim-typesmay be observed.A benefit of our model is that it is flexible, allowing estimationof correlations between different claim-types. The following is asummary of the key findings of this paper.First, fitting a multivariate probit model to our data is superior

to the MNL model when the outcomes are correlated. The flexible specification of the multivariate probit model allows an actuary to estimate the extent of correlation between different claim-types. On the other hand, whilst the MNL model is definitely more computationally tractable than the multivariate probit model, its primary drawback is the IIA assumption.

Second, the results suggest that the amount of premium charged is an important predictor of claim-type. Specifically, we found that higher premiums were associated with higher marginal probabilities of claiming with respect to "Injury" and "Own" damages, but the converse for "Property" damages. We found statistically and economically significant correlations between all three different types of claims; in particular, "Injury"–"Property" and "Own"–"Property" were negatively correlated, whilst "Injury"–"Own" were positively correlated. The latter suggests that those who claim for third party injury are also more likely to claim for own personal injuries and property damage.

Third, despite the finding of significant correlations between different claim-types, the MNL model produces qualitatively consistent predictions compared to those of the multivariate probit model. In the event of a claim, a policyholder is most likely to claim for third party property damages, followed by claims for own personal injuries and damages.

Finally, the adequacy of MNL as an appropriate model for claim-type was investigated under a controlled experiment where the true underlying DGP was known a priori. We substantiated the deviations in the predictions between the two models through an artificial setup in which three different specifications of the multivariate Normal covariance matrix were characteristic of the DGP: full, half and zero correlations. The results suggest that, even in ideal conditions under which the claim-types are uncorrelated, MNL is still a less-favored approximation to the "true" underlying outcome probabilities relative to the multivariate probit model when the MVP model is correct.

We have assumed in this paper that the MVP model specification is functionally correct and well specified and that our model contains the true underlying DGP in that all covariates which help predict claim-type have been accounted for. We have mentioned before the possible presence of heteroskedasticity, and have tested for heteroskedasticity under a univariate probit setting using the Davidson and MacKinnon (1984) LM2 test statistic. We suspect that a finding of heteroskedasticity in the univariate case carries forward to the multivariate case, so that the standard errors of our estimates may in fact be incorrect. However, to explicitly model the form of the heteroskedasticity is beyond the scope of this paper.

Moreover, due to limitations in the data which are exacerbated by a large number of missing observations for potentially important covariates, there is a possibility that the coefficient estimates may have been affected by omitted variable bias. Whilst this is unfortunate, the problem is primarily one of the data, over which we have no control. We have implicitly assumed that the data were missing at random. A possible area of future investigation would be to explicitly model this data imbalance.

Finally, we acknowledge that our results pertain specifically to the randomly chosen insurer. We did not pool historical claims data from multiple insurers so as to avoid the problems of heterogeneity induced by the pooling of data which may have come from different underlying DGPs. Nonetheless, the implication of our results is that, in the presence of potentially correlated responses, the MVP model is superior to the MNL model and thus should be preferred. In addition, there exist other alternatives to the multivariate probit model which are both flexible and computationally tractable, but which have not been considered here. An example is the class of generalized extreme-value (GEV) models, which assume that the unobservable components of the utility for all alternatives are jointly distributed as a generalized extreme value. More importantly, models within this class are able to capture sources of correlation between outcomes and are therefore not restrained by the IIA property. This is an interesting area for future research.
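
As a rough sketch of what a GEV model looks like, the standard construction described in McFadden (1981) and Train (2003) can be written as below. The symbols are notation assumed here purely for illustration and do not appear elsewhere in the paper: $V_j$ is the systematic utility of alternative $j$, $\varepsilon_j$ its unobservable component, $G$ a generating function that is homogeneous of degree one, $B_k$ a nest of alternatives and $\lambda_k$ the corresponding nest parameter.

```latex
% Joint distribution of the unobservable utility components
\[
  F(\varepsilon_1,\dots,\varepsilon_J)
  = \exp\bigl\{-G\bigl(e^{-\varepsilon_1},\dots,e^{-\varepsilon_J}\bigr)\bigr\}
\]

% Resulting outcome probabilities, with y_j = e^{V_j}
\[
  P_i = \frac{y_i\,\partial G/\partial y_i}{G(y_1,\dots,y_J)},
  \qquad y_j = e^{V_j}
\]

% Special case 1: MNL (no correlation, IIA holds)
\[
  G(y) = \sum_{j=1}^{J} y_j
  \quad\Longrightarrow\quad
  P_i = \frac{e^{V_i}}{\sum_{j=1}^{J} e^{V_j}}
\]

% Special case 2: nested logit (correlation within each nest B_k)
\[
  G(y) = \sum_{k=1}^{K} \Bigl(\sum_{j \in B_k} y_j^{1/\lambda_k}\Bigr)^{\lambda_k},
  \qquad 0 < \lambda_k \le 1
\]
```

With $\lambda_k = 1$ for every nest, the nested form collapses to the MNL model and IIA holds throughout. Values of $\lambda_k$ below one induce positive correlation among the unobservable components within nest $B_k$, so that IIA no longer holds across nests. Whether such a nesting structure would suit the claim-type data has not been examined here.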

Acknowledgments

The authors wish to thank the anonymous referee and the associate editor (Roger J.A. Laeven) for their comments and suggestions which helped in improving this version of the paper. Gary Young would like to thank Denzil Fiebig (University of New South Wales, Australia) for the helpful comments on the first draft of the paper. The research of Robert Kohn was partially supported by the Australian Research Council through the Discovery Grant DP0667069.

References

Amemiya, T., 1985. Advanced Econometrics. Basil Blackwell, Oxford, UK.

Ashford, J., Sowden, R., 1970. Multivariate probit analysis. Biometrics 26 (3), 535–546.

Balia, S., Jones, A.M., Mortality, lifestyle and socio-economic status. University of York. Working Paper. Dated October 2004.

Bolduc, D., 1992. Generalized autoregressive errors in the multinomial probit model. Transportation Research B 26B (2), 155–170.

Bunch, D.S., 1991. Estimability in the multinomial probit model. Transportation Research B 25B (1), 1–12.

Cappellari, L., Jenkins, S.P., 2003. Multivariate probit regression using simulated maximum likelihood. The Stata Journal 3, 278–294.

Cappellari, L., Jenkins, S.P., 2006. Calculation of multivariate normal probabilities by simulation, with applications to maximum simulated likelihood estimation. ISER Working Paper 2006-16. University of Essex, Colchester.

Daganzo, C., 1979. Multinomial Probit: The Theory and its Application to Demand Forecasting. Academic Press, NY.

Davidson, R., MacKinnon, J.G., 1984. Convenient specification tests for logit and probit. Journal of Econometrics 25, 241–262.

Gibbons, R.D., Wilcox-Gök, V., 1998. Health service utilization and insurance coverage: A multivariate probit approach. Journal of the American Statistical Association 93 (441), 63–72.

Greene, W.H., 2003. Econometric Analysis, 5th edition. Prentice Hall, New Jersey.

Hausman, J.A., Wise, D.A., 1978. A conditional probit model for qualitative choice: Discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46 (2), 403–426.

Keane, M.P., 1992. A note on identification in the multinomial probit model. Journal of Business & Economic Statistics 10 (2), 193–200.

Maddala, G.S., 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge University Press, New York, NY.

McFadden, D., 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers of Econometrics. Academic Press, NY, pp. 105–142.

McFadden, D., 1981. Econometric models of probabilistic choice. In: Manski, C., McFadden, D. (Eds.), Structural Analysis of Discrete Data with Econometric Applications. MIT Press, Cambridge, MA, pp. 198–272.

Pinquet, J., 1998. Designing optimal Bonus-Malus systems from different types of claims. ASTIN Bulletin 28 (2), 205–220.

Saunders, A., Boudoukh, J., Allen, L., 2003. Understanding Market, Credit, and Operational Risk: The Value-At-Risk Approach. Blackwell Publishing, Oxford.

Small, K.A., Hsiao, C., 1985. Multinomial logit specification tests. International Economic Review 26, 619–627.

Train, K.E., 2003. Discrete Choice Models with Simulation. Cambridge University Press, Cambridge. Pre-print version available from http://elsa.berkeley.edu/books/choice2.html.

Valdez, E.A., Frees, E.W., 2005. Longitudinal modeling of Singapore motor insurance. University of New South Wales and the University of Wisconsin-Madison. Working Paper. Dated 28 December 2005. Available from: http://wwwdocs.fce.unsw.edu.au/actuarial/research/papers/2006/Valdez-Frees-2005.pdf.

Weeks, M., 1997. The multinomial probit model revisited: A discussion of parameter estimability, identification and specification testing. Journal of Economic Surveys 11 (3), 297–320.
