North American Actuarial Journal, 23(1), 33-63, 2019
ISSN: 1092-0277 (Print), 2325-0453 (Online)
DOI: 10.1080/10920277.2018.1504686
Published online: 06 Feb 2019. © 2019 Society of Actuaries.
Journal homepage: https://www.tandfonline.com/loi/uaaj20

Robust Actuarial Risk Analysis

Jose Blanchet,1 Henry Lam,2 Qihe Tang,3,* and Zhongyi Yuan4

1 Department of Management Science and Engineering, Stanford University, Stanford, California
2 Department of Industrial Engineering and Operations Research, Columbia University, New York, New York
3 Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa
4 Department of Risk Management, Pennsylvania State University, University Park, Pennsylvania

This article investigates techniques for the assessment of model error in the context of insurance risk analysis. The methodology is based on finding robust estimates for actuarial quantities of interest, which are obtained by solving optimization problems over the unknown probabilistic models, with constraints capturing potential nonparametric misspecification of the true model. We demonstrate the solution techniques and the interpretations of these optimization problems, and illustrate several examples, including calculating loss probabilities and conditional value-at-risk.

1. INTRODUCTION

This article studies a methodology to quantify the impact of model assumptions that are uncertain or possibly incorrect in actuarial risk analysis. The motivation of our study is that, in many situations, coming up with a highly accurate model is challenging. This could be due to a lack of data, e.g., in describing a low-probability event, or to modeling issues, for example, in addressing a hidden and potentially sophisticated dependence structure among risk factors. Oftentimes, actuaries and statisticians resort to models and calibration procedures that capture stylized features based on experience or expert knowledge, but that bear the risk of deviating too much from reality and thus leading to suboptimal decision making. The methodology studied in this article aims to quantify the errors incurred by these approaches, to an extent that we shall describe.

On a high level, the methodology we study in this article has the following characteristics:

(1) Its starting point is a baseline model that, for good reasons (e.g., a balance of fidelity and tractability), is currently employed by the actuary.

(2) The method employs optimization theory to find a bound for the underlying risk metric of interest, where the optimization is imposed nonparametrically over all probabilistic models in a neighborhood of the baseline model.

(3) Typically the worst-case model (i.e., the optimal solution obtained from the optimization procedure) can be written in terms of the baseline model. So the proposed procedure can be understood as a correction to allow for the possibility of model misspecification.

The key idea of the methodology lies in a suitable formulation of the optimization problem in (2) that leads to (3). This is an optimization cast over the space of models. This formulation has a feasible region described by a neighborhood of the baseline model, the latter believed to be close to a "true" or correct one, thanks to the expertise of the actuary. However, despite being in the vicinity of the baseline, the location of the true model is unknown. The neighborhood that defines the feasible region, roughly speaking, serves to provide a more accessible indication of where the true model is. When this neighborhood happens to contain the true model, the optimization problem, namely a maximization or a minimization of the risk metric, will output an upper or a lower bound for the corresponding true value.

Thus, instead of outputting a risk metric value obtained from a single baseline model, the bound from our optimization can be viewed as a worst-case estimate over a collection of models that likely includes the truth. In this sense, the bound is robust to model misspecification (to the extent of the neighborhood we impose). Our approach can also be viewed as a robustification of traditional scenario analysis or stress testing. Rather than testing the impact on the risk metric due to specific shocks (by running the model at different scenarios or parameter values, for example), our bound is capable of capturing the impact on the risk metric due to any perturbation of the model in the much bigger nonparametric space. We will illustrate these interpretations in several examples such as calculating loss probabilities and conditional value-at-risk.

Address correspondence to Henry Lam, 500 West 120th Street, New York, NY 10027. E-mail: [email protected]
*During the review process of this paper, Qihe Tang was under a transition from the University of Iowa to UNSW Sydney.
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/uaaj.

The approach and the formulations that we discuss in this article have roots in areas such as economics, operations research, and statistics. The notion of optimization over models appears in the context of decision making under ambiguity; that is, when parts of the underlying model are uncertain, the decision maker resorts to optimizing decisions over the worst-case scenario, resulting in a minimax problem where the maximization is over the set of uncertain models. In the case of probabilistic models, this machinery involves an optimization over distributions, which is the framework we study in this article. In stochastic control, such an approach has been used in deriving best control policies under ambiguous transition distributions (Petersen et al. 2000; Nilim and El Ghaoui 2005; Iyengar 2005). In economics, the work of the two Nobel laureates L. P. Hansen and T. J. Sargent (Hansen and Sargent 2008) studies optimal decision making and its macroeconomic implications under ambiguity. Similar ideas have been used in quantitative finance, including portfolio optimization (Glasserman and Xu 2013, 2014), as well as in physics and biology applications (Arampatzis et al. 2015a,b). In operations research, optimization over probabilistic models dates back to Scarf et al. (1958) in the context of inventory management, and has been used in Bertsimas and Popescu (2005), Birge and Dulá (1991), and Smith (1995) when some moment information is assumed known. The literature on distributionally robust optimization, which has been increasingly active in recent years, for example, Delage and Ye (2010), Goh and Sim (2010), Wiesemann et al. (2014), Jiang and Guan (2016), and Natarajan et al. (2008), studies reformulations and efficient algorithms to handle uncertainty in probabilistic assumptions in various stochastic optimization settings. The specific type of constraint we study, namely a neighborhood defined via a statistical distance, has been used in Ben-Tal et al. (2013), Love and Bayraksan (2015), Lam (2016b, 2018), and Hu and Hong (2013) under the umbrella of so-called φ-divergences, which in particular include the Kullback-Leibler divergence, or relative entropy, that we employ heavily. Other distances of recent interest include, for example, the Wasserstein distance (Esfahani and Kuhn 2018; Blanchet and Murthy 2016b). In the context of extreme risks relevant to actuarial applications, Blanchet and Murthy (2016a), Atar et al. (2015), Blanchet et al. (2014), and Dey and Juneja (2012) study the use of distances, for example, the Rényi divergence, to capture model uncertainty in the tail, and Lam and Mottet (2017) and Mottet and Lam (2017) study the incorporation of tail shape information, along with the shape-constrained distributional optimization literature (Popescu 2005; Hanasusanto et al. 2017; Li et al. 2017; Van Parys et al. 2017). In the multivariate setting, Embrechts and Puccetti (2006), Puccetti and Rüschendorf (2013), Wang and Wang (2011), Dhara et al. (2017), Puccetti and Rüschendorf (2012), and Embrechts et al. (2013) study analytical solutions and computational methods for bounding worst-case copulas. Cox et al. (2013) investigates the use of moments and entropy maximization in portfolio selection to hedge downside risks of a mortality portfolio. Lastly, the calibration of the neighborhood size in distributionally robust optimization has been investigated via the viewpoints of, for example, hypothesis testing (e.g., Gupta 2015), empirical likelihood and profile inference (e.g., Wang et al. 2015; Lam and Zhou 2017; Blanchet and Kang 2016; Duchi et al. 2016; Lam 2016a; Gotoh et al. 2018), and Bayesian methods (e.g., Bertsimas et al. 2018). For consistency, throughout this article we will adopt the terminology from the operations research literature and call our methodology distributionally robust optimization.

Our contribution in this article is to bring the combination of the ideas in the above areas to the attention of the actuarial community. In addition, given that risk assessment is of special importance for actuaries, we also discuss the implications of the distributionally robust methodology in the setting of tail events beyond the past literature. We will illustrate our modeling approach and interpretations, how it can be used in actuarial problems, and solution methods employing a mix of analytical tools, simulation, and convex optimization. We choose to present examples that are relatively simple for pedagogical purposes, but our goal is to convince the reader that the proposed methodology is substantially general.

2. BASIC DISTRIBUTIONALLY ROBUST PROBLEM FORMULATION

Suppose that, in a generic application, one is interested in computing a performance measure in the form of an expected value of a function of underlying risk factors, that is, $E_{\text{true}}(h(X))$, where $X$ is a random variable (or random vector) taking values in $\mathbb{R}^d$, and $h : \mathbb{R}^d \to \mathbb{R}$ is a performance function. The notation $E_{\text{true}}(\cdot)$ denotes the expectation operator associated with an underlying "true" or correct probabilistic model, which is unknown to the actuary.

To quantify model error, we propose, in the most basic form, to use the following pair of maximization and minimizationproblems, which we call the basic distributionally robust (BDR) problem formulation:

$$\min/\max \quad E(h(X)) \qquad \text{s.t.} \quad D(P \,\|\, P_0) \le \delta, \tag{1}$$


where
$$D(P \,\|\, P_0) = E_P\left[\log\left(\frac{dP}{dP_0}(X)\right)\right] = \int \log\left(\frac{dP}{dP_0}(x)\right)\,dP(x) \tag{2}$$
is the so-called Kullback-Leibler (KL) divergence (Kullback and Leibler 1951) between $P$ and $P_0$, with the integral in (2) taken over the region in which $X$ takes its values, and $dP/dP_0$ is the likelihood ratio between the probability models $P$ and $P_0$. Optimization (1) has decision variable $P$, and the expectation in its objective function is with respect to $P$. Finally, $\delta$ should be suitably calibrated (chosen as small as possible) to guarantee that
$$D(P_{\text{true}} \,\|\, P_0) \le \delta. \tag{3}$$

The KL divergence defined in (2) plays an important role in information theory and statistics, with connections to concepts such as entropy (Cover and Thomas 2012) and Fisher information (Cox and Hinkley 1979); it is also known as the relative entropy, whose properties have been substantially studied. The KL divergence measures the discrepancy between two probability distributions, in the sense that $D(P \,\|\, P_0) = 0$ if and only if $P$ is identical to $P_0$ (also see the discussion below). Thus, the constraint in (1) can be viewed as a neighborhood around $P_0$ in the space of models, where the size of the neighborhood is measured by the KL divergence.

The rationale for selecting $\delta$ as described above is the following. First, by choosing $\delta$ so that inequality (3) holds, we guarantee that $P_{\text{true}}$ is in the feasible region of (1). This translates, in turn, into the interval obtained from the minimum and maximum values of (1) containing the true performance measure $E_{\text{true}}(h(X))$. Second, $\delta$ is chosen as small as possible because the obtained interval is then the shortest, signaling a higher accuracy of our estimate.

The probability space of $X$ described above can be substantially more general, including infinite-dimensional objects, such as when $X$ denotes a stochastic process like Brownian motion. However, to avoid technicalities, we will confine ourselves to the framework $X \in \mathbb{R}^d$. In fact, to facilitate the discussion further, we first discuss the case where $P$ is discrete with finite support. In this case, (1) can be written as

$$\begin{aligned}
\min/\max \quad & \sum_{k=1}^{M} p(k)\,h(k) \\
\text{s.t.} \quad & D(p \,\|\, p_0) = \sum_{k=1}^{M} p(k)\log\left(\frac{p(k)}{p_0(k)}\right) \le \delta, \\
& \sum_{k=1}^{M} p(k) = 1, \quad p(k) \ge 0 \text{ for } k \ge 1,
\end{aligned} \tag{4}$$

where the decision variable is the probability mass function $\{p(k) : k = 1, \ldots, M\}$ for some support size $M$, and $\{p_0(k) : k = 1, \ldots, M\}$ is the baseline probability mass function.

Let us briefly discuss two key properties of the KL divergence that make formulation (4) appealing. First, by Jensen's inequality, we have that

$$D(p \,\|\, p_0) = -\sum_{k=1}^{M} p(k)\log\left(\frac{p_0(k)}{p(k)}\right) \ge -\log\left(\sum_{k=1}^{M} p(k)\,\frac{p_0(k)}{p(k)}\right) = 0,$$

and $D(p \,\|\, p_0) = 0$ if and only if $p(k) = p_0(k)$ for all $k$. In other words, $D(\cdot \,\|\, \cdot)$ allows us to compare the discrepancies between any two models, and the models agree if and only if there is no discrepancy. Moreover, in principle, the space of decision variables can include any distribution of the form $\{p(k) : 1 \le k \le M\}$, without confining to any parametric form. Thus the framework is capable of capturing model misspecification in a nonparametric manner.

It should be noted that $D(\cdot \,\|\, \cdot)$ is not a distance in the mathematical sense, because it is not symmetric and does not obey the triangle inequality. However, and as the second key property of the KL divergence, $D(p \,\|\, p_0)$ is a convex function of $p(\cdot)$. To see this, first note


that the function $\ell(x) = x\log x$ is convex on $(0, \infty)$. Second, observe that, if $\{p(k) : 1 \le k \le M\}$ and $\{q(k) : 1 \le k \le M\}$ are two probability distributions and $\alpha \in (0, 1)$, then, because of Jensen's inequality, for any fixed $k$,

$$\ell\left(\frac{\alpha p(k) + (1 - \alpha)q(k)}{p_0(k)}\right) \le \alpha\,\ell\left(\frac{p(k)}{p_0(k)}\right) + (1 - \alpha)\,\ell\left(\frac{q(k)}{p_0(k)}\right).$$

Thus, we conclude

$$\begin{aligned}
D(\alpha p + (1 - \alpha)q \,\|\, p_0) &= \sum_{k=1}^{M} p_0(k)\,\ell\left(\frac{\alpha p(k) + (1 - \alpha)q(k)}{p_0(k)}\right) \\
&\le \sum_{k=1}^{M}\left[\alpha\,p_0(k)\,\ell\left(\frac{p(k)}{p_0(k)}\right) + (1 - \alpha)\,p_0(k)\,\ell\left(\frac{q(k)}{p_0(k)}\right)\right] = \alpha\,D(p \,\|\, p_0) + (1 - \alpha)\,D(q \,\|\, p_0).
\end{aligned}$$

Consequently, the BDR problem (4) is a convex optimization problem with a linear objective function. These types of problems have been well studied and are computationally tractable using operations research tools.
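To make the tractability concrete, here is a minimal sketch of solving (4) with an off-the-shelf convex solver (cvxpy); the uniform baseline pmf, the payoff $h(k) = \exp(-0.015k)$, and the budget $\delta = 1$ are hypothetical stand-ins, not the article's data. It relies on the fact that, once both pmfs sum to one, summing cvxpy's elementwise kl_div terms yields exactly $D(p \,\|\, p_0)$.

```python
import cvxpy as cp
import numpy as np

M = 100
p0 = np.full(M, 1.0 / M)                  # hypothetical baseline pmf on k = 1,...,M
h = np.exp(-0.015 * np.arange(1, M + 1))  # hypothetical payoff h(k)
delta = 1.0                               # hypothetical KL budget

p = cp.Variable(M, nonneg=True)
# cp.kl_div(p, p0) = p*log(p/p0) - p + p0 elementwise; since both pmfs sum
# to one, the extra terms cancel and the sum equals D(p || p0) in (4)
constraints = [cp.sum(p) == 1, cp.sum(cp.kl_div(p, p0)) <= delta]

upper = cp.Problem(cp.Maximize(p @ h), constraints).solve()
lower = cp.Problem(cp.Minimize(p @ h), constraints).solve()
print(upper, lower)                       # worst- and best-case values of E(h(X))
```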

Later in the article we will discuss the implications of the KL divergence as a notion of discrepancy and propose other notions and constraints. For now, we argue that there are direct variations of the BDR problem formulation that can be easily accommodated within the same convex optimization framework. For this discussion, it helps to think of $p(k)$ as the mortality distribution at each age (so $k = 1, \ldots, 100$, for instance). Suppose that an actuary is less confident in the estimate for $p_{\text{true}}(k)$ for small values of $k$; that is, assume that $p_0(k) \approx p_{\text{true}}(k)$ for large values of $k$, but the actuary is uncertain about how similar $p_0(k)$ is to $p_{\text{true}}(k)$ for small values of $k$ (this arises if the mortality estimate is more credible for some age groups, for instance). Then we can replace the first inequality constraint in (4) by introducing an increasing weighting function $\{w(k) : 1 \le k \le M\}$ with $w(k) > 0$, thus obtaining

$$\sum_{k=1}^{M} w(k)\,p(k)\log\left(\frac{p(k)}{p_0(k)}\right) \le \delta. \tag{5}$$

To understand why $w(\cdot)$ should be increasing in order to account for model errors from misspecifying $p_0(k)$ for small values of $k$, consider the following exemplary case. Suppose for some $k_0 > 0$ we have (1) $w(k) = \epsilon > 0$ for $k \le k_0$, with $\epsilon$ small, and (2) $w(k) = 1$ for $k > k_0$. Then observe that the constraint (5) is relatively insensitive to the value of $p(k)$ for $k \le k_0$. Therefore, the convex optimization program will have more freedom for the $p(k)$ on $k \le k_0$ to jitter and improve the objective function without having a significant impact on feasibility.

Another variation of the BDR formulation includes moment constraints. For instance, suppose additional information is known about the expected time-until-death for individuals who have a particular underlying medical condition and are at least 30 years old (from a series of medical studies, for example). Using such information one might impose a constraint of the form $E(X \mid X \ge 30) \in [a_-, a_+]$ for a specified range $[a_-, a_+]$, or equivalently $E(X\,I(X \ge 30)) \in [P(X \ge 30)\,a_-,\; P(X \ge 30)\,a_+]$ (throughout the article, we use $I(A)$ to denote the indicator of $A$, i.e., $I(A) = 1$ if $A$ occurs and $I(A) = 0$ if not). Using this information, one can add the constraints

$$a_-\sum_{k=30}^{M} p(k) \;\le\; \sum_{k=30}^{M} k\,p(k) \;\le\; a_+\sum_{k=30}^{M} p(k), \tag{6}$$

which are linear inequalities, and therefore the resulting optimization problem is still convex and tractable. In Section 9 we will continue discussing other constraints that can inform the BDR formulation with alternate forms of expert knowledge.
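For instance, under hypothetical bounds $a_- = 55$ and $a_+ = 65$, constraint (6) enters the convex-programming sketch above as two extra linear inequalities; the following standalone snippet repeats the setup for completeness:

```python
import cvxpy as cp
import numpy as np

M = 100
p0 = np.full(M, 1.0 / M)                  # hypothetical baseline pmf, as above
h = np.exp(-0.015 * np.arange(1, M + 1))  # hypothetical payoff h(k)
k_vals = np.arange(1, M + 1)
idx = np.where(k_vals >= 30)[0]           # support points with k >= 30
a_lo, a_hi = 55.0, 65.0                   # hypothetical bounds on E(X | X >= 30)

p = cp.Variable(M, nonneg=True)
tail_mass = cp.sum(p[idx])                # sum over k >= 30 of p(k)
tail_mean = p[idx] @ k_vals[idx]          # sum over k >= 30 of k p(k)
constraints = [
    cp.sum(p) == 1,
    cp.sum(cp.kl_div(p, p0)) <= 1.0,      # KL neighborhood, as before
    a_lo * tail_mass <= tail_mean,        # left inequality in (6)
    tail_mean <= a_hi * tail_mass,        # right inequality in (6)
]
upper = cp.Problem(cp.Maximize(p @ h), constraints).solve()
```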

At this point, several questions might be in order: How do we solve the BDR optimization problem and its variations? How can we understand such a solution intuitively? What is the role of the constraint in (1)? How do we choose $\delta$? How do we extend the methodology to deal with possibly multidimensional distributions? Our goal is to address these questions throughout the rest of this article.


3. SOLVING THE BDR FORMULATION

This section describes how to solve (4) and then transitions toward the more general problem (1) in an intuitive way. We concentrate on the problem of maximization; the minimization counterpart is analogous, and we will summarize the differences at the end of our discussion.

3.1. The Maximization Form

We introduce Lagrange multipliers to solve the convex optimization problem (4). The Lagrangian takes the form

$$g\left(p(1), \ldots, p(M), \lambda_1, \lambda_2\right) = \sum_{k=1}^{M} p(k)\,h(k) - \lambda_1\left(\sum_{k=1}^{M} p(k)\log\left(\frac{p(k)}{p_0(k)}\right) - \delta\right) - \lambda_2\left(\sum_{k=1}^{M} p(k) - 1\right).$$

The Karush-Kuhn-Tucker (KKT) conditions (Boyd and Vandenberghe 2004) in our (convex optimization) setting characterize an optimal solution. Denoting an optimal solution for (4) as $\{p_+(k) : 1 \le k \le M\}$, and the corresponding Lagrange multipliers as $\lambda_1^+$ and $\lambda_2^+$, the KKT conditions are as follows (we use "+" to denote an optimal solution for the maximization formulation, as opposed to "$-$" for the minimization counterpart, which we shall discuss momentarily as well):

$$\frac{\partial g}{\partial p(k)}\left(p_+(1), \ldots, p_+(M), \lambda_1^+, \lambda_2^+\right) = h(k) - \lambda_1^+\left(1 + \log\left(\frac{p_+(k)}{p_0(k)}\right)\right) - \lambda_2^+ \le 0, \quad k = 1, \ldots, M, \tag{7}$$

$$p_+(k)\,\frac{\partial g}{\partial p(k)}\left(p_+(1), \ldots, p_+(M), \lambda_1^+, \lambda_2^+\right) = p_+(k)\left[h(k) - \lambda_1^+\left(1 + \log\left(\frac{p_+(k)}{p_0(k)}\right)\right) - \lambda_2^+\right] = 0, \quad k = 1, \ldots, M, \tag{8}$$

$$\sum_{k=1}^{M} p_+(k)\log\left(\frac{p_+(k)}{p_0(k)}\right) \le \delta, \qquad \sum_{k=1}^{M} p_+(k) = 1, \qquad p_+(k) \ge 0 \text{ for } 1 \le k \le M, \tag{9}$$

$$\lambda_1^+ \ge 0, \qquad \lambda_2^+ \text{ is free}, \tag{10}$$

$$\lambda_1^+\left(\sum_{k=1}^{M} p_+(k)\log\left(\frac{p_+(k)}{p_0(k)}\right) - \delta\right) = 0. \tag{11}$$

The relations (7) and (8) correspond to the so-called stationarity conditions, (9) the primal feasibility, (10) the dual feasibility,and (11) the complementary slackness condition.

Define
$$\mathcal{M}_+ = \operatorname{argmax}\left\{h(k) : 1 \le k \le M\right\};$$
that is, $\mathcal{M}_+$ is the index set on which $h(\cdot)$ achieves its maximum value. Also, denote by $h^* = \max\{h(k) : 1 \le k \le M\}$ the maximum value of $h(\cdot)$. We analyze the solution by dividing it into two cases, depending on whether $\log\left(1/P_0(X \in \mathcal{M}_+)\right) \le \delta$.

Case 1: $\log\left(1/P_0(X \in \mathcal{M}_+)\right) \le \delta$.

Consider
$$p_+(k) = \frac{p_0(k)\,I(k \in \mathcal{M}_+)}{\sum_{j \in \mathcal{M}_+} p_0(j)} = P_0\left(X = k \mid X \in \mathcal{M}_+\right) \tag{12}$$
for $k \in \mathcal{M}_+$, and $p_+(k) = 0$ otherwise; that is, $p_+(\cdot)$ is the conditional distribution of $X$ given $X \in \mathcal{M}_+$. Note that this choice of $p_+(\cdot)$ gives


$$\sum_{k=1}^{M} p_+(k)\log\left(\frac{p_+(k)}{p_0(k)}\right) = \log\left(\frac{1}{P_0(X \in \mathcal{M}_+)}\right) \le \delta,$$
so that (9) holds. Moreover, choosing $\lambda_1^+ = 0$ and $\lambda_2^+ = h^*$, we have that (7), (8), (10), and (11) all hold. Thus, (12) is an optimal solution in this case.

Case 2: $\log\left(1/P_0(X \in \mathcal{M}_+)\right) > \delta$.

Consider setting
$$h(k) - \lambda_1^+\left(1 + \log\left(\frac{p_+(k)}{p_0(k)}\right)\right) - \lambda_2^+ = 0, \quad k = 1, \ldots, M,$$
giving $p_+(k) = p_0(k)\exp\left((h(k) - \lambda_1^+ - \lambda_2^+)/\lambda_1^+\right)$. Denoting $\theta_+ = 1/\lambda_1^+$ and $\psi_+ = \lambda_2^+/\lambda_1^+ + 1$, we can write
$$p_+(k) = p_0(k)\exp\left(\theta_+ h(k) - \psi_+\right). \tag{13}$$

To satisfy $\sum_{k=1}^{M} p_+(k) = 1$ in (9), we must have $\psi_+ = \log\sum_{k=1}^{M} p_0(k)\exp(\theta_+ h(k))$; that is, $\psi_+$ is the logarithmic moment generating function of $h(X)$ under $p_0(\cdot)$, evaluated at $\theta_+$. Now, to satisfy (11), we enforce $\theta_+$ to be the positive root of the equation

$$\sum_{k=1}^{M} p_+(k)\log\left(\frac{p_+(k)}{p_0(k)}\right) = \theta_+\,\frac{\sum_{k=1}^{M} p_0(k)\exp(\theta_+ h(k))\,h(k)}{\sum_{k=1}^{M} p_0(k)\exp(\theta_+ h(k))} - \log\left(\sum_{k=1}^{M} p_0(k)\exp(\theta_+ h(k))\right) = \delta, \tag{14}$$

where the first equality is obtained by plugging in the expression of $p_+(\cdot)$ in (13). We argue that such a positive root must exist. Note that the left-hand side of (14) is continuous and increasing in $\theta_+$ (continuity is immediate, and monotonicity can be verified by somewhat tedious but elementary differentiation). When $\theta_+ \to 0$ in (14), the left-hand side goes to 0. When $\theta_+ \to \infty$, it becomes

$$\theta_+ h^* + O(\exp(-c\,\theta_+)) - \left(\theta_+ h^* + \log\sum_{k \in \mathcal{M}_+} p_0(k) + O(\exp(-c\,\theta_+))\right) = -\log\sum_{k \in \mathcal{M}_+} p_0(k) + O(\exp(-c\,\theta_+))$$

for some constant $c > 0$, by singling out the dominant exponential factor $\exp(\theta_+ h^*)$ and noticing a relative exponential decay in the remaining terms in the expression in (14). But note that $-\log\sum_{k \in \mathcal{M}_+} p_0(k) + O(\exp(-c\,\theta_+)) > \delta$ as $\theta_+ \to \infty$ in our considered case. Thus we must have a positive root for (14). Consequently, one can verify straightforwardly that the choice of $p_+(\cdot)$ in (13), with $\theta_+ > 0$ solving (14), satisfies all of (7), (8), (9), (10), and (11).

We comment that Case 1 can be regarded as a degenerate case that rarely arises in practice, while Case 2 is the more important case to consider. From (13) in Case 2, $p_+(\cdot)$ can be interpreted as a member of a "natural exponential family," also known as an "exponential tilting" distribution, which arises often in statistics, large deviations analysis, and importance sampling in Monte Carlo simulation. On the other hand, the form of $p_+(\cdot)$ in (12) in Case 1 can be interpreted as the limit of (13) as $\theta_+ \to \infty$, namely

$$p_+(k) = \lim_{\theta \to \infty} \frac{p_0(k)\exp(\theta h(k))}{\sum_{j=1}^{M} p_0(j)\exp(\theta h(j))} = \frac{p_0(k)\,I(k \in \mathcal{M}_+)}{\sum_{j \in \mathcal{M}_+} p_0(j)} = P_0\left(X = k \mid X \in \mathcal{M}_+\right).$$
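In practice, the Case 2 solution reduces to one-dimensional root-finding on (14). The following is a minimal sketch, assuming Case 2 applies; the uniform baseline pmf and the payoff $h(k) = \exp(-0.015k)$ are hypothetical stand-ins for the mortality-table example of Section 3.4 below.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import logsumexp

def bdr_max(p0, h, delta):
    """Worst-case pmf and value for the BDR maximization (4) in Case 2,
    via the exponential tilting (13) and root-finding on (14).
    Assumes log(1/P0(M+)) > delta; for the minimization form of
    Section 3.2, pass -h and negate the returned optimal value."""
    logp0 = np.log(p0)

    def tilted(theta):
        psi = logsumexp(theta * h + logp0)   # log-mgf psi(theta) in (13)
        return np.exp(theta * h + logp0 - psi), psi

    def kl_gap(theta):                       # LHS of (14) minus delta
        p, psi = tilted(theta)
        return theta * (p @ h) - psi - delta

    hi = 1.0
    while kl_gap(hi) < 0:                    # bracket the positive root
        hi *= 2.0
    theta_plus = brentq(kl_gap, 1e-12, hi)
    p_plus, _ = tilted(theta_plus)
    return p_plus, p_plus @ h

# Usage on a stand-in for the whole-life example of Section 3.4
# (uniform baseline instead of the IRS mortality table, r = 0.015):
p0 = np.full(100, 0.01)
h = np.exp(-0.015 * np.arange(1, 101))
p_plus, upper_bound = bdr_max(p0, h, delta=1.0)
```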

3.2. The Minimization Form

The minimization form of the BDR problem is analogous to the maximization one. In this case, the Lagrangian takes the form

$$g\left(p(1), \ldots, p(M), \lambda_1, \lambda_2\right) = \sum_{k=1}^{M} p(k)\,h(k) + \lambda_1\left(\sum_{k=1}^{M} p(k)\log\left(\frac{p(k)}{p_0(k)}\right) - \delta\right) + \lambda_2\left(\sum_{k=1}^{M} p(k) - 1\right).$$


With the corresponding KKT conditions and a similar discussion as before, we arrive at two analogous cases. Here, we define
$$\mathcal{M}_- = \operatorname{argmin}\left\{h(k) : 1 \le k \le M\right\}.$$

Case 1: $\log\left(1/P_0(X \in \mathcal{M}_-)\right) \le \delta$.

An optimal solution is given by
$$p_-(k) = P_0\left(X = k \mid X \in \mathcal{M}_-\right).$$

Case 2: $\log\left(1/P_0(X \in \mathcal{M}_-)\right) > \delta$.

An optimal solution is given by
$$p_-(k) = p_0(k)\exp\left(-\theta_- h(k) - \psi(\theta_-)\right), \qquad \psi(\theta_-) = \log\sum_{k=1}^{M} p_0(k)\exp(-\theta_- h(k)),$$
with $\theta_- > 0$ satisfying
$$-\theta_-\,\frac{\sum_{k=1}^{M} p_0(k)\exp(-\theta_- h(k))\,h(k)}{\sum_{k=1}^{M} p_0(k)\exp(-\theta_- h(k))} - \log\left(\sum_{k=1}^{M} p_0(k)\exp(-\theta_- h(k))\right) = \delta.$$

3.3. Solutions for the General Formulation

The insights from the solutions obtained in Sections 3.1 and 3.2 can be extended to the general BDR formulation (1). First, denoting by $E_0(\cdot)$ the expectation under the baseline model $P_0(\cdot)$, (1) can be written equivalently as optimizing over all random variables $Z(\omega)$, where $\omega$ is an element of the underlying outcome space $\Omega$, in the form

$$\begin{aligned}
\max \quad & E_0(h(X)\,Z) \\
\text{s.t.} \quad & E_0(Z\log Z) \le \delta, \\
& Z(\omega) \ge 0 \text{ for all } \omega, \quad E_0(Z) = 1.
\end{aligned} \tag{15}$$

The object $Z(\omega)$ is the likelihood ratio between $P$ and $P_0$, which is well defined because the KL divergence is defined only for $P$ absolutely continuous with respect to $P_0$. The solution to (15), denoted by $(Z_+(\omega) : \omega \in \Omega)$, can be used to obtain the solution $P_+(\cdot)$ to (1), by defining $P_+(\cdot)$ via
$$P_+(X \in A) = E_0\left(Z_+\,I(X \in A)\right).$$

Conceptually, the general problem formulation is not much different from the finite outcome case (4). It can be shown (e.g., Petersen et al. 2000; Breuer and Csiszár 2016) that exactly the same form of the optimal solution applies. For instance, if $f_0(\cdot)$ is the density of $X$ in $\mathbb{R}^d$, then
$$Z_+ = \frac{f_+(X)}{f_0(X)} = \exp\left(\theta_+ h(X) - \psi(\theta_+)\right), \qquad \psi(\theta_+) = \log E_0\left(\exp(\theta_+ h(X))\right),$$
with
$$\theta_+\,\frac{E_0\left(\exp(\theta_+ h(X))\,h(X)\right)}{E_0\left(\exp(\theta_+ h(X))\right)} - \log\left(E_0\left(\exp(\theta_+ h(X))\right)\right) = \delta$$


in the case $\log\left(1/P_0(X \in \mathcal{M}_+)\right) > \delta$, where
$$\mathcal{M}_+ = \operatorname{argmax}\left\{h(x) : x \in \mathbb{R}^d\right\}.$$
However, if $\log\left(1/P_0(X \in \mathcal{M}_+)\right) \le \delta$, then the optimal solution is a probability distribution given by
$$P_+(X \in A) = P_0\left(X \in A \mid X \in \mathcal{M}_+\right).$$

3.4. Numerical Example of the BDR Formulation

Let us assume that $X$ is the time-until-death of an individual entering a whole life insurance contract that pays a $1 benefit to the beneficiaries at the time of death. Assume that the force of interest (continuously compounded) is given by $r > 0$. The net present value of the benefit paid is then $h(X) = \exp(-rX)$, and we are interested in its expected value, $E_{\text{true}}(h(X)) = E_{\text{true}}(\exp(-rX))$. A more realistic example might consider stochastic interest rates or a portfolio of different types of contracts, but here we stay with this simple example for clarity of illustration.

Consider the more concrete setting where $X$ takes values only on the set $\{1, \ldots, 100\}$, the benefit is paid in integer years, and the individual in consideration is 20 years old, so that the individual could die at ages 21, 22, ..., 120. Now, imagine a situation in which we know that the individual has a medical condition about which not much is known, so there might be substantial variability in our estimate of the distribution of $X$. Or perhaps there is a significant likelihood of medical advances in the next 10 years or so, and therefore, if $p_0(\cdot)$ were calibrated based on historical data, $p_0(\cdot)$ might simply be an incorrect model.

To illustrate the BDR formulation (4) numerically, we set the baseline distribution $p_0(k)$ from the recent static mortality table under the Internal Revenue Code (Internal Revenue Service 2016, Appendix, "unisex" column), and consider the estimation of the worst-case expected payoff when the true mortality distribution deviates from $p_0(k)$ as described by (4).

Figure 1 shows the maximum and minimum values of (4) under different $\delta$, when the force of interest is set at $r = 0.015$. The maximum value increases with $\delta$, while the minimum value decreases with $\delta$, both at a decreasing rate. The effects on the maximum values seem to be stronger than on the minimum values when the distribution changes from the baseline, as shown by a larger magnitude of the upward movement as $\delta$ increases. Note that the bounds, especially the upper one, grow quite wide as $\delta$ increases. This indicates that, with no other information, the worst-case impact of model uncertainty can be quite significant. This impact can be reduced if one adds auxiliary information, either from an additional data source or from expert knowledge (e.g., the moment equalities or inequalities in (6) and in the subsequent Section 9).

FIGURE 1. Robust Estimates Against $\delta$.

Figures 2, 3, and 4 show the shapes of the distributions giving rise to the maximum and minimum expected payoffs, compared with the baseline distribution, at $\delta = 1$, 2, and 5, respectively. We can see a sharp concentration of mass close to 0 for the maximal distribution, as a higher chance of sooner death leads to a larger expected present value of payoff. In contrast, more mass is located toward older ages for the minimal distribution, leading to a reduction in the expected present value of payoff. As $\delta$ increases, the accumulation of mass at 0 for the maximal distribution becomes heavier, as does the shift of mass to older ages for the minimal distribution. The mass shift for the maximal distribution seems to be more dramatic than for the minimal one, which is consistent with Figure 1, which shows a wider bound for the maximization problem.

4. DISTRIBUTIONALLY ROBUST ANALYSIS OF DEPENDENCE

We consider a variation of the BDR formulation that is suitable for quantifying the impact of dependence. We revisit the example of whole-life insurance in Section 3.4 as an illustration, but now we assume that a couple is interested in a contract that pays a $1 benefit at the time of the first death between the couple. In this case, the payoff equals $h(X, Y) = \exp(-r\min(X, Y))$, where $X$ and $Y$ are the times-until-death of the individual and his or her spouse, respectively. We assume that both spouses are 20 years old at the time of the risk premium valuation, so that the pair $(X, Y)$ is supported on the set $\{1, \ldots, 100\} \times \{1, \ldots, 100\}$.

We are interested in the potential impact on the estimated actuarial net present value of the benefits due to unaccounted-for dependence structures. Suppose that, this time, enough information is available to estimate the distributions of $X$ and $Y$ marginally with relatively high accuracy, but the joint distribution is difficult to estimate. That is, the marginals are known to be
$$P_{\text{true}}(Y = j) = \sum_{i=1}^{100} P_0(X = i, Y = j), \qquad P_{\text{true}}(X = i) = \sum_{j=1}^{100} P_0(X = i, Y = j).$$

FIGURE 2. Optimal Probability Mass Function at $\delta = 1$.


A joint baseline model $p_0(i,j) = P_0(X = i, Y = j)$ can be conjectured (e.g., using an independence assumption across the marginals). To capture the above type of uncertainty, the distributionally robust (maximization) formulation is given by

$$\begin{aligned}
\max \quad & \sum_{i=1}^{M}\sum_{j=1}^{M} p(i,j)\,h(i,j) \\
\text{s.t.} \quad & \sum_{i=1}^{M}\sum_{j=1}^{M} p(i,j)\log\left(\frac{p(i,j)}{p_0(i,j)}\right) \le \delta, \\
& \sum_{j=1}^{M} p(i,j) = P_0(X = i) \text{ for all } i, \qquad \sum_{i=1}^{M} p(i,j) = P_0(Y = j) \text{ for all } j, \\
& p(i,j) \ge 0 \text{ for all } i, j,
\end{aligned} \tag{16}$$

where we write (16) in a more general fashion that outputs the worst-case expected value $E[h(X, Y)]$ over $P(X, Y)$ supported on $\{(i,j) : 1 \le i \le M, 1 \le j \le M\}$. The first constraint is a KL divergence ball around the baseline $p_0(i,j)$, as in (1). The second set of constraints stipulates that the marginal distributions are known to be $P_0(X = i)$ and $P_0(Y = j)$. In the joint whole-life insurance example, $M$ would be set to 100.

We discuss how to solve (16), concentrating on the case analogous to Case 2 analyzed in the BDR formulation in Section 3.1 (i.e., assuming the neighborhood size $\delta$ is not too big, which is the more interesting case).

It is worth mentioning that $p = p_0$ is a feasible solution, and therefore we can verify the Slater condition (see Boyd and Vandenberghe 2004). This condition guarantees that strong duality holds and that the KKT conditions are necessary and sufficient for optimality. Consequently, commercial packages can be used to solve a convex optimization problem like (16) very quickly. Here, we will describe an iterative procedure based on exponential tilting, similar to that for problem (4). In particular, an argument analogous to the use of the stationarity KKT condition in Case 2 there leads to an optimal solution of the form

$$p_+(i,j) = P_0(X = i, Y = j)\exp\left(\theta_+ h(i,j) - a_{\theta_+}(i) - b_{\theta_+}(j)\right),$$

where

$$\sum_{j}\exp\left(\theta_+ h(i,j) - b_{\theta_+}(j)\right)P_0(X = i, Y = j) = \exp\left(a_{\theta_+}(i)\right)P_0(X = i), \tag{17}$$

$$\sum_{i}\exp\left(\theta_+ h(i,j) - a_{\theta_+}(i)\right)P_0(X = i, Y = j) = \exp\left(b_{\theta_+}(j)\right)P_0(Y = j). \tag{18}$$

FIGURE 3. Optimal Probability Mass Function at $\delta = 2$.

Equations (17) and (18) guarantee that the marginal distributions match the known values. The form above motivates the following iterative scheme (see Deming and Stephan 1940; Glasserman and Yang 2018): Given a $\theta$, define
$$p_\theta(i,j) = \frac{\exp(\theta h(i,j))\,p_0(i,j)}{\sum_k\sum_l \exp(\theta h(k,l))\,p_0(k,l)}.$$

Then iteratively update $p_\theta(i,j)$ so that at each step, for each $i, j = 1, \ldots, M$, we replace $p_\theta(i,j)$ with
$$\frac{p_\theta(i,j)\,P_0(X = i)}{\sum_l p_\theta(i,l)}$$
and then, for each $i, j = 1, \ldots, M$, replace $p_\theta(i,j)$ with
$$\frac{p_\theta(i,j)\,P_0(Y = j)}{\sum_k p_\theta(k,j)}$$
until convergence. The iteration ensures that the marginal distributions of the resulting probability mass function equal $P_0(X = i)$, $i = 1, \ldots, M$, and $P_0(Y = j)$, $j = 1, \ldots, M$, respectively.

FIGURE 4. Optimal Probability Mass Function at $\delta = 5$.

Supposing that, for any given $\theta$, $p_\theta(\cdot, \cdot)$ is obtained in the way described above, we solve
$$\min_{\theta}\; \sum_{i=1}^{M}\sum_{j=1}^{M} p_\theta(i,j)\,h(i,j) - \frac{1}{\theta}\left(\sum_{i=1}^{M}\sum_{j=1}^{M} p_\theta(i,j)\log\left(\frac{p_\theta(i,j)}{p_0(i,j)}\right) - \delta\right)$$
to get $\theta_+$. An optimal mass function is then $p_+(i,j) = p_{\theta_+}(i,j)$.
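The alternating rescaling above is the iterative proportional fitting of Deming and Stephan (1940) applied to the exponentially tilted baseline. Below is a minimal sketch of the inner step, assuming $p_0$ is entrywise positive; the outer one-dimensional search over $\theta$ (e.g., via scipy.optimize.minimize_scalar) is omitted.

```python
import numpy as np

def tilted_coupling(p0, h, theta, tol=1e-10, max_iter=100_000):
    """Compute p_theta for the dependence problem (16): exponentially tilt the
    baseline joint pmf by exp(theta*h), then alternately rescale rows and
    columns until both marginals match those of p0 (iterative proportional
    fitting). p0 and h are M x M arrays; p0 is assumed entrywise positive."""
    px = p0.sum(axis=1)                      # baseline marginal P0(X = i)
    py = p0.sum(axis=0)                      # baseline marginal P0(Y = j)
    p = p0 * np.exp(theta * h)
    p /= p.sum()
    for _ in range(max_iter):
        p *= (px / p.sum(axis=1))[:, None]   # match the X-marginal
        p *= (py / p.sum(axis=0))[None, :]   # match the Y-marginal
        if np.abs(p.sum(axis=1) - px).max() < tol:
            break
    return p
```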

4.1. Numerical Example on Joint Mortality

We illustrate numerically the solution to optimization (16). Consider, as discussed above, the estimation of the expected payoff $h(X, Y) = \exp(-r\min(X, Y))$ in the described setting. We set the baseline joint distribution $p_0(i,j)$ from the same mortality table used in Section 3.4, but now assume independence of the times-until-death of the couple, i.e., $p_0(i,j) = P_0(X = i)\,P_0(Y = j)$. This independent baseline model describes the simplest assumption made by insurers when pricing life products, and we are interested in a robust estimate of the expected payoff when this assumption is violated.

Figure 5 shows the worst-scenario values of (16) under different values of $\delta$, setting $r = 0.015$. As depicted, there is a concavely increasing trend for the worst-case expected payoff as the level of dependency deviates from independence. While the estimate is 0.0729 according to the independent model, it can potentially rise to 0.0734 when the true model is misspecified from independence to a level of $\delta = 0.002$. Note that the magnitude of the changes in the worst-case estimates is significantly less dramatic than in the example in Section 3.4. This is because here we have injected complete marginal information, and the model is allowed to deviate only in terms of dependence structure. From the small magnitude of the change, we can see that the marginal information is an important piece that substantially reduces the freedom of deviation.

Figures 6 and 7 further show the approximate surfaces of the optimal joint probability mass function at $\delta = 0.013$ and the original independent joint probability mass function, respectively. Their differences are more clearly visualized in Figure 8. First, the maximum difference of the mass functions is close to 0.01, which is quite small. This reconciles with the observation in Figure 5 that the deviations of the models are significantly confined due to the known marginal information. Next, in terms of the shape of the differences, the worst-case distribution falls below the baseline in a region centered around (70, 70). This seems to be the region where the baseline probability mass function peaks. The worst-case distribution spreads out this peak and allocates more mass to other regions to achieve a higher objective function. The positive difference seems clustered around two regions and can be interpreted as a consequence of this spreading. Note that the differences must integrate to zero, so both positive and negative differences must be present.

FIGURE 5. Robust Estimate Against $\delta$.


5. DISTRIBUTIONALLY ROBUST RARE-EVENT ANALYSIS

Consider $h(X) = I(X \in B)$ for some set $B$, where we recall that $I(\cdot)$ is the indicator function. We are interested in the probability $P_{\text{true}}(X \in B) = E_{\text{true}}[I(X \in B)]$, which is known to be small. We wish to solve the BDR maximization problem

$$\begin{aligned}
\max \quad & P(X \in B) \\
\text{s.t.} \quad & D(P \,\|\, P_0) \le \delta,
\end{aligned} \tag{19}$$

where a baseline distribution $P_0$ of $X$ has been suitably calibrated. Because we are in a rare-event setting, it is reasonable to assume that $\log\left(1/P_0(X \in B)\right) > \delta$.

FIGURE 6. Optimal Probability Mass Function at $\delta = 0.013$.

FIGURE 7. Baseline Probability Mass Function.


Therefore, applying the general solution form of the BDR maximization problem in Section 3.3, a worst-case distribution is given by $P_+(\cdot)$, defined such that, for each set $A$,

$$P_+(X \in A) = \frac{E_0\left(\exp(\theta_+ I(X \in B))\,I(X \in A)\right)}{E_0\left(\exp(\theta_+ I(X \in B))\right)} = \frac{\exp(\theta_+)\,P_0(X \in A, X \in B)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} + \frac{P_0(X \in A, X \notin B)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} \tag{20}$$

for some $\theta_+ \in (0, \infty)$. To obtain a clearer interpretation of this worst-case distribution, let us write

$$\begin{aligned}
P_+(X \in A) &= \frac{\left(\exp(\theta_+) - 1\right)P_0(X \in A, X \in B)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} + \frac{P_0(X \in A)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} \\
&= \alpha(\theta_+)\,P_0(X \in A \mid X \in B) + \left(1 - \alpha(\theta_+)\right)P_0(X \in A),
\end{aligned}$$

where

$$\alpha(\theta_+) = \frac{\left(\exp(\theta_+) - 1\right)P_0(X \in B)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} > 0$$

and $\theta_+ > 0$ satisfies

$$\theta_+\,\frac{\exp(\theta_+)\,P_0(X \in B)}{\left(\exp(\theta_+) - 1\right)P_0(X \in B) + 1} - \log\left(\left(\exp(\theta_+) - 1\right)P_0(X \in B) + 1\right) = \delta. \tag{21}$$

Consequently, in simple words, in the context of distributionally robust analysis of rare-event probabilities, the worst-case measure is a (specific) mixture between the conditional distribution of $X$ given $X \in B$ and the unconditional distribution, both under the baseline measure (regardless of how it was calibrated). This allows the actuary to easily obtain bounds on rare-event probabilities with the BDR formulation, using only the baseline measure.

FIGURE 8. Difference between Optimal and Baseline Probability Mass Functions.


Moreover, note that the worst-scenario value is given by
$$P_+(X \in B) = \frac{\exp(\theta_+)\,P_0(X \in B)}{\exp(\theta_+)\,P_0(X \in B) + P_0(X \notin B)} = \frac{\exp(\theta_+)\,P_0(X \in B)}{1 + \left(\exp(\theta_+) - 1\right)P_0(X \in B)}. \tag{22}$$
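Because (21) and (22) depend on the model only through the scalar $P_0(X \in B)$, the worst-case value is a one-dimensional root-finding exercise. A minimal sketch, assuming Case 2 (i.e., $\log(1/P_0(X \in B)) > \delta$); the inputs are hypothetical:

```python
import numpy as np
from scipy.optimize import brentq

def worst_case_prob(p_B, delta):
    """Worst-case rare-event probability (22), with theta+ solving (21);
    p_B = P0(X in B). Assumes log(1/p_B) > delta (Case 2)."""
    def eq21(theta):
        m = (np.exp(theta) - 1.0) * p_B + 1.0   # = exp(theta)P0(B) + P0(B^c)
        return theta * np.exp(theta) * p_B / m - np.log(m) - delta
    hi = 1.0
    while eq21(hi) < 0:                          # bracket the positive root
        hi *= 2.0
    theta = brentq(eq21, 1e-12, hi)
    return np.exp(theta) * p_B / (1.0 + (np.exp(theta) - 1.0) * p_B)

print(worst_case_prob(0.005, 0.1))  # roughly 0.06 for these hypothetical inputs
```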

Now, we investigate the asymptotic behavior of the worst-scenario value (22) when we fix $\delta$ and let $P_0(X \in B) \to 0$. This asymptotic regime, with a fixed model uncertainty level $\delta$ while the probability of interest is close to 0, is relevant from an applied standpoint in rare-event analysis. It models the situation in which we have limited data, or postulate a specific imperfect parametric model $P_0$, yet we want to estimate probabilities that are very small.

To perform the asymptotic analysis of $P_+(X \in B)$, we first need to understand the behavior of the $\theta_+$ that satisfies (21). We carry out a heuristic development, which can be made rigorous by the argument in Theorem 1 of Blanchet et al. (2014). Suppose $\delta > 0$ is fixed and $P_0(X \in B) \to 0$. Let us postulate that $\theta_+\exp(\theta_+)\,P_0(X \in B) = \eta$ for some $\eta > 0$ that remains bounded away from zero and infinity as $P_0(X \in B) \to 0$. This implies that $\theta_+ \to \infty$ as $P_0(X \in B) \to 0$, and therefore $\exp(\theta_+)\,P_0(X \in B) \to 0$ as well. Consequently, $(\exp(\theta_+) - 1)\,P_0(X \in B) \to 0$ and the left-hand side of (21) is asymptotically equivalent to $\theta_+\exp(\theta_+)\,P_0(X \in B)$, so that $\theta_+\exp(\theta_+)\,P_0(X \in B) \sim \delta$ as $P_0(X \in B) \to 0$. This implies $\theta_+ \sim \log\left(1/P_0(X \in B)\right)$. In turn, from (22), we get that
$$P_+(X \in B) \sim \frac{\delta}{\theta_+} \sim \frac{\delta}{\log\left(1/P_0(X \in B)\right)}.$$

To understand the decision-making implications, suppose that $X$ models the potential losses of an insurance company, and that the

baseline loss probability is $P_0(X > t) = \exp(-\Lambda_0(t))$ for some function $\Lambda_0(\cdot)$; that is, under $P_0$, $X$ has cumulative hazard function $\Lambda_0(\cdot)$. Now, say we wish to compute $b$ so that $P_{\text{true}}(X > b) \le 0.005$. Suppose that $\delta = 0.1$, but the model $P_{\text{true}}$ is not known. Then, based on the worst-case distribution of our robust analysis above, we must choose $b$ so that
$$P_+(X > b) \sim \frac{\delta}{\Lambda_0(b)} = \frac{0.1}{\Lambda_0(b)} \le 0.005, \tag{23}$$
which implies that $b \approx \Lambda_0^{-1}(20)$. For example, if $X$ is exponentially distributed with unit mean under $P_0$, so that $\Lambda_0(b) = b$, then this gives $b \approx 20$. In contrast, if one trusts the exponential model fully, then one would choose $\exp(-b) = 0.005$, which yields $b \approx 5.3$.

Paraphrasing: if $b$ is the value of the statutory solvency capital needed to withstand losses with probability at least 0.995, and if the actuary uses an exponential model with unit mean, then a deviation of 0.1 units, measured in KL divergence, between the assumed (exponential) model and the unknown reality might lead to underestimating the statutory capital by a factor of about $20/5.3 \approx 3.7$.

Another important message is that the BDR formulation might provide very conservative estimates for rare-event probabilities when $\delta$ (i.e., the size of the uncertainty) is large relative to the rare-event probability of interest. Suppose again that $X$ is exponentially distributed with unit mean under $P_0$; then (23) indicates that $P_+(X > b) \sim 0.1/b$, giving a Pareto-type tail behavior under $P_+$. In Section 9, we will discuss other types of distances that can give less conservative estimates for rare-event probabilities.

5.1. Numerical Example for Rare-Event Probability

Consider calculating the loss probability $P_{\text{true}}(L > b)$ for a given reserve level $b$, where the loss model is of the simple form $L = X_1 + \cdots + X_n$. Under the baseline model $P_0$, the $X_i$'s follow a multivariate Gaussian distribution, that is, $(X_1, \ldots, X_n) \sim N(\mu, \Sigma)$ for some mean vector $\mu$ and covariance matrix $\Sigma$, with specific distributional parameters given momentarily. Consequently,

$$P_0(L > b) = \bar{\Phi}\left(\frac{b - \mathbf{1}'\mu}{\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}\right),$$
where $\mathbf{1}$ is a vector with all entries equal to one, $'$ denotes transpose, and $\bar{\Phi}(\cdot)$ is the tail distribution function of a standard Gaussian random variable.


Now suppose that the actuary is uncertain about whether $(X_1, \ldots, X_n)$ follows a Gaussian model. So we formulate the corresponding BDR problem (19) with the objective function being $P(L > b)$. We first check whether $\log\left(1/P_0(L > b)\right) \le \delta$, analogous to Case 1 in Section 3.1. If this is the case, then the worst-case density of $L$ is

$$f_+(x) = \frac{\phi\left((x - \mathbf{1}'\mu)/\sqrt{\mathbf{1}'\Sigma\mathbf{1}}\right)}{P_0(L > b)\,\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}\,I(x > b), \tag{24}$$
where $\phi(\cdot)$ is the density of a standard Gaussian random variable, and $P_+(L > b) = 1$. To see this, note that $h(l) = I(l > b)$, and (24) is precisely the conditional density of $L$ given that $L > b$ under the model $P_0$, which is the solution depicted in Case 1 in Section 3.1.

Otherwise, if $\log\left(1/P_0(L > b)\right) > \delta$, then the worst-case density is given by
$$f_+(x) = \begin{cases}
\dfrac{\exp(\theta_+)\,\phi\left((x - \mathbf{1}'\mu)/\sqrt{\mathbf{1}'\Sigma\mathbf{1}}\right)\big/\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}{\exp(\theta_+)\,P_0(L > b) + P_0(L \le b)} & \text{for } x > b, \\[2ex]
\dfrac{\phi\left((x - \mathbf{1}'\mu)/\sqrt{\mathbf{1}'\Sigma\mathbf{1}}\right)\big/\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}{\exp(\theta_+)\,P_0(L > b) + P_0(L \le b)} & \text{for } x \le b,
\end{cases} \tag{25}$$

where $\theta_+ > 0$ is the root of the equation
$$\theta_+\,\frac{\exp(\theta_+)\,P_0(L > b)}{\exp(\theta_+)\,P_0(L > b) + P_0(L \le b)} - \log\left(\exp(\theta_+)\,P_0(L > b) + P_0(L \le b)\right) = \delta.$$

These are obtained from (20) and (21), or analogously from Case 2 in Section 3.1. Alternatively, using the interpretation of the associated exponential tilting discussed in Section 3.1, we have
$$f_+(x) \propto \phi\left(\frac{x - \mathbf{1}'\mu}{\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}\right)\exp\left(\theta_+ h(x)\right) \propto \phi\left(\frac{x - \mathbf{1}'\mu}{\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}\right)\exp(\theta_+)\,I(x > b) + \phi\left(\frac{x - \mathbf{1}'\mu}{\sqrt{\mathbf{1}'\Sigma\mathbf{1}}}\right)I(x \le b),$$

which coincides with (25). Observe that the worst-case density is a weighted version of the original one, thus largely preserving its shape in the corresponding regions. It assigns more (uniform) weight to the region where $x > b$, in a way that is as large as possible while preserving the validity of the KL constraint and the integrate-to-one constraint.

We illustrate the above numerically for $n = 5$, with means of the $X_i$'s being $\mu_1 = 1.92$, $\mu_2 = 1.42$, $\mu_3 = 1.13$, $\mu_4 = 1.80$, $\mu_5 = 1.54$ (five numbers generated uniformly between 1 and 2) and covariance matrix
$$\Sigma = \begin{pmatrix}
1.11 & -0.21 & -0.42 & -0.72 & 1.30 \\
-0.21 & 4.82 & 0.89 & -0.31 & -1.50 \\
-0.42 & 0.89 & 1.05 & 0.61 & -0.52 \\
-0.72 & -0.31 & 0.61 & 2.58 & -0.45 \\
1.30 & -1.50 & -0.52 & -0.45 & 1.91
\end{pmatrix},$$

which is randomly generated from a Wishart distribution with 5 degrees of freedom and an identity scale matrix. Setting $b = 10$, the probability $P_0(L > b)$ is given by 0.23. Figure 9 shows the worst-case density under $\delta = 0.1$ together with the baseline density. Note that the worst-case density puts more mass to the right of $b$, and less to its left, in order to boost the likelihood of a big loss. In doing so, a spike occurs exactly at $b$. This spike arises because the adjustment leading to the worst-case density attempts to maintain the shape of the distribution as much as possible, so that in each of the regions below and above $b$ the density is multiplied by a constant weight. Finally, Figure 10 shows, much like Figure 1, a concave growth pattern for the worst-case probability as $\delta$ increases.

FIGURE 9. Approximate Worst-case Density under $\delta = 0.1$.
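As a sanity check on the stated numbers, the baseline tail probability and the tilting parameter can be computed directly from the closed forms above; a minimal sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

mu = np.array([1.92, 1.42, 1.13, 1.80, 1.54])
Sigma = np.array([
    [ 1.11, -0.21, -0.42, -0.72,  1.30],
    [-0.21,  4.82,  0.89, -0.31, -1.50],
    [-0.42,  0.89,  1.05,  0.61, -0.52],
    [-0.72, -0.31,  0.61,  2.58, -0.45],
    [ 1.30, -1.50, -0.52, -0.45,  1.91],
])
b, delta = 10.0, 0.1

one = np.ones(5)
# P0(L > b) from the Gaussian closed form; approximately 0.23, as in the text
p_b = norm.sf(b, loc=one @ mu, scale=np.sqrt(one @ Sigma @ one))

def eq_theta(theta):
    # root condition for theta+ stated after (25)
    m = np.exp(theta) * p_b + (1.0 - p_b)
    return theta * np.exp(theta) * p_b / m - np.log(m) - delta

theta_plus = brentq(eq_theta, 1e-12, 50.0)
worst = np.exp(theta_plus) * p_b / (np.exp(theta_plus) * p_b + (1.0 - p_b))
print(p_b, theta_plus, worst)
```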

6. THE FEASIBLE REGION IN THE BDR FORMULATION

This section discusses the selection of the parameter $\delta > 0$. We present two main approaches: one is estimation using historical data; the other is to understand the choice of $\delta$ in terms of systematic stress testing. In the second approach, we will also establish the connection with conventional stress-testing approaches and parametric sensitivity analysis.

6.1. Data-Driven Estimation

For simplicity, we assume that the true model has a density $f_{\text{true}}(\cdot)$ and the baseline model has a density $f_0(\cdot)$. The most direct approach is to estimate

$$D(P_{\text{true}} \,\|\, P_0) = \int \log\left(\frac{f_{\text{true}}(x)}{f_0(x)}\right)f_{\text{true}}(x)\,dx. \tag{26}$$

Some proposed procedures that guarantee convergence can be found in, for example, Nguyen et al. (2010) and Póczos and Schneider (2011). The problem with this approach, however, is that the rate of convergence of the estimator depends on the dimension of the underlying density. Intuitively, this is because the approach indirectly needs to estimate the density nonparametrically, which has a similar dependence on the dimension. An alternative approach that is less conservative is based on empirical likelihood, which we discuss next.

To explain, first suppose that we observe $X_1, \ldots, X_n$ as an independent and identically distributed (i.i.d.) sample of $X$ and form the empirical probability mass function
$$\mu_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i = x),$$
which is uniform on the set $\{X_1, \ldots, X_n\}$. Now, for any given set of weights $w = (w_1, \ldots, w_n)$ such that $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$, define
$$\nu_n(x, w) = \sum_{i=1}^{n} w_i\,I(X_i = x).$$

FIGURE 10. Worst-case Probability Against $\delta$.

In simple words, $\nu_n(\cdot, w)$ is a probability mass function that assigns probability $w_i$ to the value $X_i$. Then consider the so-called profile log-likelihood function (Owen 2001)
$$R_n(c) = \min\left\{ D\left(\nu_n(\cdot, w) \,\|\, \mu_n(\cdot)\right) : E_{\nu_n(\cdot, w)}(h(X)) = \sum_{x \in \{X_1, \ldots, X_n\}} \nu_n(x, w)\,h(x) = \sum_{i=1}^{n} w_i\,h(X_i) = c \right\}.$$

It turns out, under the null hypothesis that $E_{\text{true}}(h(X)) = c$ and other mild conditions (including $\mathrm{Var}_{\text{true}}(h(X)) < \infty$), that we have
$$2n\,R_n(c) \Rightarrow \chi^2_1.$$

That is, under the null hypothesis, $2n\,R_n(c)$ approximately follows a chi-square distribution with one degree of freedom. This result is classical in the theory of empirical likelihood (Owen 2001). To make our discussion as self-contained as possible, we provide a formal derivation of the result in the Appendix.

The connection with the BDR problem formulation can be established as follows. Assume $P_0$ is built from the empirical distribution function of the observed data; that is, we use $\mu_n(\cdot)$ as the distribution of $X$ under the model $P_0$. If we knew the specific value $c = E_{\text{true}}(h(X))$, it would make sense to choose $\delta \ge R_n(c)$ to solve the BDR problem, because the worst-case distribution, $P^n_+$, of the corresponding empirical BDR problem, namely
$$\begin{aligned}
\max \quad & \sum_{x \in \{X_1, \ldots, X_n\}} \nu_n(x, w)\,h(x) \\
\text{s.t.} \quad & D\left(\nu_n(\cdot, w) \,\|\, \mu_n(\cdot)\right) \le \delta,
\end{aligned} \tag{27}$$

will yield a worst-scenario value $E_+(h(X)) = \sum_{k=1}^{n} w^+_k\,h(X_k) \ge c$. So, as $n \to \infty$, the BDR formulation will result in a correct upper bound for $E_{\text{true}}(h(X))$. Consequently, if one chooses $P_0$ based on the empirical distribution, it is reasonable to select $\delta$ so that

$$P\left(\delta > R_n(c)\right) \approx P\left(2n\delta > \chi^2_1\right) = 0.95, \tag{28}$$
thus providing a 95% confidence upper bound for $c$. In other words, we can set $\delta = \chi^2_{1, 0.95}/(2n)$, where $\chi^2_{1, 0.95}$ is the 95th percentile of $\chi^2_1$. This approach hinges on the availability of a sufficiently large sample size (of the same order as needed to invoke the central limit theorem).
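In code, this calibration is a one-liner (the sample size below is hypothetical):

```python
from scipy.stats import chi2

n = 500                                  # hypothetical sample size
delta = chi2.ppf(0.95, df=1) / (2 * n)   # chi^2_{1,.95}/(2n), about 3.84/(2n)
print(delta)                             # about 0.0038 for n = 500
```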

6.2. A Systematic Approach to Stress Testing

In the absence of enough data to perform a nonparametric calibration of $P_0$ and $\delta$ as suggested in the previous subsection, we suggest using the BDR formulation as an approach to perform systematic stress testing. The idea is to understand how the BDR optimization behaves as we vary $\delta$.

To motivate, consider the example of evaluating the expected shortfall, or the conditional value-at-risk (C-VaR), of a given portfolio of risk exposures. C-VaR is the conditional expected size of the deficit faced by the company, given the unlikely event of insufficiency of statutory solvency capital. According to Solvency II, the capital should be sufficient to withstand losses with probability not lower than 0.995 over a 1-year time horizon (European Parliament and the Council of the European Union 2009). To stress test C-VaR, it is customary to apply shocks to the system (i.e., stressing the system by assuming that extreme events occur). However, such shocks are typically selected in an arbitrary way. Moreover, in the presence of multiple shocks affecting different risk factors, it might be difficult to argue that a particular combination of shocks is more reasonable than an alternative combination.

In contrast, the approach that we study here can be used to stress test more systematically. Suppose that an insurance company has a given baseline model, $P_0$, which has been calibrated using a combination of past observations and expert knowledge. The model $P_0$ is used to compute a given risk measure (or performance measure), for example, C-VaR. Periodically, either by internal procedures or due to regulatory constraints, the company is requested to perform stress testing. The result of these stress tests is the determination of a capital requirement, which might typically be higher than the C-VaR obtained under $P_0$.

Let us assume that such a stress-testing procedure has occurred multiple times in the company, say $n$ times, using the customary approach described before. The company has thus built up certain experience of the increase in the capital requirement, relative to the C-VaR, determined by the stress-testing procedure. Based on this experience, one could calibrate values $\delta_1, \ldots, \delta_n$ so that the solution to the BDR formulation matches the increased capital requirement determined during the $n$ sessions of stress testing. Ultimately, by choosing a single $\delta$ appropriately to embed all these experiences, the BDR formulation automatically takes care of inducing the shocks that can potentially cause the highest damage.

To demonstrate in a simple setting how the above could work, suppose that the shocks used in past stress testing are introduced by changing the parameters in $P_0$, which is assumed to have a density lying in a parametric family, say $f_{\theta_0}$, where $\theta_0 = (\theta_0^1, \ldots, \theta_0^m)$ denotes a set of $m$ baseline parameters. In past stress testing, the company changes the value of some $\theta_0^j$ to $\theta_1^j$. Now, note that the KL divergence between $f_\theta$ and $f_{\theta_0}$, for any $\theta$ within a neighborhood of $\theta_0$ in the parameter space, is
$$\int \log\left(\frac{f_\theta(x)}{f_{\theta_0}(x)}\right)f_\theta(x)\,dx = E_\theta\left(\log f_\theta(X) - \log f_{\theta_0}(X)\right),$$
where $E_\theta(\cdot)$ denotes the expectation under $f_\theta$. By a Taylor series expansion, the above is approximately (see, e.g., Arampatzis et al. 2015a,b)
$$E_\theta\left(\log f_\theta(X) - \left(\log f_\theta(X) + \nabla\log f_\theta(X)'(\theta_0 - \theta) + \frac{1}{2}(\theta_0 - \theta)'\nabla^2\log f_\theta(X)(\theta_0 - \theta)\right)\right) = \frac{1}{2}(\theta_0 - \theta)'\,\mathcal{I}\,(\theta_0 - \theta),$$

where r log fh and r2 log fh denote the gradient and Hessian of log fh, and I ¼ �Ehðr2 log fhðXÞÞ is the Fisher informationmatrix. The last equality follows from the fact that the expected value of the so-called score function r log fh, that is,Ehðr log fhðXÞÞ, is 0. With this expression, suppose that, in a simple setting, the company in a past period changes hj0 to hj1with hj1�hj0 ¼ dj. This means that by selecting d ¼ ð1=2ÞI jðdjÞ2, where I j is the jth diagonal entry of I , the worst scenariovalue of our BDR formulation gives a valid bound on the changed performance measure of interest from such a parametricstress test. Suppose that this has been done many times, so that dj, j ¼ 1; :::;m, now denote the maximum perturbation differ-ence used for each of the m parameters. Then choosing d ¼ maxj¼1;:::;mð1=2ÞI jðdjÞ2 will give a robust stress test result thatincorporates all past experience.
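A small sketch of this rule (our illustration only: we take a Gaussian family, for which the Fisher information of the mean parameter is $1/\sigma^2$; the parameter values and perturbation sizes $d_j$ below are hypothetical):

```python
import numpy as np

# Hypothetical setting: each risk factor has baseline N(mu_j, sigma_j^2), and past
# stress tests shifted each mean mu_j by at most d_j.
sigma = np.array([0.92, 0.39, 0.11, 0.56, 0.33])  # baseline standard deviations
d = np.array([0.10, 0.05, 0.02, 0.08, 0.04])      # largest past mean shifts (assumed)

# Fisher information of the mean parameter of N(mu, sigma^2) is 1/sigma^2.
fisher_diag = 1.0 / sigma**2

# delta = max_j (1/2) * I_j * d_j^2 embeds all past stress-test experience.
delta = np.max(0.5 * fisher_diag * d**2)
print(delta)
```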

7. SIMULATION-BASED SOLUTION PROCEDURE

The optimization problems that we consider in the previous sections may sometimes be difficult to solve, in the sense that an optimal model $P_+$ is not expressible in closed form, especially in the context of general continuous distributions. For example, $X$ may comprise a complicated function or the convolution of many elementary variables. In this situation, one can resort to Monte Carlo simulation to approximate the optimization problem.

This approach is as follows. Assume that $P_0$ is not analytically tractable, or that we do not have access to a closed-form expression of a density of $X$ under $P_0$, but we can simulate an i.i.d. sample $X_1, \ldots, X_n$ of $X$ from $P_0$. Say we are interested in using BDR formulation (1) to bound $E_{true}(h(X))$. Using the re-expression (15), an empirical version of (15) takes the form given in (27), which we write explicitly here as

$$\max \; \sum_{i=1}^n h(X_i) w_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i \log(n w_i) \le \delta, \quad \sum_{i=1}^n w_i = 1, \quad w_i \ge 0 \text{ for all } 1 \le i \le n. \qquad (29)$$


We then proceed to solve (29) using the tools for the discrete mass case in Section 3.1. Formulation (29) can be viewed as a so-called sample average approximation (SAA) (see Shapiro et al. 2009; Kleywegt et al. 2002), whose general idea is to replace the expected value with the empirical average in an optimization problem. Formulation (29) bears some differences from conventional SAA because its decision variable also changes with the Monte Carlo sample (depending on where the support of the sample is). Glasserman and Xu (2014) and Glasserman and Yang (2018) use this approach to quantify model risk in some financial contexts.
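To make this concrete, here is a minimal sketch (ours, not from the article) of a solver for (29), using the exponential-tilting form of the optimal weights from Section 3.1 together with a bisection on the tilting parameter; the Exp(1) sample and $h(X) = X$ are placeholder choices:

```python
import numpy as np
from scipy.special import xlogy

def worst_case_weights(h_vals, delta):
    """Solve the empirical BDR problem (29): maximize sum_i w_i h(X_i) subject to
    sum_i w_i log(n w_i) <= delta, sum_i w_i = 1, w_i >= 0.  The optimal weights
    are an exponential tilting w_i ~ exp(theta * h(X_i)); bisection finds the
    theta >= 0 at which the divergence constraint becomes active."""
    n = len(h_vals)
    z = h_vals - h_vals.max()                     # shift for numerical stability

    def weights(theta):
        w = np.exp(theta * z)
        return w / w.sum()

    def divergence(theta):
        w = weights(theta)
        return xlogy(w, n * w).sum()              # sum_i w_i log(n w_i)

    hi = 1.0
    while divergence(hi) < delta and hi < 1e6:    # expand the bracket for theta
        hi *= 2
    lo = 0.0
    for _ in range(100):                          # bisection on theta
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if divergence(mid) < delta else (lo, mid)
    w = weights(0.5 * (lo + hi))
    return w, float(np.dot(w, h_vals))            # weights and worst-case value

# Hypothetical use: worst-case mean of h(X) = X for an Exp(1) baseline sample.
rng = np.random.default_rng(0)
w, bound = worst_case_weights(rng.exponential(size=1000), delta=0.1)
print(bound)
```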

7.1. Simulation-Based BDR Formulation: t-Copula Baseline Model

We illustrate the above simulation-based approach. Connecting with the example in Section 5.1, suppose now that the loss is given by $L = X_1 + \cdots + X_d$, where $(X_1, \ldots, X_d)$ has Gaussian marginal distributions $X_i \sim N(\mu_i, \sigma_i^2)$ and the dependency is modeled by a $t$-copula. A $t$-copula is a multivariate distribution denoted by

$$C_{t_{\nu,\Sigma}}(u_1, \ldots, u_d) = t_{\nu,\Sigma}\big(t_\nu^{-1}(u_1), \ldots, t_\nu^{-1}(u_d)\big), \quad (u_1, \ldots, u_d) \in (0,1)^d,$$

where $t_{\nu,\Sigma}$ is the joint distribution function of a $d$-dimensional $t$-distribution with degree of freedom $\nu$, mean 0, and dispersion (or scale) matrix $\Sigma = (\rho_{ij})$, which is positive definite, and $t_\nu^{-1}$ is the quantile function of a standard univariate $t$ distribution with degree of freedom $\nu$. Then the distribution function of $(X_1, \ldots, X_d)$ is

$$F(x_1, \ldots, x_d) = C_{t_{\nu,\Sigma}}\big(\Phi_{\mu_1,\sigma_1^2}(x_1), \ldots, \Phi_{\mu_d,\sigma_d^2}(x_d)\big),$$

where $\Phi_{\mu,\sigma^2}(\cdot)$ denotes the distribution function of $N(\mu, \sigma^2)$.

It is difficult to evaluate $P(L > b)$ in closed form, which motivates the use of Monte Carlo simulation. An unbiased estimate of $P_0(L > b)$ can be obtained by outputting $L = \Phi_{\mu_1,\sigma_1^2}^{-1}(t_\nu(Z_1)) + \cdots + \Phi_{\mu_d,\sigma_d^2}^{-1}(t_\nu(Z_d))$, where $(Z_1, \ldots, Z_d)$ is drawn from the multivariate $t$ distribution $t_{\nu,\Sigma}$. By repeating this $n$ times we get $L_1, \ldots, L_n$.

To solve the empirical BDR formulation, we use the sampled $L_1, \ldots, L_n$ and combine with our results in Section 5 as follows. We first check whether $\log(n/|\{j : L_j > b\}|) \le \delta$, where $|\{j : L_j > b\}|$ is the cardinality of the set $\{j : L_j > b\}$. If this is the case, we let

$$w_i^+ = \begin{cases} \dfrac{1}{|\{j : L_j > b\}|} & \text{for } i \text{ such that } L_i > b \\[2mm] 0 & \text{for } i \text{ such that } L_i \le b. \end{cases}$$

This is the approximate worst-case distribution for $L$, which gives a corresponding approximate worst scenario value 1.

FIGURE 11. Approximate Worst-case Density under $\delta = 0.1$.


Otherwise, if $\log(n/|\{j : L_j > b\}|) > \delta$, then output

$$w_i = \begin{cases} \dfrac{e^{\theta^+}}{e^{\theta^+}|\{i : L_i > b\}| + |\{i : L_i \le b\}|} & \text{for } i \text{ such that } L_i > b \\[2mm] \dfrac{1}{e^{\theta^+}|\{i : L_i > b\}| + |\{i : L_i \le b\}|} & \text{for } i \text{ such that } L_i \le b, \end{cases}$$

where $\theta^+ > 0$ satisfies

$$\theta^+ \frac{e^{\theta^+}|\{i : L_i > b\}|}{e^{\theta^+}|\{i : L_i > b\}| + |\{i : L_i \le b\}|} - \log\Big(\frac{1}{n}\big(e^{\theta^+}|\{i : L_i > b\}| + |\{i : L_i \le b\}|\big)\Big) = \delta.$$

The probability weights $(w_i)_{i=1,\ldots,n}$ on $(L_i)_{i=1,\ldots,n}$ form an approximation for a worst-case distribution for $L$, and

$$\frac{e^{\theta^+}|\{i : L_i > b\}|}{e^{\theta^+}|\{i : L_i > b\}| + |\{i : L_i \le b\}|}$$

is the approximate worst scenario value $P_+(L > b)$.

To illustrate numerically, we consider $d = 5$ and $(X_1, \ldots, X_5)$ each following a Gaussian marginal distribution with $\mu_1 = 2.20$, $\mu_2 = 2.73$, $\mu_3 = 2.73$, $\mu_4 = 2.42$, $\mu_5 = 2.27$ (five numbers generated uniformly between 2 and 3) and $\sigma_1 = 0.92$, $\sigma_2 = 0.39$, $\sigma_3 = 0.11$, $\sigma_4 = 0.56$, $\sigma_5 = 0.33$ (five numbers generated uniformly between 0 and 1), respectively. We use a $t$-copula with 10 degrees of freedom and the following dispersion matrix:

$$\Sigma = \begin{pmatrix} 8.19 & -0.92 & -3.54 & 3.35 & -3.96 \\ -0.92 & 6.09 & 1.07 & 1.37 & -0.08 \\ -3.54 & 1.07 & 7.10 & 1.35 & -1.70 \\ 3.35 & 1.37 & 1.35 & 3.14 & -3.73 \\ -3.96 & -0.08 & -1.70 & -3.73 & 8.73 \end{pmatrix},$$

which is generated from a Wishart distribution with 5 degrees of freedom and an identity scale matrix. We use $n = 1{,}000$ to generate the baseline sample from the $t$-copula model. Setting $b = 10$, the estimated loss probability is 0.232 with 95% confidence interval $[0.206, 0.258]$. The histograms of the optimally weighted sample at $\delta = 0.1$ and the baseline sample are plotted in Figures 11 and 12. As can be seen, more weight is put on the right side of $b$, in a uniform manner, to boost the large loss probability. We also plot the worst-case probability against $\delta$ in Figure 13, which, similar to the example in Section 5.1, shows a concavely increasing pattern.

FIGURE 12. Approximate Original Density.
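The following sketch reproduces this experiment end to end (our own reconstruction, so the random seed, and hence the exact numbers, will differ from those reported above; we also standardize each coordinate by its dispersion before applying $t_\nu$, so that the copula uniforms are exact):

```python
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(42)
mu = np.array([2.20, 2.73, 2.73, 2.42, 2.27])
sigma = np.array([0.92, 0.39, 0.11, 0.56, 0.33])
Sigma = np.array([[ 8.19, -0.92, -3.54,  3.35, -3.96],
                  [-0.92,  6.09,  1.07,  1.37, -0.08],
                  [-3.54,  1.07,  7.10,  1.35, -1.70],
                  [ 3.35,  1.37,  1.35,  3.14, -3.73],
                  [-3.96, -0.08, -1.70, -3.73,  8.73]])
nu, n, b, delta = 10, 1000, 10.0, 0.1

# Multivariate t draw: N(0, Sigma) scaled by sqrt(nu / chi2(nu)).
Z = rng.multivariate_normal(np.zeros(5), Sigma, size=n)
W = rng.chisquare(nu, size=n) / nu
T = Z / np.sqrt(W)[:, None]

# Copula transform: standardize each coordinate, map to uniforms via t_nu, then
# apply the Gaussian marginal quantiles; sum to get the losses L_1, ..., L_n.
U = t.cdf(T / np.sqrt(np.diag(Sigma)), df=nu)
L = (norm.ppf(U) * sigma + mu).sum(axis=1)

m = int(np.sum(L > b))                              # sampled losses exceeding b
if np.log(n / m) <= delta:
    p_plus = 1.0                                    # all mass can be pushed past b
else:
    def gap(theta):                                 # divergence minus budget delta
        denom = np.exp(theta) * m + (n - m)
        return theta * np.exp(theta) * m / denom - np.log(denom / n) - delta
    lo, hi = 0.0, 1.0
    while gap(hi) < 0:                              # bracket the root theta+
        hi *= 2
    for _ in range(100):                            # bisection for theta+
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
    theta = 0.5 * (lo + hi)
    p_plus = np.exp(theta) * m / (np.exp(theta) * m + (n - m))

print(m / n, p_plus)                                # baseline and worst-case P(L > b)
```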


8. ROBUST CONDITIONAL VALUE AT RISK

This section demonstrates how our distributionally robust analysis can be further developed to handle risk-analytic problems involving optimization. For this discussion, we focus on the problem of computing, for example, the 95% C-VaR of a random variable $L$ that represents potential losses in a given year, i.e.,

$$\text{C-VaR}_{true}(\alpha) = E_{true}(L - b \mid L > b),$$

where $P_{true}(L > b) = 1 - \alpha$ with $\alpha = 0.95$. It is well known (Rockafellar and Uryasev 2000) that if $L$ has a continuous distribution under $P_{true}$, then

$$\text{C-VaR}_{true}(\alpha) = \min_h \Big\{ h + \frac{E_{true}(\max(L - h, 0))}{1 - \alpha} \Big\}.$$

To robustify this calculation, we extend the BDR formulation discussed earlier in a natural way, to solve an upper and a lower bound for $\text{C-VaR}_{true}(\alpha)$ in the form

$$\min/\max_{D(P \| P_0) \le \delta} \; \min_h \Big\{ h + \frac{E(\max(L - h, 0))}{1 - \alpha} \Big\}, \qquad (30)$$

where the outer optimization problems are taken over probability models $P$.

Problem (30) is challenging to solve in closed form, so it is natural to consider SAA along the discussion in Section 7. To approximate

$$\text{C-VaR}_0(\alpha) = \min_h \Big\{ h + \frac{E_0(\max(L - h, 0))}{1 - \alpha} \Big\}$$

for a given baseline model $P_0$, we first simulate i.i.d. copies $L_1, \ldots, L_n$ under $P_0$ and then solve

$$\widehat{\text{C-VaR}}_0(\alpha; n) = \min_h \Big( h + \frac{1}{n} \sum_{i=1}^n \frac{\max(L_i - h, 0)}{1 - \alpha} \Big). \qquad (31)$$

FIGURE 13. Worst-case Probability Against $\delta$.

One property of this estimate, from the theory of SAA, is that under mild assumptions (e.g., if $L$ has a continuous density under $P_0$),

$$\widehat{\text{C-VaR}}_0(\alpha; n) \approx \text{C-VaR}_0(\alpha) + \frac{\hat{\sigma}(h_0^*(n))}{n^{1/2}}\, Z, \qquad (32)$$

where $Z$ is a standard Gaussian random variable,

$$\hat{\sigma}^2(h_0^*(n)) = \frac{1}{n-1} \sum_{j=1}^n \Big( h_0^*(n) + \frac{\max(L_j - h_0^*(n), 0)}{1 - \alpha} - \widehat{\text{C-VaR}}_0(\alpha; n) \Big)^2,$$

and $h_0^*(n)$ is the solution obtained by solving (31). In simple words, $\hat{\sigma}^2(h_0^*(n))$ is the empirical estimator of the variance of the objective function evaluated at the estimated optimal solution in (31).
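Both (31) and the interval implied by (32) are straightforward to compute; a minimal sketch (assuming SciPy, with a standard normal baseline as a placeholder) follows. The inner minimizer of (31) is attained at the empirical $\alpha$-quantile of the sample:

```python
import numpy as np
from scipy.stats import norm

def saa_cvar(L, alpha):
    """SAA estimate (31) and its standard error from (32): the minimizer h* is
    the empirical alpha-quantile, and the objective value is the point estimate."""
    h_star = np.quantile(L, alpha)
    vals = h_star + np.maximum(L - h_star, 0.0) / (1.0 - alpha)
    est = vals.mean()
    se = vals.std(ddof=1) / np.sqrt(len(L))   # sigma-hat(h*) / sqrt(n)
    return est, se

rng = np.random.default_rng(1)
L = rng.standard_normal(1000)                 # baseline P0: standard normal losses
est, se = saa_cvar(L, alpha=0.9)
z = norm.ppf(0.975)
print(est, (est - z * se, est + z * se))      # point estimate and 95% CI
```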

Approximating (30) involves adding an outer optimization problem. That is, we still keep $L_1, \ldots, L_n$ simulated under $P_0$, but we consider

$$\min/\max_{w_1,\ldots,w_n} \; \min_h \Big( h + \sum_{i=1}^n w_i \frac{\max(L_i - h, 0)}{1 - \alpha} \Big)$$
$$\text{s.t.} \quad \sum_{i=1}^n w_i \log(n w_i) \le \delta, \qquad \sum_{i=1}^n w_i = 1, \text{ with } w_i \ge 0 \text{ for all } 1 \le i \le n. \qquad (33)$$

The above discussion is similar to the BDR settings discussed before. However, there are also some differences. One of these is that calibrating a suitable $\delta$ from the nonparametric, data-driven view in Section 6.1 needs some modification in the C-VaR setting. It turns out that, with i.i.d. data of size $n$, $\delta$ could be selected so that $P(2n\delta > \chi^2_2) = 1 - \beta$, if we want the optimization output to bound the true C-VaR with $(1-\beta)$ confidence. Note that the degree of freedom of the chi-square distribution has increased from one to two compared with the discussion in Section 6. The reason for this change is the appearance of the inner minimization in problem (33), which introduces another equality constraint in the empirical likelihood derivation outlined in the Appendix, in order to capture the requirement that the derivative of the objective function with respect to $h$ should vanish. This derivation is similar to that given in Section 6, and further details can be found in Lam and Zhou (2015, 2017). Moreover, the calibration can be further tightened; see Duchi et al. (2016).

Another point to note is that, while the outer maximization formulation in (33) is a convex program, the minimization one is not. Heuristic procedures such as multi-start or alternating minimization should be employed in the latter case.
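For the maximization case, one illustrative route (our sketch, not the article's procedure) is to exchange the two optimizations (justified by a minimax argument, since the objective is linear in the weights and convex in $h$) and reuse the tilting solver idea from Section 7 for the inner worst case:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import xlogy

def worst_value(h_vals, delta):
    """Worst-case (maximum) weighted mean of h_vals over the KL ball of (29),
    via exponential tilting and bisection on the tilt parameter."""
    n = len(h_vals)
    z = h_vals - h_vals.max()
    def val_and_div(theta):
        w = np.exp(theta * z); w /= w.sum()
        return np.dot(w, h_vals), xlogy(w, n * w).sum()
    lo, hi = 0.0, 1.0
    while val_and_div(hi)[1] < delta and hi < 1e6:
        hi *= 2
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if val_and_div(mid)[1] < delta else (lo, mid)
    return val_and_div(0.5 * (lo + hi))[0]

def robust_cvar_upper(L, alpha, delta):
    """Upper bound in (33): exchange max over weights with min over h, then
    minimize over h the worst-case Rockafellar-Uryasev objective."""
    obj = lambda h: worst_value(h + np.maximum(L - h, 0.0) / (1.0 - alpha), delta)
    res = minimize_scalar(obj, bounds=(L.min(), L.max()), method="bounded")
    return res.fun

rng = np.random.default_rng(1)
L = rng.standard_normal(1000)
print(robust_cvar_upper(L, alpha=0.9, delta=0.05))
```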

8.1. Numerical Example for C-VaR Estimation

We consider the problem of estimating C-VaR in which we assume that the loss $L$ follows a standard normal distribution under $P_0$. We set $\alpha = 0.9$ and generate $n = 1{,}000$ observations. Figure 14 shows the upper and lower robust bounds from (33) against different values of $\delta$, showing how the width of the robust interval widens at a decreasing rate as $\delta$ increases.

FIGURE 14. Worst-case C-VaR Against $\delta$.

Moreover, we also test the performance of the 95% (i.e., $\beta = 0.05$) confidence bounds using (33), by selecting $\delta$ such that $P(2n\delta > \chi^2_2) = 1 - \beta$. We carry out the cases $n = 50$ and $n = 100$. Table 1 reports the point estimate of the coverage probability, the mean lower and upper bounds, and the mean and standard deviation of the interval width for empirical likelihood, while Table 2 shows the results using the classical SAA theory via (32). These quantities are obtained by repeating the experiments multiple times, each time generating a new data set. The coverage probability, for instance, is the proportion of repetitions in which the constructed interval covers the truth. We see that empirical likelihood gives higher and more accurate coverage, though the SAA counterpart gives tighter and less varied interval widths, at least in this example. This demonstrates some potential benefits of using our BDR formulation with empirical likelihood calibration over classical SAA, in generating intervals that cover the truth more frequently and hence are more robust. On the downside, however, these more robust intervals pay the price of being generally wider. Note that we have chosen $\delta$ calibrated using the $\chi^2_2$ distribution here, a choice that could possibly be further refined. Nonetheless, we demonstrate that with this choice, for a risk-averse user, the BDR approach with empirical likelihood calibration could be preferable over SAA.

9. ADDITIONAL CONSIDERATIONS

We discuss two alternatives that can be used for robust performance analysis. The first involves the use of moment constraints, and the second involves different notions of discrepancy.

9.1. Robust Performance Analysis via Moment Constraints

In some situations, it may not be possible to construct an explicit baseline distribution $P_0$. Alternatively, if information on moments is available, we might consider worst-case optimization under such information, in the form

$$\max \; E(h(X)) \quad \text{s.t.} \quad E(v_i(X)) \le a_i, \; i = 1, \ldots, s, \qquad E(v_i(X)) = a_i, \; i = s+1, \ldots, m, \qquad (34)$$

where the maximization is over all probability models $P$ (we focus on maximization here to avoid redundancy). This is a general formulation that has $m$ moment constraints, and $v_i(\cdot)$ can represent any function. For instance, for moment constraints involving means and variances, we can select $v_1(x) = x$, $v_2(x) = -x$, $v_3(x) = x^2$, $v_4(x) = -x^2$, and $a_1 = \bar{\mu}$, $a_2 = -\bar{\mu}$, $a_3 = \bar{\sigma}$, $a_4 = -\bar{\sigma}$, and all constraints could be inequalities. There is a general procedure for solving these problems that builds on linear programming. The trickiest part involves finding the support of the distribution. Observe that, if the support of the worst-case distribution is known, then problem (34) is just a problem with a linear objective function and linear constraints, and is solvable using standard routines.

Finding the support involves a sequential search. More precisely, the procedure for solving (34) is shown in Algorithm 1 (which borrows from, e.g., Birge and Dulá 1991).

TABLE 1. Statistical Performances of Empirical Likelihood for Different Sample Sizes

   n    Coverage probability   Mean lower bound   Mean upper bound   Mean interval width   Std. dev. of interval width
  50           0.90                  1.22               2.33                1.11                      0.43
 100           0.94                  1.32               2.26                0.94                      0.26

TABLE 2. Statistical Performances of Standard Confidence Interval of SAA for Different Sample Sizes

   n    Coverage probability   Mean lower bound   Mean upper bound   Mean interval width   Std. dev. of interval width
  50           0.86                  1.21               2.26                1.05                      0.47
 100           0.84                  1.34               2.05                0.71                      0.21


Algorithm 1. Generalized linear programming procedure for solving (34)

Initialization: An arbitrary probability distribution on the support $\{x_1, \ldots, x_l\}$, where $l \ge m + 1$, that lies in the feasible region of (34). Set $s = l$.

Procedure: For each iteration $k = 1, 2, \ldots$, given $\{x_1, \ldots, x_s\}$:

1. Master problem solution: Solve

$$\max \; \sum_{j=1}^s h(x_j) p_j \quad \text{s.t.} \quad \sum_{j=1}^s v_i(x_j) p_j \le a_i, \; i = 1, \ldots, s, \qquad \sum_{j=1}^s v_i(x_j) p_j = a_i, \; i = s+1, \ldots, m, \qquad \sum_{j=1}^s p_j = 1, \qquad p_j \ge 0, \; j = 1, \ldots, s.$$

Let $\{p_1^k, \ldots, p_s^k\}$ be the optimal solution. Find the dual multipliers $\{\theta^k, \pi_1^k, \ldots, \pi_m^k\}$ that satisfy

$$\theta^k + \sum_{i=1}^m \pi_i^k v_i(x_j) = h(x_j), \; \text{if } p_j > 0, \; j = 1, \ldots, s,$$
$$\theta^k + \sum_{i=1}^m \pi_i^k v_i(x_j) \ge h(x_j), \; \text{if } p_j = 0, \; j = 1, \ldots, s,$$
$$\pi_i^k \ge 0, \; i = 1, \ldots, s.$$

2. Subproblem solution: Find $x_{s+1}$ that maximizes

$$q(x; \theta^k, \pi_1^k, \ldots, \pi_m^k) = h(x) - \theta^k - \sum_{i=1}^m \pi_i^k v_i(x).$$

If $q(x_{s+1}; \theta^k, \pi_1^k, \ldots, \pi_m^k) > 0$, then let $s = s + 1$; otherwise, stop the procedure, and $\{x_1, \ldots, x_s\}$ are the optimal support points, with $\{p_1^k, \ldots, p_s^k\}$ the associated weights. After the last iteration, output

$$\sum_{j=1}^s h(x_j) p_j^k.$$

We discuss Algorithm 1 in the following several aspects:

1. Interpretation: The output of the procedure is an exact optimal value of (34). The worst-case probability distribution is a finite-support discrete distribution on $\{x_1, \ldots, x_s\}$ with weights $\{p_1^k, \ldots, p_s^k\}$ obtained in the last iteration.

2. Comparison with the BDR formulation: Unlike the BDR formulation, (34) does not have a baseline input distribution to begin with.

3. Computational efficiency: Step 1 in each iteration of Algorithm 1 can be carried out by a standard linear programming solver (a minimal sketch follows this list), which can output both the optimal $\{p_j\}$ and the dual multipliers $\{\theta^k, \pi_1^k, \ldots, \pi_m^k\}$. Step 2 is a one-dimensional line search if $X$ is one dimensional.

4. Minimization counterpart: For a minimization problem, simply replace $h$ with $-h$ in the whole procedure of Algorithm 1, except that in the last step we still output $\sum_{j=1}^s h(x_j) p_j^k$.
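As a minimal sketch of Step 1 (our illustration, assuming SciPy's linprog with the HiGHS backend, whose output exposes the dual variables; the moment functions and bounds in the example are hypothetical, and the dual signs may need adjusting to the solver's convention):

```python
import numpy as np
from scipy.optimize import linprog

def master_problem(x_support, h, v_ineq, a_ineq, v_eq, a_eq):
    """Solve the master LP of Algorithm 1 on a fixed support: maximize
    sum_j h(x_j) p_j subject to the moment constraints and the simplex."""
    s = len(x_support)
    c = -np.array([h(x) for x in x_support])        # linprog minimizes, so negate
    A_ub = np.array([[v(x) for x in x_support] for v in v_ineq])
    A_eq = np.vstack([[v(x) for x in x_support] for v in v_eq] + [np.ones(s)])
    b_eq = np.append(np.asarray(a_eq, dtype=float), 1.0)
    res = linprog(c, A_ub=A_ub, b_ub=a_ineq, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * s, method="highs")
    # Duals of the normalization row and of the moment rows, up to the solver's
    # sign convention (we minimized -h, so signs are flipped relative to the text).
    theta = -res.eqlin.marginals[-1]
    pis = -res.ineqlin.marginals
    return res.x, -res.fun, theta, pis

# Hypothetical example: support {0,1,2,3}, E(X) = 1, E(X^2) <= 2, h = I(x > 1.5).
p, val, theta, pis = master_problem(
    [0.0, 1.0, 2.0, 3.0], h=lambda x: float(x > 1.5),
    v_ineq=[lambda x: x**2], a_ineq=[2.0],
    v_eq=[lambda x: x], a_eq=[1.0])
print(p, val)
```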


9.2. Renyi Divergence as Discrepancy Notion

While we have presented our approach based on the KL divergence or relative entropy, other distance notions can be used. For example, we could consider the BDR formulation using the so-called Renyi divergence of degree $\alpha > 1$, defined via

$$D_\alpha(P \| P_0) = \frac{1}{\alpha - 1} \log E_0\Big(\Big(\frac{dP}{dP_0}\Big)^\alpha\Big).$$

As $\alpha \to 1$, we recover the KL divergence or relative entropy, i.e.,

$$D_\alpha(P \| P_0) \to D(P \| P_0) = E\Big(\log\Big(\frac{dP}{dP_0}\Big)\Big).$$

The corresponding distributionally robust formulation takes the form

$$\min/\max \; E(h(X)) \quad \text{s.t.} \quad D_\alpha(P \| P_0) \le \delta,$$

and the optimal solution is obtained roughly as follows. First, given $\theta_1, \theta_2 > 0$, define

$$Z_+(\theta_1, \theta_2) = \max(\theta_1 + \theta_2 h(X), 0)^{1/(1-\alpha)},$$

with $\theta_1$, $\theta_2$ chosen so that

$$E_0(Z_+(\theta_1, \theta_2)) = 1, \qquad E_0(Z_+(\theta_1, \theta_2)^\alpha) = \exp(\delta(\alpha - 1)),$$

FIGURE 15. Robust Estimates Against $\delta$ Using Renyi and KL Divergences (best- and worst-case optimal values under each divergence, together with the original estimate).


These conditions come from KKT conditions analogous to the ones in Section 3.1. Then, the worst-case distribution is defined such that, for any set $A$,

$$P_+(X \in A) = E_0\big(Z_+(\theta_1, \theta_2)\, I(X \in A)\big).$$

See, for example, Blanchet and Murthy (2016a) and Atar et al. (2015) for related derivations. The use of Renyi divergence typically leads to less conservative estimates (as we discuss momentarily in the next subsection). Next, we consider rare-event estimation to develop more intuition and to contrast the effect of using Renyi divergence with that of KL.

9.2.1. Robust Rare-Event Analysis via Renyi Divergence

We consider the case in which $h(X) = I(X \in B)$, so that we have

$$P_+(X \in B) = (\theta_1 + \theta_2)^{1/(1-\alpha)} P_0(X \in B),$$

and

$$E_0(Z_+(\theta_1, \theta_2)^\alpha) = \theta_1^{\alpha/(1-\alpha)} P_0(X \notin B) + (\theta_1 + \theta_2)^{\alpha/(1-\alpha)} P_0(X \in B) = \exp(\delta(\alpha - 1)),$$
$$E_0(Z_+(\theta_1, \theta_2)) = \theta_1^{1/(1-\alpha)} P_0(X \notin B) + (\theta_1 + \theta_2)^{1/(1-\alpha)} P_0(X \in B) = 1.$$

Letting $(\theta_1 + \theta_2)^{\alpha/(1-\alpha)} = \eta / P_0(X \in B)$, and substituting in the previous display, we have

$$\theta_1^{\alpha/(1-\alpha)} (1 - P_0(X \in B)) + \eta = \exp(\delta(\alpha - 1)),$$
$$\theta_1^{1/(1-\alpha)} (1 - P_0(X \in B)) + \eta^{1/\alpha} P_0(X \in B)^{1 - 1/\alpha} = 1.$$

As $P_0(X \in B) \to 0$, because $\alpha > 1$, we can select $\theta_1 \approx 1$ and $(\theta_1 + \theta_2)^{1/(1-\alpha)} \approx \eta^{1/\alpha} / P_0(X \in B)^{1/\alpha}$, with $\eta \approx \exp(\delta(\alpha - 1)) - 1$, concluding that

$$P_+(X \in B) \approx \big(\exp(\delta(\alpha - 1)) - 1\big)^{1/\alpha} P_0(X \in B)^{1 - 1/\alpha}.$$

Let us revisit the example discussed at the end of Section 5. Assume that $X$ is exponentially distributed with mean one under $P_0$. Select $B = [b, \infty)$ and $\alpha = 2$. Then we have

$$P_+(X > b) \approx \big(\exp(\delta) - 1\big)^{1/2} \exp(-b/2).$$

In contrast, when using the KL divergence for the case of exponentially distributed $X$, we obtain a much lower rate of decay (of the form $\delta/b$ for $b$ large; see (23)). In this sense, KL divergence induces a much more cautious robust methodology than Renyi divergence. To illustrate this last point, we compare the BDR formulations using KL and Renyi divergences with $\alpha = 2$. We use $h(X) = I(X > b)$ and an exponential baseline with rate 1, where we set $b$ to be the 99th percentile of this baseline. We solve the BDR formulations using the technique in Section 5 and the discussion above. Figure 15 shows that the bounds obtained from the Renyi divergence are tighter than those from KL in this small-probability estimation context. This is especially noticeable for the upper bounds, where the KL divergence gives values more than double those of Renyi in magnitude as $\delta$ increases.
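For $\alpha = 2$ and $h(X) = I(X > b)$, the two conditions above reduce (by our own algebra, offered as a sketch) to a quadratic in $1/(\theta_1 + \theta_2)$, so the exact worst scenario value can be computed and compared with the asymptotic approximation:

```python
import numpy as np

delta = 0.1
b = -np.log(0.01)            # 99th percentile of the Exp(1) baseline
p0 = np.exp(-b)              # baseline P0(X > b) = 0.01

# For alpha = 2, eliminating theta1 from the two conditions gives a quadratic in
# c = 1/(theta1 + theta2): p0*c^2 - 2*p0*c + 1 - exp(delta)*(1 - p0) = 0.
# The root with c > 1 upweights the tail, giving the worst case.
c = 1.0 + np.sqrt(1.0 + (np.exp(delta) * (1.0 - p0) - 1.0) / p0)
p_exact = p0 * c             # P+(X > b) = (theta1 + theta2)^{-1} * P0(X > b)

# Asymptotic approximation as P0(X > b) -> 0.
p_asym = np.sqrt(np.exp(delta) - 1.0) * np.exp(-b / 2)
print(p_exact, p_asym)       # roughly 0.042 versus 0.032 for delta = 0.1
```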


10. CONCLUSIONS

We have discussed a systematic approach to quantify potential model errors based on the BDR formulation, which is a convex optimization problem in the space of probability distributions. This approach is rooted in areas including economics, statistics, and operations research, and aims to compute worst-case bounds on the performance or risk measure when the model deviates from a baseline model in the KL sense. We have demonstrated several properties and advantages of this approach, including its nonparametric nature, its generality in applying to various situations, and its computational tractability.

Building upon the BDR formulation, we have demonstrated how to extend our approach to applications that are relevant to actuarial practice, including studying the impact of dependence in multivariate models and the impact of model uncertainty in rare-event estimation and in computing risk-analytic measures that involve optimization, such as C-VaR. We have also discussed methods to calibrate the feasible region size in BDR formulations, from the viewpoints of empirical likelihood and stress testing. Lastly, we have investigated additional ways to define the model uncertainty region, based on Renyi divergence or the imposition of moment constraints.

ACKNOWLEDGMENTS

The authors would like to thank the Society of Actuaries for its permission to publish some materials excerpted from our project reports (Blanchet et al. 2016a,b). We would like to thank the two referees for their careful reading and helpful feedback.

FUNDING

This project was supported by the Society of Actuaries research grant entitled "Modeling and Analyzing Extreme Risks in Insurance."

REFERENCES

Arampatzis, G., M. A. Katsoulakis, and Y. Pantazis. 2015a. Accelerated sensitivity analysis in high-dimensional stochastic reaction networks. PLoS ONE 10 (7): e0130825.
Arampatzis, G., M. A. Katsoulakis, and Y. Pantazis. 2015b. Pathwise sensitivity analysis in transient regimes. In Stochastic Equations for Complex Systems, ed. S. Heinz and H. Bessaih, pp. 105–124. Cham: Springer.
Atar, R., K. Chowdhary, and P. Dupuis. 2015. Robust bounds on risk-sensitive functionals via Rényi divergence. SIAM/ASA Journal on Uncertainty Quantification 3 (1): 18–33.
Ben-Tal, A., D. Den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen. 2013. Robust solutions of optimization problems affected by uncertain probabilities. Management Science 59 (2): 341–357.
Bertsimas, D., V. Gupta, and N. Kallus. 2018. Robust sample average approximation. Mathematical Programming 171 (1–2): 217–282.
Bertsimas, D., and I. Popescu. 2005. Optimal inequalities in probability theory: A convex optimization approach. SIAM Journal on Optimization 15 (3): 780–804.
Birge, J. R., and J. H. Dulá. 1991. Bounding separable recourse functions with limited distribution information. Annals of Operations Research 30 (1): 277–298.
Blanchet, J., C. Dolan, and H. Lam. 2014. Robust rare-event performance analysis with natural non-convex constraints. In Proceedings of the 2014 Winter Simulation Conference, pp. 595–603. Piscataway, NJ: IEEE Press.
Blanchet, J., and Y. Kang. 2016. Sample out-of-sample inference based on Wasserstein distance. arXiv preprint arXiv:1605.01340.
Blanchet, J., H. Lam, Q. Tang, and Z. Yuan. 2016a. Applied robust performance analysis for actuarial applications. Society of Actuaries. https://www.soa.org/research-reports/2016/research-applied-robust-performance.
Blanchet, J., H. Lam, Q. Tang, and Z. Yuan. 2016b. Applied robust performance analysis for actuarial applications: A guide. Society of Actuaries. https://www.soa.org/research-reports/2016/research-applied-robust-performance.
Blanchet, J., and K. R. Murthy. 2016a. On distributionally robust extreme value analysis. arXiv preprint arXiv:1601.06858.
Blanchet, J., and K. R. Murthy. 2016b. Quantifying distributional model risk via optimal transport. arXiv preprint arXiv:1604.01446.
Boyd, S., and L. Vandenberghe. 2004. Convex optimization. New York: Cambridge University Press.
Breuer, T., and I. Csiszár. 2016. Measuring distribution model risk. Mathematical Finance 26 (2): 395–411.
Cover, T. M., and J. A. Thomas. 2012. Elements of information theory. Hoboken, NJ: John Wiley & Sons.
Cox, D. R., and D. V. Hinkley. 1979. Theoretical statistics. Boca Raton, FL: CRC Press.
Cox, S. H., Y. Lin, R. Tian, and L. F. Zuluaga. 2013. Mortality portfolio risk management. Journal of Risk and Insurance 80 (4): 853–890.
Delage, E., and Y. Ye. 2010. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research 58 (3): 595–612.
Deming, W. E., and F. F. Stephan. 1940. On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. The Annals of Mathematical Statistics 11 (4): 427–444.
Dey, S., and S. Juneja. 2012. Incorporating fat tails in financial models using entropic divergence measures. arXiv preprint arXiv:1203.0643.
Dhara, A., B. Das, and K. Natarajan. 2017. Worst-case expected shortfall with univariate and bivariate marginals. arXiv preprint arXiv:1701.04167.
Duchi, J., P. Glynn, and H. Namkoong. 2016. Statistics of robust optimization: A generalized empirical likelihood approach. arXiv preprint arXiv:1610.03425.
Embrechts, P., and G. Puccetti. 2006. Bounds for functions of multivariate risks. Journal of Multivariate Analysis 97 (2): 526–547.
Embrechts, P., G. Puccetti, and L. Rüschendorf. 2013. Model uncertainty and VaR aggregation. Journal of Banking & Finance 37 (8): 2750–2764.
Esfahani, P. M., and D. Kuhn. 2018. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming 171 (1–2): 115–166.


European Parliament and the Council of European Union. 2009. Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of Insurance and Reinsurance (Solvency II). http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2009:335:0001:0155:en:PDF.
Glasserman, P., and X. Xu. 2013. Robust portfolio control with stochastic factor dynamics. Operations Research 61 (4): 874–893.
Glasserman, P., and X. Xu. 2014. Robust risk measurement and model risk. Quantitative Finance 14 (1): 29–58.
Glasserman, P., and L. Yang. 2018. Bounding wrong-way risk in CVA calculation. Mathematical Finance 28 (1): 268–305.
Goh, J., and M. Sim. 2010. Distributionally robust optimization and its tractable approximations. Operations Research 58 (4-part-1): 902–917.
Gotoh, J.-Y., M. J. Kim, and A. Lim. 2018. Robust empirical optimization is almost the same as mean-variance optimization. Operations Research Letters 46 (4): 448–452.
Gupta, V. 2015. Near-optimal ambiguity sets for distributionally robust optimization. Available at http://www.optimization-online.org/DB_FILE/2015/07/4983.pdf.
Hanasusanto, G. A., V. Roitch, D. Kuhn, and W. Wiesemann. 2017. Ambiguous joint chance constraints under mean and dispersion information. Operations Research 65 (3): 751–767.
Hansen, L. P., and T. J. Sargent. 2008. Robustness. Princeton, NJ: Princeton University Press.
Hu, Z., and L. J. Hong. 2013. Kullback-Leibler divergence constrained distributionally robust optimization. Optimization Online. Available at http://www.optimization-online.org/DB_FILE/2012/11/3677.pdf.
Internal Revenue Service. 2016. Updated static mortality tables for defined benefit pension plans for 2016. https://www.irs.gov/pub/irs-drop/n-15-53.pdf.
Iyengar, G. N. 2005. Robust dynamic programming. Mathematics of Operations Research 30 (2): 257–280.
Jiang, R., and Y. Guan. 2016. Data-driven chance constrained stochastic program. Mathematical Programming 158 (1–2): 291–327.
Kleywegt, A. J., A. Shapiro, and T. Homem-de-Mello. 2002. The sample average approximation method for stochastic discrete optimization. SIAM Journal on Optimization 12 (2): 479–502.
Kullback, S., and R. A. Leibler. 1951. On information and sufficiency. The Annals of Mathematical Statistics 22 (1): 79–86.
Lam, H. 2016a. Recovering best statistical guarantees via the empirical divergence-based distributionally robust optimization. arXiv preprint arXiv:1605.09349. Operations Research, forthcoming.
Lam, H. 2016b. Robust sensitivity analysis for stochastic systems. Mathematics of Operations Research 41 (4): 1248–1275.
Lam, H. 2018. Sensitivity to serial dependency of input processes: A robust approach. Management Science 64 (3): 1311–1327.
Lam, H., and C. Mottet. 2017. Tail analysis without parametric models: A worst-case perspective. Operations Research 65 (6): 1696–1711.
Lam, H., and E. Zhou. 2015. Quantifying uncertainty in sample average approximation. In Proceedings of the 2015 Winter Simulation Conference, pp. 3846–3857. Piscataway, NJ: IEEE Press.
Lam, H., and E. Zhou. 2017. The empirical likelihood approach to quantifying uncertainty in sample average approximation. Operations Research Letters 45 (4): 301–307.
Li, B., R. Jiang, and J. L. Mathieu. 2017. Ambiguous risk constraints with moment and unimodality information. Mathematical Programming. Available at https://doi.org/10.1007/s10107-017-1212-x.
Love, D., and G. Bayraksan. 2015. Phi-divergence constrained ambiguous stochastic programs for data-driven optimization. Technical report, The Ohio State University, Columbus, OH. Available at http://www.optimization-online.org/DB_FILE/2016/03/5350.pdf.
Mottet, C., and H. Lam. 2017. On optimization over tail distributions. arXiv preprint arXiv:1711.00573.
Natarajan, K., D. Pachamanova, and M. Sim. 2008. Incorporating asymmetric distributional information in robust value-at-risk optimization. Management Science 54 (3): 573–585.
Nguyen, X., M. J. Wainwright, and M. I. Jordan. 2010. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory 56 (11): 5847–5861.
Nilim, A., and L. El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices. Operations Research 53 (5): 780–798.
Owen, A. B. 2001. Empirical likelihood. New York: CRC Press.
Petersen, I. R., M. R. James, and P. Dupuis. 2000. Minimax optimal control of stochastic uncertain systems with relative entropy constraints. IEEE Transactions on Automatic Control 45 (3): 398–412.
Póczos, B., and J. Schneider. 2011. On the estimation of alpha-divergences. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, PMLR 15: 609–617. Fort Lauderdale, FL.
Popescu, I. 2005. A semidefinite programming approach to optimal-moment bounds for convex classes of distributions. Mathematics of Operations Research 30 (3): 632–657.
Puccetti, G., and L. Rüschendorf. 2012. Computation of sharp bounds on the distribution of a function of dependent risks. Journal of Computational and Applied Mathematics 236 (7): 1833–1840.
Puccetti, G., and L. Rüschendorf. 2013. Sharp bounds for sums of dependent risks. Journal of Applied Probability 50 (1): 42–53.
Rockafellar, R. T., and S. Uryasev. 2000. Optimization of conditional value-at-risk. Journal of Risk 2: 21–42.
Scarf, H., K. Arrow, and S. Karlin. 1958. A min-max solution of an inventory problem. Studies in the Mathematical Theory of Inventory and Production 10: 201–209.
Shapiro, A., D. Dentcheva, and A. Ruszczynski. 2009. Lectures on stochastic programming: Modeling and theory. Volume 16. Philadelphia, PA: SIAM.
Smith, J. E. 1995. Generalized Chebychev inequalities: Theory and applications in decision analysis. Operations Research 43 (5): 807–825.
Van Parys, B. P. G., P. J. Goulart, and M. Morari. 2017. Distributionally robust expectation inequalities for structured distributions. Optimization Online. Available at https://doi.org/10.1007/s10107-017-1220-x.
Wang, B., and R. Wang. 2011. The complete mixability and convex minimization problems with monotone marginal densities. Journal of Multivariate Analysis 102 (10): 1344–1360.
Wang, Z., P. W. Glynn, and Y. Ye. 2015. Likelihood robust optimization for data-driven problems. Computational Management Science 13 (2): 241–261.
Wiesemann, W., D. Kuhn, and M. Sim. 2014. Distributionally robust convex optimization. Operations Research 62 (6): 1358–1376.

Discussions on this article can be submitted until October 1, 2019. The authors reserve the right to reply to any discussion. Please see the Instructions for Authors found online at http://www.tandfonline.com/uaaj for submission instructions.


APPENDIX. APPROXIMATING DISTRIBUTION FOR THE SIZE OF THE FEASIBLE REGION

Similar to the use of the KKT conditions for the solution of the BDR problem formulation, we obtain that the optimal solution satisfies

$$w_i(\theta) = \frac{\exp(\theta h(X_i))}{\sum_{j=1}^n \exp(\theta h(X_j))},$$

where $\theta$ satisfies $\sum_{i=1}^n w_i(\theta)(h(X_i) - c) = 0$, namely,

$$\frac{\sum_{i=1}^n \exp(\theta h(X_i))(h(X_i) - c)}{\sum_{j=1}^n \exp(\theta h(X_j))} = 0. \qquad (35)$$

Now suppose that indeed $E_{true}(h(X) - c) = 0$, and let us consider $\bar{h}(x) = h(x) - c$. We can rewrite (35), after multiplying by $\sum_{j=1}^n \exp(\theta h(X_j) - c\theta)/n^{1/2}$, as

$$\frac{1}{n^{1/2}} \sum_{i=1}^n \exp\big(\theta \bar{h}(X_i)\big) \bar{h}(X_i) = 0.$$

Then let $\theta = \eta/n^{1/2}$ and perform a Taylor expansion to conclude that

$$\frac{1}{n^{1/2}} \sum_{i=1}^n \bar{h}(X_i)\Big(1 + \frac{\eta \bar{h}(X_i)}{n^{1/2}} + \cdots\Big) = 0. \qquad (36)$$

Recall that the central limit theorem states the approximation in distribution

$$\frac{1}{n^{1/2}} \sum_{i=1}^n \bar{h}(X_i) \approx \sigma Z,$$

where $\sigma^2 = \text{Var}_{true}(h(X))$, and $Z$ is standard Gaussian. Therefore, solving for $\eta$ in (36) and ignoring lower-order error terms, we obtain that

$$\eta \approx -\frac{(1/n^{1/2}) \sum_{i=1}^n \bar{h}(X_i)}{(1/n) \sum_{i=1}^n \bar{h}(X_i)^2} \approx -\frac{Z}{\sigma}.$$

Now consider $R_n(c)$, which can be written as

$$\begin{aligned}
R_n(c) &= \sum_{i=1}^n w_i(\theta)\, \log\Bigg(\frac{\exp(\theta \bar{h}(X_i))}{n^{-1} \sum_{j=1}^n \exp(\theta \bar{h}(X_j))}\Bigg) \\
&= \theta \sum_{i=1}^n w_i(\theta) \bar{h}(X_i) - \log\Bigg(\frac{1}{n} \sum_{i=1}^n \exp\big(\theta \bar{h}(X_i)\big)\Bigg) \\
&= \theta\, \frac{\frac{1}{n} \sum_{i=1}^n \exp(\theta \bar{h}(X_i)) \bar{h}(X_i)}{\frac{1}{n} \sum_{i=1}^n \exp(\theta \bar{h}(X_i))} - \log\Bigg(\frac{1}{n} \sum_{i=1}^n \exp\big(\theta \bar{h}(X_i)\big)\Bigg) \\
&\approx \Bigg(\frac{\theta}{n} \sum_{i=1}^n \bar{h}(X_i) + \frac{\theta^2}{n} \sum_{i=1}^n \bar{h}(X_i)^2 - \theta^2 \Big(\frac{1}{n} \sum_{i=1}^n \bar{h}(X_i)\Big)^2\Bigg) - \Bigg(\frac{\theta}{n} \sum_{i=1}^n \bar{h}(X_i) + \frac{\theta^2}{2n} \sum_{i=1}^n \bar{h}(X_i)^2 - \frac{\theta^2}{2} \Big(\frac{1}{n} \sum_{i=1}^n \bar{h}(X_i)\Big)^2\Bigg) \\
&= \frac{\theta^2}{2} \Bigg(\frac{1}{n} \sum_{i=1}^n \bar{h}(X_i)^2 - \Big(\frac{1}{n} \sum_{i=1}^n \bar{h}(X_i)\Big)^2\Bigg) \\
&\approx \frac{\eta^2 \sigma^2}{2n} \approx \frac{Z^2}{2n},
\end{aligned}$$

where the fourth line follows from Taylor expanding $\exp(\theta \bar{h}(X_i))$ and the logarithm and collecting terms up to order $\theta^2$. Therefore, we conclude that $2nR_n(c) \Rightarrow \chi^2_1$, as indicated in our discussion leading to (28).
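A quick Monte Carlo sanity check of this limit (our own sketch; the Exp(1) data and $h(x) = x$ are arbitrary choices):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import xlogy
from scipy.stats import chi2

def two_n_Rn(hbar):
    """2n R_n(c) for centered values hbar_i = h(X_i) - c: tilt w_i ~ exp(theta*hbar_i),
    choose theta so the weighted mean of hbar vanishes (equation (35)), and return
    2n times the divergence sum_i w_i log(n w_i)."""
    n = len(hbar)
    def weights(theta):
        shift = hbar.max() if theta > 0 else hbar.min()   # avoid overflow either way
        w = np.exp(theta * (hbar - shift))
        return w / w.sum()
    theta = brentq(lambda th: np.dot(weights(th), hbar), -50.0, 50.0)
    w = weights(theta)
    return 2 * n * xlogy(w, n * w).sum()

rng = np.random.default_rng(7)
n, reps, c = 200, 2000, 1.0                               # Exp(1) data, true mean c = 1
stats = np.array([two_n_Rn(rng.exponential(size=n) - c) for _ in range(reps)])
# Under the chi-square limit, about 5% of the replications should exceed the
# 95th percentile of chi2(1).
print(np.mean(stats > chi2.ppf(0.95, df=1)))
```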
