CONSISTENCY OF THE SCENARIO APPROACH · Vol. 28, No. 1, pp. 135{162 CONSISTENCY OF THE SCENARIO...

SIAM J. OPTIM. c© 2018 Society for Industrial and Applied MathematicsVol. 28, No. 1, pp. 135–162

CONSISTENCY OF THE SCENARIO APPROACH∗

FEDERICO ALESSANDRO RAMPONI†

I dedicate this work to the memory of my father Alessandro Ramponi, 1945–2010Cuius nomen iam demum in meo resonat, ille rector et una mecum auctor

Abstract. This paper is meant to prove the consistency of the scenario approach a la Calafiore,Campi, and Garatti with convex constraints. Scenario convex problems are usually stated in twoequivalent forms: first, as the minimum of a linear function over the intersection of a finite randomsample of independent and identically distributed convex sets, or second, as the min-max of a finiterandom sample of independent and identically distributed convex functions. The paper shows that,under fairly general assumptions, as the size of the sample increases the minimum attained by thesolution of a problem of the first kind converges almost surely to the minimum attained by a suitablydefined “essential” robust problem (or diverges if such a robust problem is infeasible), and that theminimum attained by the solution of a scenario problem of the second kind converges almost surelyto the minimum of the pointwise essential supremum taken over all the possible convex functions (ordiverges if such essential supremum takes the only value +∞). In both cases, if the solution of theessential problem exists and is unique, the solution of the scenario problem converges to it almostsurely.

Key words. scenario approach, stochastic programming, convex programming, sample-basedoptimization, min-max optimization

AMS subject classifications. 90C15, 90C25, 90C47

DOI. 10.1137/16M109819X

1. Introduction. Robust convex optimization deals with programs of the fol-lowing form:

(1)minθ∈Θ

c>θ

subject to θ ∈ Θδ for all δ ∈ ∆,

where Θ is a closed convex subset of Rd, ∆ is an arbitrary set (possibly infinite), andto every δ ∈ ∆ there is associated a closed convex subset Θδ ⊆ Θ. Any Θδ, δ ∈ ∆,is interpreted as an additional constraint to the nominal problem minθ∈Θ c>θ, i.e.,a constraint that may appear in a practical instance but whose actual occurrence isnot known in advance; the solution of (1) is thus a safeguard against all the possibledeviations from the nominal problem.

In practical applications dealing with all the possible constraints Θδ, δ ∈ ∆, maybe overkill, and discarding a small fraction of constraints, i.e., a small subset of ∆, isoften acceptable. The set ∆ models the lack of knowledge in an optimization endeavor,and in science and engineering the natural way to model uncertainty is through prob-ability; thus, from now on, I will assume that (∆,F ,P) is a probability space, whereP describes the chance of a constraint set Θδ to occur. Moreover, (∆N ,FN ,PN ) willdenote the N -fold Cartesian product of ∆ equipped with the product σ-algebra FNand the product probability PN = P × · · · × P (N times). A point in (∆N ,FN ,PN )will thus be a sample (δ(1), . . . , δ(N)) of elements drawn independently from ∆ ac-

∗Received by the editors October 10, 2016; accepted for publication (in revised form) October 9,2017; published electronically January 11, 2018.

http://www.siam.org/journals/siopt/28-1/M109819.html†Dipartimento di Ingegneria dell’Informazione, Universita degli Studi di Brescia, 25123 Brescia,

Italy ([email protected]).

135

http://www.siam.org/journals/siopt/28-1/M109819.html

mailto:[email protected]

136 FEDERICO ALESSANDRO RAMPONI

cording to the same probability P. One possible approach to weaken problem (1) byaccepting a small portion of constraints, i.e., a subset of ∆ with small probability ε,to be violated, is the following so-called chance-constrained problem:

(2)minθ∈Θ

c>θ

subject to P [{δ ∈ ∆ : θ ∈ Θδ}] ≥ 1− ε.

Let me provide a visual explanation of the difference between robust and chance-constrained problems with a toy example.

Example 1. Suppose that ∆ = [−1, 1], and consider the problem

(3)min

(x,y)∈R2x+ y

subject to (x− δ)2 + y2 ≤ 4 for all δ ∈ ∆.

Clearly, (3) is an instance of problem (1), where θ = (x, y) and Θδ is the closed ballwith center (δ, 0) and radius 2. A pictorial view of the problem is shown in Figure 1(a),where the set ∆×{0} containing the center of each closed ball is the thick line segmentat the center of the plot. The whole point of robust programming is that each ball maybe the “true one” that will show up in reality; some balls represent favorable situations(the leftmost ones) and other bad situations, but in order to take into account all theconstraint sets, feasible points are a priori confined to their intersection: the feasibleset of problem (3) is indeed the white oval at the center of the plot, and its solutionis marked with a bullet (•). Suppose now that ∆ = [−1, 1] is equipped with a densityand that we are allowed to improve the solution discarding a subset B ⊂ ∆ with smallprobability P [B] = ε. The corresponding chance-constrained problem is

(4)min

(x,y)∈R2x+ y

subject to P[{δ ∈ ∆ : (x− δ)2 + y2 ≤ 4}

]≥ 1− ε.

The setup is shown in Figure 1(b): the feasible set has been enlarged, and the solutionhas improved a bit, at the cost of neglecting the set B ⊂ ∆ depicted in light gray.The balls Θδ with a center (δ, 0) such that δ ∈ B (partially visible on the rightwith light gray background) do not contain the solution •; the main point of chance-constrained programming is that the probability ε that one such ball pops up inreality, dooming the solution to be wrong, is small; in real-world applications this riskis often acceptable.

Chance-constrained programming is now a well-known subject in stochastic opti-mization; it has been studied systematically for the first time in the work of Prekopa(see, e.g., the seminal work [14] and the references therein; see [17, Chapter 1] for anintroduction and some motivating examples). A well-known drawback of problem (2)is that it is usually hard to solve, because its feasible set is not necessarily convex de-spite the convexity of the sets Θδ. (In this respect, Example 1 is really a toy problem.)Another aspect of (2) that seems innocuous but that I consider a drawback—beingoften unrealistic in practice—is that it requires the exact knowledge of P and of themapping δ 7→ Θδ.

Another way to weaken problem (1) is, in Marco Campi’s words, to “let the dataspeak,” to solve a random problem with finitely many constraints, and to provide ahigh-confidence guarantee on its solution. This method is called the scenario approach

CONSISTENCY OF THE SCENARIO APPROACH 137

-3 -2 -1 0 1 2 3

-3

-2

-1

0

1

2

3 optimization direction

∆

-3 -2 -1 0 1 2 3

-3

-2

-1

0

1

2


∆ B

(a) (b)

Fig. 1. Example 1: (a) Robust problem (3), (b) Chance-constrained problem (4).

-4 -2 0 2 4

-3

-2

-1

0

1

2


∆ B

Fig. 2. Instance of scenario program with sample size N = 15; the solution θ∗N is marked with•. The centers (δ, 0), δ ∈ B, of the balls Θδ that do not contain θ∗N are plotted in light gray.

and was introduced in [3], initially aiming at robust control design. (See also [7] for anice introduction.) Its fundamental ideas develop as follows: it is supposed that theexperimenter can observe a finite sample of independent and identically distributedconstraint sets {Θδ(i)}Ni=1, extracted according to PN ; s/he forms the problem

(5)minθ∈Θ

c>θ

subject to θ ∈ Θδ(i) for all i = 1, . . . , N

and computes its solution θ∗N . Each element Θδ(i) of the finite sample is called ascenario, and problem (5) is called a scenario program; an instance of (5), along thelines of Example 1, is shown in Figure 2. Before observing the constraints, the solutionθ∗N may be regarded as a random vector over ∆N , and hence the set B ⊂ ∆ mappingto constraints {Θδ}δ∈B that do not contain θ∗N is a random set, and its probabilityP [B] is a random variable over ∆N . The scenario approach attaches to θ∗N a certificateof this form:

the probability that P [B] ≤ ε is always greater than or equal to 1− β,

where β is a parameter that depends only on the sample size N and the dimensiond of the problem, and that decreases very quickly as the sample size increases. After


observing the constraints, when the solution θ∗N has been computed, the experimentercan claim that P [B] ≤ ε with confidence 1− β.1

The scenario approach has a number of advantages over chance-constrained pro-gramming. First, every realization of problem (5) is convex and, the sample of con-straint sets being finite, usually simple to solve. Second, for any fixed ε ∈ (0, 1),the sample size N has a logarithmic dependence on the confidence parameter β(N ∼ log(1/β)), and hence to decrease β of k orders of magnitude it is sufficientto increase N by a factor k; this allows one to attain very high confidence (e.g.,1− β = 1− 10−10) or, so to say, “practical certainty,” with a relatively small samplesize N . Third, and most important, the true fundamental assumption of the scenarioapproach is just that δ(1), . . . , δ(N) are independent and identically distributed; exceptfor this, the guarantees on θ∗N provided by the various developments of the theory areall universal, in the sense that they hold irrespective of P and of the mapping δ 7→ Θδ.(In other words, the knowledge of P and δ 7→ Θδ is not required.) Furthermore, thetheory of convex scenario optimization provides insight about chance-constrained pro-gramming: indeed a “hot” research topic is the connection between the probabilisticguarantees on the solution of (5) and the feasibility of (2); see, e.g., [5], [13] for recentdevelopments. To formalize the above discussion, consider the following definition.

Definition 1. Let θ ∈ Θ. The point θ violates the constraint set Θδ if θ /∈ Θδ.The violation probability of θ is defined as follows:

V(θ) = P [{δ ∈ ∆ : θ /∈ Θδ}].

For any ε ∈ (0, 1), θ is said to be ε-robust if V(θ) ≤ ε.When this does not generate ambiguity, throughout the paper I will adopt the

notation P [θ ∈ Aδ] and P [fδ(θ) ∈ B] to denote the probabilities P [{δ ∈ ∆ : θ ∈ Aδ}]and P [{δ ∈ ∆ : fδ(θ) ∈ B}], respectively, it being understood that P [·] captures thevariable δ. With this convention the violation probability of θ reads V(θ) = P [θ /∈ Θδ].

Consider problem (5), and denote θ∗N its solution; since θ∗N is a random variableover (∆N ,FN ,PN ), so is its violation probability V(θ∗N ). (Measurability issues areaddressed in [13].) The following theorem is a milestone of the scenario approach withconvex constraints.

Theorem 2 (Campi, Garatti). For any ε ∈ (0, 1), irrespective of P and of themapping δ 7→ Θδ, the following bound holds:

(6) PN [θ∗N exists and V(θ∗N ) > ε] ≤d−1∑k=0

(N

k

)εk(1− ε)N−k =: β.

For the proof of Theorem 2, the reader is referred to the paper [4].2 It is shown, there,that the bound (6) is tight, i.e., that there exists a class of so-called nondegenerateproblems for which (6) holds with equality; for such problems, V(θ∗N ) is a randomvariable with Beta(d,N+1−d) density, irrespective of P. (For nondegenerate problems

1For comparison, recall that in chance-constrained programming P [B] = ε is a constant, fixed inadvance.

2Strictly speaking, Theorem 2 does not require that the solution θ∗N to problem (5), when itexists, is unique; nevertheless, it is customary in the scenario approach literature to assume that itis possible to isolate a single solution, when there are many, by means of a “tie-break” rule. I willadhere to this convention and refer to “the solution θ∗N” rather than “a solution θ∗N ,” although inthe following this will be needed only to make Theorem 2 work properly.


-4 -2 0 2 4

-3

-2

-1

0

1

2


∆δ1 δ2

Fig. 3. Solution of the robust problem (�) and of the essential robust one (•).

in min-max form—see section 2.1—much more is known; see, e.g., [9] for a recentmore general result.) The logarithmic dependence between β and N descends fromthe definition β :=

∑d−1k=0

(Nk

)εk(1 − ε)N−k; for a detailed discussion see, e.g., the

introduction of [4] and the comparison with [3] therein.Theorem 2 says that when θ∗N exists it is ε-robust with confidence 1 − β. If

ε is small and N big enough so that the confidence 1 − β is very high—“practicalcertainty”—one can say that the scenario minimum c>θ∗N is “close enough” to the“robust minimum” (from a risk-analysis perspective, albeit not necessarily in themetric sense); and I dare say, in statistical jargon, that c>θ∗N is a good estimator ofthe “robust minimum.” I claim that, under fairly general hypotheses, this estimatoris also consistent—hence the title of the paper—i.e., c>θ∗N converges to the robustminimum as N → ∞. But there is a big caveat here. Unless the probability P andthe mapping δ 7→ Θδ are particularly well-behaved, the true robust minimum is notthe solution of problem (1); it is instead the solution of the following one, that I liketo call the essential robust problem:

(7)minθ∈Θ

c>θ

subject to P [θ ∈ Θδ] = 1.

The meaning of the solution of (7) and its fundamental difference with the solution of(1) are illustrated in Figure 3, along the lines of Example 1. Suppose that the samplespace ∆ = [−1, 1] of Example 1, equipped, e.g., with a uniform density, is augmentedwith two points δ1 = −3/2 and δ2 = 3/2 in such a way that P

[{δ1}

]= P

[{δ2}

]= 0.

Despite the fact that the balls centered at (δ1, 0) and (δ2, 0) show up with probability0 and are completely inessential, the original robust problem must take them into ac-count; its solution is marked with � in the plot. On the other hand, the essential robustproblem (7) disregards the negligible event {δ1, δ2}; its solution is marked with •.

To make my claim more rigorous, let θ∗∗ be a solution of the essential robustproblem (7). By saying that θ∗N is consistent I mean that, under fairly general as-sumptions, c>θ∗N → c>θ∗∗ almost surely as N → ∞, and that if θ∗∗ is unique (• inFigure 3), then also θ∗N → θ∗∗ almost surely. This is precisely the main message of thepaper. The message can, and will, be translated in the language of min-max optimiza-tion; such a translation will lead to more intuitive assumptions and to simpler proofs;besides, it will make some justice of the otherwise arbitrary adjective “essential.”


Structure of the paper. Section 2 is meant to show that the solution of thescenario program (5) is equivalent to the solution of a min-max problem, and to theintroduction of a function L whose minimization is equivalent to the solution of theessential robust problem (7). Section 3 is dedicated to the proof of some fundamentalproperties of L (to start with, convexity and lower semicontinuity). Section 4 contains,among other properties of the function L and of the violation probability V, the rig-orous statement and the proof of the main results of the paper (Theorems 14 and 15).In section 5 I “translate” the assumptions of Theorems 14 and 15, which are statedwith respect to min-max optimization, back to the language of problems (5)–(7), andrestate the main result with respect to these problems (Theorem 17). The discussionpreceding and following Theorem 17 shows that c>θ∗N → +∞ if and only if problem(7) is infeasible, and that in turn this happens if and only if

⋂∞i=1 Θδ(i) = ∅ almost

surely. Section 6 concludes the paper with some final remarks and acknowledgments.

2. Formalization of the problem: min-max optimization. To establishthe main convergence results, I find it convenient to express constraints in terms ofconvex functions rather than convex sets, and to recast scenario programs as min-max programs. In this section I will show that the two approaches are equivalent,then show that the concept of max needed to recast the robust problem is somewhatsubtler than one would expect at first sight, formalize two assumptions that will beused throughout the paper, state the main results, and try to provide some insightabout the assumptions and their consequences, along with a brief comparison withsome recent results in the literature.

2.1. Scenario min-max problems. Let (∆,F ,P) and (∆N ,FN ,PN ) be theprobability spaces defined at the beginning of section 1, let X be a closed convexsubset of Rd, and suppose that to each δ ∈ ∆ there is associated a convex functionfδ : X → R, taking values in the extended real set R = R ∪ {+∞}. Consider thefollowing min-max problem:

(8)let fN (x) = max

i=1···Nfδ(i)(x);

find y∗N = minx∈X

fN (x), x∗N = arg minx∈X

fN (x),

where (δ(1), . . . , δ(N)) ∈ ∆N .3 The epigraphical form of (8) is as follows:

(9)min

(x,y)∈(X×R)y

subject to fδ(i)(x) ≤ y for all i = 1, . . . , N.

On one hand, problem (9) is a particular, (d+1)-dimensional instance of problem (5);this follows immediately letting Θ = X × R, θ = (x, y), c>θ = y, and Θδ = {(x, y) ∈X × R : fδ(x) ≤ y}, and assuming, by convention, that +∞ must be understood asthe minimum of problem (9) if such problem is infeasible. On the other hand, everyproblem in the form (5) can be formulated as a min-max problem in the form (8). To

3Following the literature of the scenario approach, I assume that a single solution x∗N ∈ X canbe isolated by means of a “tie-break” rule even if many solutions exist, although this is not reallyrelevant to my discussion and is only necessary to make Theorem 2 work properly. Thus, in the restof the paper arg minx∈X fN (x) will always denote an element of X , not a subset of X as it wouldnaturally mean.


do this, let X = Θ, x = θ, and for all δ ∈ ∆ define fδ : Θ→ R as follows:4

fδ(θ) = c>θ + IΘδ(θ) =

{c>θ if θ ∈ Θδ,

+∞ otherwise.

Then problems (5), (8), and (9) are equivalent, i.e., they yield exactly the samesolutions x∗N = θ∗N , y∗N = c>θ∗N , and hence any result about the convergence of thesolution (x∗N , y

∗N ) of (8), as N →∞, reflects on a similar result about the convergence

of the solution (θ∗N , c>θ∗N ) of (5), and vice versa.

In the setting of problem (9), the violation probability of (x, y) reads V(x, y) =P [fδ(x) > y], and since problem (9) is (d+ 1)-dimensional, the bound established byTheorem 2, provided that the solution (x∗N , y

∗N ) exists, becomes

(10) PN [V(x∗N , y∗N ) > ε] ≤

d∑k=0

(N

k

)εk(1− ε)N−k.

Since Theorem 2 lays the foundation for the main results of this paper, and sincefor the bound (10) to make sense it is necessary to ensure that a solution of (8) existsat least for N big enough, I need to introduce two assumptions that will be sufficientfor this to hold. The first assumption goes as follows.5

Assumption 1. The domain X ⊆ Rd is convex and closed; for all δ ∈ ∆, fδ :X → R is convex and lower semicontinuous. For all x ∈ X , δ 7→ fδ(x) is P-measurable(i.e., f(·)(x) is a random variable).

An immediate consequence of Assumption 1 is that fN is convex and lower semicon-tinuous for all N ∈ N. Without further mention, let us agree that Assumption 1 willbe in force throughout the whole paper. Here follows the second assumption.

Assumption 2. For all N ∈ N, fN is PN -almost surely proper.6 Moreover, thereexists N ∈ N such that

PN[fN is coercive

]> 0.

Remark. If the functions fδ are themselves coercive for all δ ∈ ∆, then the coer-civity of fN follows automatically for all N ∈ N. There are conditions that ensure thisproperty, which are very easy to check but otherwise rather conservative. One suchcondition is, of course, that the domain X is compact. Another condition, assuming

4Here IA(x) is the “indicator function of convex analysis,” which takes the value 0 if x ∈ A andthe value +∞ otherwise. Later in the paper the “indicator function of probability theory” 1A(x),taking the value 1 if x ∈ A and the value 0 otherwise, will also show up and be denoted 1(x ∈ A)for the sake of readability.

5For future reference, let me recall here some standard terminology and notation about R-valuedfunctions. A function F : X → R is lower semicontinuous if, for all x ∈ X , F (x) ≤ lim infx→x F (x).For t ∈ R, the t-sublevel set of F is the set {x ∈ X : F (x) ≤ t}. The following are well-known facts(see, e.g., [12]): F is lower semicontinuous if and only if all its t-sublevel sets are closed; the pointwisesupremum F (x) = supα∈A Fα(x) of an arbitrary family of lower semicontinuous functions Fα is lowersemicontinuous. The effective domain of a function F : X → R is the set domF = {x ∈ X : F (x) <+∞}; the closure of domF is denoted domF . A function F : X → R is proper if it has a nonemptyeffective domain, i.e., if it does not take the only value +∞. A function F : X → R is coercive if forall t ∈ R there exists a compact set C ⊆ X including the t-sublevel set of F : {x ∈ X : F (x) ≤ t} ⊆ C.Of course, if X is compact, then F : X → R is automatically coercive.

6The assumption that fN is almost surely proper implies that fδ is proper for almost all δ ∈ ∆.In general the converse is false: for instance, if fδ(x) = I[δ,δ+1](x), where δ takes the values 0 and 2with equal probability 1/2, then f2(x) = +∞ for all x ∈ R with probability 1/2.


for simplicity that X = Rd, is that lim‖x‖→∞ fδ(x) = +∞ for all δ ∈ ∆.7 But herewe need substantially less: a practical way to check Assumption 2 is indeed to isolatefinitely many events, say, A1, . . . , AN ⊆ ∆, each with positive probability, such thatfor any choice of δ(i) ∈ Ai, i = 1, . . . , N , the function fN (x) = maxi=1,...,N fδ(i)(x)tends to +∞ (or “bumps” against the boundary of X ) as ‖x‖ → ∞.

Example 2. Let δ = (a, b), where a and b are both random variables with uniformdensity in [−1, 1]. Let X = R and fδ : R→ R be defined as follows:

fδ(x) = ax+ b.

Clearly, f1(x) = fδ(1)(x) (an affine function) cannot be coercive, since all its sublevelsets are unbounded (or empty, in the negligible event a(1) = 0). On the other hand,f2(x) = max{fδ(1)(x), fδ(2)(x)} is coercive when fδ(1) and fδ(2) have opposite slopes(then f2 is a “V-shaped” function). This happens with probability 1/2, and henceAssumption 2 holds with N = 2. In view of the above remark, the key property hereis that the functions fδ with positive slope (event A1, with probability 1/2) tend to+∞ when x → +∞, and those with negative slope (event A2, also with probability1/2) tend to +∞ when x → −∞; these are the only directions along which x cantend to infinity when X = R.

2.2. A “meaningful min-max”: Main results of the paper. The objectiveof this paper is to prove the almost sure convergence of the empirical minimum y∗Nto, roughly speaking, the “min-max of all the functions fδ for δ ∈ ∆.” But what is“the maximum of all the functions fδ,” exactly, supposed to mean, since it is clearthat a true maximum may not even exist?8 The first rigorous answer that comes tomind is, of course,

S(x) = supδ∈∆

fδ(x).

S has many nice properties that one would demand from stochastic optimization: it isconvex, lower semicontinuous, and by construction fN (x) ≤ S(x) for all x ∈ X . Sincethe fN ’s form a nondecreasing sequence of functions, it is licit to wonder whether, asN →∞, fN → S in some sense, and maybe whether

(11) y∗N = minx∈X

fN (x)→ minx∈X

S(x).

Unfortunately, in any useful probabilistic sense the claim (11) is in general false, asthe following example shows.

Example 3. Suppose that δ is a random variable with uniform density in [0, 1].Let X = R and fδ : R→ R be defined by

(12) fδ(x) =

{|x|+ 2− δ if δ = 1/n for some n ∈ N,|x|+ δ otherwise.

Here S(x) = |x|+2 and minx∈X S(x) = 2. However, since δ has a density, it holds thatP [δ = 1/n for some n ∈ N] = 0, and hence fN (x) > |x|+ 1 happens with probability0, and almost surely minx∈X fN (x) ≤ 1 for all N ∈ N. A pictorial view of the family{fδ}δ∈∆ and of S is shown in Figure 4.

7Indeed, assume that lim‖x‖→∞ fδ(x) = +∞. Then, by definition of limit, for all t ∈ R thereexists M > 0 such that fδ(x) > t when ‖x‖ > M . Hence the sublevel set {x ∈ X : fδ(x) ≤ t} is asubset of the closed ball {x ∈ X : ‖x‖ ≤M}, which is always compact since Rd is finite-dimensional.

8In the following Example 3, for instance, maxδ∈∆ fδ(x) does not exist for any x ∈ X .


-1 -0.5 0 0.5 10

0.5

1

1.5

2

2.5

3

Fig. 4. Gray lines: functions fδ(x) defined in (12); dashed black line: supremum S(x); solidblack line: essential supremum L(x) (see Definition 3).

Since in the scenario approach P and the functions fδ (or the sets Θδ) are un-known, we cannot exclude that even a single fδ, extracted with probability 0, dom-inates all the other functions fδ, δ ∈ ∆, thus driving the minimum of S by itselfalone. Indeed S is a “useless supremum.” The useful supremum is instead the upperboundary of the region where all the mass of the functions lies. In Example 3, all themass of the functions lies below the upper bound |x| + 1, which is actually the leastsuch upper bound; anything above |x|+ 1 is negligible and must be discarded in ourdiscussion. The reader can now safely expect that the minimum of fN converges tominx∈R |x|+ 1 = 1. In fact this is what happens almost surely: |x|+ 1 is the “usefulsupremum”! This example leads to the following definition.

Definition 3. Let the function L : X → R be defined as follows:

L(x) = inf {y ∈ R : P [fδ(x) > y] = 0} .

In measure theory L(x) is known as the essential supremum of the function δ 7→fδ(x) and denoted ess supδ∈∆ fδ(x). Thus, L is the (x-)pointwise essential supremumof the family of functions {fδ}δ∈∆. The function L is precisely the useful supremum:it is immediate to check that, in Example 3, L(x) = |x|+ 1 (see Figure 4, solid blackline). Note that, even if fN is proper for all N and for any choice of (δ(1), . . . , δ(N)),L (as well as S) can take the value +∞ for all x ∈ X . The main results of the paper(Theorems 14 and 15) assert that, under Assumptions 1 and 2,

• if L ≡ +∞, then y∗N → +∞ almost surely;9

• otherwise min L < +∞ exists, and y∗N → min L almost surely;• if, moreover, arg min L is unique, then x∗N → arg min L almost surely.

For the same reasons why, in general, the solution y∗N does not converge to min S,in general the minimum c>θ∗N attained by the solution of (5) does not converge tothe minimum of problem (1). It converges instead to the minimum attained by thesolution—now let me call it essential robust solution—of problem (7). The connectionwith the main results is fairly intuitive; all it requires is to translate the meaning of L,and Assumptions 1 and 2, to the domain θ,Θ,Θδ of problems (5)–(7). Once this job

9More rigorously, P∞-almost surely, where P∞ is a probability on the space of infinite sequencesof elements in ∆, compatible with PN for all N ; for more details, refer to the beginning of section 4.


-2 0 2 4 6-2

0

2

4

6

δN increases

-2 0 2 4 6-2

0

2

4

6

δN increases

(a) (b)

Fig. 5. Example 4: (a) functions fδ(x) as in (13), (b) functions fδ(x) as in (14).

is done, the main results of the paper can be stated as follows (Theorem 17): underthe equivalent of Assumptions 1 and 2,

• if problem (7) is infeasible, then c>θ∗N → +∞ almost surely;• otherwise, a solution θ∗∗ of (7) exists and c>θ∗N → c>θ∗∗ almost surely;• if, moreover, θ∗∗ is unique, then θ∗N → θ∗∗ almost surely.

I will discuss in detail this version of the main results in section 5.

2.3. Discussion of the assumptions and comparison with literature.Suppose that Assumption 1 holds and that fN is proper and coercive for N bigenough. Since fN is proper, coercive, and lower semicontinuous, it attains a finiteminimum in X . This is the assertion of the Tonelli–Weierstrass theorem (see, e.g.,[2, Proposition 3.2.1]), a generalization of the “classical” Weierstrass’s theorem. Butactually the true targets of Assumption 2 are the coercivity of L (Proposition 11)and the existence of compact sublevel sets. From my point of view, Assumption 1is natural in any convex optimization problem; but Assumption 2 is truly the sub-stantial requirement here, and I conjecture that nothing can be established about theconvergence of y∗N if the fN ’s are not, in a way or another, coercive for N big enough,and if L does not have compact sublevel sets. The following example illustrates thekind of issues that may arise.

Example 4. Let p ∈ (0, 1) and δ be a random variable with geometric distributionP [δ = n] = p(1− p)n. Let X = R and fδ : R→ R be defined by

(13) fδ(x) =

{δ − x if x ≤ δ,0 otherwise

(see Figure 5(a)). Letting δN = maxi=1,...,N δ(i), we have fN (x) = δN − x for x ≤

δN , 0 otherwise. No function fN is coercive, and convergence fails: while y∗N =minx∈X fN (x) = 0 for all N and for any possible extraction of δ(1), . . . , δ(N), it isimmediate to recognize that L(x) = +∞ for all x ∈ X . Let instead fδ : R → R bedefined by

(14) fδ(x) =

{−1/(δ + 1)− x if x ≤ 0,−1/(δ + 1) otherwise

(see Figure 5(b)). In this case, letting δN = maxi=1,...,N δ(i) as before, it holds that

fN (x) = −1/(δN + 1) − x for x ≤ 0, −1/(δN + 1) otherwise, and L(x) = −x for


x ≤ 0, 0 otherwise. Neither L nor any fN is coercive, but this time y∗N = −1/(δN +1)converges to 0 = min L almost surely.

On the same subject, it is worth comparing the results presented here with thework of Shapiro and others on sample average approximation (SAA) estimators; see,e.g., [11], [17]. Roughly speaking, this paper proves for the min-max problem whatin [17, section 5.1.1] is proved about the minimum of the sample average (of randomreal-valued convex functions). Letting fN (x) = 1

N

∑Ni=1 fδ(i)(x) and F(x) = E [fδ(x)]

it is shown there that under fairly general hypotheses, as N → ∞, min fN → min Fand arg min fN → arg min F almost surely. One of these hypotheses recurs in differentforms: [17, Theorem 5.3], “there exists a compact set C ∈ Rd such that [. . . ] the setS of optimal solutions of the true problem is nonempty and is contained in C,” or[17, Theorem 5.4], “the set S of optimal solutions of the true problem is nonemptyand bounded.” (The “true problem” is to find [arg] min F.) Both these requirementsgo in the same direction as the coercivity of F: we all need the existence of compactsublevel sets!

However, differently from SAA, neither the law of large numbers nor the centrallimit theorem applies to min-max stochastic problems; hence, whereas averaging typi-cally ensures that the mean square error (MSE, that is; the expected squared deviationfrom the “true” minimum) decreases with rate 1/N as happens for SAA [17, Theorem5.7], here the MSE’s decrease rate can be arbitrarily low; in other words, althoughy∗N = min fN does converge to min L almost surely as N → ∞, the convergence canbe arbitrarily slow, as the following example shows.

Example 5. Let a > 0 and b = 1/a. Suppose that δ is a random variable withexponential density gδ(t) = ae−at for t ∈ [0,+∞). Let X = [−1, 1] and fδ : X → Rbe defined by

fδ(x) = x2 − e−δ/2.

Here L(x) = x2 and minx∈X L(x) = 0; letting δN = maxi=1,...,N δ(i), it holds that

fN (x) = x2 − e−δN/2 and y∗N = minx∈X fN (x) = −e−δN/2. The cumulative distribu-tion function of each δ(i) is Gδ(i)(t) = P

[δ(i) ≤ t

]= 1−e−at, and hence the cumulative

distribution function of δN is GδN (t) = PN[δN ≤ t

]= (1− e−at)N , and its density is

gδN (t) = Na(1− e−at)N−1e−at. The MSE of y∗N with respect to its own limit is

E[(min L− y∗N )2] = E

[e−δN

]=∫ +∞

0e−t Na(1− e−at)N−1e−at dt (let u = e−at)

= N

∫ 1

0u1/a(1− u)N−1 du (= N× Euler’s Beta(1/a+ 1, N) function)

=Γ(b+ 1) Γ(N + 1)

Γ(N + b+ 1)(now use Stirling’s approximation)

∼√

2πN NN e−N√2π(N + b) (N + b)(N+b) e−(N+b)

Γ(b+ 1)

=

√N

N + b·(

N

N + b

)N· 1

(N + b)b· eb · Γ(b+ 1)

∼ 1 · 1eb· 1

(N + b)b· eb · Γ(b+ 1) ∼ Γ(b+ 1)

N b.


Since the parameter b = 1/a can be chosen arbitrarily small, the decrease rate MSE ∼1Nb

can be arbitrarily low.

The search for performance bounds on the approximation error with respect tothe robust solution, which in min-max form would read min S− y∗N , is another “hot”research topic in the scenario approach literature. Letting aside the question of whatare the minimal requirements on P and on the mapping δ 7→ fδ sufficient to ensurethat L = S, which may be interesting per se,10 I would like to mention that resultsof this kind, coming in the form

(15) for big enough N , PN[minx∈X

S(x)− y∗N ≤ ε]≥ 1− β

with very high confidence 1− β, are now available; see, for example, the recent paper[13] by Mohajerin Esfahani, Sutter, and Lygeros, partially building on the previouswork [10] by Kanamori and Takeda. But everything comes at a price, and the currentprice for the performance bound (15) is the requirement of significant knowledge aboutP and δ 7→ fδ. For example, both [13] and [10] assume the uniform Lipschitz continuityof δ 7→ fδ(x) over X 11 In this paper I prefer to assume the least possible knowledgeabout P and δ 7→ fδ: therefore the distinction between S and L must remain, andwith only Assumptions 1 and 2 I maintain that there is not and there cannot be anyguarantee on the convergence speed of y∗N → min L. With respect to Example 5,

PN[minx∈X

L(x)− y∗N ≥ ε]

= PN[e−δN/2 ≥ ε

]= PN

[δN ≤ − log ε2] = GδN (− log ε2) =

(1− ε2a)N ,

which may be arbitrarily close to 1 because a may be arbitrarily large; hence noperformance bound in the form (15) can be established at all.12

3. Fundamental properties of L. This section is dedicated to the proof ofsome fundamental properties of L that would be trivial about S, namely, that Lis convex and lower semicontinuous and that fN ≤ L almost surely. The proof ofthe latter statement requires two technical lemmas about proper, convex, and lowersemicontinuous functions that I did not find in the “standard” literature. Let mestart the discussion by resuming in a lemma the most intuitive facts about L.

Lemma 4. For all x ∈ X ,1. P [fδ(x) > L(x)] = V(x,L(x)) = 0;2. P [fδ(x) ≤ L(x)] = 1;3. if y < L(x), then P [y < fδ(x) ≤ L(x)] > 0;4. if D ⊆ ∆ and P [D] = 1, then L(x) ≤ supδ∈D fδ(x).

10Let, e.g., (∆,m) be a metric space, T be the topology induced by m, F be the Borel σ-algebragenerated by T , P be a probability on (∆,F), and f(·)(x) : ∆ → R be a (F ,Borel(R))-measurablefunction. Are there “nice” sufficient conditions such that ess supδ∈∆ fδ(x) = supδ∈∆ fδ(x)? A simplesufficient condition is that δ 7→ fδ(x), understood as a function between metric spaces, is continuous,and that the support of P is the whole ∆ (i.e., every open subset of ∆ has positive probability); butI guess that this condition can be weakened. Moreover, one could argue that the condition actuallyneeded in the minimization endeavor is that L(x) = S(x) at the minimum points x of L: if thisholds, since L(x) ≤ S(x) for all x ∈ X anyway, a minimum point of L is also a minimum point of S.

11The paper [13] also ensures the coercivity of all the fN ’s by assuming that0 X is compact.12The fundamental requirement of [13] that fails to hold in Example 5, thus preventing us from

establishing performance bounds, is that P must have bounded support. In Example 5 the supportof P is [0,+∞).


Proof. To prove point 1 note that, by the definition of L, for all y > L(x) itholds that P [fδ(x) > y] = 0. If fδ(x) > L(x), then there exists n ∈ N such thatfδ(x) > L(x) + 1/n; hence

{δ ∈ ∆ : fδ(x) > L(x)} ⊆∞⋃n=1

{δ ∈ ∆ : fδ(x) > L(x) +

1n

}and therefore

P [fδ(x) > L(x)] ≤ P

[ ∞⋃n=1

{fδ(x) > L(x) +

1n

}]

≤∞∑n=1

P[fδ(x) > L(x) +

1n

]= 0.

Point 2 is now trivial, since the event {fδ(x) ≤ L(x)} is the complement of theevent {fδ(x) > L(x)}. To prove point 3 suppose, for the sake of contradiction, thatP [y < fδ(x) ≤ L(x)] = 0. Then

P [fδ(x) > y] = P [{y < fδ(x) ≤ L(x)} ∪ {fδ(x) > L(x)}]= P [y < fδ(x) ≤ L(x)] + P [fδ(x) > L(x)] = 0,

and hence y ∈ {y ∈ R : P [fδ(x) > y] = 0}; but now y < L(x) and the definition ofL(x) yield a contradiction. To prove point 4, let y = supδ∈D fδ(x). Any δ ∈ ∆ suchthat fδ(x) > y belongs to the complement of D, and hence P [fδ(x) > y] = 0, andy ∈ {y ∈ R : P [fδ(x) > y] = 0}. The claim follows from the definition of L(x).

The following Propositions 5 and 6 establish the most important properties ofL, its convexity and its lower semicontinuity. These are the counterparts, for thepointwise essential supremum, of two well-known facts: the pointwise supremum ofan arbitrary family of convex (resp., lower semicontinuous) functions is itself convex(resp., lower semicontinuous).

Proposition 5. L is convex.

Proof. For the sake of contradiction, suppose that L is not convex, so that thereexist x1, x2 ∈ X and λ ∈ (0, 1) such that λL(x1) + (1−λ)L(x2) < L(λx1 + (1−λ)x2).By Lemma 4 (point 2) there exist sets D1 ⊆ ∆, D2 ⊆ ∆, both with probability 1,such that fδ(x1) ≤ L(x1) for all δ ∈ D1 and fδ(x2) ≤ L(x2) for all δ ∈ D2; on theother hand, by Lemma 4 (point 3) there exists a set B ⊂ ∆ with P [B] > 0 such thatfor all δ ∈ B

(16) λL(x1) + (1− λ)L(x2) < fδ(λx1 + (1− λ)x2) ≤ L(λx1 + (1− λ)x2).

Let B = B ∩D1 ∩D2 and note that, since P[B]

= P [B] > 0, B is not empty. Butfor any δ ∈ B it holds that fδ(x1) ≤ L(x1), fδ(x2) ≤ L(x2), and by convexity of fδ

fδ(λx1 + (1− λ)x2) ≤ λfδ(x1) + (1− λ)fδ(x2)≤ λL(x1) + (1− λ)L(x2),

which is in contradiction with (16). The contradiction stems from the assumptionthat L is not convex, and this concludes the proof.


Proposition 6. L is lower semicontinuous.

Proof. Fix x ∈ X . Let (xn)∞n=1 be a sequence of points in X converging to x; letmoreover

Dn = {δ ∈ ∆ : fδ(xn) ≤ L(xn)}; D =∞⋂n=1

Dn.

Since P [Dn] = 1 for all n (Lemma 4, point 2), also P[D]

= 1. For all δ ∈ D, thelower semicontinuity of fδ implies

fδ(x) ≤ lim infn→∞

fδ(xn) ≤ lim infn→∞

L(xn),

and therefore, by Lemma 4 (point 4),

L(x) ≤ supδ∈D

fδ(x) ≤ lim infn→∞

L(xn).

This holds for all the sequences (xn)∞n=1 converging to x, and hence L is lower semi-continuous at x; the claim follows since x was chosen arbitrarily.

Lemma 7. Suppose that a function L : Rd → R is proper, convex, and lowersemicontinuous. Then for all x ∈ domL

lim infx→x

x∈domL

L(x) = L(x).

Remark. It is a well-known fact that any proper convex function L is actuallycontinuous in the interior of its effective domain [12, Corollary 2.1.3], but here I amparticularly interested in what happens at the boundary of domL; on the other hand,I do not assume that domL has nonempty interior.

Proof. The claim is trivially true if L takes the value +∞ everywhere but at onepoint. Otherwise, domL has a (nonempty, convex) relative interior.13 Suppose that

(17) lim infx→x

x∈domL

L(x) = limx→x

x∈domL

L(x) = +∞.

(This can only happen if x belongs to the relative boundary of domL.) Let ξ be anypoint in the relative interior. (The line segment [x, ξ] lies in domL.) By [15, Corollary7.5.1], and by uniqueness of the limit (17),

L(x) = limλ→1

L(λx+ (1− λ)ξ) = +∞ = lim infx→x

x∈domL

L(x).

Suppose, on the other hand, that

lim infx→x

x∈domL

L(x) = M < +∞.

Let (xn)∞n=1 be a sequence in domL converging to x and such that limn→∞ L(xn) =M . By convexity, for each n,

L(x/2 + xn/2) ≤ L(x)/2 + L(xn)/2,

13The relative interior of a set S ⊆ Rd is the interior of S with respect to the topology of thesmallest affine subspace of Rd that contains S; the relative boundary of S is its boundary withrespect to the same topology. For instance, the smallest affine subspace of R3 containing a linesegment [x1, x2] ⊂ R3 is the affine subspace `, of dimension 1, generated by x1 and x2. The relativeinterior of [x1, x2] is the segment (x1, x2) (without the endpoints), and its relative boundary is theset {x1, x2}. See, e.g., [15] for further details.


and hence

M ≤ lim infn→∞

L(x/2 + xn/2) ≤ limn→∞

L(x)/2 + L(xn)/2 = L(x)/2 +M/2,

M ≤ L(x).

Since it also holds that L(x) ≤M by lower semicontinuity, L(x) = M ; this concludesthe proof.

Lemma 8. Suppose that a function L : Rd → R is proper, convex, and lowersemicontinuous. Then there exists a finite or countable set S ⊆ domL such that forall x ∈ domL

lim infx→xx∈S

L(x) = L(x).

Remark. From the standpoint of this paper, Lemmas 7 and 8 are only propaedeu-tic to the proof of Proposition 9; therefore their statements involve a function L :Rd → R. Nevertheless, to my understanding, they can be generalized without sub-stantial changes at least to functions L : V → R, where V is any separable Banachspace.

Proof. For any i ∈ N, there exists a finite or countable covering Bi = {B(cj , 1/i)}∞j=1

of domL made of open balls with centers cj and radius 1/i; let B =⋃∞i=1 Bi. From

each open ball B(cj , 1/i) ∈ Bi select a point xij ∈ domL such that

infx∈B(cj ,1/i)

L(x) ≤ L(xij) ≤ infx∈B(cj ,1/i)

L(x) + 1/i

and form the set Si = {xij}∞j=1.14 Let S =⋃∞i=1 Si. The set S ⊆ domL is everywhere

dense in domL, and at most countable.Fix now x ∈ domL. By definition,

lim infx→x

x∈domL

L(x) = limn→∞

infx∈B(x,1/n)∩domL\{x}

L(x).

Note that the clauses “∩domL” and “\{x}” are both inessential, the first one becauseL ≡ +∞ outside domL, and the second one because by Lemma 7 the limit inferior is≤ L(x). So let me simplify notation:

(18) lim infx→x

x∈domL

L(x) = limn→∞

infx∈B(x,1/n)

L(x).

For each n ∈ N, by construction, there exists an open ball B(cn, rn) ∈ B suchthat x ∈ B(cn, rn) and B(cn, rn) ⊆ B(x, 1/n). The corresponding point xn ∈ S takenin the construction at the beginning of the proof satisfies

(19) L(xn) ≥ infx∈B(cn,rn)

L(x) ≥ infx∈B(x,1/n)

L(x);

on the other hand for each n there exists hn ∈ N big enough such that B(x, 1/(n +hn)) ⊆ B(cn, rn); by construction it follows that

(20) L(xn) ≤ infx∈B(cn,rn)

L(x) + rn ≤ infx∈B(x,1/(n+hn))

L(x) + 1/n,

14I know that an engineer invoking the axiom of choice may sound a bit booming, but. . . there itis.


because rn ≤ 1/n. As n → ∞, xn → x, and both the right-hand sides of (19)and (20) converge to the limit inferior (18) (possibly to +∞, with a slight abuse ofterminology). Thus we have

lim infx→xx∈S

L(x) ≤ limn→∞

L(xn) = lim infx→x

x∈domL

L(x) ≤ lim infx→xx∈S

L(x),

where the first inequality holds because {xn}∞n=1 ⊆ S and the second one holds becauseS ⊆ domL; hence

lim infx→xx∈S

L(x) = lim infx→x

x∈domL

L(x),

and the claim follows by an application of Lemma 7.

Remarks. (A) The only case in which the set S built in Lemma 8 is finite is whendomL contains only one point x (of course in this case S = {x}): if domL contains atleast two points, then S must be countably infinite. (B) The statement “for all x ∈ X ,for almost all δ ∈ ∆, fδ(x) ≤ L(x)” is trivial. Much less trivial is a statement like“for almost all δ ∈ ∆, for all x ∈ X , fδ(x) ≤ L(x),” because the clause “almost all” ispreserved only by countable intersections, but X ⊆ Rd is either trivial or uncountable,and hence the quantifiers “all x” and “almost all δ” cannot be interchanged freely.Of the second kind is the following proposition, that indeed relies essentially on thecountability of S established by Lemma 8.

Proposition 9. For all N , PN -almost surely fN (x) ≤ L(x) for all x ∈ X .

Proof. The claim is trivial if L ≡ +∞, and hence suppose that L is proper. (Itis also convex and lower semicontinuous by Propositions 5 and 6.) Since X ⊆ Rd,without loss of generality we can extend L to the whole of Rd, letting L(x) = +∞ forall x ∈ Rd \X . Consider now the countable, everywhere dense subset S = {xn}n∈N ⊆dom L whose existence with respect to L is established by Lemma 8. Let

Dn = {δ ∈ ∆ : fδ(xn) ≤ L(xn)}, D =∞⋂n=1

Dn;

since P [Dn] = 1 for all n ∈ N, also P[D]

= 1. Fix N ; for any choice of δ(1), . . . , δ(N) ∈D, it holds that

fN (xn) = maxi=1,...,N

fδ(i)(xn) ≤ L(xn) for all xn ∈ S.

Now let x ∈ X . If x ∈ dom L, the lower semicontinuity of fN and Lemma 8 imply

fN (x) ≤ lim infx→x

fN (x) ≤ lim infx→xx∈S

fN (x) ≤ lim infx→xx∈S

L(x) = L(x).

If instead x /∈ dom L, then fN (x) ≤ L(x) = +∞ trivially. Since the choice of x isarbitrary, fN (x) ≤ L(x) for all x ∈ X . This happens for all (δ(1), . . . , δ(N)) ∈ DN =D× · · · × D (N times). But now, since δ(1), . . . , δ(N) are supposed to be independentrandom elements (or, which is the same, PN is the product measure P× · · · × P),

PN[DN

]=

N∏i=1

P[D]

= 1,

and hence the uniform upper bound is almost sure; this proves the claim.


4. Convergence of the solution (x∗N , y∗

N). This section establishes anotherfundamental property of L, namely, that it is coercive if Assumption 2 holds, twoproperties of the violation probability V, one probabilistic (V(x∗N , y

∗N ) → 0 almost

surely) and one analytical (V is, in a sense, lower semicontinuous), and finally themain results of the paper: if Assumptions 1 and 2 hold, then y∗N → min L almostsurely, and if x∗∗ = arg min L is unique, then x∗N → x∗∗ “almost surely” (Theorem14); the case for L ≡ +∞ is covered by Theorem 15. The first thing I need to do is toassign a rigorous meaning to the clause “almost surely”, that I have tacitly left vaguefrom the beginning for the sake of readability.

To this purpose, turn now to consider the probability space of infinite sequencesextracted independently from (∆,F ,P), which I will denote (∆∞,F∞,P∞). An ele-ment δ∞ = (δ(i))∞i=1 ∈ ∆∞ is an infinite sequence of elements in ∆; F∞ is the smallestσ-algebra of subsets of ∆∞ containing the sets

E(N) ={

(δ(1), . . . , δ(N), . . .) ∈ ∆∞ : (δ(1), . . . , δ(N)) ∈ E(N)}

for all N ∈ N and E(N) ∈ FN ; and P∞ is a probability function such that

P∞[E(N)

]= PN

[E(N)

]for all N ∈ N and E(N) ∈ FN ; since PN , N ∈ N, are all product measures, suchprobability exists and is unique in view of Ionescu-Tulcea’s theorem [18, Theorem 2,p. 249]. Moreover, let FN = {E(N) ∈ F∞ : E(N) ∈ F (N)}; then

(F (N)

)∞N=1 is a

filtration in F∞. Fix an arbitrary x ∈ X , and if for a certain N the solution (x∗N , y∗N )

does not exist, let by convention (x∗N , y∗N ) = (x,−∞);15 with this convention, the

sequence of solutions ((x∗1, y∗1), . . . , (x∗N , y

∗N ), . . .) is a stochastic process adapted to(

F (N))∞N=1, and P∞ is compatible with PN also in the following sense: for all N ∈ N

and for any Borel set B ⊆ (X × R)N ,

P∞[ {

(x∗i , y∗i )∞i=1 ∈ (X × R)∞ : (x∗i , y

∗i )Ni=1 ∈ B

} ]= PN

[(x∗i , y

∗i )Ni=1 ∈ B

].

Consider now the random variable

ˆN = min {N ∈ N : (x∗N , y∗N ) exists with y∗N > −∞} .

ˆN is a stopping time with respect to the filtration(F (N)

)∞N=1. The following lemma

ensures that, under Assumption 2, ˆN is P∞-almost surely finite.

Lemma 10. If Assumption 2 holds, then P∞-almost surely there exists N ∈ Nsuch that fN is coercive. For all N ≥ N , fN is coercive and attains a finite minimum.

Proof. Let p = PN[fN is coercive

]> 0. For k = 0, 1, 2, . . . , the sets

Bk ={

maxi=kN+1,...,(k+1)N

fδ(i) is not coercive}⊆ ∆(k+1)N

15Here, −∞ is just a placeholder to mark the nonexistence of a solution and to ensure that in thiscase (x∗N , y

∗N ) /∈ X ×R; it does not mean that fN is unbounded from below. Visualize δ with uniform

density in [−1, 1], X = R, and fδ(x) = eδx; if δ(1), . . . , δ(N) are all positive, then infx∈X fN (x) = 0is not attained.


form a sequence of independent events, each with probability 1− p. Therefore

P∞[for all k ∈ N, max

i=kN+1,...,(k+1)Nfδ(i) is not coercive

]= limK→∞

P(K+1)N[for all k ≤ K, max

i=kN+1,...,(k+1)Nfδ(i) is not coercive

]= limK→∞

K∏k=0

P(k+1)N[

maxi=kN+1,...,(k+1)N

fδ(i) is not coercive]

= limK→∞

(1− p)K+1 = 0,

and hence P∞-almost surely there exists k finite such that maxi=kN+1,...,(k+1)N fδ(i)

is coercive, and a fortiori f(k+1)N = maxi=1,...,(k+1)N fδ(i) is coercive. The first claimfollows letting N = (k+1)N . Then, for allN ≥ N , fN is proper, lower semicontinuous,and coercive (because fN (x) ≥ fN (x) for all x ∈ X ). Hence

y∗N = minx∈X

fN (x) < +∞ and x∗N = arg minx∈X

fN (x)

exist by the Tonelli–Weierstrass theorem.

Proposition 11. If Assumption 2 holds, then L is coercive.

Proof. If L takes the only value +∞, the claim is trivial, since any sublevel setof L is the empty set. Assume, therefore, that L is proper, and suppose for thesake of contradiction that it is not coercive. Then there exists t ∈ R such that theset C = {x ∈ X : L(x) ≤ t} is nonempty and not compact. The set C is anywayclosed, because L is lower semicontinuous [12, Proposition 2.2.5]; therefore, by theHeine–Borel theorem [16, Theorem 2.41], C is unbounded.

Proposition 9 ensures that, for all N ∈ N,

P∞[fN (x) ≤ L(x) for all x ∈ X

]= PN

[fN (x) ≤ L(x) for all x ∈ X

]= 1.

Hence the event D ∈ F∞ defined as follows,

D =∞⋂N=1

{fN (x) ≤ L(x) for all x ∈ X

},

has also probability 1. Therefore

P∞-almost surely, for all N , fN (x) ≤ L(x) for all x ∈ X

⇒ almost surely, for all N , {x ∈ X : fN (x) ≤ t} ⊇ {x ∈ X : L(x) ≤ t} = C

⇒ almost surely, for all N , {x ∈ X : fN (x) ≤ t} is unbounded.

On the other hand, by Lemma 10, P∞-almost surely there exists N finite such that fNis coercive, and hence P∞-almost surely there exists N such that {x ∈ X : fN (x) ≤t} is compact. That {x ∈ X : fN (x) ≤ t} is both compact and unbounded is acontradiction, stemming from the assumption that L is not coercive.

The following two propositions establish an asymptotic property of V(x∗N , y∗N )

and an analytic property of the function V (in essence, very similar to lower semicon-tinuity).


Proposition 12.

limN→∞

V(x∗N , y∗N ) = 0 P∞-almost surely.

Proof. Fix ε ∈ (0, 1). By Lemma 10, P∞-almost surely there exists N ∈ N suchthat for all N ≥ N the solution (x∗N , y

∗N ) exists. For all such N , by Theorem 2 (see

(10)),

(21)

P∞ [V(x∗N , y∗N ) > ε] = PN [V(x∗N , y

∗N ) > ε]

≤d∑k=0

(N

k

)εk(1− ε)N−k

≤d∑k=0

Nd+1(1− ε)N−d

= (d+ 1)Nd+1(1− ε)N−d.

Since as N →∞

(d+ 1)(N + 1)d+1(1− ε)N+1−d

(d+ 1)Nd+1(1− ε)N−d=(

1 +1N

)d+1

(1− ε)→ (1− ε) < 1,

by the ratio test

∞∑N=N

P∞ [V(x∗N , y∗N ) > ε] ≤

∞∑N=N

(d+ 1)Nd+1(1− ε)N−d <∞.

Therefore, by the Borel–Cantelli lemma [18, p. 255],

P∞ [V(x∗N , y∗N ) > ε infinitely often] = 0.

Actually this holds for all ε ∈ (0, 1], the case ε = 1 being trivial, and hence

P∞ [there exists h ∈ N s.t. V(x∗N , y∗N ) > 1/h infinitely often]

= P∞[ ∞⋃h=1

{V(x∗N , y∗N ) > 1/h infinitely often}

]

≤∞∑h=1

P∞ [V(x∗N , y∗N ) > 1/h infinitely often] = 0.

Since the event {there exists h ∈ N such that V(x∗N , y∗N ) > 1/h infinitely often} has

probability 0, the complementary event {for all h ∈ N there exists N ∈ N such that0 ≤ V(x∗N , y

∗N ) ≤ 1/h for all N ≥ N} has probability 1; the latter is included in the

event {for any ε′ ∈ (0, 1) there exist h ∈ N and N < ∞ such that 0 ≤ V(x∗N , y∗N ) ≤

1/h ≤ ε′ for all N ≥ N}, that is,

(22){δ∞ ∈ ∆∞ : lim

N→∞V(x∗N , y

∗N ) = 0

},

and hence (22) has also probability 1; this proves the claim.


Remark. Equation (21) is the only point of this paper that makes use of the bound(10) provided by Theorem 2; the reader will of course notice that the second inequalityis rather loose. Indeed I have chosen to present Theorem 2 in the introduction becausethat result provides the state-of-the-art bound for the scenario approach with convexconstraints, but an inequality similar to (21), and equally sufficient for my purposes,would follow also from the pioneering result by Calafiore and Campi [3], which in thepresent context would read

(23) PN [V(x∗N , y∗N ) > ε] ≤

(N

d+ 1

)(1− ε)N−d−1.

Proposition 13. Suppose that (xN )∞N=1 is a sequence in X converging to x∞ asN →∞ and that (yN )∞N=1 is a nondecreasing sequence in R such that yN ≤ y < +∞for all N ∈ N. Then

lim infN→∞

V(xN , yN ) ≥ V(x∞, y).

In particularlim infN→∞

V(xN , yN ) ≥ V(x∞, y∞),

where y∞ = limN→∞ yN .

Proof. Consider random variables of the form 1(fδ(xN ) > y). First note that, forall δ ∈ ∆ and N ∈ N, 1(fδ(xN ) > y) ≤ 1(fδ(xN ) > yN ), since yN ≤ y.

Now fix δ ∈ ∆ and note that the indicator function can only take the values 0and 1; therefore if lim infN→∞ 1(fδ(xN ) > y) = 0, it must hold that fδ(xN ) ≤ y forinfinitely many N ; then, by lower semicontinuity,

fδ(x∞) ≤ lim infN→∞

fδ(xN ) ≤ y.

By contraposition, if fδ(x∞) > y, then lim infN→∞ 1(fδ(xN ) > y) = 1. It followsthat

1(fδ(x∞) > y) ≤ lim infN→∞

1(fδ(xN ) > y) ≤ lim infN→∞

1(fδ(xN ) > yN ).

Finally

lim infN→∞

V(xN , yN ) = lim infN→∞

P [fδ(xN ) > yN ]

= lim infN→∞

∫∆1(fδ(xN ) > yN ) dP

≥∫

∆lim infN→∞

1(fδ(xN ) > yN ) dP

≥∫

∆1(fδ(x∞) > y) dP = V(x∞, y),

where the first inequality is due to Fatou’s lemma [18, p. 187]. The second claimfollows trivially letting y = y∞.

We are now ready to prove the main results of the paper.

Theorem 14. Suppose that Assumptions 1 and 2 hold and that L is proper. Then1. L attains its minimum y∗∗ = minx∈X L(x) < +∞ at at least one point x∗∗ ∈X ;


2. irrespective of the uniqueness of x∗∗, it holds that

limN→∞

y∗N = y∗∗ P∞-almost surely;

3. if, moreover, x∗∗ is unique, then

limN→∞

x∗N = x∗∗ P∞-almost surely.

Proof. Since L is proper, lower semicontinuous, and also coercive by Proposition11, by the Tonelli–Weierstrass theorem there exists in X at least one minimum pointx∗∗ of L attaining a finite minimum y∗∗ = L(x∗∗) < +∞. This proves point 1.

To prove point 2 first note that, by Lemma 10, there exists a set D1 ⊆ ∆∞ withprobability 1 such that, for all δ∞ ∈ D1, there exists N ∈ N such that fN is coercive.Moreover, by Proposition 12, there exists a set D2 ⊆ ∆∞ with probability 1 suchthat, for all δ∞ ∈ D2, V(x∗N , y

∗N )→ 0 as N →∞. And by Proposition 9 the set

D3 ={

for all N ∈ N, fN (x) ≤ L(x) for all x ∈ X}

=∞⋂N=1

{fN (x) ≤ L(x) for all x ∈ X

}has probability 1. Let D = D1 ∩ D2 ∩ D3 (hence also P∞ [D] = 1).

Fix a δ∞ ∈ D and the corresponding N . For N ≥ N , the fN ’s form a nondecreas-ing sequence of coercive functions, all bounded from above by L, and their minimay∗N form a nondecreasing sequence of real numbers bounded from above by y∗∗, whichtherefore has a limit y∗∞. Let C = {x ∈ X : fN (x) ≤ y∗∗}. C is nonempty because itcontains at least the minimum point x∗∗, closed because fN is lower semicontinuous,and bounded because fN is coercive. Thus, C is nonempty and compact. For allN ≥ N , since fN (x) ≥ fN (x) for all x ∈ X , it holds that {x ∈ X : fN (x) ≤ y∗∗} ⊆ C.Therefore x∗N ∈ C for all N ≥ N , and since C is compact there exists a convergingsubsequence (x∗Nk)∞k=1 of the sequence (x∗N )∞

N=N. Let x∗∞ = limk→∞ x∗Nk ; of course it

holds also that limk→∞ y∗Nk = y∗∞. Now

P [fδ(x∗∞) > y∗∞] = V(x∗∞, y∗∞)

≤ limk→∞

V(x∗Nk , y∗Nk

) (Proposition 13)

= 0 (because δ∞ ∈ D2).

Hence P [fδ(x∗∞) > y∗∞] = 0, and consequently

(24) L(x∗∞) = inf{y ∈ R : P [fδ(x∗∞) > y] = 0} ≤ y∗∞.

Since y∗∗ = min L, it follows that y∗∞ ≥ y∗∗; on the other hand y∗∞ ≤ y∗∗ by construc-tion. Therefore limN→∞ y∗N = y∗∗; and since the choice of δ∞ ∈ D was arbitrary andP∞[D] = 1, this proves point 2.

To prove point 3, suppose that x∗∗ is unique. Fix δ∞ ∈ D and the correspondingN , consider the nonempty compact set C as in the proof of point 2, and recall thatx∗N ∈ C for all N ≥ N . Suppose also, for the sake of contradiction, that x∗N does notconverge to x∗∗. Then there exists η > 0 such that the set {x ∈ X : ‖x − x∗∗‖ ≥ η}contains infinitely many terms of the sequence (x∗N )∞

N=N. Since {‖x − x∗∗‖ ≥ η}


is closed, the intersection K = {‖x − x∗∗‖ ≥ η} ∩ C is nonempty and compact,and hence from the sequence (x∗N )∞

N=Nwe can extract a converging subsequence

(x∗Nk)∞k=1 of elements in K and let x∗∞ = limk→∞ x∗Nk . (Again, it also holds thatlimk→∞ y∗Nk = y∗∞.) But then x∗∞ ∈ K, and hence ‖x∗∞− x∗∗‖ ≥ η so that x∗∞ 6= x∗∗.Finally,

y∗∗ = L(x∗∗) < L(x∗∞) (uniqueness of the minimum point)≤ y∗∞ (inequality (24)).

That limN→∞ y∗N > y∗∗ is a contradiction (indeed y∗N ≤ y∗∗ for all N), stemmingfrom the assumption that x∗N does not converge to x∗∗. Hence limN→∞ x∗N = x∗∗ forall δ∞ ∈ D; this concludes the proof of the theorem.

Theorem 15. Suppose that Assumptions 1 and 2 hold and that L takes the onlyvalue +∞. Then

limN→∞

y∗N = +∞ P∞-almost surely.

Proof. Construct the set D ⊆ ∆∞, with probability 1, exactly as in the proofof Theorem 14, and fix δ∞ ∈ D and the corresponding N . For N ≥ N , the fN ’sform a nondecreasing sequence of coercive functions, and their minima y∗N form anondecreasing sequence of real numbers. For the sake of contradiction suppose that,for all N , y∗N ≤ y for a certain y ∈ R. Let C = {x ∈ X : fN (x) ≤ y}; the set C isnonempty and compact, and for all N ≥ N , since fN (x) ≥ fN (x) for all x ∈ X , itholds that {x ∈ X : fN (x) ≤ y} ⊆ C. It follows that x∗N ∈ C for all N ≥ N , anddue to the compactness of C there exists a converging subsequence (x∗Nk)∞k=1 of thesequence (x∗N )∞

N=N. Let x∗∞ = limk→∞ x∗Nk ; we have

P [fδ(x∗∞) > y] = V(x∗∞, y)≤ limk→∞

V(x∗Nk , y∗Nk

) (Proposition 13)

= 0 (because δ∞ ∈ D).

Hence P [fδ(x∗∞) > y] = 0, and consequently

L(x∗∞) = inf{y ∈ R : P [fδ(x∗∞) > y] = 0} ≤ y.

This is a contradiction (L(x∗∞) = +∞), stemming from the assumption that thesequence (y∗N )∞

N=Nis bounded from above; hence limN→∞ y∗N = +∞. Since δ∞ ∈ D

is arbitrary and P∞ [D] = 1, this proves the theorem.

5. Back to the scenario approach. This section is meant to apply Theorems14 and 15 to the scenario problem (5) and to the essential robust problem (7) intro-duced in Section 1; let therefore X = Θ ⊆ Rd and x = θ. Recall from section 2.1 thatthe solution of (5) is equivalent to the minimization of fN (θ) = maxi=1···N fδ(i)(θ),where

(25) fδ(θ) = c>θ + IΘδ(θ) =

{c>θ if θ ∈ Θδ,

+∞ otherwise,

and denote

θ∗N = arg minθ∈Θ

fN (θ),

y∗N = minθ∈Θ

fN (θ) = c>θ∗N .


Assumption 1 requires that for all δ ∈ ∆ the function fδ is convex and lowersemicontinuous. Since the term c>θ is linear, this is the same as requiring that IΘδis convex and lower semicontinuous; and since Θδ is the 0-sublevel set of IΘδ this is,in turn, equivalent to the requirement that Θδ is convex and closed. Hence in thepresent setting Assumption 1 can be restated as follows.

Assumption 3. The set Θ ⊆ Rd is convex and closed; for all δ ∈ ∆, Θδ is convexand closed. For all θ ∈ Θ, the event {θ ∈ Θδ} belongs to F .

Assumption 2 requires that fN is almost surely proper. Since fN (θ) < +∞ if andonly if θ belongs to all the sets Θδ(i) , this is equivalent to the requirement that theintersection of these sets is nonempty. Moreover, the assumption requires that theprobability of the event {fN is coercive} is nonzero for a certain N . fN is coercivewhen its t-sublevel set is either empty (when t < min fN ) or bounded (equivalently,compact, since fN is lower semicontinuous and all its sublevel sets are closed). Thus,for the problem at hand, Assumption 2 can be restated as follows.

Assumption 4. For all N ∈ N, PN -almost surely⋂Ni=1 Θδ(i) 6= ∅. Moreover,

there exists N ∈ N such that the eventfor some t ∈ R, the setN⋂i=1

Θδ(i) ∩ {c>θ ≤ t} is nonempty and bounded

has nonzero probability.

Next, I need to translate in the language of problems (5)–(7) the distinctionbetween when L is proper and when it takes the only value +∞. This is done in thefollowing lemma.

Lemma 16. If the functions fδ, δ ∈ ∆, are defined as in (25), then L is properif and only if there exists θ ∈ Θ such that P

[θ ∈ Θδ

]= 1, i.e., if and only if the

essential robust problem (7),

minθ∈Θ

c>θ

subject to P [θ ∈ Θδ] = 1,

is feasible. If L is proper, the solution of problem (7) is arg minθ∈Θ L(θ).

Proof. For any y ∈ R, fδ(θ) > y if and only if c>θ > y or θ /∈ Θδ. ThereforeP [fδ(θ) > y] = P

[c>θ > y or θ /∈ Θδ

], and we recover three inequalities:

(26)

P [fδ(θ) > y] ≤ 1(c>θ > y) + P [θ /∈ Θδ],

P [fδ(θ) > y] ≥ 1(c>θ > y),P [fδ(θ) > y] ≥ P [θ /∈ Θδ].

If P [θ /∈ Θδ] = 0 (equivalently P [θ ∈ Θδ] = 1), then the first two inequalities in (26)imply P [fδ(θ) > y] = 1(c>θ > y), and hence P [fδ(θ) > y] = 0 if and only if c>θ ≤ yand, by definition,

L(θ) = inf {y ∈ R : P [fδ(θ) > y] = 0} = inf{y ∈ R : c>θ ≤ y

}= c>θ.

If instead P [θ /∈ Θδ] > 0, then the third inequality in (26) implies P [fδ(θ) > y] > 0for all y ∈ R and, again by definition,

L(θ) = inf {y ∈ R : P [fδ(θ) > y] = 0} = inf ∅ = +∞.


Summing up,

L(θ) =

{c>θ if P [θ ∈ Θδ] = 1,+∞ otherwise.

Thus, if there exists θ ∈ Θ such that P[θ ∈ Θδ

]= 1, then L(θ) = c>θ and L is proper;

otherwise L takes the only value +∞, and this proves the first claim. From the lastobservation it is also clear that

minθ∈Θ

c>θ subject to P [θ ∈ Θδ] = 1

= minθ∈Θ

L(θ) subject to L(θ) < +∞

= minθ∈Θ

L(θ) (if L is proper);

the second claim follows immediately.

Remark. At first sight, it might seem that the problem of finding arg minθ∈Θ L(θ),that is. the unconstrained minimization of a convex and lower semicontinuous func-tion, being conceptually simpler than the constrained minimization in problem (7),could be simpler also computationally, i.e., more convenient to solve by means of nu-merical methods. This is in general false. Letting aside the choice of a method tominimize L, the true difficulty here is the computation of L: indeed deciding whetherL(θ) = c>θ or L(θ) = +∞ for a given θ may be an intractable problem.16 The reasonof the computational difficulty is made explicit by the following example, proposedby Ben-Tal and Nemirovski in [1, section 3.2.2, p. 787]. Consider the robust problem

(27)minθ∈Rn

c>θ

subject to θ>diag(δ)2θ ≤ 1 for all δ ∈ ∆,

where ∆ is a parallelotope in Rn centered at the origin; assume that ∆ is equippedwith, say, a uniform distribution. The corresponding essential robust problem is

(28)minθ∈Rn

c>θ

subject to P[θ>diag(δ)2θ ≤ 1

]= 1.

Both (27) and (28) are feasible (the former because θ = 0 satisfies the constraintθ>diag(δ)2θ ≤ 1 for all δ ∈ ∆, and the latter because P [∆] = 1); actually, the twoproblems are the same.17 But, for instance, deciding the robust feasibility of the pointθ = [1 1 1 · · · 1]>, that is, deciding whether or not L(θ) is finite, amounts to verifyingthat ‖δ‖2 ≤ 1 for all δ ∈ ∆, and this is known to be a NP-hard problem. (Refer to[1] for more details.)

From the standpoint of section 1 the following theorem may also be thought of asthe main result of the paper. Nevertheless, in view of Lemma 16 and of the previousdiscussion, it is clearly just a restatement of Theorems 14 and 15, and therefore Ileave it without proof.

16The same story can be told about the computation of the convex and lower semicontinuousfunction S, whose minimization corresponds to the solution of the robust problem (1).

17Problems (27) and (28) coincide because the mapping δ 7→ θ>diag(δ)2θ is continuous for all θand, assuming uniform distribution, the support of P is the whole ∆: this implies L = S, as hasalready been mentioned in a footnote.


Theorem 17. Suppose that Assumptions 3 and 4 hold. If the essential robustproblem (7) is feasible, then

1. problem (7) admits a solution θ∗∗;2. irrespective of the uniqueness of θ∗∗, it holds that

limN→∞

c>θ∗N = c>θ∗∗ P∞-almost surely;

3. if, moreover, θ∗∗ is unique, then

limN→∞

θ∗N = θ∗∗ P∞-almost surely.

If instead problem (7) is infeasible, then

limN→∞

c>θ∗N = +∞ P∞-almost surely.

To conclude the discussion, I will try to provide a bit more insight about thepossible infeasibility of the essential robust problem. It has already been observed, ina comment below Definition 3, that the requirement that fN is almost surely proper(Assumption 2) does not exclude that L ≡ +∞ so that limN→∞ fN (θ) = +∞ almostsurely for all θ ∈ Θ. In pretty much the same way,

⋂Ni=1 Θδ(i) 6= ∅ does not exclude

that⋂∞i=1 Θδ(i) = ∅; indeed this is what happens almost surely in the “bad” case

where problem (7) is infeasible and c>θ∗N → +∞. This connection is clarified by thefollowing result.

Proposition 18. Under the hypotheses of Theorem 17, problem (7) is infeasibleif and only if

∞⋂i=1

Θδ(i) = ∅ P∞-almost surely.

A necessary condition for this to hold is that⋂Ni=1 Θδ(i) is PN -almost surely unbounded

for all N .

Proof. Suppose that problem (7) is infeasible. Consider a δ∞ ∈ ∆∞ such that⋂∞i=1 Θδ(i) is not empty. For any such δ∞ there exists θ ∈

⋂∞i=1 Θδ(i) , and it holds

that limN→∞ y∗N ≤ c>θ < +∞. Hence, by the last claim of Theorem 17, the set ofall such δ∞ has probability 0. Vice versa, suppose that almost surely

⋂∞i=1 Θδ(i) = ∅

and, for the sake of contradiction, that there exists θ ∈ Θ such that P[θ ∈ Θδ

]= 1.

Then, by independence,

P∞[θ ∈

∞⋂i=1

Θδ(i)

]= limN→∞

PN[θ ∈

N⋂i=1

Θδ(i)

]= limN→∞

N∏i=1

P[θ ∈ Θδ(i)

]= 1,

i.e., almost surely θ ∈⋂∞i=1 Θδ(i) , which is a contradiction. Hence such a θ does

not exist, and problem (7) is infeasible; this proves the first claim. To prove thesecond claim suppose again that almost surely

⋂∞i=1 Θδ(i) = ∅ and, for the sake of

contradiction, that⋂Ni=1 Θδ(i) is bounded with nonzero probability for a certain N .

Let Ck =⋂(k+1)Ni=1 Θδ(i) . The sets (Ck)∞k=0 form a nonincreasing sequence · · ·Ck−1 ⊇

Ck ⊇ Ck+1 · · · of nonempty closed sets; but since

PN[N⋂i=1

Θδ(kN+i) is bounded

]= PN

[N⋂i=1

Θδ(i) is bounded

]> 0 for all k ∈ N,


P∞-almost surely there exists k ∈ N such that⋂Ni=1 Θδ(kN+i) is bounded. Hence,

for k ≥ k the intersections Ck are nonempty and compact. It follows [16, Corollaryto Theorem 2.36] that

⋂∞i=1 Θδ(i) =

⋂∞k=0 Ck 6= ∅, P∞-almost surely. This is a

contradiction stemming from the assumption that PN [⋂Ni=1 Θδ(i) is bounded] > 0;

hence PN [⋂Ni=1 Θδ(i) is bounded] = 0 for all N , and this concludes the proof.

Remarks. (A) If problem (7) is infeasible, then, of course, problem (1) is alsoinfeasible, because P [∆] = 1. (B) The main point of the second claim of Proposition 18is to help establish feasibility by contraposition: for example, if P [Θδ is bounded] > 0or, more generally, if there exist finitely many events, say, A1, . . . , AN ⊆ ∆, eachwith positive probability, such that for any choice of δ(i) ∈ Ai, i = 1, . . . , N , the setΘδ(1) ∩ · · · ∩Θδ(N) is bounded, then problem (7) must be feasible. Moreover, if theseevents exist the second part of Assumption 4 is automatically enforced, because if⋂Ni=1 Θδ(i) is bounded there always exists t ∈ R such that

⋂Ni=1 Θδ(i) ⊆ {c>θ ≤ t}.

The message of Proposition 18 and of this remark is illustrated in the following, final,example.

Example 6. Let p ∈ (0, 1), Θ = R2, θ = (x1, x2), c>θ = x1+x2, and consider threecases. (A) Suppose that δ = (r, γ), where r is a binary variable taking values 1, 2 withequal probability 1/2 and γ is a random variable independent of r with geometricdistribution P [γ = n] = p(1− p)n. Let moreover

Θδ =

{{(x1, x2) ∈ R2 : x1 ≥ γ

}if r = 1,{

(x1, x2) ∈ R2 : x2 ≥ γ}

if r = 2.

Letting GN,1 = maxi=1,...,N : r(i)=1 γ(i) and GN,2 = maxi=1,...,N : r(i)=2 γ(i), it holdsthat

⋂Ni=1 Θδ(i) = {x1 ≥ GN,1, x2 ≥ GN,2}. In this case, P∞-almost surely, as N →

∞ both GN,1 and GN,2 tend to +∞, the minimum c>θ∗N = GN,1 + GN,2 also tendsto +∞, and, as claimed by Proposition 18,

⋂∞i=1 Θδ(i) = ∅; and of course

⋂Ni=1 Θδ(i)

is PN -almost surely unbounded for all N . A pictorial view of this case is shown inFigure 6(a).

(B) Suppose that δ = (r, γ) as before, but this time let

Θδ =

{

(x1, x2) ∈ R2 : x1 ≥ − 1γ+1

}if r = 1,{

(x1, x2) ∈ R2 : x2 ≥ − 1γ+1

}if r = 2.

(See Figure 6(b).) As before⋂Ni=1 Θδ(i) is PN -almost surely unbounded for all N ,

but now c>θ∗N = −1/(GN,1 + 1) − 1/(GN,2 + 1) tends almost surely to 0 (i.e., theminimum attained by problem (7)), and ∅ 6=

⋂∞i=1 Θδ(i) = the first quadrant of the

x1, x2 plane. This point shows that the unboundedness of⋂Ni=1 Θδ(i) for all N is only

necessary for the infeasibility of (7), not sufficient.(C) Finally, suppose that r is a ternary variable taking values 1, 2, 3 with equal

probability 1/3, δ = (r, γ) as before, and

Θδ =

{

(x1, x2) ∈ R2 : x1 ≥ − 1γ+1

}if r = 1,{

(x1, x2) ∈ R2 : x2 ≥ − 1γ+1

}if r = 2,{

(x1, x2) ∈ R2 : x1 ≤ 5, x2 ≤ 5}

if r = 3.

(See Figure 6(c).) It is immediate to check that⋂Ni=1 Θδ(i) 6= ∅ for all N . Moreover,

although Θδ is unbounded for every δ ∈ ∆, the events A1 = {δ ∈ ∆ : r = 1},


-2 0 2 4 6 8 10-202468

10(b)

-2 0 2 4 6 8 10-202468

10(c)

-2 0 2 4 6 8 10-202468

10(a)

Fig. 6. Instances of finite-sample problems for each case in Example 6. Light gray:⋂Ni=1 Θδ(i) ;

dark gray:⋂∞i=1 Θδ(i) ; ◦ = θ∗N ; • = essential robust solution. (a) The essential robust problem is

infeasible. (b) The essential robust problem is feasible but⋂Ni=1 Θδ(i) is PN -almost surely unbounded

for all N . (c) The essential robust problem is feasible and⋂Ni=1 Θδ(i) is compact.

A2 = {δ ∈ ∆ : r = 2}, and A3 = {δ ∈ ∆ : r = 3}, each with positive probability,are such that if δ(i) ∈ Ai, i = 1, 2, 3, then the set Θδ(1) ∩ Θδ(2) ∩ Θδ(3) is alwaysbounded. According to the second claim of Proposition 18, this is enough to establishthe feasibility of problem (7). P∞-almost surely, the solution θ∗N converges to theorigin (i.e., the solution of (7)) and

⋂∞i=1 Θδ(i) = [0, 5]× [0, 5].

6. Conclusions. In this paper I have shown that the solution of a convex sce-nario program, subject to independent and identically distributed constraints that,almost surely, sooner or later confine it to a compact set (coercivity), converges almostsurely to the solution of a suitably defined essential robust problem. Future work maybe dedicated

• to searching for weak enough conditions such that the scenario solution con-verges to the solution of the robust problem, i.e., nice sufficient conditions toensure that L(x) = S(x) at least at the minimum points x = x∗∗ of L;

• to weakening the convexity hypothesis. On one hand, it should not be difficultto show that the convergence proved here holds also for certain families ofnonconvex problems, for example, the one considered in [13, section IV].On the other hand, the scenario approach can be applied also to nonconvexproblems, albeit with a different a posteriori assessment method (see, e.g., [6],[8]), but to my knowledge no general a priori bounds like (10) or (23), powerfulenough to exploit the Borel–Cantelli lemma (proof of Proposition 12), existwithout the assumptions of convexity and finite-dimensionality (X ⊆ Rd).Convexity has also been used to prove that, for all N , almost surely fN ≤ L(Proposition 9), but I conjecture that a uniform bound is not necessary toestablish the main results.

Acknowledgments. I am indebted to Rinaldo Colombo, to Augusto Ferrante,and to my colleagues Giorgio Arici, Marco Campi, Algo Care, Marco Dalai, and Re-nata Mansini for their fruitful suggestions about the paper, and particularly to SimoneGaratti for helping me lift a technical assumption about ∆. I extend my gratitude tothe anonymous reviewers for their constructive and encouraging comments.

REFERENCES

[1] A. Ben-Tal and A. Nemirovski, Robust convex optimization, Math. Oper. Res., 23 (1998),pp. 769–805.

[2] D. P. Bertsekas, Convex Optimization Theory, Athena Scientific, Belmont, MA, 2009.[3] G. C. Calafiore and M. C. Campi, The scenario approach to robust control design, IEEE

Trans. Automat. Control, (2006), pp. 742–753.


[4] M. C. Campi and S. Garatti, The exact feasibility of randomized solutions of uncertain convexprograms, SIAM J. Optim., 19 (2008), pp. 1211–1230.

[5] M. C. Campi and S. Garatti, A sampling-and-discarding approach to chance-constrainedoptimization: Feasibility and optimality, J. Optim. Theory Appl., 148 (2011), pp. 257–280.

[6] M. C. Campi and S. Garatti, Wait-and-judge scenario optimization, Math. Program.,Sprnger, New York, 2016, https://doi.org/10.1007/s10107-016-1056-9.

[7] M. C. Campi, S. Garatti, and M. Prandini, The scenario approach for systems and controldesign, Annu. Rev. Control, 33 (2009), pp. 149–157.

[8] M. C. Campi, S. Garatti, and F. A. Ramponi, Non-convex scenario optimization with ap-plication to system identification, in Proceedings of the 54th IEEE Conference on Decisionand Control, Osaka, Japan, 2015.

[9] A. Care, S. Garatti, and M. C. Campi, Scenario min-max optimization and the risk ofempirical costs, SIAM J. Optim., 25 (2015), pp. 2061–2080.

[10] T. Kanamori and A. Takeda, Worst-case violation of sampled convex programs for optimiza-tion with uncertainty, J. Optim. Theory Appl., 152 (2012), pp. 171–197.

[11] A. J. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The sample average approximationmethod for stochastic discrete optimization, SIAM J. Optim., 12 (2001), pp. 479–502.

[12] R. Lucchetti, Convexity and Well-Posed Problems, Springer, New York, 2006.[13] P. Mohajerin Esfahani, T. Sutter, and J. Lygeros, Performance bounds for the scenario

approach and an extension to a class of non-convex programs, IEEE Trans. Automat.Control, 60 (2015), pp. 46–58.

[14] A. Prekopa, Contributions to the theory of stochastic programming, Math. Program., 4 (1973),pp. 202–221.

[15] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1997.[16] W. Rudin, Principles of Mathematical Analysis, 3rd intl. ed., McGraw-Hill, New York, 1976.[17] A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on Stochastic Programming—

Modeling and Theory, SIAM, Philadelphia, 2009.[18] A. N. Shiryaev, Probability, Springer, New York, 1996.

https://doi.org/10.1007/s10107-016-1056-9

CONSISTENCY OF THE SCENARIO APPROACH · Vol. 28, No. 1, pp. 135{162 CONSISTENCY OF THE SCENARIO...

Documents

Transcript of CONSISTENCY OF THE SCENARIO APPROACH · Vol. 28, No. 1, pp. 135{162 CONSISTENCY OF THE SCENARIO...