
STOCHASTIC VARIATIONAL INEQUALITIES: SINGLE-STAGE TO MULTISTAGE

R. Tyrrell Rockafellar,1 Roger J-B Wets2

Abstract

Variational inequality modeling, analysis and computations are important for many applications, but most of the subject has been developed in a deterministic setting with no uncertainty in a problem’s data. In recent years research has proceeded on a track to incorporate stochasticity in one way or another. However, the main focus has been on a rather limited idea of what a stochastic variational inequality might be.

Because variational inequalities are especially tuned to capturing conditions for optimality and equilibrium, stochastic variational inequalities ought to provide such service for problems of optimization and equilibrium in a stochastic setting. Therefore they ought to be able to deal with multistage decision processes involving actions that respond to increasing levels of information, which has so far hardly been the case. This can be accommodated by bringing in the tools of nonanticipativity and its martingale dualization, as shown here.

Keywords: stochastic variational inequalities, multistage stochastic optimization, stochastic programming problems, stochastic equilibrium, recourse decisions, response rules, nonanticipativity, martingale duality, expectation derivatives, monotonicity

1 Introduction

Solving a “variational inequality” problem is a concept that originated in an infinite-dimensional setting with partial differential operators subjected to one-sided constraints inside domains or on their boundaries. Independently the same concept, viewed as solving a “generalized equation,” arose in the finite-dimensional context of optimality conditions in nonlinear programming. Special versions called “complementarity problems” gained attention even earlier on those lines. See e.g. [6], [8], [15].

The theory has since undergone enormous development from every angle, but almost entirely in a deterministic setting. More recently there have been efforts to extend it to models where random variables enter into the data. A challenge is to understand then what a “stochastic variational inequality” might properly be in general. Efforts so far have mainly been directed at formulations proposed in a rather limited setting. Our purpose here is to tackle the subject more broadly and develop a formulation that can cover vastly more territory. Roughly speaking, previous approaches have their

1Department of Mathematics, University of Washington, Seattle, WA 98195-4350; E-mail: rtr@uw.edu. 2Department of Mathematics, University of California-Davis, CA 95615; E-mail: rjbwets@ucdavis.edu. This material is based in part on work supported by the U.S. Army Research Office under grant W911NF-12-1-0273.


motivations in “single-stage” modeling in optimization and equilibrium, at best, whereas our approach will also encompass “multistage” models that allow for responses to increasing levels of information.

Background in nonstochastic variational inequalities. Most simply, in a deterministic finite-dimensional framework to start with, a variational inequality condition (or generalized equation) is a relation of the form

−F (x) ∈ NC(x) (1.1)

in which F is a continuous single-valued mapping from IRn to IRn and NC(x) is the normal cone to a nonempty closed convex set C ⊂ IRn at the point x ∈ C,

v ∈ NC(x) ⇐⇒ x ∈ C and 〈v, x′ − x〉 ≤ 0 for all x′ ∈ C. (1.2)

Here 〈·, ·〉 denotes the inner product in IRn. The problem associated with the variational inequality (1.1) is to determine an x that satisfies it.

By putting −F (x) in place of v in (1.2), one arrives at the system of inequalities behind the “variational inequality” name for (1.1). On the other hand, by taking C = IRn one gets the reduction of (1.1) to the vector equation F (x) = 0, which underlies the name “generalized equation.”
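
As a concrete check of (1.1)–(1.2), here is a minimal numerical sketch; the data M, q, the step size, and the sampling are invented for illustration. It exploits the standard characterization that x solves (1.1) exactly when x is a fixed point of x 7→ proj_C(x − γF(x)) for γ > 0; with C the nonnegative orthant, (1.1) is a linear complementarity problem.

```python
import numpy as np

# Hypothetical small instance of (1.1): C = IR^n_+ (so the VI is a linear
# complementarity problem) and F(x) = M x + q with M positive definite,
# hence F is strongly monotone.
M = np.array([[4.0, 1.0], [1.0, 3.0]])
q = np.array([-1.0, -2.0])

F = lambda x: M @ x + q
proj_C = lambda x: np.maximum(x, 0.0)   # projection onto the nonnegative orthant

# x solves (1.1) exactly when x is a fixed point of x -> proj_C(x - gamma*F(x)).
gamma = 0.1
x = np.zeros(2)
for _ in range(2000):
    x = proj_C(x - gamma * F(x))

# Verify the defining inequalities (1.2) with v = -F(x):
# <-F(x), x' - x> <= 0 for sampled points x' in C.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 5.0, size=(1000, 2))   # points of C
violations = np.max((-F(x)) @ (samples - x).T)
print(x, violations <= 1e-6)
```

Here the computed x lies in the interior of C, so the fixed point simply reproduces the root of F (x) = 0, matching the “generalized equation” reduction described above.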

Other instances of (1.1) relate to optimization models of various sorts and deserve a brief review because of their importance in motivation. The elementary optimization case concerns the first-order necessary condition for minimizing a continuously differentiable function f(x) over C (cf. [26, 6.12]):

−∇f(x) ∈ NC(x). (1.3)

This obviously fits (1.1) with F being the gradient mapping ∇f . The Lagrangian optimization case concerns a continuously differentiable function L(y, z) on a product of closed convex sets Y and Z and takes the form

−∇yL(y, z) ∈ NY (y), ∇zL(y, z) ∈ NZ(z). (1.4)

This constitutes a variational inequality through the rule of convex analysis that NY (y) × NZ(z) isthe normal cone to Y × Z at (y, z), so that (1.4) fits (1.1) with

x = (y, z), F (x) = (∇yL(y, z),−∇zL(y, z)), C = Y × Z. (1.5)

The Karush-Kuhn-Tucker conditions in nonlinear programming correspond to having Z be a special convex cone, but first-order optimality conditions expressed with multiplier vectors z likewise have this pattern far more generally; see [23], [26, 11.46–11.47]. A case of hierarchical optimization furnishes still another example to keep in mind. For that, think of the Lagrangian function L in (1.4) as parameterized by a vector u under the control of an agent who can choose it from a closed convex set U and in that way influence the optimization problem faced by a secondary agent. In reaction to u, the secondary agent chooses y and z to satisfy the corresponding parameterized instance of (1.5). If the primary agent, in choosing u ∈ U , wants to minimize an expression f(u, y, z) that depends also on the response of the secondary agent, we end up looking at conditions like

−∇uf(u, y, z) ∈ NU (u), −∇yL(u, y, z) ∈ NY (y), ∇zL(u, y, z) ∈ NZ(z), (1.6)

which can be put together as the variational inequality (1.1) for

x = (u, y, z), F (x) = (∇uf(u, y, z),∇yL(u, y, z),−∇zL(u, y, z)), C = U × Y × Z, (1.7)


which stands for a kind of equilibrium.3 Other versions of equilibrium, of Nash type or not, involving more than just two agents, can similarly be set up as examples of (1.1). For example, this has been carried out in [20] for a classical economic model of equilibrium in prices, supply and demand.

A solution x to (1.1) is well known to exist, for instance, when C is bounded. The set of all solutions, if any, is always closed, and in the monotone case of (1.1), where F is monotone relative to C, meaning that

〈F (x′)− F (x), x′ − x〉 ≥ 0 for all x, x′ ∈ C, (1.8)

it must also be convex. Monotone variational inequalities relate closely to convex optimization and its methodology, as illustrated by (1.3) when f(x) is convex with respect to x ∈ C and by (1.4) when L(y, z) is convex in y ∈ Y for each z ∈ Z, but concave in z ∈ Z for each y ∈ Y . For monotone variational inequalities, other criteria than the boundedness of C are available for the existence of solutions, and the library of algorithms for solving the problem is much richer.
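
One member of that richer library is the extragradient method, sketched below on an invented instance where F is monotone with equality in (1.8) (a skew-symmetric linear map, the saddle-point structure of (1.4) for a bilinear Lagrangian); the matrix, box, and step size are all illustrative choices, not taken from the paper.

```python
import numpy as np

# Monotone but not strongly monotone example: F(x) = A x with A skew-symmetric,
# so <F(x') - F(x), x' - x> = 0 and (1.8) holds with equality.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda x: A @ x
proj_C = lambda x: np.clip(x, -1.0, 1.0)   # C = [-1,1]^2, bounded, so a solution exists

# The basic projection iteration x -> proj_C(x - gamma*F(x)) can cycle or spiral
# outward here; the extragradient method, a standard algorithm for merely
# monotone VIs, adds a predictor step and converges for gamma < 1/||A||.
gamma = 0.5
x = np.array([1.0, 1.0])
for _ in range(400):
    y = proj_C(x - gamma * F(x))        # predictor
    x = proj_C(x - gamma * F(y))        # corrector, evaluated at the predictor

print(x)   # approaches the unique solution x = 0, where F(x) = 0
```

The solution set here is the single point 0, closed and (trivially) convex, consistent with the general facts just stated for the monotone case.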

The set-valued normal cone mapping NC is the subdifferential mapping ∂δC associated with the indicator δC , which is a closed proper convex function. A natural generalization therefore is to replace NC in (1.1) by the subdifferential mapping ∂f for any closed proper convex function f on IRn (which goes back to the earliest days of the subject) or perhaps any set-valued mapping T that, like such ∂f , is maximal monotone. It would be possible even to take F to be set-valued. However, despite the genuine interest in such generalizations and their eventual importance in applications, the fundamental version in (1.1) and its infinite-dimensional analogs will be our touchstone here.

Uncertainty and how to think about it. Extensions to allow for stochasticity in the formulation of a variational inequality can be portrayed as involving elements ξ of a space Ξ with probability structure: a field A of subsets furnished with a probability measure P . Either F or C, or both, can then be imagined as depending measurably4 on ξ as F (x, ξ) and C(ξ), but what exactly should be made of that in problem formulation?

A key question is whether ξ is supposed to be known before, or only after, a decision about x is finalized. If before, we are looking at a collection of separate variational inequality conditions, one for each ξ, and can contemplate ξ-dependent solutions x(ξ) to them:

−F (x(ξ), ξ) ∈ NC(ξ)(x(ξ)) for ξ ∈ Ξ a.s. (1.9)

(where “a.s.” means “almost surely” with respect to the underlying probability measure). An explanation could be that we have a “random variational inequality” yielding a “random solution set” from which random variables x(·) can be selected. One might be interested in the statistical properties of this random set in connection with “uncertainty quantification.”

On the other hand, if ξ comes afterward, only a single choice of x is available for coping with the uncertainty in F (x, ξ) and C(ξ). We are asking x to satisfy

−F (x, ξ) ∈ NC(ξ)(x) for ξ ∈ Ξ a.s. (1.10)

Is finding such an x a problem with a reasonable interpretation? In the first place, (1.10) demands having x ∈ C(ξ) a.s., i.e.,

x ∈ D = the “essential” intersection of the sets C(ξ). (1.11)

3Hierarchical optimization, strictly speaking, would minimize f(u, y, z) subject to both u ∈ U and the associated variational inequality on y and z as a further constraint.

4For the measurability issues that arise in this paper, Chapter 14 of our variational analysis book [26] provides the definitions and facts that will be needed. Another measurability reference is [2].


Even when the essential intersection is nonempty, though, there is little hope of the conditions in (1.10) being satisfied simultaneously by any x. The elementary optimization case makes this very clear. When F (x, ξ) = ∇xg(x, ξ), we are demanding in (1.10) that x satisfy the first-order optimality conditions for minimization of g(x, ξ) over C(ξ) simultaneously for every ξ (almost surely)! Ordinarily, no one looks for an x that solves two different optimization problems at the same time, much less one that solves all problems in a large collection indexed by a parameter like the current ξ. In the Lagrangian optimization case, one would be asking for not just the solution but also the Lagrange multipliers in the problem to be the same for all ξ almost surely, which sounds even more far-fetched.

In response to that, one can fall back on merely determining an x that “approximately” solves each of the conditions in (1.10) and is somehow “best” in doing that. This is the idea behind the “expected residual minimization (ERM)” approach that many researchers have taken in minimizing with respect to x the expectation in ξ of an error expression relating x to a solution of the ξ-variational inequality in (1.10); see [1], [3], [4], [5], [9], [13], [14], [29], [30]. But understanding the meaning of what is accomplished by such approximation, and its recommendability in applications, might not always be easy — aside from narrow circumstances in which the variations among the conditions are slight because the randomness comes only from minor “noise in the data.” Anyway, such an approximation problem isn’t one of finding an x that solves a single variational inequality of some sort. Referring to (1.10) as somehow a “stochastic variational inequality” seems inaccurate, even though the efforts that have gone into tackling it are commendable for opening up new techniques of computation.
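
The ERM idea can be made concrete in one dimension. The residual used varies across the cited papers; the sketch below takes the “natural map” residual x − proj_{C(ξ)}(x − F(x, ξ)) as one common choice, on invented scenario data where each ξ-problem has a different solution and so no single x satisfies (1.10).

```python
import numpy as np

# Toy scenarios: C(xi) = [0, inf) and F(x, xi) = x - xi, so the xi-problem in
# (1.10) has the xi-dependent solution x = xi, and no single x solves them all.
scenarios = np.array([1.0, 2.0, 3.0])
probs = np.full(3, 1.0 / 3.0)

def residual(x, xi):
    # natural-map residual: x - proj_{[0,inf)}(x - F(x, xi)), unit step
    Fx = x - xi
    return x - max(x - Fx, 0.0)

def erm_objective(x):
    # expectation in xi of the squared residual
    return sum(p * residual(x, xi) ** 2 for p, xi in zip(probs, scenarios))

# Minimize the expected squared residual over a grid of candidate x values.
grid = np.linspace(0.0, 4.0, 4001)
values = np.array([erm_objective(x) for x in grid])
x_erm = grid[np.argmin(values)]
print(x_erm)
```

In this toy case the residual reduces to x − ξ, so the ERM point is just the mean of the scenarios; it “approximately” solves each ξ-condition while exactly solving none of them, illustrating the interpretive question raised above.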

Function space framework. To gain better footing en route to formulating, instead, a single true variational inequality from F (x, ξ) and C(ξ), it helps to look at (1.9) from a function space perspective and take optimization as a guidepost. The function spaces traditionally associated with our probability space (Ξ,A, P ) are the spaces Lpn = Lpn(Ξ,A, P ) for p ∈ [1,∞] of A-measurable functions from Ξ to IRn with their norms || · ||p. The pairing between functions x(·) in Lpn and functions v(·) in the usual counterpart Lqn, with 1/p + 1/q = 1 (with q = 1 for p = ∞), has

〈x(·), v(·)〉 = Eξ[ 〈x(ξ), v(ξ)〉 ].

In this context we can think about (1.9) as revolving around the set

C = {x(·) |x(ξ) ∈ C(ξ) a.s. }, (1.12)

which, when articulated within some Lpn, is closed and convex. At an element x(·) of C, the normal cone NC(x(·)) — with respect to the pairing between Lpn and its Lqn — consists by definition of the functions v(·) ∈ Lqn such that 〈v(·), x′(·)− x(·)〉 ≤ 0 for all x′(·) ∈ C. It is known5 to be the subset of Lqn given by

NC(x(·)) = { v(·) | v(ξ) ∈ NC(ξ)(x(ξ)) a.s. } (1.13)

From this point of view, (1.9) itself constitutes in fact a single variational inequality in a function space setting, namely

−F(x(·)) ∈ NC(x(·)) for F(x(·)) : ξ 7→ F (x(ξ), ξ). (1.14)

In the optimization case having F (x, ξ) = ∇xg(x, ξ), (1.9) moreover gives the first-order optimality condition for the problem of minimizing G(x(·)) = Eξ[g(x(ξ), ξ)] with respect to x(·) ∈ C.6

5See [22, Theorem 21] in particular. This is the case of applying that general result to the integral function given by the indicators δC(ξ).

6Details about such “expectation functionals” G will be worked out in Section 6. The unconstrained minimization of G reduces to separate minimization for each ξ by [26, 14.60].


For comparison, we can also try to place (1.10) in the function space setting. The set

C const = C ∩ {x(·) ∈ Lpn |x(·) ≡ constant } (1.15)

seems then to be the centerpiece. But (1.10) still concerns, through (1.13), the normal cone to C at an x(·) that happens to be a constant function. It doesn’t concern the normal cone to C const at such an x(·): it isn’t, in parallel to (1.14), the variational inequality

−F(x(·)) ∈ NC const (x(·)) for F(x(·)) : ξ 7→ F (x(ξ), ξ). (1.16)

Furthermore, in the optimization case having F (x, ξ) = ∇xg(x, ξ) it doesn’t give the first-order optimality condition for minimizing G(x(·)) = Eξ[g(x(ξ), ξ)] with respect to x(·) ∈ C const .

Multiplier issue. The deficiency there is the lack of a Lagrange multiplier element for the constancy constraint in (1.15). From longstanding experience in stochastic optimization,7 that multiplier should be a function w(·) in Lqn with Eξ[w(ξ)] = 0. Perhaps attention should therefore be given to the corresponding adjustment of (1.10) into the condition

∃w(·) with Eξ[w(ξ)] = 0 such that − F (x, ξ) + w(ξ) ∈ NC(ξ)(x) a.s. (1.17)

Although (1.17) isn’t directly a variational inequality in x(·), it corresponds to one on the pair (x(·), w(·)), as will be placed in a larger picture in the next section. The idea behind (1.17), with genesis in convex analysis, is that the normal cone formula for C const ought to be

NC const (x(·)) = { v(·)− w(·) ∈ Lqn | v(ξ) ∈ NC(ξ)(x(ξ)) a.s., Eξ[w(ξ)] = 0 }. (1.18)

Validating that can run into technical difficulties, however. Lagrange multiplier rules typically need constraint qualifications, and in an infinite-dimensional function space that raises thorny topological issues. We will face them squarely later.

It’s revealing to bring the essential intersection D in (1.11) into the interpretation of (1.17). Because x ∈ D ⊂ C(ξ), we have ND(x) ⊃ NC(ξ)(x), so that (1.17) implies

∃w(·) with Eξ[w(ξ)] = 0 such that − F (x, ξ) + w(ξ) ∈ ND(x) a.s. (1.19)

Obviously these two conditions are identical in the special case where C(ξ) doesn’t really depend on ξ, i.e., C(ξ) ≡ D. That case is well understood in stochastic optimization as meaning there are no induced constraints on x, which is the term used for constraints that depend on information still in the future when x has to be selected and thus influence the feasibility of the decision process implicitly rather than explicitly. Then we immediately do arrive at a standard variational inequality, namely

−F (x) ∈ ND(x) for F (x) = Eξ[F (x, ξ)]. (1.20)

Indeed, (1.19) and (1.20) are equivalent in view of the convexity of the cone ND(x).8 When all the uncertainty is in F (x, ξ) and none in C(ξ), i.e., C(ξ) ≡ D, exactly this kind of condition (1.20), rather than (1.10), has been deemed a “stochastic variational inequality” by another line of researchers in taking the “expected value (EV)” approach; see [10], [11], [12], [27], [28].
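
The EV formulation (1.20) can be sketched numerically: all the uncertainty sits in F (x, ξ), the set is the fixed D, and one solves a single deterministic variational inequality in the expected mapping. The scenario data and step size below are invented for illustration.

```python
import numpy as np

# Toy data for the EV approach: uncertainty only in F, fixed set D = [0, inf).
scenarios = np.array([1.0, 2.0, 3.0])
probs = np.full(3, 1.0 / 3.0)

F = lambda x, xi: xi * x - 1.0             # monotone in x for each scenario
F_bar = lambda x: probs @ F(x, scenarios)  # E_xi[F(x, xi)] = 2x - 1

proj_D = lambda x: max(x, 0.0)             # projection onto D

# Solve -F_bar(x) in N_D(x), a single deterministic VI of the form (1.1),
# by the projected fixed-point iteration.
x, gamma = 0.0, 0.2
for _ in range(200):
    x = proj_D(x - gamma * F_bar(x))

print(x)   # F_bar(x) = 2x - 1 vanishes at x = 0.5, which lies in D
```

Note the contrast with the ERM idea: here a genuine variational inequality is being solved, just with F replaced by its expectation.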

Our goal in this paper is to proceed to the formulation of multistage stochastic variational inequalities built on (1.16) and (1.17) as single-stage models replacing (1.10) and the narrow model in

7cf. the theory in [24] and algorithmic developments in [25].

8It is elementary that, for a convex set K, having Eξ[v(ξ)] ∈ K is equivalent to having v(ξ) + w(ξ) ∈ K a.s. for some w(·) with Eξ[w(ξ)] = 0.


(1.20).9 An inkling of the advantages of this can already be gained from the hierarchical optimization/equilibrium case we looked at earlier. It would make sense that ξ becomes known after the primary agent determines u, and the secondary agent can respond then with y(ξ) and z(ξ); then x would have a first-stage component u and second-stage components (y(ξ), z(ξ)). Note that the different stages not only correspond to different agents but also incorporate z(ξ) as a multiplier. This underscores the great flexibility in modeling that goes along with allowing more complicated dynamics.

The analysis of the models we obtain will diverge between the finitely stochastic case and the general stochastic case. For the general case we will need additional assumptions which enable us to draw on some basic results we obtained in a 1976 paper on multistage stochastic programming [24].

2 From single-stage to multistage

Our extension beyond (1.16) and (1.17) as the prototypes for single-stage stochastic variational inequalities considers more than choosing a single x and afterward observing a single ξ. It incorporates instead an N -stage pattern

x1, ξ1, x2, ξ2, . . . , xN , ξN where xk ∈ IRnk , ξk ∈ IRνk (2.1)

in which xk is the decision to be taken at the kth stage and ξk is a random vector standing for information observed after that decision, but before the next. The previous x and ξ are replaced by

x = (x1, . . . , xN ) ∈ IRn = IRn1 × · · · × IRnN for n = n1 + · · ·+ nN ,
ξ = (ξ1, . . . , ξN ) ∈ IRν = IRν1 × · · · × IRνN for ν = ν1 + · · ·+ νN ,

with (Ξ,A, P ) being the probability space induced by the random vector ξ.10 The nonempty closed convex sets C(ξ), depending measurably on ξ, lie in this version of IRn, and the mapping F has the component structure

F (x, ξ) = (F1(x, ξ), . . . , FN (x, ξ)) with Fk(x, ξ) ∈ IRnk , (2.2)

where each Fk(x, ξ) is continuous in x and measurable in ξ, yielding

F(x(·)) : ξ 7→ (F1(x(ξ), ξ), . . . , FN (x(ξ), ξ)). (2.3)

Nonanticipativity. A crucial aspect of the sequencing in (2.1) is that the choice of xk for k > 1 will be allowed to be influenced by the observations made before it but not by the observations made after it. This is called nonanticipativity. A straightforward way of handling nonanticipativity is through response rules that express xk as a function of (ξ1, . . . , ξk−1) only:

x(ξ) = (x1, x2(ξ1), x3(ξ1, ξ2), . . . , xN (ξ1, ξ2, . . . , ξN−1)). (2.4)

9A two-stage precedent of sorts is present in [4] where the “approximate” x in the ERM approach is treated as a first-stage decision compared to a ξ-dependent hindsight solution. This can still be viewed as an error minimization model rather than solving an actual variational inequality.

10Thus, Ξ is the essential support of ξ in IRν , A is the field of “events” generated by ξ, and P is the probability measure induced on A by ξ. It would be perfectly possible, and in the end superior, to model stages of information in terms of an abstract probability space and a “filtration” of its subsets of Ξ as is customary in probability theory. Here we have kept with a simpler “observation” approach in order to keep the focus on the main ideas.


Alternatively, nonanticipativity can be interpreted as a constraint imposed within a space of more general functions x(·) : ξ 7→ (x1(ξ), x2(ξ), . . . , xN (ξ)) by restricting them to the nonanticipativity subspace

N = {x(·) = (x1(·), . . . , xN (·)) |xk(ξ) does not depend on ξk, . . . , ξN }. (2.5)

Basic constraints. As earlier, we further constrain decisions by requiring x(ξ) ∈ C(ξ) for a measurable mapping C : Ξ →→ IRn such that C(ξ) is closed, convex, and nonempty a.s. Thus, along with the nonanticipativity constraint x(·) ∈ N we will want x(·) ∈ C with C as in (1.12). We speak of this as a basic stochastic constraint, making room that way for other, “more advanced” constraints, perhaps involving risks or expectations, which could replace C by some smaller set K.

A very special case of C that deserves to be kept in mind, especially in the context of its possible later replacement by some K, is the one in which

C(ξ) = D1 ×D2(ξ1)×D3(ξ1, ξ2)× · · · ×DN (ξ1, ξ2, . . . , ξN−1) (2.6)

because then

x(·) ∈ C ∩ N ⇐⇒ x(ξ) has form (2.4) with x1 ∈ D1, x2(ξ1) ∈ D2(ξ1), . . . , xN (ξ1, . . . , ξN−1) ∈ DN (ξ1, . . . , ξN−1). (2.7)

We will speak of (2.6) as the case of simple basic constraints.

Restriction to bounded response. For technical reasons soon to be explained, the focus will henceforth be on response functions x(·) that belong to L∞n rather than some other (larger) Lpn space, and therefore on

N ⊂ L∞n and C ⊂ L∞n .

There’s a common situation, of course, in which this restriction doesn’t matter: namely the case when (Ξ,A, P ) is a discrete probability space with Ξ being a finite set and A consisting of all subsets of Ξ (so that every function x(·) : Ξ → IRn is measurable), and each ξ having positive probability. We’ll refer to it for short as the finitely stochastic case. Then the spaces Lpn are the same finite-dimensional function space, except for having different but equivalent norms, and the notation for that space can be simplified to Ln (which might as well be taken as L2n).

With L∞n as our Lpn, the corresponding paired space Lqn is L1n. The normal cones in (1.13) are then naturally in L1n. However, because L1n can’t generally be identified with the dual Banach space L∞n ∗, unless we are in the finitely stochastic case just described, there will be some delicate issues to deal with in treating the normal cones NC(x(·)) this way.

In this setting we want the mapping F to go from L∞n to L1n, so the function F(x(·)) in (2.3) has to be summable for any x(·) ∈ L∞n . Our assumption that F (x, ξ) is continuous in x and measurable in ξ ensures that F (x(ξ), ξ) is measurable in ξ,11 but for the summability we need more. In terms of

IB = the closed unit ball of IRn with the Euclidean norm || · ||, (2.8)

we will impose the “uniform summability condition” that

∀α > 0 ∃β(·) ∈ L11 such that x ∈ αIB =⇒ F (x, ξ) ∈ β(ξ)IB a.s. (2.9)

11see [26, 14.28+14.29].


Then F is not only well defined but also continuous as a mapping from L∞n into L1n.12

Martingale duality. In our framework of L∞n paired with L1n, a crucial role, alongside the nonanticipativity subspace N of L∞n , will be played by the martingale subspace M of L1n, defined by

M = {w(·) = (w1(·), . . . , wN (·)) |Eξk,...,ξN [wk(ξ1, . . . , ξk−1, ξk, . . . , ξN )] = 0 }, (2.10)

where the expectation is the conditional expectation knowing the initial components ξ1, . . . , ξk−1 of ξ = (ξ1, . . . , ξk−1, ξk, . . . , ξN ). It’s evident that

M = {w(·) ∈ L1n | 〈x(·), w(·)〉 = 0 for all x(·) ∈ N },
N = {x(·) ∈ L∞n | 〈x(·), w(·)〉 = 0 for all w(·) ∈M}. (2.11)

In other words, N and M are “orthogonally” complementary to each other in the pairing between L∞n and L1n. Inasmuch as N , as a closed convex set, is actually a subspace, the normal cones to N are given then by

NN (x(·)) =M for every x(·) ∈ N (2.12)

(whereas NN (x(·)) = ∅ when x(·) /∈ N ). Obviously,

N = 1 =⇒ N = {x1(·) |x1(ξ) ≡ const }, M = {w1(·) |Eξ[w1(ξ)] = 0 }, (2.13)

and that is how the current set-up ties in with the one in our introduction.
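
In the finitely stochastic case the relations (2.10)–(2.13) can be verified directly. The sketch below, on a hypothetical 2 × 2 scenario tree with invented data, projects an arbitrary scenario-indexed policy onto N by conditional averaging; the residual then has zero conditional expectations, placing it in M, and its pairing with any element of N vanishes, as (2.11) asserts.

```python
import numpy as np

# Two stages (N = 2), scalar decisions: four equally likely scenarios,
# labeled by the first observation xi1.
rng = np.random.default_rng(1)
xi1 = np.array([0, 0, 1, 1])         # scenario labels for xi1
probs = np.full(4, 0.25)

def proj_N(x1, x2):
    # x1 may depend on nothing; x2 may depend on xi1 only: replace each
    # component by its conditional expectation given the available history.
    x1_na = np.full(4, probs @ x1)
    x2_na = np.empty(4)
    for s in (0, 1):
        mask = xi1 == s
        x2_na[mask] = probs[mask] @ x2[mask] / probs[mask].sum()
    return x1_na, x2_na

x1, x2 = rng.normal(size=4), rng.normal(size=4)   # an arbitrary policy
n1, n2 = proj_N(x1, x2)              # its nonanticipative part, an element of N
w1, w2 = x1 - n1, x2 - n2            # the residual

# w lies in M per (2.10): E[w1] = 0 and E[w2 | xi1] = 0 group by group.
print(probs @ w1)
print([probs[xi1 == s] @ w2[xi1 == s] for s in (0, 1)])

# Orthogonality (2.11): <y(.), w(.)> = E[y1*w1 + y2*w2] = 0 for any y(.) in N.
y1, y2 = proj_N(*rng.normal(size=(2, 4)))
print(probs @ (y1 * w1 + y2 * w2))
```

This conditional-averaging projection is exactly the sense in which M consists of "martingale differences" dual to the nonanticipativity subspace.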

Stochastic variational inequalities: basic and extensive. In terms of N , C and F as above, we define the stochastic variational inequality in basic form to be the condition on x(·) that

−F(x(·)) ∈ NC∩N (x(·)), (2.14)

which entails x(·) ∈ C ∩ N in particular. The corresponding stochastic variational inequality in extensive form is the condition on x(·) that

x(·) ∈ N and there exists w(·) ∈M such that
−F(x(·)) + w(·) ∈ NC(x(·)), i.e., −F (x(ξ), ξ) + w(ξ) ∈ NC(ξ)(x(ξ)) a.s. (2.15)

The alternative expression at the end of (2.15) comes from the normal cone formula in (1.13). Obviously (2.14) is a variational inequality like (1.1), except for its function-space setting in L∞n with the mapping F and the convex set C ∩ N , but (2.15) might seem not to pass muster as a true variational inequality. It can be expressed however in primal-dual form as one on the pair (x(·), w(·)) in L∞n × L1n:

−Φ(x(·), w(·)) ∈ NC×M(x(·), w(·)) for Φ(x(·), w(·)) = (F(x(·)),−x(·)). (2.16)

This is based on the fact of convex analysis that, in the pairing of L∞n × L1n with L1n × L∞n , normal cones to C ×M decompose to products of normal cones to C and M,

NC×M(x(·), w(·)) = NC(x(·))×NM(w(·)), (2.17)

12The continuity follows through (2.8) from the Lebesgue dominated convergence theorem because, if a sequence {xν(·)}∞ν=1 converges in L∞n to x(·), all these elements belong to some bounded set, with the corresponding functions F (xν(·), ·) in L1n converging pointwise a.s. to F (x(·), ·) while their vector components are uniformly dominated in size by some summable function.


while the “orthogonality” between N and M in (2.11) implies

NM(w(·)) = N for every w(·) ∈M. (2.18)

Clarifying the exact relationship between the basic and extensive forms of a stochastic variational inequality associated with the same data elements C and F will be our task in the next section. It revolves around w(·) being interpreted as a Lagrange multiplier for the nonanticipativity constraint on x(·). Requiring that x(·) belong to the subspace N is a “convex” constraint added to the underlying “convex” constraint x(·) ∈ C. In optimization problems with convex constraints, a Lagrange multiplier condition is typically sufficient for optimality but often not necessary without a “constraint qualification.” Additional assumptions of that sort will therefore need to come in.

Stochastic variational inequalities more generally. A “stochastic variational inequality” should ultimately be a condition of the form

−F(x(·)) ∈ NK∩N (x(·)) for some convex set K ⊂ C, (2.19)

either directly or as elaborated in its expression, say, by Lagrange multiplier elements. The role of the basic form and its partner in extensive form is to provide a stripped-down target to which such more general forms of stochastic variational inequalities may be reduced, e.g., for purposes of computing solutions. Examples with K specified by additional expectation constraints will be discussed in Section 6. Another feature that could of course be relaxed, without stopping (2.19) from being called a stochastic variational inequality, is the special form of F(x(·)) as comprised of separate elements F (x(ξ), ξ) for each ξ.

Simple stochastic variational inequalities. When the basic constraints have the separated structure in (2.6), leading to (2.7), the corresponding stochastic variational inequality in extensive form as in (2.15) turns into the following condition on x(·):

x(·) has form (2.4), and a.s.
−Eξ1,ξ2,...,ξN [F1(x(ξ), ξ)] ∈ ND1(x1),
−Eξ2,...,ξN [F2(x(ξ), ξ)] ∈ ND2(ξ1)(x2(ξ1)),
...
−EξN [FN (x(ξ), ξ)] ∈ NDN (ξ1,...,ξN−1)(xN (ξ1, . . . , ξN−1)). (2.20)

This re-applies, to the multistage setting, the observation made about the equivalence of (1.19) and (1.20) in the single-stage case.

Extension to models with terminal response. It is possible, and in many situations desirable, to allow in (2.1) for a final decision xN+1 ∈ IRnN+1 which is able to respond to the final information input from ξN . From a mathematical standpoint, of course, this can equally well be regarded as already implicit in the model above as the case in which ξN is “trivialized,” and therefore wouldn’t need to be explored for the sake of new results. Nonetheless, it is helpful for examples considered later in this paper to have explicit notation for this N + 1 extension, in which both C(ξ) and F (x, ξ) would acquire an N + 1 component. Since the point is to allow xN+1(ξ) to depend on all of ξ = (ξ1, . . . , ξN ), the corresponding augmentation of N and M in (2.5) and (2.10) is to

N+ = {x+(·) = (x1(·), . . . , xN (·), xN+1(·)) | (x1(·), . . . , xN (·)) ∈ N , xN+1(·) ∈ L∞nN+1},
M+ = {w+(·) = (w1(·), . . . , wN (·), wN+1(·)) | (w1(·), . . . , wN (·)) ∈M, wN+1(ξ) ≡ 0 }. (2.21)

Then in the “simple” case of (2.20), for instance, the new N + 1 condition would have no expectation but just ask that −FN+1(x+(ξ), ξ) ∈ NDN+1(ξ)(xN+1(ξ)) a.s.


Two-stage case. Because of its widespread importance in applications, the case of x1, ξ, x2(ξ) (response after a single observation ξ = ξ1) is worth looking at in detail. There, having

x(·) = (x1(·), x2(·)) ∈ N+ ⇐⇒ x1(·) ≡ const ∈ IRn1 , x2(·) ∈ L∞n2,
w(·) = (w1(·), w2(·)) ∈M+ ⇐⇒ w1(·) ∈ L1n1, Eξ[w1(ξ)] = 0, w2(ξ) ≡ 0, (2.22)

the stochastic variational inequality in extensive form comes out as

−(F1(x1, x2(ξ), ξ), F2(x1, x2(ξ), ξ)) + (w1(ξ), 0) ∈ NC(ξ)(x1, x2(ξ)) a.s. with Eξ[w1(ξ)] = 0. (2.23)

When C(ξ) = D1 ×D2(ξ) this reduces to

−Eξ[F1(x1, x2(ξ), ξ) ] ∈ ND1(x1), −F2(x1, x2(ξ), ξ) ∈ ND2(ξ)(x2(ξ)) a.s. (2.24)
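
A worked toy instance of (2.24), with all data invented for illustration, shows how the two conditions interlock: the second condition has a closed-form scenario-wise solution, and substituting it into the first leaves a one-dimensional variational inequality in x1.

```python
import numpy as np

# Hypothetical two-stage data: xi in {0, 1, 2} with equal probability,
# D1 = D2(xi) = [0, inf), and
#   F1(x1, x2, xi) = x1 - xi + 0.5*x2,    F2(x1, x2, xi) = x2 - (xi - x1).
xis = np.array([0.0, 1.0, 2.0])
probs = np.full(3, 1.0 / 3.0)

def second_stage(x1):
    # the second condition in (2.24): -F2 in N_{[0,inf)}(x2(xi)) scenario by
    # scenario, solved in closed form as x2(xi) = max(xi - x1, 0)
    return np.maximum(xis - x1, 0.0)

x1, gamma = 0.0, 0.5
for _ in range(200):
    x2 = second_stage(x1)
    F1_bar = probs @ (x1 - xis + 0.5 * x2)   # E_xi[F1(x1, x2(xi), xi)]
    x1 = max(x1 - gamma * F1_bar, 0.0)       # projected step onto D1

x2 = second_stage(x1)
print(x1, x2)   # first-stage decision and the scenario-wise recourse x2(xi)
```

The recourse decisions x2(ξ) differ by scenario while the first-stage x1 is a single vector, exactly the nonanticipative pattern (2.4) specialized to two stages.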

3 Relationship between the basic and extensive forms

Establishing an implication from the condition in (2.14) to the condition in (2.15) is easy, so we take it up first.

Sufficiency Theorem. If x(·), with w(·), solves the stochastic variational inequality problem in its extensive form in (2.15), then x(·) solves it in its basic form in (2.14).

Proof. We have to show for x(·) and w(·) as in (2.15) that −F(x(·)) belongs to the normal cone to C ∩ N at x(·), or in other words that

〈−F(x(·)), x′(·)− x(·)〉 ≤ 0 for every x′(·) ∈ C ∩ N . (3.1)

Here 〈−F(x(·)), x′(·)− x(·)〉 = Eξ[ 〈−F (x(ξ), ξ), x′(ξ)− x(ξ)〉 ]. We know from having −F (x(ξ), ξ) + w(ξ) ∈ NC(ξ)(x(ξ)) in (2.15) that

〈−F (x(ξ), ξ) + w(ξ), x′(ξ)− x(ξ)〉 ≤ 0 when x′(ξ) ∈ C(ξ),

so that 〈−F(x(·)) + w(·), x′(·)− x(·)〉 ≤ 0 when x′(·) ∈ C,

but also from the “orthogonality” in (2.11) that

〈w(·), x′(·)− x(·)〉 = 0 for x(·), x′(·) ∈ N and w(·) ∈M.

Putting these facts together gives us (3.1), as desired.

In addressing next the question of necessity, there is a major difference between the finitely stochastic case, where L∞n and L1n reduce to a finite-dimensional linear space denoted by Ln, and the general stochastic case, where the complications of infinite-dimensional topology and duality are inescapable. In Ln we’ll be able to get away with a condition about the relative interiors of the convex sets C(ξ) in posing a constraint qualification, but later it will need to be the full interiors. We’ll use the notation

int = interior, rint = relative interior.

Recall that the “finitely stochastic case” refers here to the probability space (Ξ,A, P ) being discrete and finite (with A consisting of every subset of Ξ and every ξ having positive probability).


Necessity Theorem 1 (finitely stochastic case). In the finitely stochastic case, suppose as a constraint qualification that

∃x(·) ∈ N such that x(ξ) ∈ rintC(ξ) for all ξ ∈ Ξ. (3.2)

Then if x(·) solves the stochastic variational inequality problem in its basic form (2.14), it must also, for some w(·), solve the problem in its extensive form (2.15).

Proof. In (2.14) we are concerned with the intersection of C and N as closed convex subsets of Ln, which is finite-dimensional, and can invoke a rule of convex analysis (see [17, 23.8.1]) saying that

rint C ∩ rintN 6= ∅ =⇒ NC∩N (x(·)) = NC(x(·)) +NN (x(·)). (3.3)

Another rule tells us, on the basis of C being in essence the product of the sets C(ξ), that rint C consists of the elements x(·) of Ln having x(ξ) ∈ rintC(ξ) for all ξ. On the other hand, because N is a subspace, we have rintN = N and, as already noted, NN (x(·)) =M for all x(·) ∈ N . The constraint qualification (3.2) therefore amounts to assuming the existence of some x(·) ∈ rint C ∩ rintN and ensures through (3.3) that NC∩N (x(·)) = NC(x(·)) +M for any x(·) ∈ C ∩ N . Having −F(x(·)) ∈ NC∩N (x(·)) as in (2.14) is equivalent then to the existence of w(·) ∈M such that −F(x(·))−w(·) ∈ NC(x(·)). Appealing to the formula for NC(x(·)) in (1.13) and reversing the sign of w(·), we arrive at (2.15). (Our choice of sign in (2.15) lines up with the different way that w(·) will be identified in the more general theorem coming next.)

The proof of necessity just presented relies on a calculus rule for normal cones to the intersection of two convex sets, but that rule only works in a finite-dimensional space. In replacing Ln by an infinite-dimensional L∞n , two obstacles arise. The first is that relative interiors of convex sets no longer serve as an effective tool; true interiors have to be brought in instead. The second is that calculus rules in L∞n ordinarily utilize the pairing between L∞n and the dual Banach space L∞n ∗, not the pairing between L∞n and L1n. Although L1n has a natural embedding as a subspace of L∞n ∗, the latter is larger.13 Fortunately there is a way of getting around this second, more serious, difficulty in our stochastic framework. We developed it in our 1976 paper [24] and will now be able to take advantage of the technique in a special way.

Working with a concept of nonanticipativity for the constraint x(ξ) ∈ C(ξ) will be the key. The circumstances in our introduction, where we looked in (1.11) at the essential intersection D of the sets C(ξ), will be strongly reflected. It was observed there that in requiring a fixed vector x to belong to C(ξ) almost surely, we are requiring in particular that x belong to D. In stochastic optimization the latter is called the constraint induced by the former; it contains all the information relevant in the choice of x if that choice has to be made before ξ is known. The normal cone ND(x) is naturally important for dealing with this induced constraint, but there is trouble in using it because it might not be representable easily, or fully, in terms of the normal cones NC(ξ)(x), which correspond to the given data elements C(ξ). The issue goes away when the sets C(ξ) are all the same, equaling the intersection D, or it can be sidestepped simply by replacing C(ξ) by D in the model. That need not be seen as throwing away information but rather as putting the question of a formula for the normals to D on a shelf for later study.

Moving in that direction, we will concentrate now on a property of the constraint mapping C : ξ → C(ξ) that in the single-stage case corresponds to having all sets C(ξ) be the same, but in the multistage case is more complicated in relating to the stage structure of x and ξ in (2.1). This property was introduced in our paper [24].

13An element of L∞n ∗ can be decomposed into an element of L1n plus a “singular” element. See [19] for more on this.


Definition (basic constraint nonanticipativity). The constraint mapping C is nonanticipative if for each stage k the projection of C(ξ) ⊂ IRn1 × · · · × IRnk × IRnk+1 × · · · × IRnN into IRn1 × · · · × IRnk , namely

Ck(ξ) = { (x1, . . . , xk) | ∃(xk+1, . . . , xN ) with (x1, . . . , xk, xk+1, . . . , xN ) ∈ C(ξ) }, (3.4)

has

Ck(ξ) = Ck(ξ1, . . . , ξk−1) independent of (ξk, . . . , ξN ), (3.5)

so that the sequence of constraint projections conforms to the pattern

C1, C2(ξ1), C3(ξ1, ξ2), . . . , CN (ξ1, ξ2, . . . , ξN−1). (3.6)

The principle here is making sure that when xk(ξ1, · · · , ξk−1) has to be selected as belonging to C(ξ), that set is fully known on the basis of knowing (ξ1, · · · , ξk−1).
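For intuition, the projection property (3.5) can be checked numerically in a toy two-stage example. The constraint mapping below is hypothetical, chosen only to illustrate the definition: the stage-1 projection C1(ξ) comes out as [0, 1] no matter what ξ is, so this C is nonanticipative in the sense above.

```python
import numpy as np

# Hypothetical two-stage constraint mapping, one scalar per stage:
#   C(xi) = { (x1, x2) : 0 <= x1 <= 1, 0 <= x2 <= xi * x1 },  xi > 0.
# Its stage-1 projection C1(xi) from (3.4) is [0, 1] for every xi,
# so the pattern (3.6) holds and C is nonanticipative.

def in_C(x1, x2, xi):
    return 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= xi * x1

def C1(xi):
    # crude numerical projection: which grid values x1 admit some x2
    grid1 = np.linspace(-0.5, 1.5, 81)
    grid2 = np.linspace(0.0, 2.0, 201)
    return {round(float(x1), 4) for x1 in grid1
            if any(in_C(float(x1), float(x2), xi) for x2 in grid2)}

projections = [C1(xi) for xi in (0.5, 1.0, 2.0)]
assert projections[0] == projections[1] == projections[2]
```

By contrast, replacing the bound x1 ≤ 1 by x1 ≤ ξ would make C1(ξ) depend on ξ and destroy nonanticipativity.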

Another constraint property we will need is the following, in which the closed unit ball IB of IRn helps in the statement.

Definition (constraint boundedness). The constraint mapping C is essentially bounded if

∃ρ > 0 such that C(ξ) ⊂ ρIB a.s. (3.7)

Under constraint boundedness, the set C ⊂ L∞n is not only closed and convex, as we knew before, but also bounded. Moreover it is sure to be nonempty. This follows because the measurability of C and the nonemptiness of the sets C(ξ) imply the existence of a measurable selection x(ξ) ∈ C(ξ) (see [26, 14.6]), which in this case will be bounded in ξ and therefore belong to L∞n .

It should be noted that, in the finitely stochastic case, having the constraint mapping C be essentially bounded just means that the sets C(ξ) are all bounded subsets of IRn.

Necessity Theorem 2 (general stochastic case). Let the constraint mapping C be nonanticipative and essentially bounded, and as a constraint qualification suppose that

∃x(·) ∈ N and ε > 0 such that x(ξ) + εIB ⊂ intC(ξ) a.s. (3.7)

Then if x(·) solves the stochastic variational inequality problem in its basic form (2.14), it must also, for some w(·), solve the problem in its extensive form (2.15).

Proof. The normal cone condition in (2.14) requires by definition that

〈−F(x(·)), x′(·)− x(·)〉 ≤ 0 for all x′(·) ∈ C ∩ N . (3.8)

We can rewrite this advantageously for our purposes by defining

f(y, ξ) = 〈F (x(ξ), ξ), y〉+ δC(ξ)(y) for y ∈ IRn, (3.9)

where δC(ξ) is the indicator function having values 0 on C(ξ) but ∞ outside. Then f is a so-called normal convex integrand on IRn × Ξ, due to the measurability assumption on C along with the measurability and uniform summability assumption on F (see [26, 14.32, 14.44(d)]). We can pass next to the corresponding convex integral functional

If : y(·) ∈ L∞n 7→ E[f(y(ξ), ξ)] (3.10)

having effective domain dom If = { y(·) | If (y(·)) < ∞ } = C and interpret (3.8) as saying that

x(·) ∈ argminx′(·)∈N If (x′(·)). (3.11)


This is a situation to which a result of ours in [24, Theorem 2 & Corollary] can be applied, because our assumptions of constraint nonanticipativity and essential boundedness match the ones there. According to that result, (3.11) implies the existence of w(·) ∈ M (within L1n instead of perhaps the larger space L∞n ∗) such that w(·) belongs to the subgradient set ∂If (x(·)). That subgradient set consists however of all v(·) such that v(ξ) ∈ ∂yf(x(ξ), ξ) (see [21, p. 227]). On the other hand, through the construction of f and the constraint qualification in (3.7), we have (almost surely)

∂yf(x(ξ), ξ) = F (x(ξ), ξ) +NC(ξ)(x(ξ))

(by invoking the subgradient calculus rule in [17, 23.8]). Hence having w(·) belong to ∂If (x(·)) corresponds to the existence of v(·) with v(ξ) ∈ NC(ξ)(x(ξ)) such that w(ξ) − F (x(ξ), ξ) = v(ξ) a.s. Altogether, then, (3.11) implies (2.15) as targeted.

4 Existence and monotonicity

Constraint boundedness also has a role in establishing whether a stochastic variational inequality has a solution.

Existence Theorem 1 (finitely stochastic case). In the finitely stochastic case, if C ∩ N ≠ ∅ and the sets C(ξ) are bounded, then the stochastic variational inequality in basic form has a solution x(·). The stochastic variational inequality in extensive form is sure to have a solution as well if in addition the constraint qualification (3.2) is satisfied.

Proof. This falls back on the fact that in the finitely stochastic case the set C is a closed convex subset of the finite-dimensional space Ln and is bounded when the sets C(ξ) are bounded. Then C, and consequently C ∩ N , are compact. Consider now the mapping S : C ∩ N → C ∩ N defined by letting S(x(·)) be the projection of x(·)−F(x(·)) onto C ∩N . That mapping S is continuous, and its fixed points are the solutions to (2.14). A continuous mapping from a compact convex set into itself always has a fixed point, so we conclude that a solution to (2.14) exists. Necessity Theorem 1 leads then to the existence of a solution to (2.15) as indicated.
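In the finitely stochastic case the mapping S in the proof above is easy to realize numerically. The following sketch uses hypothetical data (two equally likely scalar scenarios, C(ξ) = [0, 1], N the constant functions, F(x, ξ) = x − ξ) and a damped fixed-point iteration as a practical stand-in for the pure Brouwer argument:

```python
import numpy as np

# Two equally likely scenarios xi in {0, 1}; a decision x = (x(0), x(1)).
# C cap N = constant functions with value in [0, 1], and F(x, xi) = x - xi.
# S(x) = proj_{C cap N}(x - F(x)) as in the proof.

xi = np.array([0.0, 1.0])

def F(x):
    return x - xi

def project_C_cap_N(y):
    # with equal probabilities, projecting onto the constants in the
    # expectation inner product is averaging, then clipping into [0, 1]
    c = np.clip(np.mean(y), 0.0, 1.0)
    return np.full_like(y, c)

def S(x):
    return project_C_cap_N(x - F(x))

x = np.zeros(2)
for _ in range(100):
    x = 0.5 * x + 0.5 * S(x)   # damped iteration toward a fixed point

assert np.allclose(x, 0.5)     # the nonanticipative solution x = E[xi]
assert np.allclose(S(x), x)    # fixed point of S, hence solves (2.14)
```

The computed solution hedges between the two scenarios: the best constant response is the mean of ξ.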

In the general stochastic case, a solution to the problem in extensive form is likewise assured on the basis of the conditions in Necessity Theorem 2, as long as a solution to the problem in basic form is at hand. For that, however, a simple result is so far lacking. It is true that the assumption of constraint boundedness, along with the closedness and convexity of the sets C(ξ), makes C ∩ N a closed bounded convex subset of L∞n . Such a set is known to be compact with respect to the weak topology induced on L∞n by its pairing with L1n. But the mapping S in the preceding proof won't generally be continuous in that topology, so the same fixed point argument doesn't work.

An existence result for the general stochastic case can be obtained nevertheless in the important monotone case, where

〈F (x′, ξ)− F (x, ξ), x′ − x〉 ≥ 0 for all x, x′ ∈ C(ξ) a.s. (4.1)

Existence Theorem 2 (under monotonicity). In the general stochastic case under the monotonicity condition (4.1), suppose the basic constraint mapping C is essentially bounded and the constraint qualification (3.7) is satisfied. Then both the stochastic variational inequality in basic form and the one in extensive form have solutions.

Proof. In this situation the set-valued mapping T (·, ξ) : x 7→ F (x, ξ) + NC(ξ)(x) is maximal monotone from IRn to IRn (cf. [18, Theorem 3]). Then, under the assumptions in Existence Theorem 1, the set-valued mapping

T : x(·) 7→ { z(·)− w(·) | w(·) ∈ M, z(ξ) ∈ T (x(ξ), ξ) a.s. } (4.2)

is maximal monotone from L∞n to L1n (by [18, Theorem 1]). Its (set-valued) inverse T −1 is maximal monotone then from L1n to L∞n and moreover has bounded range, namely equal to dom T ⊂ C. In that case dom T −1 has to be all of L1n (by [16, Corollary 2.2]) and hence must contain 0. Thus, there exists x(·) ∈ T −1(0).

Through the definition of T , that refers to the existence of w(·) which, together with x(·), solves the variational inequality (2.15) in extensive form. In that case we know from the Sufficiency Theorem that x(·) solves the variational inequality (2.14) in basic form.

This result not only furnishes existence but also discloses in its proof a platform for solving multistage stochastic variational inequalities by algorithms based on monotonicity.

Monotonicity can also help in the finitely stochastic case by allowing constraint boundedness to be weakened to some kind of coercivity; see [26, 12.51–12.52].
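For affine maps F (x, ξ) = A(ξ)x + b(ξ), the monotonicity condition (4.1) on all of IRn amounts to positive semidefiniteness of the symmetric part of A(ξ), which is simple to test numerically. The data below are hypothetical, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def is_monotone_affine(A, tol=1e-10):
    # <A x' + b - (A x + b), x' - x> = (x'-x)^T sym(A) (x'-x),
    # so monotonicity as in (4.1) holds iff sym(A) is positive semidefinite
    sym = 0.5 * (A + A.T)
    return np.linalg.eigvalsh(sym).min() >= -tol

M = rng.standard_normal((4, 4))
A = M @ M.T + (M - M.T)     # PSD symmetric part plus a skew part
b = rng.standard_normal(4)

assert is_monotone_affine(A)
assert not is_monotone_affine(-np.eye(4))

# spot-check (4.1) directly on random pairs x, x'
for _ in range(100):
    x, xp = rng.standard_normal(4), rng.standard_normal(4)
    assert (A @ xp - A @ x) @ (xp - x) >= -1e-9
```

Note that the skew part contributes nothing to the inner product, so a map can be monotone without being a gradient mapping.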

5 Lagrangian elaboration of basic constraints

We next look at what can be gained when the basic constraint mapping C is specified by a system of function constraints. Specifically, we suppose that

x ∈ C(ξ) ⇐⇒ x ∈ D(ξ), fi(x, ξ) ≤ 0 for i = 1, . . . , r, and fi(x, ξ) = 0 for i = r + 1, . . . ,m, (5.1)

where D(ξ) is a nonempty closed convex set depending measurably on ξ, fi(x, ξ) is measurable in ξ, differentiable and convex in x for i = 1, . . . , r, but affine in x for i = r + 1, . . . ,m. Then, in particular, C(ξ) is a closed convex set depending measurably on ξ (by way of [26, 14.36]).

Our aim is to obtain, and apply, a formula for the normal cones to the basic constraint set C ⊂ L∞n in terms of Lagrange multipliers for the conditions in (5.1). With ξ and x fixed for the moment, those multipliers should form a vector

y = (y1, . . . , ym) ∈ Y = [0,∞)r × (−∞,∞)m−r (5.2)

which, in coordination with x in (5.1), satisfies

yi ≥ 0 for i ≤ r having fi(x, ξ) = 0, yi = 0 for i ≤ r having fi(x, ξ) < 0,

or equivalently together with (5.1):

f(x, ξ) ∈ NY (y) for f(x, ξ) = (f1(x, ξ), . . . , fm(x, ξ)). (5.3)

Multiplier Theorem 1 (for basic constraints). With respect to the constraint system (5.1) as given, the additional assumption that fi(x, ξ) is essentially bounded in ξ guarantees that

f(x(ξ), ξ) and ∇xf(x(ξ), ξ) are essentially bounded in ξ when x(·) ∈ L∞n . (5.4)


The normal cone NC(x(·)) ⊂ L1n at an x(·) ∈ C then contains all v(·) having, in the notation above, the representation

∃ y(·) ∈ L1m, z(·) ∈ L1n, such that a.s. v(ξ) = ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ) + z(ξ) with f(x(ξ), ξ) ∈ NY (y(ξ)), z(ξ) ∈ ND(ξ)(x(ξ)) (5.5)

(in utilizing the fact that NY (y(ξ)) ≠ ∅ entails y(ξ) ∈ Y ). In the finitely stochastic case, this furnishes a complete description of NC(x(·)) under the constraint qualification that

∃ x̄(·) such that, for all ξ ∈ Ξ, x̄(ξ) ∈ rintD(ξ) and fi(x̄(ξ), ξ) < 0 for i ≤ r, fi(x̄(ξ), ξ) = 0 for i > r. (5.6)

In the general stochastic case, it furnishes a complete description of NC(x(·)) if m = r, i.e., there are no equation constraints, and

∃ x̄(·) ∈ C, ε > 0, such that a.s. x̄(ξ) ∈ D(ξ) and fi(x̄(ξ), ξ) < −ε for i = 1, . . . ,m. (5.7)

Here if actually v(·) ∈ L∞n , then necessarily y(·) ∈ L∞m .

Proof. In view of the breakdown in (1.13), determining the elements of NC(x(·)) comes down to determining the elements of NC(ξ)(x(ξ)) in IRn for individual ξ and making sure — for the general stochastic case — that the results of that calculus can hang together as measurable functions of ξ belonging to the right function spaces. (In the finitely stochastic case, all functions are measurable, bounded, and so forth.) The Lagrange multiplier representation in (5.5) for elements v(ξ) of NC(ξ)(x(ξ)) derives from well known rules for the calculus of normal cones to convex sets in IRn, for instance in [17, Section 23]; in that context constraint qualifications can rely on relative interiors of convex sets as in (5.6) and also implicitly in (5.7). The rules for measurability in [26, Chapter 14] ensure that y(ξ) and z(ξ) can be selected so as to depend measurably on ξ. All this is standard.

The only feature requiring attention in detail is verifying that y(·) can be taken to be in L1m and z(·) in L1n. It is for this reason that (5.4) is included in the theorem and will need to be confirmed. Clearly if y(·) and z(·) are indeed summable, then, because of the boundedness in (5.4), the expression for v(·) in (5.5) implies that v(·) is summable, as demanded in our framework of placing the normal cone NC(x(·)) in L1n. In the other direction, starting with v(·) summable, if y(·) is summable, then, through (5.4), z(·) will be summable. Thus, everything hinges on establishing (5.4) and showing, under the constraint qualification (5.7), that (5.4) forces y(·) to be summable.

In dealing with the claim in (5.4), we start by recalling that the convexity of fi(x, ξ) in x implies for any a ∈ IRn that

fi(x+ a, ξ)− fi(x, ξ) ≥ 〈∇xfi(x, ξ), a〉 ≥ fi(x, ξ)− fi(x− a, ξ).

In taking a = ±ej for the "unit vectors" ej of IRn (having 1 for the jth coordinate and 0 for the others), we get

|〈∇xfi(x, ξ), ej〉| ≤ max{fi(x+ ej , ξ), fi(x− ej , ξ)} − fi(x, ξ) for j = 1, . . . , n (5.8)

and therefore through the essential boundedness of fi in ξ the essential boundedness of ∇xfi(x, ξ) in ξ for any x. Consider now for any ρ > 0 the vectors ±ρej which have, as their convex hull, the "unit cube of radius ρ." For any x in this cube we have14

fi(x, ξ) ≤ maxj=1,...,n {fi(ρej , ξ), fi(−ρej , ξ)}

14A convex function on a polyhedral set attains its max at one of the vertices of that set.

15

and on the other hand, from the subgradient inequality,

fi(x, ξ) ≥ fi(ρej , ξ) + 〈∇xfi(ρej , ξ), x− ρej〉

where ||x− ρej || ≤ ρ maxk≠j{ || ± ek − ej || }. This shows, via the essential boundedness of fi(±ρej , ξ) and ∇xfi(ρej , ξ), already demonstrated, that there is an upper bound b such that |fi(x, ξ)| ≤ b a.s. uniformly for x in the cube in question. That tells us that for any x(·) ∈ L∞n we have f(x(ξ), ξ) essentially bounded. Repeating then the argument that arrived at (5.8), but this time with x(ξ) in place of x, we arrive at the essential boundedness of ∇xf(x(ξ), ξ) in ξ.

We come back now, in the case r = m, i.e., inequality constraints only, to confirming that the summability of y(·) is a consequence of (5.4) and (5.7) when the formula (5.5) concerns a summable v(·). In this case the relation f(x(ξ), ξ) ∈ NY (y(ξ)) in (5.5) reduces of course to

yi(ξ) ≥ 0, fi(x(ξ), ξ) ≤ 0, yi(ξ)fi(x(ξ), ξ) = 0 for i = 1, . . . ,m. (5.9)

We can make use of this through the subgradient inequalities

fi(x(ξ), ξ) + 〈∇xfi(x(ξ), ξ), x̄(ξ)− x(ξ)〉 ≤ fi(x̄(ξ), ξ), with x̄(·) the point furnished by (5.7).

Multiplying each by yi(ξ) and adding them up, while recalling from (5.7) that fi(x̄(ξ), ξ) < −ε, we get

∑mi=1 yi(ξ)fi(x(ξ), ξ) + 〈 ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ), x̄(ξ)− x(ξ) 〉 ≤ −ε ∑mi=1 yi(ξ).

Here the first sum vanishes according to (5.9), while ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ) = v(ξ)− z(ξ) in (5.5). Therefore

ε ∑mi=1 yi(ξ) ≤ −〈v(ξ), x̄(ξ)− x(ξ)〉+ 〈z(ξ), x̄(ξ)− x(ξ)〉, (5.10)

where however the final term is ≤ 0, because x̄(ξ) ∈ D(ξ) and z(ξ) ∈ ND(ξ)(x(ξ)), and therefore can be dropped. Since v(·) is summable while x̄(ξ)− x(ξ) is essentially bounded, we see from (5.10) (without its final term) that ∑mi=1 yi(ξ) is bounded from above by a summable function of ξ. In combination with yi(ξ) ≥ 0, this tells us that y(·) is summable, as claimed. If v(·) were essentially bounded, this same argument would tell us that y(·) is essentially bounded.

Stochastic variational inequalities of Lagrangian basic form. When the representation (5.5) for NC(x(·)) is substituted into the condition in the stochastic variational inequality (2.15) in extensive form, the resulting relation, involving y(·) ∈ L1m and z(·) ∈ L1n, is

−F (x(ξ), ξ) + w(ξ) = ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ) + z(ξ) with f(x(ξ), ξ) ∈ NY (y(ξ)), z(ξ) ∈ ND(ξ)(x(ξ)). (5.11)

Utilizing the boundedness in (5.4) we can thereby pass from (2.15) to a condition jointly on x(·) and y(·) instead of just x(·), which we call the associated stochastic variational inequality in Lagrangian basic form:

x(·) ∈ N , y(·) ∈ L1m, and ∃w(·) ∈ M such that
−( F (x(ξ), ξ) + ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ), −f(x(ξ), ξ) ) + (w(ξ), 0) ∈ ND(ξ)×Y (x(ξ), y(ξ)). (5.12)

This doesn't quite fit our chosen framework in the general stochastic case because that would require y(·) ∈ L∞m . In the finitely stochastic case, that isn't an issue, and there might be other situations as well in which the constraints naturally lead to y(ξ) being essentially bounded in ξ.


Interpretation as extensive form with terminal response. The really interesting thing about this representation is that (5.12) actually constitutes a stochastic variational inequality of extensive form in x+(·) = (x(·), y(·)), with y(·) interpreted as a final response xN+1(ξ) after the observation of ξN (at least if we are in position to take y(·) in L∞m instead of L1m, as certainly holds in the finitely stochastic case). This follows the pattern for terminal response explained near the end of Section 2 with N+ and M+ as in (2.21). In terms of

D+ = { x+(·) = (x(·), y(·)) | x+(ξ) ∈ D+(ξ) a.s. } with D+(ξ) = D(ξ)× Y,
F+(x+(ξ), ξ) = ( F (x(ξ), ξ) + ∑mi=1 yi(ξ)∇xfi(x(ξ), ξ), −f(x(ξ), ξ) ), (5.13)

we can express (5.12) as

x+(·) ∈ N+ and ∃w+(·) ∈ M+ such that − F+(x+(ξ), ξ) + w+(ξ) ∈ ND+(ξ)(x+(ξ)) a.s. (5.14)

This is clearly again a stochastic variational inequality in extensive form for which the corresponding stochastic variational inequality in basic form is

−F+(x+(·)) ∈ ND+∩N+(x+(·)). (5.15)

That provides confirmation of our underlying idea that the basic and extensive forms can serve as models to which more complicated stochastic variational inequalities can be reduced.

6 Some examples with expectation functionals and constraints

As explained in our introduction, stochastic variational inequalities ought to be broad enough in concept to assist in characterizing solutions to problems of stochastic optimization or equilibrium, even when initial decisions can be followed by recourse decisions in later stages. We'll illustrate here how our formulation achieves that coverage in a fundamental setting.

The examples to be presented will involve expectation functionals G : L∞n → IR, by which we mean expressions of the type

G(x(·)) = Eξ[ g(x(ξ), ξ) ] for g : IRn × Ξ→ IR. (6.1)

Under the assumption that g(x, ξ) is continuous with respect to x = (x1, . . . , xn) ∈ IRn and measurable with respect to ξ = (ξ1, . . . , ξN ) ∈ Ξ, we have g(x(ξ), ξ) measurable with respect to ξ when x(ξ) is measurable with respect to ξ (by [26, 14.28–14.29]). More is needed, however, to guarantee that g(x(ξ), ξ) is summable in ξ for x(·) in L∞n , so that G is well defined as a function from L∞n to IR. In fact we will want G to be differentiable on L∞n with "gradients" ∇G(x(·)) ∈ L1n in the (Fréchet) sense that

G(x(·) + u(·)) = G(x(·)) + 〈∇G(x(·)), u(·)〉+ o(||u(·)||∞). (6.2)

That will furnish the directional derivative formula

dG(x(·);u(·)) = 〈∇G(x(·)), u(·)〉 for dG(x(·);u(·)) = limt→0+ [G(x(·) + tu(·))− G(x(·))]/t, (6.3)

which we can put to work in describing optimality. For these reasons we further suppose for the function g in (6.1) that

∀α > 0 ∃β(·) ∈ L11 such that x ∈ αIB ⇒ |g(x, ξ)| ≤ β(ξ) and ∇xg(x, ξ) ∈ β(ξ)IB a.s. (6.4)


Expectation Derivative Theorem. Under the stated assumptions on g, the function G in (6.1) is well defined from L∞n to IR and is differentiable with gradients ∇G(x(·)) ∈ L1n given by

∇G(x(·)) : ξ 7→ ∇xg(x(ξ), ξ), so that dG(x(·);u(·)) = Eξ[ 〈∇xg(x(ξ), ξ), u(ξ)〉 ]. (6.5)

Moreover the gradient mapping ∇G : L∞n → L1n is continuous and locally bounded, so that G is locally Lipschitz continuous. If in addition g(x, ξ) is convex in x, then G is a convex function and ∇G is a monotone mapping.

Proof. The bound on function values in (6.4) ensures for any x(·) ∈ L∞n (as applied to α ≥ ||x(·)||∞) that |g(x(ξ), ξ)| ≤ β(ξ), so that the integrand in (6.1) is summable and G(x(·)) is well defined and finite. The bound on gradients in (6.4) provides, through the mean value theorem on the line segment joining x(ξ) and x′(ξ) within αIB for fixed ξ, that

|g(x′(ξ), ξ)− g(x(ξ), ξ)| ≤ β(ξ)||x′(ξ)− x(ξ)|| when max{||x(·)||∞, ||x′(·)||∞} ≤ α (6.6)

and consequently for κ = Eξ[β(ξ)] that

|G(x′(·))− G(x(·))| ≤ κ||x′(·)− x(·)||∞ when max{ ||x(·)||∞, ||x′(·)||∞} ≤ α.

This confirms that G is locally Lipschitz continuous on L∞n . Then too the difference quotient

[G(x(·) + tu(·))− G(x(·))]/t = Eξ[ (g(x(ξ) + tu(ξ), ξ)− g(x(ξ), ξ))/t ]

concerns functions of ξ that are uniformly bounded by some summable β(ξ) as t → 0+. Those functions converge pointwise in ξ to 〈∇xg(x(ξ), ξ), u(ξ)〉, so by the Lebesgue dominated convergence theorem the expectations converge as well, and we get (6.5). Local Lipschitz continuity along with the existence of these directional derivatives always entails differentiability as in (6.2).

The limit argument also reveals that 〈∇xg(x(ξ), ξ), u(ξ)〉 is likewise bounded by some summable β(ξ) uniformly as long as x(·) and u(·) stay within bounded subsets of L∞n . In particular this implies, through u(·) ranging over the unit ball of L∞n , that ||∇xg(x(ξ), ξ)|| ≤ β(ξ). From there we deduce not only that ∇G(x(·)) ∈ L1n but also (again through the Lebesgue dominated convergence theorem) that ∇G(xν(·)) converges to ∇G(x(·)) in L1n as xν(·) converges to x(·) in L∞n . In other words, ∇G is continuous as a mapping from L∞n to L1n and locally bounded as well.

The convexity claim is obvious. Gradient mappings for convex functions are always monotone, as follows from the usual subgradient inequality.

Note that this differentiability result is one of the places where working in L∞n instead of some other Lpn for x(·) is especially convenient.
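In the finitely stochastic case the conclusion of the theorem can be checked directly: the gradient of G is assembled scenario-by-scenario from ∇xg and matches a finite-difference approximation of the directional derivative in (6.3). The quadratic integrand below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
S, n = 5, 3
xi = rng.standard_normal((S, n))       # scenario data
p = np.full(S, 1.0 / S)                # equal probabilities

def g(x_s, s):                         # g(x, xi_s) = ||x - xi_s||^2, convex
    return np.sum((x_s - xi[s]) ** 2)

def grad_g(x_s, s):
    return 2.0 * (x_s - xi[s])

def G(x):                              # expectation functional as in (6.1)
    return sum(p[s] * g(x[s], s) for s in range(S))

x = rng.standard_normal((S, n))        # a response x(.) given scenario-wise
u = rng.standard_normal((S, n))        # a direction u(.)
t = 1e-6

fd = (G(x + t * u) - G(x)) / t         # difference quotient from (6.3)
dG = sum(p[s] * grad_g(x[s], s) @ u[s] for s in range(S))  # formula (6.5)
assert abs(fd - dG) < 1e-4
```

The agreement reflects the interchange of expectation and differentiation that the theorem justifies under (6.4).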

Optimality in minimizing an expectation. The elementary "optimization case" that was helpful in Section 1 as motivation for single-stage stochastic variational inequalities can now be expanded to a multistage setting and supplied with technical details.

For this we consider an expectation functional as above and for optimization turn to the problem

minimize G(x(·)) = Eξ[ g(x(ξ), ξ) ] over all x(·) ∈ C ∩ N ⊂ L∞n (6.7)

with C and N as in Section 2. From the differentiability in (6.5) along with the convexity of C ∩ N , we see that, if a local minimum in problem (6.7) occurs at x(·) ∈ C ∩ N , then 〈∇G(x(·)), x′(·)− x(·)〉 ≥ 0 for all x′(·) ∈ C ∩ N . Thus

local min at x(·) =⇒ −∇G(x(·)) ∈ NC∩N (x(·)), (6.8)


which is the stochastic variational inequality of basic form in (2.14) for F(x(·)) = ∇G(x(·)), corresponding to F (x, ξ) = ∇xg(x, ξ).

This necessary condition becomes a sufficient condition for global optimality when G is convex, and then one has a monotone stochastic variational inequality.

Equilibrium in a corresponding game model based on expectations. A gamelike extension of this optimization example can be built around agents j = 1, . . . , J who select strategies xj(·) ∈ Cj ∩ N j with Cj coming from a constraint mapping Cj in the manner of (1.12). (The dimensions could be different for different agents, and this is why we are writing N j : a different L∞nj space may be involved.) Agent j is concerned with minimizing with respect to the choice of xj(·) an expectation functional

Gj(x1(·), . . . , xJ(·)) = Eξ[ gj(x1(ξ), . . . , xJ(ξ), ξ) ]. (6.9)

The complication is that this "cost" to agent j depends also on the strategies chosen by the other players. On the other hand, through the structure of stages of uncertainty and the availability of recourse decisions as information develops, the agents can interact repeatedly over time.

As understood from the preceding optimization, the first-order optimality condition for agent j is

−F j(x1(·), . . . , xJ(·)) ∈ NCj∩N j (xj(·)) for the mapping F j(x1(·), . . . , xJ(·)) : ξ 7→ ∇xjgj(x1(ξ), . . . , xJ(ξ), ξ). (6.10)

The situation in which all these conditions in (6.10) are satisfied simultaneously constitutes a kind of equilibrium reflecting "stationarity" in the perceptions of the agents. It is a true Nash equilibrium if the functionals Gj in (6.9) are convex, which corresponds to each gj(x1, . . . , xJ , ξ) being convex with respect to xj .

The most important fact here for our purposes of illustration is that the simultaneous satisfaction of the conditions in (6.10) can be unified into a single stochastic variational inequality in basic form:

−(F1(x1(·), . . . , xJ(·)), . . . ,FJ(x1(·), . . . , xJ(·))) ∈ NC∩N (x1(·), . . . , xJ(·))
where C = C1 × · · · × CJ and N = N 1 × · · · × N J .
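A minimal two-agent instance of this combined variational inequality, with hypothetical quadratic costs and a single scenario (so that N imposes nothing), can be solved by damped simultaneous projected-gradient steps, which converge here because the combined mapping is monotone:

```python
# Agents j = 1, 2 with costs g1(x1, x2) = (x1 - 0.5)^2 + x1*x2 and
# g2(x1, x2) = (x2 + 0.5)^2 + x1*x2 on C1 = C2 = [-2, 2].  The combined
# map F = (dg1/dx1, dg2/dx2) is affine with positive semidefinite
# symmetric part, hence monotone, so the iteration below contracts.

def partial_grads(x1, x2):
    return 2.0 * (x1 - 0.5) + x2, 2.0 * (x2 + 0.5) + x1

def clip(v):
    return min(2.0, max(-2.0, v))   # projection onto [-2, 2]

x1 = x2 = 0.0
for _ in range(500):
    d1, d2 = partial_grads(x1, x2)
    x1, x2 = clip(x1 - 0.2 * d1), clip(x2 - 0.2 * d2)

d1, d2 = partial_grads(x1, x2)
assert abs(d1) < 1e-8 and abs(d2) < 1e-8   # both conditions in (6.10) hold
assert abs(x1 - 1.0) < 1e-8 and abs(x2 + 1.0) < 1e-8
```

Since each cost is convex in the agent's own variable, the computed stationary point (1, −1) is a true Nash equilibrium in the sense just described.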

Incorporation of expectation constraints. An example of a stochastic variational inequality of the more general form in (2.19), with the basic constraint set C replaced by a smaller set K, comes from adding constraints of the type

Gi(x(·)) = Eξ[ gi(x(ξ), ξ) ], with Gi(x(·)) ≤ 0 for i = 1, . . . , r and Gi(x(·)) = 0 for i = r + 1, . . . ,m, (6.11)

where each Gi is an expectation functional satisfying the conditions laid out above. In working with such expectation constraints we will be able to exploit the fact that Gi is differentiable with gradients as in (6.5):

∇Gi(x(·)) ∈ L1n for ∇Gi(x(·)) : ξ 7→ ∇xgi(x(ξ), ξ). (6.12)

For the present purpose we also assume that gi(x, ξ) is convex in x for i = 1, . . . , r but affine in x for i = r + 1, . . . ,m, so that Gi is convex for i = 1, . . . , r but affine for i = r + 1, . . . ,m. This is needed to guarantee that the set

K = {x(·) ∈ C | (6.11) holds } (6.13)

is not just closed in L∞n (through the continuity of each Gi) but also convex, as is appropriate for the condition

−F(x(·)) ∈ NK∩N (x(·)) (6.14)


to legitimately be a stochastic variational inequality with expectation constraints.

We can then try to understand how the normal cones NK(x(·)) and NK∩N (x(·)) relate to the previous normal cones NC(x(·)) and NC∩N (x(·)). Again we'll be interested in Lagrange multiplier vectors

y = (y1, . . . , ym) ∈ Y = [0,∞)r × (−∞,∞)m−r, (6.15)

but now, in coordination with x(·) and (6.11), they should satisfy

yi ≥ 0 for i ≤ r having Gi(x(·)) = 0, yi = 0 for i ≤ r having Gi(x(·)) < 0,

or equivalently when combined with (6.11):

G(x(·)) ∈ NY (y) with G(x(·)) = (G1(x(·)), . . . ,Gm(x(·))), i.e.,
G(x(·)) = Eξ[g(x(ξ), ξ)] for g(x, ξ) = (g1(x, ξ), . . . , gm(x, ξ)). (6.16)

Multiplier Theorem 2 (for expectation constraints). For K in (6.13), the normal cone NK∩N (x(·)) at an x(·) ∈ K ∩ N contains all v(·) having, in the notation above, the representation

∃ y ∈ IRm, z(·) ∈ L1n, such that v(·) = ∑mi=1 yi∇Gi(x(·)) + z(·) with G(x(·)) ∈ NY (y), z(·) ∈ NC∩N (x(·)) (6.17)

(in utilizing the fact that NY (y) ≠ ∅ implies y ∈ Y ). In the finitely stochastic case, this furnishes a complete description of NK∩N (x(·)) under the constraint qualification that

∃ x̄(·) ∈ C ∩ N such that x̄(ξ) ∈ rintC(ξ) a.s. and Gi(x̄(·)) < 0 for i = 1, . . . , r, Gi(x̄(·)) = 0 for i = r + 1, . . . ,m. (6.18)

In the general stochastic case it furnishes a complete description of NK∩N (x(·)) if m = r, i.e., there are no equation constraints, and

∃ x̄(·) ∈ C ∩ N such that Gi(x̄(·)) < 0 for i = 1, . . . ,m. (6.19)

Moreover the same results hold with K ∩N and C ∩ N replaced everywhere by just K and C.

Proof. There is nothing new here in the finitely stochastic case except for the special language and notation. Standard rules of convex analysis, e.g. [17, 23.8], support the assertions made about Lagrange multipliers. This takes into account that N is a subspace, so that having x̄(·) ∈ N along with x̄(ξ) ∈ rintC(ξ), which corresponds to x̄(·) ∈ rint C, implies x̄(·) ∈ rint(C ∩ N ). For the general stochastic case relative interiors are of no help, but the strict inequalities in (6.19) can serve instead as the well known Slater condition for obtaining Lagrange multipliers.

Stochastic variational inequalities of Lagrangian expectation form. Again, as with the Lagrangian representation for basic constraints, we are able to pass to a wider expression of these conditions as a single variational inequality in the Lagrange multipliers and other variables jointly. This emerges from consolidating the condition on the right of (6.17) to

v(·)− ∑mi=1 yi∇Gi(x(·)) ∈ NC∩N (x(·)) with G(x(·)) ∈ NY (y) (6.20)

having G as in (6.16) and going on then to rewrite the basic stochastic variational inequality −F(x(·)) ∈ NC∩N (x(·)) in (2.14) as a condition jointly on y ∈ IRm and x(·) ∈ L∞n :

−( −G(x(·)), F(x(·)) + ∑mi=1 yi∇Gi(x(·)) ) ∈ NY×(C∩N )(y, x(·)). (6.21)


Although this is truly a variational inequality in (y, x(·)), it isn't one of basic form in our terminology. Likewise, the corresponding stochastic variational inequality in (2.15), similarly rewritten as

x(·) ∈ N and there exist w(·) ∈ M and y ∈ IRm such that
−( −G(x(·)), F(x(·)) + ∑mi=1 yi∇Gi(x(·)) ) + (0, w(·)) ∈ NY×C(y, x(·)) (6.22)

is not exactly one of extensive form. That doesn't stop us from referring to both of them as stochastic variational inequalities in Lagrangian expectation form, but much more can be said in this direction.

Reduction to extensive form with augmented nonanticipativity. For one thing, we can interpret y as a "decision component" fixed in advance of observing ξ1, . . . , ξN . That way, it can be adjoined to the first-stage component of x(·) as a function y(·) constrained to be constant. In notation for that, we can think of augmenting x(ξ) to

x̄(ξ) ∈ IRm × IRn with x̄1(ξ) = (y(ξ), x1(ξ)) but x̄k(ξ) = xk(ξ) for k = 2, . . . , N,

and in parallel augmenting N to

N̄ = { x̄(·) = (y(·), x(·)) | y(·) ≡ const, x(·) ∈ N }.

Then (6.21) comes out as the following variational inequality in x̄(·):

−F̄0(x̄(·)) ∈ N(Y×C)∩N̄ (x̄(·)) for F̄0(x̄(·)) = ( −G(x(·)), F(x(·)) + ∑mi=1 yi(·)∇Gi(x(·)) ). (6.23)

This is closer to being a stochastic variational inequality of basic type but still misses the mark because F̄0(x̄(·)) doesn't fit our prescription of being a function ξ 7→ F̄0(x̄(ξ), ξ), due to the nature of the vector G(x(·)) in (6.16).

We can do better however with the corresponding interpretation of (6.22). For that we have to recognize the need for an additional multiplier element u(·) with Eξ[u(ξ)] = 0 to take care of the constancy constraint on y(·). Introducing

C̄(ξ) = Y × C(ξ), yielding C̄ = { x̄(·) = (y(·), x(·)) ∈ L∞m+n | x̄(ξ) ∈ C̄(ξ) a.s. }, (6.24)

we can express the variational inequality in (6.22) equivalently as

x(·) ∈ N and there exist w(·) ∈ M, y ∈ IRm, and u(·) with Eξ[u(ξ)] = 0, such that
−( −g(x(ξ), ξ), F (x(ξ), ξ) + ∑mi=1 yi∇xgi(x(ξ), ξ) ) + (u(ξ), w(ξ)) ∈ NC̄(ξ)(y, x(ξ)) a.s. (6.25)

Why an equivalence? For G(x(·)) = Eξ[g(x(ξ), ξ)] we are invoking yet again the rule that

Eξ[g(x(ξ), ξ)] ∈ NY (y) ⇐⇒ ∃u(·), Eξ[u(ξ)] = 0, with g(x(ξ), ξ) + u(ξ) ∈ NY (y(ξ)) a.s.

Taking this further, we can augment w(·) ∈ M to

w̄(ξ) ∈ IRm × IRn with w̄1(ξ) = (u(ξ), w1(ξ)) but w̄k(ξ) = wk(ξ) for k = 2, . . . , N,

and introduce

M̄ = { w̄(·) = (u(·), w(·)) | Eξ[u(ξ)] = 0, w(·) ∈ M }

along with

F̄ (x̄, ξ) = ( −g(x, ξ), F (x, ξ) + ∑mi=1 yi∇xgi(x, ξ) ) for x̄ = (y, x). (6.26)

With this in hand we can express the variational inequality in (6.25) as

x̄(·) ∈ N̄ and ∃ w̄(·) ∈ M̄ such that − F̄ (x̄(ξ), ξ) + w̄(ξ) ∈ NC̄(ξ)(x̄(ξ)) a.s. (6.27)

We have then arrived at a genuine stochastic variational inequality of extensive type, demonstrating once more that such a model can serve as a target for the reduction of more complicated variational inequalities.


7 Potential applications to models with nonconvex constraints

The stochastic variational inequalities of Lagrangian form in Sections 5 and 6 emerged from multiplier rules invoked for convex constraint systems of basic type or expectation type. However, through multiplier rules utilizing other constraint qualifications for nonconvex constraint systems, the same kinds of variational inequalities can be reached. This offers more fertile ground for our subject than might have been recognized so far.

A prominent example in the original history of variational inequalities is their capability of representing first-order optimality conditions even in optimization without constraint convexity. The Karush-Kuhn-Tucker conditions in nonlinear programming were expressed in that manner by Robinson [15] for the sake of studying how the solutions to a problem may depend on the data in the problem — a program that has continued to this day with ever-wider reach, cf. [7]. That avenue of research could well be followed also for stochastic models of optimization or equilibrium, even with nonconvexity.

The main idea is that many applications could lead to a condition of the same appearance but with C or K nonconvex and normal cones defined in the broader sense of variational analysis instead of just convex analysis, as for instance in [26]. Such a stochastic condition with C or K nonconvex couldn't properly be called a variational inequality, but still, multiplier rules might be invoked to reduce it, in much the same way as above, to a true variational inequality.


References

[1] R.P. Agdeppa, N. Yamashita and M. Fukushima, “Convex expected residual models for stochastic affine variational inequality problems and its application to the traffic equilibrium models,” Pacific Journal of Optimization 6 (2010), 3–19.

[2] C. Castaing and M. Valadier, Convex Analysis and Measurable Multifunctions, Lecture Notes in Mathematics, Springer, 1977.

[3] X.J. Chen and M. Fukushima, “Expected residual minimization method for stochastic linear complementarity problems,” Mathematics of Operations Research 30 (2005), 1022–1038.

[4] X.J. Chen, R.J-B Wets and Y. Zhang, “Stochastic variational inequalities: Residual minimization smoothing/sample average approximations,” SIAM Journal on Optimization 22 (2012), 649–673.

[5] X.J. Chen, Y. Zhang and M. Fukushima, “Robust solution of monotone stochastic linearcomplementarity problems,” Mathematical Programming 117 (2009), 51–80.

[6] R.W. Cottle, F. Giannessi and J.-L. Lions, Variational Inequalities and Complementarity Problems, Wiley, New York, 1980.

[7] A.L. Dontchev and R.T. Rockafellar, Implicit Functions and Solution Mappings: A View from Variational Analysis, second edition, Springer-Verlag, 2014.

[8] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems, Springer, 2003.

[9] H. Fang, X.J. Chen and M. Fukushima, “Stochastic R0 matrix linear complementarity problems,” SIAM Journal on Optimization 18 (2007), 482–506.

[10] G. Gürkan, A.Y. Özge and S.M. Robinson, “Sample path solution of stochastic variational inequalities,” Mathematical Programming 84 (1999), 313–333.

[11] A. Iusem, A. Jofre and P. Thompson, “Approximate projection methods for monotone stochastic variational inequalities,” Mathematical Programming, to appear.

[12] H. Jiang and H. Xu, “Stochastic approximation approaches to the stochastic variational inequality problem,” IEEE Transactions on Automatic Control 53 (2008), 1462–1475.

[13] C. Ling, L. Qi, G. Zhou and L. Caccetta, “The SC1 property of an expected residualfunction arising from stochastic complementarity problems,” Operations Research Letters 36(2008), 456–460.

[14] M.J. Luo and J.H. Lin, “Expected residual minimization method for stochastic variationalinequality problems,” J. Optimization Theory and Applications 140 (2009), 103–116.

[15] S.M. Robinson, “Generalized equations and their solutions, I. Basic theory, point-to-set mapsand mathematical programming,” Mathematical Programming Study 10 (1979), 128–141.

[16] R.T. Rockafellar, “Local boundedness of nonlinear monotone operators,” Michigan Mathematical Journal 16 (1969), 397–407.


[17] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.

[18] R.T. Rockafellar, “On the maximality of sums of nonlinear monotone operators,” Transactions of the American Mathematical Society 149 (1970), 241–250.

[19] R.T. Rockafellar, “Integrals which are convex functionals, II,” Pacific Journal of Mathematics 39 (1971), 439–469.

[20] A. Jofre, R.T. Rockafellar and R.J-B Wets, “Variational inequalities and economic equilibrium,” Mathematics of Operations Research 32 (2007), 32–50.

[21] R.T. Rockafellar, “Convex integral functionals and duality,” in Contributions to NonlinearFunctional Analysis, E. Zarantonello (ed.), Academic Press, 1971, 215–236.

[22] R.T. Rockafellar, Conjugate Duality and Optimization, No. 16 in Conference Board ofMathematical Sciences Series, SIAM Publications, 1974.

[23] R.T. Rockafellar, “Extended nonlinear programming,” in Nonlinear Optimization and Related Topics (G. Di Pillo and F. Giannessi, eds.), Kluwer, 1999, 381–399.

[24] R.T. Rockafellar and R.J-B Wets, “Nonanticipativity and L1-martingales in stochastic optimization problems,” Mathematical Programming Study 6 (1976), 170–187.

[25] R.T. Rockafellar and R.J-B Wets, “Scenarios and policy aggregation in optimization under uncertainty,” Mathematics of Operations Research 16 (1991), 119–147.

[26] R.T. Rockafellar and R.J-B Wets, Variational Analysis, No. 317 in the series Grundlehren der Mathematischen Wissenschaften, Springer-Verlag, 1997.

[27] A. Ruszczynski and A. Shapiro, Stochastic Programming, Handbooks in Operations Research and Management Science, Elsevier, 2003.

[28] H. Xu, “Sample average approximation methods for a class of stochastic variational inequality problems,” Asia-Pacific Journal of Operational Research 27 (2010), 103–109.

[29] C. Zhang and X. Chen, “Smoothing projected gradient method and its application to stochastic linear complementarity problems,” SIAM Journal on Optimization 20 (2009), 627–649.

[30] C. Zhang, X. Chen and A. Sumalee, “Robust Wardrop’s user equilibrium assignment under stochastic demand and supply: expected residual minimization approach,” Transportation Research Part B 45 (2011), 534–552.
