Hamilton-Jacobi-Bellman Equations


Analysis and Numerical Analysis

Iain Smears


My deepest thanks go to my supervisor Dr. Max Jensen for his guidance and support during this project and my time at Durham University.

Abstract. This work treats Hamilton-Jacobi-Bellman equations. Their relation to several problems in mathematics is presented and an introduction to viscosity solutions is given. The work of several research articles is reviewed, including the Barles-Souganidis convergence argument and the inaugural papers on mean-field games.

Original research on numerical methods for Hamilton-Jacobi-Bellman equations is presented: a novel finite element method is proposed and analysed; several new results on the solubility and solution algorithms of discretised Hamilton-Jacobi-Bellman equations are demonstrated and new results on envelopes are presented.


This piece of work is a result of my own work except where it forms an assessment based on group project work. In the case of a group project, the work has been prepared in collaboration with other members of the group. Material from the work of others not involved in the project has been acknowledged and quotations and paraphrases suitably indicated.


Contents

Introduction

Chapter 1. Optimal Control and the Hamilton-Jacobi-Bellman Equation
1. Introduction
2. Optimal control
3. The Hamilton-Jacobi-Bellman equation
4. Counter-examples to the Hamilton-Jacobi-Bellman equation in the classical sense

Chapter 2. Connections to Monge-Ampere Equations and Mean-Field Games
1. Introduction
2. Monge-Ampere equations
3. Mean-field games

Chapter 3. Viscosity Solutions
1. Introduction
2. Elliptic and parabolic operators
3. Viscosity solutions of parabolic equations
4. Viscosity solutions of Hamilton-Jacobi-Bellman equations

Chapter 4. Discrete Hamilton-Jacobi-Bellman Equations
1. Introduction
2. Solubility of discrete Hamilton-Jacobi-Bellman equations
3. Semi-smooth Newton methods

Chapter 5. Envelopes
1. Introduction
2. Basics of envelopes
3. Further results on envelopes

Chapter 6. Monotone Finite Difference Methods
1. Introduction
2. The Kushner-Dupuis scheme
3. The Barles-Souganidis convergence argument
4. Convergence rates for the unbounded domain problem
5. Numerical experiment

Chapter 7. Finite Element Methods
1. Introduction
2. Basics of finite element methods
3. Hamilton-Jacobi-Bellman equations
4. The method of artificial diffusion
5. Numerical scheme
6. Supporting results
7. Elliptic problem: proof of main results
8. Further supporting results
9. Parabolic problem: proof of main results

Conclusion

Appendix A. Stochastic Differential Equations
1. Basics
2. Stochastic differential equations
3. The strong Markov property, generators and Dynkin's formula

Appendix B. Matrix Analysis
1. Field of values
2. M-matrices

Appendix C. Estimates for Finite Element Methods
1. Estimates for finite element methods

Appendix D. Matlab Code for the Kushner-Dupuis Method

Bibliography


Introduction

Contents. This work treats the subject of the Hamilton-Jacobi-Bellman (HJB) equation, which is a fully non-linear second order partial differential equation. The main topics covered are the origins of the HJB equation and some other equations to which it is related, the relevant notion of generalised solution for a HJB equation, and especially some of the numerical methods that may be used to solve it.

The chapters of this report treat these topics in this order. Chapter 1 introduces optimal control problems and shows how they relate to the HJB equation. Chapter 2 then presents further applications of the HJB equation, by showing how some Monge-Ampere equations are equivalent to certain HJB equations, and by showing the role of the HJB equation in mean-field games.

Chapter 3 is an introduction to the theory of viscosity solutions. Its primary aim is to indicate how viscosity theory leads to well-posedness of HJB equations. The secondary objective is to provide several results on viscosity solutions that are needed for the chapters on numerical methods.

Before considering specific numerical methods for HJB equations, chapter 4 details several new results which aim to answer, in a general setting, the questions of when and how it is possible to solve the discrete system of equations typically encountered in the application of a numerical method.

Chapter 5, also consisting for the most part of original work, introduces some analytical tools, called envelopes, that are necessary for the numerical analysis of HJB equations. The approach taken is novel in its generality and abstractness. Several new results concerning envelopes are presented, alongside a newly suggested approach towards finding non-monotone numerical methods for viscosity solutions.

Having thus established the necessary theory for analysing numerical methods in the previous chapters, chapter 6 treats the proof of convergence to the viscosity solution of monotone finite difference methods, illustrating the methods of Barles and Souganidis in [5]. In addition, recent progress in [4] on finding convergence rates for these methods is reviewed, and the results of an originally conducted numerical experiment are presented, which serve to illustrate the fruit of the work of chapters 3, 4, 5 and 6.

Chapter 7 reports the principal achievement of this project. It presents research done in collaboration with Dr. Max Jensen on finite element methods for HJB equations. Novel finite element methods are proposed for both elliptic and parabolic HJB equations and their convergence is analysed. The usual Barles-Souganidis convergence argument is not directly transposable to finite element methods; thus the heart of the proof features the use of several a-priori estimates for elliptic and parabolic problems as auxiliary tools to analyse the nonlinear HJB equation.

Background material. The above gives a very brief summary of the contents of these chapters. However, this says nothing of the significant amount of theory from other areas of mathematical analysis that is used to treat these topics. In fact, to mention just a few examples, use is made of stochastic differential equations and the theory of diffusion processes (chapters 1 and 2); Sobolev spaces and a-priori estimates for finite element methods (chapters 2 and 7); approximation by smooth functions and mollification (chapters 3 and 7); slant-differentiable functions and semi-smooth Newton methods (chapter 4); and matrix analysis, properties of positive semi-definite matrices, the field of values and M-matrices (chapters 2, 4, 6, 7).

For many of these related topics, the required material is either provided as part of the discussion or in the form of an appendix. A notable exception is the subject of Sobolev spaces, weak forms of PDE and finite element methods. A justification is that it would clearly be beyond the scope of this work to give a complete treatment of finite element methods for linear PDE in addition to the


analysis for HJB equations. Thus it would not make sense to assume of our reader familiarity with finite element methods, yet not with Sobolev spaces. Nevertheless, we indicate that the reader may find introductions to these topics in any of [1], [8], [13], [15] or [24].

Notation. For the most part, the notation of [15] is adopted throughout this work. This choice was made for two reasons: [15] is a well-known source, and the notation used within it is clearly referenced in an appendix. When necessary, some further elements of notation from other sources are adopted. Typically, the notation of the source most relevant to the material being discussed is the one which is adopted. This was done for the reader's convenience in consulting the sources.

In matters of terminology, in this work preference is given to using an affirmative voice. Thus, we say x is positive if x ≥ 0, rather than non-negative; x is strictly positive if x > 0, etc. We also follow the common practice of using the letter C to denote all constants in various estimates and inequalities. In particular, it is often the case that the constants appearing in two consecutive inequalities need not be the same, yet both are denoted by C.


CHAPTER 1

Optimal Control and the Hamilton-Jacobi-Bellman Equation

1. Introduction

This chapter introduces the Hamilton-Jacobi-Bellman (HJB) equation and shows how it arises from optimal control problems. First of all, optimal control problems are presented in section 2, then the HJB equation is derived under strong assumptions in section 3. The importance of this equation is that it can be used as part of a strategy for solving the optimal control problem.

To introduce optimal control problems, we begin by considering some process evolving in time, called the state, denoted x(·), with the dynamics given by a stochastic differential equation (SDE). In some applications, the process is controllable to some extent by a person who wishes to optimise a performance indicator that depends on the process¹.

The person does so by choosing a function α(·), called a control, that takes values in a metric space Λ and influences the dynamics of x(·) through the SDE. Different α(·) lead to different states at time t, and hence different values for the performance indicator. The optimal control problem is then to answer the following questions.

(1) Which controls α(·) optimise the performance indicator?
(2) What value does the performance indicator take for these best controls?

The strategy for solving the optimal control problem that is considered here goes under the name of dynamic programming. However, there are other methods, such as those based on the Pontryagin maximum principle, which we do not present here, but refer the reader to [14] for an introduction. The method of dynamic programming consists of answering question 2 first, then using this answer to construct an answer for question 1. This method is briefly explained at the end of section 3.

The main focus of this work is on methods for obtaining the answer to question 2, which involves solving a nonlinear PDE. This PDE is called the Hamilton-Jacobi-Bellman (HJB) equation and we will give a first derivation of it in section 3.

There are many types of different optimal control problems for modelling different situations. The finite horizon bounded domain optimal control problem is the primary problem that will be considered here. In this problem, the state is controlled only in a bounded subset of Rn and until a final time T. The reason for this choice is that the resulting HJB equation is a parabolic PDE with Dirichlet boundary data and lends itself naturally to the numerical methods to be discussed in later chapters.

Our primary sources for this chapter are [16], which is used for the precise details on the mathematical framework of optimal control problems, and [23], for some theory on stochastic differential equations.

2. Optimal control

2.1. Basic definitions. The reader may find the appendix on stochastic differential equations, appendix A, to be helpful for this section. The assumptions on the problem made in this section are maintained throughout this work.

¹ In some instances, optimal control problems can be given unusual and very interesting interpretations; see for instance [2] for an application in mathematical finance to pricing derivatives for worst case scenarios of the volatility.


Let T > 0, let U ⊂ Rn be an open bounded set, and let Λ be a compact metric space. Let us call O = U × (0, T) and ∂O the parabolic boundary of O: ∂O = (∂U × (0, T)) ∪ (U × {T}).

Let (Ω, F, P) be a probability space and {W(s)}_{0≤s≤T} be a d-dimensional Brownian motion as described in appendix A. Let A be a subset of the set of all progressively measurable stochastic processes α : [0, T] × Ω → Λ. For the definition of progressively measurable processes, see definition 1.1 of appendix A.

An element α(·) ∈ A is called a control. The choice of the control set A will depend on the situation that is being modelled. A few examples include general stochastic processes adapted to the σ-algebra generated by the state, or functions of the current state, or even just deterministic processes.

Let b : Rn × [0, T] × Λ → Rn and σ : Rn × [0, T] × Λ → Rn×d be functions satisfying the following conditions. Firstly, assume that b and σ are continuous and that, for every α ∈ Λ, b(·, ·, α) and σ(·, ·, α) are in C¹(Rn × [0, T]).

Secondly, we assume that there exists C ≥ 0 such that for all α ∈ Λ, x, y ∈ Rn and t, s ∈ [0, T],

|b(x, t, α) − b(y, s, α)| ≤ C (|x − y| + |t − s|); (2.1a)
|σ(x, t, α) − σ(y, s, α)| ≤ C (|x − y| + |t − s|); (2.1b)

and

|b(x, t, α)| ≤ C(1 + |x|); (2.1c)
|σ(x, t, α)| ≤ C(1 + |x|); (2.1d)

where |σ| denotes the Euclidean norm of a matrix σ ∈ Rn×d, |σ|² = Σ_{ij} |σ_{ij}|².

2.2. State dynamics. For every α(·) ∈ A and x ∈ U, t ∈ [0, T], let the state {x^{α(·)}(s)}_{s∈[t,T]} be an Ito process, solution of the state dynamics SDE

dx^{α(·)}(s) = b(x^{α(·)}(s), s, α(s)) ds + σ(x^{α(·)}(s), s, α(s)) dW(s), s ∈ (t, T]; (2.2a)
x^{α(·)}(t) = x. (2.2b)

The assumptions on b and σ were made (in part) to guarantee that the SDE does in fact have a unique strong solution with continuous paths for any choice of α(·), x and t. For more details on this, see [16, p. 403].

Remark 2.1. Usually, the notation omits the dependence of x^{α(·)}(·) on α(·) and we write the state simply as x(·). The reason why we allow the state to start at an arbitrary t ∈ [0, T] will become apparent later.

Example 2.2 (Markov control sets). Some applications might require that α(s) = α(s, x(s)), i.e. the control depends only on the present state. In this case, the SDE of (2.2) becomes

dx(s) = b(x(s), s, α(s, x(s))) ds + σ(x(s), s, α(s, x(s))) dW(s);

which means that for any such α(·), {x(s)}_{s∈[t,T]} is a diffusion process, with non-stochastic drift b and volatility σ. A control of this form is called a Markov control². In such a case, with the above assumptions, for any Markov control α(·), the theorem of existence and uniqueness for SDEs, theorem 2.1 of appendix A, shows that there exists a unique solution to (2.2).

² In passing, we signal that the reader may find [23, theorem 11.2.3 p. 244] interesting, in order to learn about the effect of different choices of control sets.


2.3. Cost functional. The cost functional is the performance indicator mentioned in the introduction. There are many different optimal control problems, and the principal differences between them arise at this stage. We only consider finite horizon problems, where the state is controlled for times in [0, T].

Two main categories of finite horizon problems are the bounded domain and unbounded domain optimal control problems. This work focuses principally on the bounded domain problem because, generally speaking, up to a few changes, most results that hold for the bounded domain problem have analogues for the unbounded domain problem.

Bounded domain problem. In the bounded domain problem, the process is started in U and is stopped if x(·) exits U.

For given α(·), x ∈ U, t ∈ [0, T), let τ be the time of first exit of (x(s), s) from O:

τ = inf{ s > t | (x(s), s) ∉ O }. (2.3)

We note that τ is a stopping time (see definition 3.1 of appendix A) and is measurable with respect to F; τ < T if x(·) exits U before time T, and τ = T implies x(s) ∈ U for all s ∈ [t, T]. Let f : O̅ × Λ → R and g : O̅ → R be continuous functions, such that there exists C ≥ 0 such that for all (x, t), (y, s) ∈ O̅ and α ∈ Λ,

|f(x, t, α)− f(y, s, α)| ≤ C (|x− y|+ |t− s|) ; (2.4a)

|f(x, t, α)| ≤ C (1 + |x|) ; (2.4b)

|g(x, t)| ≤ C (1 + |x|) . (2.4c)

The function f is the running cost of the process, the function g is the exit cost.

Definition 2.3. The cost functional J : Rn × [0, T] × A → R is defined by

J(x, t, α(·)) = E_{x,t}[ ∫_t^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ]. (2.5)

The notation E_{x,t} means expectation with respect to the measure induced by {x^{α(·)}(s)}_{s∈[t,T]} started at x.

For given x₀ ∈ U, question 1, as stated in the introduction of this chapter, is to find an optimal control α*(·) ∈ A such that

J(x₀, 0, α*(·)) = min_{α(·)∈A} J(x₀, 0, α(·)). (2.6)

At this stage, it is not clear whether the minimum is attained, i.e. whether an optimal control even exists. For certain problems, it can be shown that optimal controls exist, and we refer the reader to [14] for a proof of existence of an optimal control in cases where the state dynamics are deterministic and the resulting ODE is affine.

Unbounded domain problem. In the unbounded domain problem, the process is allowed to evolve until time T regardless of its path, and the cost functional is defined simply as

J(x, t, α(·)) = E_{x,t}[ ∫_t^T f(x(s), s, α(s)) ds + g(x(T), T) ].

However, for the remainder of this work, we will take the cost functional to be defined by equation (2.5).
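The cost functional (2.5) can be approximated directly by Monte Carlo simulation once a control has been fixed. The following sketch is illustrative only and is not part of the thesis: the data b, σ, f, g and the Markov control α below are hypothetical choices. The state dynamics (2.2) are discretised by the Euler-Maruyama method, the integral in (2.5) is accumulated along each path, the path is stopped at the first exit from U, and the sample mean over the paths approximates J(x, t, α(·)).

```python
# Illustrative sketch only: Monte Carlo approximation of the cost functional (2.5)
# for the bounded domain problem on U = (-1, 1) with final time T = 1.
# The data b, sigma, f, g and the Markov control alpha below are hypothetical choices.
import numpy as np

def estimate_J(x0, t0, alpha, b, sigma, f, g, T=1.0, U=(-1.0, 1.0),
               n_steps=500, n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    dt = (T - t0) / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, s, cost = x0, t0, 0.0
        for _ in range(n_steps):
            a = alpha(s, x)
            cost += f(x, s, a) * dt                      # running cost between t and tau
            x += b(x, s, a) * dt + sigma(x, s, a) * np.sqrt(dt) * rng.standard_normal()
            s += dt
            if not (U[0] < x < U[1]):                    # first exit from U: stop the path
                break
        x_exit = min(max(x, U[0]), U[1])                 # project the overshoot onto the boundary
        cost += g(x_exit, s)                             # exit cost g(x(tau), tau)
        total += cost
    return total / n_paths

# Hypothetical data: steer towards the nearest boundary, unit running cost, zero exit cost,
# so J is (approximately) the expected time spent in U before exit or the final time.
b = lambda x, s, a: a
sigma = lambda x, s, a: 0.1
alpha = lambda s, x: 1.0 if x >= 0 else -1.0
f = lambda x, s, a: 1.0
g = lambda x, s: 0.0

print(estimate_J(x0=0.3, t0=0.0, alpha=alpha, b=b, sigma=sigma, f=f, g=g))
```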

3. The Hamilton-Jacobi-Bellman equation

3.1. The Hamilton-Jacobi-Bellman equation. As mentioned in the introduction, we begin our study of the problem by attempting to answer question 2 first. For starting data x and t, the best bound on the performance is termed the value function, and it is this function that will satisfy, in some sense, the Hamilton-Jacobi-Bellman equation. To introduce the HJB equation, we derive it under certain hypotheses.


Definition 3.1. The value function u : O̅ → R is defined by

u(x, t) = inf_{α(·)∈A} J(x, t, α(·)). (3.1)

For fixed α ∈ Λ and a diffusion process of the form

dx(s) = b(x(s), s, α) ds + σ(x(s), s, α) dW(s),

if we define a(x, t, α) = ½ σ(x, t, α)σ(x, t, α)^T, we call the generator ∂_t − L^α of the diffusion process the differential operator given by

(v_t − L^α v)(x, t) = v_t(x, t) + Tr[ a(x, t, α) D²_x v(x, t) ] + b(x, t, α) · D_x v(x, t), v ∈ C²(O). (3.2)

Some results on the generators of diffusion processes are included in section 3 of appendix A.

Remark 3.2. The first derivation of the HJB equation makes several strong assumptions, and we simplify the problem by considering only Markov controls. The reader should not be too concerned that these assumptions are not verified beforehand or are not the weakest possible; the reason is simply that often these assumptions do not hold. The purpose here is merely to introduce the HJB equation.

Theorem 3.3 (Hamilton-Jacobi-Bellman equation: classical sense). [23, p. 240]. Let A consist of all controls of the form α(s) = α(s, x(s)), so that x is a Markov diffusion process. Assume that u ∈ C²(O) ∩ C(O̅) and that for any (x, t) ∈ O, α ∈ Λ and any stopping time t̂ with t ≤ t̂ ≤ τ,

E_{x,t} |u(x(t̂), t̂)| + E_{x,t} | ∫_t^{t̂} (u_t − L^α u)(x(s), s) ds | < ∞. (3.3)

Assume there exists an optimal control α*(·) such that for any (x, t) ∈ O,

u(x, t) = J(x, t, α*(·)),

and that for any (x, t) ∈ ∂O, P(τ = t) = 1.

Then the value function u satisfies the HJB equation

min_{α∈Λ} [ u_t(x, t) − L^α u(x, t) + f(x, t, α) ] = 0 on O; (3.4a)
u = g on ∂O. (3.4b)

It is convenient to rewrite the HJB equation in an alternative manner, for reasons which will become apparent in the following chapters. Define the HJB operator in its pointwise sense by

H(x, t, D_x u(x, t), D²_x u(x, t)) = max_{α∈Λ} [ L^α u(x, t) − f(x, t, α) ]; (3.5)

where the maximum is achieved in view of the compactness of Λ and the continuity of α ↦ L^α u(x, t) and α ↦ f(x, t, α). Then the value function solves

−u_t(x, t) + H(x, t, D_x u(x, t), D²_x u(x, t)) = 0 on O.
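As a concrete illustration of how the pointwise maximisation in (3.5) is carried out (anticipating example 4.1 in section 4), consider a one-dimensional deterministic problem with Λ = [−1, 1], b(x, t, α) = α, σ = 0 (so that a = 0) and running cost f ≡ 1. Then L^α u = −α u_x, and

H(x, t, D_x u, D²_x u) = max_{α∈[−1,1]} [ −α u_x(x, t) − 1 ] = |u_x(x, t)| − 1,

so that −u_t + H = 0 becomes −u_t + |u_x| − 1 = 0, which is exactly equation (4.1a) below.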

Proof. The proof consists of two steps. The first is to show that

min_{α∈Λ} ( u_t(x, t) − L^α u(x, t) + f(x, t, α) ) ≥ 0 on O;

the second is to find, for every (x, t) ∈ O, an α ∈ Λ such that u_t(x, t) − L^α u(x, t) + f(x, t, α) = 0.

The first step is simple enough to be given in detail, and we follow [23] whilst providing a more detailed explanation of how the dynamic programming principle is obtained; see equation (3.10) below. However, the second step involves further results from stochastic analysis, so we refer the reader to [23] for the details.

First consider (x, t) ∈ ∂O. By the hypothesis that P(τ = t) = 1, for any control α(·),

J(x, t, α(·)) = g(x, t). (3.6)

Therefore u(x, t) = g(x, t) if (x, t) ∈ ∂O.


Now let (x, t) ∈ O be fixed. For any stopping time t̂ with t ≤ t̂ ≤ τ and any control,

J(x, t, α(·)) = E_{x,t}[ ∫_t^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ]
= E_{x,t}[ ∫_t^{t̂} f(x(s), s, α(s)) ds + ∫_{t̂}^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ]
= E_{x,t}[ ∫_t^{t̂} f(x(s), s, α(s)) ds ] + E_{x,t}[ ∫_{t̂}^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ].

By iterated conditioning and then by the strong Markov property of diffusion processes, theorem 3.2 of appendix A,

E_{x,t}[ ∫_{t̂}^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ] = E_{x,t}[ E_{x,t}[ ∫_{t̂}^τ f(x(s), s, α(s)) ds + g(x(τ), τ) | F_{t̂} ] ]
= E_{x,t}[ E_{x(t̂),t̂}[ ∫_{t̂}^τ f(x(s), s, α(s)) ds + g(x(τ), τ) ] ]
= E_{x,t}[ J(x(t̂), t̂, α(·)) ].

Thus we have, for any stopping time t ≤ t̂ ≤ τ and any control,

J(x, t, α(·)) = E_{x,t}[ ∫_t^{t̂} f(x(s), s, α(s)) ds ] + E_{x,t}[ J(x(t̂), t̂, α(·)) ]. (3.7)

Now let t̄ ∈ (t, T) and, for each control, set

t̂ = inf{ s ∈ [t, t̄] | (x(s), s) ∉ U × [t, t̄) }. (3.8)

From this definition, we know that t̂ ≤ τ, and if x(·) exits U before time t̄, then t̂ = τ. Let α ∈ Λ be arbitrary. By hypothesis, there exists an optimal control α*(·) such that

u(x^α(t̂), t̂) = J(x^α(t̂), t̂, α*(·)).

This means, informally, that if we evolve the process up to time t̂ under the constant control α, then we may switch to an optimal control. So define

α̂(y, s) = α if y ∈ U and s < t̄, and α̂(y, s) = α*(y, s) otherwise.

From now on, set x(s) = x^{α̂(·)}(s). If x(·) exits U before time t̄, then t̂ = τ and, similarly to (3.6), u(x(t̂), t̂) = J(x(t̂), t̂, α*(·)). If x does not exit U before time t̄, then for any s ≥ t̄ we have α̂(x(s), s) = α*(x(s), s), which implies that u(x(t̂), t̂) = J(x(t̂), t̂, α*(·)).

These considerations show that, in general, α̂(x(s), s) = α when s < t̂ and α̂(x(s), s) = α*(x(s), s) otherwise, and furthermore

u(x(t̂), t̂) = J(x(t̂), t̂, α*(·)). (3.9)

Therefore, by (3.7) and from the definition of u, we have

u(x, t) ≤ E_{x,t}[ ∫_t^{t̂} f(x(s), s, α̂(x(s), s)) ds ] + E_{x,t}[ u(x(t̂), t̂) ]. (3.10)

In passing, this last equation is part of the dynamic programming principle and plays a major role in control theory.


Since u ∈ C²(O), by Dynkin's formula, theorem 3.4 of appendix A, and using the fact that α̂(x(s), s) = α for any s < t̂,

E_{x,t}[ u(x(t̂), t̂) ] = u(x, t) + E_{x,t}[ ∫_t^{t̂} (u_t − L^α u)(x(s), s) ds ]. (3.11)

The boundedness assumption (3.3), together with u ∈ C²(O) ∩ C(O̅), implies that the quantities in (3.11) are finite. So we substitute this into (3.10) to obtain

E_{x,t}[ ∫_t^{t̂} f(x(s), s, α) + u_t(x(s), s) − L^α u(x(s), s) ds ] ≥ 0. (3.12)

Dividing (3.12) by E_{x,t}[ t̂ − t ] > 0 and letting t̄ → t, which implies that E_{x,t}[ t̂ − t ] → 0, we obtain from the continuity of f, a and b that

u_t(x, t) − L^α u(x, t) + f(x, t, α) ≥ 0.

To see that for every (x, t) ∈ O there is α*(x, t) ∈ Λ such that

u_t(x, t) − L^{α*(x,t)} u(x, t) + f(x, t, α*(x, t)) = 0, (3.13)

the reader may refer to [23, theorem 9.3.3 p. 195]. However, briefly said, equality is achieved from the assumption that an optimal control exists, yielding

u(x, t) = E_{x,t}[ ∫_t^τ f(x(s), s, α*(x(s), s)) ds + g(x(τ), τ) ].

This equation allows the use of theory relating stochastic processes to boundary value problems to obtain the desired result.

In conclusion,

min_{α∈Λ} [ u_t(x, t) − L^α u(x, t) + f(x, t, α) ] = 0 on O,

which completes the proof.

3.2. Dynamic programming. This section explains how the HJB equation can be used to solve the optimal control problem. Assume that the assumptions of theorem 3.3 hold and that the set of controls is the set of Markov controls.

The first step of the dynamic programming method is to solve the HJB equation (3.4) to obtain u. Equation (3.13) from the proof of theorem 3.3 shows that if α*(·) is an optimal control then

u_t(x, t) − L^{α*(x,t)} u(x, t) + f(x, t, α*(x, t)) = 0.

If u has been found, this equation can be used to define a function α(x, t) that solves this equation on O; this is the second step of the method. This function then defines an optimal control α*(·) = α(x(·), ·)³. To see this, note that under this control the state x(·) is a Markov diffusion, solution of

dx(s) = b(x(s), s, α(x(s), s)) ds + σ(x(s), s, α(x(s), s)) dW(s).

Since u ∈ C²(O), by Dynkin's formula, theorem 3.4 of appendix A,

E_{x,t}[ u(x(τ), τ) ] = u(x, t) + E_{x,t}[ ∫_t^τ u_t(x(s), s) − L^{α(x(s),s)} u(x(s), s) ds ].

From the assumptions of theorem 3.3, u(x(τ), τ) = g(x(τ), τ) with probability 1. By the definition of α, after rearranging we have

u(x, t) = E_{x,t}[ ∫_t^τ f(x(s), s, α(x(s), s)) ds + g(x(τ), τ) ] = J(x, t, α*(·)).

³ Issues of regularity of α(x, t) are not mentioned here, but are important for justifying these arguments.


Therefore α* is an optimal control. This outline of an argument is the basis of theorems known as verification theorems, which prove that this strategy does indeed yield an optimal control.

The dynamic programming method therefore consists of solving a functional optimisation problem by first solving a nonlinear PDE and then solving a set of algebraic optimisation problems.
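To make the two steps concrete, the following sketch (illustrative only, not the thesis' method; all numerical choices are hypothetical) carries out dynamic programming for a discrete-time, discrete-state analogue of the control problem: step 1 computes the value function by backward induction on the Bellman recursion, and step 2 records the minimising control, i.e. a Markov control, at each grid point.

```python
# Illustrative sketch: backward induction (step 1) and extraction of a Markov
# control as the argmin (step 2) for a discrete-time, discrete-state problem.
import numpy as np

n_x, n_t = 21, 10                       # grid of states and time steps
states = np.linspace(-1.0, 1.0, n_x)
controls = np.array([-1, 0, 1])         # move left, stay, move right (one grid cell)
running_cost = 0.1                      # cost per time step
exit_cost = np.abs(states)              # illustrative terminal cost g

V = np.empty((n_t + 1, n_x))
policy = np.zeros((n_t, n_x), dtype=int)
V[n_t] = exit_cost                      # value at the final time

for k in range(n_t - 1, -1, -1):        # backward induction in time
    for i in range(n_x):
        best, best_a = np.inf, 0
        for a in controls:
            j = min(max(i + a, 0), n_x - 1)          # next state (clipped at the walls)
            cand = running_cost + V[k + 1, j]        # Bellman: cost now + value later
            if cand < best:
                best, best_a = cand, a
        V[k, i] = best
        policy[k, i] = best_a            # optimal Markov control alpha*(t_k, x_i)

print(V[0, n_x // 2], policy[0, n_x // 2])
```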

4. Counter-examples to the Hamilton-Jacobi-Bellman equation in the classical sense

Unfortunately, the conditions of theorem 3.3 do not always hold, even for simple examples, and very often the value function lacks the smoothness to solve the HJB equation in the sense of equation (3.4).

This section gives very simple examples which illustrate this situation. We present examples inspired by [16, chapter 2], yet we examine a new sequence of a.e. differentiable solutions to a particular HJB equation and provide further discussion of the issue with boundary values.

4.1. Nonsmooth value functions. Often the value function is not smooth enough to satisfy the assumptions of theorem 3.3. Even though the value function might satisfy the HJB equation in the pointwise almost everywhere sense, it might not be the only such function.

Example 4.1. Let Λ = [−1, 1], U = (−1, 1), T = 1, and consider the one-dimensional optimal control problem with state dynamics

ẋ(s) = α(s), s ∈ [t, 1],

and cost functional

J(x, t, α(·)) = ∫_t^τ 1 ds,

where τ is the time of first exit from U. It is clear that, to minimise the cost functional, an optimal control must make the state reach the boundary as rapidly as possible.

If |x| ≥ t then the boundary can be reached by the final time T = 1, otherwise it cannot. Therefore an optimal control is α(·) ≡ sign(x), and the value function is

u(x, t) = 1 − |x| if |x| ≥ t,
u(x, t) = 1 − t if |x| < t.

The HJB equation of theorem 3.3 simplifies to

−u_t + |u_x| − 1 = 0 on (−1, 1) × (0, 1); (4.1a)
u = 0 on ({−1, 1} × (0, 1)) ∪ ((−1, 1) × {1}). (4.1b)

Clearly the value function is not differentiable along |x| = t and thus does not satisfy the PDE on this set. It does, however, satisfy the HJB equation almost everywhere in O. But it is not the only function to do so.

The following is a novel example of other functions which satisfy this PDE almost everywhere.

Let g(x) = max(x, 0) − max(x − 1, 0). Let k ∈ N and set

h_k(x) = (1/2^k) Σ_{j=1}^{2^{k−1}} g(2^{k+1} x − 4j).

Note that h_k' exists a.e. and is either 2 or 0, so

d/dx [ h_k(1 − |x|) ] = 2 or 0 for x < 0, and −2 or 0 for x ≥ 0, a.e.,

and furthermore h_k(0) = 0 and h_k(1) = 1/2 − 1/2^k. Define

w_k(x, t) = min( 1 − |x| − h_k(1 − |x|), 1 − t ). (4.2)

The function w_k satisfies the boundary conditions and the HJB equation almost everywhere.

Since h_k ≥ 0 and 1 − |x| − h_k(1 − |x|) ≤ 1, we have w_k(x, 0) = 1 − |x| − h_k(1 − |x|). The case k = 1 gives w_1(x, 0) = 1 − |x|, but for k ≥ 2 the functions all differ.


Figure 1. Graph of w₅(x, 0) as defined by equation (4.2). The derivative exists almost everywhere, with values 1 and −1, thus satisfying the HJB equation (4.1) almost everywhere.

There are therefore infinitely many pointwise a.e. solutions to the HJB equation. A different sequence of almost everywhere solutions can be found in [16, p. 61].
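A quick numerical check of this construction is possible. The sketch below is illustrative only (the grid, tolerance and sample sizes are hypothetical choices): it evaluates w_k from (4.2) with finite differences at randomly sampled interior points and verifies that the residual of −w_t + |w_x| − 1 = 0 is negligible away from the (measure-zero) kinks, together with the boundary values.

```python
# Illustrative check of (4.2): the functions w_k satisfy -w_t + |w_x| - 1 = 0 at
# almost every point, and vanish on the lateral and terminal boundary of the domain.
import numpy as np

def g(x):
    return np.maximum(x, 0.0) - np.maximum(x - 1.0, 0.0)

def h(x, k):
    x = np.asarray(x, dtype=float)
    j = np.arange(1, 2 ** (k - 1) + 1)
    return np.sum(g(2 ** (k + 1) * x[..., None] - 4 * j), axis=-1) / 2 ** k

def w(x, t, k):
    return np.minimum(1.0 - np.abs(x) - h(1.0 - np.abs(x), k), 1.0 - t)

k, eps = 5, 1e-6
rng = np.random.default_rng(0)
x = rng.uniform(-0.99, 0.99, size=1000)
t = rng.uniform(0.01, 0.99, size=1000)
wt = (w(x, t + eps, k) - w(x, t - eps, k)) / (2 * eps)   # central difference in t
wx = (w(x + eps, t, k) - w(x - eps, t, k)) / (2 * eps)   # central difference in x
print(np.median(np.abs(-wt + np.abs(wx) - 1.0)))         # ~0 at almost every sampled point
print(w(np.array([-1.0, 1.0]), 0.5, k), np.max(np.abs(w(x, 1.0, k))))  # boundary values ~0
```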

4.2. Boundary data and continuity up to the boundary. Not only is it possible for the value function to fail to satisfy the HJB equation on the interior of the domain, but it may also fail to agree with the boundary data in equation (3.4). This happens when it is optimal for the problem to avoid parts of the boundary.

Example 4.2. [16, p. 61]. Let U = (−1, 1), O = U × (0, T), let the state dynamics be

ẋ(s) = α(s), s ∈ (t, T],

and define the cost functional

J(x, t, α(·)) = g(x(τ), τ),

with g(x, t) = x. The HJB equation is

−u_t + |u_x| = 0. (4.3)

It is clear that it is optimal to steer the state towards x = −1, so the optimal control is α* ≡ −1. If x(T) = x − (T − t) < −1, then exit is achieved before time T and g(x(τ), τ) = −1. If x − (T − t) ≥ −1, then exit is not achieved and g(x(τ), τ) = x(T) = x − (T − t). Therefore the value function is

u(x, t) = −1 if x − (T − t) < −1,
u(x, t) = x − (T − t) if x − (T − t) ≥ −1.

So for any t < T, u(1, t) < 1, i.e. u(1, t) < g(1, t), and the boundary condition is not satisfied. At first, this situation might seem to be resolved if we define the stopping time to be the time of first exit from O rather than from O̅. Indeed, doing so would guarantee u = g on ∂O. But then another problem has been introduced, namely that for any t < T, u(x, t) is discontinuous at x = 1, so u ∉ C(O̅).

The conclusion is that the conditions u = g on ∂O and u ∈ C(O̅) are not always compatible.

4.3. Sufficient conditions for smoothness and uniform continuity. Examples 4.1 and 4.2 involved simple and reasonable optimal control problems. Therefore it should be expected that significant further assumptions would be needed to ensure smoothness and uniform continuity of the value function. This short paragraph explains which features of an optimal control problem determine these properties, by quoting two theorems which can be found in [16].

Theorem 4.3 (Krylov). [16, p. 162]. If the following hold:

• The set Λ is compact;
• U is bounded and ∂U is of class⁴ C³;


• The functions a, b and f, together with their t-partial derivatives and their first and second x-partial derivatives, are in C(O̅ × Λ);
• g ∈ C³([0, T] × Rn);

and furthermore, there exists γ > 0 such that for every (x, t) ∈ O and α ∈ Λ, a(x, t, α) is such that

Σ_{i,j=1}^n a_{ij}(x, t, α) ξ_i ξ_j ≥ γ |ξ|² for all ξ ∈ Rn. (4.4)

Then the HJB equation (3.4) has a unique classical solution w ∈ C(O̅) with continuous t-partial derivative and continuous first and second x-partial derivatives.

⁴ [15, p. 626]. The boundary ∂U ⊂ Rn is of class C^k if for every point x₀ ∈ ∂U there is d > 0 and a C^k map Φ : R^{n−1} → R such that, after possibly permuting the axes,
U ∩ B(x₀, d) = { x ∈ B(x₀, d) | x_n > Φ(x₁, . . . , x_{n−1}) }.
Roughly speaking, the boundary is locally the graph of a C^k function.

If the operators L^α satisfy (4.4), then the L^α are called uniformly elliptic. Uniform ellipticity plays an important role in the theory of linear PDE and will be studied more closely in the next chapter. Example 4.1 did not satisfy this uniform ellipticity assumption because the HJB equation reduced to a first order equation.

Deterministic optimal control problems do not satisfy this condition of uniform ellipticity. However, for those optimal control problems that do satisfy the above conditions, this theorem is not sufficient to claim that the value function is the smooth solution, because of the problem of the boundary value of the value function.

Denote by ρ(x) the signed distance from x to ∂U, defined by

ρ(x) = −inf_{y∈∂U} |x − y| if x ∈ U, and ρ(x) = inf_{y∈∂U} |x − y| if x ∉ U.

Theorem 4.4. [16, p. 205]. Under the assumptions of section 2, if the following further assumptions hold, then the value function u ∈ C(O̅).

• Assume that ∂U is smooth, i.e. of class C∞, and that there exists a smooth α : [0, T] × Rn → Λ such that for every (x, t) ∈ ∂U × [0, T],

L^{α(x,t)} ρ(x) = −Tr[ a(x, t, α(x, t)) D²_x ρ(x) ] − b(x, t, α(x, t)) · D_x ρ(x) < 0. (4.5)

• The lateral boundary data g|_{∂U×[0,T)} can be extended to a function ḡ ∈ C³_b([0, T] × Rn) such that

−ḡ_t(x, t) + H(x, t, D_x ḡ(x, t), D²_x ḡ(x, t)) ≤ 0 for all (x, t) ∈ [0, T] × Rn; (4.6a)
ḡ(x, T) ≤ g(x, T) for all x ∈ Rn; (4.6b)

where it is recalled that H was defined by equation (3.5).

Condition (4.5) concerns the possibility of making the process exit through the boundary if it is so desired. To see this, consider the deterministic optimal control problem, with a = 0. Then condition (4.5) is

b(x, t, α(x, t)) · D_x ρ(x) > 0,

which means that there exists a choice of α such that b points towards the exterior of U and thus allows one to steer the state

ẋ(s) = b(x(s), s, α)

out of U.

In example 4.2, condition (4.5) was satisfied because b could take all values in [−1, 1]. However, the condition of equation (4.6) was not satisfied, because any smooth enough extension ḡ would have to satisfy both

−ḡ_t(1, t) + 1 ≤ 0 for all t ∈ [0, T]

and ḡ constant along (1, t) for t < T.

Example 4.5. An example of a significant type of optimal control problem that does satisfy the conditions of theorem 4.4 and does give a value function assuming the boundary data is what may be called the "soonest exit" problem: let g = 0, let f ≥ 0 for all (x, t) ∈ O and α ∈ Λ, and assume that condition (4.5) is satisfied along with the other assumptions. The optimal control problem of example 4.1 is included in this category.


First, we see that u ≥ 0 on ∂O, and it is clear that one should aim to steer the state out of the domain as soon as possible. Loosely speaking, condition (4.5) then states that, because it is possible to choose α(·) such that J(x, t, α(·)) = 0 for (x, t) ∈ ∂U × [0, T), then u ≤ 0 on ∂O, thus u = 0 on ∂O.

4.4. Conclusion. Part of the dynamic programming method for solving the optimal control problem consists of "solving" the HJB equation in order to obtain the value function. But the examples of the preceding paragraphs show that there can be many pointwise a.e. solutions of the equation. In other words, the pointwise a.e. sense of the HJB equation is not a condition restrictive enough to single out the value function u.

This is why another notion of solution of the HJB equation is needed: it must satisfy the criterion that the value function is the unique solution of the HJB equation in this new sense. Chapter 3 introduces this notion of solution, called viscosity solution, and shows that it is the relevant notion for obtaining the value function of the optimal control problem.

However, before leaving the subject of the applications of HJB equations, it will be seen that HJB equations are relevant to problems other than optimal control. Chapter 2 will show how HJB equations are connected to Monge-Ampere equations and to mean-field game equations.


CHAPTER 2

Connections to Monge-Ampere Equations and Mean-Field Games

1. Introduction

In chapter 1, it was seen how optimal control problems are related to Hamilton-Jacobi-Bellman equations. Although this is the application of HJB equations which is primarily held in mind throughout this work, it is by no means the only problem related to HJB equations.

This chapter aims to give a concise presentation of some connections between HJB equations and other PDE, namely certain Monge-Ampere equations and mean-field game equations. As a consequence, HJB equations can be related to several areas of mathematics.

In section 2, it will be seen that some elliptic Monge-Ampere equations are in fact equivalent to HJB equations. The treatment given here is based on [19], and is restricted to the essential steps used to prove this equivalence result. A reason for so doing is that the proof is taken to be the main focus, since it features a number of general results in matrix analysis. Further equivalences of Monge-Ampere equations and HJB equations are detailed in [19].

In section 3, we review some elements of the inaugural works [20], [21] and [22] on mean-field games. Mean-field games are a newly introduced system of PDE, proposed by J.-M. Lasry and P.-L. Lions in 2006, conjectured to model a system of a large number of "players", each solving their own optimal control problem, where each player's actions are influenced by the overall distribution of players.

The mean-field game equations are a coupled system of a Fokker-Planck equation with a HJB equation. In section 3, first of all, the Fokker-Planck equation is derived in the context of an Ito diffusion, then an explanation is given as to why one may conjecture the mean-field game equations. Finally, some results announced in [21] are reviewed.

2. Monge-Ampere equations

In this section we prove the following theorem, which relates some Monge-Ampere equations to Hamilton-Jacobi-Bellman equations. It would be beyond the scope of this work to give a detailed account of Monge-Ampere equations; let us merely indicate that Monge-Ampere equations find applications, amongst others, in differential geometry. For instance, the problem of finding a convex hypersurface in Rn with prescribed Gaussian curvature can be described by a Monge-Ampere equation of the form given below.

All the results of this section are found in [19]. However, a number of points in the proofs have been elaborated upon; in particular, several calculations summarised in [19] are given in detail. Several other results used in the proofs of [19] have been either detailed or clearly indicated and referenced.

Let

S(n,R)+ = { A ∈ S(n,R) | A positive semi-definite }, (2.1)
S₁(n,R)+ = { A ∈ S(n,R)+ | Tr A = 1 }. (2.2)

Theorem 2.1. [19]. Let U be an open set in Rn, n ≥ 2, and u ∈ C²(U). Let f ∈ C(U × R × Rn) be a strictly positive function. Then u solves the Monge-Ampere equation

det D²u(x) = [ f(x, u(x), Du(x)) ]^n on U, (2.3a)
D²u(x) positive definite on U; (2.3b)


if and only if u solves the Hamilton-Jacobi-Bellman equation

min_{A∈S₁(n,R)+} [ Tr(A D²u(x)) − n f(x, u(x), Du(x)) ( det A )^{1/n} ] = 0 on U. (2.4)

Remark 2.2. It will be seen below that several other equivalent phrasings of the above result are possible. In particular, one may require in the Monge-Ampere equation that D²u(x) be merely positive semi-definite, and the HJB equation may be written as

min{ Σ_{i=1}^n A_i ∂²u/∂x_i²(x) − n f(x, u(x), Du(x)) ( Π_{i=1}^n A_i )^{1/n} | A_i ≥ 0, Σ_{i=1}^n A_i = 1 } = 0,

which shows that the "control" set can be taken to be compact.

2.1. Intermediary results. To show this, we will use several intermediate results. We will also make use of the geometric-arithmetic mean inequality. For completeness, we recall and prove this inequality.

Lemma 2.3 (Geometric-arithmetic mean inequality). Let n ∈ N and let {a_i}_{i=1}^n be a collection of positive numbers, i.e. a_i ≥ 0. Then

( Π_{i=1}^n a_i )^{1/n} ≤ (1/n) Σ_{i=1}^n a_i. (2.5)

Proof. The proof makes use of Young's inequality, [15, p. 622], and an induction argument. For n = 1, the result is trivially true. Now

( Π_{i=1}^{n+1} a_i )^{1/(n+1)} = (a_{n+1})^{1/(n+1)} ( Π_{i=1}^n a_i )^{1/(n+1)}
≤ (1/(n+1)) (a_{n+1})^{(n+1)/(n+1)} + (1 − 1/(n+1)) ( Π_{i=1}^n a_i )^{(1/(n+1)) (1 − 1/(n+1))^{−1}}
= (1/(n+1)) a_{n+1} + (n/(n+1)) ( Π_{i=1}^n a_i )^{1/n}.

By the induction hypothesis,

( Π_{i=1}^n a_i )^{1/n} ≤ (1/n) Σ_{i=1}^n a_i;

we conclude that

( Π_{i=1}^{n+1} a_i )^{1/(n+1)} ≤ (1/(n+1)) a_{n+1} + (1/(n+1)) Σ_{i=1}^n a_i = (1/(n+1)) Σ_{i=1}^{n+1} a_i,

thus completing the proof.

Lemma 2.4. [19]. If A, B ∈ S(n,R)+, then

n ( det AB )^{1/n} ≤ Tr AB. (2.6)

If B ∈ S(n,R)+, then

( det B )^{1/n} = (1/n) inf{ Tr AB | A ∈ S(n,R)+, det A = 1 }. (2.7)

Proof. First, for M ∈ S(n,R)+, M is similar to the diagonal matrix diag(λ₁, . . . , λ_n), where, by positive semi-definiteness, all the λ_i are positive, i.e. λ_i ≥ 0. Thus det M = Π_{i=1}^n λ_i and Tr M = Σ_{i=1}^n λ_i. A direct use of the geometric-arithmetic mean inequality shows that

n ( det M )^{1/n} ≤ Tr M. (2.8)

Now suppose that A ∈ S(n,R)+ and that B ∈ S(n,R) is positive definite. Then the Cholesky factorisation theorem, [28, theorem 3.2 p. 90], shows that there exists C ∈ M(n,R) such that

B = CC^T.


Hence

Tr AB = Tr ACC^T = Tr C^T AC.

Since C^T AC ∈ S(n,R)+,

Tr C^T AC ≥ n ( det C^T AC )^{1/n} = n ( det AB )^{1/n},

which is (2.6) for the special case where B is positive definite. For B ∈ S(n,R) merely positive semi-definite, we prove (2.6) by perturbation: let ε > 0. Then B + εI, I the n × n identity matrix, is positive definite. Thus for all ε > 0,

Tr AB + ε Tr A ≥ n ( det A(B + εI) )^{1/n}.

Since det : M(n,R) → R is continuous, taking the limit ε → 0 in the above inequality shows (2.6).

We now show (2.7). Firstly, since A was arbitrary in (2.6), we have

n ( det B )^{1/n} ≤ inf{ Tr AB | A ∈ S(n,R)+, det A = 1 }.

For ε > 0 and B ∈ S(n,R)+, consider

A_ε = ( det(B + εI) )^{1/n} (B + εI)^{−1}, (2.9)

where the matrices A_ε exist because for all ε > 0, B + εI is symmetric positive definite and thus invertible. Furthermore, det A_ε = 1 and A_ε is positive semi-definite. We furthermore note that A ∈ GL(n,R) ↦ A^{−1} ∈ GL(n,R) is a continuous map. Therefore

lim_{ε→0} Tr A_ε B = lim_{ε→0} ( det(B + εI) )^{1/n} Tr[ (B + εI)^{−1} B ] = ( det B )^{1/n} Tr I = n ( det B )^{1/n},

which proves (2.7).
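The two statements of lemma 2.4 are easy to probe numerically. The sketch below is illustrative only (the dimension, sample size and tolerance are arbitrary choices): it checks inequality (2.6) on random symmetric positive semi-definite matrices and uses the matrices A_ε of (2.9) to approach the infimum in (2.7).

```python
# Illustrative check of lemma 2.4 on random symmetric positive semi-definite matrices.
import numpy as np

rng = np.random.default_rng(0)
n = 4
for _ in range(1000):
    MA = rng.standard_normal((n, n)); A = MA @ MA.T        # random PSD matrices
    MB = rng.standard_normal((n, n)); B = MB @ MB.T
    lhs = n * np.linalg.det(A @ B) ** (1 / n)
    rhs = np.trace(A @ B)
    assert lhs <= rhs + 1e-9                               # inequality (2.6)

eps = 1e-8
A_eps = np.linalg.det(B + eps * np.eye(n)) ** (1 / n) * np.linalg.inv(B + eps * np.eye(n))
print(np.trace(A_eps @ B) / n, np.linalg.det(B) ** (1 / n))  # both sides of (2.7), nearly equal
```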

Proposition 2.5. [19]. Let n ≥ 2. If B ∈ S(n,R)+, then for all i ∈ {1, . . . , n − 1},

det B ≤ ( Π_{j=1}^i B_{jj} ) det B̃, (2.10)

where B̃ denotes the lower principal (n − i) × (n − i) submatrix of B, i.e. (B̃)_{rs} = (B)_{r+i, s+i} for all r, s ∈ {1, . . . , n − i}. In particular, the case i = n − 1 implies

det B ≤ Π_{j=1}^n B_{jj}. (2.11)

Proof. We begin by proving it for the case i = 1. Let

𝒜 = { A ∈ S(n,R)+ | det A = 1, (A)_{1j} = 0 and (A)_{j1} = 0 if j ≠ 1 }.

Note that if A ∈ 𝒜, then the condition det A = 1 implies that A_{11} > 0. Since 𝒜 ⊂ S(n,R)+ ∩ { A | det A = 1 }, by (2.7) we have

n ( det B )^{1/n} ≤ inf_{A∈𝒜} Tr AB.

Now

inf_{A∈𝒜} Tr AB = inf_{A_{11}>0} inf_{Ã ∈ S(n−1,R)+, det Ã = 1} [ A_{11} B_{11} + (1/A_{11})^{1/(n−1)} Tr ÃB̃ ], (2.12)

where Ã, B̃ are the lower principal (n − 1) × (n − 1) submatrices of A and B respectively, i.e. (B̃)_{rs} = (B)_{r+1, s+1}. Equation (2.12) is justified because, if det Ã = 1, then

det( (1/A_{11})^{1/(n−1)} Ã ) = 1/A_{11},


so the block diagonal matrix with diagonal blocks A_{11} and (1/A_{11})^{1/(n−1)} Ã belongs to 𝒜.

By (2.7) applied to B̃,

inf_{Ã ∈ S(n−1,R)+, det Ã = 1} Tr ÃB̃ = (n − 1) ( det B̃ )^{1/(n−1)},

so

inf_{A∈𝒜} Tr AB = inf_{A_{11}>0} [ A_{11} B_{11} + (n − 1) ( det B̃ )^{1/(n−1)} (1/A_{11})^{1/(n−1)} ]. (2.13)

We will now find the infimum explicitly and detail the calculation. We begin by assuming that B ∈ S(n,R)+ is positive definite, so that det B̃ > 0. For x > 0,

d/dx [ x B_{11} + (n − 1) ( det B̃ )^{1/(n−1)} (1/x)^{1/(n−1)} ] = B_{11} − ( det B̃ )^{1/(n−1)} (1/x)^{n/(n−1)}.

A minimum is attained at

x = [ B_{11} / ( det B̃ )^{1/(n−1)} ]^{−(n−1)/n} = (B_{11})^{−(n−1)/n} ( det B̃ )^{1/n}.

Therefore, evaluating (2.13) at this minimum point gives

inf_{A_{11}>0} [ A_{11} B_{11} + (n − 1) ( det B̃ )^{1/(n−1)} (1/A_{11})^{1/(n−1)} ]
= B_{11} (B_{11})^{−(n−1)/n} ( det B̃ )^{1/n} + (n − 1) ( det B̃ )^{1/(n−1)} (B_{11})^{1/n} ( det B̃ )^{−1/(n(n−1))}, (2.14)

so

inf_{A_{11}>0} [ A_{11} B_{11} + (n − 1) ( det B̃ )^{1/(n−1)} (1/A_{11})^{1/(n−1)} ] = n (B_{11})^{1/n} ( det B̃ )^{1/n}.

Recalling (2.7) for B, we thus have

n ( det B )^{1/n} ≤ n (B_{11})^{1/n} ( det B̃ )^{1/n},

or equivalently, for B positive definite,

det B ≤ B_{11} det B̃.

Now the case where B is positive semi-definite follows by a perturbation argument, and from continuity of the terms in the inequality. This shows the claim of the proposition for the case i = 1. The case of general i follows by inductively applying the above result to the submatrices of B.
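The inequalities (2.10) and (2.11) can likewise be checked numerically; the following sketch (illustrative only, with arbitrary dimension, sample size and tolerance) tests them on random symmetric positive semi-definite matrices.

```python
# Illustrative check of (2.10) and (2.11) on random symmetric PSD matrices.
import numpy as np

rng = np.random.default_rng(1)
n = 5
for _ in range(1000):
    M = rng.standard_normal((n, n))
    B = M @ M.T                                            # symmetric positive semi-definite
    detB = np.linalg.det(B)
    for i in range(1, n):
        bound = np.prod(np.diag(B)[:i]) * np.linalg.det(B[i:, i:])   # right-hand side of (2.10)
        assert detB <= bound * (1 + 1e-9) + 1e-9
    assert detB <= np.prod(np.diag(B)) * (1 + 1e-9) + 1e-9           # the case i = n-1, (2.11)
```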

Lemma 2.6. [19]. Let D ∈ S(n,R) and let f ∈ R be a strictly positive number. Then

max_{A∈S₁(n,R)+} [ Tr AD + f ( det A )^{1/n} ] = 0 (2.15)

if and only if

n ( det(−D) )^{1/n} = f, (2.16a)
D negative definite. (2.16b)

Proof. Without loss of generality, we may assume that D is a diagonal matrix. This is because D ∈ S(n,R) is diagonalisable, and all of the terms involving D in the claim are conserved under changes of basis.


For A ∈ S(n,R)+, by (2.10),

det A ≤ Π_{i=1}^n A_{ii},

therefore, since f > 0,

Tr AD + f ( det A )^{1/n} ≤ Tr AD + f ( Π_{i=1}^n A_{ii} )^{1/n}.

This shows that

max_{A∈S₁(n,R)+} [ Tr AD + f ( det A )^{1/n} ] = max_{A∈S₁(n,R)+, A diagonal} [ Tr AD + f ( det A )^{1/n} ]
= max{ Σ_{i=1}^n A_i D_i + f ( Π_{i=1}^n A_i )^{1/n} | A_i ≥ 0, Σ_{i=1}^n A_i = 1 }.

We begin by showing that (2.15) implies (2.16). So assume that

max{ Σ_{i=1}^n A_i D_i + f ( Π_{i=1}^n A_i )^{1/n} | A_i ≥ 0, Σ_{i=1}^n A_i = 1 } = 0.

Then in particular, choosing A_j = δ_{ij} for each i ∈ {1, . . . , n}, we obtain

D_i ≤ 0.

Furthermore, we show that D_i < 0 for all i. Suppose that for some i ∈ {1, . . . , n}, D_i = 0. For ε > 0, let A^ε_j = ε for j ≠ i and A^ε_i = 1 − (n − 1)ε. Then A^ε is diagonal, positive definite, and Tr A^ε = 1, thus A^ε ∈ S₁(n,R)+. Also, since D_i = 0,

Tr A^ε D = ε Σ_{j≠i} D_j = ε Σ_{j=1}^n D_j;

and

f ( det A^ε )^{1/n} = f [ (1 − (n − 1)ε) ε^{n−1} ]^{1/n} = f ε ( 1/ε − (n − 1) )^{1/n}.

First, suppose that Σ_{j=1}^n D_j = 0, i.e. D_j = 0 for all j. Then

Tr A^ε D + f ( det A^ε )^{1/n} = f ( det A^ε )^{1/n} > 0,

which contradicts the assumption (2.15). So Σ_{j=1}^n D_j < 0. From the hypothesis that f > 0,

lim_{ε→0} | f ε ( 1/ε − (n − 1) )^{1/n} / ( ε Σ_{j=1}^n D_j ) | = lim_{ε→0} ( f / |Σ_{j=1}^n D_j| ) ( 1/ε − (n − 1) )^{1/n} = ∞.

Therefore there exists ε > 0, and hence a corresponding A^ε ∈ S₁(n,R)+, such that

Tr A^ε D + f ( det A^ε )^{1/n} > 0,

which contradicts the assumption (2.15). Therefore D_i < 0 for all i ∈ {1, . . . , n} and D is negative definite.

By hypothesis, the maximum in (2.15) is attained, so there exists A diagonal, positive semi-definite, with Tr A = 1, such that

Tr AD + f ( det A )^{1/n} = 0.


Note that det A > 0, because if there were i such that A_i = 0, then the fact that Tr A = 1 would imply that Tr AD < 0, which would be a contradiction. By (2.6) applied to A and −D,

Tr A(−D) ≥ n ( det A )^{1/n} ( det(−D) )^{1/n},

so

Tr AD ≤ −n ( det A )^{1/n} ( det(−D) )^{1/n},

and

( det A )^{1/n} ( f − n ( det(−D) )^{1/n} ) ≥ 0.

Since det A > 0, this implies that

f ≥ n ( det(−D) )^{1/n}. (2.17)

To show that f ≤ n ( det(−D) )^{1/n}, let

A_i = (−D_i)^{−1} / Σ_{j=1}^n |D_j|^{−1}.

Then A_i ≥ 0 since D_i < 0, and Tr A = 1. By (2.15),

0 ≥ Tr AD + f ( det A )^{1/n} = Σ_{i=1}^n ( −1 / Σ_{j=1}^n |D_j|^{−1} ) + f [ ( Σ_{j=1}^n |D_j|^{−1} )^{−n} Π_{i=1}^n ( 1/(−D_i) ) ]^{1/n}.

This gives

n ( det(−D) )^{1/n} ≤ f, (2.18)

thus showing that (2.15) implies (2.16).

Now assume (2.16). For any A ∈ S₁(n,R)+, by (2.6),

Tr AD + f ( det A )^{1/n} ≤ ( det A )^{1/n} ( f − n ( det(−D) )^{1/n} ) = 0,

hence

sup_{A∈S₁(n,R)+} [ Tr AD + f ( det A )^{1/n} ] ≤ 0.

Now choose, as before,

A_i = (−D_i)^{−1} / Σ_{j=1}^n |D_j|^{−1}.

Then, as before, by hypothesis (2.16),

Tr AD + f ( det A )^{1/n} = 0,

thus showing that

max_{A∈S₁(n,R)+} [ Tr AD + f ( det A )^{1/n} ] = 0.

2.2. Proof of theorem 2.1. The theorem is simply an application of lemma 2.6. Let U ⊂ Rn be open, n ≥ 2, and let f ∈ C(U × R × Rn) be such that f(x, r, p) > 0 for all (x, r, p) ∈ U × R × Rn. Let g = nf. For u ∈ C²(U), let v = −u. Then, for each x ∈ U,

det D²u(x) = [ f(x, u(x), Du(x)) ]^n (2.19a)
D²u(x) positive definite (2.19b)

is equivalent to

n ( det(−D²v(x)) )^{1/n} = g(x, u(x), Du(x)),
D²v(x) negative definite.


By lemma 2.6, where g(x, u(x), Du(x)) plays the role of the real number f, we see this is equivalent to

max_{A∈S₁(n,R)+} [ Tr(A D²v(x)) + g(x, u(x), Du(x)) ( det A )^{1/n} ] = 0,

which is finally equivalent to

min_{A∈S₁(n,R)+} [ Tr(A D²u(x)) − n f(x, u(x), Du(x)) ( det A )^{1/n} ] = 0. (2.20)
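As a sanity check of this equivalence at a single point, the following sketch (illustrative only; the matrix H and the sampling are arbitrary choices) takes a positive definite Hessian H = D²u(x), sets f = (det H)^{1/n} so that the Monge-Ampere equation (2.3a) holds at that point, and verifies that Tr(AH) − nf(det A)^{1/n} is non-negative for random A ∈ S₁(n,R)+ and vanishes at the minimiser A = H^{−1}/Tr(H^{−1}) suggested by the proof of lemma 2.6.

```python
# Illustrative check of theorem 2.1 at one point: the HJB minimum in (2.4) is zero
# exactly when det(D^2 u) = f^n, with the minimiser A proportional to the inverse Hessian.
import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)            # Hessian D^2 u(x) of a convex quadratic, positive definite
f = np.linalg.det(H) ** (1 / n)        # then det D^2 u = f^n, i.e. (2.3a) holds at this point

def hjb_integrand(A):
    return np.trace(A @ H) - n * f * np.linalg.det(A) ** (1 / n)

vals = []
for _ in range(2000):                  # random elements of S_1(n,R)+
    N = rng.standard_normal((n, n))
    A = N @ N.T
    A /= np.trace(A)
    vals.append(hjb_integrand(A))

A_star = np.linalg.inv(H) / np.trace(np.linalg.inv(H))      # candidate minimiser
print(min(vals), hjb_integrand(A_star))                      # samples >= 0; value at A_star ~ 0
```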

3. Mean-field games

The reader may wish to recall some of the notation of chapter 1, section 2 and of appendix A on stochastic differential equations.

Chapter 1 introduced optimal control problems, which ask how a process ought to be steered in order to minimise a cost functional. Mean-field games extend the idea of an optimal control problem for a single agent to a coupled system of optimal control problems for multiple agents. It is conjectured in [20], [21] and [22] that a coupled system of a HJB equation and a Fokker-Planck equation is a suitable model for this situation. The aim of this section is thus to present the basic concepts of mean-field games and to review some of the results which are announced in these papers.

3.1. The Fokker-Planck equation. Before introducing mean-field games, we will derive the Fokker-Planck equation in the setting of Ito diffusions. This will be helpful in order to understand the appearance of this equation in the system of mean-field game equations. A reason for including the derivation is that it is often left as an exercise in textbooks such as [23] or [25]. It is interesting because the derivation of the strong form of the Fokker-Planck equation passes via a weak form.

We will make use of the following function space. Let

C¹([0, T]; H¹(Rn)) (3.1)

denote the set of continuous functions v : [0, T] → H¹(Rn) that have a continuous extension

ṽ : (−δt, T + δt) → H¹(Rn),

for some δt > 0, such that there exists a continuous function ∂_t ṽ : (−δt, T + δt) → H¹(Rn) that satisfies, for every t ∈ (−δt, T + δt),

lim_{h→0} (1/|h|) ‖ṽ(·, t + h) − ṽ(·, t) − h ∂_t ṽ(·, t)‖_{H¹(Rn)} = 0.

The restriction of ∂_t ṽ to [0, T] is denoted ∂_t v. Let the norm on C¹([0, T]; H¹(Rn)) be defined as

‖v‖_{C¹([0,T];H¹(Rn))} = sup_{t∈[0,T]} [ ‖v(·, t)‖_{H¹(Rn)} + ‖∂_t v(·, t)‖_{H¹(Rn)} ]. (3.2)

The reader may find appendix A helpful for this section. Consider a SDE of the form

dx(t) = b(x(t), t) dt + σ(x(t), t) dW(t); (3.3a)
x(0) = x; (3.3b)

where the assumptions on b, σ and x are those of theorem 2.1 of appendix A. In addition, assume that for every t ∈ (0, T), b(·, t), σ(·, t) ∈ C¹(Rn). Recall that a(x, t) = ½ σ(x, t)σ(x, t)^T.

Suppose that the random variable x has probability density function p₀(·). The question posed is to find the probability density function p(·, t) of the random variable x(t), for t ∈ [0, T]. It will be seen that, under certain assumptions, the probability density function p solves a PDE called the Fokker-Planck equation.

Theorem 3.1 (Fokker-Planck equation). Suppose that p₀ ∈ H¹(Rn). Assume that for every t ∈ [0, T] there exists p(·, t) ∈ H¹(Rn), a probability density function of the random variable x(t), such that for all f ∈ C∞₀(Rn),

E[f(x(t))] = ∫_{Rn} f(y) p(y, t) dy. (3.4)


Furthermore, assume that p : t ↦ p(·, t) is in C¹([0, T]; H¹(Rn)), that p(·, 0) = p₀(·), and that for all f ∈ C∞₀(Rn), t ∈ (0, T), t + h ∈ (0, T),

E[ ∫_t^{t+h} f(x(s)) ds ] = ∫_t^{t+h} ∫_{Rn} f(y) p(y, s) dy ds. (3.5)

Then p solves the Fokker-Planck PDE in the weak form: for all f ∈ C∞₀(Rn) and a.e. t ∈ (0, T),

∫_{Rn} [ p_t(y, t) f(y) + Σ_{i,j=1}^n ∂_{x_i}(a_{ij} p)(y, t) ∂f/∂x_j(y) + Σ_{i=1}^n ∂_{x_i}(b_i p)(y, t) f(y) ] dy = 0; (3.6a)

p(·, 0) = p₀(·). (3.6b)

Now suppose that for all t ∈ (0, T), σ(·, t) ∈ C²(Rn). If furthermore p_t(·, t) ∈ C(Rn) for almost all t ∈ (0, T), p(·, t) ∈ C²(Rn) for all t ∈ (0, T), and p(·, 0), p₀ ∈ C(Rn), then p solves the Fokker-Planck PDE in the strong form

p_t(x, t) − Σ_{i,j=1}^n ∂²/∂x_i∂x_j (a_{ij} p)(x, t) + Σ_{i=1}^n ∂_{x_i}(b_i p)(x, t) = 0 for a.e. t ∈ (0, T) and all x ∈ Rn; (3.7a)

p(x, 0) = p₀(x) for all x ∈ Rn. (3.7b)

We remark that several further generalisations of the above theorem are possible, with weaker regularity assumptions on p.

Proof. By Dynkin's formula, theorem 3.4 of appendix A, for any $f \in C_0^\infty(\mathbb{R}^n)$ and fixed $s \in (0, T)$,
\[
\mathbb{E}\bigl[ f(x(s)) \bigr] = \mathbb{E}\bigl[ f(x(0)) \bigr] + \mathbb{E} \int_0^s \sum_{i,j=1}^n a_{ij}(x(r), r) \frac{\partial^2 f}{\partial x_i \partial x_j}(x(r)) + \sum_{i=1}^n b_i(x(r), r) \frac{\partial f}{\partial x_i}(x(r))\, dr.
\]
By subtracting the above equation for $s = t$ from the equation for $s = t + h$, we find, after using the hypotheses on $p$, that
\[
\int_{\mathbb{R}^n} f(y)\bigl( p(y, t+h) - p(y, t) \bigr)\, dy = \int_t^{t+h} \int_{\mathbb{R}^n} \Biggl[ \sum_{i,j=1}^n a_{ij}(y, s) \frac{\partial^2 f}{\partial x_i \partial x_j}(y) + \sum_{i=1}^n b_i(y, s) \frac{\partial f}{\partial x_i}(y) \Biggr] p(y, s)\, dy\, ds.
\]
Since $p \in C^1\bigl([0, T]; H^1(\mathbb{R}^n)\bigr)$, and $f \in C_0^\infty(\mathbb{R}^n)$ implies that $f \in L^2(\mathbb{R}^n)$,
\[
\lim_{h \to 0} \Biggl| \int_{\mathbb{R}^n} f(y) \biggl( \frac{p(y, t+h) - p(y, t)}{h} - p_t(y, t) \biggr) dy \Biggr| \le \lim_{h \to 0} \| f \|_{L^2(\mathbb{R}^n)} \frac{1}{|h|} \bigl\| p(\cdot, t+h) - p(\cdot, t) - h\, p_t(\cdot, t) \bigr\|_{L^2(\mathbb{R}^n)} = 0.
\]
Therefore, the previous equation and Lebesgue's differentiation theorem show that for almost all $t \in (0, T)$,
\[
\int_{\mathbb{R}^n} f(y)\, p_t(y, t)\, dy = \int_{\mathbb{R}^n} \Biggl[ \sum_{i,j=1}^n a_{ij}(y, t) \frac{\partial^2 f}{\partial x_i \partial x_j}(y) + \sum_{i=1}^n b_i(y, t) \frac{\partial f}{\partial x_i}(y) \Biggr] p(y, t)\, dy.
\]
Since $a(\cdot, t)$ and $b(\cdot, t)$ are assumed to be in $C^1(\mathbb{R}^n)$ for all $t \in (0, T)$, integration by parts gives, for almost all $t \in (0, T)$,
\[
\int_{\mathbb{R}^n} f(y)\, p_t(y, t)\, dy = -\int_{\mathbb{R}^n} \sum_{i,j=1}^n \partial_{x_i}(a_{ij} p)(y, t)\, \frac{\partial f}{\partial x_j}(y) + \sum_{i=1}^n \partial_{x_i}(b_i p)(y, t)\, f(y)\, dy;
\]
thus showing (3.6).

Page 27: Hamilton-Jacobi-Bellman Equations


Under the additional assumptions, the strong form is derived by a further integration by parts,
\[
\int_{\mathbb{R}^n} \sum_{i,j=1}^n \partial_{x_i}(a_{ij} p)(y, t)\, \frac{\partial f}{\partial x_j}(y)\, dy = -\int_{\mathbb{R}^n} \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j}(a_{ij} p)(y, t)\, f(y)\, dy;
\]
then, using the variational lemma, [1, lemma 7.2.1 p. 274], and continuity, we deduce (3.7).
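As a numerical illustration (not part of the original derivation), theorem 3.1 can be checked in a case where everything is explicit. The sketch below assumes an Ornstein-Uhlenbeck diffusion $dx = -x\,dt + \sigma\,dW$ with Gaussian initial density, for which the Fokker-Planck solution remains Gaussian with explicitly computable mean and variance, and compares it with the empirical density obtained by Euler-Maruyama simulation of the SDE.

```python
import numpy as np

# Ornstein-Uhlenbeck: dx = -x dt + sigma dW, x(0) ~ N(m0, s0^2).
# The Fokker-Planck solution stays Gaussian with mean m0*exp(-t) and
# variance s0^2*exp(-2t) + sigma^2/2*(1 - exp(-2t)).
sigma, m0, s0, T, dt, M = 0.8, 1.0, 0.3, 1.0, 1e-2, 100_000
rng = np.random.default_rng(0)

x = m0 + s0 * rng.standard_normal(M)            # samples of x(0) with density p0
for _ in range(int(round(T / dt))):             # Euler-Maruyama time stepping
    x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal(M)

mean_T = m0 * np.exp(-T)
var_T = s0**2 * np.exp(-2 * T) + 0.5 * sigma**2 * (1 - np.exp(-2 * T))

# Compare the empirical density of x(T) with the Fokker-Planck (Gaussian) density.
edges = np.linspace(-3.0, 3.0, 61)
hist, edges = np.histogram(x, bins=edges, density=True)
centres = 0.5 * (edges[:-1] + edges[1:])
p_fp = np.exp(-(centres - mean_T)**2 / (2 * var_T)) / np.sqrt(2 * np.pi * var_T)
print("max density discrepancy:", np.abs(hist - p_fp).max())   # small (statistical noise)
```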

3.2. Presentation of mean-field games. In this paragraph we present the basic setting of the time-evolution mean-field game equations, as introduced by [21]. There exists also a stationary form of the mean-field game equations, discussed in [20].

Remark 3.2. It must be signalled that a fully rigorous derivation of the mean-field game equations of the form we show here is not currently available. In [20] it is claimed that a derivation for the stationary case exists, and in [21] it is claimed that a derivation for the evolutionary case has been rigorously achieved only for certain special cases. We also signal that proofs of the results announced in [21] were, for the most part, not detailed.

Nevertheless, it is possible to put forward arguments, albeit heuristic, explaining the reasons for conjecturing these equations.

In this paragraph, we present the basic concepts involved in a mean-field game, and explain the reason for considering the mean-field game equations.

In a mean-field game, we consider a large population of "players", each indexed by $i \in I$, such that at each time $t \in [0, T]$ each player is represented by a point $x_i(t) \in Q = [0, 1]^n$. The density of players at a specific $x \in Q$, $t \in [0, T]$, is $m(x, t)$, with $m(\cdot, t)$ a probability density function on $Q$ for all times $t \in [0, T]$. We assume that the initial density $m(\cdot, 0) = m_0(\cdot)$ is given.

As is done in [21], we identify the opposite faces of $Q$, so that $Q = \mathbb{T}^n$, the $n$-torus.

The position $x_i(t)$ of a given player $i$ evolves according to an SDE, and the player steers his process by a choice of control. For a control set $A$, with controls taking values in $\Lambda$, each player chooses a control $\alpha_i(\cdot) \in A$, and the SDE for the player's position is
\[
dx_i(s) = b\bigl( x_i(s), s, \alpha_i(s) \bigr)\, ds + \sigma\bigl( x_i(s), s, \alpha_i(s) \bigr)\, dW_i(s), \qquad (3.8a)
\]
\[
x_i(0) = x_i; \qquad (3.8b)
\]
where $\{W_i\}_{i \in I}$ are independent Brownian motions and $\{x_i\}_{i \in I}$ is a collection of mutually independent random variables with probability density $m_0(\cdot)$.

Each player attempts to minimise his respective cost functional, which is similar to the cost functional of chapter 1, but with the addition of terms representing the interactions between players. More specifically, let $V$ and $v_0$ be maps from a suitable space of functions on $Q$ to another suitable space of functions on $Q$. The cost functional is
\[
J\bigl( x, t, \alpha(\cdot) \bigr) = \mathbb{E}^{x,t} \Biggl[ \int_t^T f(x(s), s, \alpha(s)) + V[m(\cdot, s)](x(s))\, ds + v_0[m(\cdot, T)](x(T)) \Biggr]. \qquad (3.9)
\]
Therefore the $i$-th player calculates his cost as
\[
J\bigl( x_i, 0, \alpha_i(\cdot) \bigr) = \mathbb{E}^{x_i,0} \Biggl[ \int_0^T f(x_i(s), s, \alpha_i(s)) + V[m(\cdot, s)](x_i(s))\, ds + v_0[m(\cdot, T)](x_i(T)) \Biggr].
\]

Example 3.3. We suggest some potential choices of $V$ to explain its possible interpretations. Suppose that for some given problem, costs are incurred if a player's position is far from the average position of the population. This could be modelled by
\[
V[m](x) = \biggl| x - \int_Q y\, m(y)\, dy \biggr|.
\]

Page 28: Hamilton-Jacobi-Bellman Equations


Suppose that for some problem, costs are incurred for being in a highly populated region. For fixed $\varepsilon > 0$, a possible choice of $V$ could be the local average of $m$ over a ball of radius $\varepsilon$,
\[
V[m](x) = \frac{1}{\operatorname{Vol} B(0, \varepsilon)} \int_{B(x, \varepsilon)} m(y)\, dy,
\]
or alternatively, it could be chosen to be
\[
V[m](x) = m(x).
\]
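For concreteness (an illustration not in the original text), such couplings are straightforward to evaluate on a grid. The sketch below assumes a one-dimensional density $m$ on $Q = [0, 1]$ and computes the distance-to-mean coupling and the local-average coupling of example 3.3.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 201)            # grid on Q = [0, 1]
dx = x[1] - x[0]
m = 1.0 + 0.5 * np.cos(2 * np.pi * x)     # a probability density on Q (up to quadrature error)

# V[m](x) = |x - average position of the population|
mean_pos = np.sum(x * m) * dx
V_distance = np.abs(x - mean_pos)

# V[m](x) = local average of m over the interval of radius eps around x (truncated at the ends)
eps = 0.05
V_local = np.array([m[np.abs(x - xi) <= eps].mean() for xi in x])

print(V_distance[:3], V_local[:3])
```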

Remark 3.4. The above assumptions are sometimes summarised by saying that the players are indistinguishable, i.e. any permutation of two players yields the same system, and that each player has a negligible effect on the overall system, which results from the fact that only the overall distribution of the players affects the cost functional. The players are also said to have rational expectations, i.e. they aim to minimise the true expectation of their cost functional.

Defining the value function as in chapter 1,
\[
u(x, t) = \inf_{\alpha(\cdot) \in A} J\bigl( x, t, \alpha(\cdot) \bigr),
\]
we suggest that $u$ might solve the HJB equation, under the usual notation,
\[
-u_t(x, t) + \sup_{\alpha \in \Lambda} \bigl[ L^\alpha u(x, t) - f(x, t, \alpha) - V[m(\cdot, t)](x) \bigr] = 0 \quad \text{for all } (x, t) \in Q \times (0, T), \qquad (3.10a)
\]
\[
u(x, T) = v_0[m(\cdot, T)](x) \quad \text{for all } x \in Q. \qquad (3.10b)
\]
The HJB PDE may be equivalently written as
\[
-u_t(x, t) + \sup_{\alpha \in \Lambda} \bigl[ L^\alpha u(x, t) - f(x, t, \alpha) \bigr] = V[m(\cdot, t)](x).
\]
Assume that there exists a Markov control $\alpha^*$ such that
\[
u(x, t) = J\bigl( x, t, \alpha^*(\cdot) \bigr),
\]
provided all players opt for the control $\alpha^*$.

Remark 3.5. An observation is due at this stage. We make the assumption that $\alpha^*$ is optimal for a specific set of choices of the players' controls because the overall distribution $m$ depends on the choice of control that each player makes. The distribution $m$ thus typically influences the cost functional. This shows the game-theoretic aspect of a mean-field game.

We now assume that all players opt for the control $\alpha^*$. As a result, each player chooses the same control, which is optimal for every player, and for each $i \in I$ the SDE reduces to
\[
dx_i(s) = b\bigl( x_i(s), s, \alpha^*(x_i(s), s) \bigr)\, ds + \sigma\bigl( x_i(s), s, \alpha^*(x_i(s), s) \bigr)\, dW_i(s),
\]
\[
x_i(0) = x_i.
\]
Thus theorem 3.1 leads us to expect that the distribution of players $m$ should solve the Fokker-Planck equation: for all $x \in Q$, $t \in (0, T)$,
\[
m_t(x, t) - \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j} \bigl[ a_{ij}(\cdot, t, \alpha^*(\cdot, t))\, m(\cdot, t) \bigr](x) + \sum_{i=1}^n \partial_{x_i} \bigl[ b_i(\cdot, t, \alpha^*(\cdot, t))\, m(\cdot, t) \bigr](x) = 0, \qquad (3.11a)
\]
\[
m(x, 0) = m_0(x). \qquad (3.11b)
\]

The mean-field game equations are then the coupled system (3.10) and (3.11).

To write these equations in the form used in [21], we simplify the problem by assuming that $W_i$ is an $n$-dimensional Brownian motion, $\sigma = \sqrt{2\nu}\, I$, with $I \in M(n, \mathbb{R})$ the identity matrix and $\nu > 0$, and that $b(x, t, \alpha) = b(x, \alpha)$, $f(x, t, \alpha) = f(x, \alpha)$. Then let
\[
H(x, p) = \sup_{\alpha \in \Lambda} \bigl[ b(x, \alpha) \cdot p - f(x, \alpha) \bigr].
\]

Page 29: Hamilton-Jacobi-Bellman Equations


Assume that for all $x \in Q$, $H(x, \cdot) \in C^1(\mathbb{R}^n)$. From optimal control theory, one expects that for all $p \in \mathbb{R}^n$,
\[
H(x, p) = b\bigl( x, \alpha^*(x) \bigr) \cdot p - f\bigl( x, \alpha^*(x) \bigr),
\]
thus
\[
\frac{\partial H}{\partial p}(x, p) = b\bigl( x, \alpha^*(x) \bigr).
\]
Therefore the mean-field game equations become
\[
-u_t - \nu \Delta u + H(x, Du) = V[m] \quad \text{on } Q \times (0, T), \qquad (3.12a)
\]
\[
m_t - \nu \Delta m + \operatorname{div}\Bigl( \frac{\partial H}{\partial p}(x, Du)\, m \Bigr) = 0 \quad \text{on } Q \times (0, T), \qquad (3.12b)
\]
\[
m(\cdot, 0) = m_0(\cdot), \qquad u(\cdot, T) = v_0[m(\cdot, T)](\cdot) \quad \text{on } Q. \qquad (3.12c)
\]

Example 3.6. We give an example for which the assumptions on $H$ stated above can be verified. Let $\Lambda = \mathbb{R}^n$, $f(x, \alpha) = \tfrac{1}{2}|\alpha|^2$, $b(x, \alpha) = \alpha$. Then
\[
\alpha \cdot p - \frac{1}{2}|\alpha|^2
\]
is maximised by $\alpha^* = p$, thus
\[
H(x, p) = |p|^2 - \frac{1}{2}|p|^2 = \frac{1}{2}|p|^2,
\]
so
\[
\frac{\partial H}{\partial p}(x, p) = p = \alpha^*.
\]

Remark 3.7. The mean-field game equations constitute a nonlinearly coupled system of parabolic PDEs. An important observation is that the HJB equation evolves backward in time, whereas the Fokker-Planck equation evolves forward in time. As a result, the boundary data for $u$, namely $v_0[m(\cdot, T)]$, is unknown if $v_0$ has a non-trivial dependence on $m$.
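This forward-backward structure is what makes the system numerically delicate: the two equations must be solved simultaneously, and a common heuristic is a damped fixed-point (Picard) iteration alternating a backward sweep for the HJB equation with a forward sweep for the Fokker-Planck equation. The sketch below is purely illustrative and not taken from [20]-[22]: it follows the form of (3.12) as stated above, and assumes one space dimension on the torus, the Hamiltonian $H(x, p) = |p|^2/2$ of example 3.6, the local coupling $V[m] = m$, terminal cost $v_0 = 0$, explicit central finite differences, and a damping factor of $1/2$. Convergence of such an iteration is not guaranteed in general.

```python
import numpy as np

N, T, nu = 64, 0.2, 0.1            # grid points, final time, viscosity
dx = 1.0 / N
dt = 1e-3                          # satisfies the explicit stability restriction dt <~ dx^2 / (2 nu)
Nt = int(round(T / dt))
x = np.arange(N) * dx

def dxc(v):   # periodic central first difference
    return (np.roll(v, -1) - np.roll(v, 1)) / (2 * dx)

def dxx(v):   # periodic second difference
    return (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / dx**2

m0 = 1.0 + 0.3 * np.cos(2 * np.pi * x)     # initial density (mass 1 on the torus)
m_path = np.tile(m0, (Nt + 1, 1))          # initial guess: density frozen in time

for it in range(30):                       # damped Picard (fixed-point) iteration
    # HJB (3.12a) backwards in time:  -u_t - nu*u_xx + (u_x)^2/2 = m,  u(., T) = 0
    u_path = np.zeros((Nt + 1, N))
    u = np.zeros(N)
    for k in range(Nt - 1, -1, -1):
        u = u + dt * (nu * dxx(u) - 0.5 * dxc(u) ** 2 + m_path[k + 1])
        u_path[k] = u
    # Fokker-Planck (3.12b) forwards in time with drift dH/dp(x, u_x) = u_x
    m_new = np.zeros((Nt + 1, N))
    m = m0.copy()
    m_new[0] = m
    for k in range(Nt):
        m = m + dt * (nu * dxx(m) - dxc(dxc(u_path[k]) * m))
        m_new[k + 1] = m
    change = np.abs(m_new - m_path).max()
    m_path = 0.5 * m_path + 0.5 * m_new    # damped update of the density path
    if change < 1e-8:
        break

print("Picard iterations:", it + 1, " last update size:", change)
print("mass of m(., T):", m_path[-1].sum() * dx)   # conserved exactly by the scheme, ~ 1
```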

3.3. Announced results. We now quote some of the results claimed in [20], [21] and [22].

Firstly, in [20] it is said that the mean-field game equations can be rigorously derived for the ergodic stationary case as the limiting equations for a system with $N$ players, $N$ finite. However, for the time-evolution problem, the equations remain a conjecture.

Secondly, there are some results concerning the well-posedness of the mean-field game equations. As before, let $Q = \mathbb{T}^n$ be the $n$-torus, and let $m_0 \in C^\infty(Q)$ satisfy
\[
\int_Q m_0(x)\, dx = 1, \qquad m_0(x) > 0 \quad \text{for all } x \in Q.
\]
Let $H \colon Q \times \mathbb{R}^n \to \mathbb{R}$ satisfy: $H$ is Lipschitz continuous in the $x$ variable, uniformly bounded in the $p$ variable over $\mathbb{R}^n$, and convex and $C^1$ in the $p$ variable.

Suppose that $V, v_0 \colon C^{k,\gamma}(Q) \to C^{k+1,\gamma}(Q)$ for all $k \in \mathbb{N}$ and $\gamma \in (0, 1)$. For example, if $V[m] = m * \eta_\varepsilon$, with $\eta_\varepsilon$ the standard mollifier of radius $\varepsilon$, this assumption is satisfied. Suppose that
\[
\sup \biggl\{ \| V[m] \|_{C^1(Q)} + \| v_0[m] \|_{C^1(Q)} \;\Big|\; m \in L^1(Q),\ m \ge 0,\ \int_Q m(x)\, dx = 1 \biggr\} < \infty.
\]
Furthermore, assume either that there is $C \ge 0$ such that
\[
\biggl| \frac{\partial H}{\partial p}(x, p) \biggr| \le C\bigl( 1 + |p| \bigr) \quad \text{for all } (x, p) \in Q \times \mathbb{R}^n,
\]
or that
\[
\biggl| \frac{\partial H}{\partial x}(x, p) \biggr| \le C\bigl( 1 + |p| \bigr) \quad \text{for all } (x, p) \in Q \times \mathbb{R}^n.
\]

Theorem 3.8. [21]. Under the above assumptions, there exists a smooth solution $(u, m)$ to the mean-field game equations (3.12).

Page 30: Hamilton-Jacobi-Bellman Equations


In the case where $V$ and $v_0$ are not smoothing operators, but are, say, of the form $V[m](x) = F(x, m(x))$, then under other hypotheses there exists a weak solution to (3.12); see [21].

Theorem 3.9. [21]. Suppose that the operators $V$ and $v_0$ are strictly monotone, i.e. that
\[
\int_Q \bigl( A[m_1](x) - A[m_2](x) \bigr)\bigl( m_1(x) - m_2(x) \bigr)\, dx \le 0
\]
implies $A[m_1] = A[m_2]$, for $A = V, v_0$. Then there is at most one solution to (3.12).

Example 3.10. A very simple example of a monotone operator is $A[m] = m$. Averaging and mollifying operators are not generally monotone. The following original example illustrates this. Let $V \colon C^\infty(\mathbb{R}) \to C^\infty(\mathbb{R})$ be defined by
\[
V[m](x) = \int_0^1 m(x + y)\, dy \quad \text{for all } x \in \mathbb{R}.
\]
Let $Q = [0, 2]$. Since $V$ is linear, monotonicity is equivalent to requiring that if
\[
\int_0^2 V[m](x)\, m(x)\, dx \le 0,
\]
then $V[m] = 0$. However, let $m(x) = \sin(\pi x)$. Then
\[
V[m](x) = \int_0^1 \sin(\pi(x + y))\, dy = \frac{2}{\pi} \cos(\pi x).
\]
But
\[
\int_0^2 V[m](x)\, m(x)\, dx = \frac{2}{\pi} \int_0^2 \cos(\pi x) \sin(\pi x)\, dx = \frac{1}{\pi^2} \sin^2(\pi x) \Big|_0^2 = 0;
\]
yet $V[m] \ne 0$. So $V$ is not monotone.
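A short numerical confirmation of this computation (an illustrative addition, not part of the original example):

```python
import numpy as np
from scipy.integrate import quad

# V[m](x) = int_0^1 m(x + y) dy for m(x) = sin(pi x); the text computes
# V[m](x) = (2/pi) cos(pi x) and int_0^2 V[m](x) m(x) dx = 0 although V[m] != 0.
m = lambda x: np.sin(np.pi * x)
V = lambda x: quad(lambda y: m(x + y), 0.0, 1.0)[0]

xs = np.linspace(0.0, 2.0, 9)
print(max(abs(V(x) - 2 / np.pi * np.cos(np.pi * x)) for x in xs))  # ~ 0
print(quad(lambda x: V(x) * m(x), 0.0, 2.0)[0])                    # ~ 0
print(max(abs(V(x)) for x in xs))                                  # ~ 2/pi, so V[m] != 0
```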

Finally, when the conditions of these results are violated, it cannot in general be expected that the equations are well-posed. See [21], where certain ill-posed PDEs are deduced as special cases of the mean-field game equations.

3.4. Conclusion. This chapter gave a brief demonstration of some applications of HJB equations beyond optimal control problems. In theorem 2.1, it was shown that some Monge-Ampere equations can be recast as HJB equations. In section 3, the basic concepts of mean-field games were explained, and a few results from the early papers were reviewed.

Having thus seen, in chapter 1 and this one, some problems in which HJB equations arise, chapter 3 will turn towards the question of how these equations are to be solved, i.e. in what sense we can expect to find solutions to these PDEs. It will be seen that the relevant notion of solution is that of viscosity solutions. Subsequent chapters will then show how numerical methods may be used to find viscosity solutions.

Page 31: Hamilton-Jacobi-Bellman Equations

CHAPTER 3

Viscosity Solutions

1. Introduction

In section 4 of chapter 1, it was shown that the value function of an optimal control problem need not always be differentiable. In such cases, it cannot be expected that the value function should solve a PDE in any classical, pointwise sense. But is there some way in which the HJB equation distinguishes the value function from other functions? In this chapter, it will be seen that this can be the case, and it is then said that the value function is a generalised solution of the HJB equation.

Generalised solutions play an important role in modern PDE theory, and whilst there are a number of different notions of generalised solution suited to different situations, in this chapter we will introduce viscosity solutions. This form of generalised solution is relevant to a certain class of PDEs that includes the HJB equation. The reader may find more on other forms of generalised solutions, such as weak solutions, in [15].

The basic objectives of this chapter are to define viscosity solutions and to show how this notion of solution is particularly relevant to optimal control problems1. To do this, we present some essential properties of viscosity solutions, as applied to HJB equations; these include the consistency, selectivity and uniqueness properties, which may be proven for general HJB equations. Existence of viscosity solutions often results from the fact that the value function of an optimal control problem is a viscosity solution of the corresponding HJB equation.

This chapter also accomplishes certain more subtle objectives. The formulation of viscosity solutions is expressed differently by different sources, and in particular there are differences between [12] and [16]. Some results developed in this chapter will play significant roles in future chapters, yet are not found in complete detail in either [12] or [16]. It is thus helpful to provide further details on these points in order to support the arguments found in subsequent chapters.

Section 2 is introductory and serves to show the scope of the theory of viscosity solutions, which is not restricted to applications to HJB equations. Its purpose is to make explicit a number of points treated as well-known facts in [12, section 1]. In section 3, the notion of a viscosity solution is motivated and defined. In particular, an important equivalence result is given, the proof of which is not found in full detail in either [12] or [16]. The comparison property is presented without proof.

Section 4 shows the relevance of viscosity solutions to optimal control problems, namely that the value function is a viscosity solution of the HJB equation. This result is proven in a very abstract framework in [16, chapter 2]. To remain focused on the form of optimal control problems considered in this work, a proof from [15] is adapted to the bounded domain, deterministic optimal control problem.

2. Elliptic and parabolic operators

Certain differential operators satisfy a property generally called ellipticity, which we introduce in this section. This property will play a major role in this chapter and subsequent ones. The notion of ellipticity yields the motivation for the notion of a viscosity solution and sets the range of applicability of viscosity theory.

1Precisely speaking, we define the strong form of viscosity solutions for parabolic problems with Dirichlet data. There are a number of different forms of viscosity solutions, adapted to different situations; see [12] and [16] for more on viscosity solutions for elliptic boundary value problems, discontinuous viscosity solutions, and the viscosity sense of the boundary data.


Page 32: Hamilton-Jacobi-Bellman Equations


Let $U \subset \mathbb{R}^n$ be an open bounded set and $T > 0$. Let $M(n, \mathbb{R})$ be the set of $n \times n$ real matrices and $S(n, \mathbb{R})$ the set of $n \times n$ symmetric real matrices. Consider an abstract differential operator $F \colon \mathbb{R}^n \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R}) \to \mathbb{R}$ whose associated PDE is
\[
-u_t(x, t) + F\bigl( x, t, u(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) = 0 \quad \text{on } O = U \times (0, T). \qquad (2.1)
\]

2.1. Elliptic operators. To begin with, we will consider the simpler setting of time-independent elliptic operators, then show how parabolic operators arise from elliptic ones.

Example 2.1. Consider a linear differential operator $L \colon C^2(U) \to C(U)$ of the form
\[
L u(x) = -\sum_{i,j=1}^n a_{ij}(x) u_{x_i x_j}(x) + \sum_{i=1}^n b_i(x) u_{x_i}(x), \qquad (2.2)
\]
with the symmetry condition $a_{ij}(x) = a_{ji}(x)$ for all $x \in U$. By defining $a(x) \in S(n, \mathbb{R})$ by $(a(x))_{ij} = a_{ij}(x)$ and defining $b(x) = (b_1(x), \ldots, b_n(x))$, the operator may be rewritten as
\[
L u = -\operatorname{Tr} a(x) D^2 u(x) + b(x) \cdot D u(x). \qquad (2.3)
\]

The first notion of ellipticity we encounter is the following:

Definition 2.2 (Ellipticity of Linear Operators). If the linear operator $L$ given by (2.3) has the property that for every $x \in U$, $a(x)$ is a symmetric positive definite matrix, then we say that $L$ is uniformly elliptic in $U$. If $a(x)$ is symmetric positive semi-definite for every $x \in U$, then we say that $L$ is degenerate elliptic in $U$.

Recall that for $U$ open, if $u \in C^2(U)$ has a maximum at $x \in U$ then $Du(x) = 0$ and $D^2 u(x)$ is negative semi-definite, which we write as $D^2 u(x) \le 0$ (see footnote 2). An important property of degenerate elliptic operators is the following.

Proposition 2.3. Let $U \subset \mathbb{R}^n$ be open. If $u \in C^2(U)$ has a maximum at $x \in U$ and $L$ is a linear degenerate elliptic operator of the form (2.3), then
\[
L u(x) \ge 0. \qquad (2.4)
\]

Proof. Because $Du(x) = 0$, we have $L u(x) = -\operatorname{Tr} a(x) D^2 u(x)$. Because $a(x)$ is symmetric, there is an orthogonal matrix $P \in M(n, \mathbb{R})$ such that $a(x) = P D P^T$ with $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Because $a(x)$ is positive semi-definite, $\lambda_i \ge 0$ for $i \in \{1, \ldots, n\}$. Using the fact that for $P, Q \in M(n, \mathbb{R})$, $\operatorname{Tr} PQ = \operatorname{Tr} QP$, we have
\[
\operatorname{Tr} a(x) D^2 u(x) = \operatorname{Tr} P D P^T D^2 u(x) = \operatorname{Tr} D P^T D^2 u(x) P.
\]
Since $D$ is diagonal with $\lambda_i \ge 0$, and $\bigl( P^T D^2 u(x) P \bigr)_{ii} \le 0$ for $i \in \{1, \ldots, n\}$ because $D^2 u(x)$ is negative semi-definite, we have $\operatorname{Tr} a(x) D^2 u(x) \le 0$, which gives (2.4).

Remark 2.4. As an indication of some of the main aspects of viscosity theory to come shortly, we signal that the above proposition is the first step in deriving the maximum principle for linear uniformly elliptic operators. The (weak) maximum principle states that if $L$ is uniformly elliptic and $Lu \le 0$, then $u$ must achieve its maximum on $\partial U$. If we take $u$ and $v$ to be equal on $\partial U$, this then tells us that "$Lu \le Lv$ implies $u \le v$".

This is a form of comparison property, and it plays a major role in viscosity theory, in particular with regard to ensuring uniqueness of solutions. For more on the maximum principle, see [15, p. 326].

How does this notion of ellipticity for linear operators generalise to nonlinear operators of the form (2.1)? We can generalise it by taking the conclusion of proposition 2.3 to be the definition of degenerate ellipticity. The motivation for this is that it is possible to verify that many nonlinear operators, including the HJB operator from the HJB equation of chapter 1, satisfy the conclusion of proposition 2.3 even though they are not of the form (2.3).

2This is the positive semi-definite partial ordering of S(n,R): “P ≥ Q ⇐⇒ P −Q is positive semi-definite.”

Page 33: Hamilton-Jacobi-Bellman Equations


Definition 2.5 (Degenerate Ellipticity). [12]. An operator $F \colon \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R}) \to \mathbb{R}$ is called degenerate elliptic on $U$ if for all $x \in U$, all $r \in \mathbb{R}$, all $P, Q \in S(n, \mathbb{R})$ with $P \ge Q$, and any $p \in \mathbb{R}^n$, we have
\[
F(x, r, p, P) \le F(x, r, p, Q). \qquad (2.5)
\]
The next proposition verifies that definition 2.5 is consistent with definition 2.2 when the operator is of the form (2.3).

Proposition 2.6. Suppose that the linear operator $L$ is of the form (2.3). Then $L$ is degenerate elliptic in the sense of definition 2.5 if and only if $a(x)$ is positive semi-definite, and thus if and only if $L$ is degenerate elliptic in the sense of definition 2.2.

Proof. The proof of proposition 2.3 proves the "if" part. For the "only if" part, let $v \in \mathbb{R}^n$ and define $P = v v^T$. We see that $P$ is symmetric and, since $x^T P x = (x \cdot v)^2$, $P$ is positive semi-definite. Therefore, by definition 2.5 with $p = 0$ and the pair $P \ge 0 = Q$, we have $F(x, r, 0, P) \le F(x, r, 0, 0)$, which is $\operatorname{Tr} a(x) P = v^T a(x) v \ge 0$. Since $v$ was arbitrary, this shows that $a(x)$ is positive semi-definite.

The theory of viscosity solutions requires that the operator $F$ be degenerate elliptic. A further requirement is that the operator be proper: for all $x \in U$, $r \ge s$, any $p \in \mathbb{R}^n$ and $P \in S(n, \mathbb{R})$,
\[
F(x, r, p, P) \ge F(x, s, p, P).
\]

2.2. Parabolic operators. An operator $-\partial_t + F$, where $F \colon \mathbb{R}^n \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R}) \to \mathbb{R}$ and whose associated PDE is given in (2.1), is called degenerate parabolic ([12]) if for every $t \in (0, T)$, $F(\cdot, t, \cdot, \cdot, \cdot)$ is degenerate elliptic as defined by definition 2.5. Similarly, $-\partial_t + F$ is proper if $F(\cdot, t, \cdot, \cdot, \cdot)$ is proper.

Before concluding this section, let us verify that the operator in the HJB equation is indeed degenerate parabolic and proper. To recall the notation of the previous chapter, the HJB operator applied to $u \in C^2(O)$ is defined by
\[
-u_t(x, t) + H\bigl( x, t, D_x u(x, t), D_x^2 u(x, t) \bigr) = -u_t(x, t) + \max_{\alpha \in \Lambda} \bigl[ -\operatorname{Tr} a(x, t, \alpha) D_x^2 u(x, t) - b(x, t, \alpha) \cdot D_x u(x, t) - f(x, t, \alpha) \bigr], \qquad (2.6)
\]
with $a = \sigma \sigma^T / 2$; the maximum is achieved because of the compactness of $\Lambda$ and the continuity of the coefficients.

To write the operator more abstractly and more compactly, we write
\[
-u_t + H\bigl( x, t, D_x u, D_x^2 u \bigr) = -u_t + \sup_{\alpha \in \Lambda} \bigl[ L^\alpha u - f^\alpha \bigr], \qquad (2.7)
\]
where the $L^\alpha$ are linear differential operators defined by
\[
L^\alpha u = -\operatorname{Tr} a(\cdot, \cdot, \alpha) D_x^2 u - b(\cdot, \cdot, \alpha) \cdot D_x u
\]
and $f^\alpha = f(\cdot, \cdot, \alpha)$.

Proposition 2.7. The HJB operator in equation (2.7) is degenerate parabolic and proper.

Proof. For shorthand, let $F$ be the HJB operator. Because $F$ does not depend on $u$, it is proper. Next, since $a = \sigma \sigma^T / 2$, for every $\alpha \in \Lambda$ and any $v \in \mathbb{R}^n$,
\[
v^T a(x, t, \alpha) v = \frac{1}{2} v^T \sigma(x, t, \alpha) \bigl( \sigma^T(x, t, \alpha) v \bigr) = \frac{1}{2} \bigl| \sigma^T(x, t, \alpha) v \bigr|^2 \ge 0,
\]
so each $a(x, t, \alpha)$ is positive semi-definite and each linear operator $L^\alpha$ is degenerate elliptic. Let $P \ge Q$. Then, by proposition 2.6, we have $-\operatorname{Tr} a(x, t, \alpha)(P - Q) \le 0$ for every $\alpha$. For $p \in \mathbb{R}^{n+1}$ let us write $p = (p_x, p_t) \in \mathbb{R}^n \times \mathbb{R}$. So, omitting the arguments,
\[
-p_t - \operatorname{Tr} a P - b \cdot p_x - f \le -p_t - \operatorname{Tr} a Q - b \cdot p_x - f \le -p_t + \max_{\alpha \in \Lambda} \bigl[ -\operatorname{Tr} a Q - b \cdot p_x - f \bigr].
\]

Page 34: Hamilton-Jacobi-Bellman Equations


Therefore
\[
-p_t + \max_{\alpha \in \Lambda} \bigl[ -\operatorname{Tr} a(x, t, \alpha) P - b(x, t, \alpha) \cdot p_x - f(x, t, \alpha) \bigr] \le -p_t + H(x, t, p_x, Q);
\]
which is $-p_t + H(x, t, p_x, P) \le -p_t + H(x, t, p_x, Q)$, so the HJB operator is degenerate parabolic.
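As a quick numerical sanity check (an illustration, not part of the text), the monotonicity just proved can be observed directly: for a finite sample of controls with $a = \sigma\sigma^T/2$ and random matrices $P \ge Q$, the discrete Hamiltonian value never increases when $Q$ is replaced by $P$. All coefficients below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_controls = 4, 10

# Random coefficients for a sampled control set; a = sigma sigma^T / 2 is positive semi-definite.
sigmas = rng.standard_normal((n_controls, n, n))
a = 0.5 * np.einsum('kij,klj->kil', sigmas, sigmas)
b = rng.standard_normal((n_controls, n))
f = rng.standard_normal(n_controls)

def H(p, X):
    """Discrete Hamiltonian max_alpha [ -Tr(a X) - b.p - f ] over the sampled controls."""
    return np.max([-np.trace(a[k] @ X) - b[k] @ p - f[k] for k in range(n_controls)])

# Degenerate ellipticity: P >= Q implies H(p, P) <= H(p, Q).
for _ in range(100):
    Q = rng.standard_normal((n, n)); Q = 0.5 * (Q + Q.T)
    C = rng.standard_normal((n, n)); P = Q + C @ C.T       # P - Q is positive semi-definite
    p = rng.standard_normal(n)
    assert H(p, P) <= H(p, Q) + 1e-12
print("degenerate ellipticity verified on random samples")
```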

3. Viscosity solutions of parabolic equations

This section presents the basic notions and properties of viscosity solutions of degenerate parabolic equations.

There are a number of equivalent ways of defining viscosity solutions for parabolic PDEs. In [16], the notion of viscosity solutions is treated in detail; however, for this work it will be more helpful to use the definition from [12], where a less restrictive condition is placed on the auxiliary notions of a viscosity subsolution and supersolution.

To be specific, subsolutions and supersolutions will be allowed to be semi-continuous, whereas in [16] they are taken to be continuous. Although this is a technical detail, this formulation will provide a more general comparison property, which will permit a crucial step in the convergence arguments of chapters 6 and 7.

As a result of this slight change, it was necessary to obtain independently a key proof of an already well-known equivalence result between different formulations of viscosity solutions, which was left as an exercise in [12] and which was shown only for the more restricted case in [16].

The principal aim of this section is to provide this equivalence result, and to show that the notion of viscosity solutions satisfies two important properties expected of any generalised solution, sometimes called consistency and selectivity. These mean, respectively, that classical solutions are viscosity solutions and that sufficiently regular viscosity solutions are classical solutions.

3.1. Motivation. For an open set $Q \subset \mathbb{R}^n \times \mathbb{R}$, writing $(x, t) \in Q$, the space of functions once continuously differentiable in the $t$ variable and twice continuously differentiable in the $x$ variable on $Q$ will be denoted $C^{(2,1)}(Q)$ (see footnote 3). For the compact set $\overline{O}$, we define $C^{(2,1)}(\overline{O})$ to be the set of functions $v$ such that $v$ may be extended to a function in $C^{(2,1)}(Q)$ for some open set $Q$ with $C^1$ boundary satisfying $\overline{O} \subset\subset Q$.

We define the norm of $C^{(2,1)}(\overline{O})$ by
\[
\| v \|_{C^{(2,1)}(\overline{O})} = \sup_{\overline{O}} \Bigl[ |v(x, t)| + |v_t(x, t)| + \sum_{|\alpha| \le 2} |D^\alpha v(x, t)| \Bigr],
\]
where $\alpha$ is a multi-index on $\{1, \ldots, n\}$ and $D^\alpha v$ the corresponding spatial partial derivative of $v$.

Henceforth let $U$ be a bounded open set in $\mathbb{R}^n$, $T > 0$ and $O = U \times (0, T)$. We denote the parabolic boundary of $O$ by
\[
\partial O = \bigl( \partial U \times (0, T) \bigr) \cup \bigl( U \times \{T\} \bigr). \qquad (3.1)
\]

Let $-\partial_t + F$, with $F \colon \mathbb{R}^n \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R}) \to \mathbb{R}$, be a degenerate parabolic differential operator, and suppose that $u \in C^{(2,1)}(O)$ solves
\[
-u_t + F\bigl( x, t, u(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) = 0 \quad \text{on } O. \qquad (3.2)
\]
Suppose that $v \in C^{(2,1)}(O)$ is such that $u - v$ has a maximum at $(x, t) \in O$, with $u(x, t) = v(x, t)$. Then one deduces that $u_t(x, t) = v_t(x, t)$, $D_x u(x, t) = D_x v(x, t)$ and $D_x^2 u(x, t) \le D_x^2 v(x, t)$. Degenerate parabolicity therefore implies that
\[
-v_t(x, t) + F\bigl( x, t, v(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \le -u_t(x, t) + F\bigl( x, t, u(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr).
\]
From the hypothesis that $u$ solves (3.2), this last inequality is
\[
-v_t(x, t) + F\bigl( x, t, v(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \le 0.
\]

3We introduce the alternative notation $C^{(a,b)}(O)$ to avoid confusion with the Holder spaces $C^{k,\gamma}(O)$. We hope the reader agrees that such a choice reflects well the notation $(x, t) \in O$.

Page 35: Hamilton-Jacobi-Bellman Equations


If it were supposed instead that $u - v$ had a minimum at $(x, t)$, with $u(x, t) = v(x, t)$, then
\[
-v_t(x, t) + F\bigl( x, t, v(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \ge 0.
\]
This last inequality no longer involves any reference to the regularity of $u$. Therefore, if one takes the statement

"if $u \in C(\overline{O})$ is such that for any $v \in C^{(2,1)}(O)$, $u - v$ has a maximum (resp. minimum) at $(x, t) \in O$ with $u(x, t) = v(x, t)$, then $-v_t(x, t) + F\bigl( x, t, v(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \le 0$ (resp. $\ge 0$)"  (3.3)

as the defining property of a notion of solution, which will be termed a viscosity solution of (3.2), then this defines a type of generalised solution, and it may permit solutions that are less regular than $C^{(2,1)}(O)$.

The purpose of this chapter is to show how this is a fruitful notion of solution and how it is relevant to HJB equations and value functions of optimal control problems. However, before continuing on this path, it is helpful to establish equivalence with other formulations of the definition of viscosity solutions.

3.2. Parabolic superjets and subjets.

Definition 3.1. Let $O = U \times (0, T)$, $u \colon O \to \mathbb{R}$ and $(x, t) \in O$. The parabolic superjet $P^+ u(x, t)$ of $u$ at $(x, t)$ is defined to be the set of all $(q, p, P) \in \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R})$ such that for every $\varepsilon > 0$ there is $\delta > 0$ such that for all $(x + y, t + h) \in O$ with $|y| + |h| < \delta$,
\[
u(x + y, t + h) \le u(x, t) + q h + p \cdot y + \frac{1}{2} P y \cdot y + \varepsilon\bigl( |h| + |y|^2 \bigr). \qquad (3.4)
\]
The parabolic subjet $P^- u(x, t)$ of $u$ at $(x, t)$ is defined by
\[
P^- u(x, t) = -P^+(-u)(x, t). \qquad (3.5)
\]

Remark 3.2. It may be noted that the superjets and subjets depend on the set over which one requires the inequalities given in the above definition. In [12], the parabolic superjet is denoted $P^{2,+}_O u(x, t)$ to indicate this dependency. However, in this work it will not be necessary to consider the superjet and subjet over any other set, so we suppress the dependency in the notation.

The parabolic superjets and subjets are sometimes called superdifferentials and subdifferentials. This is because, if $u \in C^{(2,1)}(O)$, then from the definition of differentiability one has
\[
P^+ u(x, t) \cap P^- u(x, t) = \bigl\{ \bigl( u_t(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) \bigr\}. \qquad (3.6)
\]

Before re-stating the definition of viscosity solutions, an important result is needed to relate superjets and subjets with the discussion of paragraph 3.1 on the motivation of viscosity solutions.

First, denote
\[
USC(\overline{O}) = \bigl\{ v \colon \overline{O} \to \mathbb{R} \;\big|\; v \text{ upper semi-continuous on } \overline{O} \bigr\}; \qquad (3.7a)
\]
\[
LSC(\overline{O}) = \bigl\{ v \colon \overline{O} \to \mathbb{R} \;\big|\; v \text{ lower semi-continuous on } \overline{O} \bigr\}. \qquad (3.7b)
\]

Theorem 3.3. Let $u \in USC(\overline{O})$ and $(x, t) \in O$. Then for every $(q, p, P) \in P^+ u(x, t)$ there exists $v \in C^{(2,1)}(\overline{O})$ such that $\bigl( v_t(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) = (q, p, P)$ and $u - v$ has a strict maximum over $\overline{O}$ at $(x, t)$, with $u(x, t) = v(x, t)$.

Conversely, if $v \in C^{(2,1)}(O)$ is such that $u - v$ has a local maximum at $(x, t)$, then
\[
\bigl( v_t(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \in P^+ u(x, t).
\]

This result is found in [16, p. 211] with the restriction that the function $u \in C(\overline{O})$, and it is left as an exercise in [12]. However, it is essential in this work that it be shown for the general semi-continuous case. This is because it will be used in the Barles-Souganidis convergence arguments for numerical methods, given in chapters 6 and 7.

The fact that the maximum can be taken to be strict, and that we may take $v \in C^{(2,1)}(\overline{O})$ and not merely $C^{(2,1)}(O)$, are not superfluous details.

Page 36: Hamilton-Jacobi-Bellman Equations


For the proof, we will state the principal change to be made to the arguments in [16, p. 211], then refer the reader to this source for the remaining arguments.

Proof. First, suppose that $v \in C^{(2,1)}(O)$ is such that $u - v$ has a local maximum at $(x, t) \in O$. Then, by the definition of differentiability of $v$, for all $\varepsilon > 0$ there is $\delta > 0$ such that $|y| + |h| < \delta$ implies
\[
\Bigl| v(x + y, t + h) - v(x, t) - v_t(x, t) h - D_x v(x, t) \cdot y - \frac{1}{2} D_x^2 v(x, t) y \cdot y \Bigr| < \varepsilon\bigl( |h| + |y|^2 \bigr).
\]
So $u(x + y, t + h) - u(x, t) \le v(x + y, t + h) - v(x, t)$ implies
\[
u(x + y, t + h) \le u(x, t) + v_t(x, t) h + D_x v(x, t) \cdot y + \frac{1}{2} D_x^2 v(x, t) y \cdot y + \varepsilon\bigl( |h| + |y|^2 \bigr);
\]
thus $\bigl( v_t(x, t), D_x v(x, t), D_x^2 v(x, t) \bigr) \in P^+ u(x, t)$.

Now suppose that $(q, p, P) \in P^+ u(x, t)$, and without loss of generality assume that $(x, t) = (0, 0)$. The aim is to construct an appropriate $v$ that satisfies the conclusion of the theorem. A first guess might be to take $\bar v(y, h) = u(0, 0) + q h + p \cdot y + \tfrac{1}{2} P y \cdot y$; however, due to the presence of the term $\varepsilon(|h| + |y|^2)$ in the definition of the superjet, it cannot be concluded that $u - \bar v$ would have a strict maximum at $(0, 0)$.

We shall find $f \in C^{(2,1)}(\overline{O})$ with $\bigl( f_t(0, 0), D_x f(0, 0), D_x^2 f(0, 0) \bigr) = 0$ such that $v = \bar v + f$ will satisfy the claim of the theorem.

We begin by defining $h \colon [0, \infty) \to \mathbb{R}$ by $h(0) = 0$ and, for $r > 0$,
\[
h(r) = \sup \Biggl\{ \frac{\bigl( u(y, s) - \bar v(y, s) \bigr)^+}{\sqrt{|y|^4 + s^2}} \;\Bigg|\; (y, s) \in \overline{O} \text{ and } \sqrt{|y|^4 + s^2} \le r \Biggr\}. \qquad (3.8)
\]
Note that the assumption $u \in USC(\overline{O})$ implies that for $r > 0$ the supremum in (3.8) is attained and that $h(r) \in \mathbb{R}$. Since $\overline{O}$ is compact, $h$ is bounded and increasing, and hence integrable on compact subsets of $[0, \infty)$; see [24, proposition 4 p. 56].

Furthermore, from the assumption that $(q, p, P) \in P^+ u(0, 0)$, we have $h(r) \to h(0) = 0$ as $r \to 0$.

If $u$ were continuous, it could be shown that $h$ would be continuous. However, for upper semi-continuous $u$ this is not generally the case. To regularise $h$, let $\bar h(0) = 0$ and, for $r > 0$, let
\[
\bar h(r) = \frac{1}{r} \int_r^{2r} h(s)\, ds.
\]
For $r > 0$, it follows from the theory of absolutely continuous functions that $\bar h$ is continuous at $r$; see [24, chapter 6]. Since $h$ is increasing, $h(r) \le \bar h(r) \le h(2r)$. This inequality, together with the fact that $h(r) \to 0$ as $r \to 0$, shows that $\bar h$ is continuous on $[0, \infty)$.

The remainder of the proof now follows [16, p. 211]. To summarise it, one defines $F \colon [0, \infty) \to \mathbb{R}$ by
\[
F(r) = \frac{2}{3r} \int_r^{2r} \int_y^{2y} \bar h(s)\, ds\, dy
\]
and sets $F(0) = 0$. Then one defines
\[
f(y, h) = F\Bigl( \sqrt{|y|^4 + |h|^2} \Bigr) + |h|^2 + \sum_{i=1}^n y_i^4.
\]
It is found from the continuity of $\bar h$ that $f \in C^{(2,1)}(\overline{O})$, where the continuity of the derivatives around the origin must be verified from first principles. Again from first principles, one finds that
\[
\bigl( f_t(0, 0), D_x f(0, 0), D_x^2 f(0, 0) \bigr) = 0.
\]

Page 37: Hamilton-Jacobi-Bellman Equations


The properties of $\bar h$, in particular that $\bar h(r) \ge h(r)$, also imply that $v = \bar v + f$ is such that $u - v$ has a strict local maximum at $(x, t)$ over $\overline{O}$ with $u(x, t) = v(x, t)$, and that $v \in C^{(2,1)}(\overline{O})$, thus showing the theorem.

3.3. Viscosity solutions. Let $-\partial_t + F$ be a degenerate parabolic operator and consider the equation
\[
-u_t + F\bigl( x, t, u, D_x u, D_x^2 u \bigr) = 0 \quad \text{on } O. \qquad (3.9)
\]
We now state the definition of a viscosity solution.

Definition 3.4 (Viscosity Solutions). A function $u \in USC(\overline{O})$ is a viscosity subsolution of equation (3.9) if for every $(x, t) \in O$,
\[
-q + F\bigl( x, t, u(x, t), p, P \bigr) \le 0 \quad \text{for all } (q, p, P) \in P^+ u(x, t). \qquad (3.10)
\]
A function $u \in LSC(\overline{O})$ is a viscosity supersolution of equation (3.9) if for every $(x, t) \in O$,
\[
-q + F\bigl( x, t, u(x, t), p, P \bigr) \ge 0 \quad \text{for all } (q, p, P) \in P^- u(x, t). \qquad (3.11)
\]
A function $u \in C(\overline{O})$ is a viscosity solution of equation (3.9) if $u$ is both a viscosity subsolution and a viscosity supersolution of (3.9).

Remark 3.5. It readily follows from theorem 3.3 that the above definition is equivalent to defining viscosity solutions with reference to "test" functions, as in (3.3).

The following result establishes another equivalent definition of viscosity solutions. This will be helpful in later chapters and, although it is well known, the proof given here was obtained independently.

Proposition 3.6 (Equivalence with smooth test functions). Let $u \in USC(\overline{O})$. Let $F \colon U \times [0, T] \times \mathbb{R} \times \mathbb{R}^n \times S(n, \mathbb{R}) \to \mathbb{R}$ be continuous and $-\partial_t + F$ degenerate parabolic and proper. Then $u$ is a viscosity subsolution of (3.9) if and only if $u$ is such that, for every $w \in C^\infty(\overline{O})$, if $u - w$ has a strict local maximum at $(x, t) \in O$ with $u(x, t) = w(x, t)$, then
\[
-w_t(x, t) + F\bigl( x, t, w(x, t), D_x w(x, t), D_x^2 w(x, t) \bigr) \le 0. \qquad (3.12)
\]

Proof. The first implication is clear, since $C^\infty(\overline{O}) \subset C^{(2,1)}(O)$ and theorem 3.3 establishes the equivalence between test functions in $C^{(2,1)}(O)$ and the superjets of $u$.

Now suppose that for every $w \in C^\infty(\overline{O})$, if $u - w$ has a strict maximum at $(x, t) \in O$ over $\overline{O}$ with $u(x, t) = w(x, t)$, then
\[
-w_t(x, t) + F\bigl( x, t, w(x, t), D_x w(x, t), D_x^2 w(x, t) \bigr) \le 0. \qquad (3.13)
\]
Now, for $(x, t) \in O$, let $(q, p, P) \in P^+ u(x, t)$. By theorem 3.3, there exists $w \in C^{(2,1)}(\overline{O})$ such that $u - w$ has a strict maximum at $(x, t) \in O$ over $\overline{O}$ with
\[
(q, p, P) = \bigl( \partial_t w(x, t), D_x w(x, t), D_x^2 w(x, t) \bigr).
\]
Since $w \in C^{(2,1)}(\overline{O})$, by definition $w$ may be extended to $\tilde w \in C^{(2,1)}(Q)$, where $\overline{O} \subset\subset Q$ and $Q$ is an open set with $C^1$ boundary; see the proof of theorem 3.3.

For $\varepsilon > 0$ sufficiently small that $\varepsilon < \operatorname{dist}(\overline{O}, \partial Q)$, let $w_\varepsilon$ be the standard mollification of $\tilde w$ of radius $\varepsilon$; see [15, p. 629]. Since $\partial Q$ is of class $C^1$, integration by parts is valid ([15, p. 627]), so using the arguments of [15, p. 250], the derivatives of the $w_\varepsilon$ are the mollifications of the derivatives of $\tilde w$, and using [15, p. 630], $w_\varepsilon \in C^\infty(\overline{O})$ converges to $w$ in $C^{(2,1)}(\overline{O})$.

Using the arguments of [15, p. 541] and the fact that $u \in USC(\overline{O})$, there exists $(x_\varepsilon, t_\varepsilon) \in \overline{O}$ tending to $(x, t)$ such that $u - w_\varepsilon$ has a local maximum at $(x_\varepsilon, t_\varepsilon)$, in particular with
\[
u(x_\varepsilon, t_\varepsilon) - w_\varepsilon(x_\varepsilon, t_\varepsilon) \ge u(x, t) - w_\varepsilon(x, t). \qquad (3.14)
\]
Alternatively, the reader may look at proposition 2.8 of chapter 5 for a justification of this claim. Since $u \in USC(\overline{O})$, abusing notation in considering countable subsequences $\{\varepsilon_j\}_{j=1}^\infty \to 0$, we have
\[
\limsup_{\varepsilon \to 0} u(x_\varepsilon, t_\varepsilon) \le u(x, t).
\]

Page 38: Hamilton-Jacobi-Bellman Equations


Yet, (3.14) implies that
\[
\liminf_{\varepsilon \to 0} \bigl[ u(x_\varepsilon, t_\varepsilon) \bigr] - w(x, t) = \liminf_{\varepsilon \to 0} \bigl[ u(x_\varepsilon, t_\varepsilon) - w_\varepsilon(x_\varepsilon, t_\varepsilon) \bigr] \ge u(x, t) - w(x, t).
\]
Thus $u(x_\varepsilon, t_\varepsilon) \to u(x, t) = w(x, t)$.

Set
\[
\overline{w}_\varepsilon(y, s) = w_\varepsilon(y, s) + u(x_\varepsilon, t_\varepsilon) - w_\varepsilon(x_\varepsilon, t_\varepsilon) + \sum_{i=1}^n \bigl( y_i - (x_\varepsilon)_i \bigr)^4 + (s - t_\varepsilon)^4.
\]
Then $u - \overline{w}_\varepsilon$ has a strict local maximum at $(x_\varepsilon, t_\varepsilon)$ and $u(x_\varepsilon, t_\varepsilon) = \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon)$. Furthermore, by the uniform convergence of $w_\varepsilon$ to $w$ in $C^{(2,1)}(\overline{O})$,
\[
\lim_{\varepsilon \to 0} \bigl( \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon), \partial_t \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon), D_x \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon), D_x^2 \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon) \bigr) = \bigl( w(x, t), \partial_t w(x, t), D_x w(x, t), D_x^2 w(x, t) \bigr). \qquad (3.15)
\]
Since $O$ is open, for $\varepsilon$ sufficiently small $(x_\varepsilon, t_\varepsilon) \in O$, and thus by hypothesis (3.13),
\[
-\partial_t \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon) + F\bigl( x_\varepsilon, t_\varepsilon, \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon), D_x \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon), D_x^2 \overline{w}_\varepsilon(x_\varepsilon, t_\varepsilon) \bigr) \le 0;
\]
which, by the continuity of $F$ and the above convergence results, implies that
\[
-\partial_t w(x, t) + F\bigl( x, t, w(x, t), D_x w(x, t), D_x^2 w(x, t) \bigr) \le 0, \qquad (3.16)
\]
or equivalently,
\[
-q + F\bigl( x, t, u(x, t), p, P \bigr) \le 0.
\]
Therefore $u$ is a viscosity subsolution of (3.9).

It will now be shown that viscosity solutions satisfy two important properties usually required of generalised solutions, namely consistency and selectivity. This means here that the definitions of viscosity solutions and classical solutions agree on which functions in $C^{(2,1)}(O) \cap C(\overline{O})$ are or are not viscosity solutions and classical solutions of the PDE.

Theorem 3.7 (Consistency and Selectivity). Let $u \in C(\overline{O}) \cap C^{(2,1)}(O)$. Then $u$ is a viscosity solution of (3.9) if and only if $u$ is a classical pointwise solution of (3.9).

Proof. Suppose $u$ is a classical pointwise solution of (3.9) and, for $(x, t) \in O$, let $(q, p, P) \in P^+ u(x, t)$. Then $D_x^2 u(x, t) \le P$ and $(q, p) = (u_t(x, t), D_x u(x, t))$. So degenerate ellipticity implies
\[
-q + F\bigl( x, t, u(x, t), p, P \bigr) \le -u_t(x, t) + F\bigl( x, t, u(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) = 0,
\]
and $u$ is a viscosity subsolution. Similarly, it is a viscosity supersolution.

Now suppose that $u \in C(\overline{O}) \cap C^{(2,1)}(O)$ is a viscosity solution of (3.9). Then, as was previously remarked, the definition of differentiability implies that for all $(x, t) \in O$,
\[
\bigl( u_t(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) \in P^+ u(x, t) \cap P^- u(x, t),
\]
so from the definition of viscosity solutions,
\[
0 \le -u_t(x, t) + F\bigl( x, t, u(x, t), D_x u(x, t), D_x^2 u(x, t) \bigr) \le 0,
\]
thus showing that $u$ is a classical solution.

Example 3.8 (Eikonal equation). [15, exercise 4 p. 564] Two important observations about the notion of viscosity solution may be made. The first is that if we modify the operator by some algebraic transformation, the ordering structure of the equation may be modified, and thus the resulting viscosity solutions may differ from the viscosity solutions of the original equation. In other words, the equations $F = 0$ and $-F = 0$ do not necessarily have the same viscosity solutions.

The second observation is that the superjets and subjets of a function may be empty. The definition of a viscosity solution only gives a condition on an element of the superjet or subjet provided it exists. Thus, if the subjet or superjet is empty, then the corresponding condition is fulfilled by default.

To illustrate this, consider the Eikonal equation
\[
|u'(x)| - 1 = 0, \quad x \in (-1, 1); \qquad (3.17)
\]

Page 39: Hamilton-Jacobi-Bellman Equations


with boundary conditions $u(1) = 0$, $u(-1) = 0$. Although this equation is not a parabolic equation, it is clear that a development analogous to the above discussion shows how to define viscosity solutions for this equation. For more details, see the definition of viscosity solutions for elliptic equations in [12].

We show that $u(x) = 1 - |x|$ is a viscosity solution as follows.

First of all, by the proof of theorem 3.7, we know that because $u$ solves the equation classically in $(-1, 0)$ and $(0, 1)$, $u$ is a viscosity solution on $(-1, 0) \cup (0, 1)$. Now suppose that for some $\varphi \in C^2(-1, 1)$, $u - \varphi$ has a maximum at $0$. Thus, for all $x$ near $0$,
\[
1 - |x| \le 1 + x \varphi'(0) + \frac{1}{2} x^2 \varphi''(0) + o(|x|^2),
\]
or
\[
-1 \le \operatorname{sign}(x) \varphi'(0) + \frac{1}{2} |x| \varphi''(0) + o(|x|).
\]
Therefore, letting $x \to 0$, first with $x < 0$ and then with $x > 0$, to obtain two inequalities, we conclude that
\[
-1 \le \varphi'(0) \le 1,
\]
i.e. $|\varphi'(0)| - 1 \le 0$. So $u$ is a viscosity subsolution.

Note that $P^- u(0)$ is empty, and, as pointed out above, it follows that $u$ is also a viscosity supersolution. It is then concluded that $u$ is a viscosity solution.

Now consider the equation
\[
-|u'(x)| + 1 = 0, \quad x \in (-1, 1); \qquad (3.18)
\]
with the same boundary conditions. Let $\varphi(x) = x^2$ and $u$ as above. Then $u(x) - \varphi(x) = 1 - |x| - x^2$ has a maximum at $0$. If $u$ were a viscosity subsolution, then we ought to have
\[
-|\varphi'(0)| + 1 \le 0,
\]
or equivalently $|\varphi'(0)| \ge 1$. But here $\varphi'(0) = 0$. Therefore $u$ is not a viscosity subsolution of (3.18). In fact, $-u$ is a viscosity solution of (3.18).

We now return to the examples of the previous chapter, where an infinite family of a.e. solutions to a HJB equation was found. The notion of viscosity solution will indeed rule out the alternative candidates.

Example 3.9. This example concerns the family of functions $\{w_k(\cdot, 0)\}_{k=2}^\infty$ defined by equation (4.2) of chapter 1. Every element of $\{w_k(\cdot, 0)\}_{k=2}^\infty$ satisfies the Eikonal equation (3.17) pointwise a.e., yet an analysis very similar to that of example 3.8 shows that none of the $\{w_k(\cdot, 0)\}_{k=2}^\infty$ are viscosity solutions of the equation.

To see this, note that the equation $1 - |x| = 5/2^{k+1}$ has solutions $\{x_i\} \subset (-1, 1)$ for $k \ge 2$, and $w_k(\cdot, 0)$ has a local minimum at each $x_i$. Let $\varphi = 0$. Then $w_k(\cdot, 0) - \varphi$ has a local minimum at $x_i$, but
\[
|\varphi'| - 1 = -1 < 0,
\]
which violates the supersolution condition. So, for $k \ge 2$, $w_k$ is not a viscosity solution of (3.17). Similar arguments show that none of the $\{w_k\}_{k=2}^\infty$ are viscosity solutions of the parabolic HJB equation (4.1) of chapter 1.
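A numerical aside (not part of the original examples): monotone finite-difference schemes select precisely the viscosity solution of (3.17). The fixed-point sweep below, which repeatedly sets each interior value to the minimum of its neighbours plus the mesh size (the upwind/Godunov discretisation of $|u'| = 1$), converges to $1 - |x|$ and to none of the other a.e. solutions $w_k$.

```python
import numpy as np

# Monotone scheme for |u'| = 1 on (-1, 1), u(-1) = u(1) = 0, written as the
# fixed point  u_i = min(u_{i-1}, u_{i+1}) + h  at interior nodes.
N = 201
x = np.linspace(-1.0, 1.0, N)
h = x[1] - x[0]
u = np.zeros(N)                      # initial guess; boundary values stay 0

for _ in range(10 * N):              # simple fixed-point sweeps until convergence
    u_new = u.copy()
    u_new[1:-1] = np.minimum(u[:-2], u[2:]) + h
    if np.abs(u_new - u).max() < 1e-14:
        break
    u = u_new

print("max error against 1 - |x|:", np.abs(u - (1 - np.abs(x))).max())   # ~ 0
```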

3.4. Final value problems and comparison.

Final value problems. Having stated the notion of a viscosity solution to a parabolic PDE, we now turn towards the notion of a viscosity solution to a parabolic final value problem with Dirichlet boundary data on the parabolic boundary. The parabolic final value problem considered is
\[
-u_t + F\bigl( x, t, u, D_x u, D_x^2 u \bigr) = 0 \quad \text{on } O, \qquad (3.19a)
\]
\[
u = g \quad \text{on } \partial O, \qquad (3.19b)
\]
where $g \in C(\partial O)$ and $\partial O = \bigl( \partial U \times (0, T) \bigr) \cup \bigl( U \times \{T\} \bigr)$ is the (final) parabolic boundary. For example, the HJB equation derived in theorem 3.3 of chapter 1 was of this form.

As seen in example 4.2 of chapter 1, for bounded domain problems it cannot always be expected that the value function of an optimal control problem agrees with the boundary conditions. There is a way of weakening the boundary conditions to give a notion of viscosity solutions of the final value problem that satisfy the boundary data in what is called in [12] the weak viscosity sense.

Page 40: Hamilton-Jacobi-Bellman Equations


For more information on the weak viscosity sense, we refer the reader to [12, section 7] and to [16, chapters 2 and 7].

However, in this work we will restrict our attention to problems where it is assumed that a viscosity solution satisfies the boundary data in the usual classical sense. A reason for doing so is that the proofs of convergence for the numerical methods found in chapters 6 and 7 are suited to this situation. Therefore, we use the definition of a viscosity solution of the parabolic final value problem in the strong sense, as is done in [12, section 8].

Definition 3.10 (Strong viscosity sense of the parabolic final value problem). A function $u \in USC(\overline{O})$ is a viscosity subsolution of (3.19) if $u$ is a viscosity subsolution of (3.19a) in the sense of definition 3.4 and $u \le g$ on $\partial O$.

Likewise, a function $u \in LSC(\overline{O})$ is a viscosity supersolution of (3.19) if it is a viscosity supersolution of (3.19a) in the sense of definition 3.4 and $u \ge g$ on $\partial O$.

A function $u \in C(\overline{O})$ is a viscosity solution of (3.19) if it is both a viscosity subsolution and a viscosity supersolution.

Comparison property. A major result in the theory of viscosity solutions is the comparison property. For the HJB equation, this result takes the following form.

Theorem 3.11. [16, p. 221]. Given assumptions (2.1) and (2.4) of chapter 1, if $v \in USC(\overline{O})$ and $w \in LSC(\overline{O})$ are respectively a viscosity subsolution and a viscosity supersolution of the HJB equation without boundary data,
\[
-u_t + H\bigl( x, t, D_x u, D_x^2 u \bigr) = 0 \quad \text{on } O,
\]
then
\[
\sup_{U \times (0, T]} [v - w] = \sup_{\partial O} [v - w]. \qquad (3.20)
\]

Remark 3.12. A simple consequence of this result is that if $v$ and $w$ are respectively a subsolution and a supersolution of the HJB final value problem of the form (3.19), then, since by definition $v \le g \le w$ on $\partial O$, we have $v \le w$ on $U \times (0, T]$, i.e. viscosity subsolutions of the HJB final value problem lie below viscosity supersolutions of the HJB final value problem.

This leads to the following uniqueness result for the final value problem (3.19).

Corollary 3.13 (Uniqueness). Given assumptions (2.1) and (2.4) of chapter 1, there is at most one viscosity solution in the sense of definition 3.10 to the HJB final value problem
\[
-u_t + H\bigl( x, t, D_x u, D_x^2 u \bigr) = 0 \quad \text{on } O; \qquad (3.21a)
\]
\[
u = g \quad \text{on } \partial O. \qquad (3.21b)
\]

Proof. Suppose $u$ and $v$ are both viscosity solutions of (3.21). Then the definition of viscosity solutions, definition 3.10, implies that $u = g = v$ on $\partial O$. By remark 3.12, since $u$ is a subsolution and $v$ is a supersolution, $u \le v$ on $O$, where this inequality extends to $U \times \{0\}$ because $u$ and $v$ are in $C(\overline{O})$. Similarly, because $v$ is a subsolution and $u$ is a supersolution, $v \le u$ on $\overline{O}$. Therefore $u = v$ on $\overline{O}$.

4. Viscosity solutions of Hamilton-Jacobi-Bellman equations

In this section we will make the link between the Hamilton-Jacobi-Bellman equation of anoptimal control problem and viscosity solutions. To do so, we will prove that the value functionof a deterministic optimal control problem is in fact a viscosity solution of the associated HJBequation.

We restrict ourselves to the deterministic case because the technicalities of the stochastic caserequire a more involved analysis. However, the value function is in fact also a viscosity solution forthe stochastic case, and we will state without proof the main result on this and refer the reader to[16] for further details.

Page 41: Hamilton-Jacobi-Bellman Equations


4.1. Deterministic optimal control. We consider here the finite time, bounded domain, deterministic problem. Let the state dynamics be given by
\[
dx^{\alpha(\cdot)}(s) = b\bigl( x^{\alpha(\cdot)}(s), s, \alpha(s) \bigr)\, ds, \quad s \in (t, T]; \qquad (4.1a)
\]
\[
x^{\alpha(\cdot)}(t) = x. \qquad (4.1b)
\]
The control set $A$ is chosen to be
\[
A = \bigl\{ \alpha \colon [0, T] \to \Lambda \;\big|\; \alpha(\cdot) \text{ Lebesgue measurable} \bigr\}.
\]
We will use the following lemma. Let $\chi_A$ be the indicator function of a statement $A$: $\chi_A = 1$ if $A$ holds, $\chi_A = 0$ if not. Recall that $\tau$ is the time of first exit of $(x(s), s)$ from $U \times [t, T]$.

Lemma 4.1 (Dynamic Programming Principle). [16, p. 11]. For any $h > 0$ such that $t + h < T$, if $\hat t = \min(\tau, t + h)$, then
\[
u(x, t) = \inf_{\alpha(\cdot) \in A} \Biggl[ \int_t^{\hat t} f(x(s), s, \alpha(s))\, ds + g\bigl( x(\hat t), \hat t \bigr) \chi_{\{\tau < t + h\}} + u\bigl( x(\hat t), \hat t \bigr) \chi_{\{t + h \le \tau\}} \Biggr]. \qquad (4.2)
\]

The proof of the following theorem is inspired by the proof given in [15, p. 557], which treats the unbounded domain problem. We have provided further arguments to extend it to the finite time, bounded domain optimal control problem, under a weaker assumption on $b$.

Theorem 4.2 (First order Hamilton-Jacobi-Bellman equation - viscosity sense). Provided that the value function $u$ is uniformly continuous up to the boundary, i.e. $u \in C(\overline{O})$, $u$ is a viscosity solution of the HJB equation with no boundary data
\[
-u_t + \sup_{\alpha \in \Lambda} \bigl[ -b^\alpha \cdot D_x u - f^\alpha \bigr] = 0 \quad \text{on } O.
\]
If furthermore $u = g$ on $\partial O$, then $u$ is a viscosity solution of the HJB equation
\[
-u_t + \sup_{\alpha \in \Lambda} \bigl[ -b^\alpha \cdot D_x u - f^\alpha \bigr] = 0 \quad \text{on } O; \qquad (4.3a)
\]
\[
u = g \quad \text{on } \partial O. \qquad (4.3b)
\]

Proof. By hypothesis, $u \in USC(\overline{O}) \cap LSC(\overline{O})$. First, assume that $u$ is not a viscosity subsolution of equation (4.3a). Then there is $(x_0, t_0) \in O$ and $\varphi \in C^{(2,1)}(\overline{O})$ such that $u - \varphi$ has a maximum at $(x_0, t_0) \in O$, but
\[
-\varphi_t(x_0, t_0) + \max_{\alpha \in \Lambda} \bigl[ -b(x_0, t_0, \alpha) \cdot D_x \varphi(x_0, t_0) - f(x_0, t_0, \alpha) \bigr] > 0.
\]
So there exist $\alpha \in \Lambda$ and $\varepsilon > 0$ such that, after reversing the sign,
\[
\varphi_t(x_0, t_0) + b(x_0, t_0, \alpha) \cdot D_x \varphi(x_0, t_0) + f(x_0, t_0, \alpha) < -2\varepsilon.
\]
Since $\varphi \in C^{(2,1)}(\overline{O})$ and $b$ and $f$ are continuous, there exists $\delta > 0$ such that for all $|x - x_0| + |t - t_0| \le \delta$,
\[
\varphi_t(x, t) + b(x, t, \alpha) \cdot D_x \varphi(x, t) + f(x, t, \alpha) < -\varepsilon.
\]
To adapt the arguments found in [15], it is necessary to justify that there exists $h > 0$ such that the state remains in $O$, regardless of the control. This is because, if this were not the case, some of the quantities in the following arguments might not be well defined.

First, we may take $\delta$ such that $B(x_0, \delta/2) \subset U$. From the assumption that
\[
|b(x, t, \alpha)| \le C\bigl( 1 + |x| \bigr),
\]

Page 42: Hamilton-Jacobi-Bellman Equations


if $x(\cdot)$ solves (4.1) with some control $\alpha(\cdot)$ and starting data $(x_0, t_0)$, then for any $h > 0$ with $t_0 + h \le T$ and any $s \in [t_0, t_0 + h]$, using the triangle inequality,
\[
|x(s) - x_0| \le \int_{t_0}^{t_0+h} |b(x(\xi), \xi, \alpha)|\, d\xi \le \int_{t_0}^{t_0+h} C\bigl( 1 + |x(\xi)| \bigr)\, d\xi \le C h \bigl( 1 + |x_0| \bigr) + C \int_{t_0}^{t_0+h} |x(\xi) - x_0|\, d\xi.
\]
By Gronwall's inequality, [15, p. 625], for any control $\alpha(\cdot)$,
\[
|x(s) - x_0| \le C h \bigl( 1 + |x_0| \bigr) \bigl( 1 + C T e^{CT} \bigr), \quad s \in [t_0, t_0 + h]. \qquad (4.4)
\]
Therefore there exists $h$, $0 < h \le \delta/2$, such that $|x(s) - x_0| + |s - t_0| \le \delta$ for all $s \in (t_0, t_0 + h]$. We may furthermore take $t_0 + h \le \tau$, because $U$ is open and the above bound is independent of the control.

Since $u - \varphi$ has a maximum at $(x_0, t_0)$,
\[
u(x(t_0 + h), t_0 + h) - u(x_0, t_0) \le \varphi(x(t_0 + h), t_0 + h) - \varphi(x_0, t_0) \le \int_{t_0}^{t_0+h} \varphi_t(x(s), s) + b(x(s), s, \alpha) \cdot D_x \varphi(x(s), s)\, ds,
\]
where the last step made use of the regularity of $\varphi$ and the fundamental theorem of calculus. But since $t_0 + h \le \tau$, from lemma 4.1,
\[
u(x_0, t_0) \le u(x(t_0 + h), t_0 + h) + \int_{t_0}^{t_0+h} f(x(s), s, \alpha)\, ds.
\]
Using the fact that $u \in C(\overline{O})$ implies that these quantities are finite, we conclude that
\[
0 \le \int_{t_0}^{t_0+h} \varphi_t(x(s), s) + b(x(s), s, \alpha) \cdot D_x \varphi(x(s), s) + f(x(s), s, \alpha)\, ds \le -h\varepsilon,
\]
which contradicts $\varepsilon > 0$. Hence
\[
-\varphi_t(x_0, t_0) + \max_{\alpha \in \Lambda} \bigl[ -b(x_0, t_0, \alpha) \cdot D_x \varphi(x_0, t_0) - f(x_0, t_0, \alpha) \bigr] \le 0, \qquad (4.5)
\]
and $u$ is a viscosity subsolution.

Now we show that $u$ is a supersolution. Let $u - \varphi$ have a minimum at $(x_0, t_0) \in O$, and suppose that
\[
-\varphi_t(x_0, t_0) + \max_{\alpha \in \Lambda} \bigl[ -b(x_0, t_0, \alpha) \cdot D_x \varphi(x_0, t_0) - f(x_0, t_0, \alpha) \bigr] < 0.
\]
By the continuity of $b$ and $f$ and the regularity of $\varphi$, there exist $\delta > 0$ and $\varepsilon > 0$ such that if $|x - x_0| + |t - t_0| < \delta$, then
\[
-\varphi_t(x, t) + \max_{\alpha \in \Lambda} \bigl[ -b(x, t, \alpha) \cdot D_x \varphi(x, t) - f(x, t, \alpha) \bigr] < -\varepsilon.
\]
Recall that, by equation (4.4), there is $h > 0$ such that $|x(s) - x_0| + |s - t_0| \le \delta$ for all $s \in [t_0, t_0 + h]$ and, importantly, for any control $\alpha(\cdot)$. Again by (4.4), since $U$ is open, we may take $h$ so small that, for any control, $x(s) \in U$ for all $s \in [t_0, t_0 + h]$.

So by lemma 4.1, there exists a control $\alpha(\cdot)$ such that
\[
u(x(t_0 + h), t_0 + h) + \int_{t_0}^{t_0+h} f(x(s), s, \alpha(s))\, ds \le u(x_0, t_0) + \frac{\varepsilon h}{2}.
\]
We also have
\[
\int_{t_0}^{t_0+h} \varphi_t(x(s), s) + b\bigl( x(s), s, \alpha(s) \bigr) \cdot D_x \varphi(x(s), s)\, ds \le u(x(t_0 + h), t_0 + h) - u(x_0, t_0).
\]

Page 43: Hamilton-Jacobi-Bellman Equations


By the finiteness of $u$, we finally have
\[
\varepsilon h \le \int_{t_0}^{t_0+h} \varphi_t(x(s), s) + b\bigl( x(s), s, \alpha(s) \bigr) \cdot D_x \varphi(x(s), s) + f(x(s), s, \alpha(s))\, ds \le u(x(t_0 + h), t_0 + h) + \int_{t_0}^{t_0+h} f(x(s), s, \alpha(s))\, ds - u(x_0, t_0) \le \frac{\varepsilon h}{2};
\]
which contradicts $\varepsilon > 0$. Therefore
\[
-\varphi_t(x_0, t_0) + \max_{\alpha \in \Lambda} \bigl[ -b(x_0, t_0, \alpha) \cdot D_x \varphi(x_0, t_0) - f(x_0, t_0, \alpha) \bigr] \ge 0, \qquad (4.6)
\]
and $u$ is a supersolution. So $u$ is a viscosity solution of the HJB equation.

4.2. Stochastic optimal control. We now state the main result for the stochastic, finite time horizon optimal control problem, which will be studied numerically in the next part. For the proof, see [16]. This theorem establishes the relevance of viscosity solutions to the stochastic optimal control problem of chapter 1.

Theorem 4.3. [16, p. 209]. Consider the stochastic optimal control problem of chapter 1. Let $A$ be the set of all admissible progressively measurable controls. Assume that the value function $u \in C(\overline{O})$, that $u = g$ on $\partial O$, and that a stochastic analogue of lemma 4.1 holds, namely [16, property (2.1) p. 201].

Then $u$ is a viscosity solution of
\[
-u_t + H\bigl( x, t, D_x u, D_x^2 u \bigr) = 0 \quad \text{on } O; \qquad (4.7a)
\]
\[
u = g \quad \text{on } \partial O, \qquad (4.7b)
\]
where $H$ is as in (2.7).

Remark 4.4. The value function $u$ is in $C(\overline{O})$ and [16, property (2.1) p. 201] holds under the conditions of theorem 4.4 of chapter 1. See [16, p. 205].

Example 4.5. Consider the HJB equation of example 4.1 of chapter 1,
\[
-u_t + |u_x| - 1 = 0 \quad \text{on } (-1, 1) \times (0, 1);
\]
\[
u = 0 \quad \text{on } \bigl( \{-1, 1\} \times (0, 1) \bigr) \cup \bigl( (-1, 1) \times \{1\} \bigr).
\]
It may be seen from first principles that a viscosity solution $u$ is given by
\[
u(x, t) = \begin{cases} 1 - |x| & \text{if } |x| \ge t, \\ 1 - t & \text{if } |x| < t. \end{cases}
\]
We may also deduce this result by using the above theorems as follows. As stated in example 4.5 of chapter 1, the properties of theorem 4.4 of chapter 1 are satisfied and the value function $u$ satisfies $u = g$ on $\partial O$. Therefore theorem 4.3 above shows that the value function is a viscosity solution of the HJB equation.
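As an illustration of this example and of lemma 4.1 (an addition, not part of the original text), the dynamic programming principle can be discretised directly. The sketch below assumes controls $\alpha \in \{-1, +1\}$, running cost $1$ and zero exit cost, and chooses $\Delta t = \Delta x$ purely for simplicity so that the characteristics $x + \alpha \Delta t$ land on grid points; stepping backwards from the final time reproduces the viscosity solution above.

```python
import numpy as np

# Backward dynamic programming for  -u_t + |u_x| - 1 = 0  on (-1,1) x (0,1),
# u = 0 on the parabolic boundary; controls alpha in {-1, +1}, running cost 1.
N = 200                      # number of spatial intervals
dx = 2.0 / N
dt = dx                      # time step equal to dx so that x + alpha*dt is a grid point
x = np.linspace(-1.0, 1.0, N + 1)
Nt = int(round(1.0 / dt))

u = np.zeros(N + 1)          # final-time values u(x, 1) = 0
for _ in range(Nt):          # march backwards in time
    v = u.copy()
    # DPP: u(x, t) = dt * 1 + min over alpha of u(x + alpha*dt, t + dt),
    # with the zero exit cost enforced through the zero boundary values of v.
    u[1:-1] = dt + np.minimum(v[:-2], v[2:])
    u[0] = u[-1] = 0.0       # boundary data

exact = 1.0 - np.abs(x)      # u(x, 0) = min(1 - |x|, 1 - 0) = 1 - |x|
print("max error at t = 0:", np.abs(u - exact).max())   # ~ 0
```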

4.3. Conclusion. In this chapter, it was seen how the notion of viscosity solutions is meaningful and relevant to HJB equations and optimal control problems. More specifically, theorem 3.7, proving consistency and selectivity, and corollary 3.13, demonstrating uniqueness for HJB final value problems, together show that the definition of a viscosity solution achieves the balancing act of being weak enough to permit less regular solutions, whilst being strong enough to select at most one solution, which must agree with a classical solution if one exists. Theorem 4.3 shows that this notion of solution is the one relevant to the value function of an optimal control problem.

This chapter also detailed several results, namely theorems 3.3 and 3.11 and proposition 3.6, that will play significant roles in the arguments of later chapters.

Page 44: Hamilton-Jacobi-Bellman Equations
Page 45: Hamilton-Jacobi-Bellman Equations

CHAPTER 4

Discrete Hamilton-Jacobi-Bellman Equations

1. Introduction

The following chapters will present numerical methods for solving HJB equations. These schemes often have the common feature that they involve solving one or several discrete HJB equations, which are of the form
\[
F(x^*) := \sup_{\alpha \in \Lambda} \bigl[ A^\alpha(x^*) - d^\alpha \bigr] = 0, \qquad (1.1)
\]
where for each $\alpha \in \Lambda$, $A^\alpha$ is a (possibly nonlinear) function from $\mathbb{R}^n$ to $\mathbb{R}^n$ and $d^\alpha \in \mathbb{R}^n$. The supremum is understood here in the component-wise sense: for a collection $\{x^\alpha\}_{\alpha \in \Lambda} \subset \mathbb{R}^n$, $\sup_{\alpha \in \Lambda}[x^\alpha] \in \mathbb{R}^n$ is defined to have components
\[
\Bigl( \sup_{\alpha \in \Lambda} [x^\alpha] \Bigr)_i = \sup_{\alpha \in \Lambda} \bigl[ (x^\alpha)_i \bigr], \quad i \in \{1, \ldots, n\}.
\]
The main result presented in this chapter is a new finding. The aim of this chapter is to treat in a general setting the questions of existence and uniqueness of discrete solutions to the numerical schemes, and to solve the problem of finding an efficient solver for these equations. To our knowledge, no source treats these issues in unison in this general setting, although [7] treats the case of linear, monotone schemes.

Although equation (1.1) will be treated abstractly in this chapter, the reader may find it helpful to consider the term $A^\alpha$ as the implicit part of a discretisation of the operator $-u_t + L^\alpha u$, and the term $d^\alpha$ as regrouping the explicit part of the discretisation of $-u_t + L^\alpha u$ and the source term $f(x, t, \alpha)$. A concrete example can be found in section 2.3 of chapter 6; see also the code sketch below.
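The following small sketch (with made-up matrices, not the discretisation of chapter 6) makes the component-wise supremum in (1.1) concrete for a finite control set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_controls = 5, 3

# One matrix A^alpha and one vector d^alpha per control (linear case, made-up data).
A = [np.eye(n) + 0.1 * rng.standard_normal((n, n)) for _ in range(n_controls)]
d = [rng.standard_normal(n) for _ in range(n_controls)]

def F(x):
    """F(x) = sup_alpha [A^alpha x - d^alpha], taken component by component."""
    residuals = np.stack([A[k] @ x - d[k] for k in range(n_controls)])  # shape (n_controls, n)
    return residuals.max(axis=0)

print(F(np.zeros(n)))   # component-wise maximum of -d^alpha over the controls
```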

The first section will examine the question of the solubility of equation (1.1) for linear $A^\alpha$ and will provide sufficient conditions for the existence and uniqueness of solutions. Two approaches are given, with a view to applicability to both monotone and non-monotone discretisations of the HJB equation.

The first approach is entirely original and relates the geometry of the ordering structure of the HJB operator to spectral properties of a set of matrices related to the operators $A^\alpha$. Here we have in mind discretisations of the HJB equation that are not necessarily monotone.

The second approach is well known in the literature; see e.g. [7]. It considers primarily the monotonicity properties of a certain set of matrices. This approach is applicable to monotone discretisations of the HJB equation.

The second section is also original, and shows how to construct a semi-smooth Newton method for general nonlinear $A^\alpha$. After introducing the concept of slant differentiability, found in [11] and [17], we give conditions for superlinear convergence and global convergence of the method. The results here improve known results that were previously restricted to cases with linearity and monotonicity assumptions. The reason for considering nonlinear operators $A^\alpha$ is that linear monotone discretisations of the terms $L^\alpha$ can usually only be achieved for low order methods; see [9] and [10].

2. Solubility of discrete Hamilton-Jacobi-Bellman equations

2.1. General linear case. In this section, for $x \in \mathbb{R}^n$, let $|x| = \|x\|_2$ be the Euclidean norm. The reader may find it helpful to consult the section on the field of values in appendix B.


Page 46: Hamilton-Jacobi-Bellman Equations


We consider the discrete HJB equation (1.1), with $A^\alpha \in M(n, \mathbb{R})$ linear maps and $\Lambda$ a compact metric space. It is assumed that $\alpha \mapsto A^\alpha$ and $\alpha \mapsto d^\alpha$ are both continuous. As a result, the supremum in (1.1) is always attained: for all $x \in \mathbb{R}^n$ and $i \in \{1, \ldots, n\}$, there exists $\alpha_i(x) \in \Lambda$ such that
\[
(F(x))_i = \bigl( A^{\alpha_i(x)} x - d^{\alpha_i(x)} \bigr)_i.
\]
Define, for each $x \in \mathbb{R}^n$ and $i \in \{1, \ldots, n\}$,
\[
\Lambda_i(x) = \bigl\{ \alpha \in \Lambda \;\big|\; (A^\alpha x - d^\alpha)_i \ge (A^\beta x - d^\beta)_i \ \ \forall \beta \in \Lambda \bigr\}. \qquad (2.1)
\]
The set $\Lambda_i(x)$ is non-empty for all $x \in \mathbb{R}^n$. For $x \in \mathbb{R}^n$ and a choice $\{\alpha_i(x)\}_{i=1}^n$ of elements $\alpha_i(x) \in \Lambda_i(x)$, we define the matrix $G(x) \in M(n, \mathbb{R})$ given by
\[
(G(x))_{ij} = \bigl( A^{\alpha_i(x)} \bigr)_{ij}. \qquad (2.2)
\]
The matrix $G(x)$ is composed as a mixture of the rows from the set of matrices $\{ A^{\alpha_i(x)} \}_{i=1}^n$. The map $G$ (or, strictly speaking, a choice of $G$) will play an important role in the next section. For $x \in \mathbb{R}^n$, define $d(x)$ by
\[
(d(x))_i = \bigl( d^{\alpha_i(x)} \bigr)_i, \qquad (2.3)
\]
where $\{\alpha_i(x)\}_{i=1}^n$ is the same choice as the one used for $G(x)$. By compactness, there exists $C_1 \ge 0$ such that
\[
|d(x)| \le C_1 \quad \text{for all } x \in \mathbb{R}^n. \qquad (2.4)
\]
By theorem 1.7 of appendix B, there also exists $C_2 \ge 0$ such that
\[
\| G(x) \|_2 \le C_2 \quad \text{for all } x \in \mathbb{R}^n. \qquad (2.5)
\]
Definitions (2.2) and (2.3) imply that for every $x \in \mathbb{R}^n$,
\[
F(x) = G(x) x - d(x). \qquad (2.6)
\]
The following corollary to Brouwer’s theorem will be used. Its proof is found in [30].

Proposition 2.1. Let B(0, R) = {x ∈ Rn : |x| ≤ R} for fixed R > 0. Let F : B(0, R) → Rn be continuous. Suppose that

(F (x), x) ≥ 0, for all x with |x| = R.

Then the equation F (x) = 0 has at least one solution x∗ and |x∗| ≤ R.

The following result is original.

Theorem 2.2. [26]. Let Λ be a compact metric space and let α ↦ Aα ∈ M(n,R) and α ↦ dα be continuous. Suppose that there exists λ > 0 such that for every x ∈ Rn, there is a choice of G(x) for which

inf{ z | z ∈ F((G(x) + G(x)^T)/2) } > λ,

where F(·) denotes the field of values (see appendix B). Then the discrete HJB equation

max_{α∈Λ} [Aα x − dα] = 0   (2.7)

has at least one solution x∗ with |x∗| ≤ C1/λ. If furthermore, for every x ∈ Rn, there exists a choice of G(x) such that

λ > (1/√2) ‖G(x)‖2,   (2.8)

then the discrete HJB equation has a unique solution.

Proof. For x ∈ Rn, by the Cauchy-Schwarz inequality and by hypothesis, there is G(x) such that

(F(x), x) = (G(x)x − d(x), x) ≥ λ|x|² − C1|x|,

where C1 is given by (2.4). So (F(x), x) ≥ 0 for all x with |x| = C1/λ, and proposition 2.1 shows that there is a solution x∗ with |x∗| ≤ C1/λ.

Page 47: Hamilton-Jacobi-Bellman Equations

2. SOLUBILITY OF DISCRETE HAMILTON-JACOBI-BELLMAN EQUATIONS 39

Now suppose that for every x ∈ Rn, there exists a choice of G(x) such that

λ > (1/√2) ‖G(x)‖2,

and, to argue by contradiction, assume that there exist x, y both solutions of F(x) = 0, x ≠ y. Then we have, for the choices G(x) and G(y) satisfying (2.8),

G(x)y − d(x) ≤ F(y),

and since F(y) = 0 = F(x) = G(x)x − d(x), it follows that

G(x)(y − x) ≤ 0.

Similarly we find that

G(y)(y − x) ≥ 0.

If either G(x)(y − x) = 0 or G(y)(y − x) = 0, then, say,

0 = (G(x)(y − x), y − x) ≥ λ|y − x|²,

which contradicts the assumption that y ≠ x. Therefore both G(x)(y − x) and G(y)(y − x) are non-zero. So

(G(y)(y − x), G(x)(y − x)) ≤ 0

implies that the angle between G(x)(y − x) and G(y)(y − x) is greater than or equal to π/2. Now consider the angle θ1 between y − x and G(x)(y − x):

cos(θ1) = (G(x)(y − x), y − x) / (|G(x)(y − x)| |y − x|) ≥ λ/‖G(x)‖2 > 1/√2.

Therefore θ1 is strictly less than π/4. Similarly we find that the angle between G(y)(y − x) and y − x is also strictly less than π/4. But the angle between G(x)(y − x) and G(y)(y − x) is greater than or equal to π/2. This is impossible, even in Rn, and gives a contradiction. Therefore the solutions of the discrete HJB equation (2.7) are unique.

To illustrate the uses of this result, consider the following example. Some numerical schemes lead to matrices Aα of the form

Aα = (1/∆t) M + Lα,   (2.9)

where M is a symmetric positive definite matrix and ∆t > 0 may be chosen independently of Lα and M. The spectral radius of M is denoted ρ(M).

Corollary 2.3. Let Λ be a compact metric space and let α ↦ Aα ∈ M(n,R) and α ↦ dα be continuous. Let M be a symmetric positive definite matrix with spectrum σ(M) ⊂ [γ, µ], µ < √2 γ. Defining

r = max_{α∈Λ} [ (1/2)(‖Lα‖∞ + ‖Lα‖1) ];

if ∆t > 0 satisfies

∆t < (√2 γ − µ) / ((2 + √2) r),   (2.10)

then there is a unique solution to the HJB equation

max_{α∈Λ} [ (1/∆t) M x + Lα x − dα ] = 0.

Proof. By corollary 1.5 of section 1 of appendix B, for any x, y ∈ Rn with |y| = 1, one has

(y, G(x)y) ≥ γ/∆t − r,

and by theorem 1.7,

‖G(x)‖2 ≤ µ/∆t + 2r.


Rearranging inequality (2.10) gives

µ/∆t + 2r < √2 (γ/∆t − r).

Therefore condition (2.8) of theorem 2.2 is satisfied and there exists a unique solution to the HJB equation.
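For a concrete scheme of the form (2.9) one can evaluate the admissible time step directly. The following sketch (assuming NumPy and a finite sample of controls; the function name is illustrative) computes γ, µ and r and returns the bound (2.10).

import numpy as np

def dt_bound(M, L_list):
    """Sketch of the time-step bound (2.10) for A^alpha = M/dt + L^alpha.
    M      : symmetric positive definite matrix
    L_list : list of matrices L^alpha for a finite sample of controls
    Returns the largest admissible dt, or None if mu < sqrt(2)*gamma fails."""
    eigs = np.linalg.eigvalsh(M)           # spectrum of the symmetric matrix M
    gamma, mu = eigs.min(), eigs.max()     # sigma(M) is contained in [gamma, mu]
    if mu >= np.sqrt(2.0) * gamma:
        return None                        # corollary 2.3 does not apply
    # r = max_alpha (||L^alpha||_inf + ||L^alpha||_1) / 2
    r = max(0.5 * (np.linalg.norm(L, np.inf) + np.linalg.norm(L, 1)) for L in L_list)
    return (np.sqrt(2.0) * gamma - mu) / ((2.0 + np.sqrt(2.0)) * r)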

Example 2.4. This example shows that it is not sufficient that all linear maps Aα be positive definite to show that G(x) is positive definite. Consider

A = [ 5  2 ; −6  3 ],   B = [ 1  −2 ; 2  1 ].

One may check that the linear maps represented by these matrices are positive definite. Now let

x = (−2, −1)^T.

Then we have:

Ax = (−12, 9)^T,   Bx = (0, −5)^T;

so that

G(x) = [ 1  −2 ; −6  3 ],

for which we have

x^T G(x) x = −9.
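These computations are easily checked; a short NumPy script (purely illustrative) is:

import numpy as np

A = np.array([[5.0, 2.0], [-6.0, 3.0]])
B = np.array([[1.0, -2.0], [2.0, 1.0]])
x = np.array([-2.0, -1.0])

# Positive definiteness of the linear maps: check the symmetric parts.
for name, M in (("A", A), ("B", B)):
    print(name, "positive definite:", np.all(np.linalg.eigvalsh((M + M.T) / 2) > 0))

# Build G(x) by selecting, row by row, the matrix attaining the larger value of (Mx)_i.
rows = [A[i] if (A @ x)[i] >= (B @ x)[i] else B[i] for i in range(2)]
G = np.vstack(rows)
print("G(x) =", G)
print("x^T G(x) x =", x @ G @ x)   # prints -9.0, so G(x) is not positive definite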

2.2. The monotone case. There is another set of possible assumptions that may be used as sufficient conditions for existence and uniqueness, based on monotonicity of the matrices Aα. The reader may find it helpful to consult the section on M-matrices in appendix B. The results of this paragraph are already well known.

Lemma 2.5. Let Λ be a compact metric space, let α ↦ Aα ∈ M(n,R) and α ↦ dα be continuous. Suppose that for every α ∈ Λⁿ, the matrix G(α) defined by

(G(α))_{ij} = (A^{αi})_{ij}

is non-singular. Then {‖G(x)^{−1}‖ | x ∈ Rn} is bounded.

Proof. By hypothesis, the map Λⁿ → R, α ↦ ‖G(α)^{−1}‖, is a continuous map on a compact set. Therefore its image in R is a compact set. In particular, {‖G(x)^{−1}‖ | x ∈ Rn} is a subset of this compact set.

Theorem 2.6. [7]. Let Λ be a compact metric space, let α ↦ Aα ∈ M(n,R) and α ↦ dα be continuous, and assume that for every x ∈ Rn, G(x) is a non-singular M-matrix. Then there is at most one solution to the HJB equation (2.7). If furthermore {‖G(x)^{−1}‖ | x ∈ Rn} is bounded, a solution exists.

Proof. Existence in the case where {‖G(x)^{−1}‖ | x ∈ Rn} is bounded will be deduced as a corollary to the global convergence properties discussed later, see proposition 3.9.

Uniqueness results from monotonicity. As in the proof of theorem 2.2, if x, y are both solutions, one finds that

G(x)(y − x) ≤ 0 ≤ G(y)(y − x).

Since G(x) and G(y) are non-singular M-matrices, and therefore inverse positive, we find that

y − x ≤ 0 ≤ y − x,

which shows uniqueness.


Example 2.7. An example of a scheme to which we may apply theorem 2.6 is the Kushner-Dupuis scheme presented in chapter 6. The matrices Aα have the form

Aα = (1/∆t) I + Lα,

where all off-diagonal entries of Lα are negative. We show that there is ∆t > 0 such that all the matrices G(x), x ∈ Rn, are non-singular M-matrices.

By compactness of Λ and continuity, there exists c ≥ 0 such that

sup_{α∈Λ} max_{1≤i≤n} Lα_{ii} ≤ c.

So we may write

Aα = (1/∆t + c) I + L̄α,

where L̄α = Lα − cI has only negative entries.

For any x ∈ Rn, there exists (α1, . . . , αn) ∈ Λⁿ such that

G(x) = Σ_{i=1}^n D_i [ (1/∆t + c) I + L̄^{αi} ] = (1/∆t + c) I + Σ_{i=1}^n D_i L̄^{αi},

where D_i = Diag(0, . . . , 1, . . . , 0) is the diagonal matrix with entry 1 in its i-th row. From the results of appendix B,

ρ( Σ_{i=1}^n D_i L̄^{αi} ) ≤ 4 Σ_{i=1}^n r(L̄^{αi}),

with r(·) the numerical radius. Compactness of Λ and continuity imply that there is a uniform bound C ≥ 0, C possibly chosen such that C > c and

ρ( Σ_{i=1}^n D_i L̄^{αi} ) ≤ C for all (α1, . . . , αn) ∈ Λⁿ.

As a result, for ∆t < 1/(C − c), from the definition of M-matrices given in appendix B, we find that G(x) is a non-singular M-matrix for all x ∈ Rn. We note that such a bound might be computed by using proposition 1.6 of appendix B.
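A crude numerical version of this bound, which replaces the numerical-radius estimate above by the maximum absolute row sum (a weaker but easily computed bound on the spectral radius), might read as follows; the function name and the use of NumPy are illustrative.

import numpy as np

def dt_for_M_matrix(L_list):
    """Sketch of the bound of example 2.7: A^alpha = I/dt + L^alpha, with negative
    off-diagonal entries of L^alpha. Returns a dt below which every row mixture
    G(x) = (1/dt + c) I + sum_i D_i (L^{alpha_i} - c I) is a non-singular M-matrix."""
    c = max(0.0, max(np.max(np.diag(L)) for L in L_list))    # diagonal bound c >= 0
    # Each row of the mixture sum_i D_i (L^{alpha_i} - c I) is a row of some L^alpha - c I,
    # so the maximum absolute row sum bounds the spectral radius of the mixture.
    C = max(np.max(np.sum(np.abs(L - c * np.eye(L.shape[0])), axis=1)) for L in L_list)
    return 1.0 / (C - c) if C > c else np.inf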

3. Semi-smooth Newton methods

As mentioned above, the numerical schemes for HJB equations presented in this work take the form

sup_{α∈Λ} [Aα(x) − dα] = 0.

In this section, we will prove under very general conditions that it is possible to solve this equation with a superlinearly convergent algorithm, sometimes called Howard's algorithm or policy iteration. This algorithm will turn out to be a semi-smooth Newton method.

The main result derived in this section is of my own finding. It shows how to choose the analogue of the Jacobian used for the classical Newton's method, in order to obtain a semi-smooth Newton method for the discrete HJB equation.

As was hinted at in section 2, under some conditions one should use the matrices G(x), defined in a form analogous to equation (2.2). The reader may recall that the construction of G involved an element of choice, and it is significant that, for the semi-smooth Newton method, any choice made to define G is permissible.

Furthermore, the result given here improves the results that are already known in this area, such as those in [7], by not requiring any linearity or monotonicity properties of the operators Aα. The linear case was known to us ([27]) before we became aware of [7], although their work pre-dates our own.


The principal requirement for a numerical scheme to fit in this framework is that the operators Aα have a property called slant differentiability. Further compactness and continuity conditions will then imply that the operator F(x) = sup_{α∈Λ} [Aα(x) − dα] is itself slantly differentiable.

First of all we introduce the notion of slant differentiability; then we will see how general slantly differentiable functions lead to a semi-smooth Newton method which, under usual invertibility and boundedness assumptions, has local superlinear convergence¹.

3.1. Slant differentiability. The concept of a slant derivative, introduced in [11], is a weakening of the notion of derivative: functions from Rn to Rk that are not differentiable may still be slantly differentiable. The corresponding weak derivative is then called a slant derivative or slanting function.

To give an indication of what one might expect such functions to be like, perhaps the most elementary example of a slantly differentiable function that is not classically differentiable is x ↦ |x| with x ∈ Rn.

The slant differentiability of functions between general Banach spaces is defined in [11], but we restrict our attention to Rn. Let U ⊂ Rn be open.

Definition 3.1. F : U → Rn is slantly differentiable in U if there exists G : U → M(n,R) such that for every x ∈ U,

lim_{h→0} (1/|h|) |F(x + h) − F(x) − G(x + h)h| = 0.   (3.1)

G is called a slant derivative of F.

Compare this to the definition of the classical derivative, given by the condition

lim_{h→0} (1/|h|) |F(x + h) − F(x) − G(x)h| = 0.

We see immediately that all continuously differentiable functions are slantly differentiable. Although the definitions resemble each other, there may be significant differences between slant derivatives and classical derivatives. The first difference is that the slant derivative is not necessarily unique. The following elementary example shows this.

Example 3.2. Consider x ∈ R and the function x ↦ |x|. Let

g1(x) = −1 if x < 0, and g1(x) = 1 if x ≥ 0.

First of all, note that for x ≠ 0, g1(x + h) corresponds to the classical derivative of |x| for h small. So it is clear that g1 is a slant derivative away from x = 0. Now, at x = 0,

||h| − 0 − g1(0 + h)h| = ||h| − sign(h)h| = 0.

Therefore g1 is a slant derivative of |x| everywhere in R. But, by symmetry, it is clear that

g2(x) = −g1(−x)

is also a slant derivative of |x|. However g1(0) ≠ g2(0), so slant derivatives are not in general unique.

As a result, there is not one single, general algorithm for finding 'the' slant derivative. Unlike for classical derivatives, to find a slant derivative we must give a proposal and verify that it satisfies the definition. This is precisely what we will do to find a slant derivative of the discrete HJB equation.

¹A sequence {ym}m is said to converge superlinearly to x if

lim_{m→∞} |y_{m+1} − x| / |ym − x| = 0.


Remark 3.3. Another interesting fact about slantly differentiable functions is given in [11], where it is proven that a function is slantly differentiable with bounded slant derivative if and only if it is Lipschitz continuous, and their proof involves a construction of a slant derivative.

Since all the schemes considered in this work yield Lipschitz continuous operators, it is thus known a priori that a slant derivative exists. However, from a practical point of view, we must be able to find a specific slant derivative for use in the semi-smooth Newton algorithm, and we would like to have a general way of finding such a slant derivative for HJB equations. The main result of this section achieves this by providing a slant derivative for a general discrete HJB equation.

3.2. Semi-smooth Newton's method for slantly differentiable functions. The semi-smooth Newton's method for solving F(x) = 0, for a slantly differentiable function F : Rn → Rn with invertible slant derivative G, is the following.

Algorithm 1 (Newton's Method).

(1) Let y1 ∈ Rn.
(2) Given ym, m ∈ N, let y_{m+1} be the unique solution of
    G(ym)(y_{m+1} − ym) = −F(ym).
(3) If y_{m+1} = ym, stop. Otherwise set m = m + 1 and return to (2).

By convention, if the algorithm stops after a finite number of iterations M, we set ym = y_{M+1} for all m ≥ M + 1.
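For linear Aα over a finite control set, each iteration of algorithm 1 amounts to selecting, row by row, a maximising control, assembling G and d as in (2.2)-(2.3), and solving one linear system; this is Howard's policy iteration. A minimal NumPy sketch under these assumptions (all names are illustrative) is:

import numpy as np

def semismooth_newton(A_list, d_list, y0, tol=1e-12, max_iter=50):
    """Sketch of algorithm 1 for F(x) = max_alpha [A^alpha x - d^alpha] with a
    finite control set; A_list and d_list hold the matrices and vectors.
    The slant derivative G(y) is assembled row-wise from the maximising controls,
    as in equation (2.2); any tie-breaking choice is admissible."""
    y = np.asarray(y0, dtype=float)
    n = y.size
    for _ in range(max_iter):
        vals = np.stack([A @ y - d for A, d in zip(A_list, d_list)])  # shape (m, n)
        idx = vals.argmax(axis=0)                      # maximising control per row
        G = np.stack([A_list[idx[i]][i] for i in range(n)])
        d = np.array([d_list[idx[i]][i] for i in range(n)])
        y_new = np.linalg.solve(G, d)                  # G(y) y_{m+1} = d(y), cf. (3.19)
        if np.allclose(y_new, y, rtol=0.0, atol=tol):  # stopping test of step (3)
            return y_new
        y = y_new
    return y

Each step requires a single linear solve with the current matrix G(ym).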

We now give the principal result of local superlinear convergence of the semi-smooth Newton's method for slantly differentiable functions. This result is originally from [11], but we base the proof on [17], with a more detailed explanation of the superlinear convergence property.

Theorem 3.4. Suppose that x∗ solves F(x∗) = 0 and suppose that there is an open neighbourhood U of x∗ such that F is slantly differentiable in U, with G a slant derivative of F. If G(x) is invertible for all x ∈ U and {‖G(x)^{−1}‖ | x ∈ U} is bounded, then there exists δ > 0 such that, for the sequence {ym}m defined in algorithm 1, if there is m ∈ N such that ym ∈ B(x∗, δ), then {ym}m → x∗ superlinearly.

Proof. For any m, if ym ∈ U , because F (x∗) = 0, we have,

ym+1 − x∗ = ym − x∗ −G(ym)−1F (ym) +G(ym)−1F (x∗).

Hence,|ym+1 − x∗| ≤ ‖G(ym)−1‖ |F (ym)− F (x∗)−G(ym)(ym − x∗)| .

By hypothesis,‖G(x)−1‖ |x ∈ U

is bounded, so let C > 0 be a bound. By the definition of slant

differentiability, for each ε > 0, there is δε such that if h ∈ B (0, δε), then

|F (x∗ + h)− F (x∗)−G(x∗ + h)h| ≤ ε

C|h| ;

and we may take ε0 small enough such that B(x∗, ε0) ⊂ U . So suppose that for some ε0 ∈ (0, 1),there is m ∈ N such that ym ∈ B(x∗, δε0); then

|ym+1 − x∗| ≤ Cε0

C|ym − x∗|

≤ ε0 |ym − x∗| .(3.2)

Then by induction ymm is well-defined, i.e. the sequence remains in U , and converges to x∗,because ε0 < 1 and the contraction mapping principle applies. Then for all ε ≤ ε0, there is M suchthat for all m ≥M , ym ∈ B(x∗, δε). Hence by repeating the calculation leading to (3.2) for ε,

lim supm

|ym+1 − x∗||ym − x∗|

≤ ε for all ε ≤ ε0. (3.3)

Therefore

limm→∞

|ym+1 − x∗||ym − x∗|

= 0;

thus proving that convergence is superlinear.


Example 3.5. Table 1 illustrates the progressive increase in convergence rate described by the proof of theorem 3.4. Let

F(x) = max_{α∈[−1,1]} ( 6x1 + αx1x2 − 1 , 2x1 + 6x2 )^T.

Note that such a function is an example of a discrete HJB operator, with a nonlinear operator Aα. In the next section, we will see what a suitable slant derivative for this function could be. Let us however give the results of two numerical experiments for solving this problem. The numerical solution found was (0.16515, −0.05505). The first method used is a simple iteration method² with linear convergence, and the second is the semi-smooth Newton method.

In both cases the starting value was (0, 0). The semi-smooth Newton method converged to the solution on the 5-th iterate. The simple iteration method reached the solution within rounding error after 26 iterates. The errors are measured in the maximum norm and the convergence rate is calculated as rm = e_{m+1}/em, em = |ym − x∗|.

Observe in particular how the convergence rate of the semi-smooth Newton method increases as the iterates approach the solution, just as is explained by inequality (3.2). For the simple iteration method, the convergence rate remains bounded away from 0 from below and convergence is linear.

Convergence is strictly local for the semi-smooth Newton method, because if (−10, −10) is taken as the start value, the iterates converge to (−18.165, 6.0550).

Table 1. Numerical results for example 3.5. The semi-smooth Newton method converges to the solution superlinearly, whereas the simple iteration method converges only linearly.

        Simple Iteration                 Semi-smooth Newton
m       Error         Convergence Rate   Error         Convergence Rate
1       0.16515139    0.333333333        0.16515139    0.009175077
2       0.055050463   0.453212111        0.001515277   8.27E-05
3       0.024949537   0.282588947        1.25E-07      6.87E-09
4       0.007050463   0.216373964        8.60E-16      0
5       0.001525536   0.154799884        0             -
...     ...           ...                ...           ...
24      8.08E-15      0.246144095
25      1.54E-15      0.19055794
26      2.22E-16      0.144144144

Remark. Before concluding this section, it is interesting to note that Newton's method is naturally connected to slant differentiability as a result of the fact that we use the iterate ym as the argument for the evaluation of the slant derivative G. This is why the proof of convergence was very straightforward, in particular with regard to obtaining (3.2).

3.3. Slant derivatives of discrete HJB operators. For y ∈ Rn, let |y| = ‖y‖∞ be the maximum norm of y. The main result of this section is to find a specific slant derivative of the discrete HJB operator

F(x) = max_{α∈Λ} [Aα(x) − dα].

As before, define for each x ∈ Rn, i ∈ {1, . . . , n},

Λi(x) = { α ∈ Λ | (Aα(x) − dα)_i ≥ (Aβ(x) − dβ)_i, ∀β ∈ Λ }.   (3.4)

The following theorem is a novel result and constitutes the main contribution of this chapter.

²The iterates of the simple iteration method are defined as y_{m+1} = ym − λF(ym), for λ ∈ R with |λ| chosen smaller than the Lipschitz constant of F. Here λ was chosen to be 0.2.

Theorem 3.6. Suppose that:


(1) Λ is a compact metric space.
(2) For each x ∈ Rn, α ↦ Aα(x) and α ↦ dα are continuous.
(3) For each α ∈ Λ, Aα is Lipschitz continuous on Rn and the Lipschitz constants³ of Aα, α ∈ Λ, are uniformly bounded in α by L ≥ 0.
(4) There are slant derivatives Jα of Aα (c.f. remark 3.3) which are uniform in α, in the sense that for each x ∈ Rn and ε > 0, there exists δ > 0 such that for all h ∈ B(0, δ),

(1/|h|) |Aα(x + h) − Aα(x) − Jα(x + h)h| ≤ ε for all α ∈ Λ.

(5) The map α ↦ Jα is continuous, in the sense that for each x ∈ Rn and ε > 0, there exist δ > 0 and, for each α ∈ Λ, δα > 0 such that for all h ∈ B(0, δ) and β ∈ B(α, δα),

‖Jα(x + h) − Jβ(x + h)‖ ≤ ε.

Then G : Rn → M(n,R) defined by

(G(x))_{ij} = (J^{αi(x)}(x))_{ij}, with αi(x) chosen from Λi(x),   (3.5)

is a slant derivative of F, for any choice of {αi(x)}_{i=1}^n.

Proof. Recall that Λi(x) is non-empty for every x ∈ Rn and i ∈ {1, . . . , n}. Let x ∈ Rn and ε > 0 be fixed. For all h ∈ Rn, all i ∈ {1, . . . , n} and all β ∈ Λi(x + h), α ∈ Λi(x), we have

Ei(h, β) := (Aα(x) − dα)_i − (Aβ(x) − dβ)_i ≥ 0.

Note that Ei(h, β) is well defined because it is the same for all choices of α ∈ Λi(x). Similarly,

(Aα(x + h) − dα)_i − (Aβ(x + h) − dβ)_i ≤ 0.

So

(dβ − dα)_i ≤ (Aβ(x + h) − Aα(x + h))_i.

Therefore

|Ei(h, β)| ≤ |(Aα(x) − Aα(x + h))_i − (Aβ(x) − Aβ(x + h))_i|.   (3.6)

By hypothesis 3 and by the triangle inequality, we have, for i ∈ {1, . . . , n},

|Ei(h, β)| ≤ 2L|h|.   (3.7)

By hypothesis 5, there is δ1 > 0 and, for each α ∈ Λ, there exists δα > 0 such that for any β ∈ B(α, δα),

‖Jα(x + h) − Jβ(x + h)‖ ≤ ε for all h ∈ B(0, δ1).   (3.8)

So for each i ∈ {1, . . . , n}, define

λi = ⋃_{α∈Λi(x)} B(α, δα),   (3.9)

and let λi^c denote the complement of λi in Λ.

Continuity of Λi(x). Firstly, we wish to show that for small h, Λi(x + h) ⊂ λi for every i ∈ {1, . . . , n}. Let us suppose, on the contrary, that there exist {hm}∞m=1 → 0 and {βm}∞m=1 ⊂ Λ such that for each m ∈ N,

βm ∈ Λ_{im}(x + hm) ∩ λ_{im}^c,

for some im ∈ {1, . . . , n}. By finiteness of {1, . . . , n}, there exist i ∈ {1, . . . , n} and subsequences, also denoted {hm}∞m=1 and {βm}∞m=1, with βm ∈ Λi(x + hm) ∩ λi^c for all m ∈ N.

Since Λ is compact and λi^c is closed, there exist a subsequence {β_{mj}}_j of {βm}_m and β∗ ∈ λi^c such that

β∗ = lim_{j→∞} β_{mj}.

Recall inequality (3.7): we have

|Ei(h_{mj}, β_{mj})| ≤ 2L|h_{mj}|.

³By 'the' Lipschitz constant we mean the infimum of all Lα ≥ 0 such that ‖Aα(x) − Aα(y)‖ ≤ Lα|x − y|.


By hypothesis 2 and from the fact that h_{mj} → 0 as j → ∞, we have, for any α ∈ Λi(x),

|(Aα(x) − dα)_i − (A^{β∗}(x) − d^{β∗})_i| = lim_{j→∞} |(Aα(x) − dα)_i − (A^{β_{mj}}(x) − d^{β_{mj}})_i| = lim_{j→∞} |Ei(h_{mj}, β_{mj})| ≤ 2L lim_{j→∞} |h_{mj}| = 0.   (3.10)

Therefore β∗ ∈ Λi(x) ⊂ λi, which contradicts β∗ ∈ λi^c.

Hence there exists δ2 > 0 such that for all h ∈ B(0, δ2), Λi(x + h) ⊂ λi for each i ∈ {1, . . . , n}. Now let δ∗ = min(δ1, δ2, δ3), where δ3 is the δ given by hypothesis 4.

Slant differentiability. Before the final step, let us recall the facts: for every h ∈ B(0, δ∗), we have

Λi(x + h) ⊂ λi for all i ∈ {1, . . . , n},   (3.11)

(1/|h|) |Aα(x + h) − Aα(x) − Jα(x + h)h| ≤ ε for all α ∈ Λ,   (3.12)

‖Jα(x + h) − Jβ(x + h)‖ ≤ ε for all α ∈ Λi(x) and β ∈ B(α, δα).   (3.13)

In particular, (3.11) implies that for any chosen βi ∈ Λi(x + h) used to define G, there exists αi ∈ Λi(x), dependent on βi, such that βi ∈ B(αi, δ_{αi}). So for each i ∈ {1, . . . , n} and h ∈ B(0, δ∗), by the triangle inequality and (3.5),

|(F(x + h) − F(x) − G(x + h)h)_i| = |(A^{βi}(x + h) − d^{βi})_i − (A^{αi}(x) − d^{αi})_i − (J^{βi}(x + h)h)_i|
≤ |(A^{βi}(x + h) − A^{βi}(x) − J^{βi}(x + h)h)_i| + |Ei(h, βi)|
≤ ε|h| + |Ei(h, βi)|.   (3.14)

By (3.6) and the above facts,

|Ei(h, βi)| ≤ |(A^{αi}(x) − A^{αi}(x + h))_i − (A^{βi}(x) − A^{βi}(x + h))_i|
≤ |(A^{αi}(x) − A^{αi}(x + h) + J^{αi}(x + h)h)_i| + |(A^{βi}(x + h) − A^{βi}(x) − J^{αi}(x + h)h)_i|
≤ ε|h| + |(J^{βi}(x + h)h − J^{αi}(x + h)h)_i| + |(A^{βi}(x + h) − A^{βi}(x) − J^{βi}(x + h)h)_i|
≤ 3ε|h|.   (3.15)

Since (3.14) and (3.15) hold for all i ∈ {1, . . . , n}, we conclude that

(1/|h|) |F(x + h) − F(x) − G(x + h)h| ≤ 4ε,   (3.16)

which completes the proof.

Therefore we now have a general method of finding slant derivatives for arbitrary discrete HJB equations which satisfy the conditions of theorem 3.6.

For linear Aα, an important simplification occurs.

Corollary 3.7. Suppose that for each α ∈ Λ, Aα is a linear map. Theorem 3.6 holds if Λ is a compact metric space and α ↦ Aα and α ↦ dα are continuous; in which case the map x ↦ G(x) defined by equation (2.2) is a slant derivative of F.

Proof. Condition 3 holds because Λ is compact and Aα is linear. We must check that conditions 4 and 5 are automatically satisfied for some slant derivative Jα of Aα. However, Aα is a slant derivative of Aα because Aα is linear. Therefore condition 4 is automatically satisfied. Continuity immediately implies condition 5. Finally, note that in equation (3.5), Jα can be taken to be Aα, which is equivalent to defining G(x) via equation (2.2).


Example 3.8. For the problem given in example 3.5, we took Jα as

Jα(x1, x2) = [ 6 + αx2   αx1 ; 2   6 ],   (3.17)

and then G was given by

G(x1, x2) = [ 6 + sign(x1x2)x2   sign(x1x2)x1 ; 2   6 ].   (3.18)

Aα is continuously differentiable, and therefore satisfies conditions 2 and 3. Explicit calculations give

(1/|h|) |Aα(x + h) − Aα(x) − Jα(x + h)h| ≤ |α| |h|,

thus verifying condition 4. In fact this shows that convergence could be quadratic, which turns out to be the case if x∗1 ≠ 0 and x∗2 ≠ 0. Also there is C ≥ 0 such that

‖Jα(x) − Jβ(x)‖ ≤ C |α − β| |x|,

thus satisfying condition 5. Finally, if x is small enough, then for any α, we may conclude that G satisfies the conditions of theorem 3.4. This shows why we observed the convergence rates reported in example 3.5.
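The convergence behaviour reported in example 3.5 can be reproduced directly from (3.18), since the row-wise maximisation over α ∈ [−1, 1] is attained at α = sign(x1x2). A short NumPy sketch (illustrative; the iteration count is arbitrary) is:

import numpy as np

def F(x):
    # F(x) = max_{alpha in [-1,1]} (6 x1 + alpha x1 x2 - 1, 2 x1 + 6 x2);
    # the maximum of alpha*x1*x2 over alpha in [-1,1] is |x1*x2|.
    return np.array([6*x[0] + abs(x[0]*x[1]) - 1.0, 2*x[0] + 6*x[1]])

def G(x):
    s = np.sign(x[0]*x[1])
    return np.array([[6 + s*x[1], s*x[0]], [2.0, 6.0]])   # equation (3.18)

y = np.zeros(2)
for k in range(6):
    y = y - np.linalg.solve(G(y), F(y))    # semi-smooth Newton step of algorithm 1
    print(k + 1, y, np.max(np.abs(F(y))))
# converges in a few steps to approximately (0.16515, -0.05505), cf. example 3.5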

3.4. Global convergence in the linear case. The following result is originally due to [7].

Proposition 3.9. Let Λ be a compact metric space and, for each α ∈ Λ, let α ↦ Aα and α ↦ dα be continuous, with each Aα a linear map. If, for G defined by (2.2), {‖G(x)^{−1}‖ | x ∈ Rn} is bounded, then for every choice y1 ∈ Rn, the sequence defined by the semi-smooth Newton method converges to a solution of (1.1).

Proof. The iterates are defined by

G(ym)(y_{m+1} − ym) = −F(ym) = −G(ym)ym + d(ym),

thus

G(ym) y_{m+1} = d(ym).   (3.19)

Recall that compactness and continuity of dα imply inequality (2.4), which says there is C0 ≥ 0 such that

|d(x)| ≤ C0 for all x ∈ Rn.

By hypothesis there is C1 ≥ 0 such that for any x ∈ Rn, ‖G(x)^{−1}‖ ≤ C1, hence

|y_{m+1}| ≤ C0 C1.

Thus, by boundedness, there is a convergent subsequence, also denoted {ym}, with limit x∗. We must now show that x∗ satisfies F(x∗) = 0.

The proof of theorem 3.6 included a result on the continuous dependence of the set-valued maps x ↦ Λi(x). For the converging sequence {ym} ⊂ Rn, any choice of G defines a sequence {αm} ⊂ Λⁿ. By compactness of Λⁿ, there exists a convergent subsequence of {αm}, similarly denoted {αm}. By continuous dependence of Λi, we conclude that

lim_{m→∞} (αm)_i ∈ Λi(x∗), i ∈ {1, . . . , n}.

Continuity of Aα and dα then implies that there exist

G∗ = lim_{m→∞} G(ym) and d∗ = lim_{m→∞} d(ym)

that satisfy equations (2.2) and (2.3), i.e.

F(x∗) = G∗ x∗ − d∗.

Therefore, combining

lim_{m→∞} [ G(ym) y_{m+1} − d(ym) ] = G∗ x∗ − d∗ = F(x∗)


with equation (3.19) gives F(x∗) = 0.

Convergence of the entire sequence {ym} then follows from theorem 3.4.

Corollary 3.10. Under the hypotheses of theorem 2.6, suppose furthermore that dα ≥ 0 for all α ∈ Λ and that Aα is non-singular for all α ∈ Λ. Then for every α ∈ Λ, the unique solution x∗ of (2.7) satisfies

0 ≤ x∗ ≤ xα,   (3.20)

where xα solves Aα xα = dα.

Proof. Let α ∈ Λ and let xα solve Aα xα = dα. Note that F(xα) ≥ 0 implies that

G(xα) xα ≥ d(xα) ≥ 0;

so xα ≥ 0. Set y1 = xα and let {ym} be the sequence defined by the semi-smooth Newton algorithm. Theorem 2.6 and proposition 3.9 imply that {ym} tends to x∗, the unique solution of F(x∗) = 0. Furthermore, since

G(ym) y_{m+1} = d(ym),

we have y_{m+1} ≥ 0, F(y_{m+1}) ≥ 0 and

G(ym)(y_{m+1} − ym) = −F(ym) ≤ 0.

Therefore y_{m+1} ≤ ym. Inequality (3.20) is deduced by induction on m.

3.5. Conclusion. In this chapter, two alternative results were given for deducing existence and uniqueness of solutions to discrete HJB equations, namely theorems 2.2 and 2.6. The notion of slant differentiability allowed us to show, in theorem 3.4, the possibility of superlinear convergence for the semi-smooth Newton method applied to slantly differentiable functions. Furthermore, a candidate for the slant derivative of the discrete HJB equation was found in theorem 3.6.

The principal outcome of this chapter is the following. Firstly, for the time-dependent HJB equation considered in this work, discretisations that use the method of lines can often be chosen to give discrete HJB equations that are solvable. The uniqueness of these solutions is the principal difficulty to be considered when choosing a discretisation.

Secondly, under the assumptions on the optimal control problem considered in this work, these nonlinear equations can be solved with a superlinearly convergent algorithm. This is valid for both monotone and non-monotone methods. It is sometimes possible to choose the discretisation such that the algorithm exhibits global convergence.

This algorithm will be applied to the Kushner-Dupuis scheme for a model problem in paragraph 5.3 of chapter 6.


CHAPTER 5

Envelopes

1. Introduction

This brief chapter introduces an analytical tool that is a key ingredient of the proofs of convergence for various numerical methods discussed in later chapters. Given an arbitrary sequence of functions defined on some subsets of a set Q ⊂ Rn, we may always construct two further functions, called the upper and lower envelopes of the sequence, which resemble the notions of limit superior and limit inferior for sequences of real numbers.

In order to treat a number of different situations in which envelopes will be used in chapters 6 and 7, we treat envelopes in a general setting. A number of results on envelopes are used implicitly in certain sources, e.g. [16, chapter 9], yet are not treated in full detail. This chapter therefore develops these well-known results, all proofs given in this chapter having been obtained independently.

The primary use of envelopes in this work will be in the proofs of convergence of numerical methods for finding viscosity solutions, given in subsequent chapters. To briefly give an indication of the content of these chapters, consider a bounded sequence {xn}∞n=1 ⊂ R. Then it is known that if lim inf_n xn ≥ lim sup_n xn, the sequence converges.

In a similar way, in subsequent chapters we will aim to show that the upper envelope of a sequence of numerical approximations lies below the lower envelope of the sequence. This will in turn imply convergence of the numerical approximations, from which it will be deduced that the approximations converge to the viscosity solution of the continuous problem.

Section 2 develops the already well-known results on envelopes, which serve as auxiliary results for chapters 6 and 7. Section 3, however, presents some original results, with a different objective: it is aimed towards finding arguments to show convergence to viscosity solutions of non-monotone numerical methods. A sketch of how this might be applied is given at the end of section 3.

2. Basics of envelopes

Definition 2.1 (Envelopes). Let Q ⊂ Rn and, for each x ∈ Q, let {Sn(x, ε)}_{n∈N, ε>0} be a collection of eventually non-empty subsets of Q, in the sense that for each x ∈ Q and ε > 0, there exists N ∈ N such that for all n ≥ N, Sn(x, ε) ≠ ∅.

Let {un}∞n=1 be a family of functions such that for each x ∈ Q and ε > 0, there exists N ∈ N such that for all n ≥ N, un is defined and real-valued on Sn(x, ε).

We assume that, for a given norm of Rn, for all x ∈ Q, n ∈ N, ε > 0,

Sn(x, ε) ⊂ B(x, ε) ∩ Q,   (2.1)

where B(x, ε) = {y ∈ Rn | ‖y − x‖ < ε}. We also assume that for all x ∈ Q, n ∈ N and 0 < ε1 < ε2,

Sn(x, ε1) ⊂ Sn(x, ε2).   (2.2)

For each x ∈ Q and ε > 0, define

D̄(x, ε) = lim sup_n sup_{y∈Sn(x,ε)} un(y).

Then define the upper envelope u* of {un}_{n∈N} by

u*(x) = lim_{ε→0} D̄(x, ε).   (2.3)


Similarly define

D̲(x, ε) = lim inf_n inf_{y∈Sn(x,ε)} un(y),

and define the lower envelope u_* of {un}_{n∈N} by

u_*(x) = lim_{ε→0} D̲(x, ε).   (2.4)

Remark 2.2. Henceforth, we will only treat the upper envelope, as all results given in this chapter have their counterparts for the lower envelope. In addition, in the following discussion it will be tacitly assumed that the conditions in the above definition hold.

The upper envelope u* is well defined because (2.2) implies that D̄(x, ε) is an increasing function of ε, and thus the limit in (2.3) exists in the extended reals and is unique. In particular, for all ε > 0,

u*(x) ≤ D̄(x, ε),   (2.5)

where both values may possibly be ∞.

Example 2.3. In general, it is not true that un real valued implies that u* is real valued as well. Take

un(x) = u(x) = −1/x if x > 0, and u(x) = 0 if x = 0.

Now let Sn(x, ε) = [0, ε] and S̃n(x, ε) = (0, ε]. Then

lim_{ε→0} lim sup_n sup_{y∈Sn(0,ε)} u(y) = 0,   (2.6)

lim_{ε→0} lim sup_n sup_{y∈S̃n(0,ε)} u(y) = lim_{ε→0} (−1/ε) = −∞.   (2.7)

In [5] or [16], the envelopes are sometimes written as

u*(x) = lim sup_{y→x, n→∞} un(y).

The following result makes the connection between these formulations.

Proposition 2.4. For each x ∈ Q, there exist {nj}∞j=1 ⊂ N, {yj}∞j=1 ⊂ Q and {εj}∞j=1 ⊂ (0, ∞) such that for all j ∈ N, yj ∈ S_{nj}(x, εj) and

lim_{j→∞} yj = x,   (2.8a)

lim_{j→∞} u_{nj}(yj) = u*(x).   (2.8b)

Proof. If u*(x) = ∞, then for all ε > 0, D̄(x, ε) = ∞. Since {Sn(x, ε)}_{n∈N, ε>0} is eventually non-empty, for all M ∈ R there exists n ∈ N such that

sup_{y∈Sn(x,ε)} un(y) > M,

hence there exists y ∈ Sn(x, ε) with un(y) > M. By inductively choosing {Mj}∞j=1 an unbounded increasing sequence, we may choose {nj}∞j=1 ⊂ N and {yj}∞j=1 ⊂ Q satisfying the claim.

Now suppose u*(x) ∈ R. Then for all δ > 0, there is εδ > 0 such that for all ε ≤ εδ,

u*(x) − δ/2 < D̄(x, ε) < u*(x) + δ/2.

So for any N ∈ N, there exists nε ∈ N, nε ≥ N, such that

u*(x) − δ/2 < sup_{y∈S_{nε}(x,ε)} u_{nε}(y) < u*(x) + δ/2.

So there exists y ∈ S_{nε}(x, ε) such that

u*(x) − δ < u_{nε}(y) < u*(x) + δ/2.


Therefore, by inductively choosing sequences {δj}∞j=1 and {εj}∞j=1 → 0 with εj ≤ ε_{δj}, we obtain the required sequences {nj}∞j=1 ⊂ N and {yj}∞j=1 ⊂ Q. The final case, u*(x) = −∞, is verified in a similar manner.

Proposition 2.5. Suppose that for all x ∈ Q and all ε > 0, there exists δε > 0 such that for all y ∈ B(x, δε), there exist εy > 0 and Ny ∈ N such that

Sn(y, ε∗) ⊂ Sn(x, ε) for all ε∗ ≤ εy, n ≥ Ny.

Then u* is upper semi-continuous on Q.

Proof. Fix x ∈ Q. Let {xm}∞m=1 → x. Then for each ε > 0, there exists Mε ∈ N such that xm ∈ B(x, δε) for all m ≥ Mε, where δε is as in the hypothesis. By hypothesis, for m ≥ Mε,

Sn(xm, ε∗) ⊂ Sn(x, ε) for all ε∗ ≤ ε_{xm}, n ≥ N_{xm}.

Hence, for all ε∗ ≤ ε_{xm} and n ≥ N_{xm},

sup_{y∈Sn(xm,ε∗)} un(y) ≤ sup_{y∈Sn(x,ε)} un(y).   (2.9)

Therefore, recalling (2.5), u*(xm) ≤ D̄(xm, ε∗) ≤ D̄(x, ε) for all m ≥ Mε. So for every ε > 0,

lim sup_m u*(xm) ≤ D̄(x, ε).

Now let ε → 0 to obtain lim sup_m u*(xm) ≤ u*(x). We remark that we do not require an upper semi-continuous function to necessarily take values in [−∞, ∞).

Proposition 2.6. Let x ∈ Q. If {xn}∞n=1 → x and, for each ε > 0, there is N ∈ N such that xn ∈ Sn(x, ε) for all n ≥ N, then

lim sup_n un(xn) ≤ u*(x).

Proof. For all δ > 0, there are ε0 > 0 and N∗ ∈ N such that for all n ≥ N∗ we have

sup_{y∈Sn(x,ε0)} un(y) ≤ u*(x) + δ.

So for each n ≥ max(N, N∗), using the hypothesis, we have xn ∈ Sn(x, ε0) and un(xn) ≤ u*(x) + δ. Therefore, for all δ > 0, there is N ∈ N such that for all n ≥ N, un(xn) ≤ u*(x) + δ. This is

lim sup_n un(xn) ≤ u*(x).

Lemma 2.7. Suppose that for every n ∈ N there exists Sn ⊂ Q such that for every x ∈ Q and ε > 0,

Sn(x, ε) = Sn ∩ B(x, ε),

or alternatively,

Sn(x, ε) = Sn ∩ B̄(x, ε),

where B̄(x, ε) denotes the closed ball. Let v : Q → R be a continuous function and {vn}∞n=1 a sequence of functions, vn : Sn → R, such that

lim_{n→∞} sup_{x∈Sn} |v(x) − vn(x)| = 0.   (2.10)

Let (u + v)* be the upper envelope of the sequence {un + vn}∞n=1. Then

(u + v)* = u* + v.   (2.11)

Proof. By proposition 2.4, for each x ∈ Q there are {yj}∞j=1 ⊂ Q and {nj}∞j=1 ⊂ N such that yj ∈ S_{nj}(x, εj) for some εj > 0, {εj}∞j=1 → 0, and

lim_{j→∞} yj = x,   lim_{j→∞} u_{nj}(yj) = u*(x).


By assumption (2.2), for all ε > 0 there is J ∈ N such that yj ∈ S_{nj}(x, ε) for all j ≥ J. By proposition 2.6 and using the fact that the limit superior is the largest accumulation point, we have

(u + v)*(x) ≥ lim_{j→∞} ( u_{nj}(yj) + v_{nj}(yj) ) = u*(x) + v(x),

where the last equality is deduced from continuity of v and (2.10). But because un = (un + vn) − vn, by applying the same argument, where un + vn plays the role of un and −vn plays the role of vn, we obtain u*(x) ≥ (u + v)*(x) − v(x).

Proposition 2.8. Suppose that for every n ∈ N there exists Sn ⊂ Q such that for all ε > 0, Sn(x, ε) is a compact set of the form

Sn(x, ε) = Sn ∩ B̄(x, ε).   (2.12)

Let v : Q → R be continuous and {vn}∞n=1 a sequence of functions, vn : Sn → R, such that

lim_{n→∞} sup_{x∈Sn} |v(x) − vn(x)| = 0.

Suppose that {un}∞n=1 is such that for all x ∈ Q and ε > 0, there exists N ∈ N such that for all n ≥ N, Sn(x, ε) is non-empty and un is defined and upper semi-continuous on Sn(x, ε).

If u* − v has a strict maximum at x̄ ∈ Q over B(x̄, ε0) ∩ Q for some ε0 > 0, then for every ε ∈ (0, ε0) there exist {nj}∞j=1 ⊂ N and {xj}∞j=1 ⊂ Q such that xj ∈ S_{nj}(x̄, ε) and

(u_{nj} − v_{nj})(xj) = max_{S_{nj}(x̄,ε)} (u_{nj} − v_{nj}),   (2.13a)

lim_{j→∞} (u_{nj} − v_{nj})(xj) = u*(x̄) − v(x̄),   (2.13b)

lim_{j→∞} xj = x̄.   (2.13c)

Proof. In view of lemma 2.7, it is sufficient to consider the case v = 0, vn = 0. By proposition 2.4, there exist {nj}∞j=1 and {yj}∞j=1 such that

lim_{j→∞} yj = x̄,   lim_{j→∞} u_{nj}(yj) = u*(x̄).

For n sufficiently large, Sn(x̄, ε) is compact and non-empty and un is defined and upper semi-continuous on Sn(x̄, ε). Therefore the maximum of u_{nj} over S_{nj}(x̄, ε) is attained for j sufficiently large, and for j sufficiently large, yj ∈ S_{nj}(x̄, εj) ⊂ S_{nj}(x̄, ε). Therefore there exists xj ∈ S_{nj}(x̄, ε) satisfying (2.13a) and

u_{nj}(yj) ≤ u_{nj}(xj).   (2.14)

Therefore lim inf_j u_{nj}(xj) ≥ u*(x̄). Compactness and the fact that ε < ε0 imply that xj → y ∈ B(x̄, ε0) up to a subsequence. By equation (2.12), for each ε̂ > 0 sufficiently small, xj ∈ S_{nj}(y, ε̂) for j sufficiently large. Thus we use proposition 2.6 to find

u*(x̄) ≤ lim inf_j u_{nj}(xj) ≤ lim sup_j u_{nj}(xj) ≤ u*(y).   (2.15)

Since y ∈ B(x̄, ε0) and the maximum of u* is strict over B(x̄, ε0), this implies that y = x̄, since otherwise there would be a contradiction. The inequalities in (2.15) then imply that u_{nj}(xj) → u*(x̄).

3. Further results on envelopes

The following results are original and are directed towards determining which broad features might be required of a non-monotone numerical scheme for finding viscosity solutions of PDE.

Although these results concern primarily the envelopes, the reader might understand the intention better after becoming acquainted with the Barles-Souganidis convergence argument of [5] for monotone numerical methods, which is presented in section 3 of chapter 6. Nonetheless, in brief, in the Barles-Souganidis convergence argument it is the monotonicity of the discretised operators which is used to ensure that the envelopes are, ultimately, viscosity solutions of the original PDE.


For non-monotone methods, we took interest in the idea of using the degenerate ellipticity of the non-discretised operator as the primary means of ensuring that the envelopes are viscosity solutions. It would then seem natural to require enough differentiability of the approximations, at least on some localised subsets of the computational domain, to justify the application of the non-discretised operator to the approximation.

Some numerical methods yield approximations that have such localised regularity, for example methods using discontinuous finite element spaces. One may then think of the sets Sn(x, ε) as being the sets over which such smoothness is guaranteed. The question to be answered is thus: which broad properties should these methods have (and hence which conditions should the sets Sn(x, ε) satisfy) to enable this strategy?

We have found certain abstract sufficient conditions that enable us to outline the essential features of a convergence argument. Although a concrete example of a numerical scheme satisfying these conditions is not presented here, the aim is to show how the structure of the envelopes might be exploited.

3.1. Further results. We maintain the assumptions and definitions of section 2.

Lemma 3.1. Let O ⊂ Q be a compact set and let k ∈ R. Suppose that {Sn(x, ε)}_{n∈N, ε>0} satisfies the following conditions. Assume that for all x ∈ Q, n ∈ N and ε > 0, Sn(x, ε) is an open set and that there is N_{x,ε} such that

(1) x ∈ Sn(x, ε) for all n ≥ N_{x,ε},
(2) Sn(x, ε) ⊂ Sm(x, ε) for all N_{x,ε} ≤ n ≤ m.

If k ∈ R is such that sup_O u*(x) < k, then there exists N ∈ N such that for all n ≥ N,

sup_O un < k.   (3.1)

Proof. This result is mainly an application of compactness. Because k is finite, for each x ∈ O there is εx > 0 such that for all ε ≤ εx,

D̄(x, ε) < k.

By hypothesis 1, for each x ∈ O there is Nx = N_{x,εx} such that x ∈ S_{Nx}(x, εx) ⊂ Sn(x, εx) for all n ≥ Nx. Therefore

{S_{Nx}(x, εx)}_{x∈O}

is an open cover of O. So there is a finite subcover, given by {(xi, εi, Ni)}_{i=1,...,M}, such that

O ⊂ ⋃_{i=1}^M S_{Ni}(xi, εi),

and

D̄(xi, εi) < k.

For each xi there is Ñi such that for all n ≥ Ñi,

sup_{y∈Sn(xi,εi)} un(y) < k.   (3.2)

So let N∗ = max_{i=1,...,M} max(Ni, Ñi). Then, for each y ∈ O and n ≥ N∗, by hypothesis 2, y ∈ Sn(xi, εi) for some i. Therefore, by (3.2),

un(y) ≤ sup_{y′∈Sn(xi,εi)} un(y′) < k,

which, after taking the supremum over y ∈ O, completes the proof.

This leads to the following result, inspired by [12], but with an additional property. In chapter 3, parabolic superjets and subjets were defined. For simplicity, the following result is formulated for elliptic superjets.

For a set Q ⊂ Rn and u : Q → R, the elliptic superjet J+_Q u(x) is defined to be the set of all (p, P) ∈ Rn × S(n,R) such that for every δ > 0 there is ε > 0 such that for all h ∈ B(0, ε) with x + h ∈ Q,

u(x + h) ≤ u(x) + p · h + (1/2) h^T P h + δ|h|².


Theorem 3.2. Suppose that the conditions of lemma 3.1 hold. Let {un}∞n=1 be a collection of real-valued upper semi-continuous functions defined on an open set Q and let x0 ∈ Q. Suppose that u* is real valued on Q.

If (p, P) ∈ J+_Q u*(x0), then there exist {nm}∞m=1, {xm}∞m=1 and (pm, Pm) ∈ J+_Q u_{nm}(xm) such that xm ∈ S_{nm}(x0, εm) for some εm > 0 and

lim_{m→∞} (xm, u_{nm}(xm), pm, Pm) = (x0, u*(x0), p, P).   (3.3)

Proof. Without loss of generality, assume that x0 = 0. For each δ > 0, there is ε > 0 such that for all x ∈ B(0, ε) ⊂ Q,

u*(x) − (p · x + (1/2) x^T P x + δ|x|²) ≤ u*(0).

Write v(x) = p · x + (1/2) x^T P x + δ|x|². Then, with similar arguments to lemma 2.7,

(u − v)*(x) ≤ u*(0).

By proposition 2.4, there are {nj}∞j=1 and {yj}∞j=1, yj ∈ S_{nj}(0, ε), with yj → 0 and u_{nj}(yj) → u*(0) as j → ∞. Since u_{nj} is upper semi-continuous and real valued, let xj ∈ S_{nj}(0, ε) be a chosen maximum point of u_{nj}(x) − v(x) − δ|x|². Then, in particular,

u_{nj}(yj) − v(yj) − δ|yj|² ≤ u_{nj}(xj) − v(xj) − δ|xj|².   (3.4)

By compactness, up to a subsequence, {xj}∞j=1 → x̄. For all µ > 0, by lemma 3.1, there is N such that for all n ≥ N and x ∈ Sn(0, ε),

un(x) − v(x) < u*(0) + µ.   (3.5)

So we have u_{nj}(xj) − v(xj) < u*(0) + µ for nj ≥ N, and hence

lim sup_j [ u_{nj}(xj) − v(xj) ] ≤ u*(0).   (3.6)

Letting j → ∞ in (3.4) and using (3.6), we obtain

u*(0) ≤ u*(0) − δ|x̄|².

Since u* is real valued, this implies x̄ = 0. Again by compactness, this implies that the entire sequence xj → 0 as j → ∞. Since v is continuous and v(0) = 0, from (3.4) we have u*(0) ≤ lim inf_j u_{nj}(xj). Hence u_{nj}(xj) → u*(0) as j → ∞.

We will use the conditions of lemma 3.1 again. Assume, without loss of generality, that n1 ≥ N_{0,ε}, where N_{0,ε} is given by hypotheses 1 and 2 of lemma 3.1. By openness of S_{n1}(0, ε) and by hypothesis 2 of lemma 3.1, we conclude that there is an open neighbourhood of 0 contained in ⋂_{j=1}^∞ S_{nj}(0, ε). Since xj eventually reaches this neighbourhood, we conclude that there is Jδ such that for all j ≥ Jδ, xj ∈ S_{nj}(0, ε). Therefore the openness of Sn(0, ε) and Q implies that for j ≥ Jδ,

(pj, Pj) = (p + 4δxj + Pxj, P + 4δI) ∈ J+_Q u_{nj}(xj).

Finally, we may inductively choose δm → 0 and, using m > J_{δm} to define the desired quantities, we obtain the required result.

Remark. The proof furthermore tells us that we may take {εm}m → 0.

3.2. Application to non-monotone methods. We now sketch some of the main ideas that might find use in attempting to show convergence of a non-monotone numerical method. We will consider an abstract numerical method which satisfies a number of assumptions stated below. It is not known to us if a method verifying these conditions exists, yet the interest here is primarily the strategy behind the arguments used to justify the viscosity properties of the envelopes. Improvements in the preceding results would allow weakenings of the assumptions on the scheme.

Consider the equation

F(x, Du(x), D²u(x)) = 0 on U,   (3.7)

where U ⊂ Rn is a bounded open set and F is a degenerate elliptic operator, with F continuous on U × Rn × S(n,R).


For a parameter h > 0, let Gh = {x_i^h}_{i=1}^{N_h} ⊂ U be a finite set of points and uh a real-valued upper semi-continuous function.

We assume that there exists a function f : (0, ∞) → (0, ∞) such that for every ε > 0 there exist h0 > 0, C ≥ 0 such that for all h < h0,

sup_{x∈U} min_{1≤i≤N_h} |x − x_i^h| < ε;   (3.8a)

uh is twice differentiable on ⋃_{i=1}^{N_h} B(x_i^h, f(ε));   (3.8b)

max_{1≤i≤N_h} sup_{y∈B(x_i^h, f(ε))} |F(y, Duh(y), D²uh(y))| < ε;   (3.8c)

sup_{x∈U} |uh(x)| ≤ C.   (3.8d)

Lemma 3.3. Define, for x ∈ U and ε, h > 0,

Sh(x, ε) = U ∩ B(x, ε) ∩ ⋃_{i=1}^{N_h} B(x_i^h, f(ε)).   (3.9)

Let {hn}∞n=1 → 0 be a monotone sequence of strictly positive real numbers. For shorthand write Sn(x, ε) = S_{hn}(x, ε). Then the conditions of proposition 2.5, of lemma 3.1 and of theorem 3.2 are all satisfied.

Proof. Firstly, Sh(x, ε) is open, as it is a finite intersection of open sets. Secondly, for each ε > 0, by assumption (3.8a) there exists h0 = h0(min{ε, f(ε)}) such that for all x ∈ U and all h < h0 there exists x_i^h ∈ Gh with

|x − x_i^h| < min{ε, f(ε)};

hence x ∈ Sh(x, ε) for h < h0, and the sets are eventually non-empty. Furthermore, for h < h0, the previous statement gives that

U ⊂ ⋃_{i=1}^{N_h} B(x_i^h, f(ε)),

hence Sh(x, ε) = U ∩ B(x, ε) for h < h0. Since {hn}∞n=1 is a monotone sequence tending to 0, there exists N such that for n ≥ m ≥ N, i.e. hn ≤ hm < h0, Sn(x, ε) = Sm(x, ε) = U ∩ B(x, ε). So the hypotheses of lemma 3.1 and theorem 3.2 are satisfied.

Finally, for any x ∈ U, for y ∈ B(x, ε/2) and n ≥ N,

Sn(y, ε/2) ⊂ Sn(x, ε).

This shows that Sh(x, ε) satisfies the conditions of proposition 2.5.

Proposition 3.4. With the definitions of lemma 3.3 and the above assumptions, define u* to be the upper envelope of the sequence {un}∞n=1 = {u_{hn}}∞n=1,

u*(x) = lim_{ε→0} lim sup_n sup_{y∈Sn(x,ε)} un(y).

Then u* is a real-valued upper semi-continuous function on U and u* is a viscosity subsolution of (3.7).

Proof. Assumption (3.8d) and the fact that the uh are upper semi-continuous show that |u*| ≤ C on U, thus it is real valued. Lemma 3.3 shows that proposition 2.5 applies, thus u* is upper semi-continuous on U.

To show that u* is a subsolution, we argue by contradiction. Assume that there exist µ > 0, x ∈ U and (p, P) ∈ J+_U u*(x) such that

F(x, p, P) ≥ µ > 0.   (3.10)

By theorem 3.2, there exist sequences {nm} and {xm} such that

lim_{m→∞} (xm, u_{nm}(xm), pm, Pm) = (x, u*(x), p, P),


with (pm, Pm) ∈ J+_U u_{nm}(xm) and xm ∈ S_{nm}(x, εm) for some εm > 0, εm → 0 as m → ∞.

By continuity of F, there is δ > 0 such that

|F(ỹ, p̃, P̃) − F(x, p, P)| < µ/4

whenever

|x − ỹ| + |p − p̃| + ‖P − P̃‖ < δ.

By convergence of (xm, pm, Pm) and by hypothesis (3.10), there is therefore M1 such that for all m ≥ M1,

F(xm, pm, Pm) ≥ 3µ/4.

By assumptions (3.8b) and (3.8c), there exists M2 such that for m ≥ M2,

max_{1≤i≤N_{hnm}} sup_{y∈B(x_i, f(µ/4))} |F(y, Du_{nm}(y), D²u_{nm}(y))| < µ/4.

Since εm → 0, there is M3 such that for all m ≥ M3, xm ∈ S_{nm}(x, εm) ⊂ S_{nm}(x, µ/4).

Because u_{nm} is twice differentiable on the open set S_{nm}(x, µ/4) and (pm, Pm) ∈ J+_U u_{nm}(xm), we have pm = Du_{nm}(xm) and D²u_{nm}(xm) ≤ Pm.

Therefore, for m ≥ max(M1, M2, M3), by hypothesis (3.8b) and by degenerate ellipticity of F,

F(xm, pm, Pm) ≤ F(xm, Du_{nm}(xm), D²u_{nm}(xm)) < µ/4.

This contradicts the lower bound derived from (3.10); hence for any (p, P) ∈ J+_U u*(x),

F(x, p, P) ≤ 0,

thus showing that u* is a viscosity subsolution of (3.7).

The remainder of a strategy for showing convergence to a viscosity solution would then be similar to the Barles-Souganidis convergence argument. One would show that u_* is a viscosity supersolution and, provided u* = u_* on the boundary ∂U, one would use a comparison property to establish that u* = u_* in U, hence showing convergence of the numerical scheme to a viscosity solution of (3.7).


CHAPTER 6

Monotone Finite Difference Methods

1. Introduction

This chapter is about how monotone numerical schemes can be used to approximate the viscosity solution of the HJB equation. In particular, this chapter explores the Barles-Souganidis convergence argument, originally detailed in [5], and also presents some recent advances due to Barles and Jakobsen, in [4], on obtaining error rates for the unbounded domain problem.

These two principal theoretical results are general in the sense that they apply to any numerical method satisfying certain conditions. This chapter illustrates these results through the Kushner-Dupuis finite difference scheme.

Proving convergence of numerical methods to viscosity solutions is not a simple task, in part because there is no operator equation for the viscosity solution. For this reason, the emphasis of this chapter is on the issue of convergence.

The plan for this chapter is the following. After a brief reminder on difference methods, in section 2 we introduce the Kushner-Dupuis scheme and analyse some of its important properties. Following this, section 3 presents the Barles-Souganidis convergence argument to prove that the limiting upper (lower) envelope of a monotone finite difference method is a subsolution (supersolution) of the HJB equation.

Obtaining convergence rates for these methods has long been an outstanding problem, so we will review some recent results found in [4] for the unbounded domain problem, the emphasis being on how they apply to the Kushner-Dupuis scheme. Finally, in section 5, we report the results of a numerical experiment with the Kushner-Dupuis scheme on a model problem, using the semi-smooth Newton methods described in chapter 4.

1.1. Basics of finite difference methods. Finite difference methods approximate the solution u to a PDE on a set O = U × (0, T) by a function uh that is defined on a finite set of points, called the grid.

The grid, denoted Gh, can have a complicated structure; for example there may be regions with different levels of refinement. Finite difference methods can be used when U has a complicated geometry, but it is then more difficult to implement the scheme and the boundary conditions. Therefore it is usual to assume that U is a cube in Rn, so that after a possible change in origin and length scale, U = [0, 1]^n.

It is also common practice to assume that the grid is (spatially) equispaced in order to simplify the analysis. This means it is assumed that there exist h = (∆t, ∆x) ∈ R², ∆t, ∆x > 0, and K, M ∈ N, such that the set {k∆t}_{k=0}^K is an equipartition of [0, T] with interval length ∆t and the set {i∆x}_{i=0}^M is an equipartition of [0, 1] with interval length ∆x. Then the grid is

Gh = {k∆t}_{k=0}^K × { ∆x (i1, i2, . . . , in) | 0 ≤ ij ≤ M, j = 1, . . . , n }.   (1.1)

The total number of points on the grid is (K + 1)N := (K + 1)(M + 1)^n. For non-triviality, we assume that K, M ≥ 2.

Let G+_h = Gh ∩ O and ∂Gh = Gh ∩ ∂O, with ∂O the backward parabolic boundary of O. A generic point of the grid is denoted (xi, tk), with i ∈ {1, . . . , N}, k ∈ {0, . . . , K}. In particular, (xi, tk) ∈ G+_h if and only if 0 ≤ k < K and xi = ∆x (i1, i2, . . . , in) with 0 < ij < M, j = 1, . . . , n. It is possible to choose the labelling such that

G+_h = { (xi, tk) | 1 ≤ i ≤ Nh, 0 ≤ k ≤ K − 1 },   (1.2)


with Nh < N .

A grid function v ∈ Cb(Gh) is a map v : Gh → R. By finiteness of Gh, v is automatically continuous and bounded. A finite difference operator Fh is a map from Cb(Gh) to Cb(Gh): Fh : v ↦ Fh(v) ∈ Cb(Gh). For a choice of operator Fh, a finite difference scheme is to solve Fh(uh) = 0 in Gh.
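For orientation, the grid (1.1) and the splitting into interior and boundary nodes can be realised in a few lines; the following NumPy sketch takes n = 2 and illustrative values of K and M:

import numpy as np

# Sketch of the grid (1.1) for n = 2 spatial dimensions on U = [0,1]^2, T = 1;
# K and M are illustrative values (K, M >= 2).
K, M, T = 10, 20, 1.0
dt, dx = T / K, 1.0 / M
t = dt * np.arange(K + 1)                          # {k*dt}_{k=0,...,K}
i1, i2 = np.meshgrid(np.arange(M + 1), np.arange(M + 1), indexing="ij")
idx = np.stack([i1.ravel(), i2.ravel()], axis=1)   # multi-indices (i1, i2)
x = dx * idx                                       # spatial nodes, N = (M+1)^2 of them
interior = np.all((idx > 0) & (idx < M), axis=1)   # 0 < i_j < M: spatial part of G_h^+
N, Nh = x.shape[0], int(interior.sum())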

The next section analyses a particular choice of scheme, called the Kushner-Dupuis scheme. This scheme will serve as the main example for the two principal theoretical results in this chapter.

2. The Kushner-Dupuis scheme

2.1. Spatial discretisation. The finite difference operators used for the Kushner-Dupuis scheme are

∆+_t v(x, t) = (1/∆t)( v(x, t + ∆t) − v(x, t) ),

∆+_i v(x, t) = (1/∆x)( v(x + ei∆x, t) − v(x, t) ),

∆−_i v(x, t) = (1/∆x)( v(x, t) − v(x − ei∆x, t) ),

∆_{ii} v(x, t) = (1/∆x²)( v(x + ei∆x, t) + v(x − ei∆x, t) − 2v(x, t) ),

∆+_{ij} v(x, t) = (1/(2∆x²))( 2v(x, t) + v(x + ei∆x + ej∆x, t) + v(x − ei∆x − ej∆x, t) )
               − (1/(2∆x²))( v(x + ei∆x, t) + v(x − ei∆x, t) + v(x + ej∆x, t) + v(x − ej∆x, t) ),

∆−_{ij} v(x, t) = (1/(2∆x²))( v(x + ei∆x, t) + v(x − ei∆x, t) + v(x + ej∆x, t) + v(x − ej∆x, t) )
               − (1/(2∆x²))( 2v(x, t) + v(x + ei∆x − ej∆x, t) + v(x − ei∆x + ej∆x, t) ).

Recall that for a function f, the positive part is f+ = max(f, 0) and the negative part is f− = max(−f, 0), so that f = f+ − f− and |f| = f+ + f−. The operator

Lα w(x, t) = −Tr[ a(x, t, α) D²w(x, t) ] − b(x, t, α) · Dw(x, t)

is discretised by Lα_h : Cb(Gh) → Cb(G+_h), defined for (x, t) ∈ G+_h by

Lα_h v(x, t) = − Σ_{i=1}^n ( a_{ii}(x, t, α) ∆_{ii} v(x, t) + Σ_{j≠i} [ a+_{ij}(x, t, α) ∆+_{ij} v(x, t) − a−_{ij}(x, t, α) ∆−_{ij} v(x, t) ] )
             − Σ_{i=1}^n [ b+_i ∆+_i v(x, t) − b−_i ∆−_i v(x, t) ].   (2.1)

For each tk ∈ {tk}_{k=0}^K, the restriction of Lα_h to time tk defines the spatial operator Lα,tk_h : Cb({xi}_{i=1}^N) → Cb({xi}_{i=1}^{Nh}) by

Lα,tk_h v(xi) = Lα_h ṽ(xi, tk), 1 ≤ i ≤ Nh,   (2.2)

where ṽ is some extension of v to Cb(Gh).
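In one space dimension the cross-difference terms in (2.1) are absent, and the operator reduces to a central second difference combined with upwinded first differences. A NumPy sketch of this special case (for a fixed time level and a fixed control; the argument conventions are illustrative) is:

import numpy as np

def L_h(v, a, b, dx):
    """Sketch of the Kushner-Dupuis spatial operator (2.1) in one space dimension:
    a, b are arrays of a(x_i), b(x_i) at the interior nodes i = 1, ..., M-1;
    v holds the grid values at all nodes 0, ..., M."""
    vm, vc, vp = v[:-2], v[1:-1], v[2:]               # v(x - dx), v(x), v(x + dx)
    second = (vp + vm - 2.0 * vc) / dx**2             # Delta_ii v
    forward = (vp - vc) / dx                          # Delta_i^+ v
    backward = (vc - vm) / dx                         # Delta_i^- v
    bp, bm = np.maximum(b, 0.0), np.maximum(-b, 0.0)  # b^+, b^-
    return -a * second - (bp * forward - bm * backward)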

The calculations in the proof of the next lemma form the essence of the monotonicity properties of the Kushner-Dupuis scheme.

Lemma 2.1 (Discrete Maximum Principle). Let h > 0, α ∈ Λ and 0 ≤ k ≤ K. Then the following implication holds for every v ∈ Cb({xi}_{i=1}^N) with a local maximum at xr, 1 ≤ r ≤ Nh,

Lα,tk_h v(xr) ≥ 0,

if and only if a(xr, tk, α) is weakly diagonally dominant for every xr, 1 ≤ r ≤ Nh.


Proof. For shorthand, let us write v(ei) for v(xr + ei∆x, tk) and v(±ei ± ej) for v(xr ± ei∆x ± ej∆x, tk), and let us omit the remaining arguments. From the definitions of the difference operators we find that

Lα,tk_h v(xr) = −(1/∆x²) Σ_{i=1}^n [ ( aα_{ii} − (1/2) Σ_{j≠i} |aα_{ij}| )( v(ei) + v(−ei) − 2v ) − (1/2) Σ_{j≠i} |aα_{ij}| ( v(ej) + v(−ej) − 2v ) ]
− (1/(2∆x²)) Σ_{i=1}^n Σ_{j≠i} [ aα+_{ij}( v(ei + ej) + v(−ei − ej) − 2v ) + aα−_{ij}( v(ei − ej) + v(ej − ei) − 2v ) ]
− (1/∆x) Σ_{i=1}^n [ bα+_i ( v(ei) − v ) + bα−_i ( v(−ei) − v ) ].   (2.3)

Since a is symmetric,

Σ_{i=1}^n Σ_{j≠i} |aα_{ij}| ( v(ej) + v(−ej) − 2v ) = Σ_{j=1}^n Σ_{i≠j} |aα_{ij}| ( v(ej) + v(−ej) − 2v ) = Σ_{i=1}^n Σ_{j≠i} |aα_{ji}| ( v(ei) + v(−ei) − 2v ) = Σ_{i=1}^n Σ_{j≠i} |aα_{ij}| ( v(ei) + v(−ei) − 2v ),

where the second equality was obtained by interchanging the labelling i ↔ j. So

Lα,tk_h v(xr) = −(1/∆x²) Σ_{i=1}^n ( aα_{ii} − Σ_{j≠i} |aα_{ij}| )( v(ei) + v(−ei) − 2v )
− (1/(2∆x²)) Σ_{i=1}^n Σ_{j≠i} [ aα+_{ij}( v(ei + ej) + v(−ei − ej) − 2v ) + aα−_{ij}( v(ei − ej) + v(ej − ei) − 2v ) ]
− (1/∆x) Σ_{i=1}^n [ bα+_i ( v(ei) − v ) + bα−_i ( v(−ei) − v ) ].   (2.4)

Suppose that v ∈ Cb({xi}_{i=1}^N) has a local maximum at xr with 1 ≤ r ≤ Nh and that a(xr, tk, α) is weakly diagonally dominant. Then in equation (2.4) every bracketed difference is non-positive and every coefficient multiplying it is non-negative, hence

Lα,tk_h v(xr) ≥ 0.

For the converse, suppose a(xr, tk, α) is not weakly diagonally dominant on row s. Let

v(xi) = 1 if xi = xr, xi = xr ± ej∆x for j ≠ s, or xi = xr ± ek∆x ± ej∆x with k ≠ j; and v(xi) = 0 otherwise.

Then

Lα,tk_h v(xr) = (2/∆x²)( a_{ss}(xr, tk, α) − Σ_{j≠s} |a_{sj}(xr, tk, α)| ) + (1/∆x) |b_s(xr, tk, α)|.

For ∆x small enough, Lα,tk_h v(xr) is therefore strictly negative. Therefore weak diagonal dominance of a is necessary for the discrete maximum principle.

The following consistency estimate will be important in later sections.

Lemma 2.2. For ϕ ∈ C∞(U) and all (xi, tk) ∈ G+_h,

|Lα ϕ(xi, tk) − Lα,tk_h ϕ(xi)| ≤ C ( ∆x² ‖a‖∞ ‖D⁴_x ϕ‖∞ + ∆x ‖b‖∞ ‖D²_x ϕ‖∞ ).   (2.5)

Proof. This follows from standard estimates on the truncation error for the finite difference formulas ∆+_i, ∆−_i, ∆+_{ij}, ∆−_{ij} and ∆_{ii}. See [4].
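The ∆x² contribution in (2.5) can be observed numerically; the following sketch (with an illustrative choice of ϕ) evaluates the truncation error of ∆ii at a point while ∆x is halved repeatedly:

import numpy as np

# Sketch: observe the O(dx^2) truncation error of Delta_ii on a smooth function,
# consistent with the first term of the bound (2.5); phi is an illustrative choice.
phi = np.sin
d2phi_exact = lambda x: -np.sin(x)
x0 = 0.3
for dx in [0.1, 0.05, 0.025, 0.0125]:
    approx = (phi(x0 + dx) + phi(x0 - dx) - 2.0 * phi(x0)) / dx**2
    print(dx, abs(approx - d2phi_exact(x0)))   # errors shrink by ~4 when dx is halved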


2.2. Matrix and stencil representations. If Cb({xi}_{i=1}^N) and Cb({xi}_{i=1}^{Nh}) are given the basis elements vi(xj) = δij, then Lα,tk_h admits a matrix representation Aα,tk_h, an Nh × N matrix, with the property that

Lα,tk_h v(xi) = Σ_{j=1}^N (Aα,tk_h)_{ij} v(xj).

This paragraph shows how to interpret the discrete maximum principle property of lemma 2.1 in terms of the signs of the entries of Aα,tk_h. To do this, it is helpful to use the stencil representation of Lα,tk_h.

Definition 2.3. The stencil representation of Lα,tk_h is a set S ⊂ Zⁿ and a collection of real numbers

{ Lα,tk_h(xi, β) | 1 ≤ i ≤ Nh, β ∈ S },

such that

Lα,tk_h v(xi) = Σ_{β∈S} Lα,tk_h(xi, β)( v(xi + β∆x) − v(xi) ).   (2.6)

From equation (2.4) we see that for the Kushner-Dupuis scheme, the stencil representation is

S = { ±ei, ±ei ± ej | i ≠ j }

and

Lα,tk_h(xr, ±ei) = −(1/∆x²)( a_{ii}(xr, tk, α) − Σ_{j≠i} |a_{ij}(xr, tk, α)| ) − (1/∆x) b±_i(xr, tk, α),   (2.7)

Lα,tk_h(xr, ±(ei + ej)) = −(1/(2∆x²)) a+_{ij}(xr, tk, α),   (2.8)

Lα,tk_h(xr, ±(ei − ej)) = −(1/(2∆x²)) a−_{ij}(xr, tk, α).   (2.9)

Finally, the matrix representation can be found from the stencil representation through

(Aα,tk_h)_{ii} = −Σ_{β∈S} Lα,tk_h(xi, β), 1 ≤ i ≤ Nh,   (2.10)

and

(Aα,tk_h)_{ij} = Lα,tk_h(xi, β) if xj = xi + β∆x for some β ∈ S, and (Aα,tk_h)_{ij} = 0 otherwise.   (2.11)

Proposition 2.4. If a(xi, tk, α) is weakly diagonally dominant for all (xi, tk) ∈ Gh, then all diagonal terms (Aα,tk_h)_{ii}, 1 ≤ i ≤ Nh, are positive and all off-diagonal terms (Aα,tk_h)_{ij} are negative. Furthermore, if a(xi, tk, α) is strictly diagonally dominant for all (xi, tk) ∈ Gh, the principal Nh × Nh sub-matrix of Aα,tk_h is an M-matrix.

Proof. Equations (2.10) and (2.11) show that diagonal dominance of a implies that Aα,tk_h has positive diagonal entries and negative off-diagonal entries. It is also clear that

Σ_{j=1}^N (Aα,tk_h)_{ij} = (Aα,tk_h)_{ii} − Σ_{j≠i} |(Aα,tk_h)_{ij}| = 0.

From the assumption on the grid, there exist xr ∈ G+_h and xj ∈ ∂Gh such that xj = xr + ei∆x. Therefore equation (2.7) gives

(Aα,tk_h)_{rr} − Σ_{s≠r, s≤Nh} |(Aα,tk_h)_{rs}| ≥ |(Aα,tk_h)_{rj}| > 0.

Furthermore, equations (2.7) and (2.11) show that the Nh × Nh principal submatrix of Aα,tk_h is irreducible.


This is because, for xi and xj neighbouring nodes on the spatial grid, i.e. xi = xj ± ek∆x for some k, the entry (Aα,tk_h)_{ij} is non-zero. Any two points on the spatial grid can be connected by a path of immediate neighbours, thus showing that the graph of Aα,tk_h is connected, hence the submatrix is irreducible.

Proposition 2.2 of appendix B then implies that the principal submatrix ((Aα,tk_h)_{ij})_{1≤i,j≤Nh} is an M-matrix.

2.3. Time discretisation. The θ-method for the Kushner-Dupuis scheme is to solve

−∆+t uh(xi, tk) + max

α∈Λ

[θ(Lαhuh(xi, tk)

)+ (1− θ)

(Lαhuh(xi, tk+1)

)− f(xi, tk, α)

]= 0, (2.12a)

uh = g on ∂Gh; (2.12b)

for all (xi, tk) ∈ G+h and fixed θ ∈ [0, 1]. If θ > 0 the scheme is implicit and a nonlinear system

must be solved at each step in time.

Remark 2.5 (Solution of the Discrete Problem). As a result of the assumptions of compactnessof Λ and continuity of the Lα, example 2.7 of chapter 4 outlined how it is possible to choose ∆tsmall enough such that a theorem on existence and uniqueness of solutions, theorem 2.6 of chapter4, can be applied.

The timestep ∆t can also be used to ensure that the matrices used in the semi-smooth Newtonmethod have uniformly bounded inverses. Since the discretisations of the operators Lα are linear,proposition 3.9 can then be used to show that convergence of the semi-smooth Newton method isglobal.

In section 5, the Kushner-Dupuis scheme is applied to a model problem. In particular, resultson the performance of the semi-smooth Newton method are presented.

In the next sections this scheme will be analysed as follows. First we consider the BarlesSouganidis convergence argument from [5] to show how monotonicity of the scheme ensures thatthe limiting envelopes have the viscosity property. To complete the proof of convergence wouldthen require to show uniform convergence of the envelopes to the boundary data, but this is notpursued here, although the reader may consult [16] for details in the case of the explicit method.

Instead, a summary of the results from [4] is given, and it will be shown how to use their resultsto obtain a convergence rate for the unbounded domain problem.

3. The Barles-Souganidis convergence argument

The Barles-Souganidis convergence argument was originally set out in [5], which showed whichgeneral properties guarantee convergence of a numerical scheme to the viscosity solution1. In par-ticular, consistent and monotone numerical methods guarantee at least that the limiting envelopes,as defined in chapter 5 section 2, have the viscosity property, i.e. the upper envelope is a subsolutionand the lower envelope is a supersolution to the HJB equation.

In order to make precise the monotonicity property and for the reader’s convenience, we refor-mulate the scheme in the notation of [5] and [4]. For h = (∆t,∆x) > 0, (x, t) ∈ G+

h , r ∈ R andv ∈ Cb(Gh), define [v]x,t (y, s) = v(x+ y, t+ s). Let

S(h, xi, tk, r, [v]x,t

)= max

α∈Λ

[(1

∆t+ θ

(Aα,tkh

)ii

)r + θ

∑β∈S

Lα,tkh (xi, β) [v]x,t (β∆x, 0)

−(

1

∆t− (1− θ)

(Aα,tk+1

h

)ii

)[v]xi,tk (0,∆t)+(1−θ)

∑β∈S

Lα,tk+1

h (xi, β) [v]xi,tk (β∆x,∆t)+f(xi, tk, α)

].

(3.1)

The θ-method for the Kushner-Dupuis scheme is thus equivalent to

S(h, xi, tk, uh(xi, tk), [uh]xi,tk

)= 0 on G+

h (3.2a)

uh = g on ∂Gh (3.2b)

The scheme has the following consistency properties with the HJB operator.

1Their article treats convergence to discontinuous viscosity solutions, which is beyond the scope of this work.

Page 70: Hamilton-Jacobi-Bellman Equations

62 6. MONOTONE FINITE DIFFERENCE METHODS

Proposition 3.1 (Consistency). Let ϕ ∈ C(2,1)(O). For any sequence (xim , tkm)∞m=1, such that

(xim , tkm) ∈ G+hm

for all m ∈ N, where hm → 0 and

limm→∞

(xim , tkm) = (x, t) ∈ O,

we have

limm→∞

S(hm, xim , tkm , ϕ(xi, tk), [ϕ]xi,tk

)= −ϕt(x, t) +H

(x, t,Dxϕ(x, t), D2

xϕ(x, t)). (3.3)

Theorem 3.2 (Monotonicity). Suppose that for every (x, t, α) ∈ O × Λ, a(x, t, α) is diagonallydominant. For h > 0 suppose that

(1− θ) ∆t

∆x2

2n∑i=1

aii(xi, tk, α)− 1

2

n∑j 6=i|aij(xi, tk, α)|

+ ∆xn∑i=1

|bi(xi, tk, α)|

≤ 1, (3.4)

for all (xi, tk, α) ∈ Gh × Λ. Then the following monotonicity property holds. For any f(t) =a+ b(T − t), a, b ∈ R, and u ≤ v ∈ Cb(Gh), we have

S(h, xi, tk, r + f(tk), [u+ f ]xi,tk

)≥ S

(h, xi, tk, r, [v]xi,tk

)+ b for all (xi, tk) ∈ G+

h . (3.5)

Proof. Equations (2.7) and (2.10) show that equation (3.4) is equivalent to

1

∆t− (1− θ)

(Aα,tkh

)ii≥ 0.

Therefore, proposition 2.4 gives

−(

1

∆t− (1− θ)

(Aα,tk+1

h

)ii

)[u]xi,tk (0,∆t) + (1− θ)

∑β∈S

Lα,tk+1

h (xi, β) [u]xi,tk (β∆x,∆t)

≥ −(

1

∆t− (1− θ)

(Aα,tk+1

h

)ii

)[v]xi,tk (0,∆t) + (1− θ)

∑β∈S

Lα,tk+1

h (xi, β) [v]xi,tk (β∆x,∆t),

andθ∑β∈S

Lα,tkh (xi, β) [u]x,t (β∆x, 0) ≥ θ∑β∈S

Lα,tkh (xi, β) [v]x,t (β∆x, 0).

Furthermore, Lαhf(tk) = 0 for all 0 ≤ k ≤ K. As a result,

S(h, xi, tk, r + f(tk), [u+ f ]xi,tk

)= S

(h, xi, tk, r, [u]xi,tk

)−∆+

t f(tk)

≥ S(h, xi, tk, r, [v]xi,tk

)+ b.

The Barles-Souganidis convergence argument shows that for a numerical method that satisfiesthe conclusions of proposition 3.1 and theorem 3.2, the numerical solutions will converge to theviscosity solution of the continuous problem uniformly on O if they converge uniformly to theboundary data on the parabolic boundary of the domain ∂O. Recall that for (x, t) ∈ O and ε > 0,

B (x, t; ε) =

(y, s) ∈ O | |x− y|+ |t− s| < ε.

Theorem 3.3 (Barles-Souganidis). [5]. Let hmm∈N be a sequence with hm = (∆tm,∆xm) → 0as m→∞, hm > 0 and let Gm = Ghm define a sequence of grids. Consider an abstract scheme ofthe form (3.2) with solutions umm∈N, um ∈ Cb(Gm) for m ∈ N.

Define for (x, t) ∈ O, Sm(x, t; ε) = B(x, t; ε) ∩ Gm and let the upper and lower envelopes ofumm∈N be defined by

u∗(x, t) = limε→0

lim supm

sup um(xi, tk) | (xi, tk) ∈ Sm(x, t; ε) ; (3.6a)

u∗(x, t) = limε→0

lim infm

inf um(xi, tk) | (xi, tk) ∈ Sm(x, t; ε) . (3.6b)

If for every m ∈ N, the scheme satisfies the conclusions of proposition 3.1 and theorem 3.2, thenu∗ and u∗ are respectively a viscosity subsolution and a viscosity supersolution of the HJB equation

− ut +H(x, t,Dxu,D

2xu)

= 0 on O. (3.7)

Page 71: Hamilton-Jacobi-Bellman Equations

3. THE BARLES-SOUGANIDIS CONVERGENCE ARGUMENT 63

If furthermore the comparison property holds for (3.7), see theorem 3.11 of chapter 3, and u∗ =u∗ = g on ∂O then umm∈N tends uniformly to the unique viscosity solution u of

−ut +H(x, t,Dxu,D

2xu)

= 0 on O; (3.8a)

u = g on ∂O; (3.8b)

on compact subsets of U × (0, T ], in the sense that for every Q ⊂⊂ U × (0, T ],

limm→∞

max(xi,tk)∈Gh∩Q

|um(xi, tk)− u(xi, tk)| = 0. (3.9)

Proof. 1. u∗ is a viscosity subsolution. It is not difficult to show that Sm(x, t; ε) satis-fies the condition of proposition 2.5 of chapter 5; and by assumption u∗ is real valued, thereforeproposition 2.5 shows that u∗ ∈ USC(O).

Let (x, t) ∈ O and suppose that (q, p, P ) ∈ P+u∗(x, t). By theorem 3.3 of chapter 3, there exists

ϕ ∈ C(2,1)(O)

such that u∗−ϕ has a strict maximum at (x, t) ∈ O, with q = ϕt(x, t), p = Dxϕ(x, t)

and P = D2xϕ(x, t).

By proposition 2.8 of chapter 5, for every ε > 0 there exists subsequences, denoted here byhm and (xim , tkm) ∈ Gm such that (xim , tkm) satisfy

limm→∞

(xim , tkm) = (x, t);

limm→∞

um(xim , tkm)− ϕ(xim , tkm) = limm→∞

max(xi,tk)∈Sn(x,t;ε)

um(xi, tk)− ϕ(xi, tk) = u∗(x, t)− ϕ(x, t).

By choosing ε small enough such that B(x, t; ε) ⊂ O, we conclude that (xim , tkm) ∈ G+m and after

possibly modifying ϕ outside of B(x, t; ε), we may assume that

um(xim , tkm)− ϕ(xim , tkm) = max(xi,tk)∈Gm

um(xi, tk)− ϕ(xi, tk).

Setting µm = um(xim , tkm)− ϕ(xim , tkm) and using theorem 3.2 with f = −µm and v = ϕ,

S(hm, xim , tkm , um(xim , tkm)− µm, [um − µm]xim ,tkm

)≥ S

(hm, xim , tkm , ϕ(xim , tkm), [ϕ]xim ,tkm

).

(3.11)By taking u = v = um and f = −µm followed by u = v = um − µm and f = µm, theorem 3.2 alsoimplies that

S(hm, xim , tkm , um(xim , tkm)− µm, [um − µm]xim ,tkm

)= S

(hm, xim , tkm , um(xim , tkm), [um]xim ,tkm

).

The definition of the scheme and equation (3.11) imply that

S(hm, xim , tkm , ϕ(xim , tkm), [ϕ]xim ,tkm

)≤ 0. (3.12)

By taking the limit of the above inequality, proposition 3.1 then implies that

−ϕt(x, t) +H(x, t,Dxϕ(x, t), D2

xϕ(x, t))≤ 0.

Hence u∗ is a viscosity subsolution of the HJB equation on O.

2. u∗ is a viscosity supersolution. The proof for this part is in every way identical to theprevious one because all of the results used previously have their equivalents for subjets and lowerenvelopes.

3. Convergence to the viscosity solution. Assume now that u∗ = u∗ = g on ∂O. By thecomparison property, theorem 3.11 of chapter 3, we conclude that

u∗ ≤ u∗ on U × (0, T ].

By definition of the envelopes, u∗ ≥ u∗ on O. Therefore u∗ = u∗ on U × (0, T ] is a continuous func-tion that is both a viscosity subsolution and supersolution that satisfies the boundary conditions.

By corollary 3.13 of chapter 3, u = u∗ = u∗ is the unique viscosity solution of the HJB equation.

We now show that the entire sequence um - and not just the subsequence considered previously- tends uniformly to u on Q ⊂⊂ U × (0, T ] in the sense that

limm→∞

max(xi,tk)∈Gh∩Q

|um(xi, tk)− u(xi, tk)| = 0. (3.13)

Page 72: Hamilton-Jacobi-Bellman Equations

64 6. MONOTONE FINITE DIFFERENCE METHODS

Let ε > 0. Since u ∈ C (Q) is uniformly continuous, there exists δ > 0 such that if (x, t), (y, s) ∈ Owith |x− y| + |t− s| < δ, then |u(x, t)− u(y, s)| < ε. By definition of the envelopes, for all(x, t) ∈ Q, there exists δx,t ∈ (0, δ) such that

lim supm

supSm(x,t;δx,t)

um(xi, tk)− u(x, t) < ε; (3.14)

lim infm

infSm(x,t;δx,t)

um(xi, tk)− u(x, t) > ε. (3.15)

Therefore there exists Mx,t such that for all m ≥Mx,t, and (xi, tk) ∈ S (x, t; δx,t)

|um(xi, tk)− u(x, t)| < ε.

Because Q is compact, there exists an open coverB(xj , tj , δxj ,tj

)Jj=1

of Q. Because Sm(x, t; ε) =

Gm ∩ B(x, t; ε), for any m ≥ maxMxj ,tj

Jj=1

and (y, s) ∈ Gm, there exists (xj , tj) such that

|y − xj |+∣∣s− tj∣∣ < δxj ,tj and thus ∣∣um(y, s)− u(xj , tj)

∣∣ < ε.

By uniform continuity of u, using the fact that δxj ,tj < δ, for any (y, s) ∈ Gm, m ≥ maxMxj ,tj

Jj=1

,

|um(y, s)− u(y, s)| < 2ε;

which proves uniform convergence in the sense of equation (3.13).

Remark 3.4 (Applicability of the Barles Souganidis Argument). The proof of theorem 3.3 makesuse of all the major results on the theory of viscosity solutions that have been presented for theHJB equation in previous chapters. In particular, the strict maximum property of theorem 3.3 wascombined with the theory of envelopes in order to relate the superjet of the envelope to superjets(over the grid) of the numerical solutions. Most importantly, the comparison property guaranteedconvergence of the numerical solutions as a whole.

The argument is in principle applicable to other PDE that have, amongst other properties, acomparison property. In terms of numerical methods, it is not restricted to the Kushner-Dupuisscheme, nor monotone finite difference methods in general - the main requirements are monotonicityand consistency, but also some properties on the sets of approximation Sm(x, t; ε). By abstractingthe theory on envelopes, our aim has been to highlight the importance of the structure of the setsSm(x, t; ε) on which the numerical solution approximates the viscosity solution.

Remark 3.5. Some of the assumptions used in theorem 3.3 are left unverified in this work. Inparticular, justifying the convergence to the boundary values requires additional work. A first reasonfor not pursuing this route here is that these issues arise again and will be treated in chapter 7 onfinite element methods. A second reason is that [16, chapter 9] treats this issue for finite differencemethods.

4. Convergence rates for the unbounded domain problem

This section reports recent findings by Barles and Jakobsen from [3] and [4], in which errorbounds are proven for large classes of finite difference methods. Their results apply to the un-bounded domain problem U = Rn. According to their paper, the first findings for error boundsfor the second order HJB equation were found by Krylov in 1997 and 2000 - whereas optimal errorrates for first order equations have been known since the 1980s.

This has therefore been a difficult problem and although their findings are not directly applicableto the bounded domain problem considered so far, it is nonetheless encouraging. The purpose hereis to show what their results are and how to apply them to specific schemes in order to deriveconvergence rates.

Page 73: Hamilton-Jacobi-Bellman Equations

4. CONVERGENCE RATES FOR THE UNBOUNDED DOMAIN PROBLEM 65

4.1. Assumptions on the problem. Under the usual notation, the HJB equation to besolved is

−ut +H(x, t,Dxu,D

2xu)

= 0 on Rn × (0, T ); (4.1a)

u = g on Rn × T . (4.1b)

In addition to the assumptions of chapter 1, it is assumed that g(·), σ(·, ·, α), b(·, ·, α), f(·, ·, α) arebounded uniformly in α in the following norm:

‖v‖1 = ‖v‖L∞(R×Rn) + [v]1 ; (4.2)

where

[v]1 = sup(x,t)6=(y,s)

|v(x, t)− v(y, s)||x− y|+

√|t− s|

. (4.3)

In chapter 1, inequalities (2.1) guarantee the Holder continuity conditions of (4.2) holds for a andb. Therefore the newly introduced assumptions are that σ, b, f, g are uniformly bounded and thatf and g satisfies the Holder continuity condition.

Previously, it was assumed for simplicity that Λ was compact. For the following results toapply, this assumption may be weakened to only assuming that Λ is a seperable metric space, notnecessarily compact.

4.2. Assumptions on the scheme. The results achieved in [4] apply to any scheme satisfyingthe assumptions that will soon be stated. However, to be concrete, we will show how it applies tothe Kushner-Dupuis scheme. Because the HJB equation is to be solved on an unbounded domain,it is assumed that the finite difference grid is infinite:

Gh = tkKk=0 × ∆x (i1, . . . , in) | ij ∈ Z, 1 ≤ j ≤ n . (4.4)

This is not practicable, so there is room for future development to truncated grids and boundeddomain problems. As before denote G+

h = Gh∩O and ∂Gh = Gh∩∂O. A grid function v ∈ Cb(Gh)if it is bounded. Consider an abstract scheme written as

S(h, xi, tk, uh(xi, tk), [uh]x,t

)= 0 on G+

h , (4.5a)

with terminal condition

uh = g on ∂Gh. (4.5b)

For example this may be the Kushner-Dupuis scheme. The first condition on the scheme is that itis monotone, in a similar way to the conclusion of theorem 3.2.

Assumption 4.1 (Monotonicity). There exists λ, µ ≥ 0, h0 > 0 such that if |h| < h0, u ≤ v are

functions in Cb(Gh), and f(t) = eµ(T−t) (a+ b (T − t)) + c for a, b c ≥ 0, then for any r ∈ R,

S(h, xi, tk, r + f(t), [u+ f ]xi,tk

)≥ S

(h, xi, tk, r, [v]xi,tk

)+ b/2− λc in G+

h .

Theorem 3.2 shows that this assumption is satisfied for the Kushner-Dupuis scheme.

Assumption 4.2 (Regularity). For every h and v ∈ Cb(Gh),

(xi, tk) 7→ S(h, xi, tk, v(xi, tk), [v]xi,tk

), (xi, tk) ∈ G+

h

is bounded and continuous, and

r 7→ S(h, xi, tk, v(xi, tk), [v]xi,tk

)is uniformly continuous for bounded r, uniformly in (xi, tk) ∈ G+

h .

As a result of the uniform bounds on a, b and f and equation (3.1), the Kushner-Dupuis schemesatisfies this assumption.

Page 74: Hamilton-Jacobi-Bellman Equations

66 6. MONOTONE FINITE DIFFERENCE METHODS

Assumption 4.3 (Subconsistency). There exists a positive function E1 (K,h, ε) such that for anysequence ϕεε>0 of smooth functions satisfying∣∣∣∂β0t Dβϕε

∣∣∣ ≤ Kε1−2β0−|β| in O,

for any β0 ∈ N, β ∈ Nn an n-multiindex, the following inequality holds

S(h, xi, tk, ϕε(xi, tk), [ϕε]xi,tk

)≤ −∂tϕε +H

(xi, tk, Dxϕε(xi, tk), D

2xϕε(xi, tk)

)+ E1 (K,h, ε) ,

for all (xi, tk) ∈ G+h .

Assumption 4.4 (Superconsistency). There exists a positive function E2 (K,h, ε) such that forany sequence ϕεε>0 of smooth functions satisfying∣∣∣∂β0t Dβϕε

∣∣∣ ≤ Kε1−2β0−|β| in O,

for any β0 ∈ N, β ∈ Nn an n-multiindex, the following inequality holds

S(h, xi, tk, ϕε(xi, tk), [ϕε]xi,tk

)≥ −∂tϕε +H

(xi, tk, Dxϕε(xi, tk), D

2xϕε(xi, tk)

)− E2 (K,h, ε) ,

for all (xi, tk) ∈ G+h .

Verification. The reason Barles and Jakobsen introduced these assumptions is that theyobtain the bounds through mollification arguments, and derive independently an upper bound anda lower bound for the error. The following proposition verifies these assumptions are satisfied bythe Kushner-Dupuis scheme.

Proposition 4.1 (Consistency). Under the assumptions stated so far, in particular that a and bare Lipschitz continuous in time, for ϕ ∈ C4(O), and all (xi, tk) ∈ G+

h ,∣∣∣−ϕt(xi, tk) +H(xi, tk, Du(xi, tk), D

2u(xi, tk))− S

(h, xi, tk, ϕ(xi, tk), [ϕ]xi,tk

)∣∣∣≤ C

(∆t‖ϕtt‖∞ + ∆x2‖D4

xϕ‖∞ + ∆x‖D2xϕ‖∞

)+ (1− θ)C∆t

(‖∂tD2

xϕ‖∞ + ‖∂tDxϕ‖∞ + ‖D2xϕ‖∞ + ‖Dxϕ‖∞

). (4.6)

Proof. First of all,

|Lαϕ(xi, tk)− θLαhϕ(xi, tk)− (1− θ)Lαhϕ(xi, tk+1)| ≤ θ |Lαϕ(xi, tk)− Lαhϕ(xi, tk)|+ (1− θ) |Lαϕ(xi, tk)− Lαϕ(xi, tk+1)|+ (1− θ) |Lαϕ(xi, tk+1)− Lαhϕ(xi, tk+1)| . (4.7)

By lemma 2.2, for j = 0, 1,

|Lαϕ(xi, tk+j)− Lαhϕ(xi, tk+j)| ≤ C(∆x2‖D4ϕ‖∞ + ∆x‖Dϕ‖∞

).

From the assumption that a is Lipschitz continuous in time and uniformly bounded,∣∣Tr a(xi, tk, α)D2xϕ(xi, tk)− Tr a(xi, tk+1, α)D2

xϕ(xi, tk+1)∣∣

≤n∑

i,j=1

|aij(xi, tk, α)| |∂ijϕ(xi, tk)− ∂ijϕ(xi, tk+1)|+ |∂ijϕ(xi, tk+1)| |aij(xi, tk)− aij(xi, tk+1)|

≤ C∆t(‖∂tD2

xϕ‖∞ + ‖D2xϕ‖∞

).

Similarly,

|b(xi, tk, α) ·Dxϕ(xi, tk)− b(xi, tk, α) ·Dxϕ(xi, tk+1)| ≤ C∆t (‖∂tDxϕ‖∞ + ‖Dxϕ‖) .

The result follows by adding the source term f(xi, tk, α) then using the fact that |sup(a)− sup(b)| ≤sup(|a− b|).

Page 75: Hamilton-Jacobi-Bellman Equations

4. CONVERGENCE RATES FOR THE UNBOUNDED DOMAIN PROBLEM 67

4.3. Convergence rates. After stating the principal result from [4], we will show how tocalculate convergence rates for various cases of the HJB equation and the Kushner-Dupuis scheme.

Theorem 4.2. [4] Under the assumptions stated so far, if the scheme (4.5) admits a uniquesolution uh ∈ Cb(Gh), then for h sufficiently small, the following inequalities hold.

Upper bound: there exists C depending on ‖σ‖1, ‖b‖1, ‖g‖1, ‖f‖1 and µ such that for all(xi, tk) ∈ Gh,

u(xi, tk)− uh(xi, tk) ≤ eµ(T−tk)‖ (g − uh(·, T ))+ ‖∞ + C minε>0

(ε+ E1(‖u‖1, h, ε)) . (4.8)

Lower bound: there exists C depending on ‖σ‖1, ‖b‖1, ‖g‖1, ‖f‖1 and µ such that for all(xi, tk) ∈ Gh,

u− uh ≥ −eµ(T−tk)‖ (g − uh(·, T ))− ‖∞ − C minε>0

(ε1/3 + E2(‖u‖1, h, ε)

). (4.9)

Proposition 4.1 tells us for the Kushner-Dupuis scheme, to find the convergence rate, we shouldtake

E1(K,h, ε) = E2(K,h, ε) = CK(∆tε−3 + ∆x2ε−3 + ∆xε−1

)+ (1− θ)CK∆t

(ε−3 + ε−2 + ε−1 + 1

). (4.10)

To find the rate of convergence, we should minimise ε + E1(K,h, ε) and ε1/3 + E2(K,h, ε) withrespect to ε > 0. An involved approach would be to take derivatives and solve the resultingpolynomial in ε, but a simple argument reveals the best possible bound that can be achieved forthe general problem2.

The argument is the following: consider a special case of the HJB equation and Kushner-Dupuisscheme, namely the case where b = 0 on O × Λ and θ = 1. Then the estimates become

E1(K,h, ε) = CK(∆t+ ∆x2

)ε−3. (4.11)

Choose the norm on R2 to be |h| =√

∆t+ ∆x2. Then the upper bound is minimised by

ε∗ = (3C)1/4 |h|1/2 ,

where C is the constant in (4.11). Then the upper bound involves the term

ε∗ + E(K,h, ε∗) = O(|h|1/2 + |h|2−3/2

)= O

(|h|1/2

).

The lower bound is minimised by

ε∗ = (9C)3/10 |h|3/5 ,and

(ε∗)1/3 + E(K,h, ε∗) = O

(|h|1/5 + |h|2−9/5

)= O

(|h|1/5

).

For the general problem, the best achievable rate can only be worse than or equal to that of thisspecial case. We now check that in fact this rate is achieved for the general problem, by taking ε∗

and ε∗ as above, and using

ε∗ + E (K,h, ε∗) = O(|h|1/2 + |h|2−3/2 + |h|1−1/2 + (1− θ) |h|2

(|h|−3/2 + |h|−1 + |h|−1/2 + 1

))= O

(|h|1/2

);

and

ε∗ + E (K,h, ε∗) = O(|h|1/5 + |h|2−9/5 + |h|1−3/5 + (1− θ) |h|2

(|h|−9/5 + |h|−6/5 + |h|−3/5 + 1

))= O

(|h|1/5

).

This proves the following statement for the Kushner-Dupuis scheme.

2To be precise, this is the best achievable rate given these current estimates.

Page 76: Hamilton-Jacobi-Bellman Equations

68 6. MONOTONE FINITE DIFFERENCE METHODS

Proposition 4.3. Suppose h is small enough such that theorem 4.2 holds. If uh ∈ Cb(Gh) solves(4.5) for the Kushner-Dupuis scheme and u is the viscosity solution of the HJB equation, then thereexists C > 0 such that

− ‖ (g − uh(·, T ))− ‖∞ − C |h|1/5 ≤ u− uh ≤ C |h|1/2 + ‖ (g − uh(·, T ))+ ‖∞. (4.12)

5. Numerical experiment

We conclude this chapter with a study of the Kushner-Dupuis scheme applied to the HJBequation described in example 4.1, section 4 of chapter 1. We recall that the HJB equation was

− ut + |ux| − 1 = 0 on (−1, 1)× (0, 1); (5.1a)

u = 0 on −1, 1 × (0, 1) ∪ (−1, 1)× 1 . (5.1b)

The viscosity solution of this equation is the value function

u(x, t) = min (1− |x| , 1− t) .

5.1. Application of the Kushner-Dupuis Scheme. The reader may find the Matlab codeused for the numerical experiments reported here in appendix D.

We choose a spatially equispaced grid Gh ⊂ [−1, 1] × [0, 1]. Using equations (2.1) and (2.12),we find that the θ-method for the Kushner-Dupuis scheme is to solve for k ∈ 0, . . . ,K − 1

−∆+t uh(xi, tk)+

maxα∈−1,1

[θ(−α+∆+

x u(xi, tk) + α−∆−x u(xi, tk))

+ (1− θ)(−α+∆+

x u(xi, tk+1) + α−∆−x u(xi, tk+1))]

= 1,

(5.2)

with uh(−1, tk) = uh(1, tk) = 0 and uh(xi, 1) = 0 as a result of the boundary conditions.

This reduces to

−∆+t uh(xi, tk)+max

[−θ∆+

x uh(xi, tk)− (1− θ)∆+x uh(xi, tk+1), θ∆−x uh(xi, tk) + (1− θ)∆−x uh(xi, tk+1)

]= 1.

For example, if θ = 0, the scheme may be re-written as

uh(xi, tk) = uh(xi, tk+1)− ∆t

∆xmax [uh(xi, tk+1)− uh(xi+1, tk+1), uh(xi, tk+1)− uh(xi−1, tk+1)]+∆t.

For general θ ∈ [0, 1], we may write the scheme as follows. Introduce

L1 =

1 −1 0 . . . 0

0 1 −1. . .

......

. . .. . .

. . ....

.... . . 0 1 −1

0 . . . . . . 0 1

,

and let L−1 =(L1)T

. Define then Aα = I+θ∆t/∆xLα, for α = −1, 1. For each k ∈ 0, . . . ,K − 1,the scheme consists of solving

maxα∈−1,1

[Aαuh(·, tk)− dαk ] = 0, (5.3)

where

dαk = (I − θ∆t/∆xLα)u(·, tk+1) + ∆t. (5.4)

Theorem 3.2 shows that the scheme is monotone provided

(1− θ) ∆t

∆x≤ 1.

Page 77: Hamilton-Jacobi-Bellman Equations

5. NUMERICAL EXPERIMENT 69

Edd θ = 0 θ = 1/2 θ = 15 0.067 0.096 0.1186 0.049 0.069 0.0857 0.035 0.049 0.0618 0.025 0.035 0.0439 0.018 0.025 0.03010 0.012 0.018 0.022

Table 1. Absolute errors in the maximum norm for the Kushner-Dupuis scheme,in terms of the number of degrees of freedom

∣∣G+h

∣∣ = (2d+ 1)2. The approximationsare accurate only to one or two digits.

d

log(Ed)log(2)

5 6 7 8 9 10

-7

-6

-5

-4

-3θ = 1/2θ = 1

θ = 0

Figure 1. The error of the approximation Ed of the Kushner-Dupuis scheme in themaximum norm as a function of grid size

∣∣G+h

∣∣ = (2d + 1)2, on a logarithmic scale,for a fully implicit scheme, a semi-implicit scheme and an explicit scheme. The errordecays as (∆t+ ∆x)1/2.

5.2. Error rates. Table 1 gives the absolute errors in the discrete maximum norm

Ed = maxxi|u(xi, 0)− uh(xi, 0)| , (5.5)

as a function of mesh size for∣∣G+

h

∣∣ = (2d + 1)2, d = 5, . . . , 10, for θ = 0, 1/2, 1. This was computed

for a grid G+h with 2d+ 1 spatial points and 2d+ 1 time-steps. Figure 1 shows that the convergence

rate is h1/2 = (∆t+ ∆x)1/2. The explicit scheme gave the best approximations, but only by aproportionality constant.

5.3. Semi-smooth Newton method. The nonlinear equation (5.3) was solved with thesemi-smooth Newton method described in section 3, chapter 4.

For illustration, the matrix G(x) used in the algorithm might have the form

G(x) =

1 + a −a 0 . . . . . . 0

−a 1 + a 0. . .

. . ....

.... . .

. . .. . .

. . ....

.... . . 0 1 + a −a

......

. . .. . . −a 1 + a 0

0 . . . . . . . . . 0 1 + a

,

with a = θ∆t/∆x. In fact, G(x) is always tridiagonal, nonsingular and diagonally dominant. Thusone can solve efficiently the equation for the Newton iterates,

G(ym) (ym+1 − ym) = −Fk(ym),

whereFk(x) = max

α∈−1,1[Aαx− dαk ] , x ∈ Rn.

Page 78: Hamilton-Jacobi-Bellman Equations

70 6. MONOTONE FINITE DIFFERENCE METHODS

d

log(Cd)log(4)

5 6 7 8 9 10

1

2

3

4

5θ = 1/2θ = 1

Figure 2. The total number of semi-smooth Newton iterations Cd required as afunction of the total number of degrees of freedom

∣∣G+h

∣∣ = (2d+1)2 on a logarithmic

scale. The number of iterations required grows proportional to K = 2d + 1 thenumber of time-steps used.

Furthermore, theorem 2.1 of appendix B shows that for any x ∈ Rn, G(x) is a non-singular M-matrix. Theorem 2.6 of section 2, chapter 4, implies that the scheme admits a unique solutionuh.

It is advantageous to use the uh(·, tk+1) as the first guess for the semi-smooth Newton methodto find uh(·, tk). This is because it may be expected that uh(·, tk+1) is already close to uh(·, tk),and thus the iterates would converge rapidly.

This is confirmed in numerical experiments, as illustrated in figure 2. The number of degrees offreedom of the grid was taken to be

∣∣G+h

∣∣ = (2d+1)2, d = 5, 6, . . . , 10, where∣∣G+

h

∣∣ is the cardinality

of G+h , corresponding to Nh = 2d + 1 spatial mesh points and K = 2d + 1 time mesh points. The

total number of semi-smooth Newton iterations required to find uh over the grid Gh was recordedas Cd. Convergence was determined by a residual error of less than 10−10. On average, for bothfully implicit and semi-implicit schemes, one or two Newton iterations per time-step are sufficientfor convergence.

5.4. Conclusion. Overall, the accuracy of the Kushner-Dupuis schemes used for these com-putations are low, as demonstrated in table 1. The convergence rate shown in figure 1 is foundto agree with the upper bound in inequalities (4.12), even though this bound was proven for theunbounded domain problem.

Although not detailed above, it was observed that the error in the maximum norm of u(·, 0)−uh(·, 0) was consistently achieved at the node xi = 0, i.e. the approximations uh are least accuratenear the point where u is not differentiable.

For this simple problem, it is known a-priori that the semi-smooth Newton method would beglobally convergent and locally superlinearly convergent. Figure 2 shows that it can be very effectiveat solving the nonlinear discrete HJB equations. Further testing not reported above indicates thatin the case of the fully implicit method, which is unconditionally monotone, the number of iterationsrequired per time-step is approximately proportional to ∆t/∆x.

Page 79: Hamilton-Jacobi-Bellman Equations

CHAPTER 7

Finite Element Methods

1. Introduction

This chapter presents the main original work undertaken during this project. These findingswere achieved in collaboration with Dr. Max Jensen. We propose a monotone finite element methodto solve a class of elliptic and parabolic HJB equations and we prove convergence to the viscositysolution under the usual assumptions.

The principal results of this chapter are novel and up to an occasional exception, all the proofs ofthe auxiliary results given in this chapter were found independently, since they were unavailable inour sources. In fact some of these supporting results are of independent interest, see e.g. proposition8.5.

It would be beyond the scope of this work to introduce finite element methods and Sobolevspaces; thus it is assumed that the reader is familiar with the basics of these topics. Nevertheless,nothing more than what may be found in graduate textbooks such as [1], [8], [13] or [15] is used.

Generally, finite element methods do not lead to monotone discretisations. However, it is knownin the literature from [9] that monotonicity can be achieved by combining strictly acute mesheswith the addition of a small perturbation to the differential operator. This is called the method ofartificial diffusion.

In comparing finite element methods with, for instance, the finite difference methods discussedin chapter 6, we observe that finite element methods can treat problems set on geometricallycomplicated domains without additional difficulties.

Yet there is an aspect of finite element methods which is the reason for much of the preparatorywork done in certain sections of this chapter. This is the fact that for typical meshes, one cannotexpect the discretisation of the weak form of a differential operator, when applied to the interpolantof a smooth function, to be consistent with the strong form in a specific sense (see lemma 6.1). Theopposite is true for finite difference methods, and it could be argued that the discretisation of theoperator is designed to provide this form of consistency.

The end result of this problem is that the proofs of convergence of the method is different tothe usual Barles-Souganidis convergence proof in that it requires the use of various a-priori errorestimates of finite element methods for linear problems to obtain projections onto the approximationspace with the right properties. These estimates are gathered and quoted in appendix C.

A treatment of elliptic HJB equations is included for a few reasons. Firstly, the elliptic casemakes for a more accessible read, as it involves fewer and less involved auxiliary results. Secondly,the auxiliary results needed to treat the parabolic problem make use of other supporting resultsused for the elliptic setting. Thus the parabolic problem is treated as an extension of the ellipticproblem.

This chapter is structured differently to the previous ones. First of all, we give some definitionsand set the notation and terminology used throughout this chapter. This is included to be usedin conjunction with appendix C. In section 3, the HJB equations to be treated in this chapter aredetailed. Section 4 explains the implementation of the method of artificial diffusion.

Then in section 5 the numerical schemes are described and the main results of this chapter areindicated. Finally, sections 6 and 7 provide the proofs for the elliptic problem and sections 7 and8 give proofs for the parabolic problem.

71

Page 80: Hamilton-Jacobi-Bellman Equations

72 7. FINITE ELEMENT METHODS

2. Basics of finite element methods

Let U be a non-empty, open, bounded, polyhedral subset of Rn, for n ∈ 1, 2, 3. In thisparagraph we give some basic definitions that will make precise the finite element method presentedin this chapter.

2.1. Meshes.

Definition 2.1. [8]. A subdivision of U is a finite collection of closed bounded non-empty setsTi with piecewise smooth boundary, such that

(1)Ti ∩

Tj = ∅ if i 6= j,

(2)⋃i Ti = U .

A triangulation of U is a subdivision of U consisting respectively of intervals, triangles or tetrahedra,respectively in one, two or three dimensions, i.e. n-simplices, with the property that

(3) no vertex of any simplex lies in the interior of a face of another simplex.

A triangulation T = Ti of U is also called a mesh.

The mesh size of a mesh T is defined as

h = maxT∈T

diamT. (2.1)

A mesh T with mesh size h is denoted T h. To justify convergence of the method, it is helpful toconsider a family of meshes

T h

0<h≤1on the domain U .

The properties of finite element methods usually depend on the geometry of the collection ofmeshes

T h

0<h≤1, so we introduce some terminology to describe the meshes.

Definition 2.2 (Chunkiness parameter). Suppose K is a non-empty, closed, bounded subset of Rn,star shaped with respect to a Euclidian ball B. Let

ρmax = sup ρ |K is star shaped with respect to a Euclidian ball of radius ρ .The chunkiness parameter of K is

γK =diamK

ρmax.

Definition 2.3. [8] and [13]. A family of meshesT h

0<h≤1is said to be

• non-degenerate [8], also called shape-regular [13], if there exists ρ > 0 such that forevery h ∈ (0, 1], T ∈ T h,

diamBT ≥ ρdiamT,

where BT is the ball of largest radius in T such that T is star-shaped with respect to BT ;• quasi-uniform if there exists ρ > 0 such that for all h ∈ (0, 1],

minT∈T h

diamBT ≥ ρh;

• uniformly strictly acute if there exists θ ∈ (0, π/2) such that for every h ∈ (0, 1] and

T ∈ T h, if ein+1i=1 is a complete set of unit vectors co-linear with the heights of the simplex

T with orientation from face to vertex, then

maxi 6=j

ei · ej ≤ − sin θ. (2.2)

We call θ the acuteness constant ofT h

0<h≤1.

As stated in [8, p. 108], quasi-uniform families of meshes are non-degenerate and non-degeneracyis equivalent to the chunkiness parameter γT being uniformly bounded from below for all h ∈ (0, 1],T ∈ T h.

It will be seen that strict acuteness of a mesh T is related to properties of the Laplacian of aset of basis elements of the finite element space of piecewise linear functions on T , and is key tothe method of artificial diffusion.

Page 81: Hamilton-Jacobi-Bellman Equations

2. BASICS OF FINITE ELEMENT METHODS 73

Assumption 2.1. Henceforth,T h

0<h≤1is assumed to be a quasi-uniform, uniformly strictly

acute family of meshes.

2.2. Finite elements. This paragraph makes precise the construction of the approximationspace to be used. We quote the definitions given in [8].

Definition 2.4 (Finite element). [8, p. 69]. Let

(1) K ⊂ Rn be a bounded closed set with non-empty interior and piecewise smooth boundary.K is called the element domain,

(2) P be a finite dimensional space of functions on K,(3) N = N1, · · · , Nk be a basis for P∗ the dual space of P.

The ordered triplet (K,P,N ) is called a finite element.

The notion of affine equivalence of finite elements is needed because it is a property requiredin a certain results quoted from [8] and [13] that will be used later on, and we wish to show it is aproperty satisfied for the finite element spaces used in this work.

Definition 2.5 (Affine equivalence of finite elements). [8, p. 82]. Let (K,P,N ) be a finite element

and let F (x) = Ax + b be an affine map, A non-singular. The finite element (K, P, N ) is affineequivalent to (K,P,N ) if1

(1) F (K) = K,

(2) F ∗(P) = P,

(3) F∗ (N ) = N .

Let (K,P1(K),N (K)) be a finite element, called reference element, with K a n-simplex, P =P1(K) the space of polynomials of total degree 1 on K and N = N1, . . . , Nn+1 the set of basis

dual elements Ni : p 7→ p(xi), with xin+1i=1 the vertices of K.

Then for any other non-empty, closed, bounded n-simplex T , the finite element

(T,P1(T ),N (T ))

where N consists of evaluation at the vertices of T , is affine equivalent to (K,P1(K),N (K)). Thisis because first order polynomials remain first order polynomials under composition with affinemaps, and affine maps map vertices of n-simplices to vertices.

This shows that for each T ∈ T h, T h a mesh on U , h ∈ (0, 1], there is a unique finite element(T,P,N ) which is affine equivalent to (K,P1(K),N (K)), and in particular P = P1(T ) and N isthe set of dual basis elements that evaluate elements of P at the vertices of T .

Definition 2.6 (Finite element approximation space). Let T h be a mesh on U . The approximationspace Vh of C(U) Lagrange piecewise linear finite elements on T h is defined by

Vh =v ∈ C

(U)| v|T ∈ P1(T ) for all T ∈ T h

. (2.3)

We note it follows from the above setting that Vh ⊂W 1,∞(U).

The trace operator is denoted γ∂U : W 1,p(U) 7→ Lp (∂U). The reader may find [1] or [15] tobe helpful references for the definition of traces of functions in Sobolev spaces. Furthermore, it ishelpful to introduce the test space Vh,0

Vh,0 = Vh ∩H10 (U). (2.4)

A function v ∈ Vh belongs to Vh,0 if and only if γ∂U (v) = 0.

Definition 2.7 (Interpolant). Let T h be a mesh on U . The interpolant Ih : C(U) 7→ Vh of Vh isdefined by

Ihv∣∣∣T

=n+1∑i=1

Ni(v)ϕi for each T ∈ T h, (2.5)

where ϕi ⊂ P1(T ) forms a dual basis to N (T ).

1Recall that for p ∈ P, the pull-back is F ∗p = p F , and for N ∈ N , the push-forward is defined by F∗N(p) =N (F ∗p).

Page 82: Hamilton-Jacobi-Bellman Equations

74 7. FINITE ELEMENT METHODS

It is the choice that N (T ) should consist of evaluation at the vertices of T , together with thefact that the mesh T h forms a triangulation of U that ensures well-definedness and continuity ofthe interpolant.

For piecewise linear finite elements, the interpolant may be written as

Ihv =N∑i=1

v(xi)vi.

The boundary interpolant Ih∂U : C (∂U) 7→ C (∂U) is defined by

Ih∂Uv =N∑

i=Nh+1

v(xi) vi|∂U .

2.3. Further notation. We now introduce some additional notation for the following sections.

LetT h

0<h≤1be a quasi-uniform, uniformly strictly acute family of meshes. For h ∈ (0, 1],

abusing notation, let xiNi=1 ⊂ U be the set of all vertices of the elements T ∈ T h. The elements

of xiNi=1 are called nodes of the mesh.

For each h ∈ (0, 1], again with an abuse of notation, there exists a unique set viNi=1 ⊂ Vhwhich satisfies

vi(xj) = δij ,

with δij the Kronecker delta. Furthermore, viNi=1 is a basis for Vh. Assume without loss of

generality that for some Nh < N , viNhi=1 is a basis for Vh,0.

So far in this work, the gradient of a function v was denoted Dv, in accordance with [15].However to emphasise that for v ∈ Vh, Dv is piecewise constant, we write ∇v and omit theargument, even when evaluating the gradient. The reason for this is that the piecewise constantproperty is used repeatedly in the discussion that will follow, and this notation should help to serveas a reminder for it.

For each element T ∈ T h, there are precisely n + 1 basis functions vi such that T ⊂ supp vi.Furthermore, if T ⊂ supp vi, then ∇vi|T is colinear with the height in T from the vertex xi, in thedirection of face to vertex. In addition, |∇vi|T | is the inverse of the length of the height from xi.The strict acuteness property thus implies that

∇vi|T · ∇vj |T ≤ − sin θ |∇vi| |∇vj | , i 6= j,

whenever T ⊂ supp vi ∩ supp vj .

For T ∈ T h, define the minimal ratio of diameter to height

σT = hT min |∇vi|T | |T ⊂ supp vi .

The L1 normalised basis functions viNi=1 are defined byvi

‖vi‖L1(U). (2.6)

Remark 2.8. In this chapter, we will sometimes need to make explicit reference to a sequence ofmeshes used to obtain discrete solutions that should converge to the viscosity solution of the problem,and will abuse notation in so doing. For a sequence hn∞n=1 ⊂ (0, 1], hn → 0; a correspondingsequence of meshes will be denoted T n∞n=1. The mesh size of T n is then denoted hn, but shouldnot be confused with hT the diameter of T ∈ T . The approximation space associated to the meshT n is denoted Vn rather than Vhn. Similar abuses are made for other related symbols.

3. Hamilton-Jacobi-Bellman equations

Elliptic Hamilton-Jacobi-Bellman equation. Given g : ∂U 7→ R, the first problem considered is

supα∈Λ

[Lαu− fα] = 0 on U ; (3.1a)

u = g on ∂U. (3.1b)

Page 83: Hamilton-Jacobi-Bellman Equations

3. HAMILTON-JACOBI-BELLMAN EQUATIONS 75

Parabolic Hamilton-Jacobi-Bellman equation. Given uT : U 7→ R, the second problem consid-ered is

−ut + supα∈Λ

[Lαu− fα] = 0 on O = U × (0, T ); (3.2a)

u(·, T ) = uT on U ; (3.2b)

u = 0 on ∂U × (0, T ). (3.2c)

Assumption 3.1 (Compactness and continuity). For both problems (3.1) and (3.2), it is assumedthat for each α ∈ Λ, the linear elliptic operator Lα is defined by

Lαu(x) = −aα∆u(x) + bα(x) ·Du(x) + cα(x)u(x).

The set Λ is assumed to be compact metric space and there exists γ > 0 such that the map

Λ 7→ R× C0,γ(U,Rn)× C0,γ(U)× C0,γ(U),

α 7→ (aα, bα, cα, fα)(3.3)

is continuous.

Assumption 3.2 (Data). We assume that U is such that the conditions of theorem 1.7 of appendixC hold when applied to the Poisson problem.

Furthermore, we assume that for all α ∈ Λ, aα > 0, cα(x) ≥ 0 on U and that there exists c0 > 0such that for every α ∈ Λ, the bilinear form 〈Lα·, ·〉 : H1

0 (U)×H10 (U) 7→ R is coercive with constant

c0, c0 independent of α,

c0‖v‖2H1(U) ≤ 〈Lαv, v〉.

Assume that for every α ∈ Λ, fα ≥ 0 on U .

For problem (3.1), assume that g ∈ C0,1 (∂U), g ≥ 0 and that g has a lifting into C(U)∩H1(U),

i.e. there is ug ∈ C(U)∩H1(U) such that ug|∂U = g.

For problem (3.2), assume that uT ∈ C(U), uT ≥ 0 and for consistency, uT = 0 on ∂U .

Assumption 3.3 (Viscosity solutions). It is assumed for both problems that the strong comparisonproperty holds for upper-semicontinuous viscosity subsolutions and lower semi-continuous viscositysupersolutions of (3.1a) and that there exists a viscosity solution u that assumes the boundary datacontinuously.

In other words, for problem (3.1), if v ∈ USC(U)

and w ∈ LSC(U)

are respectively a viscositysubsolution and a viscosity supersolution, then

supU

[v − w] = sup∂U

[v − w] ;

and there exists a viscosity solution u ∈ C(U)

of (3.1) that satisfies (3.1b) pointwise.

For problem (3.2), if v ∈ USC(O)

and w ∈ LSC(O)

are respectively a viscosity subsolutionand a viscosity supersolution, then

supU×(0,T ]

[v − w] = sup∂O

[v − w] ,

where ∂O = U × T ∪ ∂U × (0, T ] is the parabolic boundary of O. Assume there is a viscositysolution u ∈ C

(O)

of (3.2) that satisfies (3.2b) and (3.2c) pointwise.

It can be seen from the proof of the elliptic comparison property found in [12] that if in additionto the above assumptions, cα > 0 on U , then the strong comparison property for problem (3.1) willhold.

Page 84: Hamilton-Jacobi-Bellman Equations

76 7. FINITE ELEMENT METHODS

4. The method of artificial diffusion

This section reviews relevant parts of the work found in [9] and will show how to construct themethod of artificial diffusion. The analysis of the properties of this method is found in section 6.

However, briefly said, artificial diffusion leads to a discrete maximum principle and will enablea strategy for the convergence argument similar to the Barles-Souganidis argument, which waspresented in section 3 of chapter 6.

For a family of uniformly strictly acute meshes, the method of artificial diffusion consists ofintroducing some artificial diffusion term εh ≥ 0 chosen sufficiently large, such that the operators

Lαhu = −εh∆u+ Lαu

will satisfy a monotonicity property. The operators Lαh will then be used as part of the discretescheme. The scheme will be consistent in the limit, since it will be seen that the artificial diffusionterm εh will tend to 0 as h tends to 0.

4.1. Artificial diffusion. For a compact set K ⊂ U and α ∈ Λ, define the norm ‖b‖∞,2,K onb : α 7→ bα ∈ C(K,Rn) by

‖b‖∞,2,K = supα∈Λ

(n∑i=1

‖bαi ‖2C(K)

) 12

. (4.1)

The hypotheses that Λ is compact and that b : Λ 7→ C(U,Rn

)is continuous imply that ‖b‖∞,2,K <

∞. Leta = inf

α∈Λaα.

ForT h

0<h≤1a uniformly strictly acute family of meshes with acuteness constant θ, and T ∈ T h,

h ∈ (0, 1], choose cT ∈ R such that

cT > max

(1

(n+ 1)σT sin θ− a

‖b‖∞,2,ThT, 0

). (4.2)

Remark 4.1. For a non-degenerate, uniformly strictly acute family of meshes, under the currenthypotheses, there exists C ≥ 0 such that for every h ∈ (0, 1] and T ∈ T h,

max

(1

(n+ 1)σT sin θ− a

‖b‖∞,2,ThT, 0

)≤ C.

For η0 > 0 chosen, h ∈ (0, 1] and T ∈ T h, let η1 be the element-wise constant function definedby

η1|T

=

cT ‖b‖∞,2,ThT if ‖b‖∞,2,T > 0;

η0hT if ‖b‖∞,2,T = 0.

Define η2 the element-wise constant function by

η2|T

=h2T

(n+ 1)σT sin θsupα∈Λ‖cα‖C(T ).

We may leave η1 and η2 undefined on ∂T for T ∈ T h.

Proposition 4.2. LetT h

0<h≤1be a uniformly strictly acute family of meshes. For every h ∈

(0, 1] there exists εh ∈ R and C ≥ 0 independent of h, such that

εh ≥ (η1 + η2)|T

for all T ∈ T h, (4.3)

andεh ≤ Ch. (4.4)

Proof. From the assumption that Λ is compact and that α 7→ (aα, bα, cα) is continuous, thereexists bound, uniform in α, on ‖bα‖∞,2,U and ‖cα‖C(U), and as remarked previously, there exists

C ≥ 0 such that for every h ∈ (0, 1], T ∈ T h, we may choose cT ≤ C. Furthermore η0 may bechosen independently of h and T ∈ T h.

Page 85: Hamilton-Jacobi-Bellman Equations

5. NUMERICAL SCHEME 77

So using the fact that h ≤ 1, there exists C ≥ 0 such that for all T ∈ T h,

(η1 + η2)|T≤ Ch, (4.5)

C independent of h. We may therefore set εh = Ch where C is the constant in inequality (4.5).

Remark 4.3. In practice, a more sophisticated choice of εh than that given in the proof of proposi-tion 4.2 may be desirable, as it is known that too much artificial diffusion can reduce the quality ofthe approximation for certain problems. More details on this effect may be found in the numericalexperiments presented in [9].

5. Numerical scheme

For a quasi-uniform, uniformly strictly acute family of meshesT h

0<h≤1, let εh ∈ R satisfy

inequalities (4.3) and (4.4). The operators Lαh are defined by

Lαhu = −εh∆u+ Lαu. (5.1)

The plan for the remainder of the chapter is the following. First the schemes are presented and theprincipal results of this chapter are given. In the following section, a number of auxiliary resultsare demonstrated. These will be sufficient to analyse the scheme for the elliptic problem, as is donein section 7. Further auxiliary results are required for the analysis of the scheme for the parabolicproblem, and this will be done in sections 8 and 9.

5.1. Elliptic HJB equation. The scheme for solving (3.1) proposed is to find uh ∈ Vh suchthat

supα∈Λ

[〈Lαhuh, vi〉 − (fα, vi)] = 0, i ∈ 1, . . . , Nh ; (5.2a)

γ∂U (uh) = Ih∂Ug. (5.2b)

The main results of this chapter for this scheme are the following. The first two results holdunder assumptions 2.1, 3.1 and 3.2.

Theorem 5.1. For each h ∈ (0, 1] there exists a unique solution uh ∈ Vh to (5.2) and uh ≥ 0 onU .

In the case of homogeneous boundary data g ≡ 0, there exists C ≥ 0 independent of h such that

‖uh‖H1(U) ≤ C infα∈Λ‖fα‖L2(U). (5.3)

and for every x ∈ ∂U ,lim infy→xh→0

uh(y) ≥ 0. (5.4)

Proposition 5.2. If there exists w ∈ H2(U) ∩W 1,∞(U) such that for some α ∈ Λ,

Lαw ≥ fα a.e. on U (5.5a)

γ∂U (w) = g; (5.5b)

then there is C ≥ 0 such that for every h ∈ (0, 1], the solution uh of (5.2) satisfies

‖uh‖L∞(U) ≤ C, (5.6)

and furthermore, for all x ∈ ∂U ,lim supy→xh→0

uh(y) ≤ g(x). (5.7)

The following holds under the above assumptions and assumption 3.3

Theorem 5.3. Suppose that uh remains bounded in L∞(U) as h → 0 and that uh → g near theboundary, i.e. suppose that for all x ∈ ∂U ,

limy→xh→0

uh(y) = g(x).

Then uh converges uniformly on U to u the unique viscosity solution of equation (3.1).

Page 86: Hamilton-Jacobi-Bellman Equations

78 7. FINITE ELEMENT METHODS

5.2. Parabolic HJB equations. For simplicity, we describe the fully implicit backward Eulerscheme. For each h ∈ (0, 1], let ∆th > 0 be the time-step used in conjunction with the mesh T h,with ∆th → 0 as h→ 0. Let ∆th be such that T/∆th is an integer and let

Sh =

sk = k∆th | k = 0, . . . ,

T

∆th

,

and

S+h =

sk = k∆th | k = 0, . . . ,

T

∆th− 1

.

Let ∆ht : C

(O)7→ C

(U × [0, T −∆th]

)be the difference operator

∆htw (·, t) =

1

∆th(w (·, t+ ∆th)− w (·, t)) .

The scheme is to find uh(·, sk) ∈ Vh,0 for each sk ∈ S+h such that

−∆ht uh(xi, sk) + sup

α∈Λ[〈Lαhuh(·, sk), vi〉 − (fα, vi)] = 0 for all i ∈ 1, . . . , Nh ; (5.8a)

uh(·, T ) = IhuT . (5.8b)

The following two results hold under assumptions 2.1, 3.1 and 3.2.

Theorem 5.4. For each h ∈ (0, 1], there exists a unique uh solving (5.8), uh ≥ 0 on U × Sh, andfor all sk ∈ Sh

‖uh(·, sk)‖L∞(U) ≤ ‖uT ‖C(U) + T supα∈Λ‖fα‖C(U). (5.9)

Furthermore, for all x ∈ U ,

lim(y,s)→(x,T )

h→0

uh(y, s) = uT (x). (5.10)

Proposition 5.5. If there exists w ∈ C1([0, T ];C2

(U))

such that for some α ∈ Λ

−wt + Lαw ≥ fα on O; (5.11a)

w(·, T ) = uT on U ; (5.11b)

w = 0 on ∂U × (0, T ); (5.11c)

then for all x ∈ ∂U , t ∈ (0, T )

lim(y,s)→(x,t)

h→0

uh(y, s) = 0. (5.12)

The following holds under the above assumptions and assumption 3.3.

Theorem 5.6. If (5.12) holds, then uh converges uniformly on compact subsets of U × (0, T ] to uthe unique viscosity solution of equation (3.2).

6. Supporting results

6.1. Consistency and convergence properties. We prove some properties of Lαh under theabove assumptions.

Lemma 6.1 (Consistency of elliptic projections). Let w ∈ C2(U). Then there exists a unique

Lhw ∈ Vh the elliptic projection of w, that solves

〈−∆Lhw, v〉 = (−∆w, v) for all v ∈ Vh,0, (6.1a)

γ∂U

(Lhw

)= Ih∂Uw. (6.1b)

There exists C ≥ 0 and h0 > 0 such that for all h < h0,

‖w − Lhw‖W 1,∞(U) ≤ Ch‖w‖C2(U). (6.2)

Page 87: Hamilton-Jacobi-Bellman Equations

6. SUPPORTING RESULTS 79

Furthermore, let hn ⊂ (0, 1] be a sequence tending to 0, and xn ⊂ U be a sequence of pointsconverging to x ∈ U , with xn a node of T n and vn ∈ Vn its associated basis function. Then forevery α ∈ Λ,

limn→∞

〈LαnLnw, vn〉 = Lαw(x) uniformly over Λ. (6.3)

As a consequence,

limn→∞

supα∈Λ

[〈LαnLnw, vn〉 − (fα, vn)] = supα∈Λ

[Lαw(x)− fα(x)] . (6.4)

Proof. Existence and uniqueness follows from the Lax-Milgram lemma and from coercivity of−∆ on Vh,0 ⊂ H1

0 (U) - see [8]. The error estimate (6.2) is from theorem 1.7 of appendix C, whereit is assumed in assumption 3.2 that it holds.

We now show (6.3). Let α ∈ Λ, then from the definition of Lhw,

〈LαnLnw, vn〉 = (aα + εn) 〈−∆Lnw, vn〉+ (bα ·DLnw + cαLnw, vn)

= (aα + εn) (−∆w, vn) + (bα ·DLnw + cαLnw, vn) .

So by Holder’s inequality, assumption 3.2 and inequality (4.4),

|〈LαnLnw, vn〉 − (Lαw, vn)| ≤ εn‖∆w‖L∞(U)‖vn‖L1(U) + |(bα ·D (w − Lnw) + cα (w − Lnw) , vn)|≤ Ch‖w‖C2(U) + C‖w − Lnw‖W 1,∞(U),

where C is independent of α. For n sufficiently large, i.e. hn < h0 as in (6.2), we therefore have

|〈LαnLnw, vn〉 − (Lαw, vn)| ≤ Ch‖w‖C2(U). (6.5)

From the assumption that the coefficients of Lα are in Holder spaces and assumption 3.1, Lαw isuniformly continuous on U , uniformly in α. Therefore for every ε > 0, there is δ > 0 such that forall y ∈ B (x, δ) ∩ U and all α ∈ Λ.

|Lαw(x)− Lαw(y)| ≤ ε.For n sufficiently large, by convergence of xn, |x− xn| + hn < δ, thus supp vn ⊂ B (x, δ). Sincevn ≥ 0 and ‖vn‖L1(U) = 1, this implies

|Lαw(x)− (Lαw, vn)| ≤ ε.Therefore (Lαw, vn)→ Lαw(x) as n→∞. This fact and (6.5) imply equation (6.3).

Since∣∣∣∣supα∈Λ

[Lαw(x)− fαw(x)]− supα∈Λ

[〈LαnLnw, vn〉 − (fα, vn)]

∣∣∣∣ ≤ supα∈Λ

[|Lαw(x)− 〈LαnLnw, vn〉|+ |fα(x)− (fα, vn)|] ,

Assumption 3.1 implies that fα is uniformly continuous, uniformly over Λ which, in conjunctionwith the fact that (6.3) holds uniformly over Λ, implies that

limn→∞

supα∈Λ

[|Lαw(x)− 〈LαnLnw, vn〉|+ |fα(x)− (fα, vn)|] = 0,

thus giving (6.4).

Proposition 6.2 (Uniform convergence of finite element projections). LetT h

0<h≤1be a uni-

formly strictly acute, quasi-uniform family of meshes on U ⊂ Rn, n ∈ 1, 2, 3, and let w ∈H2(U) ∩W 1,∞(U). Then for every α ∈ Λ, there exists a unique Pαh w ∈ Vh solving

〈LαhPαh w, v〉 = 〈Lαhw, v〉 for all v ∈ Vh,0, (6.6a)

γ∂U (Pαh w) = Ih∂Uw; (6.6b)

and a unique Qαhw ∈ Vh solving

〈LαhQαhw, v〉 = (Lαw, v) for all v ∈ Vh,0, (6.7a)

γ∂U (Qαhw) = Ih∂Uw. (6.7b)

Furthermore there exists C ≥ 0 independent of h ∈ (0, 1] and α ∈ Λ such that

‖w − Pαh w‖L∞(U) ≤ Ch |w|W 1,∞(U) + Cd(n, h)h |w|H2(U) , (6.8)

Page 88: Hamilton-Jacobi-Bellman Equations

80 7. FINITE ELEMENT METHODS

and‖w −Qαhw‖L∞(U) ≤ Ch |w|W 1,∞(U) + Cd(n, h)h‖w‖H2(U); (6.9)

where

d(n, h) =

1 n = 1;

1 + |log h| n = 2;

h−1/2 n = 3.

As a result,

limh→0‖w − Pαh w‖L∞(U) + ‖w −Qαhw‖L∞(U) = 0 uniformly over Λ. (6.10)

Proof. Since w ∈W 1,∞(U), proposition 1.1 of appendix C implies that there is a representa-tive of w in C

(U). Therefore γ∂U (w) has a lifting in C

(U)∩H1(U).

Since εh ≥ 0 and since Lα are coercive, uniformly over Λ by assumption 3.2, proposition 1.3 ofappendix C implies that equations (6.6) and (6.7) respectively admit unique solution Pαh w, Q

αhw ∈

Vh. Since Ihw is well defined, proposition 1.3 also shows that

‖w − Pαh w‖H1(U) ≤(

1 +‖〈Lαh ·, ·〉‖

c0

)‖w − Ihw‖H1(U).

Lemma 1.6 of appendix C shows that

‖w −Qαhw‖H1(U) ≤(

1 +‖〈Lαh ·, ·〉‖

c0

)‖w − Ihw‖H1(U) +

1

c0supv∈Vh,0

∣∣(εhDIhw,Dv)∣∣‖v‖H1(U)

. (6.11)

Now, using Holder’s inequality and the Cauchy-Schwarz inequality, then using inequality (4.4) ofproposition 4.2, there is C ≥ 0 independent of h and α such that∣∣∣∣∣∣εh

∫U

DIhw(x) ·Dv(x)dx

∣∣∣∣∣∣ ≤ εh‖DIhw‖L2(U)‖Dv‖L2(U)

≤ Ch‖Ihw‖H1(U)‖v‖H1(U).

(6.12)

Proposition 1.2 of appendix C and the triangle inequality then imply that

supv∈Vh,0

∣∣(εhDIhw,Dv)∣∣‖v‖H1(U)

≤ Ch‖w‖H2(U).

Assumption 3.1 and inequality (4.4) imply that there exist C ≥ 0 such that for all α ∈ Λ,

‖〈Lαh ·, ·〉‖ := ‖〈Lαh ·, ·〉‖H1(U)×H1(U) ≤ C.

Hence, from the uniform coercivity of 〈Lα·, ·〉 over H10 (U) and (6.11), there exists C ≥ 0 independent

of h and α such that

‖w −Qαhw‖H1(U) ≤ C‖w − Ihw‖H1(U) + Ch‖w‖H2(U). (6.13)

From the quasi-uniformity of the meshes, using twice the error bound for the interpolant,proposition 1.2 of appendix C, for the cases p =∞, m = 1 and p = 2, m = 2; and by the discretePoincare inequality for Vh, proposition 1.4 of appendix C,

‖w − Pαh w‖L∞(U) ≤ ‖w − Ihw‖L∞(U) + ‖Ihw − Pαh w‖L∞(U)

≤ Ch |w|W 1,∞(U) + Cd(h, n)‖Ihw − Pαh w‖H1(U)

≤ Ch |w|W 1,∞(U) + Cd(h, n)(‖Ihw − w‖H1(U) + ‖w − Pαh w‖H1(U)

)≤ Ch |w|W 1,∞(U) + Cd(h, n)‖Ihw − w‖H1(U)

≤ Ch |w|W 1,∞(U) + Cd(h, n)h |w|H2(U) .

where the before-last inequality was found using (6.13). Similarly,

‖w −Qαhw‖L∞(U) ≤ Ch |w|W 1,∞(U) + Cd(h, n)(‖w − Ihw‖H1(U) + h‖w‖H2(U)

)≤ Ch |w|W 1,∞(U) + Cd(h, n)h‖w‖H2(U).

Page 89: Hamilton-Jacobi-Bellman Equations

6. SUPPORTING RESULTS 81

Convergence of Pαh w and Qαhw to w follows from the fact that d(n, h)h → 0 as h → 0 for n =1, 2, 3.

6.2. Monotonicity properties. The first step towards obtaining a monotone scheme foruniformly strictly acute meshes is the following lemma. We quote [9], with a few additional details.

Lemma 6.3. Let T h be a strictly acute mesh, with acuteness constant θ. Suppose that v ∈ Vh islocally minimal at an interior node xi, i ∈ 1, . . . , Nh. Then for every T ⊂ supp vi, if we callωT = angle (∇v|T ,∇vi), we have

cosωT ≤ − sin θ. (6.14)

Proof. Without loss of generality, by relabelling, we may suppose that the vertices of T ∈ T h arexin+1

i=1 and that the minimum of v is attained at x1. For shorthand, let us write ∇v|T simply as ∇v.Define

Gij =∇vi|∇vi|

· ∇vj|∇vj |

i, j ∈ 1, . . . , n+ 1

and

δi = (v(xi)− v(x1)) |∇vi| i ∈ 1, . . . , n+ 1Define the components of ∇v parallel to ∇v1 by

∇v‖ =∇v · ∇v1

|∇v1|2∇v1;

and orthogonal to e1 by

∇v⊥ = ∇v −∇v‖.We have

v − v(x1)|T =

n+1∑i=2

(v(xi)− v(x1)) vi,

so

∇v =

n+1∑i=2

δi∇vi|∇vi|

;

from which we find that

∇v‖ =

(n+1∑i=2

δiG1i

)∇v1

|∇v1|,

and

∇v⊥ =

n+1∑i=2

δi

(∇vi|∇vi|

−G1i∇v1

|∇v1|

).

Since Gij ≤ − sin θ < 0 for i 6= j, and δi ≥ 0,∣∣∇v‖∣∣2 =

n+1∑i=2

δ2iG

21i + 2

∑2≤i<j

δiδj G0iG0j︸ ︷︷ ︸≥0

≥n+1∑i=2

δ2iG

21i

≥ |∇v|2 sin2 θ;

and

|∇v⊥|2 =

n+1∑i=2

δ2i (1−G2

1i) + 2∑

2≤i<j

δiδj (Gij −G0iG0j)︸ ︷︷ ︸≤0

≤n+1∑i=2

δ2i (1−G2

1i)

≤ |∇v|2 cos2 θ.

.

Re-arranging, this givescos2 ωT

1− cos2 ωT≥ sin2 θ

1− sin2 θ.

The function x 7→ x/(1 − x) is increasing on [0, 1), so cos2 ωT ≥ sin2 θ. It is clear that cosωT ≤ 0, hencecosωT ≤ − sin θ.

The introduction of the artificial diffusion provides a discrete maximum principle, which willnow be explained. The following result is adapted from the results in [9]. Recall that here negativeand positive mean respectively less than or equal to 0 and greater than or equal to 0.

Page 90: Hamilton-Jacobi-Bellman Equations

82 7. FINITE ELEMENT METHODS

Proposition 6.4 (Monotonicity). LetT h

0<h≤1be a uniformly strictly acute family of meshes.

For every α ∈ Λ, h ∈ (0, 1], Lαh defined in (5.1) has the following monotonicity property. Letv, w ∈ Vh be such that v − w has a negative minimum at a node xi, i ∈ 1, . . . , Nh. Then

〈Lαhv, vi〉 ≤ 〈Lαhw, vi〉. (6.15)

Proof. By linearity it is sufficient to consider w ≡ 0. Temporarily writing ∇v = ∇v|T for eachT ⊂ supp vi, lemma 6.3 holds. Therefore, for any α ∈ Λ, from inequality (4.3),

(εh + aα) (∇v,∇vi)T ≤ − sin (η1 + η2 + a) θ |T | |∇v| |∇vi| .

By the Cauchy-Schwarz inequality and from the fact that ∇v is constant on each element,

(bα · ∇v, vi)T ≤ |∇v|

n∑j=1

(bαj , vi

)2 12

≤ |∇v|

‖vi‖2L1(T )

n∑j=1

‖bαj ‖2L∞(T )

12

≤ |∇v| ‖b‖∞,2,T|T |n+ 1

.

First assume that ‖b‖∞,2,T > 0 on T and let us write

cT = dT +1

(d+ 1)σT sin θ− a

‖b‖∞,2,ThTFrom the definition of cT , dT > 0, so

η1 = dT ‖b‖∞,2,ThT +‖b‖∞,2,ThT

(d+ 1)σT sin θ− a;

From the fact that σT ≤ hT |∇vi|, we find that

(aα + η1) (∇v,∇vi)T + (bα · ∇v, vi)T ≤ − sin θdT ‖b‖∞,ThT |T | |∇vi| |∇v| .

Now suppose that b = 0 on T . Then

(aα + η1) (∇v,∇vi)T + (bα · ∇v, vi)T ≤ − sin θ (η0 + aT )hT |T | |∇vi| |∇v| .

In both cases there is CT,h > 0 such that

(aα + η1) (∇v,∇vi)T + (bα · ∇v, vi)T ≤ −CT,h |∇v| . (6.16)

If v ≤ 0 on T , then (cαv, vi) ≤ 0 and (η2∇v,∇vi) ≤ 0 so the result follows. If v becomes positive onT , then from the assumption that it has a negative minimum and the fact that v is piecewise linear, for allx ∈ T

v(x) = v(xi) +∇v · (x− xi) ≤ hT |∇v| ,hence

(cαv, vi)T ≤ |∇v|hT|T |n+ 1

‖cα‖C(T ).

Yet, again using the fact that hT∇vi ≥ σT ,

(η2∇v,∇vi)T ≤ − |∇v|hT|T |n+ 1

supα∈Λ‖cα‖C(T );

so

(cαv, vi)T + (η2∇v,∇vi)T ≤ 0.

Summing these inequalities over all elements T ⊂ supp vi concludes the proof.

Corollary 6.5 (Discrete maximum principle). [9]. Let v ∈ Vh such that for every i ∈ 1, . . . , Nh,there exists αi ∈ Λ for which

〈Lαih v, vi〉 ≥ 0.

Then

minUv ≥ min

∂Umin(v, 0). (6.17)

Proof. If v ≥ 0 on U , there is nothing to show. If v achieves a strictly negative minimum inU , since v is piecewise linear, it achieves its minimum at a node xi of the mesh. Then inequality(6.16) shows that

0 ≤ 〈Lαih v, vi〉 ≤ −∑

T⊂supp vi

CT,h |∇v|T | ,

Page 91: Hamilton-Jacobi-Bellman Equations

7. ELLIPTIC PROBLEM: PROOF OF MAIN RESULTS 83

so ∇v|T = 0 for all T ⊂ supp vi and v achieves its minimum at all neighbouring nodes. By

induction, we find that v is constant over U and

minUv = min

∂Uv.

Both cases imply inequality (6.17).

7. Elliptic problem: proof of main results

7.1. Proof of theorem 5.1. For α ∈ ΛNh , let G (α) ∈ L(Vh,0;RNh

)be defined by

(G (α) vj)i = 〈Lαih vj , vi〉.To see that G (α) is an isomorphism, suppose that there exists v, w ∈ Vh,0 such that G (α) (v −w) = 0. Then corollary 6.5 implies that v ≥ w and w ≥ v, hence G is injective. Furthermore,dimVh,0 = Nh, so G (α) is an isomorphism of vector spaces.

Furthermore, the monotonicity property of proposition 6.4, applied to vj , j ∈ 1, . . . , Nh,shows that (G (α) vj)i ≤ 0 if i 6= j. Theorem 2.1 of appendix B in conjunction with corollary 6.5shows that the matrix of G (α) is a non-singular M-matrix.

By assumption 3.2, g ∈ C0,1 (∂U), so set

gh =N∑

j=Nh+1

g(xj)vj ,

and note that γ∂U (gh) = Ih∂Ug. Therefore the numerical scheme is equivalent to finding uh,0 ∈ Vh,0such that

supα∈Λ

[〈Lαhuh,0, vi〉 − ((fα, vi)− 〈Lαhgh, vi〉)] = 0; (7.1a)

uh = uh,0 + gh. (7.1b)

Let dαi = (fα, vi) − 〈Lαhgh, vi〉. Note that for j > Nh, vj has a negative minimum at xi,i ∈ 1, . . . , Nh, so proposition 6.4 and the assumption that g ≥ 0 implies that 〈Lαhgh, vi〉 ≤ 0 forall i ∈ 1, . . . , Nh. By assumption 3.2, fα ≥ 0, so we conclude that dαi ≥ 0.

Furthermore, assumption 3.1 with the fact that G (α) is represented by a non-singular M-matrix, imply that the hypotheses of lemma 2.5, theorem 2.6 and corollary 3.10 of chapter 4 are allsatisfied. Therefore applying these results shows that there exists a unique solution uh,0 to equation(7.1) and that for every α ∈ Λ, uh,0 satisfies

0 ≤ uh,0 ≤ wαh,0 on U,

where wαh,0 solves

〈Lαhwαh , v〉 = (fα, v)− 〈Lαhgh, v〉 for all v ∈ Vh,0;

γ∂U(wαh,0

)= 0.

Since gh ≥ 0, uh = uh,0 + gh solves (5.2) and satisfies

0 ≤ uh ≤ wαh on U, (7.3)

where wαh solves

〈Lαhwαh , v〉 = (fα, v) for all v ∈ Vh,0; (7.4a)

γ∂U (wαh ) = Ih∂Ug. (7.4b)

This proves the first part of theorem 5.1.

We now show that if g ≡ 0 there exists C ≥ 0 such that

‖uh‖H1(U) ≤ C infα∈Λ‖fα‖L2(U).

Since uh ≥ 0 on U , uh ∈ Vh,0 and for every i ∈ 1, . . . , Nh〈Lαhuh, vi〉 ≤ (fα, vi) .

Page 92: Hamilton-Jacobi-Bellman Equations

84 7. FINITE ELEMENT METHODS

Multiplying this last inequality by uh(xi) ≥ 0, summing this last inequality over i ∈ 1, . . . , Nhand using linearity, coercivity and the Cauchy-Schwarz inequality, we find that

c0‖uh‖2H1(U) ≤ 〈Lαhuh, uh〉 ≤ C‖fa‖L2(U)‖uh‖H1(U).

where C is independent of α by assumption 3.1. Therefore

‖uh‖H1(U) ≤ C infα∈Λ‖fα‖L2(U).

This completes the proof of theorem 5.1.

7.2. Proof of proposition 5.2. Suppose that there exists w ∈ H2(U)∩W 1,∞(U) and α ∈ Λsuch that

Lαw ≥ fα a.e. on U

γ∂U (w) = g;

Then by proposition 6.2, for every h ∈ (0, 1] there exists Qαhw solving

〈LαhQαhw, v〉 = 〈Lαw, v〉 for all v ∈ Vh,0;

γ∂U (Qαhw) = Ih∂Ug.

Since vi is positive on U , for every i ∈ 1, . . . , Nh,

〈Lαh (uh −Qαhw) , vi〉 ≤ 0.

Because uh −Qαhw ∈ Vh,0, the discrete maximum principle, corollary 6.5, implies that

uh ≤ Qαhw on U.

so by inequality (7.3), 0 ≤ uh ≤ Qαhw. Furthermore, proposition 6.2 implies that Qαhw converges

uniformly to the continuous representative of w on U . Hence, there exists C ≥ 0 independent of hsuch that

‖uh‖L∞(U) ≤ ‖Qαhw‖L∞(U) ≤ C.

This proves (5.6).

Since there is a representative w ∈ C(U), with w = g on ∂U , for every ε > 0 there is δ > 0 such

that if |x− y| < δ, x ∈ ∂U , y ∈ U , then |g(x)− w(y)| < ε. Furthermore Qαhw converges uniformly

to w on U , so there is h0 > 0 such that for all h < h0, |w(y)−Qαhw(y)| < ε.

Thus for all h < h0, x ∈ ∂U , y ∈ U ,

uh(y)− g(x) ≤ Qαhw(y)− g(x) ≤ 2ε,

which shows that

lim supy→xh→0

uh(y) ≤ g(x).

7.3. Proof of theorem 5.3. Let hn ⊂ (0, 1] be a sequence tending to 0. As noted in remark

2.8, terms such as Lαhn will be abbreviated by Lαn, etc. For x ∈ U , define S(x, ε) = U ∩B(x, ε) anddefine the upper and lower envelopes by

u∗(x) = limε→0

lim supn

supy∈S(x,ε)

un(y); (7.6)

u∗(x) = limε→0

lim infn

infy∈S(x,ε)

un(y). (7.7)

From the hypothesis that un remains bounded in L∞(U) and by proposition 2.5 of chapter 5,u∗ ∈ USC

(U)

and u∗ ∈ LSC(U).

The hypothesis that uh → g near ∂U implies that u∗ = u∗ = g on ∂U .

Page 93: Hamilton-Jacobi-Bellman Equations

7. ELLIPTIC PROBLEM: PROOF OF MAIN RESULTS 85

u∗ is a subsolution. Let w ∈ C2(U)

be such that u∗ − w has a strict maximum at x ∈ U ,u∗(x) = w(x). Fix α ∈ Λ and define the finite element projections Qαnw of w by

〈LαnQαnw, v〉 = (Lαw, v) for all v ∈ Vhn,0; (7.8a)

γ∂U (Qαnw) = In∂Uw. (7.8b)

Since w ∈ C2(U)

implies that w ∈ H2(U)∩W 1,∞(U), proposition 6.2 implies that Qαnw exist and

converge to w uniformly on U .

With the current definitions, the assumptions of proposition 2.8 of chapter 5 are satisfied. Solet ε > 0 be small enough such that S(x, ε) ⊂ U and u∗ − w achieves a strict maximum at x overS(x, ε). Applying proposition 2.8 of chapter 5 shows there is a subsequence of hn, also denotedby hn ⊂ (0, 1] and xn ⊂ S(x, ε) such that

un(xn)−Qαnw(xn) = maxy∈S(x,ε)

un(y)−Qαnw(y), (7.9a)

limn→∞

un(xn)−Qαnw(xn) = u∗(x)− w(x) = 0, (7.9b)

limn→∞

xn = x. (7.9c)

For n sufficiently large, i.e. hn < ε, S(x, ε) contains interior nodes of the mesh T n. Sinceun − Qαnw is piecewise linear and reaches its extrema at nodes of the mesh, we conclude that xnis then an interior node of the mesh. Let vn be the re-normalised hat function associated with thenode xn.

Let µαn = un(xn)−Qαnw(xn). Equation (7.9a) implies that un−Qαn−µαn has a positive maximumat xn, which for n large, is an interior node of the mesh. Therefore the monotonicity property,proposition 6.4 implies that

〈Lαnun, vn〉 ≥ 〈Lαn (Qαnw + µαn) , vn〉.

From the definition of Qαnw, we find that

〈Lαnun, vn〉 − (fα, vn) ≥ (Lαw − fα, vn) + µαn (cα, vn) .

From the definition of the scheme,

〈Lαnun, vn〉 − (fα, vn) ≤ 0.

Assumption 3.1 and the fact that ‖vi‖L1(U) = 1 imply there exists C ≥ 0 such that

supα∈Λ|(cα, vi)| ≤ sup

α∈Λ‖cα‖C(U) ≤ C,

Since µαn → 0 by (7.9b), it therefore holds that

limn→∞

µαn (cα, vn) = 0.

From the arguments of the proof of lemma 6.1,

limn→∞

(Lαw, vn) = Lαw(x);

and since fα ∈ C0,γ(U),

limn→∞

(fα, vn) = fα(x).

Therefore, taking the limit in inequality (7.3),

Lαw(x)− fα(x) ≤ 0.

Since α was arbitrary, we conclude that

supα∈Λ

[Lαw(x)− fα(x)] ≤ 0,

thus showing that u∗ is a viscosity subsolution of (3.1a).

Page 94: Hamilton-Jacobi-Bellman Equations

86 7. FINITE ELEMENT METHODS

u∗ is a supersolution. Let w ∈ C2(U)

be such that u∗−w has a strict local minimum at x ∈ U ,

with u∗(x) = w(x). Let Lnw = Lhnw be defined as in lemma 6.1, i.e.

〈−∆Lnw, v〉 = (−∆w, v) for all v ∈ Vh,0;

γ∂U (Lnw) = In∂Uw.Under assumption 3.2, by lemma 6.1, Lnw converges to w uniformly on U . Let ε > 0 be such thatS(x, ε) ⊂ U and u∗−w has a strict minimum at x over S(x, ε). Then by proposition 2.8 of chapter5, there exists a subsequence of hn, also denoted hn, and xn ⊂ S(x, ε), such that

un(xn)− Lnw(xn) = miny∈S(x,ε)

un(y)− Lnw(y); (7.10a)

limn→∞

un(xn)− Lnw(xn) = u∗(x)− w(x) = 0; (7.10b)

limn→∞

xn = x. (7.10c)

For n sufficiently large, i.e. hn < ε, S(x, ε) contains interior nodes of the mesh T n. Since un−Lnwis piecewise linear over T n, we conclude that for n sufficiently large, xn is a node of the mesh T n.Let vn be the re-normalised hat function associated to the node xn.

Assumption 3.1 implies that for each xn, there exists αn ∈ Λ such that

〈Lαnn un, vn〉 − (fαn , vn) = 0.

Let µn = un(xn) − Lnw(xn). By (7.10a), un − Lnw − µn achieves a negative minimum at xnan interior node of the mesh. Therefore by the monotonicity property, proposition 6.4,

〈Lαnn un, vn〉 ≤ 〈Lαnn Lnw, vn〉+ µn (cαn , vn) .

So0 ≤ sup

α∈Λ[〈LαnLnw, vn〉 − (fα, vn) + µn (cα, vn)] . (7.11)

From the consistency property, lemma 6.1, in particular using (6.4),

limn→∞

supα∈Λ

[〈LαnLnw, vn〉 − (fα, vn)] = supα∈Λ

[Lαw(x)− fα(x)] . (7.12)

As before, by (7.10b) and by assumption 3.1,

limn→∞

µn (cα, vn) = 0 uniformly over Λ.

Therefore taking the limit in (7.11) gives

0 ≤ supα∈Λ

[Lαw(x)− fα(x)] ,

so u∗ is a viscosity supersolution of (3.1a)

Convergence to the viscosity solution. Since u∗ ∈ USC(U)

and u∗ ∈ LSC(U)

are respectivelya viscosity subsolution and supersolution, and from the assumption that u∗ = u∗ = g on ∂U , thestrong comparison property assumed in 3.3 implies that

supU

[u∗ − u∗] = sup∂U

[u∗ − u∗] = 0,

hence u∗ ≤ u∗ on U . By definition, u∗ ≥ u∗ on U , therefore u∗ = u∗ is a viscosity solution ofequation (3.1). Again by the comparison property, there is a unique viscosity solution u to (3.1).

For completeness, we now show that the uh tends uniformly to u. Let ε > 0. Since u ∈ C(U)

and U compact, u is uniformly continuous and there exists δ > 0 such that if x, y ∈ U with|x− y| < δ, then |u(x)− u(y)| < ε. By definition of the envelopes, for all x ∈ U , there existsδx ∈ (0, δ) such that

lim supn

supy∈B(x,δx)

un(y)− u(x) < ε; (7.13)

lim infn

infy∈B(x,δx)

un(y)− u(x) > ε. (7.14)

Therefore there exists Mx such that for all m ≥Mx, and y ∈ B (x, δx)

|um(y)− u(x)| < ε.

Page 95: Hamilton-Jacobi-Bellman Equations

8. FURTHER SUPPORTING RESULTS 87

Because U is compact, there exists an open coverB(xj , δxj

)Jj=1

of U . For anym ≥ maxMxj

Jj=1

and y ∈ U , there exists xj such that |y − xj | < δxj and thus

|un(y)− u(xj)| < ε.

By uniform continuity of u, using the fact that δxj < δ, for any y ∈ U , m ≥ maxMxj

Jj=1

,

|un(y)− u(y)| < 2ε.

Finally note that the sequence hn was arbitrary, so we conclude that

limh→0

uh = u uniformly on U ;

thus concluding the proof of theorem 5.3.

8. Further supporting results

The space C1([0, T ];C2

(U))

is defined to be the set of continuous functions v : [0, T ] 7→ C2(U)

that have a continuous extension v : (−δt, T + δt) 7→ C2(U), δt > 0 such that there exists a

continuous function ∂tv : (−δt, T + δt) 7→ C2(U)

that satisfies for every t ∈ (−δt, T + δt),

lims→0

1

|s|‖v(·, t+ s)− v(·, t)− s∂tv(·, t)‖C2(U) = 0.

The restriction of ∂tv to [0, T ] is denoted ∂tv. Let the norm on C1([0, T ];C2

(U))

be defined as

‖v‖C1([0,T ];C2(U)) = supt∈[0,T ]

[‖v(·, t)‖C2(U) + ‖∂tv(·, t)‖C2(U)

]. (8.1)

Other similar spaces are similarly defined, see [13, p. 280]. Under assumptions 2.1, 3.1 and 3.2, thefollowing hold.

Proposition 8.1. Let w ∈ C0([0, T ];C2

(U))

. Then for every t ∈ [0, T ] and h ∈ (0, 1], there existsa unique Qαhw (·, t) ∈ Vh solving

〈LαhQαhw(·, t), v〉 = 〈Lαw(·, t), v〉 for all v ∈ Vh,0; (8.2a)

γ∂U (Qαhw(·, t)) = Ih∂Uw(·, t); (8.2b)

and there exists a unique Lhw(·, t) solving

〈−∆Lhw(·, t), v〉 = (−∆w(·, t), v) for all v ∈ Vh,0; (8.3a)

γ∂U

(Lhw(·, t)

)= Ih∂Uw(·, t). (8.3b)

There exists C ≥ 0 independent of h and α such that

‖Qαhw‖C0([0,T ];Vh) ≤ C‖w‖C0([0,T ];C2(U)), (8.4)

where the norm on Vh is taken to be the supremum norm; and

‖Lhw‖C0([0,T ];Vh∩W 1,∞(U)) ≤ C‖w‖C0([0,T ];C2(U)), (8.5)

where the norm on Vh ∩W 1,∞(U) is taken to be the W 1,∞(U) norm.

For any w ∈ C0([0, T ];C2

(U))

, α ∈ Λ,

limh→0‖w −Qαhw‖C0([0,T ];C(U)) = 0 uniformly over Λ; (8.6)

and there exists h0 > 0 and C ≥ 0 independent of h such that for all h < h0,

‖w − Lhw‖C0([0,T ];W 1,∞(U)) ≤ Ch‖w‖C0([0,T ];C2(U)). (8.7)

Page 96: Hamilton-Jacobi-Bellman Equations

88 7. FINITE ELEMENT METHODS

Proof. Let w ∈ C0([0, T ];C2

(U))

. Then for every t ∈ [0, T ], since w(·, t) ∈ C2(U)

and

C2(U)→ H2(U)∩W 1,∞(U), proposition 6.2 implies that there exists a unique solution Qαhw(·, t)

to (8.2).

The main property used in the following is that the operators Lα and Lαh do not depend on t

and are linear. Thus for v, w ∈ C0([0, T ];C2

(U))

, s, t ∈ [0, T ], we have

Qαhv(·, t)−Qαhw(·, s) = Qαh (v(·, t)− w(·, s)) , (8.8)

where this is shown by simple verification that both sides solve (8.2) and by the above uniquenessproperty.

For every t ∈ [0, T ], the convergence result of proposition 6.2, namely inequality (6.9), impliesthat

‖Qαhw(·, t)‖C(U) ≤ ‖w(·, t)−Qαhw(·, t)‖C(U) + ‖w(·, t)‖C(U)

≤ C‖w(·, t)‖C2(U) ≤ C‖w‖C0([0,T ];C2(U)),(8.9)

where the constant C may be taken to be independent of h and α as a result of the fact thath ∈ (0, 1] and d(n, h)h remains bounded.

Using this previous inequality with w(·, t) replaced by w(·, t+ h)− w(·, t) and using (8.8) alsoshows that Qαhw : [0, T ] 7→ Vh is continuous as a consequence of the continuity of w. As a result,Qαhw ∈ C0 ([0, T ];Vh), thus showing (8.4).

Equation (8.6) follows from the convergence result of proprosition 6.2, inequality (6.9), followedby taking the supremum over [0, T ].

The proof for Lhw is similar and makes use of theorem 1.7 of appendix C, which is assumed tohold as stated in assumption 3.2, and also uses the fact that C2

(U)→W 2,∞(U).

Corollary 8.2. If w ∈ C1([0, T ];C2

(U))

, then Qαhw ∈ C1 ([0, T ];Vh), and

∂tQαhw = Qαh∂tw. (8.10)

Similarly, if w ∈ C1([0, T ];C2

(U))

, then Lhw ∈ C1([0, T ];Vh ∩W 1,∞(U)

)and

∂tLhw = Lh∂tw. (8.11)

In addition

limh→0‖w −Qαhw‖C1([0,T ];C(U)) = 0 uniformly over Λ; (8.12)

and there exists h0 > 0 and C ≥ 0 independent of h such that for all h < h0,

‖w − Lhw‖C1([0,T ];W 1,∞(U)) ≤ Ch‖w‖C1([0,T ];C2(U)) (8.13)

Proof. If w ∈ C1([0, T ];C2

(U))

, by proposition 8.1, Qαhw and Qαh∂tw exist and are unique.We show that Qαh∂tw is the derivative of Qαhw from first principles as follows. By equation (8.8),and inequality (8.9), after considering an extension of w and Qαhw to (−δt, T + δt), there is C ≥ 0independent of h and α such that for all s, |s| < δt,

1

|s|‖Qαhw(·, t+ s)−Qαhw(·, t)− sQαh∂tw(·, t)‖C(U) =

1

|s|‖Qαh (w(·, t+ s)− w(·, t)− s∂tw(·, t)) ‖C(U)

≤ C 1

|s|‖w(·, t+ s)− w(·, t)− s∂tw(·, t)‖C2(U).

(8.14)

Thus, using differentiability of w, we see that Qαhw is differentiable and ∂tQαhw = Qαh∂tw. It follows

from proposition 8.1 applied to ∂tQαhw that ∂tQ

αhw ∈ C0 ([0, T ];Vh) so Qαhw ∈ C1 ([0, T ];Vh).

Equation (8.12) follows from (8.6), again using the fact that Qαh∂tw = ∂tQαhw.

Similar arguments are used to show the analoguous properties of Lhw by making use of propo-sition 8.1.

Page 97: Hamilton-Jacobi-Bellman Equations

8. FURTHER SUPPORTING RESULTS 89

Lemma 8.3 (Convergence of finite differences). For every ε > 0 there is δ > 0 such that for allh+ ∆th < δ, α ∈ Λ, i ∈ 1, . . . , Nh and t ∈ [0, T −∆th],∣∣∣(∂tw(·, t), vi)−∆h

tQαhw(xi, t)

∣∣∣ ≤ ε, (8.15)

and

‖∂tw(·, t)−∆ht Lhw(·, t)‖W 1,∞(U) ≤ ε. (8.16)

Proof. By corollary 8.2, for every h ∈ (0, 1] and α ∈ Λ, there exists a unique Qαhw ∈C1 ([0, T ];Vh) solving (8.2) and Qαhw tends to w in C0

([0, T ];C

(U))

uniformly in α.

Let ε > 0. Since w ∈ C1([0, T ];C2

(U))

, there is a continuous extension of w, also denoted

w ∈ C1((−δt, T + δt);C2

(U))

, δt > 0. Let Qαhw be similarly extended.

By uniform continuity of w and ∂tw on, say, [−δt/2, T + δt/2], there is δ ∈ (0, δt/2) such that(x, t), (y, s) ∈ O with |x− y|+ |t− s| < δ implies

|∂tw(x, t)− ∂tw(y, s)| < ε,

So if h < δ/2, we have for all i ∈ 1, . . . , Nh, t ∈ [0, T ]

|∂tw(xi, t)− (∂tw(·, t), vi)| ≤ ε. (8.17)

Again by uniform continuity, we may take δ ∈ (0, δt/2) such that for all s ∈ [−δ, δ], t ∈ [0, T ],

‖∂tw(·, t+ s)− ∂tw(·, t)‖C2(U) ≤ ε. (8.18)

and such that

‖∂tQαhw − ∂tw‖C0([0,T ];C(U)) ≤ ε. (8.19)

Using inequality (8.14), the fundamental theorem of calculus and the fact that partial derivativespermute, we find by (8.18) that if ∆th < δ, then

‖∆htQ

αhw(·, t)− ∂tQαhw(·, t)‖C(U) ≤ C‖∆

htw(·, t)− ∂tw(·, t)‖C2(U)

≤ C supx∈U

∑|β|≤2

∣∣∣∆htD

βw(x, t)−Dβ∂tw(x, t)∣∣∣

≤ C supx∈U

∑|β|≤2

∣∣∣∆htD

βw(x, t)− ∂tDβw(x, t)∣∣∣

≤ C supx∈U

∑|β|≤2

sups∈[0,∆th]

∣∣∣∂tDβw(x, t+ s)− ∂tDβw(x, t)∣∣∣

≤ C supx∈U

∑|β|≤2

sups∈[0,∆th]

∣∣∣Dβ∂tw(x, t+ s)−Dβ∂tw(x, t)∣∣∣

≤ C sups∈[0,∆th]

‖∂tw(·, t+ s)− ∂tw(·, t)‖C2(U) ≤ Cε.

where the constant C is independent of h and α as shown by (8.14).

This shows that, for all ε > 0, after possibly redefining δ, there is δ > 0 such that for allh+ ∆th < δ, (8.17) and (8.19) hold and that for all t ∈ [0, T −∆th], i ∈ 1, . . . , Nh, α ∈ Λ,∣∣∣∆h

tQαhw(xi, t)− ∂tQαhw(xi, t)

∣∣∣ ≤ ε. (8.20)

We conclude by using the triangle inequality and inequalities (8.17), (8.19) and (8.20) that for allε > 0, there is δ > 0 such that for all h+ ∆th < δ, (x, t) ∈ O, i ∈ 1, . . . , Nh, t ∈ [0, T −∆th],∣∣∣(∂tw(·, t), vi)−∆h

tQαhw(xi, t)

∣∣∣ ≤ 3ε, (8.21)

which after redefining δ gives (8.15). The proof of (8.16) is again very similar.

Page 98: Hamilton-Jacobi-Bellman Equations

90 7. FINITE ELEMENT METHODS

Lemma 8.4. [29, p. 249]. For every h ∈ (0, 1] and ∆th > 0, α ∈ ΛNh, let G (α) ∈ L(Vh,0;RNh

)be defined by

(G(α)v)i = v(xi) + ∆th〈Lαih v, vi〉.

Then G (α) is represented by a non-singular M-matrix. Considering Vh,0 with the nodal basis

viNhi=1, the matrix representing the inverse of G(α), denoted (G(α))−1 satisfies ‖ (G(α))−1 ‖∞ ≤ 1.

Proof. G(α) is represented by a non-singular M-matrix as a result of corollary 6.5 and theorem

2.1 of appendix B. Since (G(α))−1 ≥ 0 in the entry-wise sense, we have

‖ (G(α))−1 ‖∞ = max1≤i≤Nh

Nh∑j=1

(G(α))−1ij = max

1≤i≤Nh

((G(α))−1 1

)i, (8.22)

where 1 is the vector with all entries equal to 1. Now, let v =∑N

j=1 vj , then for all i ∈ 1, . . . , Nh,

(G(α)v)i = 1 + ∆th (cαi , vi) ≥ 1.

Furthermore, by the monotonicity property of proposition 6.4, for all i ∈ 1, . . . , Nh,

(G(α)v)i =

G(α)

Nh∑j=1

vj

i

+

N∑j=Nh+1

〈Lαivj , vi〉︸ ︷︷ ︸≤0

;

so(G(α)

∑Nhj=1 vj

)i≥ 1 for all i ∈ 1, . . . , Nh. By applying the inverse of G(α) to both sides of

the inequality, using inverse monotonicity and the fact that vj is a nodal basis, we find that forall i ∈ 1, . . . , Nh,

1 =

Nh∑j=1

vj(xi) ≥(

(G(α))−1 1)i.

Therefore ‖ (G(α))−1 ‖∞ ≤ 1 by equation (8.22) and the previous inequality.

Proposition 8.5. Let w ∈ C1([0, T ];C2

(U))

. Under the above assumptions, for every h ∈ (0, 1]

and α ∈ Λ, there exists a unique Gαhw ∈ C0 (Sh;Vh) solving for every sk ∈ S+h

−∆htG

αhw(xi, sk) + 〈LαhGαhw(·, sk), vi〉 = (−∂tw(·, sk) + Lαw(·, sk), vi) ∀i ∈ 1, . . . , Nh ; (8.23a)

γ∂U (Gαhw(·, sk)) = Ih∂Uw(·, sk); (8.23b)

Gαhw(·, T ) = Ihw(·, T ). (8.23c)

Furthermore Gαhw tends uniformly to w on O, uniformly over Λ, in the sense that

limh+∆th→0

maxsk∈Sh

‖Gαhw(·, sk)− w(·, sk)‖C(U) = 0 uniformly over Λ. (8.24)

Proof. Existence and uniqueness follow from lemma 8.4 applied inductively for k = T∆th−

1, . . . , 0. By lemma 8.3, for every ε > 0 there is δ > 0 such that for all h + ∆th < δ, all i ∈1, . . . , Nh, t ∈ [0, T −∆th], ∣∣∣(∂tw(·, t), vi)−∆h

tQαhw(xi, t)

∣∣∣ ≤ ε.So from the definition of Qαhw and Gαhw, namely

−∆htG

αhw(xi, sk) + 〈LαhGαhw(·, sk), vi〉 = (−∂tw(·, sk) + Lαw(·, sk), vi)

and

〈LαhQαhw(·, t), v〉 = (Lαw(·, t), v) for all v ∈ Vh,0,we have for all sk ∈ S+

h∣∣∣∆ht (Gαhw(xi, sk)−Qαhw(xi, sk)) + 〈Lαh (Gαhw(·, sk)−Qαhw(·, sk)) , vi〉

∣∣∣ ≤ ε. (8.25)

Since

γ∂U (Qαhw(·, sk)) = Ih∂Uw(·, sk) = γ∂U (Gαhw(·, sk))

Page 99: Hamilton-Jacobi-Bellman Equations

9. PARABOLIC PROBLEM: PROOF OF MAIN RESULTS 91

it follows that for all sk ∈ S+h , Qαhw(·, sk)−Gαhw(·, sk) ∈ Vh,0. Therefore (8.25) may be written as

−ε∆th +Gαhw(xi, sk+1)−Qαhw(xi, sk+1) ≤ (Aαh [Gαhw(·, sk)−Qαhw(·, sk)])i ,(Aαh [Gαhw(·, sk)−Qαhw(·, sk)])i ≤ G

αhw(xi, sk+1)−Qαhw(xi, sk+1) + ε∆th,

where Aαh ∈ L(Vh,0;RNh

)is defined by

(Aαhvj)i = vj(xi) + ∆th〈Lαhvj , vi〉. (8.26)

By lemma 8.4, using inverse monotonicity of Aαh , and the fact that ‖ (Aαh)−1 ‖∞ ≤ 1, we have

maxi∈1,...,Nh

|Gαhw(xi, sk)−Qαhw(xi, sk)| ≤ maxi∈1,...,Nh

|Gαhw(xi, sk+1)−Qαhw(xi, sk+1)|+ ε∆th.

By induction and using the fact that the extreme values of Gαhw(·, sk)−Qαhw(·, sk) are necessarilyattained at the interior nodes because Gαhw(·, sk)−Qαhw(·, sk) ∈ Vh,0, we have

‖Qαhw −Gαhw‖L∞(Sh;C(U)) ≤ ‖Qαhw(·, T )−Gαhw(·, T )‖C(U) + Tε

≤ ‖Qαhw(·, T )− Ihw(·, T )‖C(U) + Tε.

It follows from uniform convergence of Qαhw to w on C1([0, T ];C

(U))

, uniformly over Λ and

uniform convergence of Ihw(·, T ) to w(·, T ) that Gαhw converges to w in the sense of equation(8.24), uniformly over Λ.

9. Parabolic problem: proof of main results

9.1. Proof of theorem 5.4. By lemma 8.4, the matrix representing G (α) is a non-singularM-matrix. Assumption 3.1 implies that the assumptions of lemma 2.5 and theorem 2.6 of chapter4 are met. Therefore, given uh(·, sk+1) ∈ Vh,0, there exists a unique uh(·, sk) ∈ Vh,0 solving

−∆ht uh(xi, sk) + sup

α∈Λ[〈Lαhuh(·, sk), vi〉 − (fα, vi)] = 0 ∀i ∈ 1, . . . , Nh .

Thus by induction, there exists a unique uh solution to the numerical scheme (5.8). To show thatuh ≥ 0 on U ×Sh, first note that by assumption 3.2, uT ≥ 0 on U . Therefore IhuT ≥ 0 on U . Nowsuppose that for sk ∈ S+

h , uh(·, sk+1) ≥ 0.

By compactness and continuity of Lα, assumption 3.1, there exists α ∈ ΛNh such that for alli ∈ 1, . . . , Nh,

(G(α)uh(·, sk))i = uh(xi, sk+1) + ∆th (fαi , vi) ≥ 0.

Thus by inverse monotonicity of G(α), uh(xi, sk) ≥ 0 for all i ∈ 1, . . . , Nh. Since the extrema of

uh is necessarily achieved at an interior node, this shows that uh(xi, sk) ≥ 0 on U .

Furthermore, by lemma 8.4, ‖ (G(α))−1 ‖∞ ≤ 1. So, after using Holder’s inequality,

maxi∈1,...,Nh

|uh(xi, sk)| ≤ maxi∈1,...,Nh

|uh(xi, sk+1)|+ ∆th supα∈Λ‖fα‖C(U)

Again, since the extrema of uh ∈ Vh,0 is necessarily achieved at an interior node, and assumption

3.1 implies that fαα∈Λ is bounded in C(U), we have by induction that for all sk ∈ Sh,

‖uh(·, sk)‖C(U) ≤ ‖IhuT ‖C(U) + T sup

α∈Λ‖fα‖C(U)

≤ ‖uT ‖C(U) + T supα∈Λ‖fα‖C(U).

This shows (5.9).

To prove (5.10), we use arguments similar to [16, p. 335]. By assumption 3.2, uT ∈ C(U). So

by the Tietze extension theorem, [24, p. 241], uT may be continuously extended to uT ∈ C (Rn).For ε > 0, let uεT ∈ C∞ (Rn) be the standard mollification of radius ε of uT ; uεT converges uniformly

to uT on U , see [15, p. 630].

So for all δ > 0, there exists ε0 > 0 such that for all ε < ε0, ‖uT − uεT ‖C(U) ≤ δ. For some

ε < ε0, letϕ = uεT + 3δ.

Page 100: Hamilton-Jacobi-Bellman Equations

92 7. FINITE ELEMENT METHODS

Then

uT + 2δ ≤ ϕ ≤ uT + 4δ on U.

For h ∈ (0, 1], let Lhϕ be defined as in lemma 6.1, i.e.

〈−∆Lhϕ, v〉 = (−∆ϕ, v) for all v ∈ Vh,0,

γ∂U

(Lhϕ

)= Ih∂Uϕ.

By lemma 6.1, Lhϕ converges uniformly to ϕ on U . So there exists h0 > 0 such that for all h < h0,

‖ϕ− Lhϕ‖C(U) ≤ δ, (9.1)

‖uT − IhuT ‖C(U) ≤ δ. (9.2)

As a result, for h < h0

IhuT ≤ Lhϕ ≤ uT + 5δ on U. (9.3)

By (6.3) of the consistency property, lemma 6.1, and by assumption 3.1, there exists K ≥ 0independent of h such that for all h < h0, i ∈ 1, . . . , Nh,

K ≥ supα∈Λ

∣∣∣〈LαhLhϕ, vi〉 − (fα, vi)∣∣∣ . (9.4)

Define

wh(x, t) = Lhϕ(x) +K(T − t).

We now show by induction that uh ≤ wh on U × Sh. By (9.3), uh(·, T ) = IhuT ≤ wh(·, T ).Now suppose that for sk ∈ S+

h , uh (·, sk+1) ≤ wh (·, sk+1).

For any α ∈ Λ, sk ∈ S+h , i ∈ 1, . . . , Nh,

−∆htwh(xi, sk) + 〈LαhLhw(·, sk), vi〉 = K + 〈LαhLhϕ, vi〉+K(T − sk) (cα, vi) ≥ (fα, vi) ,

because cα ≥ 0 by assumption 3.2 and sk ≤ T . As a result, we find that for any α ∈ Λ,

Aαh [wh(·, sk)− uh(·, sk)] ≥ wh(·, sk+1)− uh(·, sk) ≥ 0.

where Aαh is defined by (8.26). Since wh(·, sk) ≥ Lhϕ ≥ 0 on ∂U and uh ∈ Vh,0, inverse positivityof Aαh , lemma 8.4, implies that

wh(·, sk) ≥ uh(·, sk) on U.

Because K is independent of h and because uT ∈ C(U), for x ∈ U ,

lim sup(y,s)→(x,T )

h→0

uh(y, s) ≤ lim sup(y,s)→(x,T )

h→0

wh(y, s)

≤ uT (x) + 5δ.

Because δ > 0 was arbitrary, we conclude that

lim sup(y,s)→(x,T )

h→0

uh(y, s) ≤ uT (x). (9.5)

The proof for the other inequality, namely

lim inf(y,s)→(x,T )

h→0

uh(y, s) ≥ uT (x), (9.6)

is very similar, with the principal difference being that one constructs a smooth function ϕ lyingbelow uT , then set wh(x, t) = Lhϕ−K(T − t) for suitable K ≥ 0 independent of h. One deducesfrom assumption 3.1 and the definition of the scheme that there exists α ∈ ΛNh such that for eachi ∈ 1, . . . , Nh,

−∆ht [uh(xi, sk)− wh(xi, sk)] + 〈Laih [uh(·, sk)− wh(·, sk)] , vi〉 ≥ 0,

which is used in an induction argument to show that uh ≥ wh on U × Sh. This is then used toobtain (9.6). Equation (5.10) follows from (9.5) and (9.6). This completes the proof of theorem5.4.

Page 101: Hamilton-Jacobi-Bellman Equations

9. PARABOLIC PROBLEM: PROOF OF MAIN RESULTS 93

9.2. Proof of proposition 5.5. By theorem 5.4, uh ≥ 0 on U × Sh, so for all (x, t) ∈∂U × (0, T ),

lim inf(y,s)→(x,t)

h→0

uh(y, s) ≥ 0. (9.7)

Suppose there exists w ∈ C1([0, T ];C2

(U))

such that for some α ∈ Λ

−wt + Lαw ≥ fα on O;

w(·, T ) = uT on U ;

w = 0 on ∂U × (0, T ).

Let Gαhw be defined as in proposition 8.5, i.e. for all sk ∈ S+h ,

−∆htG

αhw(xi, sk) + 〈LαhGαhw(·, sk), vi〉 = (−∂tw(·, sk) + Lαw(·, sk), vi) ∀i ∈ 1, . . . , Nh ;

γ∂U (Gαhw(·, sk)) = Ih∂Uw(·, sk);

Gαhw(·, T ) = Ihw(·, T ).

By proposition 8.5, Gαhw ∈ C0 ([0, T ];Vh,0) converges uniformly to w on O in the sense that

limh+∆th→0

maxsk∈Sh

‖Gαhw(·, sk)− w(·, sk)‖C(U) = 0.

We show by induction that uh ≤ Gαhw on U × Sh. Firstly, uh(·, T ) = IhuT = Gαhw(·, T ). Now

suppose that for sk ∈ S+h , uh(·, sk+1) ≤ Gαhw(·, sk+1). Then from the hypothesis on w and the

definition of Gαhw, we have

−∆htG

αhw(xi, sk) + 〈LαhGαhw(·, sk), vi〉 ≥ (fα, vi) for all i ∈ 1, . . . , Nh .

From the definition of the scheme, it therefore follows that

Aαh [Gαhw (·, sk)− uh (·, sk)] ≥ Gαhw(·, sk+1)− uh(·, sk+1) ≥ 0,

where Aαh is defined by (8.26). Since Gαhw(·, sk)− uh(·, sk) ∈ Vh,0, it follows from lemma 8.4, usinginverse positivity of Aαh , that

Gαhw(·, sk) ≥ uh(·, sk),thus completing the induction.

It follows from uniform convergence of Gαhw to w that for all (x, t) ∈ ∂U × (0, T ),

lim sup(y,s)→(x,t)

h→0

uh(y, s) ≤ lim sup(y,s)→(x,t)

h→0

Gαhw(y, s) = 0. (9.8)

Thus inequalities (9.7) and (9.8) together imply that for all (x, t) ∈ ∂U × (0, T ),

lim(y,s)→(x,t)

h→0

uh(y, s) = 0,

which is (5.12).

9.3. Proof of theorem 5.6. Let hn ⊂ (0, 1] be a sequence tending to 0. As noted inremark 2.8, terms such as Lαhn will be abbreviated by Lαn, Sn = Shn , etc. For (x, t) ∈ O define

Sn(x, t; ε) = B (x, t; ε) ∩ U × Sn and define the upper and lower envelopes of un by

u∗(x, t) = limε→0

lim supn

sup un(y, s) | (y, s) ∈ Sn(x, t; ε) ; (9.9)

u∗(x, t) = limε→0

lim infn

inf un(y, s) | (y, s) ∈ Sn(x, t; ε) . (9.10)

The stability result, inequality (5.9) of theorem 5.4, together with proposition 2.5 of chapter 5imply that u∗ ∈ USC

(O)

and u∗ ∈ LSC(O). From the hypothesis that un tends to the boundary

data near the boundary, we have u∗ = u∗ on ∂O the parabolic boundary of O.

Page 102: Hamilton-Jacobi-Bellman Equations

94 7. FINITE ELEMENT METHODS

u∗ is a viscosity subsolution. Fix α ∈ Λ. Recalling proposition 3.6 of chapter 3, we may takethe set of test functions in the definition of viscosity solutions to be C∞

(O), or more generally

C1([0, T ];C2

(U))

.

Let w ∈ C1([0, T ];C2

(U))

be such that u∗ −w has a strict local maximum at (x, t) ∈ O, withu∗(x, t) = w(x, t).

Let Gαhw be defined as in proposition 8.5, i.e. for each sk ∈ S+h , Gαh solves

−∆htG

αhw(xi, sk) + 〈LαhGαhw(·, sk), vi〉 = (−∂tw(·, sk) + Lαw(·, sk), vi) ∀i ∈ 1, . . . , Nh ;

γ∂U (Gαhw(·, sk)) = Ih∂Uw(·, sk);

Gαhw(·, T ) = Ihw(·, T ).

By proposition 8.5, Gαnw = Gαhnw converges uniformly to w in the sense of (8.24).

Let δ > 0 be sufficiently small such that B(x, t; δ) ⊂ O and that u∗ − w has a strict maximum

at (x, t) over B(x, t; δ). By proposition 2.8 of chapter 5, there exists a subsequence of hn, alsodenoted hn and (xn, sn) with sn ∈ S+

n and xn ∈ U , such that

un (xn, sn)−Gαnw (xn, sn) = max un(y, s)−Gαnw(y, s) | (y, s) ∈ Sn(x, t; δ) ; (9.11a)

limn→∞

un (xn, sn)−Gαnw (xn, sn) = u∗(x, t)− w(x, t) = 0; (9.11b)

limn→∞

(xn, sn) = (x, t). (9.11c)

Since for all n ∈ N, un(·, sn)−Gαnw(·, sn) ∈ Vn is piecewise linear, and since (xn, sn)→ (x, t), for nsufficiently large, xn is a node of the mesh T n. Let vn be the re-normalised hat function associatedwith the node xn. Let µαn = un(·, sn)−Gαnw(·, sn). Then un −Gαnw − µαn has a positive maximumat (xn, sn). Also, for n sufficiently large, (xn, sn+1) ∈ Sn(x, t, δ), thus,

−∆nt un (xn, sn) ≥ −∆n

t Gαnw (xn, sn) ,

and by the monotonicity property, proposition 6.4,

〈Lαnun (·, sn) , vn〉 ≥ 〈LαnGαnw(·, sn), vn〉+ µαn (cα, vn) .

As a result, from the definition of the scheme and the definition of Gαhw

0 ≥ −∆nt un (xn, sn) + 〈Lαnun (·, sn) , vn〉 − (fα, vn)

≥ −∆nt G

αnw (xn, sn) + 〈LαnGαnw(·, sn), vn〉+ µαn (cα, vn)− (fα, vn)

≥ (−wt(·, sn) + Lαw(·, sn), vn)− (fα, vn) + µαn (cα, vn) .

By assumption 3.2, −wt, Lαw and fα are uniformly continuous over O and U respectively, andµαn → 0 by (9.11b), therefore taking the limit n→∞ yields

−wt(x, t) + Lαw(x, t)− fα(x) ≤ 0.

Since α ∈ Λ was arbitrary, we conclude that

− wt(x, t) + supα∈Λ

[Lαw(x, t)− fα(x)] ≤ 0, (9.12)

thus showing that u∗ is a viscosity subsolution of (3.2a).

u∗ is a supersolution. Recalling proposition 3.6 of chapter 3, we may take the set of testfunctions in the definition of viscosity solutions to be C∞

(O), or more generally C1

([0, T ];C2

(U))

.

Let w ∈ C1([0, T ];C2

(U))

be such that u∗ − w has a strict local minimum at (x, t) ∈ O,u∗(x, t) = w(x, t).

Let Lnw be defined as in proposition 8.1, i.e. for all t ∈ [0, T ],

〈−∆Lhw(·, t), v〉 = (−∆w(·, t), v) for all v ∈ Vh,0;

γ∂U

(Lhw(·, t)

)= Ih∂Uw(·, t).

By proposition 8.1, Lnw converges uniformly to w on O.

Page 103: Hamilton-Jacobi-Bellman Equations

9. PARABOLIC PROBLEM: PROOF OF MAIN RESULTS 95

Let δ > 0 be sufficiently small such that B (x, t; δ) ⊂ O and such that u∗ − w has a strict

minimum at (x, t) over B (x, t; δ). By proposition 2.8 of chapter 5, there exists a subsequence ofhn, similarly denoted hn, and (xn, sn), with (xn, sn) ∈ Sn(x, t; δ), such that

un (xn, sn)− Lnw (xn, sn) = min un(y, s)− Lnw(y, s) | (y, s) ∈ Sn(x, t; δ) ; (9.13a)

limn→∞

un (xn, sn)− Lnw (xn, sn) = u∗(x, t)− w(x, t) = 0; (9.13b)

limn→∞

(xn, sn) = (x, t). (9.13c)

For n sufficiently large, xn is an interior node of the mesh T n. Let vn be the re-normalised hatfunction associated with the node xn and let µn = un(xn, sn)−Lnw(xn, sn). Convergence of (xn, sn)to (x, t) implies that for n sufficiently large, (xn, sn+1) ∈ Sn(x, t; δ). Therefore un − Lnw − µn hasa negative minimum at (xn, sn), so

−∆nt un(xn, sn) ≤ −∆n

t Lnw(xn, sn)

and by the monotonicity property, proposition 6.4, for all α ∈ Λ,

〈Lαnun(·, sn), vn〉 ≤ 〈LαnLnw(·, sn), vn〉+ µn (cα, vn) .

The definition of the scheme then implies that

0 ≤ −∆nt Lnw(xn, sn) + sup

α∈Λ[〈LαnLnw(·, sn), vn〉 − (fα, vn) + µn (cα, vn)] . (9.14)

By (9.13b) and assumption 3.1,

limn→∞

µn (cα, vn) = 0 uniformly over Λ.

From lemma 6.1, in particular (6.4), and from the fact that w ∈ C1([0, T ];C2

(U))

, we concludethat

limn→∞

supα∈Λ

[〈LαnLnw(·, sn), vn〉 − (fα, vn) + µn (cα, vn)] = supα∈Λ

[Lαw(x, t)− fα(x)] .

Corollary 8.3, in particular (8.16), implies that

limn→∞

∆nt Lnw(xn, sn) = wt(x, t).

So taking the limit in (9.14) gives

0 ≤ −wt(x, t) + supα∈Λ

[Lαw(x, t)− fα(x)] ; (9.15)

thus showing that u∗ is a viscosity supersolution of (3.2a).

Convergence to the viscosity solution. Since u∗ ∈ USC(O)

and u∗ ∈ LSC(O)

are respectivelya viscosity subsolution and supersolution of (3.2a) and by hypothesis u∗ = u∗ on ∂O, the comparisonproperty of assumption 3.3 implies that

supU×(0,T ]

[u∗ − u∗] ≤ sup∂O

[u− v] = 0.

From the definition of u∗ and u∗, u∗ ≥ u∗ on O, therefore u∗ = u∗ on U × (0, T ] and u∗ = u∗ is the

unique viscosity solution of (3.2). As a consequence, the entire sequence un tends to u uniformlyon compact subsets U × (0, T ].

Page 104: Hamilton-Jacobi-Bellman Equations
Page 105: Hamilton-Jacobi-Bellman Equations

Conclusion

The Hamilton-Jacobi-Bellman equations treated in this work are fully non-linear degenerateelliptic or degenerate parabolic partial differential equations. The relevant notion of generalisedsolution is the notion of viscosity solutions. For Hamilton-Jacobi-Bellman equations related tooptimal control problems, the unique viscosity solution is the value function.

The Hamilton-Jacobi-Bellman equation is also related to Monge-Ampere equations since certaininstances of these equations are equivalent. It also forms part of the mean-field game equationswhich model the behaviour of large populations of agents optimising their strategies in a game.Given these links, the Hamilton-Jacobi-Bellman has applications in mathematics, science, finance,economics and engineering.

The viscosity solution of a Hamilton-Jacobi-Bellman equation can be found using monotonenumerical methods. A key part of the analysis of these methods is the Barles-Souganidis conver-gence argument. This work presented and analysed a new finite element method to find the valuefunction of a HJB equation. This work also showed how it is often possible to solve the equationsresulting from a numerical method with a superlinearly convergent algorithm.

97

Page 106: Hamilton-Jacobi-Bellman Equations
Page 107: Hamilton-Jacobi-Bellman Equations

APPENDIX A

Stochastic Differential Equations

This appendix quotes some basic results found in, e.g. [23] or [25] that are used to justifycertain arguments in chapters 1 and 2. For readers who are unfamiliar with the basics of measuretheory, we recommend reading the early chapters of [25] to obtain an intuitive understanding,followed by [24] and [23] for further details.

Remark (A point on notation). Random variables and stochastic processes are ultimately mapsand collections of maps defined on a set Ω which is the first component of a probability space(Ω,F ,P). However, it is common practice to only specify the dependence of random variables andprocesses on elements of Ω when necessary.

For instance it is common to denote a stochastic process x(t)0≤t≤T more succinctly as x(t),

not to be confused with x(t, ·) : Ω 7→ Rd. Expectation is denoted with the symbol E.

1. Basics

For the basic definitions related to stochastic process, see [23].

1.1. Brownian motion. Let T > 0. As a result of Kolmogorov’s extension theorem thereexists a probability space

(Ω,F ,P) (1.1)

and a stochastic process W (t)0≤t≤T , called Brownian motion, such that for any k ∈ N and any

collections Biki=1 Borel sets of Rd and tiki=1, ti < ti+1, we have

P (W (t1) ∈ B1, . . . ,W (tk) ∈ Bk) =

∫B1×···×Bk

p(t1, x, x1) . . . p(tk − tk−1, xk−1, xk)dx1 . . . dxk; (1.2)

where the transition density is

p(t, x, y) =1

(2πt)d2

e−|x−y|2

2t . (1.3)

In particular W (0) = x almost surely. As a result of Kolmogorov’s continuity theorem, we maytake W (t)0≤t≤T to have continuous paths.

For t ≥ 0, Ft is defined to be the smallest σ-algebra of Ω containing all sets of the form

ω |W (s1) ∈ B1, . . . ,W (sk) ∈ Bk ,

where k ∈ N, siki=1 ⊂ [0, t], Biki=1 Borel sets of Rd. We will call Ftt≥0 the family of σ-algebrasgenerated by Brownian motion. Let F∞ be the smallest σ-algebra containing⋃

t≥0

Ft. (1.4)

1.2. Ito integration.

Definition 1.1 (Adapted and progressively measurable processes). Let Ntt≥0 be an increasing

family of σ-algebras of Ω. A process g : [0, T ] × Ω 7→ Rn is called Nt adapted if for each t ≥ 0 therandom variable

ω 7→ g(t, ω)

is Nt measurable. See [25] or [23].

The process g is called Nt progressively measurable if for every s ≥ 0, the restriction g : [0, s]×Ω 7→ Rn is B[0,s] ×Ns measurable, where B[0,s] is the Borel σ-algebra of [0, s]. See [16, p. 403].

99

Page 108: Hamilton-Jacobi-Bellman Equations

100 A. STOCHASTIC DIFFERENTIAL EQUATIONS

The construction of the Ito integral is explained in [25] and [23]. Briefly said, the Ito integralis the L2 limit of a sequence of random variables constructed by approximations of the integrand.

In a first instance, for S ≤ T , the Ito integral over [S, T ] is defined for processes f : [0,∞)×Ω 7→R that are B × F measurable, F as in (1.1), Ft adapted and satisfy

E∫ T

S|f(t)|2 dt <∞.

In particular, [23, theorem 3.2.1 p. 30] shows that for a process satisfying these criteria,

ET∫S

f(t)dW (t) = 0. (1.5)

2. Stochastic differential equations

Let b : Rn× [0, T ] 7→ Rn and σ : Rn× [0, T ] 7→ Rn×d the set of n× d matrices. Let x be randomvariable.

We say that a stochastic process x(t)0≤t≤T solves a stochastic differential equation

dx(t) = b (x(t), t) dt+ σ (x(t), t) dW (t) for all t ∈ (0, T ], (2.1a)

x(0) = x; (2.1b)

if the following Ito integral equation holds almost surely:

x(t) = x(0) +

t∫0

b (x(s), s) ds+

t∫0

σ (x(s), s) dW (s) for all t ∈ (0, T ],

x(0) = x;

A solution is said to be unique if for two solutions x1(t)0≤t≤T and x2(t)0≤t≤T , then x1(t) =

x2(t) almost surely for all t ∈ [0, T ].

Recall that for a matrix A ∈ Rn×d, the vector norm |A| of A is defined as

|A| =

∑i,j

|Aij |2 1

2

.

Theorem 2.1 (Existence and uniqueness). [23, p. 68]. Let T > 0 and b : Rn × [0, T ] 7→ Rn andσ : Rn × [0, T ] 7→ Rn×d be measurable functions, for which there exists C ≥ 0 such that for allx, y ∈ Rn and t ∈ [0, T ],

|b(x, t)|+ |σ(x, t)| ≤ C (1 + |x|) ;

and

|b(x, t)− b(y, t)| ≤ C |x− y| ; (2.2a)

|σ(x, t)− σ(y, t)| ≤ C |x− y| . (2.2b)

Let x be a random variable which is independent of F∞ (see (1.4)), such that

E[x2]<∞.

Then the stochastic differential equation (2.1) has a unique solution x(t)0≤t≤T .

Furthermore x (·, ω) : [0, T ] 7→ Rn is continuous for almost all ω ∈ Ω.

The process x is adapted to the filtration Fxt 0≤t≤T , Fxt generated by x and W (s)0≤s≤t.

Of course the theorem is true for different choices of starting times, etc.

Remark 2.2. This existence and uniqueness result implies that quantities such as the cost func-tional in (2.5), chapter 1, are well defined. In consequence, the value function in definition 3.1,chapter 1, is also well defined.

Page 109: Hamilton-Jacobi-Bellman Equations

3. PROPERTIES OF DIFFUSION PROCESSES 101

3. The strong Markov property, generators and Dynkin’s formula

For t ∈ R, T > 0, and a stochastic process started at time t, satisfying a SDE

dx(s) = b (x(s), t+ s) ds+ σ (x(s), t) dW (s) for all s ∈ (0, T ],

x(t) = x;

it is helpful to rewrite the SDE in “time homogeneous” form by setting

y(s) = (x(s), t+ s) . (3.1)

Then y(s)0≤s≤T solves

dy(s) =

(b (y(s))

1

)ds+

(σ (y(s))

0

)dW (s). (3.2)

We now quote a number of results found in [23, chapter 7], which have been re-phrased for thisreformulated SDE.

Let x ∈ Rn be non-random, let t < T . Let the process y(s) = (x(s), t+ s) solve the SDE

dy(s) = b (y(s)) ds+ σ (y(s)) dW (s) for all s ∈ (0, T ]; (3.3)

y(0) = (x, t). (3.4)

Then y(s) = (x(s), s) is measurable with respect to Fs for all s ∈ [0, T ]. Let My be the σ-algebra generated by y(s)0≤s≤T . The measure P of (1.1) restricted to My is denoted Qy. Then

(Ω,My, Qy) is a probability space. Expectation with respect to this probability space is denotedEx,t.

Definition 3.1 (Stopping times). Let Nss≥0 be an increasing family of σ-algebras of subsets of

Ω. A function τ : Ω 7→ [0,∞] is called a strict stopping time with respect to Nts≥0 if

ω | τ(ω) ≤ s ∈ Ns for all s ≥ 0. (3.5)

Theorem 3.2 (Strong Markov property). [23, p. 117]. Let f be a bounded Borel measurablefunction on Rn × R and let τ a stopping time with respect to Fss≥0 such that τ < ∞ almostsurely. Then

Ex,t [f (y(τ + h)) |Fτ ] = Ex(τ),τ [f (y(h))] for all h ≥ 0. (3.6)

Informally, this theorem says that conditional expectations w.r.t a process started at (x, t) givenknowledge for τ further units of time is equivalent to the expectation w.r.t to the process if it werestarted at (x(τ), t+ τ). In other words, expectations of the future depend only on the current stateof the process. This result may be extended to further objects, such as integrals. See [23, p. 119].

The next two results are consequences of Ito’s formula.

Theorem 3.3 (Generators of diffusion processes). [23, p. 121]. If f ∈ C20 (Rn × R) the set of

compactly supported C2 functions on Rn × R, then

lims→0

Ex,t [f (y(s))]− f(x, t)

s=∂f

∂t(x, t)− Lf(x, t), (3.7)

where

Lf(x, t) = −1

2

n∑i,j=1

(σσT

)ij

(x, t)∂2f

∂xi∂xj(x, t)−

n∑i=1

bi(x, t)∂f

∂xi(x, t). (3.8)

Theorem 3.4 (Dynkin’s formula). [23, p. 124]. Let f ∈ C20 (Rn × R) and τ a stopping time such

that Ex,t [τ ] <∞. Then

E [f (y(τ))] = f(x, t) + Ex,tτ∫

0

(ft − Lf) (y(s))ds. (3.9)

The differentiability requirement can be weakened to smaller sets than Rn in some circum-stances, see ([23], chapter 11).

Page 110: Hamilton-Jacobi-Bellman Equations
Page 111: Hamilton-Jacobi-Bellman Equations

APPENDIX B

Matrix Analysis

In the following, let M (n,C) and M (n,R) be respectively the sets of C-valued and R-valuedn-by-n matrices.

1. Field of values

This section concerns the field of values. In particular, it will be used to obtain certain boundson the norms of the matrices that are used in chapter 4.

All results shown here are from [18, chapter 1]. However, since we wish to use only a limitedselection of results found in [18] and because the proofs are short, we have chosen to include themfor the reader’s convenience.

For A ∈M (n,C), the field of values, also sometimes called numerical range, is defined as

F (A) = x∗Ax |x ∈ Cn, ‖x‖2 = 1 . (1.1)

Theorem 1.1 (Toeplitz-Hausdorff). [18, p. 8]. For A ∈M (n,C), the field of values is a compactconvex subset of C.

Proposition 1.2 (Spectral Containment). For A ∈M (n,C), let σ(A) ⊂ C be the spectrum of A,i.e. the set of eigenvalues of A. Then

σ(A) ⊂ F(A).

One of the reasons for studying the field of values is that F(A+ B) ⊂ F(A) + F(B), whereasno such statement holds for the spectrum. Because of the spectral containment property, we willrelate the field of values to various properties of the matrix, such as positive definiteness and its2-norm.

Proposition 1.3. Let P be a unitary matrix, i.e. P ∗ = P−1, then for any A ∈M (n,C),

F (P ∗AP ) = F (A) .

Proof. If P is unitary, then for any v ∈ Cn with ‖v‖ = 1, then ‖Pv‖ = 1, so

v∗P ∗APv = (Pv)∗A (Pv) ∈ F (A) ,

thus showing F (P ∗AP ) ⊂ F (A). For every x with ‖x‖ = 1, there exists v with ‖v‖ = 1 such thatx = Pv, hence F (A) ⊂ F (P ∗AP ).

Proposition 1.4. Let A ∈M (n,C) be a normal matrix. Then

F (A) = Co (σ (A)) =

n∑i=1

ciλi |n∑i=1

ci = 1, ci ≥ 0, λi ∈ σ(A)

.

The set Co (σ (A)) is called the convex hull of σ(A).

Proof. If A is normal, then there exists P unitary such that P ∗AP = Λ, with Λ a diagonalmatrix with the elements of σ(A) as entries. Then by the previous proposition,

F(A) = F(Λ) =

n∑i=1

|xi|2 λi |n∑i=1

|xi|2 = 1, λi ∈ σ(A)

,

thus taking ci = |xi|2, one sees that this last set is the convex hull Co (σ(A)).

103

Page 112: Hamilton-Jacobi-Bellman Equations

104 B. MATRIX ANALYSIS

In particular, for any matrix A ∈ M (n,C), its Hermitian part (A+A∗) /2 and its skew-Hermitian part (A−A∗) /2 are normal matrices. Since

x∗1

2(A+A∗)x =

1

2(x∗Ax+ x∗A∗x) =

1

2

(x∗Ax+ x∗Ax

),

we see that F ((A+A∗) /2) = Rez | z ∈ F(A) and similarly, F ((A−A∗) /2) = Imz | z ∈ F(A).

Corollary 1.5. Let A ∈M (n,R). Then for every x ∈ Rn,

xTAx ≥ minz | z ∈ F

((A+AT

)/2)

= minλ |λ ∈ σ

((A+AT

)/2).

1.1. Numerical radius. Since F(A) is compact, define

r(A) := maxz∈F(A)

|z| (1.2)

The number r(A) is called the numerical radius of A.

Proposition 1.6. The numerical radius satisfies the following inequality.

r(A) ≤ 1

2(‖A‖1 + ‖A‖∞) . (1.3)

Proof. For simplicity, we prove the result with a constant worse by a factor of√

2 and referthe reader to [18, p. 33].

By Gerschgorin’s theorems, for B ∈ M (n,C) with entries bij , if λ ∈ σ(B) then there is i ∈1, . . . , n such that

|λ− bii| ≤n∑j 6=i|bij | ,

so the triangle inequality implies that

|λ| ≤n∑j=1

|bij | .

Therefore, by proposition 1.4, for A ∈M (n,C) with entries aij

maxz∈F(A)

|Rez| = maxz∈F((A+A∗)/2)

|z|

= max

|z| | z =

n∑i=1

ciλi, λi ∈ σ ((A+A∗)/2)

≤ 1

2

n∑i=1

|aij + aji| ≤1

2

n∑i=1

[|aij |+ |aji|] .

Recall that ‖A‖1 = max1≤i≤n∑n

i=1 |aji| and that ‖A‖∞ = max1≤i≤n∑n

i=1 |aij |. Hence

maxz∈F(A)

|Rez| ≤ 1

2(‖A‖1 + ‖A‖∞) .

Similarly,

maxz∈F(A)

|Imz| ≤ 1

2(‖A‖1 + ‖A‖∞) ,

so

r(A) ≤√

maxz∈F(A)

(Rez)2 + maxz∈F(A)

(Imz)2;

and hence

r(A) ≤ 1√2

(‖A‖1 + ‖A‖∞) . (1.4)

In fact a further analysis, detailed in [18], gives

r(A) ≤ 1

2(‖A‖1 + ‖A‖∞) . (1.5)

Page 113: Hamilton-Jacobi-Bellman Equations

2. M-MATRICES 105

One can show that r(·) satisfies

r(AB) ≤ 4r(A)r(B) for all A, B ∈M (n,C) .

Theorem 1.7. For any matrix A ∈M (n,C), we have

‖A‖2 ≤ ‖A‖1 + ‖A‖∞ (1.6)

Proof. By the spectral containment property, we have that ρ(A) ≤ r(A), where ρ(A) is thespectral radius of A. Furthermore F(A∗) = z | z ∈ F (A), so r(A∗) = r(A). Therefore

‖A‖2 =√ρ(A∗A) ≤

√r(A∗A)

≤√

4r(A∗)r(A) = 2√r(A)2 = 2r(A).

(1.7)

and therefore by equation (1.5),

‖A‖2 ≤ ‖A‖1 + ‖A‖∞. (1.8)

2. M-matrices

This section serves to quote two results characterising M-matrices. Define the set Zn ⊂M (n,R)by

Zn = A ∈M (n,R) | aij ≤ 0 for j 6= i . (2.1)

If A ∈ Zn is of the form sI −B, with B ≥ 0 in the sense that all entries of B are positive, withs ≥ ρ(B), then we say that A is a M-matrix.

If s > ρ(B), then A is non-singular, since

A−1 =1

s

(I − 1

sB

)−1

=

∞∑i=0

1

si+1Bi

and we furthermore may conclude that A−1 ≥ 0, i.e. all entries of A are positive, in which case onesays that A is inverse positive.

Theorem 2.1. [18, p. 114]. Let A ∈ Zn. The matrix A is a nonsingular M-matrix if and only if

• A+ αI is nonsingular for every α ≥ 0• A has all strictly positive diagonal elements, and there exists a positive diagonal matrixD = Diag(di) such that D−1AD is strictly diagonally dominant, i.e.

aii >n∑j 6=i|aij |

djdi

i ∈ 1, . . . , n . (2.2)

• A is inverse-positive: A−1 ≥ 0.

We will make use of the following proposition which gives an effective way of characterising anM-matrix. The proof given was rediscovered independently, as we did not find this result in [18]or [6].

Proposition 2.2. Let A ∈ Zn be an irreducible matrix, and suppose that A has all strictly positivediagonal elements, and that

aii ≥∑j 6=i|aij | i ∈ 1, . . . , n , (2.3)

and furthermore suppose that there exists k ∈ 1, . . . , n such that

akk >∑j 6=k|akj | . (2.4)

Then A is a nonsingular M-matrix.

Page 114: Hamilton-Jacobi-Bellman Equations

106 B. MATRIX ANALYSIS

Proof. We will prove the result by induction on the number p of rows for which

aii =∑j 6=i|aij | i ∈ 1, . . . , n .

For p = 0, the second equivalence in theorem 2.1 shows that A is a M-matrix. For p > 0, note thatby hypothesis p ≤ n− 1. Without loss of generality, we may assume that

aii >∑j 6=i|aij | for i > p,

since interchanging rows and columns leaves the set of irreducible matrices in Zn invariant.

Since A is irreducible, there is r, s ∈ 1, . . . , n such that ars 6= 0, and r ≤ p < s. If there werenot, then after permutation, A would be block upper triangular and hence reducible. Choose d ∈ Rsuch that

1 < d <ass∑

j 6=s |asj |Then define D = Diag (1, . . . , 1/d, . . . , 1) the diagonal matrix with entry 1/d is the s-th row. Thematrix D−1AD satisfies for i 6= s(

D−1AD)ii

= aii ≥n∑

j 6=i,s|aij |+ |ais|

1

d=∑j 6=i

∣∣∣(D−1AD)ij

∣∣∣ ;In particular, for the r-th row, since ars 6= 0,

(D−1AD)rr = arr >∑j 6=r

∣∣∣(D−1AD)rj

∣∣∣ .Furthermore

(D−1AD)ss = ass >∑j 6=s

d |asj | = d∑j 6=s

∣∣∣(D−1AD)sj

∣∣∣ ,and D−1AD is irreducible, with at most p− 1 rows for which

(D−1AD)ii =n∑j 6=i

∣∣∣(D−1AD)ij

∣∣∣ .By the induction hypothesis, D−1AD is a M-matrix, so there exists D a positive diagonal matrixfor which D−1D−1ADD is strictly row diagonally dominant. Therefore taking B = DD, we seethat there exists a strictly positive diagonal matrix B such that B−1AB is strictly row diagonallydominant. This completes the inductive step.

Page 115: Hamilton-Jacobi-Bellman Equations

APPENDIX C

Estimates for Finite Element Methods

1. Estimates for finite element methods

Let U be a polyhedral open bounded set in Rn. LetT h

0<h≤1be a family of meshes on U ,

and Vh be defined by (2.3) in chapter 7, Vh,0 = Vh ∩H10 (U).

Proposition 1.1 (Sobolev Embedding Theorem). [1, p. 292]. If k > n/p, then

W k,p(U) →→ Ck−

[np

]−1,β (

U)

where β ∈ [0, 1 +[np

]− n

p ).

We recall that for two normed linear spaces V,W , V →→ W means that V is compactlyembedded in W , i.e. V is continuously embedded in W and every bounded sequence in V has aconvergent subsequence in W . For x ∈ R, [x] denotes the integer part of x.

Proposition 1.2 (Bounds for the Interpolation Error). [8, p. 112]. LetT h

0<h≤1be a non-

degenerate family of meshes on U . Let (K,P,N ) be a reference element, K a n-simplex, P = P1(K)and N consisting of evaluation at the vertices of K. Let Vh be defined by (2.3) and the interpolantbe defined by definition 2.7.

Then there exists C ≥ 0 depending on the reference element (K,P,N ), n, m, p ∈ [1,∞] andρ = infh infT∈T h γT

1, γT the chunkiness parameter of T , such that for 0 ≤ s ≤ m

‖v − Ihv‖W s,p(U) ≤ Chm−s |v|Wm,p(U) for all v ∈Wm,p(U).

Proposition 1.3 (Estimates for Inhomogeneous Dirichlet Problems). [13, p. 125]. Let f ∈ L2(U)and g ∈ C0,1 (∂U). Let a : H1(U) × H1(U) 7→ R be a bilinear form, coercive on H1

0 (U), withcoercivity constant c0. If there exists ug ∈ H1(U)∩C

(U)

such that γ∂U (ug) = g, then there existsa unique solution to

a (uh, v) = (f, v) ∀ v ∈ Vh,0; (1.1a)

γ∂U (uh) = Ih∂Ug. (1.1b)

If furthermore there exists u ∈ H1(U) sufficiently smooth for Ihu to be well defined, that solves

a (u, v) = (f, v) ∀ v ∈ H10 (U); (1.2a)

γ∂U (u) = g, (1.2b)

then setting ‖a‖ = ‖a‖H1(U)×H1(U),

‖u− uh‖H1(U) ≤(

1 +‖a‖c0

)‖u− Ihu‖H1(U).

Proposition 1.4 (Discrete Poincare Inequality). [13, p. 77] and [8, p. 123]. LetT h

0<h≤1be a

quasi-uniform family of meshes on U ⊂ Rn. Then there exists C ≥ 0 independent of h such thatfor all v ∈ Vh

‖v‖L∞(U) ≤

C‖v‖H1(U) n = 1;

C (1 + |log h|) ‖v‖H1(U) n = 2;

Ch−1/2‖v‖H1(U) n = 3.

The case n = 1 follows from the standard Poincare inequality.

1ρ > 0 by non-degeneracy ofT h

0<h≤1

.

107

Page 116: Hamilton-Jacobi-Bellman Equations

108 C. ESTIMATES FOR FINITE ELEMENT METHODS

The following is a particular case of the first Strang lemma.

Proposition 1.5 (First Strang Lemma ). [13, p. 95]. Let W be a a Banach space and Z be areflexive Banach space and let Wh ⊂ W and Zh ⊂ Z be finite dimensional subspaces, dimWh =dimZh. Let a ∈ L (W × Z;R) and f ∈ V ∗. Let ah be a bilinear form bounded on W × Zh, withnorm ‖a‖, such that there exists αh > 0 such that

infw∈Wh

supv∈Zh

ah(w, v)

‖w‖W ‖v‖Z≥ αh.

If u solvesa (u, v) = 〈f, v〉 for all v ∈ Z

and if uh solvesah (uh, vh) = 〈f, vh〉 for all vh ∈ Zh,

then the following error estimate holds

‖u− uh‖W ≤ infwh∈Wh

[(1 +‖a‖αh

)‖u− wh‖W +

1

αhsupvh∈Zh

|a (wh, vh)− ah (wh, vh)|‖vh‖Z

]. (1.3)

By adapting the proof of the first Strang lemma to the non-homogeneous Dirichlet problem,one can show the following.

Lemma 1.6. With the setting and hypotheses of proposition 1.3, suppose that

ah ∈ L(H1(U)× Vh,0;R

)satisfies the assumptions of proposition 1.5, with αh = c0 for all h ∈ (0, 1]. If uh ∈ Vh solves

ah (uh, vh) = (f, vh) for all vh ∈ Vh,0, (1.4a)

γ∂U (uh) = Ih∂Ug; (1.4b)

then

‖u− uh‖H1(U) ≤(

1 +‖a‖c0

)‖u− Ihu‖H1(U) +

1

c0sup

vh∈Vh,0

∣∣a (Ihu, vh)− ah (Ihu, vh)∣∣‖vh‖H1(U)

.

As previously, let U ⊂ Rn be a bounded polyhedral open set, n ∈ 1, 2, 3. The following resultis quoted from [8, p. 217], in a form applied to the Poisson problem in the setting of the finiteelements of chapter 7.

Theorem 1.7 (Max-norm estimates). LetT hh∈(0,1]

be a quasi-uniform family of meshes, and

let Vh be defined as in chapter 7. Let a(·, ·) : H1(U)×H10 (U) 7→ R be defined by

a(u, v) =

∫UDu(x) ·Dv(x)dx.

Suppose there exists µ > n and C ≥ 0 such that for all p ∈ (1, µ), for every f ∈ Lp(U), there existsa unique u ∈W 2,p(U) solution to

a(u, v) = (f, v) for all v ∈ H10 (U),

such that‖u‖W 2,p(U) ≤ C‖f‖Lp(U).

Then there exists h0 > 0 and C <∞ such that for all h < h0

‖uh‖W 1,∞(U) ≤ C‖u‖W 1,∞(U),

and if furthermore u ∈W 2,∞(U), then for all h < h0.

‖u− uh‖W 1,∞(U) ≤ Ch‖u‖W 2,∞(U).

Page 117: Hamilton-Jacobi-Bellman Equations

APPENDIX D

Matlab Code for the Kushner-Dupuis Method

function [ u]=KDF(N, theta )

%Iain Smears , February 2011

%C a l c u l a t e s the s o l u t i o n u ( x , 0 ) to the Hamilton Jacobi Equation

% − u t+abs ( u x)−1=0

%with D i r i c h l e t data on (−1 ,1) , t imes (0 ,1)

%I m p l i c i t t h e t a method− Kushner−Dupuis Scheme with Newton I t e r a t i o n

%Use N an odd i n t e g e r so t h a t x=0 i s a node o f the mesh

%% Setup

%N i s # o f s p a t i a l DOF

dx=2/(N+1); M=N; dt=1/M; u=zeros (N, 1 ) ;

%Matrix assembly

d=zeros (1 ,2∗N−1); d ( 1 :N)=1:N; d(N+1:2∗N−1)=1:N−1;

up=zeros (1 ,2∗N−1); up ( 1 :N)=1:N; lw=up ; up(N+1:2∗N−1)=2:N;

s=ones (1 ,2∗N−1); s (N+1:2∗N−1)=−1.∗ s (N+1:2∗N−1);R=sparse (d , up , s ) ;

d (N+1:2∗N−1)=2:N; lw (N+1:2∗N−1)=1:N−1;L=sparse (d , lw , s ) ; I=sparse ( 1 :N, 1 :N, 1 ) ;

A1=I+theta .∗ dt . / dx∗R; A2=I+theta .∗ dt . / dx∗L ;

%% Computation wi th Semismooth Newton S o l v e r

%Tota l number o f i t e r a t i o n s per t i m e s t e p permi t t ed

i t =50;

t ic

for i =1:M

%Compute RHS

d1=(I−(1−theta ) . ∗ dt . / dx .∗R)∗u+dt ; d2=(I−(1−theta ) . ∗ dt . / dx .∗L)∗u+dt ;

%Tolerance f o r Newton I t e r a t i o n

eps=1e−10;

%Begin Newton I t e r a t i o n

for j =1: i t

%Construct I t e r a t i o n Matrix f o r Semi−smooth Newton

b=find ( A1∗u−d1 > A2∗u−d2 ) ; G=A2 ; d=d2 ; G(b , 1 :N)=A1(b , 1 :N) ;

d(b)=d1 (b ) ;

%Perform Newton Step

u=G\d ;

%convergence c r i t e r i o n

i f max(abs (max(A1∗u−d1 , A2∗u−d2)))<eps

break ;

end

i f j==i t

disp ( ’ I t e r a t i o n s did not converge ’ ) ;

end

end

end

toc

end

109

Page 118: Hamilton-Jacobi-Bellman Equations
Page 119: Hamilton-Jacobi-Bellman Equations

Bibliography

[1] Kendall Atkinson and Weimin Han. Theoretical numerical analysis, volume 39 of Texts in Applied Mathematics.Springer, Dordrecht, third edition, 2009. A functional analysis framework.

[2] M. Avellaneda, A. Levy, and A. Paras. Pricing and hedging derivative securities in markets with uncertainvolatilities. Applied Mathematical Finance, 2(2):73–88, 1995.

[3] Guy Barles and Espen Jakobsen. Error bounds for monotone approximation schemes for Hamilton-Jacobi-Bellman equations. SIAM J. Numer. Anal., 43(2):540–558 (electronic), 2005.

[4] Guy Barles and Espen Jakobsen. Error bounds for monotone approximation schemes for parabolic Hamilton-Jacobi-Bellman equations. Math. Comp., 76(260):1861–1893 (electronic), 2007.

[5] Guy Barles and Panagiotis Souganidis. Convergence of approximation schemes for fully nonlinear second orderequations. Asymptotic Anal., 4(3):271–283, 1991.

[6] Abraham Berman and Robert J. Plemmons. Nonnegative matrices in the mathematical sciences, volume 9 ofClassics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA,1994. Revised reprint of the 1979 original.

[7] Olivier Bokanowski, Stefania Maroso, and Hasnaa Zidani. Some convergence results for Howard’s algorithm.SIAM J. Numer. Anal., 47(4):3001–3026, 2009.

[8] Susanne C. Brenner and L. Ridgway Scott. The mathematical theory of finite element methods, volume 15 ofTexts in Applied Mathematics. Springer, New York, third edition, 2008.

[9] Erik Burman and Alexandre Ern. Nonlinear diffusion and discrete maximum principle for stabilized Galerkin ap-proximations of the convection–diffusion-reaction equation. Comput. Methods Appl. Mech. Engrg., 191(35):3833–3855, 2002.

[10] Erik Burman and Alexandre Ern. Stabilized Galerkin approximation of convection-diffusion-reaction equations:discrete maximum principle and convergence. Math. Comp., 74(252):1637–1652 (electronic), 2005.

[11] Xiaojun Chen, Zuhair Nashed, and Liqun Qi. Smoothing methods and semismooth methods for nondifferentiableoperator equations. SIAM J. Numer. Anal., 38(4):1200–1216 (electronic), 2000.

[12] Michael G. Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User’s guide to viscosity solutions of second orderpartial differential equations. Bull. Amer. Math. Soc. (N.S.), 27(1):1–67, 1992.

[13] Alexandre Ern and Jean-Luc Guermond. Theory and practice of finite elements, volume 159 of Applied Mathe-matical Sciences. Springer-Verlag, New York, 2004.

[14] Lawrence C. Evans. An introduction to mathematical optimal control theory. Freely available online at http:

//math.berkeley.edu/~evans/. Version 0.2.[15] Lawrence C. Evans. Partial differential equations, volume 19 of Graduate Studies in Mathematics. American

Mathematical Society, Providence, RI, 1998.[16] Wendell H. Fleming and H. Mete Soner. Controlled Markov processes and viscosity solutions, volume 25 of

Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2006.[17] M. Hintermuller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method.

SIAM Journal on Optimization, 13:865, 2002.[18] Roger A. Horn and Charles R. Johnson. Topics in matrix analysis. Cambridge University Press, Cambridge,

1991.[19] N. V. Krylov. Nonlinear elliptic and parabolic equations of the second order, volume 7 of Mathematics and its

Applications (Soviet Series). D. Reidel Publishing Co., Dordrecht, 1987. Translated from the Russian by P. L.Buzytsky [P. L. Buzytskiı].

[20] Jean-Michel Lasry and Pierre-Louis Lions. Jeux a champ moyen. i - le cas stationnaire. Comptes Rendus Math-ematique, 343(9):619 – 625, 2006.

[21] Jean-Michel Lasry and Pierre-Louis Lions. Jeux a champ moyen. ii - horizon fini et controle optimal. ComptesRendus Mathematique, 343(10):679 – 684, 2006.

[22] Jean-Michel Lasry and Pierre-Louis Lions. Mean field games. Jpn. J. Math., 2(1):229–260, 2007.[23] Bernt Øksendal. Stochastic differential equations. Universitext. Springer-Verlag, Berlin, sixth edition, 2003. An

introduction with applications.[24] Halsey Royden and Patrick Fitzpatrick. Real Analysis. Prentice Hall, Boston, fourth edition, 2010.[25] Steven Shreve. Stochastic calculus for finance. II. Springer Finance. Springer-Verlag, New York, 2004.

Continuous-time models.[26] Iain Smears. Existence and uniqueness of solutions of discretised Hamilton-Jacobi-Bellman equations with no

maximum principle: an application to the uncertain volatility model in finance. Unpublished EPSRC vacationproject report, September 2009.

[27] Iain Smears. The convergence rate of a semi-smooth Newton method for discretised Hamilton-Jacobi-Bellmanequations. Unpublished note, March 2010.

111

Page 120: Hamilton-Jacobi-Bellman Equations

112 BIBLIOGRAPHY

[28] Endre Suli and David F. Mayers. An introduction to numerical analysis. Cambridge University Press, Cambridge,2003.

[29] Vidar Thomee. Galerkin finite element methods for parabolic problems, volume 25 of Springer Series in Compu-tational Mathematics. Springer-Verlag, Berlin, 1997.

[30] Eberhard Zeidler. Nonlinear functional analysis and its applications. I. Springer-Verlag, New York, 1986. Fixed-point theorems, Translated from the German by Peter R. Wadsack.