Forward and adjoint sensitivity analysis with continuous explicit Runge–Kutta schemes



Applied Mathematics and Computation 208 (2009) 328–346



Mihai Alexe*, Adrian Sandu
Computational Science Laboratory, Department of Computer Science, Virginia Polytechnic Institute and State University, 2202 Kraft Drive, Blacksburg, VA 24060, USA


Keywords: Sensitivity analysis; Dense output; Runge–Kutta pairs; Tangent linear models; Adjoint models; Automatic differentiation

doi:10.1016/j.amc.2008.11.035

This work has been supported by the National Science Foundation through the award NSF CCF-0635194.
* Corresponding author. E-mail addresses: [email protected] (M. Alexe), [email protected] (A. Sandu). URL: http://csl.cs.vt.edu (A. Sandu).

Abstract. We study the numerical solution of tangent linear, first and second order adjoint models with high-order explicit, continuous Runge–Kutta pairs. The approaches currently implemented in popular packages such as SUNDIALS or DASPKADJOINT are based on linear multistep methods. For adaptive time integration of nonlinear models, interpolation of the forward model solution is required during the adjoint model simulation. We propose to use the dense output mechanism built in the continuous Runge–Kutta schemes as a highly accurate and cost-efficient interpolation method in the inverse problem run. We implement our approach in a Fortran library called DENSERKS, which is found to compare well to other similar software on a number of test problems.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Sensitivity analysis is a research area attracting considerable attention due to the wide range of applicability of its results. The objective of sensitivity analysis is to obtain qualitative and quantitative information about the relationship between changes in the inputs or parameters of a given system and the corresponding changes in the outputs of that system. Parameter identification [1], chemical kinetics [2], data assimilation [3], optimal control [4], ocean and atmosphere dynamics [5,6], and design optimization [7] are several of the areas where sensitivity information is essential. In this paper, we focus on time-dependent models described by ordinary differential equations (ODEs). The sensitivity analysis framework under consideration comprises continuous forward, tangent linear, and adjoint models of Runge–Kutta methods. While the discrete adjoints of explicit Runge–Kutta methods are consistent with the continuous adjoint equations [8–10], and automatic differentiation (AD) tools such as TAMC [11], TAF [12], and TAPENADE [13] considerably speed up the adjoint code generation, the code obtained through AD is frequently sub-optimal. Moreover, hand coded modifications of adaptive discrete adjoint codes are normally required in order to guarantee their correctness [14,15].

1.1. Manuscript details

Our paper focuses on the solution of sensitivity ODEs with explicit, continuous Runge–Kutta schemes. Adjoints of nonlinear models depend on the original model trajectory. Since adjoint models are integrated backward in time, they will require an approximation of the forward model solution at time points that cannot be known a priori. Some form of interpolation is necessary to provide these approximations. Existing software uses Hermite polynomial interpolation of up to fifth order. This is insufficient if one needs a higher order of accuracy for the adjoint solution. However, continuous extensions of Runge–Kutta schemes [16–18] have built in interpolants that are as accurate as the numerical schemes themselves. We propose to use this dense output mechanism to interpolate the original model solution. This approach is found to lead to a highly accurate adjoint solution, with only a small increase in the computational cost of the forward integrator.

Our Fortran library, DENSERKS, implements several well-known continuous Runge–Kutta pairs for forward and adjoint sensitivity analysis. A two-level file checkpointing mechanism is used to minimize the solver memory requirements. This makes long term adjoint simulations feasible. Also, unlike other publicly available software, DENSERKS can directly be used for second order adjoint sensitivity analysis. Second order adjoint variables provide more accurate sensitivity information. They are computed most efficiently through forward-over-adjoint differentiation of the original model code [19]. A numerical experiment in Section 6 discusses the computational and numerical advantages of the second order adjoint approach over the finite difference method.

1.2. Related work

A significant effort has been invested in the development of efficient sensitivity solvers, and currently there are several high quality, publicly available implementations. SUNDIALS [20] is a suite of ODE solvers with forward and first order adjoint sensitivity analysis capabilities. The CVODES package [21], part of SUNDIALS, is a collection of sensitivity-enabled ODE solvers. CVODES users can choose between backward differentiation schemes or Adams–Moulton methods for forward, tangent linear and adjoint model integrations. Either cubic Hermite interpolation or variable order polynomial interpolation is used for approximating the forward model solution during the adjoint problem run [22].

Cao, Li and Petzold designed and implemented software for both the forward and adjoint sensitivity analysis of differential-algebraic equations (DAEs) with index up to 2 [23,24]. Both their DASPK and DASPKADJOINT [24,25] packages use variable order backward differentiation formulas to solve the DAE sensitivity systems. Sandu et al. [26] discuss the implementation of implicit Runge–Kutta and Rosenbrock methods and their discrete and continuous adjoints. Third order Hermite interpolation is used to approximate the original model solutions at the points required in the continuous adjoint model solvers. Their code is integrated into the Kinetic PreProcessor (KPP) software for solving chemical kinetics [2,27,28].

1.3. Organization

The rest of this paper is organized as follows. Section 2 contains mathematical background on Runge–Kutta processes and dense output schemes, as well as details on the tangent linear and adjoint models employed in sensitivity analysis. Section 3 discusses the implementation of continuous Runge–Kutta schemes in DENSERKS. Section 4 gives a high-level comparison with related sensitivity analysis software. Information about the availability of our software can be found in Section 5. In Section 6, we test our implementations on a set of selected problems and report the results. We conclude with a summary in Section 7. The appendix contains information on the use of AD with DENSERKS.

2. Theoretical background

2.1. Runge–Kutta schemes

Consider a system modeled by the following initial value problem, henceforth referred to as the forward model:

$$\dot{y} = F(t, y(t,p), p), \quad t_0 \le t \le t_F, \quad y(t_0) = y_0(p), \qquad (2.1)$$

where $y(t,p) \in \mathbb{R}^{n_y}$ is the state vector, $p \in \mathbb{R}^{n_p}$ denotes a vector of system parameters, and $F : \mathbb{R}^{1+n_y+n_p} \to \mathbb{R}^{n_y}$ is a prescribed function. We assume that (2.1) has a unique solution y = y(t), and that F is at least twice continuously differentiable with respect to both y and p, for all $t_0 \le t \le t_F$.

Runge–Kutta schemes are a well-known class of numerical methods for solving initial value problems (IVPs) (2.1). This paper focuses on embedded explicit Runge–Kutta pairs. Let $y^n \approx y(t_n)$ be the numerical solution at time $t_n$. An s-stage embedded explicit Runge–Kutta pair computes two approximations to $y(t_{n+1})$ using the formulas:

$$y^{n+1} = y^n + h_n \sum_{j=1}^{s} b_j\, k_j^n, \qquad \hat{y}^{n+1} = y^n + h_n \sum_{j=1}^{s} \hat{b}_j\, k_j^n, \qquad (2.2)$$

where

$$k_j^n = F\left(t_n + h_n c_j,\; y^n + h_n \sum_{i=1}^{j-1} a_{ij}\, k_i^n,\; p\right) \qquad (2.3)$$

and $t_{n+1} = t_n + h_n$.


Let q be the order of accuracy of the Runge–Kutta pair (2.2). For a sufficiently smooth solution $y(t,p) \in C^{q+1}(t)$, we expect the numerical solution $y(t, y^n)$ starting from the point $(t_n, y^n)$ to have the same accuracy as the numerical method over a single time step of length $h_n$, i.e.

$$y^{n+1} - y(t_n + h_n;\, y^n) = \mathcal{O}\big(h_n^{q+1}\big). \qquad (2.4)$$

Both solution approximations $y^{n+1}$ and $\hat{y}^{n+1}$ use the same function values (stages) $k_j^n$. However, they have different orders of accuracy, depending on the particular choice of method coefficients $a_{ij}$, $b_j$, $\hat{b}_j$ and $c_j$. Here $y^{n+1}$ is accurate of order q and is used to continue the time integration. The second approximation, $\hat{y}^{n+1}$, has order of accuracy less than q and helps to estimate the local truncation error of the scheme as

$$e^{n+1} = y^{n+1} - \hat{y}^{n+1}. \qquad (2.5)$$

This error estimate is used to automatically control the integration time step size. Hairer et al. give further details on the error control mechanism for Runge–Kutta integrators in Section II.4 of [17].
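To make the mechanism concrete, the following minimal Fortran sketch (ours, not part of DENSERKS) advances one adaptive step of the embedded Heun(2)/Euler(1) pair: the difference between the two approximations plays the role of (2.5), and the step size is updated with an elementary controller. The subroutine name, the safety factor 0.9, and the error floor are our own choices.

SUBROUTINE HEUN_EULER_STEP (NX, T, H, Y, TOL, ACCEPTED, F)
  !! Minimal sketch (not DENSERKS code): one adaptive step of the embedded
  !! Heun(2)/Euler(1) pair. The error estimate corresponds to (2.5); the
  !! controller uses the exponent 1/(qhat+1) with the embedded order qhat = 1.
  INTEGER :: NX
  DOUBLE PRECISION :: T, H, Y(NX), TOL
  LOGICAL :: ACCEPTED
  EXTERNAL :: F                               !! F(NX,T,Y,K) returns the ODE right hand side
  DOUBLE PRECISION :: K1(NX), K2(NX), Y2(NX), ERR
  CALL F (NX, T, Y, K1)                       !! stage 1
  CALL F (NX, T + H, Y + H*K1, K2)            !! stage 2
  Y2  = Y + 0.5D0*H*(K1 + K2)                 !! second order (Heun) solution
  ERR = MAXVAL(ABS(Y2 - (Y + H*K1)))          !! difference to the embedded Euler solution
  ACCEPTED = (ERR <= TOL)
  IF (ACCEPTED) THEN
     T = T + H
     Y = Y2                                   !! continue with the higher order solution
  END IF
  H = 0.9D0*H*(TOL/MAX(ERR, 1D-14))**0.5D0    !! elementary step size controller
END SUBROUTINE HEUN_EULER_STEP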

2.2. Dense output for Runge–Kutta pairs

In many applications one needs the approximate solution at certain prescribed time points; it is highly inefficient to force the Runge–Kutta routine to compute those approximations, since this would impose unnecessary constraints on the size of the integration time steps. This situation motivated the construction of dense output formulas [17]. Given a numerical solution $y^n$, the dense output formulas yield numerical approximations to the exact solution $y(t_n + \theta h_n)$, where $\theta$ usually satisfies $0 \le \theta \le 1$. Extrapolations with $\theta > 1$ are possible. However, in this case the approximations are often less accurate, and will not be considered further.

For most s-stage high-order Runge–Kutta schemes, one needs to append $s^* - s$ extra stages in order to accommodate the dense output interpolant. Thus, the performance penalty incurred by the use of dense output formulas is equal to the cost of a few extra function evaluations per time step.

Following Hairer et al. [17], we consider the time interval $[t_n, t_{n+1}]$ (with $t_n$ assumed to be sufficiently far from the initial time $t_0$), and denote by $z(t)$ the solution of the local IVP

$$\dot{z} = F(t, z, p), \quad t_n \le t \le t_{n+1}, \qquad z(t_n) = y^n. \qquad (2.6)$$

A $q^*$th order dense output formula for (2.2) has the form

$$u(\theta) = y^n + h_n \sum_{j=1}^{s^*} \tilde{b}_j(\theta)\, k_j^n, \qquad (2.7)$$

where the $k_j^n$ are defined in (2.3). The coefficients of the interpolation weights $\tilde{b}_j(\theta)$, and the additional stage coefficients $a_{ij}$ (with $i > s$), are determined such that

$$u(\theta) - z(t_n + h_n \theta) = \mathcal{O}\big(h^{q^*+1}\big), \qquad (2.8)$$

where $h = \max_{1 \le i < n} h_i$. Then the error of the dense output formula can be written as

$$u(\theta) - y(t_n + h_n \theta) = \underbrace{\left[u(\theta) - z(t_n + h_n \theta)\right]}_{\mathcal{O}(h^{q^*+1})} + \underbrace{\left[z(t_n + h_n \theta) - y(t_n + h_n \theta)\right]}_{\mathcal{O}(h^{q})}. \qquad (2.9)$$

The first term $(u - z)$ in the right hand side of (2.9) is the interpolation error, and has magnitude $\mathcal{O}(h^{q^*+1})$. The second term $(z - y)$ denotes the global error of the method, and is therefore of order $\mathcal{O}(h^{q})$. Thus, to guarantee an order-q accurate dense output approximation to $y(t_n + \theta h_n)$, it suffices to require that

$$q^* \ge q - 1. \qquad (2.10)$$

For $q \le 4$, cubic Hermite interpolation is sufficiently accurate. However, for larger values of q, performing polynomial interpolation while preserving the number of stages becomes an increasingly cumbersome process, and the quality of the interpolated solution depends on the choice of interpolation points. Continuous Runge–Kutta schemes allow for an efficient interpolation, with only a modest increase in the computational cost coming from the $s^* - s$ additional stages incorporated in (2.2). Since the accuracy constraints on the dense output coefficients usually allow for one or more degrees of freedom, one selects the coefficients of the polynomials $\tilde{b}_j(\theta)$ such that a certain error norm or cost function is minimized (see, e.g. [16]). Various dense output methods for constructing high-order interpolants have been proposed, notably by Sharp and Verner [18], based on the bootstrapping scheme of Verner [29].
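As a concrete illustration of how the interpolant (2.7) is evaluated from data stored during the forward run, consider the following minimal Fortran sketch (ours, not the DENSERKS source). The stages $k_j^n$ of one step are assumed to be kept in an array K, and BTILDE(J,0:NDEG) is an assumed layout holding the monomial coefficients of the polynomial $\tilde{b}_j(\theta)$.

SUBROUTINE DENSE_EVAL (NX, NSTAGE, NDEG, YN, HN, K, BTILDE, THETA, U)
  !! Minimal sketch (not the DENSERKS source): evaluate the dense output
  !! interpolant (2.7) on one step. BTILDE(J,D) holds the coefficient of
  !! THETA**D in btilde_j(theta); this layout is our own assumption.
  INTEGER :: NX, NSTAGE, NDEG, J, D
  DOUBLE PRECISION :: YN(NX), HN, K(NX,NSTAGE)
  DOUBLE PRECISION :: BTILDE(NSTAGE,0:NDEG), THETA, U(NX), BJ
  U = YN
  DO J = 1, NSTAGE
     BJ = BTILDE(J,NDEG)
     DO D = NDEG-1, 0, -1             !! Horner evaluation of btilde_j(theta)
        BJ = BJ*THETA + BTILDE(J,D)
     END DO
     U = U + HN*BJ*K(:,J)             !! accumulate h_n * btilde_j(theta) * k_j^n
  END DO
END SUBROUTINE DENSE_EVAL

Evaluating $u(\theta)$ this way costs only a few multiply-adds per stage; no additional right hand side evaluations are needed once the stages are stored.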

Finally, we note that the formulation of the tangent linear and adjoint ODEs described in the next section is evidently independent of the numerical method used to solve those equations. All that is needed is that the numerical method be fitted with a continuous extension, to allow an efficient on the fly interpolation of the forward trajectory during the adjoint model integration. Rosenbrock methods, extrapolation methods, as well as implicit Runge–Kutta methods, support continuous extensions [30–32]. Thus, the sensitivity analysis framework described herein can be seamlessly extended to such methods.

2.3. First order sensitivity analysis

We are interested in solving the following optimal control problem:

$$\min_p\; G(p), \qquad (2.11)$$

subject to (2.1), where the functional G(p) is defined as

$$G(p) = \int_{t_0}^{t_F} g(t, y(t,p), p)\, dt. \qquad (2.12)$$

Here $g : \mathbb{R}^{n_y+1} \times \mathbb{R}^{n_p} \to \mathbb{R}$ is a given real-valued function. Most optimization algorithms require some type of derivative information to build successive approximations to the optimal solution. If first order derivatives are required, we compute the gradient

$$\nabla_p G = \frac{\partial G}{\partial p}. \qquad (2.13)$$

It has been shown [21,23,28] that (2.13) can be computed from the solutions $s_i(t) = \partial y/\partial p_i(t)$ of the tangent linear model (TLM):

$$\dot{s}_i = F_y(t,y)\, s_i + F_{p_i}, \quad t_0 \le t \le t_F, \qquad s_i(t_0) = \frac{\partial y_0}{\partial p_i}, \quad i = 1, \ldots, n_p. \qquad (2.14)$$

The gradient is computed component-wise by solving the following quadrature equations together with (2.14):

$$\dot{f}_i = g_y^T s_i + g_{p_i}, \qquad f_i(t_0) = 0, \quad i = 1, \ldots, n_p. \qquad (2.15)$$

The ith quadrature solution equals the ith component of the gradient:

$$\frac{\partial G}{\partial p_i} = f_i(t_F). \qquad (2.16)$$

Note that we have to solve (2.14) and (2.15) once per gradient component. The two models are usually integrated as a coupled system of $n_y + 1$ equations. Since $(\partial G/\partial p) \in \mathbb{R}^{n_p}$, $n_p$ tangent linear and quadrature solutions are needed to obtain the gradient vector. Hence, the forward sensitivity method becomes impractical for large $n_p$.

One way to get around this increase with $n_p$ in the cost of computing $\nabla_p G$ is to use the adjoint equation for (2.1) [19,20,23]. The adjoint problem can be derived as follows. We introduce the Lagrange multipliers $\lambda \in \mathbb{R}^{n_y}$ and define the extended cost functional

$$\hat{G}(p) = G(p) - \int_{t_0}^{t_F} \lambda^T\left(\dot{y} - F(t,y,p)\right) dt. \qquad (2.17)$$

From (2.1), $\hat{G}(p) = G(p)$. It follows that the sensitivity of $\hat{G}$ with respect to the parameters p is

$$\left(\frac{\partial \hat{G}}{\partial p}\right)^T = \left(\frac{\partial G}{\partial p}\right)^T = \int_{t_0}^{t_F} \left(g_y^T y_p + g_p^T\right) dt - \int_{t_0}^{t_F} \lambda^T\left(-F_p - F_y y_p + \dot{y}_p\right) dt. \qquad (2.18)$$

Integration by parts leads to:

$$\left(\frac{\partial G}{\partial p}\right)^T = \int_{t_0}^{t_F} \left(g_p^T + \lambda^T F_p\right) dt - \int_{t_0}^{t_F} \left(-g_y^T - \lambda^T F_y - \dot{\lambda}^T\right) y_p\, dt - \left(\lambda^T y_p\right)\Big|_{t_0}^{t_F}. \qquad (2.19)$$

To avoid the computation of the forward sensitivities $y_p(t)$ at $t > t_0$, one defines the first order adjoint variable $\lambda \in \mathbb{R}^{n_y}$ as the solution of the first order adjoint model, described by the following final value problem:

$$\dot{\lambda} = -F_y^T(t,y)\,\lambda - g_y, \quad t_F \ge t \ge t_0, \qquad \lambda(t_F) = 0. \qquad (2.20)$$


Then,

$$\left(\frac{\partial G}{\partial p}\right)^T = \int_{t_0}^{t_F} \left(g_p^T + \lambda^T F_p\right) dt + \left(\lambda^T y_p\right)\Big|_{t=t_0}, \qquad (2.21)$$

where the integral can be evaluated concurrently with (2.20) using the solution of a quadrature ODE:

$$\dot{w}^T = -g_p^T - \lambda^T F_p, \quad t_F \ge t \ge t_0, \qquad w^T(t_F) = 0. \qquad (2.22)$$

Hence,

$$\left(\frac{\partial G}{\partial p}\right)^T = w^T(t_0) + \lambda^T(t_0)\, y_p(t_0). \qquad (2.23)$$

Eqs. (2.20) and (2.22) are attractive since they give the entire gradient vector, at the expense of only a single first order adjoint and quadrature solve. However, the adjoint equation depends on the forward model trajectory through the Jacobian $F_y(t,y)$. Moreover, (2.20) is integrated backward in time from $t_F$ to $t_0$. Thus, solving for $\lambda(t_0)$ and $w(t_0)$ requires solving the forward model first, with a continuous Runge–Kutta method. The dense output interpolant is given by (2.7). The polynomials $\tilde{b}_j(\theta)$ are given by the continuous extension. To build $u(\theta)$ on the fly during the backward run, the time steps $h_n$, the corresponding time points $t_n$, and the interpolation weights $k_j^n$ all need to be stored. Then, in the backward time integration, (2.20) and (2.22) are solved simultaneously as a system of $n_y + n_p$ differential equations, using the interpolated forward solution $u(\theta) \approx y(t_n + h_n\theta)$ when needed.

2.4. Second order adjoint sensitivity analysis

In this section, we obtain second order derivative information for the cost function (2.12). For large-scale models, where $n_y$ and $n_p$ can reach a million or more, computing the full Hessian is not computationally feasible. Instead, one calculates Hessian-vector products:

$$\left(\frac{\partial^2 G}{\partial p^2}\right)\cdot \delta p, \qquad (2.24)$$

where $\delta p \in \mathbb{R}^{n_p}$ is some perturbation in the (time-independent) parameters. A forward differentiation of (2.18) results in the second order adjoint final value problem [19,33]:

$$\dot{\sigma} = -F_y^T\,\sigma - \left(F_{yy}\cdot(y_p\,\delta p)\right)^T\lambda - \left(F_{yp}\cdot\delta p\right)^T\lambda - g_{yy}(y_p\,\delta p) - g_{yp}\,\delta p, \quad t_F \ge t \ge t_0, \qquad \sigma(t_F) = 0, \qquad (2.25)$$

where $\sigma \in \mathbb{R}^{n_y}$ is the second order adjoint variable,

$$F_{yy} = \left(\frac{\partial^2 F_i}{\partial y_j\,\partial y_k}\right)_{i,j,k = 1,\ldots,n_y} \qquad (2.26)$$

and the $\cdot$ symbol denotes the tensor product

$$F_{yy}\cdot(y_p\,\delta p) = \left(\sum_{k=1}^{n_y} (F_{yy})_{i,j,k}\, (y_p\,\delta p)_k\right)_{i,j = 1,\ldots,n_y}. \qquad (2.27)$$

$F_{yp}$, $F_{pp}$, $F_{py}$ are defined by similar formulas. Using the second order adjoint model trajectory $\sigma(t)$, we can compute the Hessian-vector product for any given vector $\delta p \in \mathbb{R}^{n_p}$ as:

$$\left(\frac{\partial^2 G}{\partial p^2}\right)\cdot\delta p = \int_{t_0}^{t_F} \left(g_{pp}\,\delta p + g_{py}(y_p\,\delta p) + F_p^T\,\sigma + (F_{pp}\cdot\delta p)^T\lambda + \left(F_{py}\cdot(y_p\,\delta p)\right)^T\lambda\right) dt + \left((y_{pp}\cdot\delta p)^T\lambda + y_p^T\,\sigma\right)\Big|_{t=t_0}. \qquad (2.28)$$

The integral term in (2.28) can be evaluated concurrently with (2.25) through a set of $n_p$ quadrature equations:

$$\dot{v} = -F_p^T\,\sigma - (F_{pp}\cdot\delta p)^T\lambda - \left(F_{py}\cdot(y_p\,\delta p)\right)^T\lambda - g_{pp}\,\delta p - g_{py}(y_p\,\delta p), \quad t_F \ge t \ge t_0, \qquad v(t_F) = 0. \qquad (2.29)$$

It follows that

$$\left(\frac{\partial^2 G}{\partial p^2}\right)\cdot\delta p = v(t_0) + \left((y_{pp}\cdot\delta p)^T\lambda + y_p^T\,\sigma\right)\Big|_{t=t_0}. \qquad (2.30)$$

We note that a TLM integration starting from $s(t_0) = y_p(t_0)\cdot\delta p$ is necessary to obtain the $y_p(t)\,\delta p$ term in (2.25) [23,33].


One can generalize this technique to derive the tangent linear and adjoint models for pointwise functionals [21,23]:

$$G = g(t_F, y(t_F), p). \qquad (2.31)$$

Similarly, a simple change of variables [33] in the forward model leads to the differential equations used for calculating first and second order sensitivities with respect to the initial conditions $y_0$.

2.5. Using dense output in adjoint sensitivity analysis

For general nonlinear models (2.1), the first order adjoint variable $\lambda(t)$ and the second order adjoint solution $\sigma(t)$ depend on the forward and tangent linear variables y(t) and s(t). Moreover, adjoint models are final value problems integrated backward in time, and the time steps in the adjoint simulation are adapted independently from the forward integration. Thus it is necessary to approximate the forward solution y(t) at a set of time points that cannot be known a priori. To do this, we use the continuous extensions of the Runge–Kutta pairs. Dense output allows for a cost-effective and accurate interpolation. This is confirmed by the numerical experiments described in Section 6. The computational overhead of dense output is equal to a small number of right hand side forward or tangent linear model evaluations per time step.

3. Implementation details

3.1. Runge–Kutta implementations

DENSERKS implements several explicit Runge–Kutta pairs, listed in Table 1. It is important to note that the user needs to employ the same Runge–Kutta process for both the forward and adjoint mode integrations when solving (2.20) and (2.25). This is required by the fact that the adjoint code uses the interpolation weights $k_j^n$ computed by the forward or tangent linear code when it builds the interpolant $u(\theta)$ (2.7). Thus, the stages $k_j^n$ corresponding to the same Runge–Kutta scheme are required if the adjoint solution is to be fully accurate (i.e. have the same accuracy as the Runge–Kutta pair used in the adjoint model integration). The Butcher tableaus and the dense output coefficients corresponding to the Runge–Kutta schemes in Table 1 can be found in [16,17,34–36]. The DENSERKS adjoint model integrators can use either cubic/quintic Hermite polynomial interpolation or dense output for forward trajectory recomputations. The interpolation options are dependent on the particular choice of Runge–Kutta pair.

3.2. Checkpointing

For nonlinear models, the adjoint equations (2.20) and (2.25) are dependent on the forward and tangent linear model solutions y(t) and $s(t) = y_p\,\delta p$. Moreover, since both adjoint equations are final value problems, they cannot be integrated alongside (2.1) and (2.14). The error estimators and time step controllers for the adjoint problems are independent from those used in the forward integrations. Thus, (2.20) and (2.25) will require an approximation to y(t) and s(t) at time points that cannot be determined a priori. Continuous Runge–Kutta schemes can, as explained above, provide the interpolated values. However, to construct the forward solution interpolants on the fly during the adjoint model run, all the interpolation weights need to be stored in memory during the forward model simulation. This can be prohibitive, especially for long time integrations.

To mitigate the memory storage problems that may arise in an adjoint model integration, DENSERKS employs a two-level checkpointing strategy [5,20,37]. The checkpointing algorithm can be briefly explained as follows. We perform one forward model simulation from $t_0$ to $t_F$. During this integration, DENSERKS writes the integrator state to a disk file every $N_d$ time steps, and discards all other solution information. The library user can control the value of $N_d$. The forward model run yields $N_c$ state snapshots (also called file checkpoints) that will be reused in the adjoint model integration. The information contained in a checkpoint must be sufficient to allow a hot restart of the forward integration from the time point at which the checkpoint was written. This means that the forward model trajectory computed after a restart is identical to the one calculated during the initial simulation, when all the checkpoints were written.

Table 1
Explicit Runge–Kutta schemes implemented by DENSERKS: RK2 – Fehlberg-type Runge–Kutta method of order 2(3); RK3 – third order Runge–Kutta with second order error control; RK4 – fourth order Runge–Kutta method with third order error control built on the classic "3/8 rule"; RK5 – DOPRI5(4); RK6 – Runge–Kutta pair built on top of RK6(5)9FM; RK8 – DOPRI8(6).

Runge–Kutta method | Order of accuracy | Error control order | Interpolation method
RK2 (Fehlberg)     | 2 | 3 | Hermite (third/fifth order)
RK3                | 3 | 2 | Hermite (third/fifth order)
RK4 ("3/8-rule")   | 4 | 3 | Hermite (third/fifth order)
RK5 (DOPRI5)       | 5 | 4 | Hermite (third/fifth order), dense output (fourth order)
RK6                | 6 | 5 | Hermite (third/fifth order), dense output (sixth order)
RK8 (DOPRI8)       | 8 | 6 | Hermite (third/fifth order), dense output (seventh order)


During the adjoint model run, DENSERKS divides the forward problem into $N_c$ smaller problems, each solvable in at most $N_d$ time steps. To do so, the software automatically reads the file checkpoints in reverse order. Consider any two consecutive file checkpoints corresponding to time points $t_a$ and $t_b$ ($t_a < t_b$). First, DENSERKS recomputes the forward trajectory from $t_a$ to $t_b$, using the checkpoint written at $t_a$ as the initial data. The interpolation weights are stored in working memory. Since the forward subproblem can be solved in at most $N_d$ steps, the memory requirements are lowered significantly. The user can set $N_d$ according to the amount of available memory. Then, DENSERKS integrates the adjoint model from $t_b$ to $t_a$, making use of the interpolated forward trajectory when needed.

The same strategy is used for the TLM. The checkpointing mechanism lowers the memory storage requirements considerably, at the expense of additional recomputations of the forward model or TLM trajectories. This makes long time adjoint integrations computationally feasible.
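Schematically, the reverse sweep can be pictured as below (our own sketch, not the DENSERKS source). CKPT_READ, FWD_SWEEP and ADJ_SWEEP are hypothetical placeholders for the checkpoint read, the forward recomputation between two consecutive checkpoints, and the adjoint integration over the same subinterval.

SUBROUTINE REVERSE_SWEEP (NX, NC, ADY)
  !! Schematic two-level checkpointing reverse sweep (our own sketch).
  !! CKPT_READ, FWD_SWEEP and ADJ_SWEEP are hypothetical placeholders.
  INTEGER :: NX, NC, C
  DOUBLE PRECISION :: ADY(NX), YA(NX), TA, TB
  DO C = NC, 1, -1
     CALL CKPT_READ (C, TA, TB, YA)    !! checkpoint C: forward state YA at TA
     CALL FWD_SWEEP (NX, TA, TB, YA)   !! redo at most Nd forward steps, buffering
                                       !! the stage and step data in memory
     CALL ADJ_SWEEP (NX, TB, TA, ADY)  !! adjoint from TB down to TA, interpolating
                                       !! the forward trajectory via (2.7)
  END DO
END SUBROUTINE REVERSE_SWEEP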

Let us denote the computational cost of integrating (2.1) from $t_0$ to $t_F$ (a complete forward model integration) by $C_{\mathrm{FWD}}$. Let $C_{\mathrm{ADJ}}$ be the cost of solving (2.20) backward in time from $t_F$ to $t_0$, assuming the forward model state is available at all required time points (i.e. all the necessary interpolation weights are stored in memory). Finally, let $C_{\mathrm{QAD}}$ be the computational cost of solving (2.22) for $w(t_0)$. Then we have the following bounds for $C_{\mathrm{BKWD}}$, the cost of computing $\lambda_0 \equiv \lambda(t_0)$ given only $y(t_0)$ (i.e. a backward problem integration with two-level checkpointing):

$$C_{\mathrm{FWD}} + C_{\mathrm{ADJ}} + C_{\mathrm{QAD}} \le C_{\mathrm{BKWD}} \le 2 \times C_{\mathrm{FWD}} + C_{\mathrm{ADJ}} + C_{\mathrm{QAD}}. \qquad (3.1)$$

The upper bound in (3.1) follows from the observation that we need to integrate (2.1) at most two times: once when writing the file checkpoints, and a second time during the integration of (2.20), to obtain the interpolated forward solution at the time points needed by the adjoint integrator. The second inequality in (3.1) becomes an equality when the last step in the forward model integration is written as a file checkpoint.

The computational cost $C_{\mathrm{BKWD2}}$ of a second order adjoint solve ((2.1), (2.14), (2.25) and (2.29)) can be similarly bounded as

$$C_L \le C_{\mathrm{BKWD2}} \le C_H, \qquad (3.2)$$

where

$$C_L = C_{\mathrm{FWD}} + C_{\mathrm{TLM}} + C_{\mathrm{ADJ}} + C_{\mathrm{SOA}} + C_{\mathrm{QSOA}}, \qquad C_H = 2 \times C_{\mathrm{FWD}} + 2 \times C_{\mathrm{TLM}} + C_{\mathrm{ADJ}} + C_{\mathrm{SOA}} + C_{\mathrm{QSOA}}. \qquad (3.3)$$

Here, $C_{\mathrm{TLM}}$, $C_{\mathrm{SOA}}$ and $C_{\mathrm{QSOA}}$ denote the computational cost incurred when integrating (2.14), (2.25) and (2.29), respectively.

3.3. Using automatic differentiation to obtain derivative information

The Jacobian-vector and Hessian-vector products in (2.14), (2.20) and (2.25) can be obtained both accurately and efficiently through automatic differentiation [38]. The AD-generated code for the right hand sides of (2.14), (2.20) and (2.25) is then given to the DENSERKS integrators. Further details on AD and its use with DENSERKS are given in the Appendix of this paper.

3.4. Code organization and usage

Table 2 lists the integrator subroutines in DENSERKS, together with short descriptions of their purpose and usage. We provide the code for the test problems described in Section 6 in the DENSERKS package. Note that backward time integration in the forward mode is not supported, nor is forward time integration in the adjoint mode. Thus, all integrators in Table 2 require that $t_0 \le t_F$.

We now describe a step by step procedure for enhancing an existing embedded Runge–Kutta code with sensitivity analysis capabilities. As before, we let q be the order of accuracy of the Runge–Kutta pair.

1. Fit the Runge–Kutta pair with a dense output interpolant of order q*. This may require an increase in the number of stages over the original scheme. For smooth y and s, the interpolated forward and TLM trajectories will be qth order accurate if q ≤ q* + 1, as shown in Section 2. If q > q* + 1, one should expect a loss of accuracy in the adjoint model solution, due to less accurate forward interpolation data.
2. Implement a checkpointing mechanism for the Runge–Kutta pair to reduce the memory requirements of an adjoint integration.
3. Write a Runge–Kutta integrator for the first and second order adjoint model based on the original Runge–Kutta pair. This solver should be able to use the checkpoint information to interpolate the forward or tangent linear model trajectories when needed.
4. Build the right hand sides of the forward and adjoint sensitivity equations (2.1), (2.14), (2.20) and (2.25). These subroutines can be either hand coded, or generated using automatic differentiation (see the Appendix for more details). When computing derivatives of nontrivial cost functions, we also need to code the corresponding right hand sides of the quadrature equations (2.22) and (2.29).
5. Set initial or final values for all the state variables: $y(t_0)$, $s(t_0)$, $f(t_0)$, $\lambda(t_F)$, $w(t_F)$, $\sigma(t_F)$, $v(t_F)$.
6. Solve Eqs. (2.1)–(2.29) for the desired sensitivity information using the appropriate Runge–Kutta solvers.

Table 2
DENSERKS integrator subroutines.

Integrator | Description
RKINT | Forward model integrator. Writes forward model trajectory data to the memory buffers and file checkpoints, if the memory buffering and file checkpointing mechanisms are enabled. This routine is user-callable.
RKINT_TLM | Tangent linear model integrator. Simultaneously integrates the forward model (the resulting system has size 2 × ny). Writes forward and tangent linear model trajectory data to the memory buffers and file checkpoints, if the memory buffering and file checkpointing mechanisms are enabled. This routine is user-callable.
RKINT_ADJ | First order adjoint model integrator. The routine is used to integrate the first order adjoint model and the associated quadrature equations (for a total backward problem size of ny + np) between two given consecutive checkpoints. RKINT_ADJ reads forward model trajectory data from the memory buffers, implicitly assuming they hold valid forward integration information. The user should not call this subroutine directly. Instead, all first order adjoint integrations should be done via calls to the RKINT_ADJDR wrapper subroutine.
RKINT_SOA | Second order adjoint model integrator. Simultaneously integrates the first and second order adjoint model equations and the quadrature equations (the resulting system has size 2 × ny + np). Reads data from the memory buffers only. The user should not call this subroutine directly. Instead, all second order adjoint integrations should be done via calls to the RKINT_SOADR wrapper subroutine.
RKINT_ADJDR | Wrapper for RKINT_ADJ. Handles all checkpoint reads and calls RKINT to calculate the forward model trajectory, should such recomputations be required. Calls RKINT_ADJ for first order adjoint model integration between consecutive checkpoints. This routine is user-callable.
RKINT_SOADR | Wrapper for RKINT_SOA. Handles all checkpoint reads and calls RKINT_TLM to calculate the tangent linear model trajectory, should such recomputations be required. Calls RKINT_SOA for second order adjoint model integration between consecutive checkpoints. This routine is user-callable.


This procedure can be extended to, e.g., Rosenbrock, implicit Runge–Kutta, and extrapolation methods, since they all support continuous extensions [30–32]. Only the actual construction of the dense output interpolant is different; steps 2–6 are identical to the ones described here.

To use the DENSERKS subroutines in Table 2, skip directly to step 4 in the procedure listed above. A first order adjoint problem can be solved by the following steps:

!! define the size of the forward and adjoint model state vectors
integer nx=...
!! define the size of the quadrature system state (number of parameters)
integer np=...
!! Nc receives the number of file checkpoints written by RKINT
integer Nc
!! set the initial and final integration time
double precision t0=..., tF=...
!! define the forward and adjoint model state vectors
double precision y(nx), ady(nx)
!! w is the quadrature system state vector
double precision w(np)
!! write a file checkpoint every Nd time steps
integer Nd=...
!! allocate the memory buffers and initialize the checkpoint files, if needed
call rk_AllocateTapes(...)
!! initialize y (the state at t0), and other integrator parameters as needed
y(:)=...
!! integrate the forward model
!! FWD_RHS=subroutine computing the right hand side of (2.1) at given (t,y(t))
!! if we disable file checkpointing, the values of Nd and Nc are ignored by the algorithm
call RKINT(nx,y,FWD_RHS,t0,tF,Nd,Nc,...)
!! upon return from RKINT, Nc (integer)=the total number of file checkpoints written to disk
!! set final values for ady and w
ady(:)=...
w(:)=...
!! integrate the first order adjoint model
!! ADJ_RHS=subroutine computing the right hand side of (2.20) at given (t,y(t),ady(t))
!! QUAD_RHS=subroutine computing the right hand side of (2.22) at given (t,y(t),ady(t))
call RKINT_ADJDR(nx,ady,ADJ_RHS,...,np,w,QUAD_RHS,...,FWD_RHS,...,Nc,...)
!! upon exit from RKINT_ADJDR:
!!   ady = lambda(t0) in (2.20)
!!   w   = w(t0) in (2.22)
!! close the checkpoint files and deallocate the memory buffers
call rk_DeallocateTapes


Second order adjoint models are handled similarly. In the case of multiple integrations of the same first or second order adjoint system (with different final conditions), it is not recommended to call RKINT_ADJDR or RKINT_SOADR multiple times. Instead, RKINT_ADJDR_M and RKINT_SOADR_M allow for significant performance gains, due to the reuse of forward and tangent linear model trajectory data over all adjoint model integrations. This lowers the cost of M first order adjoint model integrations, $C_{M\,\mathrm{BKWD}}$, from $M \times C_{\mathrm{BKWD}}$ in (3.1) to

$$C_{\mathrm{FWD}} + M \times C_{\mathrm{ADJ}} \le C_{M\,\mathrm{BKWD}} \le 2 \times C_{\mathrm{FWD}} + M \times C_{\mathrm{ADJ}}. \qquad (3.4)$$

A similar reduction in computational cost is noticed in the case of M second order adjoint model integrations with RKINT_SOADR_M. The corresponding cost bounds $C_L$ and $C_H$ in (3.3) become

$$C_{LM} = C_{\mathrm{FWD}} + C_{\mathrm{TLM}} + M \times C_{\mathrm{ADJ}} + M \times C_{\mathrm{SOA}}, \qquad C_{HM} = 2 \times C_{\mathrm{FWD}} + 2 \times C_{\mathrm{TLM}} + M \times C_{\mathrm{ADJ}} + M \times C_{\mathrm{SOA}}. \qquad (3.5)$$

Thus,

$$C_{LM} \le C_{M\,\mathrm{BKWD2}} \le C_{HM}. \qquad (3.6)$$

4. Comparison with other related software

Table 3 provides a quick overview and comparison between the main features of DENSERKS, CVODES, and DASPKADJOINT.

Table 3
High-level comparison of DENSERKS against CVODES v2.5.0 [22] and DASPKADJOINT [25]. DASPKADJOINT implements sensitivity analysis for DAEs (not available in DENSERKS or SUNDIALS). Note that DENSERKS has built in support for second order sensitivity analysis, a feature not directly present in the other two sensitivity solvers.

                                    | DENSERKS | CVODES | DASPKADJOINT
Numerical methods implemented       | Explicit Runge–Kutta | Adams–Moulton and BDF | BDF
Orders of accuracy (variable order) | 2–8 | 1–12 (Adams–Moulton), 1–5 (BDF) | 1–5
Interpolation method                | Dense output (built in) | Hermite, variable order polynomial method | Hermite
Interpolation order                 | 3–8 | 3/5 | 3
Built in sensitivity analysis       | First order forward/adjoint, second order (forward-over-adjoint) | First order forward/adjoint | First order forward/adjoint

5. Code availability

The DENSERKS source code is released under a GNU open source license and is available for download at: http://people.cs.vt.edu/~asandu/Software/DENSERKS.

6. Numerical experiments

All numerical experiments were performed with double precision Fortran on a 3 GHz Intel Pentium 4 workstation running Fedora Core 6 Linux.

6.1. The Arenstorf orbit

The Arenstorf orbit IVP is usually given as a set of two second order ODEs, which can readily be transformed into a first order initial value problem [17]:

$$\begin{aligned}
\dot{y}_1 &= y_3,\\
\dot{y}_2 &= y_4,\\
\dot{y}_3 &= y_1 + 2 y_4 - \hat{\mu}\,\frac{y_1 + \mu}{\left[(y_1+\mu)^2 + y_2^2\right]^{3/2}} - \mu\,\frac{y_1 - \hat{\mu}}{\left[(y_1-\hat{\mu})^2 + y_2^2\right]^{3/2}},\\
\dot{y}_4 &= y_2 - 2 y_3 - \hat{\mu}\,\frac{y_2}{\left[(y_1+\mu)^2 + y_2^2\right]^{3/2}} - \mu\,\frac{y_2}{\left[(y_1-\hat{\mu})^2 + y_2^2\right]^{3/2}},
\end{aligned} \qquad (6.1)$$

where

$$\mu = 0.012277471, \qquad \hat{\mu} = 1 - \mu.$$

The initial conditions are

$$y_1(t_0) = 0.994, \qquad y_2(t_0) = 0, \qquad y_3(t_0) = 0, \qquad y_4(t_0) \approx -2.0016. \qquad (6.2)$$

We compute the sensitivities of the model trajectory with respect to variations in the initial value of the first solution component $y_1(t_0)$. Thus, the cost functionals chosen for this example are

$$G_i(y_1(t_0)) = y_i(t_F), \quad i = 1, \ldots, 4. \qquad (6.3)$$

We consider one-tenth of a full orbit period as our forward model integration window: $t_0 = 0$ and $t_F \approx 1.7065$. Figs. 1 and 2 illustrate the errors of the forward, tangent linear and first order adjoint model solvers using the DOPRI5(4) and DOPRI8(6) Runge–Kutta numerical schemes with fourth order (q* = 4) and seventh order (q* = 7) dense output, respectively. Since the necessary condition q ≤ q* + 1 is satisfied for both Runge–Kutta pairs, the corresponding tangent linear and adjoint model solutions are fifth and eighth order accurate, respectively. This shows that the dense output interpolants make it possible to compute a high-quality adjoint solution that has the same order of accuracy as the underlying Runge–Kutta method. The reference solution was obtained by a TLM integration with very tight absolute and relative tolerances (ATOL = RTOL = $10^{-14}$), using the DOP853 code in [17].
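As an illustration of the FWD_RHS interface described in the Appendix, a right hand side routine for (6.1) could be coded as follows (our own sketch; DENSERKS ships its own drivers for the test problems).

SUBROUTINE ARENSTORF_RHS (NX, T, Y, F)
  !! Sketch (ours) of the Arenstorf right hand side (6.1) in the FWD_RHS
  !! parameter list used by DENSERKS (see the Appendix).
  INTEGER :: NX
  DOUBLE PRECISION :: T, Y(NX), F(NX)
  DOUBLE PRECISION, PARAMETER :: MU = 0.012277471D0, MUH = 1D0 - MU
  DOUBLE PRECISION :: D1, D2
  D1 = ((Y(1)+MU)**2  + Y(2)**2)**1.5D0    !! [(y1+mu)^2  + y2^2]^(3/2)
  D2 = ((Y(1)-MUH)**2 + Y(2)**2)**1.5D0    !! [(y1-muh)^2 + y2^2]^(3/2)
  F(1) = Y(3)
  F(2) = Y(4)
  F(3) = Y(1) + 2D0*Y(4) - MUH*(Y(1)+MU)/D1 - MU*(Y(1)-MUH)/D2
  F(4) = Y(2) - 2D0*Y(3) - MUH*Y(2)/D1 - MU*Y(2)/D2
END SUBROUTINE ARENSTORF_RHS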

[Figure 1: four work-precision panels; x-axes: FWD RHS evaluations (a, b) and TLM RHS evaluations (c, d); y-axes: relative error.]
Fig. 1. Arenstorf orbit problem: Relative errors in y(t) and the forward sensitivities $s_{i1} = (\partial y_i/\partial y_1^0)(t_F)$ computed with the DOPRI5(4) (a and c) and DOPRI8(6) (b and d) pairs. The integration interval length is equal to one tenth of a full orbit period: $t_0 = 0$, $t_F \approx 1.707$. Since the solution y(t) is smooth, both the forward (a and b) and the tangent linear model solutions (c and d) have the same order of accuracy as the underlying Runge–Kutta pair.


6.2. The Van der Pol oscillator

6.2.1. First order sensitivity analysis

We now consider the Van der Pol oscillator [30]. The first order ODE system reads:


$$\dot{y}_1 = y_2, \qquad \dot{y}_2 = \frac{1}{\varepsilon}\left(y_2 - y_1 - y_1^2\, y_2\right) \qquad (6.4)$$

with $\varepsilon = 10^{-2}$ and initial conditions

$$y_1(t_0) = 2, \qquad y_2(t_0) = 0. \qquad (6.5)$$

Consider the cost function

$$G(y(t_0)) = y_1(t_F). \qquad (6.6)$$

The corresponding adjoint model can be computed using the Lagrangian method described in (2.17)–(2.21), where G can (informally) be written as

$$G(y(t_0)) = \int_{t_0}^{t_F} \delta(t - t_F)\, y_1(t)\, dt \qquad (6.7)$$

with $\delta(t - t_F)$ the Dirac delta centered at time $t_F$.

[Figure 2: four work-precision panels; x-axes: ADJ RHS evaluations; y-axes: relative errors of the adjoint solutions.]
Fig. 2. The Arenstorf orbit problem: Work-precision diagrams for the DOPRI5(4) pair (a and c) with fourth order dense output (q* = 4 in (2.8)), and the DOPRI8(6) scheme (b and d) with seventh order dense output (q* = 7 in (2.8)). The adjoint model solutions are $\lambda_j \approx (\partial y_1^F/\partial y_j^0)$. The integration interval length is equal to one tenth of a full orbit: $t_0 = 0$, $t_F \approx 1.707$. As expected, since q = q* + 1 for both Runge–Kutta pairs, the numerical solutions have order of accuracy q. We ran two series of tests. In the first, we integrated the adjoint model with variable tolerances, while keeping the forward integration tolerances constant ((a) and (c) illustrate this case, with ATOL_fwd = RTOL_fwd = $10^{-12}$). Second, we varied both the forward and adjoint model tolerances during several adjoint model test runs ((b) and (d) show the results of several adjoint model integrations where the prescribed absolute and relative forward and adjoint integrator tolerances are equal and vary between $10^{-4}$ and $10^{-12}$).

[Figure 3: panels (a) y1(t) and (b) λ1(t) versus time; (c, d) the adaptive time step lengths taken in the two integrations; (e, f) absolute errors of λ1, λ2 versus the number of ADJ RHS evaluations.]
Fig. 3. The Van der Pol oscillator (6.4): $t_0 = 0$, $t_F = 2$, $\varepsilon = 10^{-2}$. Solutions obtained using the DOPRI8(6) pair for the forward (a) and adjoint model (b). Here ATOL = $10^{-12}$ and RTOL = $10^{-10}$ for both model runs. The adaptive time steps taken during the two model integrations are plotted in (c) and (d). The bottom plots illustrate the absolute errors of the adjoint solutions $\lambda_i = (\partial y_1^F/\partial y_i^0)$ obtained using a fixed tolerance for the forward model run (ATOL_fwd = RTOL_fwd = $10^{-12}$ in (e)) and, alternatively, variable tolerance values for both the forward and adjoint integrators (in (f)). The order of accuracy of the adjoint numerical solution is the same as that of the Runge–Kutta pair.


Hence, the adjoint final value problem for (6.4)–(6.6) has the following form:

$$\begin{aligned}
\dot{\lambda}_1 &= \frac{1}{\varepsilon}\,(2 y_1 y_2 + 1)\,\lambda_2,\\
\dot{\lambda}_2 &= -\lambda_1 + \frac{1}{\varepsilon}\,(y_1^2 - 1)\,\lambda_2,\\
\lambda_1(t_F) &= 1, \qquad \lambda_2(t_F) = 0.
\end{aligned} \qquad (6.8)$$

The time integration interval spans from $t_0 = 0$ to $t_F = 2$. Note from Fig. 3c and d that both forward and adjoint integrators use adaptive time stepping, and step size control is performed independently during the forward and the reverse integrations.
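In the ADJ_RHS interface of the Appendix, the right hand side of (6.8) could be coded as follows (our own sketch):

SUBROUTINE VDP_ADJ_RHS (NX, T, Y, ADY, ADF)
  !! Sketch (ours) of the Van der Pol adjoint right hand side (6.8),
  !! in the ADJ_RHS parameter list described in the Appendix.
  INTEGER :: NX
  DOUBLE PRECISION :: T, Y(NX), ADY(NX), ADF(NX)
  DOUBLE PRECISION, PARAMETER :: EPS = 1D-2
  ADF(1) = (2D0*Y(1)*Y(2) + 1D0)*ADY(2)/EPS
  ADF(2) = -ADY(1) + (Y(1)**2 - 1D0)*ADY(2)/EPS
END SUBROUTINE VDP_ADJ_RHS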

Fig. 3e and f shows how the forward integration errors affect the accuracy of the adjoint solution. It can be seen that for this particular problem, we obtain an eighth order decrease in the error of the adjoint solution even when interpolating data from a less accurate forward integration. This error behavior reflects the order of the Runge–Kutta method. Thus, using the adaptive forward model integrator with looser tolerances still results in a fully accurate adjoint model solution. As a general rule of thumb, we recommend that the forward tolerances be set equal to or slightly lower than those used in the adjoint simulation, to avoid losing accuracy in the adjoint solution.

Fig. 4 compares the performance of DENSERKS and CVODES for the Van der Pol problem. The results indicate that the high-order Runge–Kutta pairs implemented in DENSERKS outperform the multistep methods implemented in SUNDIALS when solving the forward and tangent linear models, requiring considerably fewer right hand side evaluations to reach a prescribed solution accuracy. When integrating the adjoint of (6.4), CVODES fares better than DOPRI5(4), but is still surpassed in performance by the DOPRI8(6) implementation. These results indicate that the Runge–Kutta methods and dense output-based interpolants implemented in DENSERKS can be competitive with other high quality implementations, yielding very accurate solutions with relatively low computational effort.

[Figure 4: three work-precision panels; x-axes: FWD, FWD + TLM, and ADJ RHS evaluations; y-axes: RMS error; methods: CVODES, DOPRI5, DOPRI8, MATLAB.]
Fig. 4. DENSERKS and CVODES work-precision diagrams for the Van der Pol system (6.4): total number of right hand side function evaluations versus solution accuracies, for the (a) forward model, (b) TLM model, and (c) first order adjoint model. We tested CVODES with the staggered direct (stg) and simultaneous corrector methods (sim) (the staggered corrector method yielded results that are very similar to those obtained from its direct version). Also, we enabled full error control for the forward sensitivities. The time step controller implemented in DENSERKS includes tangent linear variables in the error estimator state. DENSERKS's DOPRI8(6) implementation achieves the best all-around work-accuracy ratios.

Table 4
Second order adjoint of the Van der Pol system (6.9): Runtime of the second order adjoint method (BKWD2) versus finite differences (FDIFF). DOPRI8(6) has been used for all model integrations.

np    | FWD [ms] | BKWD2 [ms] | BKWD2/FDIFF
100   | 1   | 67   | 1.18
200   | 2   | 79   | 0.99
400   | 3   | 96   | 0.95
800   | 7   | 137  | 0.87
1600  | 11  | 245  | 0.82
3200  | 20  | 418  | 0.81
6400  | 41  | 794  | 0.77
12800 | 79  | 1659 | 0.78


6.2.2. Second order sensitivity analysis

To illustrate the second order sensitivity analysis capabilities of DENSERKS, we look at a variant of the Van der Pol system with a larger number of parameters and more nonlinear dependencies between the variables (adapted from [19]):

$$\begin{aligned}
\dot{y}_1 &= (1 - y_2^2)\, y_1 - y_2 + v(t,p),\\
\dot{y}_2 &= y_1,\\
\dot{y}_3 &= y_1^2 + y_2^2 + v^2(t,p)
\end{aligned} \qquad (6.9)$$

with $t_0 = 0$, $t_F = 5$,

$$v(t,p) = \sum_{i=1}^{n_p-1} t\, p_i\, p_{i+1} \qquad (6.10)$$

and initial conditions $y_1(t_0) = 0$, $y_2(t_0) = 1$ and $y_3(t_0) = 0$. The objective functional is chosen as

$$G(p) = y_3(t_F, p). \qquad (6.11)$$

Here

$$p = \left(\frac{1}{n_p}, \frac{1}{n_p}, \ldots, \frac{1}{n_p}\right), \qquad \delta p = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n_p}\right). \qquad (6.12)$$

The forward sensitivity system, the first and second order adjoint models and the quadrature equations are given in [19] for a general function v(t,p). We approximate the Hessian-vector product

$$\left.\left(\frac{\partial^2 G}{\partial p^2}\right)\cdot \delta p\,\right|_{t_F,\,p} \qquad (6.13)$$

using two distinct approaches. We first compute a finite difference approximation to (6.13):

$$\left.\frac{\partial^2 G}{\partial p^2}\cdot \delta p\,\right|_{t_F,\,p} \approx \frac{\nabla_p g(t_F,\, p + \epsilon\,\delta p) - \nabla_p g(t_F,\, p)}{\epsilon}. \qquad (6.14)$$

Next, we compare the finite difference approach with the second order adjoint method (2.30). The ratios of the runtimes of these two schemes are given in Table 4, for increasing values of $n_p$. We note that the advantages of the second order adjoint method are twofold. First, its computational cost (measured as runtime to convergence) is lower than that of the finite difference method in almost all of the test runs (as shown in the last column of Table 4). Second, the accuracy of the Hessian-vector product approximation can easily be controlled via the user-chosen integration tolerances, whereas in the finite difference method (6.14) the accuracy depends on the particular choice of $\epsilon$ (in the best case, it is approximately equal to the square root of the chosen error tolerance). Optimizers which exploit second order information can thus use second order adjoint gradients to speed up convergence to (locally) optimal solutions.
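For reference, (6.14) amounts to two gradient evaluations and one scaled difference; a sketch (ours), assuming a user routine GRAD that returns $\nabla_p G$ (e.g. via one first order adjoint solve), could read:

SUBROUTINE FD_HESSVEC (NP, P, DP, EPSFD, HV, GRAD)
  !! Sketch (ours) of the finite difference Hessian-vector product (6.14).
  !! GRAD(NP,P,G) is an assumed user routine returning the gradient at P.
  INTEGER :: NP
  DOUBLE PRECISION :: P(NP), DP(NP), EPSFD, HV(NP)
  DOUBLE PRECISION :: G0(NP), G1(NP)
  EXTERNAL :: GRAD
  CALL GRAD (NP, P, G0)                 !! gradient at p
  CALL GRAD (NP, P + EPSFD*DP, G1)      !! gradient at p + eps*dp
  HV = (G1 - G0)/EPSFD                  !! difference quotient (6.14)
END SUBROUTINE FD_HESSVEC

As noted above, the accuracy of this approximation is limited by the choice of EPSFD, whereas the second order adjoint result is controlled directly by the integration tolerances.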

6.3. The shallow water equations

Our next numerical experiment concerns the two-dimensional Saint-Venant approximation to fluid flow inside a shallow basin:

$$\begin{aligned}
&\frac{\partial h}{\partial t} + \frac{\partial}{\partial x}(uh) + \frac{\partial}{\partial y}(vh) = 0,\\
&\frac{\partial}{\partial t}(uh) + \frac{\partial}{\partial x}\left(u^2 h + \frac{1}{2}\, g h^2\right) + \frac{\partial}{\partial y}(uvh) = 0,\\
&\frac{\partial}{\partial t}(vh) + \frac{\partial}{\partial x}(uvh) + \frac{\partial}{\partial y}\left(v^2 h + \frac{1}{2}\, g h^2\right) = 0, \qquad (x,y) \in \Omega = [-3,3]^2, \quad 0 \le t \le t_F = 0.1. \qquad (6.15)
\end{aligned}$$

Here h(t,x,y) denotes the fluid layer thickness, and u(t,x,y), v(t,x,y) are the components of the velocity field. g is the standard value of the gravitational acceleration. The cost function quantifies the mismatch between a reference thickness $h^{\mathrm{ref}}$ and some perturbed solution h at the final simulation time $t_F$:

$$J = \frac{1}{2} \sum_i \sum_j \left(h(t_F, x_i, y_j) - h^{\mathrm{ref}}(t_F, x_i, y_j)\right)^2, \qquad (6.16)$$

for all grid points $(x_i, y_j)$ inside a verification area $\Omega_v \subset \Omega$. The adjoint final condition reads:

$$\lambda(t_F) = \begin{cases} h(t_F, x_i, y_j) - h^{\mathrm{ref}}(t_F, x_i, y_j), & (x_i, y_j) \in \Omega_v,\\ 0, & (x_i, y_j) \notin \Omega_v. \end{cases}$$

The shallow water PDEs (6.15) are converted to a semi-discrete form using third order upwind finite differences. The boundary conditions are periodic. File checkpoints are written every 50 time steps. The work-accuracy plots for forward and adjoint simulations with both SUNDIALS and DENSERKS are shown in Fig. 5. Again, the high-order continuous Runge–Kutta pairs yield performance comparable to that of the Adams–Moulton implementation found in SUNDIALS.

[Figure 5: two work-precision panels; x-axes: FWD and ADJ RHS evaluations; y-axes: RMS error.]
Fig. 5. Performance comparison of the high-order Runge–Kutta pairs in DENSERKS and the Adams–Moulton implementations in SUNDIALS, for the forward and adjoint two-dimensional shallow water problem: (a) Root mean square (RMS) errors in the fluid layer thickness $h(t_F)$ and (b) RMS errors in the gradient $\nabla_{h_0} J$, for the cost function (6.16).

6.4. Numerical optimization for a convection–diffusion problem

We now use DENSERKS to solve a parameter estimation problem. We consider the following IVP (adapted from [39]):

$$\frac{\partial y}{\partial t} = p_1 \frac{\partial^2 y}{\partial x^2} + p_2 \frac{\partial y}{\partial x}, \qquad t_0 = 0 \le t \le t_F = 1, \quad x_0 = 0 \le x \le x_1 = 2, \qquad (6.17)$$

with initial and boundary conditions

$$y(t, x_0) = y(t, x_1) = 0 \quad \forall t \in [t_0, t_F], \qquad y(t_0, x) = y_0(x) = x(2 - x)\, e^{2x}. \qquad (6.18)$$

Let us denote the solution of (6.17) with $p_1 = 1$ and $p_2 = 0.5$ by $y^{\mathrm{ref}}(t,x)$. The objective function reads

$$G(t_F, p) = \frac{1}{2}\int_{x_0}^{x_1} \left(y(t,x) - y^{\mathrm{ref}}\right)^2 dx. \qquad (6.19)$$

We want to find the parameters p* that minimize the cost function G, i.e. solve the following nonlinear optimization problem:

$$p^* = \arg\min_p\, G(t_F, p). \qquad (6.20)$$


We solve (6.20) with the L-BFGS-B optimization routine described in [40]. Problem (6.20) has the obvious solution $p^* = p^{\mathrm{ref}}$, with $G(t_F, p^*) = 0$. The gradient of (6.19) can be obtained as

$$\frac{\partial G}{\partial p_1}(t_F, p) = \int_{t_0}^{t_F}\!\!\int_{x_0}^{x_1} \lambda\,\frac{\partial^2 y}{\partial x^2}\, dx\, dt, \qquad \frac{\partial G}{\partial p_2}(t_F, p) = \int_{t_0}^{t_F}\!\!\int_{x_0}^{x_1} \lambda\,\frac{\partial y}{\partial x}\, dx\, dt, \qquad (6.21)$$

where $\lambda(t,x)$ is the solution of the adjoint PDE:

$$\frac{\partial \lambda}{\partial t} = -p_1 \frac{\partial^2 \lambda}{\partial x^2} + p_2 \frac{\partial \lambda}{\partial x}, \qquad \lambda(t_F, x) = y(t_F, x) - y^{\mathrm{ref}}(t_F, x), \qquad \lambda(t, x_0) = \lambda(t, x_1) = 0. \qquad (6.22)$$

We discretize the spatial derivatives in (6.17) and (6.22) using centered finite difference formulas on a uniform grid. Eliminating the homogeneous Dirichlet boundary conditions, it follows that (6.17) and (6.22) are each described by a system of $n_y$ ODEs. The gradient (6.21) is calculated via two additional quadrature equations:

$$\dot{w}_1 = -\int_{x_0}^{x_1} \lambda\,\frac{\partial^2 y}{\partial x^2}\, dx, \qquad \dot{w}_2 = -\int_{x_0}^{x_1} \lambda\,\frac{\partial y}{\partial x}\, dx, \qquad w_1(t_F) = w_2(t_F) = 0, \quad t_F \ge t \ge t_0. \qquad (6.23)$$

The spatial integrals in (6.23) are approximated using the trapezoidal rule, with the grid points chosen as quadrature nodes. The starting point of the optimization is $(p_1^0, p_2^0) = (3, 3)$ and $n_y = 70$. After approximately 11 L-BFGS-B iterations, the parameters converge to their reference values $(p_1^{\mathrm{ref}}, p_2^{\mathrm{ref}}) = (1, 0.5)$. At the end of the optimization cycle, the cost function $G(t_F, p^{\mathrm{opt}}) \approx 10^{-10} \times G(t_F, p^0)$.

7. Summary

In this paper, we investigate the solution of forward and adjoint sensitivity ODEs using continuous Runge–Kutta schemes. Previous work in this area has focused on linear multistep methods. Adjoints of nonlinear models are dependent on the forward model solution. Since adjoint models need to be integrated backward in time, they require the original model solution at time points that cannot generally be known a priori. Hence some form of interpolation is needed. We propose using the dense output mechanism built in the continuous Runge–Kutta pairs as a forward solution interpolant. Since high-order Runge–Kutta pairs such as DOPRI5(4) or DOPRI8(6) support continuous extensions, dense output is shown to lead to the computation of very accurate adjoint model trajectories with only a small increase in the computational cost over the original Runge–Kutta pair. The overhead of dense output is approximately equal to the cost of the few extra stages introduced in the Runge–Kutta method to accommodate the interpolant.

Our DENSERKS library implements several well-known continuous Runge–Kutta pairs. High-order accuracy, as well as other features such as fully adaptive time stepping, easy coupling with AD-generated code, and two-level file checkpointing, make DENSERKS an attractive option for sensitivity calculations. We tested the feasibility and efficiency of our approach on a range of numerical examples. The results indicate that, for non-stiff or mildly stiff problems, the performance of our code is comparable to that of existing high-quality implementations based on linear multistep methods.

Acknowledgement

The authors would like to thank the anonymous referees for their comments and suggestions, which helped improve this manuscript.

Appendix A. In the following, we illustrate the use of the well-known automatic differentiation tool TAMC for tangent linear and adjoint code generation. Detailed information about TAMC can be found in [11]. Should one choose to work with other AD tools, the approach described below remains valid.

Following Griewank [38], consider the (nonlinear) vector equation:

$$b = F(a) \qquad (7.1)$$

with $a, b \in \mathbb{R}^{n_y}$ and F assumed to be twice continuously differentiable in a. The forward mode of AD with a as input and b as output yields


$$\delta b = F'(a)\,\delta a, \qquad (7.2)$$

whereas the reverse mode with the same input and output computes

$$\bar{a} = \bar{a} + \bar{b}\, F'(a), \qquad \bar{b} = 0. \qquad (7.3)$$

Here, the tangent linear variable and the adjoint variable corresponding to a variable x are denoted by $\delta x$ and $\bar{x}$, respectively, and $F'(a) = \partial F/\partial a$. The costate (adjoint) variables are shaped like row vectors. Let $b \leftrightarrow \dot{y}$ and $a \leftrightarrow y$ in (7.1), with y and $\dot{y} = \partial y/\partial t$ defined in (2.1). One can then associate in (7.2)

$$\delta a \leftrightarrow s_i, \qquad \delta b \leftrightarrow \dot{s}_i, \qquad (7.4)$$

where $s_i$ are the forward sensitivity variables, i.e. the solutions of the TLM (2.14).

DENSERKS requires that the right hand side of the forward model equation (2.1) be implemented as a subroutine with the following parameter list:

SUBROUTINE FWD_RHS (NX, T, Y, F)
  INTEGER :: NX
  DOUBLE PRECISION :: T
  !! Numerical solution of the forward model at time T
  DOUBLE PRECISION :: Y(NX)
  !! RHS of the forward model at time T
  DOUBLE PRECISION :: F(NX)
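For illustration, consider the hypothetical scalar model $\dot{y} = -y^2$; this model is ours and is not one of the test problems in this paper. A corresponding implementation could read:

SUBROUTINE FWD_RHS (NX, T, Y, F)
  INTEGER :: NX
  DOUBLE PRECISION :: T
  DOUBLE PRECISION :: Y(NX), F(NX)
  !! Hypothetical scalar model dy/dt = -y**2 (illustration only)
  F(1) = -Y(1)**2
END SUBROUTINE FWD_RHS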

Note that, for this and for all the following examples, the user is free to choose any subroutine name; DENSERKS only requires the subroutine to have the specified parameter list. It follows that running TAMC in the forward mode on FWD_RHS, with Y as input and F as output, will generate

SUBROUTINE TLM_RHS (NX, T, Y, DY, DF)
  !! Numerical solution of the tangent linear model at time T
  DOUBLE PRECISION :: DY(NX)
  !! RHS of the tangent linear model at time T
  DOUBLE PRECISION :: DF(NX)

This is the subroutine used for TLM right hand side evaluations. The terms $F_{p_i}$ in (2.14) can be generated by differentiating the code inside FWD_RHS with respect to the system parameters (one can add the parameter vector as a dummy argument to FWD_RHS and then differentiate).
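For the hypothetical scalar model $\dot{y} = -y^2$ introduced above, the generated TLM body reduces to the Jacobian–vector product $F'(y)\,dy$; the following is a sketch of what the AD output computes, not verbatim TAMC output:

SUBROUTINE TLM_RHS (NX, T, Y, DY, DF)
  INTEGER :: NX
  DOUBLE PRECISION :: T
  DOUBLE PRECISION :: Y(NX), DY(NX), DF(NX)
  !! Jacobian-vector product F'(y)*dy, with F'(y) = -2y
  DF(1) = -2.0D0 * Y(1) * DY(1)
END SUBROUTINE TLM_RHS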

For the adjoint mode, we can identify in (7.3)

\[ \bar{a} \leftrightarrow \dot{\lambda}^T \quad \text{and} \quad \bar{b} \leftrightarrow \lambda^T. \tag{7.5} \]

Thus, setting $\bar{a} = 0$ at the beginning of the adjoint subroutine and removing the code for the second assignment in (7.3) leads to

SUBROUTINE ADJ_RHS (NX, T, Y, ADY, ADF)
  !! Numerical solution of the first order adjoint model at time T
  DOUBLE PRECISION :: ADY(NX)
  !! RHS of the first order adjoint model at time T
  DOUBLE PRECISION :: ADF(NX)
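For the same hypothetical scalar model, the adjoint body reduces to the vector–Jacobian product $\lambda^T F_y$ from (7.3) and (7.5); again, this is a sketch of what the modified AD output computes:

SUBROUTINE ADJ_RHS (NX, T, Y, ADY, ADF)
  INTEGER :: NX
  DOUBLE PRECISION :: T
  DOUBLE PRECISION :: Y(NX), ADY(NX), ADF(NX)
  !! Vector-Jacobian product lambda^T * F_y, with F_y = -2y
  ADF(1) = -2.0D0 * Y(1) * ADY(1)
END SUBROUTINE ADJ_RHS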

A similar approach can be used for building the right hand side of the quadrature system (2.15).

For the second order adjoint, we consider the simpler case of sensitivity analysis with respect to initial conditions ($p = y_0$).

For example, we generate the product $(F_{yy} \cdot y_p\,dp)^T \lambda$ in the model Eq. (2.25). Through a forward mode differentiation of (7.3), we generate code that computes

\[ \dot{\bar{a}} = \dot{\bar{a}} + \dot{\bar{b}}\,F'(a) + \bar{b}\,F''(a)\,\dot{a}, \tag{7.6} \]

and we can identify using (2.25):



\[ \dot{\bar{a}} \leftrightarrow \dot{\sigma}^T, \qquad \dot{\bar{b}} \leftrightarrow \sigma^T, \qquad \dot{a} \leftrightarrow dy = y_p\,dp, \qquad \bar{b} \leftrightarrow \lambda^T, \qquad a \leftrightarrow y. \tag{7.7} \]

Starting from ADJ_RHS, we perform a forward differentiation with Y, ADY and ADF as inputs and ADF as output. The code for the other tensor products in (2.25) can be generated in a similar fashion. After some straightforward modifications of the code output by the AD engine, we obtain a subroutine that computes the RHS of (2.25):

SUBROUTINE SOA_RHS (NX, NP, T, Y, DY, DP, ADY, AD2Y, AD2F)
  !! Numerical solution of the second order adjoint model at time T
  DOUBLE PRECISION :: AD2Y(NX)
  !! RHS of the second order adjoint model at time T
  DOUBLE PRECISION :: AD2F(NX)
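For the hypothetical scalar model used above, the identifications (7.6) and (7.7) give $\dot{\sigma}^T = \sigma^T F_y + \lambda^T F_{yy}\,dy$; a sketch of the corresponding body (our illustration, not TAMC output, with the sign and storage conventions of (2.25) assumed) is:

SUBROUTINE SOA_RHS (NX, NP, T, Y, DY, DP, ADY, AD2Y, AD2F)
  INTEGER :: NX, NP
  DOUBLE PRECISION :: T
  DOUBLE PRECISION :: Y(NX), DY(NX), DP(NP)
  DOUBLE PRECISION :: ADY(NX), AD2Y(NX), AD2F(NX)
  !! sigma^T F_y + lambda^T F_yy dy, with F_y = -2y and F_yy = -2
  AD2F(1) = -2.0D0 * Y(1) * AD2Y(1) - 2.0D0 * DY(1) * ADY(1)
END SUBROUTINE SOA_RHS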

Similarly, the RHS of the second order quadrature equation (2.29) may easily be derived by forward differentiation of the quadrature code for (2.15).

References

[1] I.M. Navon, Practical and theoretical aspects of adjoint parameter estimation and identifiability in meteorology and oceanography, Dyn. Atmos. Oceans 27 (1997) 55–79.
[2] V. Damian, A. Sandu, M. Damian, F. Potra, G.R. Carmichael, The kinetic preprocessor KPP – a software environment for solving chemical kinetics, Comput. Chem. Eng. 26 (2002) 1567–1579.
[3] F.X. LeDimet, I.M. Navon, D. Daescu, Second order information in data assimilation, Mon. Weather Rev. 130 (3) (2002) 629–648.
[4] R. Griesse, A. Walther, Parametric sensitivities for optimal control problems using automatic differentiation, Optim. Control Appl. Methods 28 (2003) 297–314.
[5] A. Adcroft, J.-M. Campin, P. Heimbach, C. Hill, J. Marshall, MIT General Circulation Model User's Manual, MIT, Boston, MA, USA, 2007.
[6] A. Sandu, D. Daescu, G.R. Carmichael, T. Chai, Adjoint sensitivity analysis of regional air quality models, J. Comput. Phys. 204 (1) (2005) 222–252.
[7] D.B. Özyurt, P.I. Barton, Large-scale dynamic optimization using the directional second-order adjoint method, Ind. Eng. Chem. Res. 44 (2005) 1804–1811.
[8] W.W. Hager, Runge–Kutta methods in optimal control and the transformed adjoint system, Numer. Math. 87 (2) (2000) 247–282.
[9] A. Sandu, On the properties of Runge–Kutta discrete adjoints, in: International Conference on Computational Science, vol. 4, 2006, pp. 550–557.
[10] A. Walther, Automatic differentiation of explicit Runge–Kutta methods for optimal control, Comput. Optim. Appl. 36 (1) (2007) 83–108.
[11] R. Giering, Tangent Linear and Adjoint Model Compiler, Users Manual 1.4, 1999.
[12] R. Giering, T. Kaminski, Applying TAF to generate efficient derivative code of Fortran 77-95 programs, Proc. Appl. Math. Mech. 2 (1) (2003) 54–57.
[13] L. Hascoët, V. Pascual, TAPENADE 2.1 user's guide, Tech. Rep. 0300, INRIA, Sophia Antipolis, France, 2004.
[14] M. Alexe, A. Sandu, On the discrete adjoints of adaptive time stepping algorithms, Tech. Rep. TR-08-08, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, 2008.
[15] P. Eberhard, C. Bischof, Automatic differentiation of numerical integration algorithms, Math. Comput. 68 (226) (1999) 717–731.
[16] T.S. Baker, J.R. Dormand, J.P. Gilmore, P.J. Prince, Continuous approximation with embedded Runge–Kutta methods, Appl. Numer. Math. 22 (1–3) (1996) 51–62.
[17] E. Hairer, S.P. Nørsett, G. Wanner, Solving Ordinary Differential Equations: Nonstiff Problems, Computational Mathematics, vol. I, Springer-Verlag, 1993.
[18] P.W. Sharp, J.H. Verner, Generation of high-order interpolants for explicit Runge–Kutta pairs, ACM Trans. Math. Softw. 24 (1) (1998) 13–29.
[19] D.B. Özyurt, P.I. Barton, Cheap second order directional derivatives of stiff ODE embedded functionals, SIAM J. Sci. Comput. 26 (5) (2005) 1725–1743.
[20] A.C. Hindmarsh, P.N. Brown, K.E. Grant, S.L. Lee, R. Serban, D.E. Shumaker, C.S. Woodward, SUNDIALS: suite of nonlinear and differential/algebraic equation solvers, ACM Trans. Math. Softw. 31 (3) (2005) 363–396.
[21] R. Serban, A.C. Hindmarsh, CVODES: the sensitivity-enabled ODE solver in SUNDIALS, Tech. Rep. UCRL-JP-200037, Lawrence Livermore National Laboratory, Livermore, CA, USA, 2003.
[22] A.C. Hindmarsh, R. Serban, User Documentation for CVODES v2.5.0, Lawrence Livermore National Laboratory, Livermore, CA, USA, 2006.
[23] Y. Cao, S. Li, L. Petzold, R. Serban, Adjoint sensitivity analysis for differential-algebraic equations: the adjoint DAE system and its numerical solution, SIAM J. Sci. Comput. 24 (3) (2002) 1076–1089.
[24] S. Li, L. Petzold, Design of new DASPK for sensitivity analysis, Tech. Rep. TRCS99-28, University of California at Santa Barbara, Santa Barbara, CA, USA, 1999.
[25] S. Li, L. Petzold, Description of DASPKADJOINT: an adjoint sensitivity solver for differential-algebraic equations, Tech. Rep. TRCS99-28, University of California at Santa Barbara, Santa Barbara, CA, USA, 2002.
[26] A. Sandu, P. Miehe, Forward, tangent linear, and adjoint Runge–Kutta methods in KPP-2.2 for efficient chemical kinetic simulations, Tech. Rep. TR-06-17, Virginia Tech, Blacksburg, VA, USA, 2006.
[27] D.N. Daescu, A. Sandu, G.R. Carmichael, Direct and adjoint sensitivity analysis of chemical kinetic systems with KPP: II – numerical validation and applications, Atmos. Environ. 37 (36) (2003) 5097–5114.
[28] A. Sandu, D. Daescu, G.R. Carmichael, Direct and adjoint sensitivity analysis of chemical kinetic systems with KPP: Part I – theory and software tools, Atmos. Environ. 37 (36) (2003) 5083–5096.
[29] J.H. Verner, Differentiable interpolants for high-order Runge–Kutta methods, SIAM J. Numer. Anal. 30 (5) (1993) 1446–1466.
[30] E. Hairer, G. Wanner, Solving Ordinary Differential Equations: Stiff and Differential-Algebraic Problems, Computational Mathematics, vol. II, Springer-Verlag, 1994.
[31] A. Ostermann, Continuous extensions of Rosenbrock-type methods, Computing 44 (1) (1990) 59–68.
[32] E. Hairer, A. Ostermann, Dense output for extrapolation methods, Numer. Math. 58 (1) (1990) 419–439.
[33] A. Sandu, L. Zhang, Discrete second order adjoints in atmospheric chemical transport modeling, J. Comput. Phys. 224 (12) (2008) 5949–5983.
[34] J.R. Dormand, P.J. Prince, A family of embedded Runge–Kutta formulae, J. Comput. Appl. Math. 6 (1) (1980) 19–26.
[35] P.J. Prince, J.R. Dormand, High order embedded Runge–Kutta formulae, J. Comput. Appl. Math. 7 (1981) 67–76.
[36] J.R. Dormand, M.A. Lockyer, N.E. McGorrigan, P.J. Prince, Global error estimation with Runge–Kutta triples, Comput. Math. Appl. 18 (9) (1989) 835–846.
[37] Y. Cao, S. Li, L. Petzold, Adjoint sensitivity analysis for differential-algebraic equations: algorithms and software, J. Comput. Appl. Math. 149 (1) (2002) 171–191.
[38] A. Griewank, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, SIAM, Philadelphia, PA, USA, 2000.
[39] A.C. Hindmarsh, R. Serban, Example Programs for CVODES v2.5.0, Lawrence Livermore National Laboratory, Livermore, CA, USA, November 2006.
[40] C. Zhu, R.H. Byrd, P. Lu, J. Nocedal, Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, ACM Trans. Math. Softw. 23 (4) (1997) 550–560.