Numerical Solution of Continuous-state Dynamic Programs Using Linear and Spline Interpolation


Author(s): Sharon A. Johnson, Jery R. Stedinger, Christine A. Shoemaker, Ying Li, Jose Alberto Tejada-Guibert
Source: Operations Research, Vol. 41, No. 3 (May - Jun., 1993), pp. 484-500
Published by: INFORMS
Stable URL: http://www.jstor.org/stable/171851




NUMERICAL SOLUTION OF CONTINUOUS-STATE DYNAMIC PROGRAMS USING LINEAR AND SPLINE INTERPOLATION

SHARON A. JOHNSON
Worcester Polytechnic Institute, Worcester, Massachusetts

JERY R. STEDINGER and CHRISTINE A. SHOEMAKER
Cornell University, Ithaca, New York

YING LI
Beijing Economic Research Institute of Water Resources and Electric Power, Beijing, People's Republic of China

JOSE ALBERTO TEJADA-GUIBERT
United Nations, New York, New York

(Received July 1990; revision received March 1991; accepted July 1992)

This paper demonstrates that the computational effort required to develop numerical solutions to continuous-state dynamic programs can be reduced significantly when cubic piecewise polynomial functions, rather than tensor product linear interpolants, are used to approximate the value function. Tensor product cubic splines, represented in either piecewise polynomial or B-spline form, and multivariate Hermite polynomials are considered. Computational savings are possible because of the improved accuracy of higher-order functions and because the smoothness of higher-order functions allows efficient quasi-Newton methods to be used to compute optimal decisions. The use of the more efficient piecewise polynomial form of the spline was slightly superior to the use of Hermite polynomials for the test problem and easier to program. In comparison to linear interpolation, use of splines in piecewise polynomial form reduced the CPU time to obtain results of equivalent accuracy by a factor of 250-330 for a stochastic 4-dimensional water supply reservoir problem with a smooth objective function, and factors ranging from 25-400 for a sequence of 2-, 3-, 4-, and 5-dimensional problems. As a result, a problem that required two hours to solve with linear interpolation was solved in less than a minute with spline interpolation with no loss of accuracy.

Dynamic programming (DP) is a versatile and powerful optimization procedure because it exploits the sequential character of the operation of dynamic systems and allows nonlinearities, feedback, and stochastic inputs to be represented. However, the computational challenge posed by the numerical solution of dynamic programs with continuous-state spaces can be prohibitive for realistic problems because of the dramatic growth in memory and computational requirements with the problem size and the desired accuracy. Management problems that can be modeled as dynamic programs with essentially continuous-state spaces include the control of manufacturing processes, the operation of surface and ground water reservoirs, the management of fisheries, crops, insect pests, and forests, and the management of portfolios in markets with uncertain interest rates (see Shoemaker 1981, Stedinger, Sule and Loucks 1984, White 1985, 1988, Gal 1989, Shoemaker and Johnson 1989).

In a continuous-state DP model, the state space is usually discretized so that the DP functional equation that characterizes the solution need only be solved for a finite number of values of the state vector. The number of discrete values chosen affects the accuracy of the approximation to the continuous problem. A backward recursion algorithm can generally be used to solve the discretized functional equation. At each stage, the algorithm requires evaluation of the value function from the previous stage, which has only been calculated at the grid points defined by the discretization of the state space. When system dynamics require the values of the value function at points other than state-space grid points, interpolation or another functional approximation scheme can be used to generate them (Bellman and Dreyfus 1962).

Subject classifications: Dynamic programming, applications: numerical solution of continuous-state dynamic programs. Dynamic programming, Markov, infinite state: higher-order approximation of the value function.

Area of review: COMPUTING.

Operations Research, Vol. 41, No. 3, May-June 1993. 0030-364X/93/4103-0484 $01.25. © 1993 Operations Research Society of America




This paper examines the reductions in the computational effort required to develop numerical solutions to continuous-state DP models that are possible when higher-order piecewise polynomial functions, rather than tensor product linear interpolants, are used to approximate the value function. The paper concentrates on the use of tensor product cubic splines (defined in Section 4) that are represented in piecewise polynomial form for evaluation (pp-form splines) because they were efficient in initial computational studies and easy to construct. Use of pp-form splines has not been proposed previously in the literature. Two piecewise polynomial methods suggested by other authors are also considered: 1) cubic Hermite polynomials, which form the basis of the gradient dynamic programming algorithm proposed by Kitanidis and Foufoula-Georgiou (1987), and 2) tensor product cubic splines in B-spline form (B-form splines), proposed by Daniel (1976) and Birnbaum and Lapidus (1978).

The computational performance is illustrated in the numerical solution of a series of stochastic water supply reservoir management problems. The paper addresses whether use of a coarser discretization of the state space can compensate for the increased effort to evaluate a higher-order approximant numerically instead of a simpler tensor product linear interpolant, while still maintaining the same accuracy in the solution. The smoothness of a higher-order function is also important because it allows efficient quasi-Newton optimization algorithms to be used to determine optimal decisions. Previous papers that describe the use of higher-order approximants (Daniel 1976, Birnbaum and Lapidus 1978, L'Ecuyer 1985, Kitanidis and Foufoula-Georgiou 1987, Foufoula-Georgiou and Kitanidis 1988) did not provide numerical evidence that the use of higher-order approximations can yield a significant computational advantage over the more commonly used linear interpolation. The numerical studies reported here also address when such approximations are likely to be appropriate by considering stochastic problems of up to five dimensions and two different value functions. They demonstrate that use of higher-order approximations can yield significant computational savings on a collection of examples.

Section 1 describes the formulation of DP models and discusses algorithms for deriving numerical approximations of their solutions. Section 2 reviews other methods used to reduce the computational effort required to solve DP models numerically. The literature associated with analytical approximation of the value function is discussed in Section 3. The characteristics of algorithms that employ different approximations are discussed in Section 4, and the computational effort associated with tensor product spline interpolation is compared with tensor product linear interpolation in Section 5. Section 6 presents extensive computational results for a four-dimensional water supply example. In Section 7, additional computational results are presented for a series of water supply examples of varying dimension and two different benefit functions. The results are summarized in Section 8.

1. THE DP MODEL

Let $X_t$ be the continuous-state vector for a dynamic system that describes a system's status at the beginning of period or stage $t$. In the example problems described in Sections 6 and 7, the state vector will describe the volume of water in each of several reservoirs and could also include hydrologic information (Loucks, Stedinger and Haith 1981, Stedinger, Sule and Loucks 1984, Kelman et al. 1990). The dynamics of a system are written

$$X_{t+1} = g(X_t, R_t, Q_t), \tag{1}$$

where $R_t$ is the vector of decisions made in period $t$ and $Q_t$ represents random influences upon the system. In the case of reservoir operation, $R_t$ corresponds to the releases from the reservoirs and $Q_t$ to the inflows to the reservoirs during the period.

Denote the benefits from system operation in period $t$ by $B_t[X_t, R_t, Q_t]$, where the state is initially $X_t$, the decision is $R_t$, and $Q_t$ is the value of random influences upon system operation. The expected benefits to be maximized from system operation from period 1 until period $T$ are

$$J = E\left\{\sum_{t=1}^{T} B_t[X_t, R_t, Q_t] + \Omega(X_{T+1})\right\}, \tag{2}$$

where $\Omega(X_{T+1})$ is a terminal value function.

Such sequential decision problems can be solved through the recursive calculation of an optimal value function $F_t[X_t]$ that equals the expected future benefits obtainable from the optimal operation of the system from period $t$ through the end of period $T$, given that the system begins period $t$ in state $X_t$ (Bellman 1957). For $t = 1, \ldots, T$, $F_t[X_t]$ is defined by

$$F_t[X_t] = \max_{R_t} E_{Q_t}\left\{B_t[X_t, R_t, Q_t] + F_{t+1}[X_{t+1}]\right\}, \tag{3}$$

where $X_{t+1} = g(X_t, R_t, Q_t)$ and $F_{T+1}(X_{T+1}) = \Omega(X_{T+1})$.




The probability distribution over which the expectation $E\{\cdot\}$ is calculated may depend upon $X_t$ and $R_t$.

For finite-horizon problems, the value function $F_t[X_t]$ and policy $R_t$ can be found by recursively solving (3) backward for $t = T, \ldots, 1$. When the general functional form of $F_t$ is not known, the continuous-state space for $X_t$ is replaced by a grid and the value of $F_t$ is computed from (3) at the grid points. Because the value of $F_{t+1}$ is only calculated at the grid points of the state space, the value of $F_{t+1}$ in (3) at other points is determined by interpolating among nearby grid points unless the transition function $g$ is defined so that a transition can only go to a grid point. For problems for which such a restriction is even possible, it is seldom attractive because of the resultant distortion of feasible system dynamics and/or the necessity of using a very fine state-space grid to allow reasonable resolution of any continuous control variables. Because of its simplicity, multidimensional linear interpolation is most often used to approximate $F_{t+1}$, but other approximation schemes have been suggested (see Section 3).

In this paper, the locations of the grid points are chosen without consideration of the structure of the optimal policy or value function, which is not known in many practical problems. If the general form of the value function or other characteristics of the model solution are known, this knowledge may be used to select the location of grid points or to suggest more efficient solution methods tailored to the problem.
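The backward recursion on a grid can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: a single reservoir with dynamics $X_{t+1} = X_t + Q_t - R_t$, a discrete inflow distribution, enumeration over a list of candidate releases in place of a numerical optimizer, and the hypothetical names `interp_linear` and `backward_recursion`.

```python
from bisect import bisect_right

def interp_linear(grid, values, x):
    """Piecewise-linear interpolation of tabulated values; x is clamped to the grid."""
    x = min(max(x, grid[0]), grid[-1])
    j = min(bisect_right(grid, x) - 1, len(grid) - 2)
    a = (x - grid[j]) / (grid[j + 1] - grid[j])
    return (1.0 - a) * values[j] + a * values[j + 1]

def backward_recursion(grid, releases, inflows, probs, benefit, terminal, T):
    """Return (F_1 at the grid points, stage-1 releases) for recursion (3).

    Assumed dynamics: X_{t+1} = X_t + Q_t - R_t, clamped to the grid range.
    """
    F_next = [terminal(x) for x in grid]          # F_{T+1} at the grid points
    policy = []
    for _ in range(T):                            # stages t = T, ..., 1
        F_now, policy = [], []
        for x in grid:
            best_val, best_r = float("-inf"), None
            for r in releases:                    # enumeration stands in for an optimizer
                if r > x + min(inflows):          # cannot release water that is not there
                    continue
                # Expectation over inflows of benefit plus interpolated F_{t+1}
                val = sum(
                    pr * (benefit(x, r, q) + interp_linear(grid, F_next, x + q - r))
                    for q, pr in zip(inflows, probs)
                )
                if val > best_val:
                    best_val, best_r = val, r
            F_now.append(best_val)
            policy.append(best_r)
        F_next = F_now
    return F_next, policy

# Hypothetical test problem: penalize deviations of the release from a target of 1.
grid = [0.0, 1.0, 2.0, 3.0, 4.0]
F1, R1 = backward_recursion(
    grid, releases=[0.0, 0.5, 1.0, 1.5, 2.0], inflows=[1.0, 2.0], probs=[0.5, 0.5],
    benefit=lambda x, r, q: -(r - 1.0) ** 2, terminal=lambda x: 0.0, T=3)
print(F1, R1)
```

In this toy problem the target release is always feasible, so the computed policy releases 1 at every grid point; the sketch is meant only to show where interpolation of $F_{t+1}$ enters the recursion.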

Calculation of the optimal decision $R_t$ and the value of $F_t[X_t]$ on a fine grid poses a formidable computational task for multidimensional problems. If $k$ is the dimension of $X_t$, and each of $X_t$'s components is approximated by $N$ points, then such a grid would contain $N^k$ grid points. The optimization problem in (3) must be solved at each of the grid points; thus, the required computational effort $E$ at each stage can be as much as

$$E = W(p, k) N^k, \tag{4}$$

where $W(p, k)$ is the work or effort required to solve each optimization problem. The effort $W(p, k)$ will generally increase with $p$, the dimension of the decision vector $R_t$, and also with $k$, the dimension of the state vector $X_t$.
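The trade-off implied by (4) can be made concrete with a short calculation. The numbers below are hypothetical and chosen only for illustration: a fine grid ($N = 20$) with unit work per point is compared with a coarse grid ($N = 6$) whose per-point work is ten times larger.

```python
def stage_effort(work_per_point, N, k):
    """Upper bound (4) on per-stage effort: W(p, k) * N**k."""
    return work_per_point * N ** k

# Hypothetical work factors and grid sizes, not values reported in the paper.
for k in range(1, 6):
    fine = stage_effort(1.0, 20, k)      # cheap approximant, fine grid
    coarse = stage_effort(10.0, 6, k)    # costlier approximant, coarse grid
    print(k, fine, coarse, fine / coarse)
```

Under these assumed numbers the coarse grid is more expensive for $k = 1$ but cheaper from $k = 2$ onward, because the $N^k$ term dominates the per-point work as the dimension grows.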

In the examples solved in Sections 6 and 7, the numerical calculation of a reasonable approximation of $F_t[X_t]$ for a 3-stage, 2-reservoir problem when linear interpolation was employed took 10 seconds on an IBM 3090-600E Supercomputer, and was estimated to require 70 hours for the corresponding 5-dimensional problem. The effort required to solve practical problems having nonlinear objectives with a dimension $k$ of more than 3 to 5 with reasonable accuracy (a large value of $N$) is generally prohibitive with the linear interpolation method commonly employed. Using tensor product cubic splines allowed solution of the 5-dimensional problem in just 10 minutes of 3090-CPU time.

The backward recursion algorithm can also be used to solve (3) approximately for the long-run, steady-state optimal value function of continuous-state, infinite-horizon problems when the state space has been discretized (White 1963, Su and Deininger 1972, Federgruen and Schweitzer 1979). This value iteration approach is advantageous for numerically solving the infinite-horizon, multistage periodic DP problem posed by monthly reservoir operating problems (Roefs and Guitron 1975). Policy iteration methodologies can also be employed to solve infinite-horizon problems numerically (Howard 1960). When the decision space is discrete, policy iteration can be cast as a linear programming problem with decision variables defined as the value function at the grid points (Heyman and Sobel 1984), or the coefficients of polynomials and spline functions that approximate the value function (Schweitzer and Seidmann 1985, L'Ecuyer 1989). Hence, the higher-order approximations developed here for finite-horizon problems can also be used in the value iteration and policy iteration approaches for continuous-state, infinite-horizon DP problems.

2. REVIEW OF EFFORTS TO REDUCE THE COMPUTATIONAL BURDEN FOR DP

The computational effort required to solve a discretized version of a continuous DP problem has long been an issue. Several different strategies have been employed to combat the computational burden; better methods of approximating the value function are just one strategy.

2.1. Modeling an Approximate Problem

Approximate solutions to complex discrete- and continuous-state DP problems are often obtained by reformulating the problem using a simpler model (Whitt 1978, Morin 1979). For example, the dimension of the state space may be reduced by aggregating state-space grid points. Bean, Birge and Smith (1987) develop a state aggregation/disaggregation algorithm for deterministic, discrete dynamic programs that produces a feasible solution to the original problem. In the reservoir literature, several reservoirs have been combined into a composite reservoir to reduce the number of state variables (Arvanitidis and Rosing 1970a, b, Turgeon 1980, 1981, Terry et al. 1986). Shoemaker reduced the state-space dimension for pest problems by using analytical solutions to describe system dynamics (Shoemaker 1982) and by using previous decisions as state variables (Shoemaker 1979).

Other methods for approximating complicated problems include substituting deterministic equivalents for stochastic components (Lovejoy 1986, Braga et al. 1991), and partitioning the original problem into smaller separable problems (Trott and Yeh 1973, Turgeon 1980, Lovejoy 1986, Braga et al. 1991). Saad and Turgeon (1988) considered only the states spanned by three principal components of the covariance matrix for an entire reservoir system's operation when its operation was optimized deterministically. This approach recognizes that many combinations of state variables are either impossible or unlikely.

Solving an approximate problem introduces error. Lovejoy (1986) gives conditions that ensure that his solution of approximate discrete-state problems will provide a bound on the optimal policies of the original problem. Haurie and L'Ecuyer (1986), Bean, Birge and Smith (1987), White (1979), and Whitt (1978, 1979) give bounds on the error in the value functions obtained by solving approximate problems.

2.2. Better Initial Approximations

In discrete or discretized infinite-horizon problems, the backward solution of (3) until a stationary solution is identified can be accelerated by use of a good initial estimate of $F(X)$ (Mawer and Thorn 1974). A simpler problem can be used to generate initial estimates of the infinite-horizon optimal value function (see Bras, Buchanan and Curry 1983; Wang and Adams 1986), or a coarse grid can be employed initially.

3. REVIEW OF EFFORTS TO APPROXIMATE THE OPTIMAL VALUE FUNCTION

The idea of approximating the value function $F_t[X_t]$ by an analytical function dates back to early publications on dynamic programming; Bellman (1957), Bellman and Dreyfus (1962), Bellman, Kalaba and Kotkin (1963) and White (1969) employed polynomials to approximate $F_t[X_t]$ over the entire state space. Takeuchi and Moreau (1974) used least-squares to fit lower-order polynomials over the entire space to solve a water resources problem numerically. Gal (1979, 1989) developed an iterative algorithm where the optimal value function is approximated only on the region of the state space encountered in previous simulations. Read (1990) developed an efficient computational scheme by approximating the marginal value function with piecewise linear functions, and identifying significant points in the dual space. Pereira and Pinto (1991) propose a similar algorithm based on Benders decomposition.

Polynomials are often used for approximation because they are easy to evaluate and differentiate, and any continuous function on a bounded interval can be approximated arbitrarily well by polynomials of sufficient order (Rudin 1976, Schumaker 1981). However, high-degree polynomials can oscillate severely (Prenter 1981). In addition, in DP models with continuous decision variables $R_t$, the optimization step in (3) may seek out local optima induced by the oscillations (Ginn 1986). Low-order polynomials may be unable to approximate the value function adequately over the entire state space. As a consequence, piecewise polynomial approximations that employ different polynomials in different subregions of the domain have been suggested. With these piecewise polynomial approximation schemes, the smoothness of the approximating function can be ensured by requiring that the values and derivatives of the polynomials match at the grid points and boundaries of the hypercubes they define.

Kitanidis and Foufoula-Georgiou, and Foufoula-Georgiou and Kitanidis proposed an algorithm they called gradient dynamic programming (GDP) that employs cubic Hermite polynomials to approximate the optimal value function for continuous-state DPs. Nonlinear programming is used to determine optimal decisions. Kitanidis and Foufoula-Georgiou show that asymptotically and for sufficiently smooth functions, the one-stage error in the control policy is of the order $(\Delta X)^3$ for a one-dimensional GDP algorithm, compared to $(\Delta X)$ with linear interpolation, where $\Delta X$ is the length of each interval in the state-space grid. The error in the value function at each stage is of order $(\Delta X)^4$ for the GDP algorithm and $(\Delta X)^2$ with linear interpolation. Thus, for small $\Delta X$, GDP can yield more accurate solutions than a backward recursion algorithm that uses linear interpolation. Kitanidis and Foufoula-Georgiou compared the accuracy of the solutions produced using GDP and an algorithm that employed linear interpolation on deterministic and stochastic versions of a one-dimensional reservoir problem, but did not report computation times. In both cases, GDP yielded solutions of equivalent accuracy with less than half the number of discrete levels for the state variable. Foufoula-Georgiou and Kitanidis presented numerical results for the 4-dimensional problem described in Section 6 using GDP, but did not provide comparable numerical results for tensor product linear interpolation. Hence, the results presented here are the first numerical evidence that GDP with Hermite polynomials is computationally preferable to linear interpolation for multidimensional problems.
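The orders $(\Delta X)^2$ and $(\Delta X)^4$ quoted above for the value function mirror the classical interpolation error orders of piecewise linear and cubic Hermite interpolants. The sketch below checks only those interpolation orders, not the DP error itself, on an assumed smooth test function ($e^x$ on $[0, 1]$); the function name `max_error` and the sampling points are illustrative choices.

```python
import math

def max_error(f, df, n, hermite):
    """Largest sampled interpolation error on [0, 1] with n equal intervals."""
    h, worst = 1.0 / n, 0.0
    for j in range(n):
        x0, x1 = j * h, (j + 1) * h
        for s in (0.25, 0.5, 0.75):              # sample points inside the interval
            if hermite:
                # Cubic Hermite basis on the unit interval, using values and slopes
                approx = ((2 * s**3 - 3 * s**2 + 1) * f(x0)
                          + (s**3 - 2 * s**2 + s) * h * df(x0)
                          + (-2 * s**3 + 3 * s**2) * f(x1)
                          + (s**3 - s**2) * h * df(x1))
            else:
                approx = (1 - s) * f(x0) + s * f(x1)
            worst = max(worst, abs(approx - f(x0 + s * h)))
    return worst

for hermite in (False, True):
    e8 = max_error(math.exp, math.exp, 8, hermite)
    e16 = max_error(math.exp, math.exp, 16, hermite)
    print("Hermite" if hermite else "linear", e8 / e16)
```

Halving the interval length should reduce the error by a factor near 4 for the linear interpolant and near 16 for the Hermite interpolant, consistent with second- and fourth-order behavior.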

Daniel (1976) and Birnbaum and Lapidus (1978) suggested that tensor product cubic splines in B-spline form (B-form splines) be used to approximate the optimal value function. Daniel noted that theoretically these splines have the same approximating power as orthogonal polynomials with the same number of coefficients, but require less effort to compute. Birnbaum and Lapidus compared splines and polynomial approximations on a two-dimensional, continuous-time, continuous-state, deterministic control problem. After discretizing the control variables, they used enumeration to determine the optimal decision. Different computers were used and slightly different versions of the problem were solved in the comparison. They concluded that in terms of accuracy and computing time B-form splines were superior to orthogonal polynomials with the same number of coefficients. Birnbaum and Lapidus also noted that the gradient of the objective in (3) can be calculated when spline approximations are employed, thus allowing Newton-type optimization schemes to be used to determine the optimal control. However, they employed enumeration in their computational studies to allow comparison with previous literature.

This paper considers the computational advantage associated with using the more efficient piecewise polynomial form of the spline (pp-form spline) for evaluation, rather than the B-form spline suggested by Daniel and Birnbaum and Lapidus. The value of employing a quasi-Newton algorithm to solve the nonlinear programming problem in (3), which is possible because the gradient of a spline is easily calculated, is also examined.

4. COMPARISON OF ALTERNATIVE APPROXIMATION SCHEMES

This paper examines the benefits of a backward recursion algorithm for numerically solving finite-horizon, continuous-state DP problems, the spline DP algorithm, which employs tensor product cubic pp-form splines to approximate the value function $F_t(X_t)$. Use of the B-form spline proposed by Daniel and Birnbaum and Lapidus is also considered. The spline DP algorithm is compared with a backward recursion algorithm that employs multidimensional linear interpolation to approximate $F_t(X_t)$, the multilinear DP algorithm, as well as the GDP algorithm developed by Foufoula-Georgiou and Kitanidis.

4.1. The Multilinear DP Algorithm

In the multilinear DP algorithm, piecewise linear interpolation is used to approximate the value function $F_t[X_t]$ in each dimension. The resulting continuous multilinear approximation of $F_t[X_t]$ is then used to generate the value of $F_{t+1}$ at points in the state space where it is needed when solving (3). The multilinear approximation is local, depending only on the values of $F_t$ at the vertices of the hypercube in the state-space grid that contains the point to be approximated (de Boor 1978).

The value of the multilinear approximant to $F_t[X_t]$ at a point $X_t$ can be calculated efficiently using tensor product methods, which divide the $k$-dimensional interpolation problem into $k$ parts, each treating one dimension. Consider a two-dimensional problem where $X_{t+1} = (x_1, x_2)$, with $x_1 \in (p_{1i}, p_{1,i+1})$ and $x_2 \in (p_{2j}, p_{2,j+1})$, where $p_{ki}$ denotes a state-space grid point. Let $H(x_1, x_2)$ represent the bilinear approximant to $F_{t+1}$ and let $\alpha = (x_1 - p_{1i})/(p_{1,i+1} - p_{1i})$ and $\beta = (x_2 - p_{2j})/(p_{2,j+1} - p_{2j})$. The value of $H$ at $(x_1, x_2)$ is calculated by first carrying out linear interpolation in $x_1$, with $x_2$ fixed at $p_{2j}$ and $p_{2,j+1}$, yielding

$$h_1(p_{2j}) = \alpha F_{t+1}(p_{1,i+1}, p_{2j}) + (1 - \alpha) F_{t+1}(p_{1i}, p_{2j}), \tag{5}$$

$$h_2(p_{2,j+1}) = \alpha F_{t+1}(p_{1,i+1}, p_{2,j+1}) + (1 - \alpha) F_{t+1}(p_{1i}, p_{2,j+1}).$$

Then linear interpolation is carried out with respect to $x_2$, so that

$$H(x_1, x_2) = \beta\, h_2(p_{2,j+1}) + (1 - \beta)\, h_1(p_{2j}). \tag{6}$$

Using this procedure, the value of the approximant is generated each time it is needed when evaluating (3); the coefficients of the multilinear approximation are never explicitly calculated or stored. For a $k$-dimensional problem, first $2^{k-1}$ one-dimensional interpolations are done, then $2^{k-2}$ and so forth, for a total of $2^k - 1$ one-dimensional interpolations altogether.
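The dimension-by-dimension procedure of (5) and (6) generalizes directly to $k$ dimensions. The following sketch is an illustration under assumed data structures (a list of per-dimension grids and a dictionary of grid values keyed by index tuples, with the hypothetical name `multilinear`); it also counts the one-dimensional interpolations to confirm the total of $2^k - 1$.

```python
from bisect import bisect_right

def multilinear(grids, F, x):
    """Tensor product linear interpolation at point x.

    grids: list of k sorted coordinate lists; F: dict mapping an index tuple
    to the tabulated value at that grid point. Returns (value, number of
    one-dimensional interpolations performed).
    """
    k = len(grids)
    cells, weights = [], []
    for g, xi in zip(grids, x):
        j = min(max(bisect_right(g, xi) - 1, 0), len(g) - 2)   # enclosing interval
        cells.append(j)
        weights.append((xi - g[j]) / (g[j + 1] - g[j]))
    # Values at the 2^k vertices; bit d of v selects the lower or upper point in dimension d.
    vals = [F[tuple(cells[d] + ((v >> d) & 1) for d in range(k))] for v in range(2 ** k)]
    count = 0
    for d in range(k):                       # collapse one dimension at a time
        a = weights[d]
        vals = [(1 - a) * vals[2 * i] + a * vals[2 * i + 1] for i in range(len(vals) // 2)]
        count += len(vals)
    return vals[0], count

# A bilinear test function is reproduced exactly by bilinear interpolation.
gx, gy = [0.0, 1.0, 2.0], [0.0, 2.0, 4.0]
f = lambda x, y: 2 * x + 3 * y + x * y
F = {(i, j): f(x, y) for i, x in enumerate(gx) for j, y in enumerate(gy)}
value, n_interp = multilinear([gx, gy], F, (0.5, 3.0))
print(value, n_interp)   # 11.5 with 3 one-dimensional interpolations
```

As in the text, no coefficients are stored: only the $2^k$ vertex values of the enclosing cell are read each time the approximant is evaluated.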

The first derivative of the multilinear approximant is discontinuous at the faces of the rectangular volumes defined by the grid so that the optimization problem in (3) cannot be solved using quasi-Newton methods designed for use with smooth functions. As a result, the multilinear DP program used the E04CCF algorithm in the NAG subroutine library, which is an implementation of the polytope algorithm (Gill, Murray and Wright 1981). The polytope method finds the maximum of a function $G$ by comparing the function values at the $n + 1$ vertices of a simplex and replacing the vertex with the lowest function value until the simplex contracts to a final maximum. The method does not require evaluation of the gradient and is robust but slow (NAG FORTRAN Library Manual 1984).

4.2. The Spline DP Algorithm

A one-dimensional cubic spline is composed of individual cubic polynomials, each defined over a subinterval of the domain with end conditions specified so that overall the spline has continuous second derivatives. Because they have excellent approximation power, and are easy to evaluate and manipulate, splines have found uses in many applications (Schumaker 1981).

The tensor product spline approximant of $F_t(X_t)$ used in the spline DP algorithm consists of individual multivariate cubic polynomials, each of which is defined over a subregion of the state-space domain. The points that subdivide the domain are determined by the grid points of the state space in each dimension, so that each polynomial is defined over one of the rectangular volumes that make up the state space. Thus, for $X_t = (x_1, \ldots, x_k)$, with $x_i \in [p_i(j_i), p_i(j_i + 1)]$, where $p_i(j_i)$ denotes the $j_i$th state-space grid point that subdivides the domain in the $i$th dimension, the spline approximant $H(x_1, \ldots, x_k)$ equals

$$\sum_{l_1=1}^{4} \sum_{l_2=1}^{4} \cdots \sum_{l_k=1}^{4} c_{(j_1, j_2, \ldots, j_k)}(l_1, l_2, \ldots, l_k)\, (x_1 - p_1(j_1))^{l_1 - 1} (x_2 - p_2(j_2))^{l_2 - 1} \cdots (x_k - p_k(j_k))^{l_k - 1}. \tag{7}$$

In this paper, a spline written in the form given by (7) is called a pp-form spline. Birnbaum and Lapidus subdivide the domain at points other than the state-space grid points.
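Within a single rectangular volume, evaluating (7) amounts to evaluating a tensor product polynomial, which can be done with nested Horner steps, one dimension at a time. The sketch below is an illustration only: the nested-list coefficient layout (zero-based powers, so index $l - 1$ holds the coefficient of power $l - 1$) and the names `horner` and `pp_eval` are assumptions, and locating the enclosing cell is omitted.

```python
def horner(coeffs, dx):
    """Evaluate coeffs[0] + coeffs[1]*dx + coeffs[2]*dx**2 + ... by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * dx + c
    return result

def pp_eval(c, dx):
    """Evaluate one cell's tensor product polynomial.

    c: nested lists of depth k, c[l1][l2]...[lk] multiplying dx[0]**l1 * dx[1]**l2 * ...
    dx: offsets (x_i - p_i(j_i)) from the cell's lower corner, one per dimension.
    """
    if not dx:                    # all dimensions collapsed: c is a single coefficient
        return c
    inner = [pp_eval(sub, dx[1:]) for sub in c]   # collapse the trailing dimensions first
    return horner(inner, dx[0])

# Hypothetical cell with corner (1, 1) and polynomial (1 + dx1**3) * (2 + dx2).
zero = [0.0, 0.0, 0.0, 0.0]
c = [[2.0, 1.0, 0.0, 0.0], zero, zero, [2.0, 1.0, 0.0, 0.0]]
corner, x = (1.0, 1.0), (2.0, 3.0)
offsets = tuple(xi - pi for xi, pi in zip(x, corner))
print(pp_eval(c, offsets))   # (1 + 1) * (2 + 2) = 8.0
```

Because the coefficients are stored per cell, each evaluation touches only the $4^k$ coefficients of one rectangular volume.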

The coefficients $c_{(j_1, \ldots, j_k)}(l_1, \ldots, l_k)$ of the pp-form spline are determined by requiring that it interpolate the values of $F_t[X_t]$ at each state-space grid point. In addition, the first and second derivatives of the cubic polynomials defined within each volume are required to match at the boundaries of the rectangular volume so that the resulting spline is smooth with second-degree continuity. Given these conditions, there are two extra degrees of freedom per grid line. In the spline DP algorithm these were resolved by using not-a-knot splines (de Boor). Thus, in a one-dimensional problem, with an $N$-point grid defining $N - 1$ intervals, only $N - 3$ separate cubic polynomials would be defined; the second and second-to-last points would be used only to specify function values that the first and last cubic polynomials must interpolate. Such one-dimensional splines have the same approximating power as cubic Hermite polynomials (de Boor). When not-a-knot end conditions are used in each dimension, the resulting spline has $[4(N - 3)]^k$ coefficients $c_{(j_1, \ldots, j_k)}(l_1, \ldots, l_k)$, where $j_i = 1, \ldots, N - 3$ and $l_i = 1, \ldots, 4$ for all $i$.

In (7), the cubic spline is written as a polynomial in powers of the components of a point $X_t$ and the coefficients $c_{(j_1, \ldots, j_k)}(l_1, \ldots, l_k)$ are determined by both smoothness requirements and interpolation conditions. Every cubic spline defined by (7) can also be written as a combination of one-dimensional cubic splines, called B-splines (de Boor), which take values between 0 and 1. Suppose that $[p_5, \ldots, p_N]$ represent the points that subdivide the domain $[a, b]$ of a one-dimensional cubic spline. To define a sequence of B-splines over the interval $[a, b]$, the points $[p_5, \ldots, p_N]$ must be augmented by an additional 8 points at or beyond the endpoints $a$ and $b$. Because their exact placement does not affect the accuracy of the approximation, these points may be picked arbitrarily (de Boor). The cubic B-spline $B_i$ associated with point $p_i$ is zero over all intervals except four adjacent ones. Figure 1 shows B-splines $B_i$ and $B_{i+1}$ on the interval $[a, b]$; the points $[p_i, \ldots, p_{i+5}]$ are equally spaced for convenience. Any one-dimensional cubic spline whose domain is defined by $[a, p_5, \ldots, p_N, b]$ can be represented as a linear combination of the B-splines $B_1, \ldots, B_N$. A more complete discussion of B-splines, including methods for their efficient evaluation, can be found in de Boor (1978) and Johnson (1989).

In a $k$-dimensional, continuous-state DP problem, where each state variable is discretized into $N$ levels, the spline approximant $H(x_1, \ldots, x_k)$ with not-a-knot end conditions can also be written as

$$H(x_1, \ldots, x_k) = \sum_{i_1=1}^{N} \sum_{i_2=1}^{N} \cdots \sum_{i_k=1}^{N} a(i_1, i_2, \ldots, i_k)\, B_{i_1}(x_1) B_{i_2}(x_2) \cdots B_{i_k}(x_k), \tag{8}$$

where $B_{i_j}(x_j)$ represents the $i_j$th B-spline in the $j$th dimension. A spline written in the form in (8) is called a B-form spline. Because each B-spline has the required second-degree continuity, only the conditions that $H(\cdot)$ interpolate $F_{t+1}[\cdot]$ at each state-space grid point must be specified to determine the $N^k$ coefficients $a(i_1, \ldots, i_k)$. In Section 6, the performance of the spline DP algorithm is examined using both the pp-form and B-form spline for evaluation. In Section 7, only the pp-form spline is used.
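The properties of cubic B-splines mentioned above (values between 0 and 1, support on four adjacent intervals) can be checked with the standard Cox-de Boor recursion. This is a textbook sketch, not the evaluation scheme used in the paper; the knot vector, the zero-based indexing, and the function name `bspline` are assumptions made for illustration.

```python
def bspline(i, order, knots, x):
    """Value at x of the i-th B-spline of the given order (order 4 is cubic).

    Uses the Cox-de Boor recursion on the knot sequence `knots` (zero-based i).
    """
    if order == 1:
        return 1.0 if (x >= knots[i] and knots[i + 1] > x) else 0.0
    left = right = 0.0
    d1 = knots[i + order - 1] - knots[i]
    if d1 > 0:
        left = (x - knots[i]) / d1 * bspline(i, order - 1, knots, x)
    d2 = knots[i + order] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + order] - x) / d2 * bspline(i + 1, order - 1, knots, x)
    return left + right

knots = [float(j) for j in range(11)]            # equally spaced, as in Figure 1
values = [bspline(i, 4, knots, 5.3) for i in range(7)]
print(values)          # only four of the seven cubic B-splines are nonzero at x = 5.3
print(sum(values))     # the nonzero ones sum to 1 on the interior of the knot range
```

Evaluating a B-form spline at a point therefore involves at most four nonzero basis functions per dimension, which is the source of the locality exploited by efficient B-spline evaluation schemes.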

Figure 1. Some of the cubic B-splines $B_i$ defined on an interval $[a, b]$; $\{p_i\}$ are the points that subdivide the domain $[a, b]$.

Using either representation, calculation of the coefficients of a one-dimensional cubic spline whose second derivatives must match at every grid point requires solution of a set of tridiagonal (or almost tridiagonal) linear equations to obtain the values of those second derivatives, and implicitly the first derivatives as well. With multidimensional tensor product cubic splines, the required second derivatives are obtained by solving a sequence of sets of linear equations, one set for each dimension (Pereyra and Scherer 1973, de Boor 1978, Johnson 1989). Because the spline approximation is not local, the coefficients of the spline are calculated at the beginning of each stage, then stored for use in finding the value of $F_{t+1}$ at any point where it is needed.
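The tridiagonal structure can be seen in a one-dimensional sketch. To keep the system strictly tridiagonal, the example below uses natural end conditions (zero second derivative at both ends) rather than the not-a-knot conditions used in the spline DP algorithm; that substitution, and the names `thomas`, `natural_spline_second_derivs`, and `spline_eval`, are assumptions made for illustration.

```python
from bisect import bisect_right

def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    lower[0] and upper[-1] lie outside the matrix and are ignored.
    """
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    sol = [0.0] * n
    sol[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        sol[i] = d[i] - c[i] * sol[i + 1]
    return sol

def natural_spline_second_derivs(xs, ys):
    """Second derivatives M_i of the natural cubic spline through (xs, ys)."""
    h = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]
    m = len(xs) - 2                                   # interior grid points
    lower = [h[i] for i in range(m)]
    diag = [2.0 * (h[i] + h[i + 1]) for i in range(m)]
    upper = [h[i + 1] for i in range(m)]
    rhs = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
           for i in range(m)]
    return [0.0] + thomas(lower, diag, upper, rhs) + [0.0]

def spline_eval(xs, ys, M, x):
    """Evaluate the cubic spline with second derivatives M at x."""
    j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    h = xs[j + 1] - xs[j]
    a, b = xs[j + 1] - x, x - xs[j]
    return ((M[j] * a**3 + M[j + 1] * b**3) / (6.0 * h)
            + (ys[j] / h - M[j] * h / 6.0) * a
            + (ys[j + 1] / h - M[j + 1] * h / 6.0) * b)

xs, ys = [0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0, 0.0]
M = natural_spline_second_derivs(xs, ys)
print(M, spline_eval(xs, ys, M, 1.5))
```

The solve costs a number of operations proportional to the number of grid points, and the resulting second derivatives are computed once and then reused for every evaluation, matching the store-then-evaluate pattern described in the text.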

The second-degree continuity of the cubic spline approximant of $F_t$ is very important because it allows the use of efficient Newton-type or quasi-Newton algorithms (Gill, Murray and Wright) when solving the optimization problem in (3) at each grid point. The spline DP program uses the sequential quadratic optimization routine E04VCF (NAG FORTRAN Library Manual). A sequential quadratic programming algorithm is an active set method in which the direction for a one-dimensional line search at each iteration is determined by solving a quadratic program (NAG FORTRAN Library Manual). The objective function of the quadratic program is a quadratic model of the Lagrangian associated with the optimization problem at the current estimate of the solution, and requires the evaluation of the objective function and its gradient at that point. The value of any nonlinear constraints and their Jacobian must also be determined. An approximation to the Hessian is built up as the iterations proceed.

The solution of the optimization in (3) at each state-space grid point X_t using the quasi-Newton algorithm requires the evaluation of the objective E{B_t[X_t, R_t, Q_t] + F_{t+1}[X_{t+1}]} and its gradient at each iteration. For the test problems considered here, which have linear constraints and dynamics, constraints on X_{t+1} were converted into constraints on R_t and the optimization was solved for R_t. In problems with nonlinear dynamics or constraints, it may be more natural to solve for both R_t and X_{t+1} in the optimization, subject to the additional constraints X_{t+1} = g(X_t, R_t, Q_t) (Gill, Murray and Wright). In either case, after the spline coefficients have been determined and stored for a given stage t, the value of F_{t+1}(X_{t+1}) and its gradient are easily calculated using tensor product methods for any X_{t+1} = g(X_t, R_t, Q_t) (de Boor 1978, Johnson 1989).

To increase the efficiency of the DP algorithms, the best available solution is used to start each numerical optimization to compute R_t in (3). When the optimization problems at each stage are similar, the approximate Hessian from the previous stage is also used to start each optimization after the first stage in the spline DP algorithm.

Concavity of F_{t+1}[X_{t+1}] does not ensure the concavity of a cubic spline approximant. Identification of the global optimum in (3) is not guaranteed by a quasi-Newton algorithm if the cubic spline approximation to F_{t+1}[X_{t+1}] is not concave. If the spline approximation of F_{t+1}[X_{t+1}] fails to be concave when F_{t+1}[X_{t+1}] is, it may be an indication that the spline is providing a poor approximation or that the optimizing algorithm is terminating unsuccessfully, introducing spurious values of F_{t+1}[X_{t+1}]. A concavity checker, described in the Appendix, was developed to detect such problems and to verify the in-progress performance of the spline DP algorithm.

4.3. The Gradient Dynamic Programming (GDP) Algorithm

Cubic Hermite polynomials are the multivariate cubic polynomials within each rectangular state-space volume that match the given values of F_t[X_t] and its gradients at the vertices of the hypercube. When (3) is solved in the GDP algorithm, the gradient of F_t[X_t] with respect to the state variable X_t at each grid point is also calculated and stored. Because these Hermite polynomials provide a local approximation, it is possible to calculate the coefficients for each polynomial only when they are needed for interpolation in (3).
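The local character of Hermite interpolation can be seen in one dimension, where the cubic on a single interval depends only on the values and derivatives at its two end points. The Python sketch below is an illustration under that one-dimensional simplification; the function name is hypothetical and it is not the GDP code of Foufoula-Georgiou and Kitanidis.

```python
def cubic_hermite(x0, x1, f0, f1, d0, d1, x):
    """Cubic Hermite interpolant on [x0, x1].

    f0, f1 are function values and d0, d1 are derivatives at x0 and x1.
    Only data from this one interval are needed (local approximation).
    """
    h = x1 - x0
    t = (x - x0) / h
    # Standard cubic Hermite basis functions on the unit interval.
    h00 = 2 * t ** 3 - 3 * t ** 2 + 1
    h10 = t ** 3 - 2 * t ** 2 + t
    h01 = -2 * t ** 3 + 3 * t ** 2
    h11 = t ** 3 - t ** 2
    return h00 * f0 + h10 * h * d0 + h01 * f1 + h11 * h * d1
```

For example, with data taken from f(x) = x^3 on [0, 1] (f0 = 0, f1 = 1, d0 = 0, d1 = 3), the interpolant reproduces the cubic, so `cubic_hermite(0, 1, 0, 1, 0, 3, 0.5)` gives 0.125.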

Our implementation of Foufoula-Georgiou and Kitanidis's GDP algorithm employed their FORTRAN program modified to use the same NAG quasi-Newton routine E04VCF as the spline DP program. Such algorithms generally perform well even when the objective function does not have continuous second derivatives (Gill, Murray and Wright), as is the case here because the second derivatives of the Hermite polynomials are not continuous at the faces of the hypercubes defined by the state-space grid. The GDP program recomputes the coefficients of the Hermite polynomial each time an interpolation occurs except when successive interpolations are within the same hypercube.

5. COMPUTATIONAL EFFORT FOR MULTIVARIATE SPLINE INTERPOLATION

Of interest is the computational gain that can be achieved by using higher-order interpolation schemes. Equation 4 indicates that the effort required to solve a DP numerically at each stage equals the number of grid points N^k at which (3) must be solved times the effort required to solve (3), denoted W(p, k). Here W(p, k) is essentially the number of times the objective function E{B_t[X_t, R_t, Q_t] + F_{t+1}[X_{t+1}]} and perhaps its gradient must be evaluated times the effort required to evaluate that function. In most applications, the expectation is computed by evaluating its argument at a set of discrete points. If B_t[X_t, R_t, Q_t] is a simple and easy to compute analytical function of its arguments, then interpolating the value of F_{t+1}[X_{t+1}] can be the costly step of the objective function evaluation in multidimensional problems.

Computational effort is often measured in floating point operations or flops (a floating point multiply or divide and an associated floating point addition or subtraction, plus any indexing; Golub and Van Loan 1983). With this metric, the effort required to compute the value of a k-dimensional multilinear interpolant is [k + 2(2^k - 1)] flops (Johnson 1989).
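As an illustration of where a count of this form can come from, the following Python sketch evaluates a multilinear interpolant within one hypercube by collapsing one dimension at a time and counts the one-dimensional linear blends it performs. The function name and the nested-list layout are assumptions made for the example; the exact accounting in Johnson (1989) is not reproduced here.

```python
def multilinear_interpolate(vertex_values, fractions):
    """Multilinear interpolation inside one k-dimensional hypercube.

    vertex_values: nested lists of shape 2 x 2 x ... x 2 (k levels) holding
                   the function values at the 2^k vertices.
    fractions:     k numbers in [0, 1], the relative position of the point
                   within the cell along each dimension.
    Returns (value, blends), where blends counts the 1-D linear blends used.
    """
    if not fractions:
        return vertex_values, 0
    lo, n_lo = multilinear_interpolate(vertex_values[0], fractions[1:])
    hi, n_hi = multilinear_interpolate(vertex_values[1], fractions[1:])
    # One blend: a subtraction, a multiplication and an addition.
    return lo + fractions[0] * (hi - lo), n_lo + n_hi + 1
```

The recursion performs 2^k - 1 blends in k dimensions, which is consistent with the 2(2^k - 1) term if each blend is counted as roughly two flops.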

5.1. Effort for Tensor Product PP-Form Splines

The effort required to compute an approximation to F_t[X_t] using tensor product cubic splines can be divided into three parts. The LU decompositions of the coefficient matrices needed to solve for the second derivatives of the spline function only need to be calculated once for a given state-space grid, and may be determined at the start of the algorithm unless the grid changes (Daniel).

Then, at the beginning of each stage, the coefficients C(j_1, ..., j_k)(l_1, ..., l_k) of the interpolating cubic polynomials in (7) need to be calculated given the new values F_{t+1}[X_{t+1}]. The coefficients could be calculated by directly solving the N^k linear equations generated by the interpolation conditions. However, the coefficients are more efficiently computed using tensor product methods, which consider one dimension at a time and take advantage of special structure. Solving the required equations in this calibration step to determine the coefficients requires approximately 4.67N^k(4^k - 1) flops (Johnson).

Finally, evaluation of the interpolatory cubic spline requires (4^k - 1) flops (Johnson). Because this evaluation must be done many times to compute the expectation in (3) at each iteration of the numerical optimization algorithm for each of the N^k grid points, it dominates the effort required for calibration.

After the cubic polynomials' coefficients have been computed and stored, evaluating the spline interpolant requires (4^k - 1)/[k + 2(2^k - 1)] or roughly 2^(k-1) times more effort than computing the value of a multilinear interpolant. For k = 1 there is no difference in effort, whereas for k = 4 the factor is nearly 8.
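The ratio quoted above is easy to tabulate. The short Python sketch below simply evaluates the two flop counts given in the text for several dimensions k; the function names are illustrative only.

```python
def multilinear_eval_flops(k):
    """Flops to evaluate a k-dimensional multilinear interpolant."""
    return k + 2 * (2 ** k - 1)


def pp_spline_eval_flops(k):
    """Flops to evaluate a k-dimensional pp-form tensor product cubic spline."""
    return 4 ** k - 1


def effort_ratio(k):
    """Spline evaluation effort relative to multilinear evaluation."""
    return pp_spline_eval_flops(k) / multilinear_eval_flops(k)


if __name__ == "__main__":
    for k in range(1, 6):
        # Compare the exact ratio with the rough factor 2^(k-1).
        print(k, multilinear_eval_flops(k), pp_spline_eval_flops(k),
              round(effort_ratio(k), 2), 2 ** (k - 1))
```

For k = 1 both counts equal 3 flops, and for k = 4 the ratio is 255/34 = 7.5, in line with the "nearly 8" stated above.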

5.2. Effort for Tensor Product B-Form Splines

The coefficients a(i_1, ..., i_k) of the B-form spline in (8) can also be calculated using tensor product methods. The effort required to calculate the coefficients a(i_1, ..., i_k) at each stage is kN^(k-1)[5N - 4] flops (Johnson).

The calibration effort associated with the B-form spline is substantially less than that associated with the pp-form spline, but the dominant evaluation costs are six times larger, approximately 6(4^k - 1) flops per function evaluation (Johnson). Because of the number of times evaluation is required when solving (3), use of the piecewise polynomial representation in (7) yields a more efficient algorithm. However, the storage requirements for the coefficients of the B-spline representation of the spline, equal to 8N^k bytes with double precision arithmetic, can be substantially less than the 8[4(N - 3)]^k bytes required when the interpolant is expressed as a piecewise polynomial (Johnson).

An advantage of the B-spline representation is that it is both general and flexible. The coefficients of the piecewise polynomial approximation can be calculated efficiently by converting from the B-spline representation (de Boor 1978, Johnson 1989) in approximately N^k(4^k - 1) flops (Johnson). The spline DP algorithm computes the coefficients of the piecewise polynomial splines by converting from the B-spline representation.

5.3. Advantages of Splines

While the computation of the spline approximation takes more effort than multilinear interpolation as k increases, cubic splines should provide a more accurate approximation, as would Hermite polynomials, thus allowing a coarser state-space grid corresponding to a smaller value of N to be employed. This can make up for the factor of roughly 2^(k-1) increase in evaluation effort. If N could be halved by use of cubic splines, a computational saving would always result, since the number of grid points N^k would then fall by a factor of 2^k, which exceeds the roughly 2^(k-1) increase in evaluation effort.

As noted in Section 4, the use of efficient quasi-Newton optimization methods made possible by the smoothness of the spline interpolant can significantly reduce the effort W(p, k) required to solve (3) numerically. The value of F_{t+1} and its gradient are required when evaluating the expectation in (3) at each iteration of a quasi-Newton algorithm. The evaluation of the cubic spline and its gradient with respect to X_{t+1} requires 2.67(4^k - 1) flops (Johnson), or approximately 2.67(2^(k-1)) times more effort than required to evaluate a multilinear interpolant.
6. COMPUTATIONAL RESULTS FOR A SAMPLE FOUR-RESERVOIR PROBLEM

The performance of the spline DP, GDP, and multilinear DP algorithms was compared on a series of stochastic water supply problems to examine the issues of accuracy, efficiency, and the advantages of smoothness.

6.1. Four-Reservoir Test Problem

The sample four-reservoir system described here is the same test problem used by Foufoula-Georgiou and Kitanidis and has served as a benchmark for testing numerical methods in the water resources literature (Yakowitz 1982). Let R(t) be the decision vector (release) and S(t) the state vector (beginning storage) for period t. The inflows to reservoirs 1 and 2 in period t are assumed to be independent and denoted by q_1(t) and q_2(t), respectively. The releases from reservoir 2 flow into reservoir 3, and the releases from reservoirs 1 and 3 flow into reservoir 4. The state transition equations are:

    S_1(t + 1) = S_1(t) - R_1(t) + q_1(t)

    S_2(t + 1) = S_2(t) - R_2(t) + q_2(t)
                                                            (9)
    S_3(t + 1) = S_3(t) - R_3(t) + R_2(t), and

    S_4(t + 1) = S_4(t) - R_4(t) + [R_1(t) + R_3(t)].
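The state transition equations (9) translate directly into code. The following Python sketch is only an illustration; the function name and the tuple layout are assumptions made for the example.

```python
def step_storage(S, R, q):
    """Apply the state transition equations (9) for one period.

    S: beginning storages (S1, S2, S3, S4)
    R: releases (R1, R2, R3, R4)
    q: inflows (q1, q2) to reservoirs 1 and 2
    """
    S1, S2, S3, S4 = S
    R1, R2, R3, R4 = R
    q1, q2 = q
    return (
        S1 - R1 + q1,         # reservoir 1 receives inflow q1
        S2 - R2 + q2,         # reservoir 2 receives inflow q2
        S3 - R3 + R2,         # reservoir 3 receives the release from 2
        S4 - R4 + (R1 + R3),  # reservoir 4 receives releases from 1 and 3
    )
```

With S = (5, 5, 5, 5), R = (1, 1, 1, 1) and the deterministic inflows q = (2, 4), the next-period storages are (6, 8, 5, 6); total storage changes by q_1 + q_2 - R_4, since only the release from reservoir 4 leaves the system.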

The cost function associated with current period operations is

    B[S(t), R(t), q(t)] = \sum_{i=1}^{4} C_i(t) [R_i(t) - 1]^2.    (10)

The terminal cost

    C[S(T + 1)] = \sum_{i=1}^{4} [S_i(T + 1) - m_i]^2

is incurred at the end of the operating horizon, where m_i is the desired volume of water in reservoir i at the end of period T. Considering the constraints on both decision and state vectors, the problem is to determine the policy R_i(t) (i = 1, ..., 4; t = 1, ..., T) that minimizes the objective function in (2) subject to

    0 \le S_i(t) \le S_i^{max}(t)    for i = 1, ..., 4; t = 1, ..., T    (11a)

    R_i^{min}(t) \le R_i(t) \le R_i^{max}(t)    for i = 1, ..., 4; t = 1, ..., T    (11b)

and the state transition equations (9).

Following Foufoula-Georgiou and Kitanidis, T = 3 operating periods are considered, the maximum storage S_i^{max}(t) for all reservoirs is 12, and the desired storages m_i at the end of the horizon equal (5, 5, 5, 7). The cost coefficients C_i(t) are (1.1, 1.2, 1.0, 1.3) for all t.

In the deterministic version of the problem, the inflows q_1(t) and q_2(t) have values (2, 4) in all periods. In the stochastic version, the inflows are independent lognormal random variables, with means (2, 4) and standard deviations (1.5, 1.5). Both continuous inflow distributions were replaced by a three-point discrete approximation with probabilities of 1/6, 2/3, and 1/6 assigned, respectively, to the 5, 50, and 95 percentiles. The releases R_i(t) were constrained so that S_i(t + 1) satisfied (11a) for each discrete inflow.
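A three-point approximation of this kind can be sketched in Python as follows. The sketch assumes that the lognormal parameters are obtained by matching the stated mean and standard deviation, which the text does not spell out, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def three_point_lognormal(mean, std,
                          levels=(0.05, 0.50, 0.95),
                          probs=(1 / 6, 2 / 3, 1 / 6)):
    """Discretize a lognormal inflow at the given percentile levels.

    Assumption: the log-space parameters are fitted by moment matching,
    sigma^2 = ln(1 + (std/mean)^2) and mu = ln(mean) - sigma^2 / 2.
    Returns a list of (inflow value, probability) pairs.
    """
    sigma = math.sqrt(math.log(1.0 + (std / mean) ** 2))
    mu = math.log(mean) - 0.5 * sigma ** 2
    z = NormalDist()  # standard normal, used for percentile z-scores
    points = [math.exp(mu + sigma * z.inv_cdf(p)) for p in levels]
    return list(zip(points, probs))


if __name__ == "__main__":
    for q, p in three_point_lognormal(2.0, 1.5):
        print(round(q, 3), round(p, 4))
```

Under the moment-matching assumption, the median of the first inflow (mean 2, standard deviation 1.5) is 2/1.25 = 1.6, which is the point that receives probability 2/3.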

6.2. Results

Both the deterministic and stochastic versions of the 4-reservoir problem were solved using the three interpolation schemes with the discretization level N ranging from 3 to 17 points per dimension. The optimal solution to the continuous deterministic problem was determined using nonlinear programming, and used to measure error in discretized problems. The first-period value function obtained from the 17-point/dimension spline run defined the true or best available solution in the stochastic case and was used to define the errors in the value functions calculated using other values of N. At a particular point in the state space, the error is calculated by dividing the difference between the true value function and the calculated value function by the true value function, then taking the absolute value. The overall error is calculated by averaging over the errors computed at all the individual points in the state space. The error was examined for systematic bias, which would inflate error estimates but is not important operationally because the shape of the value function determines a policy. Bias was not found to be significant (Tejada-Guibert 1990). Table I displays the results for the stochastic problems. The CPU times are in seconds on the Cornell National Supercomputer Center's IBM 3090-600E (Johnson, Stedinger and Shoemaker 1988).
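The error measure described above can be written compactly. The Python sketch below is an illustration with hypothetical names; it takes the reference ("true") value function and a computed value function sampled at the same state-space points.

```python
def average_relative_error(true_values, computed_values):
    """Average of |true - computed| / |true| over all sampled points.

    Both arguments are sequences of value-function values at the same
    state-space points; true values are assumed to be nonzero.
    """
    if len(true_values) != len(computed_values):
        raise ValueError("value lists must have the same length")
    errors = [abs((f_true - f_calc) / f_true)
              for f_true, f_calc in zip(true_values, computed_values)]
    return sum(errors) / len(errors)
```

Multiplying the result by 100 gives the percentage errors of the kind reported in Table I.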

As indicated by a comparison of the fourth and fifth columns of Table I, use of the B-spline representation for evaluation in the spline DP algorithm instead of the piecewise polynomial representation increased the actual run time by approximately a factor of 3. This increase is less than the factor of six suggested by the analysis in Section 5, but clearly indicates the computational advantage associated with using the piecewise polynomial representation.

For a given discretization level N, the spline DP program (using pp-form splines) actually ran faster than the multilinear DP program, despite the roughly 20-fold increase in the number of flops required to do the spline interpolation and to calculate the spline's gradient analytically. This result is due to the greater efficiency of the quasi-Newton algorithm, which employs the calculated gradients to select a search direction and to build an approximation of the Hessian matrix. Of greater significance, for the same discretization level, the spline DP program yielded solutions that were approximately two orders of magnitude more accurate than the solutions obtained using the multilinear DP program.

Figure 2 compares the average of the relative absolute errors with which multilinear, Hermite GDP, and pp-form spline interpolants in a DP algorithm approximated the optimal value function F_1(S(1)) as a function of total CPU time. Results for both the stochastic and deterministic versions are presented. The use of 13 points per dimension with multilinear interpolation yielded an average error of 1.09% in the stochastic version of the problem, whereas the spline DP solution with 4 points per dimension had an error of 1.02%. If a problem can tolerate a 1% error in the optimal value function, the spline DP algorithm can yield the answer with 255 times less effort; for a 0.5% average error the gain increased to 330 times. On the deterministic version of the 4-reservoir problem, the spline DP algorithm yields a solution with 1% error 180 times faster than the multilinear DP program, and for a 0.5% error was 280 times faster. While results for deterministic and stochastic versions of the problem are qualitatively similar, the gains are greater in the stochastic problems because more interpolations are required per iteration.

The dramatic savings in CPU time associated with the DP spline algorithm are due both to the greater accuracy of the spline and the ability to use quasi-Newton methods to determine optimal decisions. For the 4-, 5-, and 7-point discretization schemes in the stochastic problem, the spline DP algorithm with the quasi-Newton routine E04VCF ran approximately 10 times faster than when the polytope routine E04CCF was employed, despite the extra effort required to calculate the gradients analytically (Tejada-Guibert). In a case with a 250-fold reduction in CPU time, this implies that a 10-fold reduction is associated with the use of a quasi-Newton algorithm and a 25-fold reduction is associated with the increased accuracy of the spline.

Table I. Average Relative Absolute Error and CPU Time for Solution of the Stochastic 4-Reservoir Test Problem

                  Multilinear            Cubic Splines                                Hermite GDP
N (Pts./Dim.)   CPU (s)   Error (%)   B-Form CPU (s)   PP-Form CPU (s)   Error (%)   CPU (s)   Error (%)
 3                   25      52.527                -                 -           -        28       2.779
 4                   64      21.733               72                26       1.024        87       0.478
 5                  145      12.859              181                63       0.270       212       0.169
 7                  462       5.385              641               225       0.087       770       0.061
 9                1,096       3.126            1,610               604       0.052     2,124       0.018
13                4,043       1.092            6,976             2,444       0.009    12,800       0.007
17               10,520       0.860                -             6,950       0             -           -

Figure 2. Error vs. CPU time for the 4-reservoir problem. Average relative errors (%) in the optimal value function are plotted against the CPU time for solutions of the 4-reservoir test problem with quadratic penalties over a range of values of the points/dimension N, and use of either multilinear (ML), pp-form spline (SP), or Hermite polynomial interpolation (GDP) for both deterministic (Det) and stochastic (St) problems, as noted.

The GDP algorithm using the derivatives of F_t[X_t] and Hermite polynomials with a quasi-Newton optimization scheme required more effort than use of splines and linear interpolation for the same discretization level N, but yielded solutions with greater accuracy than the spline DP program. Figure 2 shows that for the same effort the GDP algorithm using Hermite polynomials was substantially more efficient than multilinear interpolation, but generally not as efficient as the spline DP algorithm.

An advantage of the use of splines over the use of GDP with Hermite polynomials is that a spline approximation is constructed in much the same way as a multilinear approximation, allowing the development of spline subroutines that can be incorporated into a computer program with no greater difficulty than subroutines for multilinear interpolation. By contrast, the Hermite polynomials require that the gradient of the optimal value function F(X) with respect to the decision vector R be computed, which requires a more elaborate computer code. Such programming decisions play a larger role in DP applications because general purpose algorithms are not available.

7. OTHER EXAMPLES

7.1. Effect of State Variable Dimension on CPU Ratio

To illustrate the relative change in the performance of the spline and multilinear DP algorithms with problem dimension, the series of 2-, 3-, 4-, and 5-dimensional problems described in Table II was created (Tejada-Guibert). They correspond to the 4-dimensional problem described in Section 6 with the addition or deletion of reservoirs in order to generate problems with state spaces of different dimensions.

Figure 3 contains plots of the average relative errors versus the CPU time for solutions obtained using the multilinear DP and spline DP algorithms (using the pp-form spline) for the cases in Table II. Table III reports estimates of the ratio of the CPU time required to solve these problems with a 0.5% average error using the multilinear DP and the spline DP algorithms. The numerical advantage of splines increases rapidly with problem size, although not as quickly as (4) might suggest.

7.2. Effect of the Smoothness of the Objective Function on the CPU Ratio

To test the effect of the smoothness of the sample problems on the results, the quadratic penalties on deviations of releases and final storage volumes from the specified targets in (10) were replaced in the 2- and 4-reservoir problems by absolute values to generate two new test cases (Tejada-Guibert) where

    B[S(t), R(t), q(t)] = \sum_{i=1,\ldots,4} C_i |R_i(t) - 1|    and    (12a)

    C[S(T + 1)] = \sum_{i} |S_i(T + 1) - m_i|.    (12b)

Because the current period objective function (12a) is piecewise linear, the transition equations are linear, and the problem is constrained, the value function F_t(X_t) is composed of linear hyperplanes. Hence, the multilinear interpolant may conceivably provide a better approximation of F_t(X_t) than a smooth spline. However, when the value function is made up of a great many hyperplanes and the boundaries of each hyperplane are not identified, the tensor product cubic spline might approximate the value function better than the multilinear interpolant.

Table II. Sequence of DP Reservoir Problems

DP Functional Equation
    F_t[S(t)] = \min_{R(t)} E_{q(t)} {B[S(t), R(t), q(t)] + F_{t+1}[S(t + 1)]}

Current Period Objective Function B[S(t), R(t), q(t)]
    For 3-, 4- and 5-reservoir problems: \sum_i C_i (R_i(t) - 1)^2
    For 2-reservoir problem: (R_1(t) + R_2(t) - 2)^2

Terminal Value Function
    C[S(T + 1)] = \sum_i [S_i(T + 1) - m_i]^2; T = 3

Cases
    2: 2 reservoirs in parallel. {C_i} = (1.0, 1.0), {m_i} = (5.0, 7.0).
    3: 2 reservoirs in parallel, 1 reservoir downstream after confluence. {C_i} = (1.1, 1.2, 1.3), {m_i} = (5.0, 5.0, 7.0).
    4: 1 reservoir in parallel with respect to 2 reservoirs in series, fourth reservoir downstream from confluence. {C_i} = (1.1, 1.2, 1.0, 1.3), {m_i} = (5.0, 5.0, 5.0, 7.0).
    5: As for 4-reservoir problem, with fifth reservoir below the fourth. {C_i} = (1.1, 1.2, 1.0, 1.3, 1.1), {m_i} = (5.0, 5.0, 5.0, 7.0, 7.0).

Constraints
    S_i^{min}(t) = 0 and S_i^{max}(t) = 12 for all reservoirs and periods.
    R_i^{min}(t) = 0 for all i and t; releases are also constrained so that S(t + 1) remains within its bounds for all levels of the discrete inflow.

Figure 4 contains plots of the average relative errors versus CPU time obtained by solving the 2- and 4-reservoir problem with absolute value penalty functions. For the 2-reservoir problem, the spline algorithm has trouble approximating the optimal value function reliably when there are few discretization levels. However, the spline DP algorithm performed significantly better than the multilinear DP algorithm when the state space contained 9 or more points per dimension. The Appendix describes the results of using the concavity checker to verify the behavior of the spline approximation in this problem.

In the 4-dimensional problem, the spline DP algorithm always performed better than the multilinear DP algorithm, although neither algorithm produced solutions that were as accurate as those achieved for similar CPU times when the quadratic objective function (10) was employed. Because there are many interpolation points in a 4-dimensional problem even for small values of N, the spline DP was better behaved and for an average error of 2% provided the solution 40 times faster than the multilinear algorithm.

Figure 3. Error vs. CPU time for 2-, 3-, and 5-reservoir problems. Average relative errors in the optimal value function are plotted against the CPU time (sec) for a range of values of the points/dimension N obtained using multilinear (ML) and pp-form spline (SP) interpolation for solutions of the 2-, 3-, and 5-reservoir test problems with quadratic penalties, as noted.




Table III. Ratio of Multilinear to Spline CPU Times for 0.5% Relative Error

Number of Reservoirs    Ratio
2                          25
3                         110
4                         330
5                         400

8. SUMMARY

Employing multivariate, cubic, piecewise polynomial functions to approximate the value function in the numerical solution of continuous-state dynamic programs can reduce the computational effort required to solve such problems because of the ability to use a coarser grid and to employ efficient quasi-Newton optimization algorithms. Using a test problem that has served as a benchmark in the water resources area, we compared the performance of our spline DP algorithm, which uses the piecewise polynomial form of the spline approximant, to algorithms employing multivariate Hermite polynomials and splines in B-spline form. Use of our piecewise polynomial representation decreased the CPU time by a factor of 3 over that required by the B-spline representation proposed in the literature. For the same computational effort, the solution to the 4-dimensional test problem obtained using the spline DP algorithm had slightly greater accuracy than the solution obtained using the GDP algorithm that employs multivariate Hermite polynomials. Moreover, because spline interpolants do not require that the gradient at each grid point be calculated and stored, the spline DP algorithm has the same simple structure as an algorithm that employs tensor product linear interpolation.

Computational experiments on a series of water supply reservoir problems of varying dimension and two different value functions were carried out to investigate the impact of these parameters on computational savings. For smooth problems, the observed computational advantage of the spline DP algorithm relative to the multilinear DP algorithm increased with the state-space dimension and the required accuracy. For the 4-reservoir stochastic test problem with a quadratic penalty function, the computational savings were in excess of a factor of 250 for a 1% average relative error and 330 for a 0.5% average relative error.

The spline DP algorithm outperformed the multilinear DP algorithm on test problems where the optimal value function was piecewise linear, although the computational advantage was not as significant as for the case with a quadratic objective function. Neither the spline DP nor the multilinear DP algorithm produced very accurate solutions for the 4-dimensional stochastic test problem with an absolute value cost function; the computational savings were approximately 40 for an average relative error of 2%.

The computational studies reported here suggest that significant computational savings may be achieved when tensor product, cubic spline interpolation is used to develop numerical solutions to continuous-state dynamic programs, particularly those with several continuous state variables and smooth objectives with curvature. In Tejada-Guibert, Johnson and Stedinger (1993) and Tejada-Guibert, the spline DP and multilinear DP algorithms were used to generate operating policies for the two-reservoir Shasta/Trinity system in northern California. When energy targets were high, and large penalties were placed on water and power shortages, the spline DP algorithm generated value functions with substantially smaller error. However, for relatively flat objectives with moderate water and power targets, the spline and multilinear DP algorithms produced solutions of similar accuracy for the same effort. In other cases, savings may also be significant when the value function consists of many hyperplanes approximating an almost smooth surface. Employing Hermite polynomials will likely result in similar computational savings. Although only demonstrated in this paper for a backward recursion algorithm, approximation of the value function with tensor product cubic splines should also yield improvements in policy iteration and linear programming models.

Figure 4. Error vs. CPU time for absolute-value penalties. Average relative errors in the optimal value function are plotted against the CPU time (sec) over a range of values of the points/dimension N obtained using multilinear (ML) and pp-form spline (SP) interpolation for solutions of the 2- and 4-reservoir test problems with absolute value penalties, as noted.

APPENDIX

A Concavity Checker for the Spline DP Algorithm

(J. Alberto Tejada-Guibert and Jery R. Stedinger)

To demonstrate the concavity of a function it is sufficient to show the negative semidefiniteness of its Hessian matrix of second partial derivatives (Sydsaeter 1981). Demonstrating the concavity of the entire surface F_t(X_t) would be a large numerical task because the entire state space would have to be examined. However, interactions among many of the components of reservoir systems and other systems are often weak. If the objective in the optimization in (3) were composed of separable terms, it would be sufficient to verify the concavity properties of F_t[X_t] in each dimension separately. Our concavity test considers whether the second derivatives of F_t[X_t] with respect to each state variable at each state-space grid point are negative. Second partial derivatives of cubic splines vary linearly between grid points and cannot become positive in the interior of a hypercube unless they are also positive at a vertex.

A Nonconcavity Index

If nonconcavities are present in the spline approximant of F_t[X_t], an index is needed to identify situations likely to pose a problem. The desirable properties of an index include: ease and economy of computation, unitlessness and scale invariance, and generality so that solutions to different problems of different dimensions can be compared. For a k-dimensional DP problem, let X_{i,max} and X_{i,min}, i = 1, ..., k, be the maximum and minimum values of each state variable in the rectangular state space, and L_i = X_{i,max} - X_{i,min} be their range. If F''_{ii} is the second derivative of F_t[X_t], the average positive second partial derivative with respect to a given state variable X_i (the subscript t has been dropped) over a k-dimensional state space equals

    \overline{[F''_{ii}]^{(+)}} = \frac{1}{V_S} \int_{X_{k,min}}^{X_{k,max}} \cdots \int_{X_{1,min}}^{X_{1,max}} [F''_{ii}]^{(+)} \, dX_1 \cdots dX_k,    (A.1)

where V_S is the state-space volume equal to (\prod_i L_i for i = 1, ..., k), and [F''_{ii}]^{(+)} is the value of the second derivative of F(X) with respect to X_i if it is positive, and zero otherwise.

This expression is not unitless. A reasonable normalizing factor could employ the range L_i, i = 1, ..., k, squared, and

    M_F = [F_{max} - F_{m%}],    (A.2)

where F_{max} is the maximum value of F and F_{m%} is the mth quantile (m% of the values of F are higher than F_{m%}). Here F_{m%} is employed in (A.2) rather than the minimum value of F because penalties at boundaries or on infeasible conditions can result in unusual F-values that would result in large and unusual M_F.

To translate the index (A.1) into a computable form, the integral was discretized using a trapezoid-like rule so that the second derivative at each of the 2^k vertices in a k-dimensional problem has a weight of 1/2^k toward the value of the integral. For example, for a 2-dimensional (k = 2) problem, the index becomes

    \overline{[F''_{ii}]^{(+)}} = \frac{1}{V_S} \sum_{j=1}^{n_1 - 1} \sum_{h=1}^{n_2 - 1} \Delta_{1,j,h} \Delta_{2,j,h} \frac{1}{2^k} [ ([F''_{ii}]^{(+)})_{j,h} + ([F''_{ii}]^{(+)})_{j+1,h} + ([F''_{ii}]^{(+)})_{j,h+1} + ([F''_{ii}]^{(+)})_{j+1,h+1} ]    for i = 1 and 2,    (A.3)

where n_1 and n_2 are the number of discretization intervals for state variables X_1 and X_2, respectively, and \Delta_{1,j,h} and \Delta_{2,j,h} are the length of each hypercube along dimensions X_1 and X_2. There are (n_1 - 1)(n_2 - 1) hypercubes in a 2-dimensional state space.

Using the normalizing factors and the averages \overline{[F''_{ii}]^{(+)}}, the overall nonconcavity index is:

NCI = (A.4)
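The trapezoid-like averaging of the positive part of the second derivatives, described before (A.3), can be sketched for the 2-dimensional case as follows. This Python sketch covers only the averaging step, not the normalization that produces the NCI; the function name and the argument layout are assumptions made for the example.

```python
def average_positive_second_derivative(d2, x1, x2):
    """Cell-by-cell average of the positive part of a second derivative.

    d2[j][h]: second derivative of F with respect to one state variable,
              evaluated at the grid vertex (x1[j], x2[h]).
    x1, x2:   increasing grid coordinates along the two dimensions.
    Each of the 2^k = 4 vertices of a cell gets weight 1/4, and cells are
    weighted by their area relative to the state-space volume.
    """
    total = 0.0
    for j in range(len(x1) - 1):
        for h in range(len(x2) - 1):
            area = (x1[j + 1] - x1[j]) * (x2[h + 1] - x2[h])
            corners = (d2[j][h], d2[j + 1][h], d2[j][h + 1], d2[j + 1][h + 1])
            # Keep only positive values; negative ones contribute zero.
            total += area * sum(max(v, 0.0) for v in corners) / 4.0
    volume = (x1[-1] - x1[0]) * (x2[-1] - x2[0])
    return total / volume
```

A grid on which every second derivative is negative returns zero, so any positive result flags a nonconcavity somewhere in the state space.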

Results

Using the median of the F-values, F_{50%}, to compute M_F, numerical tests indicated that the NCI index (A.4) provided a consistent and general measure of nonconcavity on a series of two- and four-dimensional problems, for different patterns and densities of state-space discretization grids. Our experience indicates that NCI > 1.0 corresponds to unsatisfactory situations, and that problems with NCI < 0.1 did not have computational problems.

Table IV shows results for the 2-reservoir absolute-value penalty problem reported in Figure 4. For N = 4 (NCI = 1.23, ARE = 11.0%), the splines seem unable to describe the value function well. There are two runs for N = 13 and 17; in each case the first experienced unsuccessful terminations in the optimization steps. They have high NCI's and also large errors, ARE. The N = 7 case exhibits low nonconcavity, but a relatively high ARE. However, this problem was found to have a high NCI after the first stage; the bias from the initial errors did not disappear. Our experience was that NCI was useful in identifying cases where something went wrong, and as a consequence had larger errors in the resultant approximation of F[X].

Table IV. Nonconcavity Test Results for the Two-Reservoir Problem With Absolute-Value Penalties

Discretization Level N    NCI       ARE^a (%)
 4                        1.230     11.00
 5                        0.0518     0.523
 7                        0.0917     1.75
 9                        0.0153     0.122
13                        1.400      0.591
13                        0.0238     0.0999
17                        2.510      0.463
17                        0.0203     0.0724
25                        0.292      0.0508
33                        0.867      0.0535
49                        0.220      0.0237
65                        0.427      -

a Average relative errors (ARE) are included using the 65 discretization level run as base case.

The CPU time required by the concavity checker was 5% or less of the average time taken to run each stage of the spline DP algorithm for the problem tested, with a slight additional memory requirement. The run-time cost of checking concavity therefore is small, particularly in view of the benefits of obtaining some assurance that the algorithm is working as anticipated.

ACKNOWLEDGMENT

The authors thank E. Foufoula-Georgiou and P. Kitanidis for generously providing the original version of their GDP code. This research was supported by National Science Foundation grant #CEE-8351819, Pacific Gas and Electric Company of San Francisco, and a fellowship from the U.S. Army Mathematical Sciences Institute at Cornell. Computing resources were provided by the Cornell National Supercomputer Facility, a resource of the Cornell Theory Center, which receives major funding from the National Science Foundation and IBM Corp., with support from New York State and members of the Corporate Research Institute.

REFERENCES

ARVANITIDIS, N. V., AND J. ROSING. 1970a. Optimal Operation of Multireservoir Systems Using a Composite Representation. IEEE Trans. Power Appar. and Syst. PAS-89, 327-335.

ARVANITIDIS, N. V., AND J. ROSING. 1970a. Composite Representation of a Multireservoir Hydroelectric Power System. IEEE Trans. Power Appar. and Syst. PAS-89, 319-326.

BEAN, J. C., J. R. BIRGE AND R. L. SMITH. 1987. Aggregation in Dynamic Programming. Opns. Res. 35, 215-220.

BELLMAN, R. E. 1957. Dynamic Programming. Princeton University Press, Princeton, N.J.

BELLMAN, R. E., AND S. DREYFUS. 1962. Applied Dynamic Programming. Princeton University Press, Princeton, N.J.

BELLMAN, R. E., R. KALABA AND B. KOTKIN. 1963. Polynomial Approximation - A New Computational Technique in Dynamic Programming. Math. Comp. 17, 155-161.

BIRNBAUM, I., AND L. LAPIDUS. 1978. Studies in Approximation Methods - I: Splines and Control Via Discrete Dynamic Programming. Chem. Eng. Sci. 33, 415-426.

BRAGA, B. P. F., W. W. YEH, L. BECKER AND M. T. L. BARROS. 1991. Stochastic Optimization of Multiple-Reservoir System Operation. J. Water Resour. Plan. and Mgmt. 117 (4), 471-481.

BRAS, R. L., R. BUCHANAN AND K. C. CURRY. 1983. Real Time Adaptive Closed Loop Control of Reservoirs With the High Aswan Dam as a Case Study. Water Resour. Res. 19 (1), 33-52.

DANIEL, J. W. 1976. Splines and Efficiency in Dynamic Programming. J. Math. Anal. & Appl. 54, 402-407.

DE BOOR, C. 1978. A Practical Guide to Splines. Springer-Verlag, New York.

FEDERGRUEN, A., AND P. J. SCHWEITZER. 1979. Discounted and Undiscounted Value-Iteration in Markov Decision Problems: A Survey. In Dynamic Programming and Its Applications, M. Puterman (ed.). Academic Press, New York, 23-52.

FOUFOULA-GEORGIOU, E., AND P. K. KITANIDIS. 1988. Gradient Dynamic Programming for Stochastic Optimal Control of Multidimensional Water Resources Systems. Water Resour. Res. 24 (8), 1345-1359.

GAL, S. 1979. Optimal Management of a Multireservoir Water Supply System. Water Resour. Res. 15 (4), 737-748.

GAL, S. 1989. The Parameter Iteration Method in Dynamic Programming. Mgmt. Sci. 35, 675-684.

GILL, P. E., W. MURRAY AND M. H. WRIGHT. 1981. Practical Optimization. Academic Press, London.

GINN, T. R. 1986. Personal communication (August).

GOLUB, G. H., AND C. H. VAN LOAN. 1983. Matrix Computations. The Johns Hopkins University Press, Baltimore, Md.

HAURIE, A., AND P. L'ECUYER. 1986. Approximation and Bounds in Discrete Event Dynamic Programming. IEEE Trans. Auto. Control AC-31, 227-235.

HEYMAN, D. P., AND M. J. SOBEL. 1984. Stochastic Models in Operations Research, Vol. II. Stochastic Optimization. McGraw-Hill, New York.

HOWARD, R. A. 1960. Dynamic Programming and Markov Processes. John Wiley, New York.

JOHNSON, S. A. 1989. Spline Approximation in Discrete Dynamic Programming With Application to Stochastic Multireservoir Systems. Ph.D. Dissertation, Cornell University, Ithaca, N.Y.

JOHNSON, S. A., J. R. STEDINGER AND C. A. SHOEMAKER. 1988. Computational Improvements in Dynamic Programming. FOREFRONTS, Center for Theory and Simulation in Science and Engineering, Cornell University, 7, 3-7.

KELMAN, J., J. R. STEDINGER, L. A. COOPER, E. HSU AND S. YUAN. 1990. Sampling Stochastic Dynamic Programming Applied to Reservoir Operation. Water Resour. Res. 26 (3), 447-454.

KITANIDIS, P. K., AND E. FOUFOULA-GEORGIOU. 1987. Error Analysis of Conventional Discrete and Gradient Dynamic Programming. Water Resour. Res. 23 (5), 845-858.

L'ECUYER, P. 1985. Computing Transfer Lines Performance Measures Using Dynamic Programming. Comput. and Indus. Eng. 9, 387-393.

L'ECUYER, P. 1989. Computing Approximate Solutions to Markov Renewal Programs With Continuous State Spaces. Research Report DIUL-RR-8912, Universite Laval, Quebec, Canada.

LOUCKS, D. P., J. R. STEDINGER AND D. A. HAITH. 1981. Water Resource Systems Planning and Analysis. Prentice-Hall, Englewood Cliffs, N.J.

LOVEJOY, W. S. 1986. Policy Bounds for Markov Decision Processes. Opns. Res. 34, 630-637.

MAWER, P. A., AND D. THORN. 1974. Improved Dynamic Programming Procedures and Their Practical Application to Water Resource Systems. Water Resour. Res. 10 (2), 183-190.

MORIN, T. L. 1979. Computational Advances in Dynamic Programming. In Dynamic Programming and Its Applications, M. Puterman (ed.). Academic Press, New York, 53-90.

NAG FORTRAN Library Manual. 1984. Mark 11, Vol. 3, Numerical Algorithms Group, Downers Grove, Ill.

PEREIRA, M. V. F., AND L. M. V. G. PINTO. 1991. Multi-Stage Stochastic Optimization Applied to Energy Planning. Math. Prog. 22, 359-375.

PEREYRA, V., AND G. SCHERER. 1973. Efficient Computer Manipulation of Tensor Products With Applications to Multidimensional Approximation. Math. Comp. 27, 595-605.

PRENTER, P. M. 1975. Splines and Variational Methods. John Wiley, New York.

READ, E. G. 1990. Dual Dynamic Programming for Linear Production/Inventory Systems. Computers Math. Applic. 19, 29-42.

ROEFS, T. G., AND A. GUITRON. 1975. Stochastic Reservoir Models: Relative Computational Effort. Water Resour. Res. 11 (6), 801-804.

RUDIN, W. 1976. Principles of Mathematical Analysis. McGraw-Hill, New York.

SAAD, M., AND A. TURGEON. 1988. Application of Principal Component Analysis to Long-Term Reservoir Management. Water Resour. Res. 24 (7), 907-912.

SCHUMAKER, L. L. 1981. Spline Functions: Basic Theory. John Wiley, New York.

SCHWEITZER, P. J., AND A. SEIDMANN. 1985. Generalized Polynomial Approximations in Markovian Decision Processes. J. Math. Anal. & Appl. 110, 568-582.

SHOEMAKER, C. A. 1979. Optimal Timing of Multiple Applications of Pesticides With Residual Toxicity. Biometrics 35, 803-812.

SHOEMAKER, C. A. 1981. The Application of Dynamic Programming and Other Optimization Methods to Pest Management. IEEE Trans. Auto. Control 26, 1125-1132.

SHOEMAKER, C. A. 1982. Optimal Integrated Control of Univoltine Pest Populations With Age Structure. Opns. Res. 30, 40-61.

SHOEMAKER, C. A., AND S. A. JOHNSON. 1989. Stochastic Nonlinear Optimal Control of Populations: Computational Difficulties and Possible Solutions. In Mathematical Approaches to Problems in Resource Management and Epidemiology, C. Castillo-Chavez, S. A. Levin and C. A. Shoemaker (eds.). Springer-Verlag, New York, 67-81.

STEDINGER, J. R., B. F. SULE AND D. P. LOUCKS. 1984. Stochastic Dynamic Programming Models for Reservoir-Operation Optimization. Water Resour. Res. 20 (11), 1499-1505.

SU, Y., AND R. DEININGER. 1972. Generalization of White's Method of Successive Approximations to Periodic Markovian Processes. Opns. Res. 20, 318-326.

SYDSAETER, K. 1981. Topics in Mathematical Analysis for Economists. Academic Press, London.

TAKEUCHI, K., AND D. H. MOREAU. 1974. Optimal Control of Multiunit Interbasin Water Resource Systems. Water Resour. Res. 10 (3), 407-414.




TEJADA-GUIBERT, J. A. 1990. Spline Stochastic Dynamic Programming for Multiple Reservoir System Operation Optimization. Ph.D. Dissertation, Cornell University, Ithaca, New York.

TEJADA-GUIBERT, J. A., S. A. JOHNSON AND J. R. STEDINGER. 1993. Comparison of Two Approaches for Implementing Multireservoir Operating Policies Using Stochastic Dynamic Programming. (provisionally accepted).

TERRY, L. A., M. V. F. PEREIRA, T. A. ARARIPE NETO, L. F. C. A. SILVA AND P. R. H. SALES. 1986. Coordinating the Energy Generation of the Brazilian National Hydrothermal Electrical Generating System. Interfaces 16, 16-38.

TROTT, W. J., AND W. W-G. YEH. 1973. Optimization of Multiple Reservoir Systems. J. Hydraulic Eng. 99, 1865-1884.

TURGEON, A. 1980. Optimal Operation of Multireservoir Power Systems With Stochastic Inflows. Water Resour. Res. 16 (2), 275-283.

TURGEON, A. 1981. A Decomposition Method for the Long-Term Scheduling of Reservoirs in Series. Water Resour. Res. 17 (6), 1565-1570.

WANG, D., AND B. J. ADAMS. 1986. Optimization of Real-Time Reservoir Operation With Markov Decision Processes. Water Resour. Res. 22 (3), 345-352.

WHITE, D. J. 1963. Dynamic Programming, Markov Chains, and the Method of Successive Approximations. J. Math. Anal. & Appl. 6, 373-376.

WHITE, D. J. 1969. Dynamic Programming. Holden-Day, San Francisco.

WHITE, D. J. 1979. Elimination of Non Optimal Actions in Markov Decision Processes. In Dynamic Programming and Its Applications, M. Puterman (ed.). Academic Press, New York, 131-160.

WHITE, D. J. 1985. Real Applications of Markov Decision Processes. Interfaces 15, 73-83.

WHITE, D. J. 1988. Further Real Applications of Markov Decision Processes. Interfaces 18, 55-61.

WHITT, W. 1978. Approximation of Dynamic Programs I and II. Math. Opns. Res. 3, 231-243 (see also 1979, 179-185).

YAKOWITZ, S. 1982. Dynamic Programming Applications in Water Resources. Water Resour. Res. 18, 673-696.
