
The Journal of Economic Education 38 (2): 165-186. Publisher: Routledge. Online publication date: 7 August 2010.

To cite this article: Zietz, Joachim (2007). 'Dynamic Programming: An Introduction by Example', The Journal of Economic Education, 38 (2), 165-186. DOI: 10.3200/JECE.38.2.165-186. URL: http://dx.doi.org/10.3200/JECE.38.2.165-186


Dynamic Programming: An Introduction by Example

    Joachim Zietz

Abstract: The author introduces some basic dynamic programming techniques, using examples, with the help of the computer algebra system Maple. The emphasis is on building confidence and intuition for the solution of dynamic problems in economics. To integrate the material better, the same examples are used to introduce different techniques. One covers the optimal extraction of a natural resource, another uses consumer utility maximization, and the final example solves a simple real business cycle model. Every example is accompanied by Maple computer code to allow for replication.

Keywords: dynamic programming, learning by example, Maple

    JEL codes: C610, A230

Dynamic programming should emphasize intuition over mathematical rigor when it is introduced. Readers interested in more formal mathematical treatments are urged to read Sargent (1987), Stokey and Lucas (1989), or Ljungqvist and Sargent (2000). I build the discussion in this article around a set of examples that covers a number of key problems from economics. In contrast to other brief introductions to dynamic programming, such as King (2002), I integrate the use of a computer algebra system (Maple). Program code is provided for every example to encourage replication and experimentation.1 On the basis of classroom experience, I found that this allows readers to understand more easily the structure of dynamic programming and to move faster to their own applications.2 The problems in this article should be accessible to undergraduates who are familiar with the Lagrangian multiplier method to solve constrained optimization problems.

Dynamic programming has strong similarities to optimal control, a competing approach to dynamic optimization. Dynamic programming has its roots in the work of Bellman (1957), and optimal control techniques rest on the work of the Russian mathematician Pontryagin and his coworkers in the late 1950s.3 Although both dynamic programming and optimal control can be applied to discrete time and continuous time problems, most current applications in economics appear to favor dynamic programming for discrete time problems and optimal control for continuous time problems.

Joachim Zietz is a professor of economics at Middle Tennessee State University (e-mail: [email protected]). The author thanks two anonymous referees for helpful comments. Maple input code and output for all example programs are available from the author on request. Copyright © 2007 Heldref Publications.


To keep the discussion reasonably simple, I only deal with discrete time dynamic programming problems, which have become popular in macroeconomics. However, I do not limit the discussion to macroeconomics but try to convey the idea that dynamic programming has applications in other settings as well.

I provide some motivation for the use of dynamic programming techniques and then discuss finite horizon problems. I treat numerical and nonnumerical problems. I next cover infinite horizon problems, again using both numerical and nonnumerical examples. Finally, I solve a simple stochastic infinite horizon problem of the real business cycle variety.

    A MOTIVATING EXAMPLE

The basic principle of dynamic programming is best illustrated with an example. Consider the problem of an oil company that wants to maximize profits from an oil well. Revenue at time t is given as

$$R_t = p_t u_t,$$

where p is the price of oil, and u is the amount of oil that is extracted and sold. In dynamic programming applications, u is typically called a control variable. The company's cost function is quadratic in the amount of oil that is extracted,

$$C_t = 0.05u_t^2.$$

The amount of oil remaining in the well follows the recursion or transition equation

$$x_{t+1} = x_t - u_t,$$

where x is known as a state variable in dynamic programming language. Because oil is a nonrenewable resource, pumping out oil in the amount of u at time t means that exactly that much less oil is left in the oil well at time t + 1. I assume that the company applies the discount factor β = 0.9 for profit streams that occur in the future. I also assume that the company intends to have the oil well depleted in four years, which means that x_4 = 0. This is known as a boundary condition in dynamic programming. Given these assumptions, the central question to be solved is how much oil should be pumped at time t, t = 0, . . . , 3, to maximize the discounted profit stream.

If one had never heard of dynamic programming or optimal control, the natural way to solve this problem would be to set up a Lagrangian multiplier problem along the following lines:

$$\max L(x, u, \lambda) = \sum_{t=0}^{3}\beta^t(R_t - C_t) + \sum_{t=1}^{4}\lambda_t(x_t - x_{t-1} + u_{t-1}),$$

where x = (x_1, x_2, x_3), with x_0 given exogenously, and x_4 = 0, by design, where u = (u_0, u_1, u_2, u_3), λ = (λ_1, λ_2, λ_3, λ_4), and where the constraints specify that the amount of oil left at period t is equal to the amount in the previous period minus the amount pumped in the previous period. Using standard techniques, the above


Lagrangian problem can be solved for the decision variables x and u and the Lagrangian multipliers λ in terms of the exogenous variables (p_0, p_1, p_2, p_3, x_0).

The Maple commands to solve this problem are given as follows4:

restart: with(linalg): Digits:=3:
# memory is cleared, the linear algebra package
# called and the number of digits for floating
# point calculations is set; a colon at the end
# prevents all output
f:=sum(0.9^t*(p[t]*u[t]-0.05*u[t]^2),t = 0..3);
# the objective function is defined
x[0]:=x0: x[4]:=0:
g:=sum(lambda[t]*(x[t]-x[t-1]+u[t-1]),t=1..4);
# the constraints are defined
L:=f+g;
# the Lagrangian function is defined
L_grad:=grad(L,[seq(x[t],t=1..3),seq(u[t],t=0..3),
seq(lambda[t],t=1..4)]);
# the gradient of the Lagrangian is derived;
# the seq command greatly simplifies the input lines
solve({seq(L_grad[i],i=1..11)},{seq(x[t],t=1..3),
seq(u[t],t=0..3),seq(lambda[t],t=1..4)});
# the gradient is solved for all x, u, and lambda
assign(%):
# all x, u, and lambda are assigned their solution
# values; this is required to do further calculations
# with the solution values

For example, the optimal amounts of oil to pump in periods zero to three are given as

$$u_0 = 0.212x_0 + 7.88p_0 - 2.12p_1 - 2.12p_2 - 2.12p_3,$$
$$u_1 = 0.235x_0 - 2.35p_0 + 7.65p_1 - 2.35p_2 - 2.36p_3,$$
$$u_2 = 0.262x_0 - 2.62p_0 - 2.62p_1 + 7.38p_2 - 2.62p_3,$$
$$u_3 = 0.291x_0 - 2.91p_0 - 2.91p_1 - 2.91p_2 + 7.10p_3.$$

If one now specifies a particular sequence of prices and the value of x_0, then a complete numerical solution results. For example, assuming (p_0 = 20, p_1 = 22, p_2 = 30, p_3 = 25, x_0 = 1000), one can append the following Maple commands:

x0:=1000: p[0]:=20: p[1]:=22: p[2]:=30: p[3]:=25:
evalf([seq(x[t],t=0..4)]); evalf([seq(u[t],t=0..3)]);
evalf([seq(lambda[t],t=1..4)]);

to generate the solution for [x_0; x_1, x_2, x_3; x_4], [u], and [λ] as [1000; 794, 566, 260; 0], [206, 227, 308, 260], and [0.60, 0.60, 0.60, 0.60].
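As a quick arithmetic check (this verification is an addition, not part of the original), the rule for u_0 reproduces the reported first-period extraction:

$$u_0 = 0.212(1000) + 7.88(20) - 2.12(22) - 2.12(30) - 2.12(25) = 212 + 157.6 - 46.64 - 63.6 - 53.0 \approx 206.$$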


Given the relative simplicity with which this problem can be solved with standard techniques, students may wonder where there is room for a different approach. The clue to this puzzle lies in the simple fact that one cannot at time 0 pretend to know all future prices or actual amounts of oil that are pumped. For example, it may be that oil rigs break down or workers are on strike so that the optimal amount of oil simply cannot be pumped in a given year. Yet it should be clear that a change in any of the future p or x means that the above sequence of x and u is no longer optimal. The whole problem needs to be recalculated subject to the new p and the known values of x. To avoid recalculating the problem whenever something unexpected happens to either p or x, economists try to find a decision rule or policy function for the control variable u that can be followed regardless of random shocks to price or state variable x. That is the basic idea of dynamic programming.

    FINITE HORIZON PROBLEMS

    The Motivating Example Again

I now recalculate the previous problem using dynamic programming techniques. The objective is, as before, to maximize

$$\sum_{t=0}^{3}\beta^t(p_t u_t - 0.05u_t^2),$$

subject to the transition equations

$$x_{t+1} = x_t - u_t, \quad t = 0, 1, 2, 3,$$

and the boundary condition

$$x_4 = 0.$$

Again, I assume that the company applies the discount factor β = 0.9.

The problem is now solved with the help of the Bellman equations

$$V_t = \max_{u_t}(p_t u_t - 0.05u_t^2) + 0.9V_{t+1}, \quad t = 0, 1, 2, 3, \tag{1}$$

where V_t is known as the value function in dynamic programming terminology. The value function V appears on both the left and the right side of the Bellman equations, although with a different time subscript. The different time subscripts suggest that the dynamic programming problem is solved in a recursive manner, with the solution for V derived first for time t + 1 and then for time t. The lack of a summation sign in equation (1) underscores the fact that the optimization proceeds one period at a time. In line with this recursive approach, the discount factor that is applied to V_{t+1} has an exponent of one. V_{t+1} is discounted by just one period so it can be summed to the value that is provided by the objective function at time t. The argument max in equation (1) implies that the control variable u_t is set to maximize the right-hand side of the Bellman equation for time period t, given the optimal value function for period t + 1 (V_{t+1}).5 Note that both the optimal value function (V_{t+1}) and the optimal policy function for the control


variable at time t + 1 (u_{t+1}) are expressed in terms of the state variable x_{t+1}.6 Because the transition equations link x_{t+1} to x_t and u_t, the right-hand side of the Bellman equation can be maximized with respect to u_t only after all occurrences of x_{t+1} in V_{t+1} are replaced with the term (x_t − u_t). In short, there are more u_t terms in equation (1) than meet the eye.
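To make the substitution explicit (an added restatement of equation (1), using the transition equation):

$$V_t = \max_{u_t}\left[(p_t u_t - 0.05u_t^2) + 0.9\,V_{t+1}(x_t - u_t)\right].$$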

In applying the Bellman equations of equation (1), I start with the boundary condition and the value function for t = 3 and then solve the problem recursively in the backward direction toward period 0.

To obtain the value function for t = 3, I first need to identify the optimal policy function for u for that period. Because x_4 = 0, the transition equation for period 3 is given as

$$x_4 = 0 = x_3 - u_3.$$

Thus, the transition equation for t = 3 conveniently provides the optimal decision rule for the control variable u in period 3,

$$u_3 = x_3. \tag{2}$$

Equation (2) simply says to extract and sell all the oil that remains at the beginning of t = 3 in that same period. Compared with the rule obtained from applying the Lagrange multiplier method in the previous section, equation (2) is much simpler because it is independent of any price variables or any other variables dated t < 3. It is an application of the principle of optimality that underlies dynamic programming: One can do no better than to follow the derived optimal policy or decision rule for the control variable, regardless of what happened to control and state variables in earlier periods.

Now that the optimal policy function for u_3 is known, the value function for t = 3 can be derived as7

$$V_3 = p_3x_3 - 0.05x_3^2. \tag{3}$$

Note the absence of a term V_4 in equation (3). This derives from the fact that no value is added anymore in period 4 because the oil well is depleted. Equation (3) completes the optimization process for period t = 3. The processes for periods 2, 1, and 0 still need to be done.

To solve the optimization problem for t = 2, I substitute equation (3) into the Bellman equation for period 2 as follows:

$$V_2 = \max_{u_2}(p_2u_2 - 0.05u_2^2) + 0.9V_3$$

to obtain

$$V_2 = \max_{u_2}(p_2u_2 - 0.05u_2^2) + 0.9(p_3x_3 - 0.05x_3^2). \tag{4}$$

The right-hand side of equation (4) can be maximized with respect to u_2 only after considering that x_3 is connected with u_2 via the transition equation

$$x_3 = x_2 - u_2. \tag{5}$$

Substituting equation (5) into equation (4) gives


$$V_2 = \max_{u_2}(p_2u_2 - 0.05u_2^2) + 0.9\left[p_3(x_2 - u_2) - 0.05(x_2 - u_2)^2\right]. \tag{6}$$

Now, only the decision variable u_2 and variables that are known or exogenous at time t = 2 are in equation (6). The right-hand side of equation (6) can be maximized with respect to the control variable u_2,

$$\frac{d\left\{(p_2u_2 - 0.05u_2^2) + 0.9\left[p_3(x_2 - u_2) - 0.05(x_2 - u_2)^2\right]\right\}}{du_2} = p_2 - 0.19u_2 - 0.9p_3 + 0.09x_2 = 0.$$

Solving the above first-order condition for u_2 gives the optimal policy or decision rule for period 2,

$$u_2 = 0.4737x_2 + 5.263p_2 - 4.737p_3. \tag{7}$$

Similar to the decision rule in equation (2), this policy rule for the control variable is simpler than the corresponding rule from the Lagrangian multiplier approach in the last section. There is also no need to revise this rule in the light of unexpected shocks to price or the state variable x prior to t = 2. This completes the calculations for t = 2.
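A quick way to confirm equation (7) (this check is an addition): substituting it back into the first-order condition gives

$$0.19(0.4737x_2 + 5.263p_2 - 4.737p_3) = 0.09x_2 + p_2 - 0.9p_3,$$

which holds up to rounding, since 0.19 times 0.4737, 5.263, and 4.737 equals 0.090, 1.000, and 0.900, respectively.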

To find the optimal policy rule for period 1, insert equation (7) into the value function for period 2 to get an expression similar to that in equation (3). After some simplifications, the value function for period 2 can be written as

$$V_2 = 0.474(p_2 + p_3)x_2 - 0.02368x_2^2 + 2.636p_2^2 - 4.734p_2p_3 + 2.132p_3^2.$$

The value function for period 2 needs to be substituted into the Bellman equation for period 1,

$$V_1 = \max_{u_1}(p_1u_1 - 0.05u_1^2) + 0.9V_2.$$

This gives

$$V_1 = \max_{u_1}(p_1u_1 - 0.05u_1^2) + 0.9\left[0.474(p_2 + p_3)x_2 - 0.02368x_2^2 + 2.636p_2^2 - 4.734p_2p_3 + 2.132p_3^2\right]. \tag{8}$$

Making use of the transition equation

$$x_2 = x_1 - u_1 \tag{9}$$

in equation (8), one can maximize its right-hand side with respect to u_1,

$$\frac{d\left\{(p_1u_1 - 0.05u_1^2) + 0.9\left[0.474(p_2 + p_3)(x_1 - u_1) - 0.02368(x_1 - u_1)^2 + 2.636p_2^2 - 4.734p_2p_3 + 2.132p_3^2\right]\right\}}{du_1} = 0.$$

Solving the resulting first-order condition for u_1 yields

$$u_1 = 0.2989x_1 + 7.013p_1 - 2.989p_2 - 2.989p_3.$$


This is the optimal policy or decision rule for period 1. It has some similarities with the equation that determines the optimal value of u_1 from the Lagrangian multiplier approach, but it has fewer terms.

The last step in the solution of the optimal control problem is to obtain the decision or policy rule for period 0. The procedure is the same as before; insert the optimal decision rule for period 1 into the Bellman equation for period 1 and simplify to obtain

$$V_1 = -0.01494x_1^2 + 0.2989(p_1 + p_2 + p_3)x_1 + 3.506p_1^2 - 2.989p_1(p_2 + p_3) + 3.006p_2^2 - 2.989p_2p_3 + 2.556p_3^2.$$

Next, this value function is inserted into the Bellman equation for period 0, which is given as

$$V_0 = \max_{u_0}(p_0u_0 - 0.05u_0^2) + 0.9V_1. \tag{10}$$

After substituting all occurrences of x_1 in this equation with (x_0 − u_0), the right-hand side of equation (10) is maximized with respect to u_0. This yields

$$u_0 = 0.2120x_0 + 7.880p_0 - 2.119p_1 - 2.120p_2 - 2.120p_3.$$

This decision rule for period 0 is the same as the one derived in the previous section using Lagrangian multiplier techniques.

These results can be calculated by Maple with the help of the following code:

restart: N:=3: Digits:=4:
V[N]:=unapply(p[N]*x-0.05*x^2,x);
# the objective function is expressed as a function
# of the state variable x

    for t from N-1 by -1 to 0 do;

    # a do loop is entered for a backward recursion

    V[t]:=p[t]*u-0.05*u^2+0.9*V[t+1](x-u);

    # the Bellman equation is specified; the last term

    # says to replace all values of x in V[t+1], which

    # are dated t+1, with (x-u), which is dated t

    deriv[t]:=diff(V[t],u);

# the right side of the Bellman equation is differen-
# tiated with respect to the control variable

    u_opt[t]:=evalf(solve(deriv[t],u));

    # the first derivative is set to zero and solved for

    # the control variable; the optimal policy function

    # (u_opt) is evaluated numerically

    V[t]:=unapply(evalf(simplify(subs(u=u_opt[t],

    V[t]))),x);

    # u_opt is substituted into the Bellman equation; the

# result is simplified, evaluated numerically, and
# expressed as a function of the state variable x

    od;

    # the do loop ends.
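As a quick sanity check (this snippet is an addition; it uses the u_opt table and N defined in the program above), one can display the recovered decision rules once the loop has finished. The rule for period 0 should match the u_0 equation from the Lagrangian section:

seq(print(u[t] = u_opt[t]), t = 0 .. N-1);
# prints the optimal policy functions for t = 0, 1, 2
# in terms of the state variable x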



It should be apparent that the recursive structure of the dynamic programming problem makes it easy in principle to extend the optimization to a larger number of periods.

    A Nonnumerical Example

Much of the work in economics is about solving optimization problems without specific numerical entries. The next example applies the dynamic programming framework in this environment.

Consider the maximization of utility of a consumer who does not work but lives off a wealth endowment a. Assume that utility depends on the discounted logged value of consumption c from period zero to two,

$$\max U = \sum_{t=0}^{2}\beta^t \ln c_t,$$

where the discount factor is

$$\beta = \frac{1}{1+r},$$

and where r is the relevant interest rate. Consumption is the model's control variable, and the transition equations are given by

$$a_{t+1} = (1+r)a_t - c_t, \quad t = 0, 1, 2.$$

The model's boundary condition is

$$a_3 = 0,$$

which says that wealth is supposed to be depleted at the end of period 2. There are no bequests. The task is to find the optimal consumption sequence.

The solution process starts in the final period and works backward in time. Applying the boundary condition to the transition equation for period t = 2 results in

$$0 = (1+r)a_2 - c_2.$$

This can be solved for c_2, the control variable for t = 2, to get

$$c_2 = (1+r)a_2. \tag{11}$$

This is the decision rule for period t = 2. To find the decision rules for periods t = 1 and t = 0, apply the Bellman equation

$$V_t = \max_{c_t} \ln c_t + \beta V_{t+1}.$$

The Bellman equation for period 1 requires the value function of period 2. The latter is equal to the above Bellman equation for t = 2, with c_2 replaced by its maximized value from equation (11) and with V_3 = 0 because wealth is exhausted at the end of t = 2,

$$V_2 = \ln c_2 = \ln[(1+r)a_2].$$

This value function can now be inserted into the Bellman equation for period 1,

$$V_1 = \max_{c_1} \ln c_1 + \beta V_2,$$


to obtain

$$V_1 = \max_{c_1} \ln c_1 + \beta\ln[(1+r)a_2].$$

Before the right-hand side of the Bellman equation is maximized with respect to c_1, a_2 has to be replaced using the transition equation for t = 1,

$$a_2 = (1+r)a_1 - c_1.$$

The Bellman equation for period 1 is now of the form

$$V_1 = \max_{c_1} \ln c_1 + \beta\ln\{(1+r)[(1+r)a_1 - c_1]\},$$

and its right-hand side can be maximized with respect to the control variable c_1. The first-order condition and c_1 are given as

$$\frac{d\{\ln c_1 + \beta\ln(1+r) + \beta\ln[(1+r)a_1 - c_1]\}}{dc_1} = \frac{1}{c_1} - \frac{\beta}{(1+r)a_1 - c_1} = 0,$$

$$c_1 = \frac{(1+r)a_1}{1+\beta}. \tag{12}$$

The decision rule for t = 0 still needs calculating. It requires the value function for t = 1. This value function is equal to the Bellman equation for period 1, with c_1 replaced by the optimal decision rule, equation (12),

$$V_1 = \ln\frac{a_1(1+r)}{1+\beta} + \beta\ln\frac{a_1(1+r)^2\beta}{1+\beta}.$$

Hence, the Bellman equation for t = 0 is given as

$$V_0 = \max_{c_0} \ln c_0 + \beta V_1.$$

Substituting out a_1 with

$$a_1 = (1+r)a_0 - c_0$$

gives

$$V_0 = \max_{c_0} \ln c_0 + \beta\ln\frac{(1+r)[(1+r)a_0 - c_0]}{1+\beta} + \beta^2\ln\frac{\beta(1+r)^2[(1+r)a_0 - c_0]}{1+\beta}.$$

Maximizing the right-hand side with respect to the control variable c_0 results in the following first-order condition and optimal value for c_0:

$$\frac{1}{c_0} - \frac{\beta+\beta^2}{(1+r)a_0 - c_0} = 0, \qquad a_0(1+r) = c_0(1 + \beta + \beta^2),$$


$$c_0 = \frac{(1+r)a_0}{1+\beta+\beta^2}. \tag{13}$$

A comparison of the consumption decision rules for t = 0, 1, 2 reveals the following pattern:

$$c_t = \frac{a_t(1+r)}{\sum_{i=0}^{2-t}\beta^i}, \quad t = 0, 1, 2,$$

or, more generally, for t = 0, . . . , n,

$$c_t = \frac{a_t(1+r)}{\sum_{i=0}^{n-t}\beta^i}, \quad t = 0, \ldots, n. \tag{14}$$

The ability to recognize and concisely describe such a pattern plays a key role in applications of dynamic programming in economics, in particular those applications that deal with infinite horizon problems. I discuss them next.

The following Maple program solves the above consumption problem:

    restart: N:=2: Digits:=4:

    V[N]:=unapply(log((1+r)*a),a);

    # this initial value function follows from the bound-

    # ary condition

    for t from N-1 by -1 to 0 do

    V[t]:=log(c)+beta*V[t+1]((1+r)*a-c);

# Since V is a function of a, the term in parenthe-
# ses after V[t+1] is substituted for a in V[t+1]

    deriv[t]:=diff(V[t],c);

    c_opt[t]:=solve(deriv[t],c);

    V[t]:=unapply(simplify(subs(c=c_opt[t],V[t])),a);

    od;
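If the program runs as intended, the loop should return policy functions equivalent to (this expected output is an addition, inferred from equations (12) and (13)):

c_opt[1] = (1+r)*a/(1+beta)
c_opt[0] = (1+r)*a/(1+beta+beta^2)

where a denotes the wealth level entering the respective period.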

    INFINITE HORIZON PROBLEMS

A problem often posed in economics is to find the optimal decision rule if the planning horizon does not consist of a fixed number of n periods but of an infinite number of periods. Examining the limiting case n → ∞ may also be of interest if n is not strictly infinite but merely large, because taking the limit leads to a decision rule that is both simplified and the same for every period. Most often that is preferred when compared with a situation where one has to deal with a distinct and rather complex rule for each period from t = 0 to t = n. I illustrate this point with the consumption problem of the last section.

    A Nonnumerical Problem

The optimal consumption decision rule given in equation (14) can be modified so it applies for an infinite time horizon. For taking the limit, the first part of



equation (14) is most useful,

$$\lim_{n\to\infty} c_t = \lim_{n\to\infty}\frac{a_t(1+r)}{\sum_{i=0}^{n-t}\beta^i}.$$

Making use of the geometric series theorem,

$$\sum_{i=0}^{\infty}\beta^i = \frac{1}{1-\beta} \quad\text{for } 0 < \beta < 1,$$

yields

$$\lim_{n\to\infty} c_t = a_t(1+r)(1-\beta) = a_t r.$$

The decision rule for consumption in period t simplifies to the intuitively obvious: Consume in t only what you receive in interest from your endowment in t. The associated transition function for wealth is given as

$$a_{t+1} = (1+r)a_t - c_t = (1+r)a_t - a_t r = a_t,$$

that is, wealth will remain constant over time.

This example illustrates one way of arriving at optimal decision rules that apply to an infinite planning horizon: simply solve the problem for a finite number of periods, identify and write down the pattern for the evolution of the control variable, and take the limit n → ∞. It should be obvious that the same procedure can be applied to identify the value function for time period t. To illustrate this point, recall the sequence of value functions for t = 1, 2,

$$V_2 = \ln[a_2(1+r)], \tag{15}$$

$$V_1 = \ln\frac{a_1(1+r)}{1+\beta} + \beta\ln\frac{a_1(1+r)^2\beta}{1+\beta}. \tag{16}$$

The value function for t = 0 is given as

$$V_0 = \ln\frac{a_0(1+r)}{1+\beta+\beta^2} + \beta\ln\frac{a_0(1+r)^2\beta}{1+\beta+\beta^2} + \beta^2\ln\frac{a_0(1+r)^3\beta^2}{1+\beta+\beta^2}. \tag{17}$$

Proceeding now as in the case of the consumption rule, the following pattern emerges from equations (15) to (17):

$$V_t = \sum_{i=0}^{n-t}\beta^i\ln\frac{a_t(1+r)^{i+1}\beta^i}{\sum_{i=0}^{n-t}\beta^i}. \tag{18}$$

It is possible to check the correctness of the pattern described by equation (18) by letting Maple calculate the implied sequence of the V_t. The Maple code to replicate equations (15) through (17) from equation (18) is given as

restart: n:=2:


    seq(V[t]=sum(beta^i*log(a[t]*(1+r)^(i+1)*beta^i/

    (sum(beta^(i),i=0..n-t))),i=0..n-t),t=0..n);

After verifying that the formula is correct, the limit n → ∞ is taken,

$$\lim_{n\to\infty} V_t = \lim_{n\to\infty}\sum_{i=0}^{n-t}\beta^i\ln\frac{a_t(1+r)^{i+1}\beta^i}{\sum_{i=0}^{n-t}\beta^i}.$$

Because interest centers on finding an expression involving the variable a_t, the idea is to isolate a_t. This can be done as follows:

$$\lim_{n\to\infty} V_t = \lim_{n\to\infty}\sum_{i=0}^{n-t}\beta^i\left[\ln a_t + \ln\frac{(1+r)^{i+1}\beta^i}{\sum_{i=0}^{n-t}\beta^i}\right]$$

or

$$\lim_{n\to\infty} V_t = \lim_{n\to\infty}\left(\sum_{i=0}^{n-t}\beta^i\right)\ln a_t + \lim_{n\to\infty}\sum_{i=0}^{n-t}\beta^i\ln\frac{(1+r)^{i+1}\beta^i}{\sum_{i=0}^{n-t}\beta^i}.$$

The last expression can be rewritten as

$$\lim_{n\to\infty} V_t = \lim_{n\to\infty}\left(\sum_{i=0}^{n-t}\beta^i\right)\ln a_t + h, \tag{19}$$

where h is an expression independent of a. Taking the limit leads to

$$\lim_{n\to\infty} V_t = \frac{1}{1-\beta}\ln a_t + h = \left(1 + \frac{1}{r}\right)\ln a_t + h. \tag{20}$$

With the help of a computer algebra program, students could circumvent equation (18) and the following algebra altogether by going from equation (17) immediately to equation (19) for n = 2 and t = 0. The following sequence of Maple commands takes equation (17), expands it, and isolates the terms in a_0:

restart:
V[0]:=ln(a[0]*(1+r)/(1+beta+beta^2))+
beta*ln(a[0]*(1+r)^2*beta/(1+beta+beta^2))+
beta^2*ln(a[0]*(1+r)^3*beta^2/(1+beta+beta^2));
# the above is equation 17
assume(beta>0,r>0);
assume(a[0],real);
# the assumptions exclude economically senseless
# cases that prevent the program from finding a
# solution; variables with assumptions attached



# are indicated with a ~ in the Maple output

    V[0]:=collect(expand(V[0]),log(a[0]));

    # the collect command factors the specified term,

    # i.e. log(a[0])

    The output of these commands is

$$V_0 = (1+\beta+\beta^2)\ln a_0 + \zeta, \tag{21}$$

where ζ consists of a number of terms in β and r.
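The coefficient on ln a_0 in equation (21) is just the finite geometric sum of equation (19) evaluated at n = 2 and t = 0 (this connecting remark is an addition):

$$\sum_{i=0}^{2}\beta^i = 1 + \beta + \beta^2 \;\to\; \frac{1}{1-\beta} = 20 \quad\text{as } n \to \infty \text{ for } \beta = 0.95,$$

which anticipates the numerical results of the next section.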

    A Numerically Specified Problem

Infinite horizon problems that are numerically specified can be solved by iterating on the value function. I demonstrate this for the above consumption optimization problem. The task is to find the optimal decision rule for consumption regardless of time t. I employ the Bellman equation without time subscripts,

$$V(a) = \max_{c} \ln c + \beta V(a'),$$

where a and a′ denote the current and future value of the state variable, respectively. The time subscripts are eliminated in the Bellman equation to reflect the idea that the value function V(a) needs to satisfy the Bellman equation for any a. The iteration method starts with an initial guess of the solution V(a), which is identified as V^(0)(a). Guessing a possible solution is not as difficult as it sounds; each class of problems typically has associated with it a general form of the solution. For example, in the current case, it is known from solving the consumption problem for t = 2 that the value function will somehow depend on the log of a,

$$V^{(0)}(a) = A + B\ln a.$$

To simplify the algebra, assume that β = 0.95.

A numerically specified initial guess. I convert the initial guess of the value function into numerical form by assuming A = 0 and B = 1,

$$V^{(0)}(a) = \ln a.$$

The initial guess V^(0)(a) is substituted on the right-hand side of the Bellman equation,

$$V(a) = \max_{c} \ln c + 0.95\ln a'.$$

Next, making use of the transition equation,

$$a' = (1 + 0.05263)a - c$$

(because β = 1/(1 + r), the interest rate implied by β = 0.95 is r = 1/0.95 − 1 = 0.05263), to substitute out a′,

$$V(a) = \max_{c} \ln c + 0.95\ln(1.05263a - c),$$

the right-hand side of V(a) is maximized with respect to the control variable c,

$$\frac{dV(a)}{dc} = \frac{1}{c} - \frac{0.95}{1.05263a - c} = 0,$$


which gives c = 0.53981a. Substituting the solution for c back into V(a) yields

$$V^{(1)}(a) = \ln(0.53981a) + 0.95\ln(1.05263a - 0.53981a),$$

and, after simplifying,

$$V^{(1)}(a) = -1.251 + 1.95\ln a.$$

Because

$$V^{(1)}(a) \neq V^{(0)}(a),$$

the value function requires further iterations. For that purpose, I substitute the new guess, V^(1)(a), into the Bellman equation and get

$$V(a) = \max_{c} \ln c + 0.95(-1.251 + 1.95\ln a').$$

Substitution of the transition equation for a′ gives

$$V(a) = \max_{c} \ln c + 0.95\left[-1.251 + 1.95\ln(1.05263a - c)\right],$$

which simplifies to

$$V(a) = \max_{c} \ln c - 1.1884 + 1.8525\ln(1.05263a - c).$$

Maximizing the right-hand side of V(a) with respect to c yields c = 0.36902a. Substituting the optimal policy rule for c back into V(a) gives

$$V^{(2)}(a) = \ln(0.36902a) - 1.1884 + 1.8525\ln(1.05263a - 0.36902a),$$

which simplifies to

$$V^{(2)}(a) = -2.8899 + 2.8525\ln a.$$

Because

$$V^{(2)}(a) \neq V^{(1)}(a),$$

another iteration is needed. Instead of continuing the above process of iterating on the value function by hand, I make use of the following Maple program to iterate until convergence is achieved:

restart: Digits:=8: beta:=0.95: V[0]:=unapply(ln(a),a):

    for n from 0 to 200 do

V[n]:=unapply(ln(c)+beta*V[n]((1+((1/beta)-1))*a-c),(a,c));

    # V is a mathematical function of both a and c; in

    # the text V is expressed as a function of only the

    # state variable a

    deriv:=diff(V[n](a,c),c);

c_opt:=expand(evalf(solve(deriv,c)));
# the command expand often simplifies the calcula-
# tions that follow especially when used in combi-
# nation with simplify or evalf



V[n+1]:=unapply(evalf(expand(simplify(subs(c=c_opt,V[n](a,c))))),a);

    # an example for an interlocked command structure:

    # the substitute command is executed first, followed

    # by simplify, expand, evalf, and unapply

    od;
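A minimal convergence check (an addition; V[201] is the last function produced by the loop above, and for a value function of the form A + B ln a the expression below isolates B):

evalf(diff(V[201](a),a)*a);
# the coefficient of ln(a) in the final iterate;
# it should be close to 1/(1-beta) = 20 after convergence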

After about 200 iterations with eight digits of accuracy, the value function reaches

$$V(a) = -58.886 + 19.9994\ln a,$$

and the corresponding policy function for the control variable

$$c = 0.05263a.$$

These results are effectively identical to the ones that can be obtained from equation (20) and the corresponding decision rule for c.8 Specifically, equation (20) predicts the value function to be

$$V(a) = 20\ln a + h,$$

and the corresponding decision rule for c is identical to the one above.

A numerically unspecified initial guess. I start with a numerically unspecified guess of the value function,

$$V^{(0)}(a) = A + B\ln a, \tag{22}$$

where A and B are constants to be determined. In this case, the value function will not be iterated on. Instead, the idea is to algebraically derive values for A and B that equalize V(a) on the left and V(a) on the right side of the Bellman equation. The solution process begins with the Bellman equation

$$V(a) = \max_{c} \ln c + 0.95V^{(0)}(a')$$

and the substitution of the initial guess of the value function,

$$V(a) = \max_{c} \ln c + 0.95(A + B\ln a').$$

Next, the transition equation is used to substitute out the state variable a′,

$$V(a) = \max_{c} \ln c + 0.95\left[A + B\ln(1.05263a - c)\right]. \tag{23}$$

The right-hand side of equation (23) is now maximized with respect to the control variable c,

$$\frac{d\{\ln c + 0.95[A + B\ln(1.05263a - c)]\}}{dc} = 0,$$

which results in

$$c = \frac{21.0526a}{20 + 19B}.$$


The optimal c is substituted back into equation (23), which, upon expansion, results in

$$V(a) = 3.047 + \ln a - \ln(20 + 19B) + 0.95A + 2.846B + 0.95B\ln a + 0.95B\ln B - 0.95B\ln(20 + 19B). \tag{24}$$

If the initial guess of the value function, equation (22), had only one unknown parameter, say A, I could substitute this initial guess for V(a) on the left side of equation (24) and then solve for the unknown value of A. With two unknown parameters in the value function, A and B, this method is not useful. An alternative solution procedure rests on a comparison of coefficients. In particular, the coefficient of ln a implied by equation (24) is set equal to B, which is the coefficient of ln a in equation (22). The resulting equation is solved for B. Once B is known, the right-hand side of equation (24) is set equal to the right-hand side of equation (22). The resulting equation determines A.

To proceed with the coefficient comparison, I factor the terms in ln a in equation (24),

$$V(a) = 3.047 + (1 + 0.95B)\ln a - \ln(20 + 19B) + 0.95A + 2.846B + 0.95B\ln B - 0.95B\ln(20 + 19B), \tag{25}$$

set the coefficient of ln a in equation (25) equal to B,

$$1 + 0.95B = B,$$

and solve for B. This results in B = 20. To derive parameter A, substitute B = 20 into equations (22) and (25), set the two equal,

$$-2.9444 + 0.95A + 20\ln a = A + 20\ln a,$$

and solve for A. The solution is A = −58.888. With the parameters A and B known, the policy function for c is given as

$$c = 0.05263a,$$

and the value function is

$$V(a) = -58.888 + 20.0\ln a.$$

Both functions match the ones obtained in earlier sections with other solution methods.
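The values of A and B can also be confirmed in closed form (an added check that uses the infinite-horizon rule c = ra derived earlier): with wealth constant at a and c = 0.05263a consumed every period,

$$V(a) = \sum_{t=0}^{\infty}0.95^t\ln(0.05263a) = \frac{\ln a + \ln 0.05263}{0.05} = 20\ln a - 58.888,$$

which matches B = 20 and A = −58.888.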

The Maple commands to replicate the above solution method are given as

    restart: Digits:=8:

assume(a>0): beta:=0.95:
V[0]:=unapply(A+B*log(a),a);
# this is the initial guess of the value function

V:=unapply(ln(c)+beta*V[0]((1+((1/beta)-1))*a-c),(a,c));



    # the initial guess is substituted into the Bellman

    # equation; V has to be expressed as a mathematical

    # function of both a and c; in the text, V is ex-

    # pressed as a function of only the state variable a

deriv:=diff(V(a,c),c);
# take the first derivative of the right-hand side of V
c_opt:=expand(evalf(solve(deriv,c)));
# solve for the optimal c

    V:=expand(simplify(subs(c=c_opt,V(a,c))));

    V:=collect(V,ln(a));

    # substitute the optimal c into the Bellman equation

    # and factor (ln a)

    B:=solve(diff(V,a)*a=B,B);

    # isolate the coefficient of (ln a), set it equal to

# B, and solve for B; to isolate the coefficient of
# (ln a) in V[1], apply the chain rule: dV[1]/d(ln a)

    # = (dV[1]/da)*(da/d(ln a))

    V0:=A + B*log(a):

    A:=solve(V = V0,A);

    # solve for the coefficient A

    `c `=c_opt;`V(a)`=V0;

    A STOCHASTIC INFINITE HORIZON EXAMPLE

Dynamic programming problems can be made stochastic by letting one or more state variables be influenced by a stochastic disturbance term. Most of the applications of dynamic programming in macroeconomics follow this path. I provide one example.

The basic real business cycle (RBC) model (discussed in King 2002) is very much related to that of Kydland and Prescott (1982) and Long and Plosser (1983). Many RBC models are variations of this basic model.

The problem is to maximize the expected discounted sum of present and future utility, where utility is assumed to be a logarithmic function of consumption (C) and leisure (1 − n),

$$\max E_0 \sum_{t=0}^{\infty}\beta^t\left[\ln C_t + \delta\ln(1 - n_t)\right],$$

where n represents labor input and available time is set at unity. Maximization is subject to the market clearing constraint that output y equals the sum of the two demand components, consumption and investment,

$$y_t = C_t + k_t,$$

where investment equals the capital stock k because capital is assumed to depreciate fully in one period. There are two transition equations, one for each of the


two state variables, output and the state of technology (A),

$$y_{t+1} = A_{t+1}k_t^{\alpha}n_t^{1-\alpha}, \tag{26}$$

$$A_{t+1} = A_t^{\rho}e^{\varepsilon_{t+1}}. \tag{27}$$

The transition equation of y is a dynamic Cobb-Douglas production function, and the transition equation of A follows an autoregressive process subject to a disturbance term ε, which is assumed to be normally distributed. C, n, and k are the control variables. To make the problem easier to follow, I use numbers for the parameters, β = 0.9, α = 0.3, δ = 1.0, and ρ = 0.8.

The Bellman equation for this problem is given as

$$V(y,A) = \max_{C,k,n} \ln C + \ln(1-n) + 0.9EV(y',A'),$$

where E is the expectations operator. The Bellman equation can be simplified by substituting for C and, hence, reducing the problem to two control variables,

$$V(y,A) = \max_{k,n} \ln(y-k) + \ln(1-n) + 0.9EV(y',A'). \tag{28}$$

The next step in the solution process is to find an initial value function. In the earlier consumption problem, which was similar in terms of the objective function but had only one state variable a, the value function was of the type

$$V(a) = A + B\ln a.$$

Because there are two state variables in the present problem, I try the analogous solution

$$V^{(0)}(y,A) = Z + G\ln y + H\ln A. \tag{29}$$

From here on, the problem follows the method used for the previous section. Substituting the trial solution from equation (29) into equation (28) gives

$$V(y,A) = \max_{k,n} \ln(y-k) + \ln(1-n) + 0.9E(Z + G\ln y' + H\ln A'). \tag{30}$$

Before the right-hand side of equation (30) can be maximized, the y′ and A′ that appear in the expectations term are substituted out by their transition equations (26) and (27),

$$V(y,A) = \max_{k,n} \ln(y-k) + \ln(1-n) + 0.9E\left[Z + G\ln(A^{0.8}e^{\varepsilon}k^{0.3}n^{0.7}) + H\ln(A^{0.8}e^{\varepsilon})\right].$$

The above equation is expanded to

$$V(y,A) = \max_{k,n} \ln(y-k) + \ln(1-n) + 0.9Z + 0.27G\ln k + 0.63G\ln n + 0.72(G+H)\ln A + 0.9(G+H)E\varepsilon \tag{31}$$

and then maximized with respect to both k and n. This results in two first-order conditions,

$$-\frac{1}{y-k} + \frac{0.27G}{k} = 0, \quad\text{and}\quad -\frac{1}{1-n} + \frac{0.63G}{n} = 0,$$


which can be solved for the two control variables to yield

$$k = \frac{0.27Gy}{1 + 0.27G}, \qquad n = \frac{0.63G}{1 + 0.63G}.$$

These solutions are substituted back into equation (31), and the expectations operator is employed to remove the two disturbance terms. These operations yield

$$V(y,A) = \ln\left(y - \frac{0.27Gy}{1+0.27G}\right) + \ln\left(1 - \frac{0.63G}{1+0.63G}\right) + 0.9Z + 0.27G\ln\frac{0.27Gy}{1+0.27G} + 0.63G\ln\frac{0.63G}{1+0.63G} + 0.72(G+H)\ln A. \tag{32}$$

As in the consumption example of the last section, the implied coefficients of ln y and ln A and the remainder term of equation (32) have to be compared with G, H, and Z of equation (29). To make such a coefficient comparison feasible, factor the ln y and ln A terms on the right-hand side of equation (32) to obtain

$$V(y,A) = \theta + (1 + 0.27G)\ln y + 0.72(G+H)\ln A, \tag{33}$$

where

$$\theta = 9.21 + 0.9Z + 3.5G + 0.9G\ln G - \ln(100+27G) - 0.27G\ln(100+27G) - \ln(100+63G) - 0.63G\ln(100+63G). \tag{34}$$

Note that equation (33) is now in the same format as equation (29) so that the coefficients of these two equations can be compared. Setting the coefficient of ln y in equation (33) equal to G yields the determining equation

$$1 + 0.27G = G,$$

which results in G = 1.37. Employing this value of G, set the coefficient of ln A in equation (33) equal to H,

$$0.72(1.37 + H) = H,$$

and obtain H = 3.52. Next, substitute the solutions for G and H into equation (34), set the resulting expression for θ equal to Z, and solve for Z. This results in Z = −20.853. With the values of Z, G, and H known, the value function that solves the problem is given as

$$V(y,A) = -20.853 + 1.37\ln y + 3.52\ln A,$$

and the policy functions or decision rules for k, n, and C are

$$k = 0.27y, \qquad n = 0.463, \qquad C = 0.73y.$$

Note that the addition of a stochastic term to variable A has had no effect on the optimal decision rules for k, n, and C. This result is not an accident but, as discussed at the beginning of this article, the key reason why dynamic programming is preferred over a traditional Lagrangian multiplier method in dynamic optimization problems: unpredictable shocks to the state variables do not change the optimal decision rules for the control variables.9
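A quick numerical verification of these rules (added; it uses G = 1.37 from above):

$$k = \frac{0.27(1.37)}{1 + 0.27(1.37)}\,y = \frac{0.370}{1.370}\,y \approx 0.27y, \qquad n = \frac{0.63(1.37)}{1 + 0.63(1.37)} = \frac{0.863}{1.863} \approx 0.463,$$

and C = y − k = 0.73y, so the market clearing constraint y = C + k holds exactly.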

The above solution process can be automated and processed with other parameters if one uses the following Maple commands:

    restart:

C:=y-k; yy:=AA*k^alpha*n^(1-alpha); AA:=A^rho*exp(epsilon);

    # basic definitions

    V0:=ln(C)+delta*ln(1-n)+beta*(Z+G*ln(yy)+H*ln(AA));

    # the trial solution for the value function is

    # inserted into the Bellman equation

    delta:=1.; beta:=0.9; rho:=0.8; alpha:=0.3;

assume(y>0,A>0,k>0,n>0):

    # exclude economically senseless variable ranges

V:=expand(V0); k_opt:=diff(V,k)=0; n_opt:=diff(V,n)=0;
solve({k_opt,n_opt},{n,k});

    # the right-hand side of the Bellman equation is maxi-

    # mized with respect to the control variables n and

    # k;

    assign(%): `V `=V;

    # k and n are set equal to their optimal values

    epsilon:=0: `V `=V;

    # in applying the expectations operator all epsilon

    # terms drop out

simplify(expand(V)): collect(%,ln(y)): V:=collect(%,ln(A));

    # the value function is factored with respect to (ln

    # y) and (ln A)

    G:=solve(diff(V,y)*y=G,G);

    H:=solve(diff(V,A)*A=H,H);

    Z:=solve(V-G*ln(y)-H*ln(A)=Z);

    # the values for G, H, and Z are derived

    V:=Z+G*ln(y)+H*ln(A); `k `=k; `n `=n; `C `=C;

    CONCLUSION

I introduced dynamic programming techniques by way of example, with the help of the computer algebra system Maple. The emphasis of this introduction was to build confidence and intuition. I solved the examples one step at a time, with sufficient repetition of key steps to enhance the learning process. I limited the coverage of solution techniques to avoid confusion at the initial stages of learning dynamic programming. To integrate the material better, I used the same examples to introduce different techniques. One example was on the optimal extraction of a natural resource, another on consumer utility maximization, and


the final example covered a simple real business cycle model, which could be considered an extension of the utility maximization example.

Every example was accompanied by a Maple computer code to make it possible for readers not only to replicate a given example but also to modify it in various ways and, ultimately, to encourage application of the techniques to other areas in economics. As the next step in the process of learning dynamic programming techniques, I recommend that students study and solve the economic examples discussed in Adda and Cooper (2003). Some of them will seem immediately familiar because of the coverage in this article. Others will open up new avenues of applying dynamic programming techniques.

    NOTES

1. I found the brief overview of discrete time dynamic programming in Parlar (2000) helpful because of its focus on coding in Maple.
2. In trying to modify the examples, the reader may want to start with very simple changes to minimize program syntax or rounding errors and to avoid situations where no solution is found at all. The latter can occur easily, even for seemingly minor modifications. In fact, it is for precisely this reason that a large literature has developed in recent years on more powerful numerical solution methods. Judd (1998) provided an excellent starting point. Miranda and Fackler (2002) also discussed numerical techniques to solve dynamic programming problems. They focused on the software package MATLAB.
3. Kamien and Schwartz (1981) contained one of the most thorough treatments of optimal control. Most major textbooks in mathematical methods for economists also contain chapters on optimal control.
4. Comments are identified with a pound sign (#). They are not executed by Maple. Comments refer to the line(s) of Maple code right above.
5. As seen later, knowledge of the optimal value function for time period t + 1 implies also knowing the optimal policy function for the control variable at time t + 1.
6. As suggested at the end of the previous section, this may be regarded as the key innovation of the dynamic programming approach over the classical Lagrangian multiplier method.
7. The value function is equal to the Bellman equation in equation (1), except that all occurrences of u_3 are replaced by the optimal policy function for u_3, and the max argument is removed.
8. If the calculations are done with fewer digits of accuracy, there is more of a discrepancy in results regardless of the number of iterations.
9. The stochastic nature of state variables affects how the state variables move through time. Different sequences of shocks to the state variables produce different time paths, even with the same decision rules for the control variables. From a large number of such stochastic time paths, RBC researchers calculate standard deviations or other moments of the state and control variables and compare them with the moments of actually observed variables to check their models. Compare Adda and Cooper (2003) on this.

    REFERENCES

Adda, J., and R. Cooper. 2003. Dynamic economics: Quantitative methods and applications. Cambridge, MA: MIT Press.
Bellman, R. 1957. Dynamic programming. Princeton, NJ: Princeton University Press.
Judd, K. L. 1998. Numerical methods in economics. Cambridge, MA: MIT Press.
Kamien, M. I., and N. L. Schwartz. 1981. Dynamic optimization: The calculus of variations and optimal control in economics and management. New York: Elsevier North-Holland.
King, I. P. 2002. A simple introduction to dynamic programming in macroeconomic models. http://www.business.auckland.ac.nz/Departments/econ/workingpapers/full/Text230.pdf (last accessed August 21, 2005).
Kydland, F. E., and E. C. Prescott. 1982. Time to build and aggregate fluctuations. Econometrica 50 (6): 1345-71.
Ljungqvist, L., and T. J. Sargent. 2000. Recursive macroeconomic theory. Cambridge, MA: MIT Press.


Long, J. B., and C. I. Plosser. 1983. Real business cycles. Journal of Political Economy 91 (1): 39-69.
Miranda, M. J., and P. L. Fackler. 2002. Applied computational economics and finance. Cambridge, MA: MIT Press.
Parlar, M. 2000. Interactive operations research with Maple: Methods and models. Boston: Birkhäuser.
Sargent, T. J. 1987. Dynamic macroeconomic theory. Cambridge, MA: Harvard University Press.
Stokey, N. L., and R. E. Lucas Jr., with E. C. Prescott. 1989. Recursive methods in economic dynamics. Cambridge, MA: Harvard University Press.
