Introduction to numerical methods for finance students



http://fluid.stanford.edu/~fringer/teaching/numerical_methods_02/


Handout 2 27/07/02

Lecture 1: Errors and Efficiency

Why numerical methods?

The solutions to most mathematical problems in the natural world cannot be determined analytically. If you are asked to compute the perimeter of a circle with radius r, you know that it is given by

C = 2πr . (1)

This is because the perimeter can be computed analytically, since we can use calculus to integrate along the perimeter of a circle to yield

C = ∮ ds = ∫_0^{2π} r dθ = rθ|_0^{2π} = 2πr . (2)

Now let's say you are asked to compute the perimeter of an ellipse with major and minor axes of a and b, respectively. Using the same technique (albeit more involved), you obtain

C = 4 ∫_0^{π/2} √(a^2 sin^2 θ + b^2 cos^2 θ) dθ = ? . (3)

This integral is known as an elliptic integral of the second kind, and has no known closed-form solution. The only way, therefore, to obtain the perimeter of an ellipse is to obtain it numerically. Since there is no closed-form solution, you can only say that the perimeter of the ellipse is given by the function C(a, b), which you could call the "ellipse" function. If your calculator had the "ellipse" function on it, it would compute it numerically, just as it computes the sine and cosine functions for you. There is no "closed form" solution to the sine function. Your calculator knows what the solution is because it can approximate the sine function using the first three terms of the Taylor series approximation

sin θ ≈ θ − θ^3/3! + θ^5/5! . (4)

Certainly, problems in the natural world can be much larger and more complex than the ellipse problem. If you are asked to calculate the drag on an automobile moving at 100 km/hr, you could calculate it exactly, but you would need roughly 20,000 years to do so on the fastest computers available today. Clearly you could not even conceive of solving this problem without numerical methods.

Types of errors

Referring back to the ellipse problem, if you did decide to solve it numerically, evidently there would be errors in the way you computed it. Since a computer can only approximate the solution, the degree of accuracy depends on several factors.


Truncation error

Every problem can be solved in many different ways. The error associated with the numerical method you choose to solve the problem is termed the truncation error. In the Taylor series approximation of the sine function, the exact solution is given by

sin θ = ∑_{n=0}^{∞} (−1)^n θ^{2n+1}/(2n+1)! . (5)

Clearly, it is impossible to compute an infinite number of terms, so in this case the truncation error is given by the value of the terms you choose not to include in the calculation. Using only three terms, the truncation error appears in the approximation of the sine function as

sin θ = θ − θ^3/3! + θ^5/5! + ∑_{n=3}^{∞} (−1)^n θ^{2n+1}/(2n+1)! . (6)

In this case,

Truncation error = ∑_{n=3}^{∞} (−1)^n θ^{2n+1}/(2n+1)! . (7)

As the number of terms increases, the magnitude of the truncation error decreases. No matter how many terms are computed, however, the result will always be an approximation.
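As a quick illustration of truncation error, the short Python sketch below compares the three-term Taylor approximation of equation (4) against the library sine; the function name taylor_sin3 is just a label chosen here, not from the handout.

    import math

    def taylor_sin3(theta):
        # First three terms of the Taylor series: theta - theta^3/3! + theta^5/5!
        return theta - theta**3 / math.factorial(3) + theta**5 / math.factorial(5)

    for theta in (0.1, 0.5, 1.0, 2.0):
        approx = taylor_sin3(theta)
        exact = math.sin(theta)
        # The difference is the truncation error of equation (7)
        print(theta, approx, exact, abs(exact - approx))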

Error resulting from problem assumptions

More often than not you are forced to solve a particular problem with assumptions that make the problem only an approximate one. For example, consider the problem of sound propagation in a concert hall. For practical purposes you use the wave equation to determine the acoustical properties of the hall, and you assume that the speed of sound is constant everywhere in the room. No matter how accurately you solve the wave equation, you will be stuck with the error associated with this approximation. This error is independent of the numerical algorithm or the computer you use to solve the problem.

Error in original data

The original data you are given can also impose limits on the accuracy of the final solution to your problem. If you are asked to predict the weather, no matter how accurate your algorithm or equations are, if you don't know the present conditions exactly, you won't be able to have an exact solution for what will happen to the weather tomorrow.

Propagated error

Whether or not the original conditions are exact, any error that develops in the solution will propagate through to other parts of the computation and manifest itself in the final solution. For example, suppose you have developed an algorithm that solves Fermat's last theorem in five steps. If steps 2-4 are close to exact but their solution depends on the solution of step 1, and step 1 is approximate, then the error associated with step 1 will manifest itself in the


final solution regardless of how accurately steps 2-4 are solved. This type of error shows up a lot in time-dependent solutions that are impulsively started from rest. A great deal of error results initially, and even if the solver becomes more accurate as time progresses, the error associated with the initial conditions will propagate throughout the entire solution.

Human error and bugs

This form of error in a numerical method is what causes the last 1% of a particular numerical project to take 99% of the time. Bugs arise in code development that are hidden to the programmer, who may think the error results from the numerical method itself, while a simple foolish error may have been made while deriving a particular method. These types of errors arise even in the largest of projects that take millions of human- and computer-hours to complete. For example, the American National Aeronautics and Space Administration lost one of its Martian surveyors due to a misunderstanding of the units used to quantify the thruster forces!

Absolute vs. Relative error

Error is meaningless unless it is compared to the actual quantity that is being computed. This is true not only for numerical methods, but for all approximations in general. For example, if you are asked to estimate what the annual operating budget of your company is, but you can only estimate it to within R 100,000, then you are better off if your company is a billion rand-per-year corporation than if it is a million rand-per-year corporation, because you will only be 0.01% off! If you work for a million rand-per-year company then you will be 10% off, and certainly your boss will not be very happy. In this case the absolute error is

absolute error = |true value − approximate value| = R 100,000 , (8)

while the relative error is

relative error = absolute error / |true value| . (9)

The absolute error of R 100,000 is meaningless unless it is given as a ratio to the true value. In numerical methods, the relative error is used to determine whether or not a particular solution is solved to within some predetermined level of accuracy.

Significant digits

When describing the accuracy of a numerical result, it is often useful to quantify it in terms of the number of significant digits in which it agrees with the true value. We usually write integers and rational numbers with the last digit repeating in order to count significant digits; for example, 1.0 = 0.9999.... The last digit is significant only if it differs from the exact value by less than 5, that is, if it is within 4 of the exact value.


For example,

if true value = 0.99999999

and approximate value = 0.9998 ,

then significant digits = 4

since 9− 8 < 5 .

If the last digit differs by 5 or more, then it is not significant. For example,

if true value = 0.99999999

and approximate value = 0.9994 ,

then significant digits = 3

since 9− 4 ≥ 5 .

Computer round-off error

A computer can only store a discrete set of numbers on the real number line. These numbers are referred to as floating-point numbers. Every floating-point number is stored in a particular base system. Most of us think in base 10, and calculators usually work in base 10, but computers usually work in base 2 or base 16. Given its base, a floating-point number has three parts, namely, the sign, the fraction, and the exponent. The first table below gives examples of floating-point numbers in base 10, and the second gives examples in base 2.

Decimal number    FP notation       Sign    Fraction    Exponent (base 10)
22.45             .2245 × 10^2       +       2245        2
−0.00227          .227 × 10^{−2}     −       227         −2

Decimal number    Binary equiv.    FP notation         Sign    Fraction    Exponent (base 2)
16.25             10000.01         .1000001 × 2^5       +       1000001     5
−.078125          −0.000101        −.101 × 2^{−3}       −       101         −3

A computer's precision is a function of how much space it has available to store the sign, the fraction, and the exponent for each number. Suppose a computer that employs base-10 arithmetic can store the sign, two digits for the fraction, and an exponent that must be one of −2, −1, 0, 1, 2. The number must then have the form

X = ±.d1d2 × 10^p , p ∈ {−2, −1, 0, 1, 2} , d1, d2 ∈ {0, . . . , 9} . (10)

With this number system we are limited to −99 ≤ X ≤ −10^{−3} and 10^{−3} ≤ X ≤ 99. Clearly, numbers that do not fall in this range cannot exist on this computer. Those that are too large in absolute value are termed either positive overflow or negative overflow, while those too small in absolute value are termed either positive underflow or negative underflow. Because of the limited number system, severe round-off error would occur if


one attempted to perform calculations using this number system. For example, the number .02154 would be represented as .21 × 10^{−1} = .021.

The IEEE number system is the most common number system used on personal computers, and obviously it has much more versatility than the X number system above. Single precision IEEE floating-point numbers use a total of 32 bits to store a number (including the sign, fraction, and exponent), while double precision IEEE floating-point numbers use 64 bits. The smallest normalized positive double precision number is about 2.225E−308, and the largest is about 1.798E308. Even so, because of the finite set of possible numbers, round-off error still occurs, and sometimes it can even affect the outcome of a numerical result significantly.
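The short Python sketch below, offered here as an illustration rather than as part of the original handout, prints the double precision limits and shows a familiar round-off artifact of base-2 storage.

    import sys

    print(sys.float_info.min)   # smallest normalized positive double, about 2.225e-308
    print(sys.float_info.max)   # largest double, about 1.798e308
    print(sys.float_info.eps)   # machine epsilon, about 2.22e-16

    # Round-off error: 0.1 and 0.2 are not exactly representable in base 2,
    # so their sum is not exactly 0.3.
    print(0.1 + 0.2 == 0.3)     # False
    print(abs((0.1 + 0.2) - 0.3))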

How good is a numerical method?

Number of operations

If two numerical methods are used to achieve the same result, the numerical method that uses the fewest operations is more efficient. Likewise, the numerical method in which the number of operations grows more slowly with the problem size is also more efficient. "Big O" notation is used to quantify this behavior of numerical algorithms. For example, consider an algorithm that computes the sum of N numbers, represented by

A1 = ∑_{i=1}^{N} i . (11)

This requires a total of N − 1 operations. If N doubles, then the method effectively takes twice as many operations. In Big O notation, this is referred to as requiring O(N) operations. Now consider another algorithm that computes the sum of all of the possible distinct products of a list of non-repeating numbers. Since there are a total of

Np = N + (N − 1) + (N − 2) + ...+ 2 + 1 (12)

possible products, then

Np = ∑_{i=1}^{N} i = N(N + 1)/2 . (13)

There are then a total of Np multiplications required, and the sum then takes another Np − 1 operations. Therefore, this algorithm takes a total of

2Np − 1 = N(N + 1) − 1 = N^2 + N − 1 (14)

operations to complete. If the total number of entries N doubles, then for large N the total number of calculations effectively quadruples. As N becomes very large, then, the method is an O(N^2) method.

Speed

A given algorithm will require a given number of "floating point operations". However, different people can implement the same algorithm in different ways on a computer. Likewise,


different computers can perform differently with the same algorithm. Therefore, coupled with the total number of operations, it is desirable to compare the total time a certain algorithm takes to perform a certain number of operations. A good measure of an algorithm when it is implemented on a particular computer is its performance in FLOPS, or floating point operations per second. The FLOPS count of an algorithm is given by

FLOPS = (total number of operations)/(time to compute in seconds) . (15)

The FLOPS performance of an algorithm for large-scale high-performance numerical computations is usually referred to in MegaFLOPS (10^6 FLOPS), GigaFLOPS (10^9 FLOPS), or TeraFLOPS (10^{12} FLOPS).
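As a rough illustration of equation (15), the Python sketch below times a matrix multiply and estimates a FLOPS figure; the 2N^3 operation count for an N × N matrix product is a standard estimate, and the measured value depends entirely on the machine.

    import time
    import numpy as np

    N = 500
    A = np.random.rand(N, N)
    B = np.random.rand(N, N)

    start = time.perf_counter()
    C = A @ B
    elapsed = time.perf_counter() - start

    operations = 2 * N**3          # roughly N^3 multiplications and N^3 additions
    flops = operations / elapsed   # equation (15)
    print("MegaFLOPS:", flops / 1e6)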


Handout 3 30/07/02

Lecture 2: Solution of nonlinear equations I

The model nonlinear equation

In this lecture we will be using numerical methods to determine the roots of the nonlinear equation f(x) = 0. As the model equation, we will compute the roots of

f(x) = x^2 + x − 6 , (1)

whose shape is shown in Figure 1 and which has roots at x = −3 and x = +2. We will use the

Figure 1: Graph of x^2 + x − 6.

fact that the roots of f(x) are known exactly and compare them to the approximate values computed with each numerical method that follows.

Bisection method

The bisection method starts with two guesses x1 and x2 such that f(x1) and f(x2) lie on either side of the root, so that f(x1) × f(x2) < 0. A guess is made by bisecting the interval between the two guesses to yield a new guess of x3 = (x1 + x2)/2. The procedure repeats by using x3 and whichever of the other guesses brackets the root with x3.

To find the root in between x1 = −4 and x2 = −1, we follow the steps outlined in the following pseudocode:


Pseudocode for bisection

Start with two guesses x1 and x2 such that f(x1)× f(x2) < 0.

1. Make a guess for the root with x3 = (x1 + x2)/2.

2. If f(x1) × f(x3) < 0, then set x2 = x3.
Otherwise set x1 = x3.

3. Repeat until |x1 − x2| < 2ε.
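A minimal Python translation of this pseudocode is sketched below, applied to the model equation f(x) = x^2 + x − 6 with the same bracket and tolerance as the table that follows; the function and variable names are choices made here.

    def f(x):
        return x**2 + x - 6

    def bisection(f, x1, x2, eps=1e-4):
        # Assumes f(x1) and f(x2) bracket a root, i.e. f(x1)*f(x2) < 0.
        while abs(x1 - x2) >= 2 * eps:
            x3 = 0.5 * (x1 + x2)          # step 1: bisect the interval
            if f(x1) * f(x3) < 0:         # step 2: keep the half that brackets the root
                x2 = x3
            else:
                x1 = x3
        return 0.5 * (x1 + x2)

    print(bisection(f, -4.0, -1.0))       # approaches the root at x = -3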

Using this technique we obtain the following history of the solution process with ε = 10^{−4}:

Iteration     x1        x2        x3        f(x3)     |x1 − x2|/2
 1          -4.0000   -1.0000   -2.5000   -2.2500    1.5000
 2          -4.0000   -2.5000   -3.2500    1.3125    0.7500
 3          -3.2500   -2.5000   -2.8750   -0.6094    0.3750
 4          -3.2500   -2.8750   -3.0625    0.3164    0.1875
 5          -3.0625   -2.8750   -2.9688   -0.1553    0.0938
 6          -3.0625   -2.9688   -3.0156    0.0784    0.0469
 7          -3.0156   -2.9688   -2.9922   -0.0390    0.0234
 8          -3.0156   -2.9922   -3.0039    0.0195    0.0117
 9          -3.0039   -2.9922   -2.9980   -0.0098    0.0059
10          -3.0039   -2.9980   -3.0010    0.0049    0.0029
11          -3.0010   -2.9980   -2.9995   -0.0024    0.0015
12          -3.0010   -2.9995   -3.0002    0.0012    0.0007
13          -3.0002   -2.9995   -2.9999   -0.0006    0.0004
14          -3.0002   -2.9999   -3.0001    0.0003    0.0002

The advantage of the bisection method is that we know exactly how accurate our solution will be given the number of iterations we use. Each guess halves the uncertainty associated with the previous one, as shown in the error history in Figure 2. The uncertainty of the initial guess will be that the root is given by

ROOT_first guess = x3 ± |x1 − x2|/2 . (2)

The next guess will yield an error that is half as large, and so on, so that the nth guess will yield a root given by

ROOT_nth guess = x3 ± |x1 − x2|/2^n . (3)

The other advantage of the bisection method is that it is guaranteed to find a root as long as the first two guesses bracket one. Even if the function has more than one root between the


Figure 2: Error history of the bisection method.

first two guesses, a root will be found. But which root is found depends on the initial values of x1 and x2. For example, consider the function

f(x) = x^3 − 5x^2 + 3x + 1 , (4)

whose roots are given by 4.2361, 1.0000, and −0.2361. If the bisection method is used with x1 = −1 and x2 = 6, as shown in Figure 3, the result will be x3 = 4.2361 because f((x1 + x2)/2) = f(2.5) < 0. If bisection is used with x1 = −6 and x2 = 6, then the

Figure 3: Graph of x^3 − 5x^2 + 3x + 1, showing the roots at x = −0.2361, 1.0, and 4.2361.

result will be x3 = −0.2361. The middle root at x3 = 1.0 can only be found by choosing −0.2361 < x1 < 1.0 and 1.0 < x2 < 4.2361.


The only drawback to the bisection method is that the rate of convergence is rather slow. The following methods yield faster convergence, but they are not as easy to implement, nor are they necessarily as robust.

Newton’s method

Given a function f(x) = x^2 + x − 6, if we would like to find a root then we can use the Taylor series expansion of the function about some initial guess to help us find it. If we start at x1, then we can obtain information about the function in the vicinity of that point, say at x2 = x1 + ∆x, with the Taylor series about x1,

f(x2) = f(x1 + ∆x) = f(x1) + ∆x (df/dx)|_{x=x1} + HOT (5)

f(x2) ≈ f(x1) + (x2 − x1) (df/dx)|_{x=x1} . (6)

If we are looking for the root f(x2) = 0, then we can use the Taylor series to approximate what value of x2 will yield f(x2) = 0, so that we can solve for what x2 needs to be in order to approximate the root,

f(x2) ≈ f(x1) + (x2 − x1) (df/dx)|_{x=x1} (7)

0 = f(x1) + (x2 − x1) (df/dx)|_{x=x1} (8)

x2 = x1 − f(x1)/f'(x1) . (9)

Because this will only yield an approximation to the root, we need to continue this procedure until |f(x2)| < ε or |x2 − x1| < ε. The following is a pseudocode for Newton's method:

Pseudocode for Newton’s method

1. Choose a starting value x1.

2. Shoot for the root with x2 = x1 − f(x1)/f ′(x1).

3. If |x2 − x1| < ε or |f(x2)| < ε then done!

4. Otherwise set x1 = x2 and return to step 2.
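A minimal Python sketch of this pseudocode is given below for the model equation, with the derivative supplied analytically; the names are illustrative only.

    def f(x):
        return x**2 + x - 6

    def fprime(x):
        return 2*x + 1

    def newton(f, fprime, x1, eps=1e-5, max_iter=50):
        for _ in range(max_iter):
            x2 = x1 - f(x1) / fprime(x1)   # step 2: shoot for the root
            if abs(x2 - x1) < eps or abs(f(x2)) < eps:
                return x2                  # step 3: converged
            x1 = x2                        # step 4: iterate
        return x2

    print(newton(f, fprime, -4.0))         # converges to the root at x = -3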

Which root is found depends on the initial guess x1. If the function is f(x) = x^2 + x − 6 and the initial guess is x1 = −4, then the root found is x2 = −3. The results of this iteration are shown in the table below with ε = 10^{−5}. The convergence history of the error is shown in Figure 4. Comparing the results from the bisection method to Newton's method, we see that Newton's


Iteration     x1        x2        f(x2)     |x1 − x2|
 1          -4.0000   -3.1429    0.7347     0.8571
 2          -3.1429   -3.0039    0.0193     0.1390
 3          -3.0039   -3.0000    0.0000     0.0039
 4          -3.0000   -3.0000    0.0000     0.0000

Figure 4: Error history of Newton’s method.

method converges in only 4 steps while the bisection method converges in 14 steps, and the tolerance for Newton's method is 10 times smaller! This is because the error of each step in Newton's method is roughly equal to the square of the error of the previous step, whereas the error in the bisection method is only half that of the previous step.

Which root is obtained depends on the initial guess x1. How the root depends on the guess is given in the table below. The particular root found obeys the rule of thumb of Newton's method,

Initial guess x1          Root found
−∞ < x1 < −1/2            −3
−1/2 < x1 < +∞            +2

in which the root finder always goes "downhill" from the initial guess, as shown in Figure 5. This rule of thumb is what leads to one drawback of Newton's method. Figure 6 shows a function for which the root finder will fail because it will always move downhill towards the minimum of the function, which is not the root. The second drawback of Newton's method is that it requires that we evaluate the function as well as its derivative at every iteration. For larger calculations this can be quite a drawback. Also, more often than not, it is difficult to compute the derivative f'(x) because the closed-form solution of f(x) may not be known in advance.


Figure 5: The dependence of the root on the initial guess for Newton's method. The root-finder always goes "downhill".

Figure 6: Newton's method will fail if the initial guess x1 is chosen to the right of the dashed vertical line.


Handout 4 01/08/02

Lecture 3: Solution of nonlinear equations II

Secant method

The secant method is used as an alternative to Newton's method when the derivative f'(x) is not known analytically. It derives its name from the line drawn through two points on a curve, which is called a "secant". From Figure 1, we see that the approximate root xNR can

Figure 1: Illustration of Newton’s method being used to approximate the root xR at xNR.

be obtained with Newton’s method using

xNR = x0 − f(x0)/f'(x0) . (1)

When the derivative is not known, we need to choose two points x1 and x2 in the vicinity of x0 and write Newton's method using approximate values at x0 that are obtained from the known quantities at x1 and x2. As shown in Figure 2, if the function is assumed to be linear, then the approximate value of x0 is

x0 ≈ (x1 + x2)/2 . (2)

The approximate values of f(x0) and its derivative are then given by

f(x0) ≈ [f(x1) + f(x2)]/2 (3)

f'(x0) ≈ [f(x2) − f(x1)]/(x2 − x1) . (4)

Substituting these approximations into the formula for Newton’s method (1) yields


Figure 2: Illustration of the secant method being used to approximate the root xR at x3.

x3 = (x1 + x2)/2 − ( [f(x1) + f(x2)]/2 ) / ( [f(x2) − f(x1)]/(x2 − x1) ) , (5)

which, after some manipulation, yields

x3 = x2 − f(x2) (x1 − x2)/(f(x1) − f(x2)) . (6)

This same result could have been obtained by using similar triangles, for which, as shown in Figure 3,

tan θ = [f(x1) − f(x2)]/(x2 − x1) = f(x2)/(x3 − x2) . (7)

Figure 3: Illustration of the secant method being used to approximate the root xR at x3 with similar triangles.

When we use the two points x1 and x2 to obtain a guess for x3, x3 will be closer to the root than either of our two initial guesses. This is usually the case as long as x2 is closer to the root than x1. If not, then the method will just take longer to converge, as long as f(x)


is continuous. In order to speed up convergence, we swap x1 and x2 if x1 is closer to the root. A simple test for determining whether or not we need to swap is whether |f(x1)| < |f(x2)|, which works most of the time; in the cases where it doesn't work, the algorithm still converges, just at a slower rate. The pseudocode for the secant method is shown below.

Pseudocode for the secant method

1. Start with two guesses x1 and x2 near the root.

2. If |f(x1)| < |f(x2)| then swap x1 and x2 with
Set xtemp = x2
Set x2 = x1
Set x1 = xtemp

3. Make a guess for the root with x3 = x2 − f(x2) (x1 − x2)/(f(x1) − f(x2)).

4. Set x1 = x2
Set x2 = x3

5. Return to step 3 until |f(x3)| < ε.
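A minimal Python sketch of the secant pseudocode, again applied to the model equation f(x) = x^2 + x − 6, is given below; the names are illustrative only.

    def f(x):
        return x**2 + x - 6

    def secant(f, x1, x2, eps=1e-5, max_iter=100):
        # Step 2: make sure x2 is the guess closer to the root (smaller |f|).
        if abs(f(x1)) < abs(f(x2)):
            x1, x2 = x2, x1
        for _ in range(max_iter):
            # Step 3: secant update.
            x3 = x2 - f(x2) * (x1 - x2) / (f(x1) - f(x2))
            x1, x2 = x2, x3               # step 4: shift the guesses
            if abs(f(x3)) < eps:          # step 5: convergence test
                return x3
        return x3

    print(secant(f, 1.0, 6.0))            # converges to the root at x = 2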

Figures 4 and 5 demonstrate the effect of swapping the initial guesses for the function f(x) = 3x + 10 sin(x). The non-swapped cases still converge to within ε = 10^{−5}, but at a slower rate, and, as shown in Figure 5, swapping does not have to change the convergence history by very much. Nevertheless, it is always a good idea to swap x1 and x2 when |f(x1)| < |f(x2)|.

Figure 4: Demonstration of the effect of swapping the initial guesses when x1 = 1 and x2 = 6.


Figure 5: Demonstration of the effect of swapping the initial guesses when x1 = 1 and x2 = 8.

Linear interpolation

When the slope is relatively large near the root, the secant method can overshoot the root and slow down convergence. This can be remedied by starting with guesses for x1 and x2 that bracket the root, and making sure that the subsequent values of x1 and x2 bracket the root as well. This is essentially the same as the bisection method, except that each time the estimate for the root is interpolated linearly rather than bisected. Figure 6 depicts the secant interpolation diagram for the case when x1 and x2 bracket the root. The pseudocode for

Figure 6: Illustration of the linear interpolation method being used to approximate the root xR at x3 with guesses x1 and x2 that bracket the root.

the linear interpolation method is identical to that for the secant method, except that additional steps are required to ensure that x1 and x2 always bracket the root.


Pseudocode for the linear interpolation method

1. Start with two guesses x1 and x2 such that f(x1)f(x2) < 0.

2. Make a guess for the root with x3 = x2 − f(x2) (x1 − x2)/(f(x1) − f(x2)).

3. If f(x1)f(x3) < 0
Set x2 = x3
Otherwise set x1 = x3.

4. Return to step 2 until |f(x3)| < ε.

Fixed-point iteration

When the equation f(x) = 0 can be written in the form x = g(x), then the roots r that satisfy f(r) = 0 are known as the fixed points of the function g. The fixed-point iteration determines the roots (under the right conditions) using

x_{n+1} = g(x_n) , n = 0, 1, . . . . (8)

The requirements are that g(x) and g'(x) be continuous within an interval surrounding a root xR and that |g'(x)| < 1 within that interval. If x1 is chosen so that it is within the interval, then the method will converge to xR. This is a sufficient condition for convergence, but not a necessary one. That is, other cases may converge even if they do not satisfy |g'(x)| < 1 in the interval. The pseudocode for fixed-point iteration is given below.

Pseudocode for fixed-point iteration

1. Start with an initial guess x1 such that |g'(x1)| < 1.

2. Make a guess for the root with x2 = g(x1).

3. If |x2 − x1| < ε then done!
Otherwise set x1 = x2 and return to step 2.
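A minimal Python sketch of fixed-point iteration is given below. For the model equation x^2 + x − 6 = 0, one rearrangement with |g'(x)| < 1 near the root x = 2 is x = g(x) = 6/(x + 1); this particular choice of g is an illustration, not taken from the handout.

    def g(x):
        # Rearrangement of x^2 + x - 6 = 0 as x = 6/(x + 1); |g'(x)| < 1 near x = 2.
        return 6.0 / (x + 1.0)

    def fixed_point(g, x1, eps=1e-5, max_iter=100):
        for _ in range(max_iter):
            x2 = g(x1)                 # step 2
            if abs(x2 - x1) < eps:     # step 3: converged
                return x2
            x1 = x2
        return x2

    print(fixed_point(g, 1.0))         # converges to the root at x = 2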


Handout 5 05/08/02

Lecture 4: Numerical differentiation

Finite difference formulas

Suppose you are given a data set of N non-equispaced points at x = xi with values f(xi), as shown in Figure 1. Because the data are not equispaced in general, ∆x_i ≠ ∆x_{i+1}.

Figure 1: Data set of N points at x = xi with values f(xi)

Let's say we wanted to compute the derivative f'(x) at x = xi. For simplicity of notation, we will refer to the value of f(x) at x = xi as f_i. Because, in general, we do not know the form of f(x) when dealing with discrete points, we need to determine the derivatives of f(x) at x = xi in terms of the known quantities f_i. Formulas for the derivatives of a data set can be derived using Taylor series.

The value of f(x) at x = x_{i+1} can be written in terms of the Taylor series expansion of f about x = xi as

f_{i+1} = f_i + ∆x_{i+1} f'_i + (∆x_{i+1}^2/2) f''_i + (∆x_{i+1}^3/6) f'''_i + O(∆x_{i+1}^4) . (1)

This can be rearranged to give us the value of the first derivative at x = xi as

f'_i = (f_{i+1} − f_i)/∆x_{i+1} − (∆x_{i+1}/2) f''_i − (∆x_{i+1}^2/6) f'''_i + O(∆x_{i+1}^3) . (2)

If we assume that the value of f''_i does not change significantly with changes in ∆x_{i+1}, then this yields the first order approximation to the first derivative of f(x) at x = xi, which is written as

f'_i = (f_{i+1} − f_i)/∆x_{i+1} + O(∆x_{i+1}) . (3)


This is known as a forward difference. The first order backward difference can be obtained by writing the Taylor series expansion about f_i to obtain f_{i−1} as

f_{i−1} = f_i − ∆x_i f'_i + (∆x_i^2/2) f''_i − (∆x_i^3/6) f'''_i + O(∆x_i^4) , (4)

which can be rearranged to yield the backward difference of f(x) at xi as

f'_i = (f_i − f_{i−1})/∆x_i + O(∆x_i) . (5)

The first order forward and backward difference formulas are first order accurate approximations to the first derivative. This means that decreasing the grid spacing by a factor of two will only increase the accuracy of the approximation by a factor of two. We can increase the accuracy of the finite difference formula for the first derivative by using both of the Taylor series expansions about f_i,

f_{i+1} = f_i + ∆x_{i+1} f'_i + (∆x_{i+1}^2/2) f''_i + (∆x_{i+1}^3/6) f'''_i + O(∆x_{i+1}^4) (6)

f_{i−1} = f_i − ∆x_i f'_i + (∆x_i^2/2) f''_i − (∆x_i^3/6) f'''_i + O(∆x_i^4) . (7)

Subtracting equation (7) from (6) yields

f_{i+1} − f_{i−1} = (∆x_{i+1} + ∆x_i) f'_i + [(∆x_{i+1}^2 − ∆x_i^2)/2] f''_i + [(∆x_{i+1}^3 + ∆x_i^3)/6] f'''_i + O(∆x_{i+1}^4) + O(∆x_i^4)

(f_{i+1} − f_{i−1})/(∆x_{i+1} + ∆x_i) = f'_i + [(∆x_{i+1}^2 − ∆x_i^2)/(2(∆x_{i+1} + ∆x_i))] f''_i + [(∆x_{i+1}^3 + ∆x_i^3)/(6(∆x_{i+1} + ∆x_i))] f'''_i + O(∆x_{i+1}^4/(∆x_{i+1} + ∆x_i)) + O(∆x_i^4/(∆x_{i+1} + ∆x_i)) , (8)

which can be rearranged to yield

f'_i = (f_{i+1} − f_{i−1})/(∆x_{i+1} + ∆x_i) − [(∆x_{i+1}^2 − ∆x_i^2)/(2(∆x_{i+1} + ∆x_i))] f''_i + O((∆x_{i+1}^3 + ∆x_i^3)/(6(∆x_{i+1} + ∆x_i))) . (9)

In most cases, if the spacing of the grid points is not too erratic, so that ∆x_{i+1} ≈ ∆x_i, equation (9) can be written as the central difference formula for the first derivative,

f'_i = (f_{i+1} − f_{i−1})/(2∆x_i) + O(∆x_i^2) . (10)

What is meant by the “order of accuracy”?

Suppose we are given a data set of N = 16 points on an equispaced grid, as shown in Figure 2, and we are asked to compute the first derivative f'_i at i = 2, . . . , N − 1 using the forward, backward, and central difference formulas (3), (5), and (10). If we refer to the approximation


Figure 2: A data set consisting of N = 16 points.

of the first derivative as δf/δx, then these three formulas for the first derivative on an equispaced grid with ∆x_i = ∆x can be written as

Forward difference:   δf/δx = (f_{i+1} − f_i)/∆x , (11)

Backward difference:  δf/δx = (f_i − f_{i−1})/∆x , (12)

Central difference:   δf/δx = (f_{i+1} − f_{i−1})/(2∆x) . (13)

These three approximations to the first derivative of the data shown in Figure 2 are plotted in Figure 3. Now let's say we are given five more data sets, each of which defines the same function f(xi), but each one has twice as many grid points as the previous one, as shown in Figure 4. The most accurate approximations to the first derivative will be those that use the most refined data, with N = 512 data points. In order to quantify how much more accurate the solution gets as we add more data points, we can compare the derivative computed with each data set to that computed with the most resolved data set. To compare them, we can plot the difference in the derivative at x = 0.5 and call it the error, such that

Error = | (δf/δx)_n − (δf/δx)_{n=6} | , (14)

where n = 1, . . . , 5 is the data set and n = 6 corresponds to the most refined data set. The result is shown in Figure 5 on a log-log plot. For all three cases we can see that the error closely follows the form

Error = k ∆x^n , (15)

where k = 1.08 and n = 1 for the forward and backward difference approximations, and k = 8.64 and n = 2 for the central difference approximation. When we plot the error of a numerical method and it follows the form of equation (15), we say that the method is


Figure 3: Approximation to the first derivative of the data shown in Figure 2 using three different approximations.

nth order and that the error can be written as O(∆x^n). Because n = 1 for the forward and backward approximations, they are said to be first order methods, while since n = 2 for the central approximation, it is a second order method.
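The order of accuracy can be checked numerically. The Python sketch below, an illustration assuming f(x) = sin(x) so that the exact derivative is known, halves ∆x repeatedly and prints the forward and central difference errors at x = 0.5; the forward error drops roughly by a factor of 2 per halving (first order), the central error by roughly 4 (second order).

    import math

    f = math.sin
    exact = math.cos(0.5)          # exact derivative of sin at x = 0.5

    dx = 0.1
    for _ in range(5):
        forward = (f(0.5 + dx) - f(0.5)) / dx                 # equation (11)
        central = (f(0.5 + dx) - f(0.5 - dx)) / (2 * dx)      # equation (13)
        print(dx, abs(forward - exact), abs(central - exact))
        dx /= 2                    # halve the grid spacing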

Taylor tables

The first order finite difference formulas in the previous sections were written in the form

df/dx = δf/δx + Error , (16)

where δf/δx is the approximate form of the first derivative df/dx, with some error that determines the order of accuracy of the approximation. In this section we define a general method of estimating derivatives with arbitrary order of accuracy. We will assume equispaced points, but the analysis can be extended to arbitrarily spaced points. The nth derivative of a discrete function f_i at points x = xi can be written in the form

d^n f/dx^n |_{x=xi} = δ^n f/δx^n + O(∆x^m) , (17)

where

δ^n f/δx^n = ∑_{j=−Nl}^{+Nr} a_{j+Nl} f_{i+j} , (18)

and m is the order of accuracy of the approximation, the a_{j+Nl} are the coefficients of the approximation, and Nl and Nr define the width of the approximation stencil. For example, the central difference approximation to the first derivative is

f'_i = −(1/(2∆x)) f_{i−1} + 0 · f_i + (1/(2∆x)) f_{i+1} + O(∆x^2) , (19)
     = a_0 f_{i−1} + a_1 f_i + a_2 f_{i+1} + O(∆x^2) . (20)


Figure 4: The original data set and 5 more, each with twice as many grid points as the previous one.

In this case, Nl = 1, Nr = 1, a_0 = −1/(2∆x), a_1 = 0, and a_2 = +1/(2∆x). In equation (18) the discrete values f_{i+j} can be written in terms of the Taylor series expansion about x = xi as

f_{i+j} = f_i + j∆x f'_i + ((j∆x)^2/2) f''_i + . . . (21)
        = f_i + ∑_{k=1}^{∞} ((j∆x)^k/k!) f^{(k)}_i . (22)

Using this Taylor series approximation with m + 2 terms for the f_{i+j} in equation (18), where m is the order of accuracy of the finite difference formula, we can substitute these values into equation (17) and solve for the coefficients a_{j+Nl} to derive the appropriate finite difference formula.

As an example, suppose we would like to determine a second order accurate approximation to the second derivative of a function f(x) at x = xi using the data at x_{i−1}, x_i, and


Figure 5: Depiction of the error in computing the first derivative for the forward, backward, and central difference formulas.

x_{i+1}. Writing this in the form of equation (17) yields

d^2 f/dx^2 = δ^2 f/δx^2 + O(∆x^2) , (23)

where, from equation (18),

δ^2 f/δx^2 = a_0 f_{i−1} + a_1 f_i + a_2 f_{i+1} . (24)

The Taylor series approximations to f_{i−1} and f_{i+1} to O(∆x^4) are given by

f_{i−1} ≈ f_i − ∆x f'_i + (∆x^2/2) f''_i − (∆x^3/6) f'''_i + (∆x^4/24) f^{(iv)}_i , (25)

f_{i+1} ≈ f_i + ∆x f'_i + (∆x^2/2) f''_i + (∆x^3/6) f'''_i + (∆x^4/24) f^{(iv)}_i . (26)

Rather than substitute these into equation (24), we create a Taylor table, which requires much less writing, as follows. If we add the columns in the table then we have

a_0 f_{i−1} + a_1 f_i + a_2 f_{i+1} = (a_0 + a_1 + a_2) f_i + (−a_0 + a_2) ∆x f'_i + (a_0 + a_2) (∆x^2/2) f''_i + (−a_0 + a_2) (∆x^3/6) f'''_i + (a_0 + a_2) (∆x^4/24) f^{(iv)}_i . (27)


Term in (24)     f_i    ∆x f'_i    ∆x^2 f''_i    ∆x^3 f'''_i    ∆x^4 f^{(iv)}_i
a_0 f_{i−1}      a_0    −a_0       a_0/2         −a_0/6         a_0/24
a_1 f_i          a_1    0          0             0              0
a_2 f_{i+1}      a_2    a_2        a_2/2         a_2/6          a_2/24
(sum)            0      0          1             ?              ?

Because we would like the terms containing f_i and f'_i on the right hand side to vanish, we must have a_0 + a_1 + a_2 = 0 and −a_0 + a_2 = 0. Furthermore, since we want to retain the second derivative on the right hand side, we must have a_0/2 + a_2/2 = 1. This yields three equations in three unknowns for a_0, a_1, and a_2, namely,

a_0 + a_1 + a_2 = 0
−a_0 + a_2 = 0
a_0/2 + a_2/2 = 1 , (28)

whose solution is given by a_0 = a_2 = 1 and a_1 = −2. Substituting these values into equation (27) results in

f_{i−1} − 2f_i + f_{i+1} = ∆x^2 f''_i + (∆x^4/12) f^{(iv)}_i , (29)

which, after rearranging, yields the second order accurate finite difference formula for the second derivative as

f''_i = (f_{i−1} − 2f_i + f_{i+1})/∆x^2 + O(∆x^2) , (30)

where the error term is given by

Error = −(∆x^2/12) f^{(iv)}_i . (31)
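The three equations in (28) can also be solved numerically. The Python sketch below, offered as an illustration, sets up the linear system from the Taylor table with numpy and recovers a_0 = a_2 = 1, a_1 = −2.

    import numpy as np

    # Rows: coefficient of f_i, of dx*f'_i, and of (dx^2/2)*f''_i in equation (27).
    M = np.array([[1.0, 1.0, 1.0],     # a0 + a1 + a2 = 0
                  [-1.0, 0.0, 1.0],    # -a0 + a2 = 0
                  [0.5, 0.0, 0.5]])    # a0/2 + a2/2 = 1
    rhs = np.array([0.0, 0.0, 1.0])

    a = np.linalg.solve(M, rhs)
    print(a)   # [ 1. -2.  1.]  ->  f''_i ~ (f_{i-1} - 2 f_i + f_{i+1}) / dx^2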

As another example, let us compute the second order accurate one-sided difference formula for the first derivative of f(x) at x = xi using x_i, x_{i−1}, and x_{i−2}. The Taylor table for this example is given below. By requiring that a_0 + a_1 + a_2 = 0, −2a_0 − a_1 = 1, and

Term             f_i    ∆x f'_i    ∆x^2 f''_i    ∆x^3 f'''_i    ∆x^4 f^{(iv)}_i
a_0 f_{i−2}      a_0    −2a_0      2a_0          −4a_0/3        2a_0/3
a_1 f_{i−1}      a_1    −a_1       +a_1/2        −a_1/6         a_1/24
a_2 f_i          a_2    0          0             0              0
(sum)            0      1          0             ?              ?

2a_0 + a_1/2 = 0, we have a_0 = 1/2, a_1 = −2, and a_2 = 3/2. Therefore, the second order accurate one-sided finite difference formula for the first derivative is given by

df/dx = (f_{i−2} − 4f_{i−1} + 3f_i)/(2∆x) + O(∆x^2) , (32)

where the error term is given by

Error = (∆x^2/3) f'''_i . (33)


Higher order finite difference formulas can be derived using the Taylor table method described in this section. These are tabulated in Applied Numerical Analysis, sixth edition, by C. F. Gerald & P. O. Wheatley, Addison-Wesley, 1999, pp. 373-374.


Handout 6 06/08/02

Lecture 5: Numerical integration

Discretizing the integral

In this lecture we will derive methods of computing integrals of functions that cannot be integrated analytically. A typical function whose integral cannot be evaluated analytically is the error function,

Erf(y) = (2/√π) ∫_0^y e^{−x^2} dx . (1)

To leave the analysis in its most general form, we will consider an evaluation of the integral

∫_a^b f(x) dx . (2)

This integral is evaluated numerically by splitting up the domain [a, b] into N equally spaced intervals, as shown in Figure 1. Because we assume that the intervals are constant, the interval width is given by

Figure 1: Discretization of a function f(x) into N = 8 equally spaced subintervals over [a, b].

h = ∆x = x_{i+1} − x_i . (3)

The idea behind the numerical integration formulas is to approximate the integral in each subinterval and add up the N approximate integrals to obtain the integral over [a, b].


Trapezoidal rule

The Trapezoidal rule approximates the function within each subinterval using the leading terms in the Taylor series expansion about xi, such that, in the range [xi, xi+1],

f(x) = f_i + (x − x_i) f'_i + (1/2)(x − x_i)^2 f''_i + O((x − x_i)^3) . (4)

Using this approximation, we can evaluate the integral over [xi, xi+1] with

∫_{x_i}^{x_{i+1}} f(x) dx = ∫_{x_i}^{x_{i+1}} [f_i + (x − x_i) f'_i + (1/2)(x − x_i)^2 f''_i] dx , (5)

where we have omitted the truncation error term, since the last term will end up being the error term in the analysis. Making a change of variables such that

s = (x − x_i)/(x_{i+1} − x_i) = (x − x_i)/h , (6)

we have

∫_{x_i}^{x_{i+1}} f(x) dx = h ∫_0^1 [f_i + hs f'_i + (1/2) h^2 s^2 f''_i] ds ,
                          = [hs f_i + (1/2) h^2 s^2 f'_i + (1/6) h^3 s^3 f''_i]|_0^1 ,
                          = h f_i + (1/2) h^2 f'_i + (1/6) h^3 f''_i .

Substituting in an approximation for the first derivative

f'_i = (f_{i+1} − f_i)/h − (h/2) f''_i , (7)

we have

∫_{x_i}^{x_{i+1}} f(x) dx = h f_i + (1/2) h^2 [(f_{i+1} − f_i)/h − (h/2) f''_i] + (1/6) h^3 f''_i ,
                          = (1/2) h (f_i + f_{i+1}) − (1/12) h^3 f''_i , (8)

which shows that the Trapezoidal rule approximates the integral of the function over the subinterval [xi, xi+1] as the area of the trapezoid created by the function values f_i and f_{i+1}, as shown in Figure 2.

The integral over [a, b] is evaluated by taking the sum of the approximate integrals evaluated in each subinterval as

∫_a^b f(x) dx = ∑_{i=0}^{N−1} ∫_{x_i}^{x_{i+1}} f(x) dx ,
             = ∑_{i=0}^{N−1} [(1/2) h (f_i + f_{i+1}) − (1/12) h^3 f''_i] ,
             = (1/2) h (f_0 + 2f_1 + 2f_2 + . . . + 2f_{N−2} + 2f_{N−1} + f_N) − (h^3/12) ∑_{i=0}^{N−1} f''_i .


Figure 2: Depiction of how the trapezoidal rule approximates the integral on the subinterval [xi, xi+1].

The error term is given by

Error = −(h^3/12) (f''_0 + f''_1 + f''_2 + . . . + f''_{N−1}) ,
      = −(N h^3/12) [(f''_0 + f''_1 + f''_2 + . . . + f''_{N−1})/N] .

If the mean value of f''_i is given by

(f''_0 + f''_1 + f''_2 + . . . + f''_{N−1})/N , (9)

then we know that it must lie within the bounds of f''(x), and hence it can be represented as f''(ξ) for some ξ such that

f''(ξ) = (f''_0 + f''_1 + f''_2 + . . . + f''_{N−1})/N . (10)

Therefore, since Nh = (b− a), the error becomes

Error = −[(b − a) h^2/12] f''(ξ) = O(h^2) , (11)

which shows that the trapezoidal rule is second order accurate.

Simpson’s rules

Simpson’s 1/3 rule

Simpson's 1/3 rule approximates the function within the interval [xi, xi+2] as a quadratic, as shown in Figure 3. This is done by writing the Taylor series expansion of f(x) about


Figure 3: Depiction of how Simpson's 1/3 rule approximates the function f(x) with a quadratic through xi, xi+1, and xi+2.

x = x_{i+1} to obtain

f(x) = f_{i+1} + (x − x_{i+1}) f'_{i+1} + (1/2)(x − x_{i+1})^2 f''_{i+1} + (1/6)(x − x_{i+1})^3 f'''_{i+1} + (1/24)(x − x_{i+1})^4 f^{(iv)}_{i+1} + O((x − x_{i+1})^5) .

The integral in the subinterval [xi, xi+2] is then given by

∫_{x_i}^{x_{i+2}} f(x) dx = ∫_{x_i}^{x_{i+2}} [f_{i+1} + (x − x_{i+1}) f'_{i+1} + (1/2)(x − x_{i+1})^2 f''_{i+1} + (1/6)(x − x_{i+1})^3 f'''_{i+1} + (1/24)(x − x_{i+1})^4 f^{(iv)}_{i+1}] dx ,

where the truncation error has been left off since the last term will end up being the error. Making a change of variables such that

s = 2(x − x_{i+1})/(x_{i+2} − x_i) = (x − x_{i+1})/h , (12)

we have

∫_{x_i}^{x_{i+2}} f(x) dx = h ∫_{−1}^{+1} [f_{i+1} + hs f'_{i+1} + (1/2) h^2 s^2 f''_{i+1} + (1/6) h^3 s^3 f'''_{i+1} + (1/24) h^4 s^4 f^{(iv)}_{i+1}] ds , (13)

which becomes

∫_{x_i}^{x_{i+2}} f(x) dx = [hs f_{i+1} + (1/2) h^2 s^2 f'_{i+1} + (1/6) h^3 s^3 f''_{i+1} + (1/24) h^4 s^4 f'''_{i+1} + (1/120) h^5 s^5 f^{(iv)}_{i+1}]|_{−1}^{+1} ,
                          = 2h f_{i+1} + (1/3) h^3 f''_{i+1} + (1/60) h^5 f^{(iv)}_{i+1} .


Using the second order accurate approximation to the second derivative

f''_{i+1} = (f_i − 2f_{i+1} + f_{i+2})/h^2 − (h^2/12) f^{(iv)}_{i+1} , (14)

the integral becomes

∫_{x_i}^{x_{i+2}} f(x) dx = 2h f_{i+1} + (1/3) h^3 [(f_i − 2f_{i+1} + f_{i+2})/h^2 − (h^2/12) f^{(iv)}_{i+1}] + (1/60) h^5 f^{(iv)}_{i+1} ,
                          = (1/3) h (f_i + 4f_{i+1} + f_{i+2}) − (1/90) h^5 f^{(iv)}_{i+1} . (15)

The integral over [a, b] is obtained by summing the approximate integrals over the pairs of subintervals, as in

∫_a^b f(x) dx = ∑_{i=0,2,4,...}^{N−2} ∫_{x_i}^{x_{i+2}} f(x) dx ,
             = ∑_{i=0,2,4,...}^{N−2} [(1/3) h (f_i + 4f_{i+1} + f_{i+2}) − (1/90) h^5 f^{(iv)}_{i+1}] .

The sum is given by

(1/3) h ( f_0 + 4f_1 + f_2 +
          f_2 + 4f_3 + f_4 +
          f_4 + 4f_5 + f_6 +
          . . . +
          f_{N−6} + 4f_{N−5} + f_{N−4} +
          f_{N−4} + 4f_{N−3} + f_{N−2} +
          f_{N−2} + 4f_{N−1} + f_N ) ,

which becomes

∫_a^b f(x) dx = (1/3) h (f_0 + 4f_1 + 2f_2 + 4f_3 + . . . + 4f_{N−3} + 2f_{N−2} + 4f_{N−1} + f_N) − (1/90) h^5 ∑_{i=0,2,4,...}^{N−2} f^{(iv)}_{i+1} .

The error term is given by

Error = −(1/90) h^5 ∑_{i=0,2,4,...}^{N−2} f^{(iv)}_{i+1} , (16)

which, using the same arguments as those for the Trapezoidal rule, becomes

Error = −(1/180)(b − a) h^4 f^{(iv)}(ξ) = O(h^4) , (17)

which shows that Simpson’s 1/3 rule is fourth order accurate.


Simpson’s 3/8 rule

Simpson's 3/8 rule approximates the function within the subinterval [xi, xi+3] using a cubic. The Taylor series expansion is performed about x_{i+3/2} to obtain

f(x) = f_{i+3/2} + (x − x_{i+3/2}) f'_{i+3/2} + (1/2)(x − x_{i+3/2})^2 f''_{i+3/2} + (1/6)(x − x_{i+3/2})^3 f'''_{i+3/2} + (1/24)(x − x_{i+3/2})^4 f^{(iv)}_{i+3/2} + O((x − x_{i+3/2})^5) . (18)

Integrating this function in a similar manner to that used for the 1/3 rule yields

∫_a^b f(x) dx = (3/8) h (f_0 + 3f_1 + 3f_2 + 2f_3 + 3f_4 + 3f_5 + . . . + 2f_{N−3} + 3f_{N−2} + 3f_{N−1} + f_N) (19)
              − (1/80)(b − a) h^4 f^{(iv)}(ξ) .

Summary of integration formulas and pseudocodes

Trapezoidal rule

∫_a^b f(x) dx = (1/2) h (f_0 + 2f_1 + 2f_2 + . . . + 2f_{N−2} + 2f_{N−1} + f_N) + Error

Error = −(1/12)(b − a) h^2 f''(ξ) = O(h^2)

1. If fi and h are already known discretely on an equispaced grid with N + 1 points, then proceed to step 2.
Otherwise, choose interval [a, b] and set h = (b − a)/N.
for i = 1 to N + 1
Set xi = a + h(i − 1)
Set fi = f(xi)
end

2. Set I = 0
for i = 2 to N
Set I = I + h fi
end
Set I = I + (1/2) h (f1 + fN+1)

3. The integral is given by I.


Simpson’s 1/3 rule (N divisible by 2)

∫_a^b f(x) dx = (1/3) h (f_0 + 4f_1 + 2f_2 + 4f_3 + . . . + 4f_{N−3} + 2f_{N−2} + 4f_{N−1} + f_N) + Error

Error = −(1/180)(b − a) h^4 f^{(iv)}(ξ) = O(h^4)

1. If fi and h are already known discretely on an equispaced grid with N + 1 points, where N is even, then proceed to step 2.
Otherwise, choose interval [a, b] and set h = (b − a)/N, with N even.
for i = 1 to N + 1
Set xi = a + h(i − 1)
Set fi = f(xi)
end

2. Set I = 0
for i = 1 to N/2
Set I = I + (4/3) h f2i
end
for i = 1 to N/2 − 1
Set I = I + (2/3) h f2i+1
end
Set I = I + (1/3) h (f1 + fN+1)

3. The integral is given by I.
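A minimal Python translation of the Simpson's 1/3 pseudocode is sketched below, using the same illustrative error-function integrand; with N = 100 it agrees with math.erf(1.0) to many more digits than the trapezoidal rule, consistent with fourth order accuracy.

    import math

    def simpson13(f, a, b, N):
        # N must be even.
        h = (b - a) / N
        fx = [f(a + h * i) for i in range(N + 1)]
        I = (4.0 / 3.0) * h * sum(fx[2 * i - 1] for i in range(1, N // 2 + 1))   # odd interior points
        I += (2.0 / 3.0) * h * sum(fx[2 * i] for i in range(1, N // 2))          # even interior points
        I += (1.0 / 3.0) * h * (fx[0] + fx[N])                                   # endpoints
        return I

    integrand = lambda x: (2.0 / math.sqrt(math.pi)) * math.exp(-x**2)
    print(simpson13(integrand, 0.0, 1.0, 100))
    print(math.erf(1.0))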


Simpson’s 3/8 rule (N divisible by 3)

∫_a^b f(x) dx = (3/8) h (f_0 + 3f_1 + 3f_2 + 2f_3 + 3f_4 + 3f_5 + . . . + 2f_{N−3} + 3f_{N−2} + 3f_{N−1} + f_N) (20)
              + Error

Error = −(1/80)(b − a) h^4 f^{(iv)}(ξ) .

1. If fi and h are already known discretely on an equispaced grid with N + 1 points, where N is divisible by 3, then proceed to step 2.
Otherwise, choose interval [a, b] and set h = (b − a)/N, with N divisible by 3.
for i = 1 to N + 1
Set xi = a + h(i − 1)
Set fi = f(xi)
end

2. Set I = 0
for i = 2 to N
Set I = I + (9/8) h fi
end
for i = 1 to N/3 − 1
Set I = I − (3/8) h f3i+1
end
Set I = I + (3/8) h (f1 + fN+1)

3. The integral is given by I.


Handout 8 13/08/02

Lecture 6: Review of ODEs

Types of ODEs

Homogeneous vs. inhomogeneous ODEs

The most basic first order ordinary differential equation can be written as

dy/dt + λy = 0 . (1)

We have written it in terms of t, but it can also be in terms of x or any other variable. The dependent variable is y, while the independent variable is t. This is a first order homogeneous ODE. It is first order because the highest derivative is the first derivative of y(t), and, because it can be written in terms of the function y(t) and its derivatives only, it is homogeneous. By integrating the equation we see that the general solution is given by

y(t) = a e^{−λt} , (2)

where a is some constant that is determined by the initial condition, which can be given either in terms of y(0) or y'(0). An example of a first order inhomogeneous ODE is given by

dy/dt + λy = z(t) , (3)

which is inhomogeneous because of the forcing term z(t). Its general solution is given by the sum of the homogeneous solution, say yh(t), and the particular solution yp(t), so that y(t) = yh(t) + yp(t), where the homogeneous solution is given by the solution of

dyh/dt + λyh = 0 , (4)

and the particular solution solves

dyp/dt + λyp = z(t) . (5)

Linear vs. nonlinear ODEs

An ODE will be nonlinear when its terms contain products of the dependent variable. For example,

dy/dt + λy = 0 , (6)

is a linear homogeneous first order ODE, while

dy/dt + λy^2 = 0 (7)


and

y dy/dt + λy = 0 (8)

are nonlinear, because they contain products of the dependent variable y(t) and its derivatives y^{(n)}(t) (remember that the zeroth derivative is the function itself, i.e. y^{(0)}(t) = y(t)).

An easy way to check whether an equation is linear is to substitute ky for the dependent variable y, where k is a constant, and see if the equation changes. For example, substituting ky for y in equation (6), we have

d(ky)/dt + λ(ky) = 0 → dy/dt + λy = 0 , (9)

which does not change the equation. However, if we make the same substitution into equation (7), we have

d(ky)/dt + λ(ky)^2 = 0 → dy/dt + λky^2 = 0 , (10)

which does change the ODE. Therefore, equation (7) is nonlinear.

Constant vs. non-constant coefficients

If an ODE contains products of the dependent variable and the independent variable, then it is an ODE with non-constant coefficients. For example,

dy/dt + λyt = 0 (11)

and

sin(t) dy/dt + λy = 0 (12)

are linear homogeneous first order ODEs, but because they contain products of y^{(n)}(t) and functions of t, they are ODEs with non-constant coefficients.

Higher order and systems of ODEs

In the previous examples, all of the ODEs have been first order. Higher order ODEs contain higher order derivatives of the dependent variable, such as

d^2y/dt^2 + λy = 0 , (13)

which is a second order linear homogeneous ODE, and

d^4y/dx^4 + λ dy/dx = 0 , (14)

which is a fourth order linear homogeneous ODE.


All higher order ODEs can be written as systems of ODEs: instead of being written as an ODE in one variable, they are written as an ODE in several variables. The second order ODE

d^2y/dt^2 + λy = 0 , (15)

can be written as a system of two first order ODEs if we let y1 = y and y2 = dy/dt, so that

dy1/dt = y2 , (16)
dy2/dt = −λy1 . (17)

If we define the vector y as

y = [ y1 ]
    [ y2 ] (18)

and the matrix A as

A = [  0   1 ]
    [ −λ   0 ] , (19)

then the system of ODEs in equations (16) and (17) can be written in matrix-vector form as

d/dt [ y1 ]   [  0   1 ] [ y1 ]
     [ y2 ] = [ −λ   0 ] [ y2 ] , (20)

or, in more compact notation,

dy/dt = Ay . (21)

As another example, consider the fourth order nonlinear ODE given by

d^4y/dx^4 + λy dy/dx = 0 . (22)

If we let

y = [ y1 ]   [ y    ]
    [ y2 ] = [ y'   ]
    [ y3 ]   [ y''  ]
    [ y4 ]   [ y''' ] , (23)

then the ODE in (22) can be written as four first order ODEs as

dy1/dx = y2 ,
dy2/dx = y3 ,
dy3/dx = y4 ,
dy4/dx = −λ y1 y2 ,


or, in matrix form,

dy/dx = Ay , (24)

where

A = [  0     1   0   0 ]
    [  0     0   1   0 ]
    [  0     0   0   1 ]
    [ −λy2   0   0   0 ] . (25)

This method of rewriting ODEs in matrix form forms the basis for the numerical solution of ODEs of all orders, since we can derive algorithms to solve first order ODEs and apply them to their counterparts in matrix-vector form.
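As an illustration of this reduction to a first order system, the Python sketch below writes the second order ODE d^2y/dt^2 + λy = 0 of equation (15) in the form dy/dt = Ay and advances it with a simple forward Euler step (the time-stepping method discussed in the next lecture); the value of λ, the step size, and the variable names are arbitrary choices made here.

    import numpy as np

    lam = 4.0
    A = np.array([[0.0, 1.0],
                  [-lam, 0.0]])          # matrix of equation (19)

    y = np.array([1.0, 0.0])             # y1 = y(0) = 1, y2 = y'(0) = 0
    dt = 0.001
    for _ in range(1000):                # integrate to t = 1 with forward Euler
        y = y + dt * (A @ y)             # dy/dt = A y

    # Exact solution with these initial conditions: y(t) = cos(sqrt(lam) t)
    print(y[0], np.cos(np.sqrt(lam) * 1.0))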

Initial and boundary conditions

All of the preceding examples considered different types of ODEs. But just as important as the ODEs themselves are the initial and boundary conditions. An ODE without initial or boundary conditions is like a brain without a body, or a ship without any water. The ODE itself determines the general solution of a problem, which can only be written in terms of unknown coefficients. The initial and boundary conditions determine what those coefficients must be in order to solve the problem.

Initial conditions are specified for time-dependent problems, while boundary conditions are specified for space-dependent problems. The number of initial or boundary conditions required depends on the order of the ODE. Consider, for example, the equation for the height of a tennis ball that is dropped from your hand to the ground. Neglecting the forces of friction imposed by the air on the ball, the ODE governing the height of the tennis ball is given by

d^2y/dt^2 = −g , (26)

where y is the height of the tennis ball above the ground, and g = 9.81 m/s^2 is the acceleration due to gravity. The general solution of this second order linear inhomogeneous ODE is given by

y(t) = −(1/2) g t^2 + a t + b , (27)

where a and b are unknown coefficients. In order to determine what a and b are, you must specify two initial conditions to determine the solution for the ball's height as a function of time. These initial conditions are given by what you knew about the ball when you dropped it: you released it from rest, and you released it from a certain height off the ground. In mathematical form, these initial conditions are

You dropped the ball from 1 m above the ground: y(t = 0) = 1 .

You dropped the ball from rest: v(t = 0) = y′(t = 0) = 0 .

Substituting these into the general solution (27), we have the equation for the height of the ball as

y(t) = 1 − (1/2) g t^2 . (28)


The fundamental rule is that the number of initial or boundary conditions must equal the order of the highest derivative in the problem. Therefore, you need two initial conditions for a second order time-dependent ODE, and three boundary conditions for a third order space-dependent ODE.

Solution methods for first order ODEs

For more information see: http://www.math.hmc.edu/calculus/tutorials/odes/

Separable ODEs

An ODE is separable if it can be written in the form

f(x)dx = g(y)dy . (29)

We can then integrate both sides to find the solution y = h(x) (if we can do it analytically, of course). As an example, consider the first order ODE

dy/dx + xy = 0 . (30)

This ODE is separable because it can be written as

dy/y = −x dx , (31)

which can be solved by integrating both sides to yield

y = a e^{−x^2/2} , (32)

where a is a constant.

Integrating factor

Suppose we have a first order linear ODE of the form

dy/dx + f(x) y = g(x) . (33)

If we multiply both sides by a new function h(x), we have

h dy/dx + h f y = g h .

Using the chain rule, d(hy)/dx = h dy/dx + (dh/dx) y, so

d(hy)/dx − y dh/dx + h f y = h g ,

d(hy)/dx + y (h f − dh/dx) = h g .


If we require that

h f − dh/dx = 0 , (34)

then we have

d(hy)/dx = h g , (35)

and the solution is then given by

y = (1/h(x)) ∫ g(x) h(x) dx . (36)

From equation (34), the integrating factor h(x) must be given by

h(x) = e^{∫ f(x) dx} . (37)

Change of variables

If a first order ODE cannot be separated but it can be written in the form

dy/dx = f(x, y) , (38)

where f(kx, ky) = f(x, y), then the change of variables z = y/x will make it separable. This is known as a homogeneous equation of order zero. As an example, consider the ODE

dy/dx = (y − x)/(x − 4y) . (39)

To test whether this is a homogeneous equation of order zero, replace x with kx and y with ky,

f(kx, ky) = (ky − kx)/(kx − 4ky) = (y − x)/(x − 4y) = f(x, y) . (40)

Making the substitution z = y/x, or y = zx, we have

dy/dx = (zx − x)/(x − 4zx) ,

x dz/dx + z = (zx − x)/(x − 4zx) ,

x dz/dx = (4z^2 − 1)/(1 − 4z) . (41)

This equation is separable and the solution is given by

(2y + x)^3 (2y − x) = c . (42)
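The implicit solution (42) can be checked symbolically. The short Python/sympy sketch below, an illustration not taken from the handout, differentiates (2y + x)^3 (2y − x) = c implicitly, solves for dy/dx, and confirms that it reduces to the right hand side of equation (39).

    import sympy as sp

    x = sp.symbols('x')
    y = sp.Function('y')

    # Left hand side of the implicit solution (42); c is constant, so d/dx of the LHS is zero.
    lhs = (2*y(x) + x)**3 * (2*y(x) - x)
    dydx = sp.solve(sp.diff(lhs, x), sp.Derivative(y(x), x))[0]

    # Should print 0, confirming dy/dx = (y - x)/(x - 4y).
    print(sp.simplify(dydx - (y(x) - x)/(x - 4*y(x))))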


Handout 9 15/08/02

Lecture 7: Numerical solution of ODEs I

The model ODE

In this lecture we will be learning how to solve the first order ODE

dy/dt = f(t, y) . (1)

The reason we analyze such a simplified equation is that, as we saw in the previous lecture, all higher order ODEs can be written as systems of first order ODEs, which we write as

dy/dt = F(t, y) . (2)

These higher order systems can be solved using the same methods we develop to solve the model ODE (1).

Forward and backward Euler: explicit vs. implicit methods

Discretization

The model ODE (1) is written discretely by choosing a time step (or space step) at which we would like to evaluate both sides of the equation. Let's say we want to evaluate both sides of (1) at time step n. In this case, the model ODE would be written as

dy/dt|^n = f^n , (3)

where f^n = f(t^n, y^n). So far the discretization is exact; we have not made any approximations yet because we are assuming that we can evaluate everything exactly. If we approximate the left hand side with the forward discrete derivative,

dy/dt|^n = (y^{n+1} − y^n)/∆t + O(∆t) , (4)

then we have the first order accurate approximation to the model equation (1) as

(y^{n+1} − y^n)/∆t = f^n + O(∆t) , (5)

or

y^{n+1} = y^n + ∆t f^n + O(∆t^2) . (6)

This equation is known as the forward Euler method because it uses the forward discrete derivative in time to evaluate the left hand side. Since, in order to evaluate y^{n+1}, we only use information from time step n, this is known as an explicit method.


If we choose to write the model equation (1) at time step n+ 1

dy/dt|^{n+1} = f^{n+1} , (7)

then this can be approximated using the backward discrete derivative to yield

(y^{n+1} − y^n)/∆t = f^{n+1} + O(∆t) , (8)

or

y^{n+1} = y^n + ∆t f^{n+1} + O(∆t^2) . (9)

This is known as the backward Euler method because it uses the backward finite difference to evaluate the first derivative. If you were to evaluate y at time step n + 1 you would see that you need information at time step n + 1 in order to compute f^{n+1}. When you need information at the next time step, the method is known as an implicit method.

An example

Let's say you want to numerically determine the evolution of the ODE

dy/dt = y cos y , (10)

with y(0) = 1. If we use the forward Euler method, we have

y^{n+1} = y^n + ∆t y^n cos y^n ,
y^{n+1} = y^n (1 + ∆t cos y^n) . (11)

We can easily obtain y^1 if y^0 is known because everything on the right hand side is known explicitly. If we use the backward Euler method, however, we have

y^{n+1} = y^n + ∆t y^{n+1} cos y^{n+1} ,
y^{n+1} (1 − ∆t cos y^{n+1}) = y^n . (12)

Now, instead of having the solution of y^1 in terms of y^0, we have a horrendous nonlinear equation for y^1 that must be solved using a nonlinear equation solver, such as Newton's method. Clearly, then, in this case, the explicit method is much faster than the implicit method because we do not have to iterate at every time step to find the solution. The next section shows the advantages of using implicit methods.
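As a minimal sketch (not part of the original handout), the two methods can be compared in Octave or Matlab, the tools used elsewhere in these notes. The time step and final time below are arbitrary choices, and the nonlinear equation (12) is handed to the built-in root-finder fzero rather than a hand-written Newton iteration.

% Forward vs. backward Euler for dy/dt = y*cos(y), y(0) = 1.
dt = 0.1; T = 5; N = round(T/dt);
t = 0:dt:T;
yf = zeros(1, N+1); yb = zeros(1, N+1);
yf(1) = 1; yb(1) = 1;
for n = 1:N
  % forward Euler: explicit update, equation (11)
  yf(n+1) = yf(n)*(1 + dt*cos(yf(n)));
  % backward Euler: solve the nonlinear equation (12) for y^{n+1},
  % using the forward Euler value as the initial guess
  F = @(y) y*(1 - dt*cos(y)) - yb(n);
  yb(n+1) = fzero(F, yf(n+1));
end
plot(t, yf, 'o-', t, yb, 's-');
legend('forward Euler', 'backward Euler');

For this smooth problem both curves nearly coincide; the point of the sketch is only that the implicit update requires a root solve at every step while the explicit update does not.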

The linearized ODE

In the preceding example we saw how the forward Euler method was much easier and faster to use than the backward Euler method. Any time something seems too good to be true in numerical methods, it really is too good to be true, which leads us to the first law of numerical methods: there is no free lunch! The problem with the forward Euler method, despite its simplicity, is that it can be unstable, while the implicit backward Euler method is unconditionally stable.

In order to study the stability of numerical methods for ODEs, we first need a model equation to which we can apply each method and analyze its stability properties. This model equation is the linear ODE

dy/dt = −λy , (13)

where λ is some characteristic value of the ODE that arises from assuming that the ODE behaves in this linear manner. We need to do this because we would like to analyze the linear stability characteristics of numerical methods applied to all ODEs in general. Take, for example, the ODE used in the previous example,

dy/dt = y cos y . (14)

In order to analyze the stability properties of this nonlinear ODE, we need to linearize it. When we linearize an ODE, we analyze its behavior in the vicinity of some point t0, y0 to determine its stability properties. To analyze the behavior of an ODE in the vicinity of y0 and t0, we make the substitution y = y0 + y′ and t = t0 + t′, and assume that y′ = y − y0 and t′ = t − t0 represent very small quantities. Substituting these values into equation (14), we have

dy/dt = (dt′/dt) d(y0 + y′)/dt′ = (y0 + y′) cos(y0 + y′) . (15)

In order to linearize this, we need to use the Taylor series approximation of the cosine function

cos(y0 + y′) = cos(y0) − y′ sin(y0) + O((y′)²) . (16)

Substitution into equation (15) yields

dy′/dt′ − (cos y0 − y0 sin y0) y′ = y0 cos y0 + O((y′)²) . (17)

If we assume that y′ is very small, then the second order term is negligible, and we have

dy′/dt′ − (cos y0 − y0 sin y0) y′ = y0 cos y0 , (18)

which is a linear inhomogeneous ODE in terms of y′ and t′ that represents the behavior of the original nonlinear ODE in equation (14) in the vicinity of y0, t0. If we substitute back in the values y′ = y − y0 and t′ = t − t0, we have

dy/dt − (cos y0 − y0 sin y0) y = y0² sin y0 . (19)

If we split the linearized solution into its homogeneous and particular parts with y = yh + yp, then the homogeneous solution satisfies

dyh/dt = −λyh , (20)

where λ = −(cos y0 − y0 sin y0) = y0 sin y0 − cos y0. If we analyze the stability properties of this linearized ODE, then we can apply that analysis to the nonlinear problem by checking whether it remains stable at all values of t0 and y0.


Stability

If we apply the forward Euler method to the model linearized ODE

dy/dt = −λy , (21)

then we have

y^{n+1} = y^n − hλ y^n = y^n (1 − hλ) , (22)

where h = ∆t. If we write the amplification factor at each time step as

G^n = | y^{n+1}/y^n | , (23)

then, for the forward Euler method, we have

G^n = |1 − hλ| , (24)

where the vertical bars denote the modulus, to account for the possibility that λ may not necessarily be real. If the amplification factor is less than 1, then we are guaranteed that the solution will not grow without bound, and hence it will be stable. If we assume that λ is real, then for stability we must have

−1 < 1 − hλ < +1 , (25)

which implies that, for stability, 0 < λh < 2 if λ is real. This translates to a time step restriction for stability of 0 < ∆t < 2/λ.

Now consider the backward Euler method applied to the model linearized ODE. This yields

y^{n+1} = y^n − hλ y^{n+1} ,
(1 + hλ) y^{n+1} = y^n , (26)

and the amplification factor is given by

G^n = | 1/(1 + hλ) | . (27)

If λ is real, then for G^n < 1 we only require λh > 0, or ∆t > 0. The backward Euler method is hence stable in the linear sense for all ∆t! While it may be more expensive to use the implicit method, as in the example discretization of equation (14), it is guaranteed to be stable.
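A minimal numerical check of these two results (not part of the original handout) is sketched below in Octave/Matlab syntax; the value λ = 10 and the two time steps, one on each side of the limit ∆t = 2/λ = 0.2, are arbitrary choices.

% Forward vs. backward Euler on dy/dt = -lambda*y, illustrating dt < 2/lambda.
lambda = 10; y0 = 1; T = 5;
for dt = [0.05 0.25]            % 0.05 < 2/lambda, 0.25 > 2/lambda
  N = round(T/dt);
  yf = y0; yb = y0;
  for n = 1:N
    yf = yf*(1 - dt*lambda);    % forward Euler, equation (22)
    yb = yb/(1 + dt*lambda);    % backward Euler, equation (26)
  end
  fprintf('dt = %.2f: forward Euler y(T) = %g, backward Euler y(T) = %g\n', ...
          dt, yf, yb);
end

With ∆t = 0.05 both methods decay toward zero, while with ∆t = 0.25 the forward Euler result blows up and the backward Euler result still decays.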

The greatest drawback to the Euler methods is that they are first order accurate. In the next sections, we derive more accurate methods to solve ODEs.


Euler predictor-corrector method

The improved Euler method is derived by integrating the model ODE from t^n to t^{n+1} to obtain

∫_{t^n}^{t^{n+1}} (dy/dt) dt = ∫_{t^n}^{t^{n+1}} f(y) dt . (28)

Using the trapezoidal rule, we can approximate the above integral to third order accuracy with

y^{n+1} − y^n = (∆t/2)(f^n + f^{n+1}) + O(∆t³) , (29)

to obtain the second order accurate approximation to the model ODE as

(y^{n+1} − y^n)/∆t = (1/2)(f^n + f^{n+1}) + O(∆t²) . (30)

As it is, this method is an implicit method because we need information at time step n + 1 in order to evaluate the right hand side. Instead of using f^{n+1}, we will use a predicted value, f* = f(y*), where y* is obtained with the forward Euler predictor step

y* = y^n + ∆t f^n . (31)

The Euler predictor-corrector method is then given in two steps:

Predictor: y* = y^n + ∆t f^n + O(∆t²) ,
Corrector: y^{n+1} = y^n + (∆t/2)(f^n + f*) + O(∆t³) . (32)

This method is second order accurate, since y* approximates y^{n+1} to second order accuracy. Substituting y* = y^{n+1} + O(∆t²) into f* yields

f* = f(y*) = f(y^{n+1} + O(∆t²)) = f(y^{n+1}) + O(∆t²) .

Substituting this result into the corrector and dividing by ∆t yields

(y^{n+1} − y^n)/∆t = (1/2)(f^n + f^{n+1}) + O(∆t²) , (33)

which is identical in accuracy to equation (30).
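A minimal Octave/Matlab sketch of the predictor-corrector steps (32), not part of the original handout, is given below; the right hand side y cos y is reused from the earlier example, and the step size and final time are arbitrary choices.

% Euler predictor-corrector (Heun) scheme for dy/dt = y*cos(y), y(0) = 1.
f = @(t, y) y.*cos(y);
dt = 0.1; T = 5; N = round(T/dt);
t = 0:dt:T; y = zeros(size(t)); y(1) = 1;
for n = 1:N
  fn    = f(t(n), y(n));
  ystar = y(n) + dt*fn;                 % predictor: forward Euler, eq. (31)
  fstar = f(t(n+1), ystar);
  y(n+1) = y(n) + 0.5*dt*(fn + fstar);  % corrector: trapezoidal rule, eq. (32)
end
plot(t, y);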

Runge-Kutta methods

The Runge-Kutta methods are the most popular methods of solving ODEs numerically. They can be derived for any order of accuracy, but we will derive the second order method first. The second order Runge-Kutta method is derived by taking two steps to get from n to n + 1 with

y^{n+1} = y^n + a k1 + b k2 ,
k1 = h f(t^n, y^n) ,
k2 = h f(t^n + αh, y^n + βk1) , (34)


where h = ∆t is the time step. In order to determine what the constants a, b, α, and β are, we must use the Taylor series to match the terms and make the method second order accurate. By substituting in for k1 and k2, we have

y^{n+1} = y^n + a h f(t^n, y^n) + b h f[t^n + αh, y^n + βh f(t^n, y^n)] . (35)

In order to expand the third term in equation (35), we need to use the Taylor series expansion of a function of more than one variable, which is given by

f(t + ∆t, y + ∆y) = f(t, y) + ∆t ∂f/∂t + ∆y ∂f/∂y + O(∆t², ∆t∆y, ∆y²) , (36)

which, when applied to the third term in equation (35), results in

f[t^n + αh, y^n + βh f(t^n, y^n)] = f + αh ∂f/∂t + βh f ∂f/∂y , (37)

where all functions and derivatives are evaluated at time step n, and we have left off the truncation error. Substituting this into equation (35) results in

y^{n+1} = y^n + h(a + b) f + αb h² ∂f/∂t + βb h² f ∂f/∂y . (38)

Since y depends only on the variable t, the Taylor series expansion of y^{n+1} about y^n is given in terms of ordinary derivatives as

y^{n+1} = y^n + h dy/dt + (h²/2) d²y/dt² + O(h³) . (39)

But since the ODE we are trying to solve is given by

dy/dt = f , (40)

then we know that

d²y/dt² = df/dt , (41)

so equation (39) becomes

y^{n+1} = y^n + h f + (h²/2) df/dt , (42)

where we have left off the truncation error. From the chain rule, if f is a function of t and y, then

df = (∂f/∂t) dt + (∂f/∂y) dy , (43)

so that

df/dt = ∂f/∂t + (∂f/∂y)(dy/dt) = ∂f/∂t + f ∂f/∂y . (44)

Substitution into equation (42) yields

y^{n+1} = y^n + h f + (h²/2) ∂f/∂t + (h²/2) f ∂f/∂y . (45)


Comparing this equation to equation (38),

y^{n+1} = y^n + h f + (h²/2) ∂f/∂t + (h²/2) f ∂f/∂y ,
y^{n+1} = y^n + h(a + b) f + αb h² ∂f/∂t + βb h² f ∂f/∂y , (46)

in order for the terms to match, we must have

a + b = 1 ,
αb = 1/2 ,
βb = 1/2 . (47)

This is a system of three equations in four unknowns. Therefore, we are free to choose one parameter independently and the others will then be determined, and the method will still be a second order method. If we let a = 1/2, then the other parameters must be b = 1/2, α = 1, and β = 1, so that the second order Runge-Kutta method is given by

y^{n+1} = y^n + (1/2)k1 + (1/2)k2 ,
k1 = h f(t^n, y^n) ,
k2 = h f(t^n + h, y^n + h f(t^n, y^n)) , (48)

which is just the Euler predictor-corrector scheme, since k2 = h f*, and substitution results in

y^{n+1} = y^n + (h/2)(f^n + f*) . (49)

Higher order Runge-Kutta methods can be derived using the same technique. The most popular method is the fourth order Runge-Kutta method, or RK4 method, which is given by

y^{n+1} = y^n + (1/6)(k1 + 2k2 + 2k3 + k4) ,
k1 = h f(t^n, y^n) ,
k2 = h f(t^n + h/2, y^n + (1/2)k1) ,
k3 = h f(t^n + h/2, y^n + (1/2)k2) ,
k4 = h f(t^n + h, y^n + k3) .

Although this method is a fourth order accurate approximation to the model ODE, it requires four function evaluations at each time step. Again, there is never any free lunch!
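The RK4 formulas translate almost line for line into code. The following Octave/Matlab sketch (not part of the original handout) again uses the right hand side y cos y, with arbitrary choices of step size and final time.

% Fourth order Runge-Kutta (RK4) for dy/dt = f(t,y), here f = y*cos(y).
f = @(t, y) y.*cos(y);
h = 0.1; T = 5; N = round(T/h);
t = 0:h:T; y = zeros(size(t)); y(1) = 1;
for n = 1:N
  k1 = h*f(t(n),       y(n));
  k2 = h*f(t(n) + h/2, y(n) + k1/2);
  k3 = h*f(t(n) + h/2, y(n) + k2/2);
  k4 = h*f(t(n) + h,   y(n) + k3);
  y(n+1) = y(n) + (k1 + 2*k2 + 2*k3 + k4)/6;
end
plot(t, y);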


Handout 11 27/08/02 1

Lecture 9: Numerical solution of boundary value problems

Initial vs. boundary value problems

In lectures 7 and 8 we discussed numerical solution techniques for initial value problems. Those concerned solutions of ordinary differential equations of the form

dy/dt = f(t, y) , (1)

where the initial conditions were imposed at the same location, most likely t = 0 in time, of the form

y(0) = y0 . (2)

That is, every initial value of the elements of y is specified at the same location in time. An example of an initial value problem is given by the second order ODE

d²y/dt² + g = 0 , (3)

with initial conditions y(0) = y0 and ẏ(0) = 0. This is written in vector form as

dy/dt + f(t, y) = 0 , (4)

where

y = [y1, y2]^T = [y, ẏ]^T , (5)

and

f(t, y) = [−y2, g]^T , (6)

with initial conditions

y(0) = [y0, 0]^T . (7)

The difference between initial and boundary value problems is that rather than initial conditions being imposed at the same point in the independent variable (in this case, t), boundary conditions are imposed at different values of the independent variable. As an example of a boundary value problem, consider the second order ODE

d²y/dx² + λ²y = 0 , (8)

with boundary conditions given by y(0) = 0 and y(1) = 1. This problem cannot be solved using the methods we learned for initial value problems because the two conditions imposed on the problem are not at coincident locations of the independent variable x.


Boundary condition types

Dirichlet condition (Value specified)

When the value is specified at a particular location of the independent variable, this is known as a Dirichlet boundary condition. Examples of Dirichlet boundary conditions are given by

y(0) = a , (9)

or

y(b) = 2 . (10)

Neumann condition (Derivative specified)

If the derivative is specified, then this is known as a Neumann boundary condition. Examples of Neumann conditions are given by

y′(0) = 1 , (11)

and

y′(a) = b . (12)

Mixed condition (Gradient + value)

When the boundary condition specifies an equation that involves both a value and the derivative, it is known as a mixed condition. Examples are given by

y′(a) + λy(a) = 0 , (13)

and

y′(0) = 2y(0) . (14)

The shooting method

The shooting method uses the methods developed for solving initial value problems to solve boundary value problems. The idea is to write the boundary value problem in vector form, begin the solution at one end of the domain, and "shoot" to the other end with an initial value solver until the boundary condition at the other end converges to its correct value.

The vector form of the boundary value problem is written in the same way as it was for the initial value problems, except that not all of the initial conditions are known a priori. As an example, take the boundary value problem

d²y/dx² + λ²y = 0 , (15)

with boundary conditions y(0) = 0 and y(1) = 1. In vector form, this is given by

dy/dx + f(x, y) = 0 , (16)


where

y = [y1, y2]^T = [y, y′]^T , (17)

and

f(x, y) = [−y2, λ²y1]^T . (18)

Not all of the elements of the boundary condition vectors are known initially, because certain components will depend on the solution of the problem. Since we are only given y(0) and y(1), the boundary condition vectors are given by

y(0) = [0, ?]^T ,   y(1) = [1, ?]^T . (19)

We leave question marks in place of the unknown boundary conditions because they will only be known when we actually solve the problem. In this case, we will only know the values of y′(0) and y′(1) when we have the solution to the boundary value problem (15).

As another example, suppose we want to express the boundary value problem

y_xxxx + a y_xx = 0 , (20)

with boundary conditions y(0) = 0, y′(0) = 1, y(1) = 0, and y′(1) = −1 in vector form. Because this is a fourth order ODE, we know that it has four elements in the y vector, and as a result, it has the four given boundary conditions. The y vector is given by

y = [y1, y2, y3, y4]^T = [y, y_x, y_xx, y_xxx]^T , (21)

and the boundary value problem is given by

dy/dx + f(x, y) = 0 , (22)

where

f(x, y) = [−y2, −y3, −y4, a y3]^T , (23)

with boundary conditions

y(0) = [0, 1, ?, ?]^T ,   y(1) = [0, −1, ?, ?]^T . (24)

Because we are only given four boundary conditions, the other values of the derivatives at the boundary are determined after a solution of the problem is found.

The best way to illustrate the shooting method is with an example.


An example of the shooting method

Find the solution of the boundary value problem

d²y/dx² − y = 0 , (25)

with boundary conditions y(0) = 0, y′(1) = −1.

1: Write the BVP in vector form

In order to solve this problem numerically, we write it in its vector form as

dy/dx + f(x, y) = 0 , (26)

where

y = [y1, y2]^T = [y, y_x]^T , (27)

and

f(x, y) = [−y2, −y1]^T , (28)

with boundary conditions

y(0) = [0, ?]^T ,   y(1) = [?, −1]^T . (29)

2: Discretize

The problem is first discretized into N points, the number of which depends on the desired accuracy of the solution. We will use N = 20 for this example and assume that this yields a converged result. The independent variable x is discretized with xi = (i − 1)∆x, i = 1, . . . , N, with ∆x = L/(N − 1), where L = 1 is the size of the domain. Sometimes we might need to use a non-uniform grid if the terms in the boundary value problem vary considerably in some regions of the domain in which we are solving the problem. Since this problem is linear and behaves smoothly, we do not need to worry about this.

3: Choose an integrator

For this problem we will use the Euler predictor-corrector algorithm, which will give us values for y1 and y2 in the domain if we give it starting values y1(0) and y2(0). But only y1(0) is specified, so we need to iterate to determine y2(0).

4: Iterate to find the solution

This is the trickiest part of the problem. Because the only boundary condition at x = 0 is y1(0) = 0, we need to guess the value of y2(0) and use the predictor-corrector algorithm to shoot to the other end of the domain and see if this guess satisfies the boundary condition y2(1) = −1. Let's say we guess a value of y2(0) = 1. The predictor-corrector algorithm will yield the result shown in Figure 1. Because y2(1) = 1.542566 does not match the correct value of y2(1) = −1 (which is specified as a boundary condition), we need to try again. Let's try y2(0) = −1. Using this guess, the predictor-corrector shooting method yields the result shown in Figure 2. Again, this is the incorrect answer, since a guess of y2(0) = −1 yields y2(1) = −1.542566.

Figure 1: Results of the shooting method with a guess of y2(0) = 1, which yields y2(1) = 1.542566.

Figure 2: Results of the shooting method with a guess of y2(0) = −1, which yields y2(1) = −1.542566.

The shooting method gives us a value for y2(1) when we are given a value for y2(0). That is, if we guess the slope y_x at x = 0, then the shooting method will give us a value for the slope y_x at x = 1, which is specified as a boundary condition in the problem as y′(1) = −1. To solve the boundary value problem, we need to iterate with different values of y2(0) until we converge upon the correct value of y2(1). This can be done with a root-finder such as the bisection method, the secant method, or linear interpolation. The table below depicts the results of the two previous guesses.

Guess number    Guess for y2(0)    Result of shooting method, y2(1)
1               1.0                1.542566
2               -1.0               -1.542566

We can use the secant method to find a good value for the next guess. If we let s be the guess for y2(0) and E(s) = y2(1) − y′(1) be the error in the result of the shooting method, then we need to use the secant method to find the root of

E(s) = 0 . (30)

This is done by using the formula for the secant method, which is given by

s3 = s2 − E(s2) (s1 − s2)/(E(s1) − E(s2)) . (31)

Using the results from the table above, we have

E(s1) = 1.542566 − (−1) = 2.542566 ,
E(s2) = −1.542566 − (−1) = −0.542566 ,

and

s3 = −1.0 − (−0.542566) (1.0 − (−1.0))/(2.542566 − (−0.542566)) = −0.648270 . (32)

If we use y2(0) = −0.648270, then the result is shown in Figure 3. As shown in the figure, when we use a guess of y2(0) = −0.648270, we end up with a slope at x = 1 of y2(1) = −0.999999, which is the exact value (or close enough)! The result in Figure 3 is therefore the solution of the boundary value problem, which is y = −sinh(x)/cosh(1). From this we can see that the shooting method only requires us to shoot for the result three times for linear boundary value problems. Two guesses are required, and then a linear interpolation yields the solution to within the errors of the method used to integrate the ODE. In this case, since the Euler predictor-corrector method is second-order accurate in ∆x, we know that we must have the solution to the boundary value problem to within O(∆x²).

Only three shots are required to find the solution for linear problems, and the accuracy of the result is governed by the accuracy of the shooting method used. For nonlinear problems, however, more iterations are required, and one must continue to integrate until the residual error in the root of E(s) is below some specified tolerance. If the tolerance is less than the error of the shooting method, then the error in the solution of the boundary value problem will be governed by the shooting method.


Figure 3: Results of the shooting method with a guess of y2(0) = −0.648270, which yields y2(1) = −0.999999.
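A minimal Octave/Matlab sketch of this procedure (not part of the original handout) is given below. Each shot marches the system dy/dx = [y2; y1] (the right hand side is −f(x, y) in the notation of equation (26)) with the Euler predictor-corrector, and the two guesses are combined with one secant step as in equation (31). The exact numbers produced depend on the integrator and on N.

% Shooting method for y'' - y = 0, y(0) = 0, y'(1) = -1.
N = 20; L = 1; dx = L/(N-1);
rhs = @(y) [y(2); y(1)];               % dy/dx = -f(x,y) for this problem
guesses = [1, -1, 0];                  % third entry filled by the secant step
E = zeros(1, 3);                       % E(s) = y2(1) - y'(1)
for k = 1:3
  y = [0; guesses(k)];                 % start from y1(0) = 0, y2(0) = s
  for n = 1:N-1                        % Euler predictor-corrector march
    fn = rhs(y);
    ystar = y + dx*fn;
    y = y + 0.5*dx*(fn + rhs(ystar));
  end
  E(k) = y(2) - (-1);
  if k == 2                            % secant step gives the third guess
    guesses(3) = guesses(2) - E(2)*(guesses(1) - guesses(2))/(E(1) - E(2));
  end
  fprintf('y2(0) = %9.6f  ->  y2(1) = %9.6f\n', guesses(k), y(2));
end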

The finite-difference method

The boundary value problem is given by

d²y/dx² − y = 0 , (33)

with boundary conditions y(0) = 0, y′(1) = −1. In order to solve this boundary value problem with the finite difference method, the following steps should be taken.

1: Discretize x

The discretization of the boundary value problem for the finite-difference method is done differently than for the shooting method. In order to guarantee second order accuracy of the Neumann (derivative) boundary condition at x = 1 (and lead to a tridiagonal system as in step 5), the grid must be staggered about that boundary. That is, the x values must lie on either side of the point x = 1. In order to stagger the grid, a discretization of x with N points must be given by

xi = (i − 3/2)∆x ,   (Neumann boundary conditions) (34)

with ∆x = 1/(N − 2). This is the discretization we will use for the current problem, since it has a Neumann boundary condition.

As an aside, if the problem only consists of Dirichlet boundary conditions, then it is better to collocate the x values with the boundaries. In this case it is best to use the discretization

xi = (i − 1)∆x ,   (Dirichlet boundary conditions) (35)

with ∆x = 1/(N − 1).


2: Discretize the governing ODE

The governing ODE for this problem can be discretized by rewriting it as a finite difference equation at each point xi, for which

d²y/dx² |_i − yi = 0 ,   i = {2, . . . , N − 1} . (36)

The second order accurate finite difference approximation is then given by

(y_{i−1} − 2y_i + y_{i+1})/∆x² − y_i = 0 , (37)

which can be rewritten as

a_i y_{i−1} + b_i y_i + c_i y_{i+1} = d_i , (38)

where

a_i = 1/∆x² ,
b_i = −(1 + 2/∆x²) ,
c_i = 1/∆x² ,
d_i = 0 .

We have neglected the discretization error, keeping in mind that the discretization is second order accurate in ∆x, and we will assume that d_i ≠ 0 and that the coefficients are not constant with i in order to be as general as possible. These equations are only valid for i ∈ {2, . . . , N − 1}, since the discrete second derivative is not defined at i = 1 or i = N as we have written it.

3: Discretize the boundary conditions

Just as the governing ODE is discretized, so must the boundary conditions be. The boundary condition at x = 0 is given by y(0) = 0. Because the grid we are using is staggered, we do not have values at x = 0; rather, we have values at x1 = −∆x/2 and x2 = +∆x/2. Therefore, the value at x = 0 must be interpolated from the values y1 and y2. This is given by a centered interpolation to obtain y_{3/2} as

y_{3/2} = (y1 + y2)/2 + O(∆x²) = 0 . (39)

Solving for y1 and neglecting the discretization error, we have

y1 = −y2 . (40)

The boundary condition at x = 1 is discretized by writing the second-order accurate approximation for the first derivative at x = 1 to obtain

dy/dx |_{i=N−1/2} = (yN − y_{N−1})/∆x + O(∆x²) = −1 . (41)

Leaving out the discretization error, we have

yN = y_{N−1} − ∆x . (42)


4: Embed the boundary conditions

The discretized ODE (38) is only valid for i ∈ {2, . . . , N − 1}. Therefore, it can only be used to solve for points in that range. Any terms in the discretized ODE that contain points not in that range are removed by embedding the boundary conditions. If we write the discretized ODE at i = 2 and i = N − 1, we have

a2 y1 + b2 y2 + c2 y3 = d2 ,
a_{N−1} y_{N−2} + b_{N−1} y_{N−1} + c_{N−1} yN = d_{N−1} . (43)

From the boundary conditions, we know that

y1 = −y2 ,
yN = y_{N−1} − ∆x .

Substituting the boundary conditions into equations (43), we have

(b2 − a2) y2 + c2 y3 = d2 ,
a_{N−1} y_{N−2} + (b_{N−1} + c_{N−1}) y_{N−1} = d_{N−1} + c_{N−1}∆x . (44)

5: Set up the linear system

The discretized set of equations that governs the behavior of y_i for i ∈ {2, . . . , N − 1} is then given by

i = 2 :                  (b2 − a2) y2 + c2 y3 = d2 ,
i = {3, . . . , N − 2} :  a_i y_{i−1} + b_i y_i + c_i y_{i+1} = d_i ,
i = N − 1 :              a_{N−1} y_{N−2} + (b_{N−1} + c_{N−1}) y_{N−1} = d_{N−1} + c_{N−1}∆x .

This represents a tridiagonal linear system of the form

| b2  c2                         |  | y2      |     | d2      |
| a3  b3  c3                     |  | y3      |     | d3      |
|     a4  b4  c4                 |  | y4      |  =  | d4      |
|        ...  ...  ...           |  | ...     |     | ...     |
|      a_{N−2}  b_{N−2}  c_{N−2} |  | y_{N−2} |     | d_{N−2} |
|               a_{N−1}  b_{N−1} |  | y_{N−1} |     | d_{N−1} |

where we have performed the replacements

b2 ← b2 − a2 ,
b_{N−1} ← b_{N−1} + c_{N−1} ,
d_{N−1} ← d_{N−1} + c_{N−1}∆x . (45)


6: Solve the linear system

The linear system derived in the previous step can be represented as

Ay = d . (46)

The objective is now to solve the system with

y = A⁻¹d . (47)

We can usually take advantage of the structure of A in order to speed up the calculation of its inverse. In this case it turns out that A is a tridiagonal matrix. That is, it has three diagonals, and as a result, the system can be solved with the use of a tridiagonal solver.

The solution y then represents the solution of the boundary value problem we initially set out to solve. Due to the accumulation of errors in the tridiagonal solver, this method turns out to be first-order accurate in ∆x, as opposed to the second-order accurate shooting method with the use of the Euler predictor-corrector method.
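The assembly and solution of the system can be sketched in a few lines of Octave/Matlab (not part of the original handout). The backslash operator stands in here for a dedicated tridiagonal (Thomas) solver, and N = 20 is an arbitrary choice.

% Finite-difference solution of y'' - y = 0, y(0) = 0, y'(1) = -1,
% on the staggered grid x_i = (i - 3/2)*dx of equation (34).
N = 20; dx = 1/(N-2); x = ((1:N) - 3/2)*dx;
M = N - 2;                              % unknowns y_2 ... y_{N-1}
a = ones(M,1)/dx^2; c = a;              % sub- and super-diagonal coefficients
b = -(1 + 2/dx^2)*ones(M,1);            % main diagonal coefficients
d = zeros(M,1);
b(1) = b(1) - a(1);                     % embed y_1 = -y_2
b(M) = b(M) + c(M);                     % embed y_N = y_{N-1} - dx
d(M) = d(M) + c(M)*dx;
A = spdiags([a b c], -1:1, M, M);       % tridiagonal matrix
y = A\d;                                % interior solution y_2 ... y_{N-1}
plot(x(2:N-1), y, 'o', x(2:N-1), -sinh(x(2:N-1))/cosh(1), '-');
legend('finite difference', 'exact');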


Handout 12 29/08/02 1

Lecture 10: Numerical solution of characteristic-value problems

A characteristic-value problem

In lecture 9 we covered two methods for solving boundary value problems. Those concerned the solution of ODEs with conditions imposed at different values of the independent variable. Now we will cover the numerical solution of characteristic-value problems. Characteristic-value problems are a subset of boundary value problems because the governing ODE is a boundary value problem, except that in characteristic-value problems we are also concerned with finding the characteristic value, or eigenvalue, that governs the behavior of the boundary value problem.

Consider the boundary value problem

d²y/dx² + k²y = 0 , (1)

with boundary conditions y(0) = 0 and y(1) = 0. The value of k is not known a priori, and it represents the characteristic value, or eigenvalue, of the problem. This characteristic-value problem has an analytical solution, which is given by

y(x) = a cos(kx) + b sin(kx) . (2)

Substituting in the boundary conditions, we have

y(0) = a = 0 ,
y(1) = a cos(k) + b sin(k) = 0 .

The trivial solution is a = b = 0, but this is not a very useful result. A more useful result is to set

sin(k) = 0 , (3)

which is satisfied when

k = ±nπ ,   n = 1, 2, . . . . (4)

The general solution to the problem is then given by

y(x) = b sin(nπx) ,   n = 1, 2, . . . . (5)

These solutions are known as the eigenfunctions of the boundary value problem and k is known as the eigenvalue. It is common to write the solution as

yn(x) = b sin(kn x) ,   kn = nπ ,   n = 1, 2, . . . , (6)

where yn(x) is referred to as the nth eigenfunction with corresponding eigenvalue kn.

In this lecture we will be concerned with solution methods for finding the eigenvalues of the characteristic-value problem.


The characteristic-value problem in matrix form

Just as we did in lecture 9, we will discretize the characteristic-value problem using the second order finite difference representation of the second derivative to yield

(y_{i−1} − 2y_i + y_{i+1})/∆x² + k²y_i = 0 , (7)

with y1 = 0 and yN = 0 as the boundary conditions. If we discretize the domain with N = 5 points, then we have ∆x = 1/(N − 1) = 0.25, and the equations for y_i, i = 2, . . . , 4, are given by

16y1 − 32y2 + 16y3 + k²y2 = 0 ,
16y2 − 32y3 + 16y4 + k²y3 = 0 ,
16y3 − 32y4 + 16y5 + k²y4 = 0 ,

but since we know that y1 = 0 and y5 = 0, we have

−32y2 + 16y3 + k²y2 = 0 ,
16y2 − 32y3 + 16y4 + k²y3 = 0 ,
16y3 − 32y4 + k²y4 = 0 .

This can be written in matrix-vector form as

|  32  −16    0 |  | y2 |       | y2 |     | 0 |
| −16   32  −16 |  | y3 |  − λ  | y3 |  =  | 0 | , (8)
|   0  −16   32 |  | y4 |       | y4 |     | 0 |

where λ = k². If we let

A = |  32  −16    0 |
    | −16   32  −16 | , (9)
    |   0  −16   32 |

then the problem can be written in the form

(A − λI)y = 0 , (10)

where

y = [y2, y3, y4]^T , (11)

and the identity matrix is given by

I = | 1  0  0 |
    | 0  1  0 | . (12)
    | 0  0  1 |

Equation (10) is known as an eigenvalue problem, in which we must find the values of λ that yield a nontrivial solution, for which y ≠ 0. The nontrivial solution is determined by finding the eigenvalues λ of A, which are given by a solution of

det(A − λI) = 0 . (13)


The eigenvalues are then given by a solution of the third-order polynomial

(32 − λ)[(32 − λ)² − 512] = 0 , (14)

whose roots are given by

λ1 = 9.37 ,
λ2 = 32.00 ,
λ3 = 54.63 ,

or, in terms of k = ±λ^{1/2},

k1 = ±3.06 ,
k2 = ±5.66 ,
k3 = ±7.39 .

These are close to the analytical values of

k1 = ±π = ±3.14 ,
k2 = ±2π = ±6.28 ,
k3 = ±3π = ±9.42 ,

but because we only discretized the problem with N = 5 points, the errors are considerable. The errors can be reduced by using more points, but as N gets very large, it becomes much too difficult to solve for the eigenvalues this way. Therefore, we need to come up with other, more efficient methods to solve for the eigenvalues and eigenvectors of A.

The power method

The power method computes the largest (in magnitude) eigenvalue and its corresponding eigenvector in the following manner:

Pseudocode for the power method

1. Choose a starting vector x = [1, 1, 1]^T. Set the starting eigenvalue as λ = 1.

2. Approximate the eigenvector with x ← Ax. The approximate eigenvalue is λ* = max(x).

3. Normalize with x ← x/λ*.

4. If |λ* − λ| < ε, then we are finished: x is the eigenvector and λ* is its corresponding eigenvalue. Otherwise set λ = λ* and return to step 2.


As an example, consider the matrix

A = |  32  −16    0 |
    | −16   32  −16 | . (15)
    |   0  −16   32 |

If we start with x = [1, 1, 1]^T, then the first iteration gives us

x = Ax = [16, 0, 16]^T , (16)

which gives us our first approximate eigenvalue, λ* = 16, and the normalized eigenvector x = [1, 0, 1]^T. Iterating again with this eigenvector, the second guess for the eigenvector is given by

x = Ax = [32, −32, 32]^T , (17)

which gives us our second guess for the eigenvalue, λ* = 32, and the normalized eigenvector x = [1, −1, 1]^T. Subsequent iterations yield eigenvectors of

x = [48, −64, 48]^T , [53.33, −74.67, 53.33]^T , [54.40, −76.80, 54.40]^T , . . . , (18)

and we see that we are converging upon the correct eigenvalue of λ = 54.63 and the corresponding eigenvector x = [1, −1.4142, 1]^T.

We can also find the smallest eigenvalue of A by using the power method with the inverse A⁻¹. The advantage of the power method is that it is very simple and easy to program. However, it has poor convergence characteristics and does not behave well for matrices with repeated eigenvalues. There are a host of other methods available to compute the eigenvalues of a matrix, but they are beyond the scope of this course. Both Octave and Matlab can be used to compute the eigenvalues of a matrix with

>> [v,d]=eig(A)

where d is a diagonal matrix containing the eigenvalues of A and v is a matrix whose columns are the eigenvectors of A.
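A minimal Octave/Matlab sketch of the power-method pseudocode above (not part of the original handout) is shown below, applied to the matrix A of equation (15). The tolerance and the cap on the number of iterations are arbitrary choices.

% Power method for the dominant eigenvalue of A.
A = [32 -16 0; -16 32 -16; 0 -16 32];
x = [1; 1; 1]; lambda = 1; tol = 1e-8;
for it = 1:100
  x = A*x;                      % step 2: multiply by A
  lambda_star = max(x);         % approximate eigenvalue
  x = x/lambda_star;            % step 3: normalize
  if abs(lambda_star - lambda) < tol
    break;                      % step 4: converged
  end
  lambda = lambda_star;
end
fprintf('lambda = %f after %d iterations\n', lambda_star, it);
fprintf('eigenvector = [%f %f %f]\n', x);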


Handout 13 03/09/02 1

Lecture 11: Numerical solution of stochastic differential equations I

Continuous random variables

In terms of its probability density function p(x), a continuous random variable X will lie in the range [a, b] with a probability given by

P(a < X ≤ b) = ∫_a^b p(x) dx , (1)

where, by definition,

∫_{−∞}^{∞} p(x) dx = 1 . (2)

In terms of its cumulative distribution function F(x), the random variable X will lie in some range [a, b] with a probability given by

P(a < X ≤ b) = F(b) − F(a) . (3)

The probability density function and the cumulative distribution function of a random variable X are related by

p(x) = dF(x)/dx . (4)

From this we can see that

∫_{−∞}^{a} p(x) dx = ∫_{−∞}^{a} (dF/dx) dx = F(a) − lim_{b→−∞} F(b) , (5)

and, since the limit is zero, we can define the cumulative distribution function as

F(a) = ∫_{−∞}^{a} p(x) dx . (6)

Examples

If X is uniformly distributed on [0, 1], then the probability density function (PDF) of X is given by

p(x) = { 1   0 ≤ x ≤ 1
       { 0   otherwise . (7)

As a check,

∫_{−∞}^{∞} p(x) dx = ∫_0^1 1 dx = 1 . (8)


The cumulative distribution function is then given by

F(a) = { 0   a < 0
       { a   0 ≤ a < 1
       { 1   1 ≤ a . (9)

If X is normally distributed with mean 0 and variance 1, that is, if X ∼ N(0, 1), then the PDF of X is given by

p(x) = (1/√(2π)) e^{−x²/2} . (10)

The cumulative distribution function is then given by

F(a) = (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx . (11)

Discrete random variables

If X is a discrete random variable, then it may only take a finite number of possible values at locations x_i, so that

p_i = P(X = x_i) , (12)

and, because the distribution of X is normalized, we must have

Σ_{i=1}^{N} p_i = 1 , (13)

where N is the number of possible values of X. As an example, a two-point distribution of X is given by

p1 = P(X = −1) = 1/2 ,
p2 = P(X = +1) = 1/2 .

This is normalized since p1 + p2 = 1.

Moments of random variables

The pth moment of a continuous random variable X is given by

E(X^p) = ∫_{−∞}^{∞} x^p p(x) dx , (14)

and the pth moment of a discrete random variable X is given by

E(X^p) = Σ_{i=1}^{N} x_i^p p_i . (15)

The mean or expected value of a continuous random variable is its first moment,

µ = E(X) = ∫_{−∞}^{∞} x p(x) dx , (16)


which is also referred to as the "expectation of X". The pth centered moment of X is given by

E((X − µ)^p) = ∫_{−∞}^{∞} (x − µ)^p p(x) dx . (17)

The first centered moment is always 0 by definition, since

E(X − µ) = ∫_{−∞}^{∞} (x − µ) p(x) dx = ∫_{−∞}^{∞} x p(x) dx − µ ∫_{−∞}^{∞} p(x) dx = µ − µ · 1 = 0 . (18)

The variance of a continuous random variable X is its second centered moment, and is defined by

Var(X) = E((X − µ)²) = σ² . (19)

Expanding the second centered moment yields

E((X − µ)²) = ∫_{−∞}^{∞} (x − µ)² p(x) dx
 = ∫_{−∞}^{∞} (x² − 2µx + µ²) p(x) dx
 = ∫_{−∞}^{∞} x² p(x) dx − 2µ ∫_{−∞}^{∞} x p(x) dx + µ² ∫_{−∞}^{∞} p(x) dx
 = E(X²) − 2µ² + µ²
 = E(X²) − µ² , (20)

so that

Var(X) = E((X − µ)²) = σ² = E(X²) − µ² . (21)

Examples

If X is uniformly distributed such that

p(x) = { 1   0 ≤ x ≤ 1
       { 0   otherwise , (22)

then the expected value of X is given by

µ = ∫_{−∞}^{0} x · 0 dx + ∫_0^1 x · 1 dx + ∫_1^{∞} x · 0 dx = x²/2 |_0^1 = 1/2 . (23)


The variance of X is then given by

Var(X) = E(X²) − µ²
 = ∫_{−∞}^{∞} x² p(x) dx − µ²
 = ∫_{−∞}^{0} x² · 0 dx + ∫_0^1 x² · 1 dx + ∫_1^{∞} x² · 0 dx − 1/4
 = x³/3 |_0^1 − 1/4
 = 1/12 .

If X is normally distributed such that X ∼ N(µ, σ²), then the PDF is given by

p(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} . (24)

The expected value and variance are given by

E(X) = µ ,
Var(X) = σ² .
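These moments are easy to check by sampling. The following Octave/Matlab sketch (not part of the original handout) uses the built-in generators rand (uniform on [0, 1]) and randn (standard normal); the sample size and the values of µ and σ are arbitrary choices.

% Sampling check of the mean and variance results above.
n = 1e6;
u = rand(n, 1);                 % uniform on [0, 1]
fprintf('uniform: mean = %.4f (expect 1/2), var = %.4f (expect %.4f)\n', ...
        mean(u), var(u), 1/12);
mu = 2; sigma = 0.5;
g = mu + sigma*randn(n, 1);     % N(mu, sigma^2)
fprintf('normal:  mean = %.4f (expect %.1f), var = %.4f (expect %.2f)\n', ...
        mean(g), mu, var(g), sigma^2);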

Two or more random variables

If X1 and X2 are continuously distributed random variables with a joint PDF p(x1, x2), then in general,

E(X1 + X2) = E(X1) + E(X2) ,
Var(X1 + X2) ≠ Var(X1) + Var(X2) .

The covariance function is given by

Cov(X1, X2) = E((X1 − µ1)(X2 − µ2))
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x1 − µ1)(x2 − µ2) p(x1, x2) dx1 dx2
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x1 x2 − µ1 x2 − µ2 x1 + µ1 µ2) p(x1, x2) dx1 dx2
 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 p(x1, x2) dx1 dx2 − µ1 ∫_{−∞}^{∞} ∫_{−∞}^{∞} x2 p(x1, x2) dx1 dx2
   − µ2 ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 p(x1, x2) dx1 dx2 + µ1 µ2 ∫_{−∞}^{∞} ∫_{−∞}^{∞} p(x1, x2) dx1 dx2
 = E(X1 X2) − µ1 µ2 − µ2 µ1 + µ1 µ2
 = E(X1 X2) − µ1 µ2 .

If X1 and X2 are independent, then we can write p(x1, x2) = p1(x1)p2(x2), and

E(X1 X2) = E(X1)E(X2) ,
Var(X1 + X2) = Var(X1) + Var(X2) . (25)


Handout 14 05/09/02 1

Lecture 12: Numerical solution of stochastic differential equations II

Brownian Motion

The most fundamental example of a stochastic process is termed standard Brownian motion or a standard Wiener process. Brownian motion was discovered by botanist Robert Brown in 1827, when he observed the motion of a pollen grain as it moved randomly in a glass of water. Because the water molecules collide with the pollen grain in a random fashion, the pollen grain moves about randomly. The motion of the pollen grain is stochastic because its position from one point in time to the next can only be defined in terms of a probability density function.

Consider a pollen grain in a one-dimensional glass of water that is only allowed to move in the vertical z-direction. At one point in time tn its position is given by zn, and at the next point in time, say at tn+1, its position is given by zn+1. Depending on how many water molecules collide with the pollen grain, it may move a very large amount, or it may not move at all. It turns out that even though we can't determine how far the pollen grain moves over a period of time, we can say that the distance it moves is normally distributed with mean 0 and variance 1, as shown in Figure 1, where the units are arbitrary (they could be micrometers, or 10⁻⁶ meters, in this example). If the distance it moves from time tn to tn+1 is given by ∆zn, then we can write

zn+1 = zn + ∆zn . (1)

Figure 1: Depiction of the position zn and the likely next position zn+1 of a pollen grain undergoing Brownian motion, showing the normal distribution at tn+1.


If the distance it moves is normally distributed with mean 0 and variance 1, then we can write

∆zn ∼ N(0, 1) , (2)

and from this we know that

E(∆zn) = 0 ,
Var(∆zn) = E(∆zn²) − µ² = E(∆zn²) = 1 .

Beginning with z1 = 0, we can run a simulation of a pollen track by using the random number generation capabilities of any programming language. In Matlab or Octave, for example, the randn routine returns a normally distributed random number with mean 0 and variance 1. Over 500 time steps, the particle track of a pollen grain is shown in Figure 2. If we repeat the simulation for 100 particles that are all released at z = 0 and assume that they do not interact with one another, then their resultant tracks are shown in Figure 3. From this figure we can see that the particles are more likely to end up closer to where they started from than very far away. To get a better idea of the development of the particle distributions, we can run the simulation with 10 000 particles and plot probability density functions of the particle distributions at different points in time, as shown in Figure 4. Rather than look at the PDFs, we can simply look at the standard deviation of the particle distributions as a function of time. Figure 5 depicts the standard deviation ±σ as a function of time as well as the theoretical prediction σ = ±√t for 100 particles, and Figure 6 depicts those for 10 000 particles.

Figure 2: Particle track zn of a pollen grain undergoing Brownian motion over 500 time steps from t1 = 0 to t500 = 499.


Figure 3: Particle tracks zn of 100 pollen grains undergoing Brownian motion over 500 time steps from t1 = 0 to t500 = 499.

Figure 4: Probability density functions of the particle distributions at different points in time for a total of 10 000 particles.
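A minimal sketch of this kind of simulation (not part of the original handout) is shown below in Octave/Matlab syntax, using randn for the increments; the particle and step counts only loosely mirror the figures.

% Brownian tracks: 100 particles, 500 unit time steps, z^{n+1} = z^n + dz^n.
nsteps = 500; nparticles = 100;
z = zeros(nsteps, nparticles);                % each column is one particle track
for n = 1:nsteps-1
  z(n+1,:) = z(n,:) + randn(1, nparticles);   % increment ~ N(0,1), equation (1)
end
t = (0:nsteps-1)';
plot(t, z); xlabel('t'); ylabel('z_n');
hold on;
plot(t, sqrt(t), 'k--', t, -sqrt(t), 'k--');  % theoretical +/- sqrt(t) envelope
hold off;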

These figures show us that, in the limit of an infinite number of particles, the standard deviation of the distribution of particles is given by

σ(t) = √t . (3)


Figure 5: Standard deviation σ as a function of time for the particle distributions with 100 particles.

Figure 6: Standard deviation σ as a function of time for the particle distributions with 10 000 particles.

If we designate the start time of this simulation as t = s, then we know that the standard deviation grows according to

σ(t) = √(t − s) . (4)


We can see the behavior of the standard deviation with time by looking at the PDFs of the particle distributions in Figure 4. Each PDF gets wider and shorter as the simulation progresses, indicating that the PDFs are normal distributions with mean 0 and variance σ²(t) = t − s. The PDF can then be written as a function of time and space as

p(z, t) = (1/√(2πσ²(t))) e^{−z²/(2σ²(t))} . (5)

In the previous simulations we used a time step of ∆t = 1. After this amount of time we assumed that the particle moved according to a normal distribution with mean 0 and variance 1, which was represented with

∆zn ∼ N(0, 1) . (6)

Now suppose that the time step is much larger, say ∆t = t − s. Because the standard deviation grows with time according to σ = √(t − s), the likelihood of a particle being farther away is higher because more time has passed. This is accounted for by writing

∆zn ∼ √(t − s) N(0, 1) , (7)

and as a result we know that

E(∆zn) = 0 ,
Var(∆zn) = E((√(t − s) N(0, 1))²) − µ² = (t − s) E(N(0, 1)²) = t − s .

We can then run a particle simulation with one time step to see what the distribution of particles will be like at any point in time, since we can write the position z of a particle after a certain amount of time t − s as

z1 = z0 + √(t − s) N(0, 1) . (8)

This forms the basis for the more formal definition of Brownian motion or a Wiener process.

Standard Brownian Motion or Standard Wiener Process

Standard Brownian motion or a standard Wiener process governs the behavior of the random variable W(t) in continuous time 0 ≤ t ≤ T, and satisfies the following conditions (from [1]):

1. W(0) = 0 with probability 1.

2. If 0 < s < t < T, then the random variable ∆W = W(t) − W(s) is normally distributed with mean 0 and variance t − s, and satisfies

∆W ∼ √(t − s) N(0, 1) . (9)

3. If 0 < s < t < u < v < T and ∆W1 = W(t) − W(s) and ∆W2 = W(v) − W(u), then ∆W1 and ∆W2 are independent.


The motion of the pollen grain in a glass of water can be written in this standard Brownian notation by letting the position of the pollen grain at time tn be Wn and saying that the pollen grain will move according to

Wn+1 = Wn + ∆Wn , (10)

where, from the first condition, we must have W1 = 0, and the increment is given by

∆Wn = √(∆t) N(0, 1) , (11)

where ∆t = tn+1 − tn is the time increment.

Stochastic ODEs and Stochastic Calculus

Suppose we have an ODE given by

dx/dt = f(x, t) + g(x, t)µ(t) , (12)

where µ(t) is some random perturbation. If we write it in differential form, then we have

dx = f(x, t)dt + g(x, t)µ(t)dt . (13)

If we let µ(t)dt = dW(t), then we have the stochastic differential equation

dx = f(x, t)dt + g(x, t)dW(t) . (14)

In order to solve this equation, the standard method would be to integrate both sides to obtain

x(t) = x(0) + ∫_0^t f(x(s), s) ds + ∫_0^t g(x(s), s) dW(s) . (15)

It turns out that if the integrands are non-deterministic, standard calculus does not apply. In order to solve the integral, we must make a change of variables and write the stochastic differential equation (14) in terms of a new function v(x). This is done by using the Taylor series expansion

v(x + ∆x) = v(x) + ∆x dv/dx + (1/2)∆x² d²v/dx² + O(∆x³) . (16)

Then we can write the differential of v(x) as

dv = lim_{∆x→0} [v(x + ∆x) − v(x)] = (dv/dx) dx + (1/2) dx² d²v/dx² + O(dx³) , (17)

which, when x is deterministic, simplifies to

dv = (dv/dx) dx . (18)

However, for stochastic calculus, dx² does not approach 0 faster than dx, so equation (18) is incorrect. Substituting in for dx from equation (14), we have

dv = (dv/dx)[f(x, t)dt + g(x, t)dW] + (1/2)(d²v/dx²)[f²(x, t)dt² + 2f(x, t)g(x, t)dWdt + g²(x, t)dW²] . (19)


We can neglect dt² and dWdt, but it turns out that dW² = dt and it cannot be neglected, so the stochastic differential equation for v is given by

dv = [f(x, t) dv/dx + (1/2) g²(x, t) d²v/dx²] dt + g(x, t) (dv/dx) dW . (20)

We can now integrate this equation using the methods of deterministic calculus to obtain the solution for v(x) as

v(t) = v(0) + ∫_0^t [f(x, s) dv/dx + (1/2) g²(x, s) d²v/dx²] ds + ∫_0^t g(x, s) (dv/dx) dW , (21)

where we have assumed that v(x) and v(t) are equivalent since v(x) = v(x(t)).

References

[1] D. J. Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43:525–546, 2001.


Handout 15 12/09/02 1

Lecture 13: Numerical solution of stochastic differential equations III

Discrete stochastic integrals

We would like an approximate solution to the simplified stochastic differential equation

dx(t) = W(t)dW(t) , (1)

where the behavior of W(t) is governed by the rules of Brownian motion. Integrating both sides, we obtain

x(t) = ∫_0^t W(s)dW(s) , (2)

where we have assumed that x(0) = 0 for simplicity.

The Ito integral

In the Ito integral, the integral (2) is approximated with the Riemann sum

x(t) = Σ_{j=0}^{N−1} W_j (W_{j+1} − W_j) , (3)

where we have assumed that t = N∆t, W_N = W(t), and W_0 = W(0). This can be written as

x(t) = (1/2) Σ_{j=0}^{N−1} [ W_{j+1}² − W_j² − (W_{j+1}² − 2W_j W_{j+1} + W_j²) ]
 = (1/2) Σ_{j=0}^{N−1} [ W_{j+1}² − W_j² − (W_{j+1} − W_j)² ]
 = (1/2) [ (W_1² − W_0²) + (W_2² − W_1²) + . . . + (W_{N−1}² − W_{N−2}²) + (W_N² − W_{N−1}²) ] − (1/2) Σ_{j=0}^{N−1} (W_{j+1} − W_j)²
 = (1/2) (W_N² − W_0²) − (1/2) Σ_{j=0}^{N−1} (W_{j+1} − W_j)²
 = (1/2) (W_N² − W_0²) − (1/2) Σ_{j=0}^{N−1} ∆W_j² .

The sum can be written as

Σ_{j=0}^{N−1} ∆W_j² = N [ (1/N) Σ_{j=0}^{N−1} ∆W_j² ] , (4)


which is just the discrete form of the variance of a random variable with zero mean,

Var(∆W) = E(∆W²) = (1/N) Σ_{j=0}^{N−1} ∆W_j² . (5)

Since we know that ∆W_j = W_{j+1} − W_j is normally distributed with mean 0 and variance ∆t, because it governs the jump for Brownian motion, then, as N → ∞,

(1/N) Σ_{j=0}^{N−1} ∆W_j² = ∆t , (6)

so that the approximation to the integral (2) becomes

x(t) = (1/2)(W_N² − W_0²) − (1/2) N∆t . (7)

From the definition of Brownian motion, W_0 = W(0) = 0, and after substituting in for W_N = W(t) and N∆t = t, we have

x(t) = (1/2) W(t)² − (1/2) t . (8)

The Stratonovich integral

Rather than approximating the integrand at the left endpoint of each interval, the Stratonovich integral approximates the integral (2) with the midpoint rule

x(t) = Σ_{j=0}^{N−1} W((t_j + t_{j+1})/2) (W_{j+1} − W_j) . (9)

We will approximate the value of W(t_{j+1/2}) with

W(t_{j+1/2}) = (1/2)(W_j + W_{j+1}) + C_j , (10)

where C_j must be determined so that the above approximation still satisfies the rules of Brownian motion. If we let Z(t_j) = W(t_{j+1/2}) represent a random variable that must satisfy the rules of Brownian motion, then the second rule of Brownian motion says that if Z(t_j) and Z(t_j + ∆t) are both such random variables, then the increment ∆Z_j = Z(t_j + ∆t) − Z(t_j) must have mean 0 and variance ∆t; that is, E(∆Z_j) = 0 and Var(∆Z_j) = ∆t. The increment is given by

∆Z_j = (1/2) [ (W_{j+1} + W_{j+2}) + 2C_{j+1} − (W_j + W_{j+1}) − 2C_j ]
 = (1/2) (W_{j+2} − W_j) + (C_{j+1} − C_j) .

Since we know that W_{j+1} = W_j + ∆W_j and W_{j+2} = W_{j+1} + ∆W_{j+1}, then W_{j+2} = W_j + ∆W_j + ∆W_{j+1}, and the increment is given by

∆Z_j = (1/2)(∆W_j + ∆W_{j+1}) + ∆C_j , (11)


where ∆C_j = C_{j+1} − C_j. Taking the mean of ∆Z_j yields

E(∆Z_j) = E( (1/2)(∆W_j + ∆W_{j+1}) + ∆C_j ) = (1/2)[E(∆W_j) + E(∆W_{j+1})] + E(∆C_j) , (12)

but since E(∆W_j) = 0 and E(∆W_{j+1}) = 0, and we require that E(∆Z_j) = 0, we know that the behavior of C_j must satisfy E(∆C_j) = E(C_{j+1}) − E(C_j) = 0. This is most easily satisfied by requiring that C_j have zero mean. Taking the variance of ∆Z_j yields

Var(∆Z_j) = E(∆Z_j²) − E(∆Z_j)² , (13)

but since we require that E(∆Z_j) = 0, the variance is given by

Var(∆Z_j) = E(∆Z_j²) . (14)

Substituting in the values from above, we have

Var(∆Z_j) = E( (1/4)(∆W_j + ∆W_{j+1} + 2∆C_j)² )
 = (1/4) [ E((∆W_j + ∆W_{j+1})²) + 4E(∆C_j ∆W_j) + 4E(∆C_j ∆W_{j+1}) + 4E(∆C_j²) ]
 = (1/4) [ E(∆W_j²) + 2E(∆W_j ∆W_{j+1}) + E(∆W_{j+1}²) + 4E(∆C_j ∆W_j) + 4E(∆C_j ∆W_{j+1}) + 4E(∆C_j²) ] .

Because ∆W_j, ∆W_{j+1}, and ∆C_j are all independent of each other, we know that

E(∆W_j ∆W_{j+1}) = E(∆W_j)E(∆W_{j+1}) = 0 ,
E(∆C_j ∆W_{j+1}) = E(∆C_j)E(∆W_{j+1}) = 0 ,
E(∆C_j ∆W_j) = E(∆C_j)E(∆W_j) = 0 ,

so that we have

Var(∆Z_j) = (1/4)[ E(∆W_j²) + E(∆W_{j+1}²) ] + E(∆C_j²) . (15)

Because ∆W_j and ∆W_{j+1} satisfy the rules for Brownian motion, and we want ∆Z_j to satisfy the rules for Brownian motion, we must have

Var(∆Z_j) = ∆t ,
E(∆W_j²) = ∆t ,
E(∆W_{j+1}²) = ∆t ,

so that we have

E(∆C_j²) = ∆t/2 . (16)


Substituting in for ∆C_j = C_{j+1} − C_j, we have

E(C_{j+1}²) − 2E(C_{j+1}C_j) + E(C_j²) = ∆t/2 . (17)

Since C_{j+1} and C_j are independent, E(C_{j+1}C_j) = E(C_{j+1})E(C_j) = 0 and we must have

E(C_{j+1}²) + E(C_j²) = ∆t/2 , (18)

which is satisfied when

E(C_{j+1}²) = E(C_j²) = ∆t/4 . (19)

If C_j is normally distributed with mean 0 and variance ∆t/4, then W(t_{j+1/2}) will satisfy the rules for Brownian motion, and is given by

W(t_{j+1/2}) = (1/2)(W_j + W_{j+1}) + C_j , (20)

where

C_j ∼ N(0, ∆t/4) . (21)

Substitution into equation (9) yields

x(t) = (1/2) Σ_{j=0}^{N−1} (W_j + W_{j+1} + 2C_j)(W_{j+1} − W_j)
 = (1/2) Σ_{j=0}^{N−1} (W_{j+1}² − W_j²) + Σ_{j=0}^{N−1} C_j (W_{j+1} − W_j)
 = (1/2) [ (W_1² − W_0²) + (W_2² − W_1²) + . . . + (W_{N−1}² − W_{N−2}²) + (W_N² − W_{N−1}²) ] + Σ_{j=0}^{N−1} C_j (W_{j+1} − W_j)
 = (1/2) (W_N² − W_0²) + N [ (1/N) Σ_{j=0}^{N−1} C_j ∆W_j ] .

The last term corresponds to an approximation of N E(C_j ∆W_j), which, since C_j and ∆W_j are independent, approaches 0 as N → ∞. Therefore, with W_0 = W(0) = 0 and W_N = W(t), the Stratonovich integral approximates the integral (2) with

x(t) = (1/2) W(t)² . (22)

Comparing the two methods, we see that, depending on where W(t) is evaluated when approximating

x(t) = ∫_0^t W(s) dW(s) , (23)

the result can be very different, since

x(t) = (1/2) W(t)² − t/2   (Ito) ,
x(t) = (1/2) W(t)²         (Stratonovich) . (24)
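The difference can be seen numerically on a single Brownian path. The Octave/Matlab sketch below (not part of the original handout) evaluates the Ito sum (3) and the Stratonovich sum (9), with the midpoint correction C_j drawn as in equation (21), and compares them with the limits in (24); the step count and end time are arbitrary choices, and the agreement is only approximate for finite N.

% Ito vs. Stratonovich discrete sums for int_0^t W dW on one path.
T = 1; N = 10000; dt = T/N;
dW = sqrt(dt)*randn(1, N);              % Brownian increments
W = [0, cumsum(dW)];                    % W_0 ... W_N, with W(0) = 0
ito   = sum(W(1:N).*dW);                % left-endpoint (Ito) sum, eq. (3)
C     = sqrt(dt/4)*randn(1, N);         % midpoint correction C_j ~ N(0, dt/4)
strat = sum((0.5*(W(1:N) + W(2:N+1)) + C).*dW);   % midpoint (Stratonovich) sum
fprintf('Ito sum:          %f   limit: %f\n', ito,   0.5*W(end)^2 - 0.5*T);
fprintf('Stratonovich sum: %f   limit: %f\n', strat, 0.5*W(end)^2);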


Numerical discretization techniques

From lecture 12, the model stochastic differential equation is given by

dx(t) = f(x, t)dt + g(x, t)dW(t) , (25)

where dW(t) represents a random variable in continuous time that satisfies the rules of Brownian motion, and f(x, t) and g(x, t) are deterministic functions. If we integrate both sides of equation (25) from t_n to t_{n+1}, then we have

x_{n+1} = x_n + ∫_{t_n}^{t_{n+1}} f(x, t) dt + ∫_{t_n}^{t_{n+1}} g(x, t) dW(t) , (26)

where x_{n+1} = x(t_{n+1}) and x_n = x(t_n). In the next sections we use different methods to approximate the deterministic and stochastic integrals on the right hand side (for details see [1]).

Strong and weak convergence

A discretization of a stochastic differential equation governing the behavior of the random variable x(t) is said to have strong order of convergence n if we can define a constant k such that

E |x_j − x(j∆t)| ≤ k∆t^n , (27)

where x_j is the discrete solution and x(j∆t) is the exact solution. Strong convergence therefore depends on the expected value of the error of the solution at some point in time. A discretization of a stochastic differential equation is said to have weak order of convergence n if we can define a constant k such that

|E(x_j) − E(x(j∆t))| ≤ k∆t^n . (28)

Weak convergence is not as strict as strong convergence because it is a function of the error of the mean rather than the mean of the error.

The Euler-Maruyama method

In the Euler-Maruyama method, the deterministic integral in equation (26) is approximated with the rectangular rule and the Ito rule is used to compute the stochastic integral, to yield

x_{n+1} = x_n + ∆t f_n + g_n ∆W_n , (29)

where f_n = f(x_n, t_n), g_n = g(x_n, t_n), and ∆W_n = W(t_{n+1}) − W(t_n). We can see that this method reduces to the forward Euler method for deterministic ODEs. This method has a strong order of convergence of n = 1/2 and a weak order of convergence of n = 1.
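A minimal Octave/Matlab sketch of the Euler-Maruyama update (29), not part of the original handout, is shown below for the linear test SDE dx = λx dt + µx dW; the values of λ, µ, x(0), and the time step are arbitrary choices, and the exact solution of this particular SDE is plotted along the same Brownian path for comparison.

% Euler-Maruyama for dx = lambda*x dt + mu*x dW.
lambda = 2; mu = 1; x0 = 1;
T = 1; N = 500; dt = T/N;
dW = sqrt(dt)*randn(1, N);              % Brownian increments
x = zeros(1, N+1); x(1) = x0;
for n = 1:N
  f = lambda*x(n);                      % f(x,t)
  g = mu*x(n);                          % g(x,t)
  x(n+1) = x(n) + dt*f + g*dW(n);       % equation (29)
end
W = [0, cumsum(dW)];                    % exact solution along the same path
xexact = x0*exp((lambda - 0.5*mu^2)*(0:dt:T) + mu*W);
plot(0:dt:T, x, 0:dt:T, xexact, '--');
legend('Euler-Maruyama', 'exact');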


Milstein’s higher order method

The Euler-Maruyama method can be made to converge strongly at first order by keeping higher order terms in the stochastic integral in equation (26). Using this method, the discretized stochastic differential equation becomes

x_{n+1} = x_n + ∆t f_n + g_n ∆W_n + (1/2) g_n g′_n (∆W_n² − ∆t) , (30)

where f_n = f(x_n, t_n), g_n = g(x_n, t_n), g′_n = dg/dx evaluated at (x_n, t_n), and ∆W_n = W(t_{n+1}) − W(t_n).
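For completeness, a standalone Octave/Matlab sketch of the Milstein update (30), not part of the original handout, is given below for the same illustrative linear SDE dx = λx dt + µx dW used in the Euler-Maruyama sketch, for which g(x) = µx and hence g′ = µ; all parameter values are arbitrary choices.

% Milstein scheme for dx = lambda*x dt + mu*x dW.
lambda = 2; mu = 1; x0 = 1;
T = 1; N = 500; dt = T/N;
dW = sqrt(dt)*randn(1, N);
xm = zeros(1, N+1); xm(1) = x0;
for n = 1:N
  f  = lambda*xm(n);
  g  = mu*xm(n);
  gp = mu;                              % g'_n = dg/dx for g(x) = mu*x
  xm(n+1) = xm(n) + dt*f + g*dW(n) + 0.5*g*gp*(dW(n)^2 - dt);  % equation (30)
end
plot(0:dt:T, xm);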

References

[1] D. J. Higham. An algorithmic introduction to numerical simulation of stochastic differential equations. SIAM Review, 43:525–546, 2001.