MTH 375 Handout


7/23/2019 MTH 375 Handout

MTH 375

Numerical Computation

Prof. Dr. Tahira Haroon

Chairperson, Department of Mathematics,
COMSATS Institute of Information Technology,
Park Road, Chak Shahzad, Islamabad


    NUMERICAL COMPUTATION

A numerical process that reduces the original problem to a repetition of the same step or series of steps, so that computations become automatic, is called a numerical method, and a numerical method that can be used to solve a problem will be called an algorithm. An algorithm is a complete and unambiguous set of procedures leading to the solution of a mathematical problem. The selection or construction of appropriate algorithms properly falls within the discipline of numerical analysis. Numerical analysts should consider all the sources of error that may affect the results. They must consider how much accuracy is required, estimate the magnitude of the round-off and discretization errors, determine an appropriate step size or the number of iterations required, provide for adequate checks on accuracy, and make allowance for corrective action in case of non-convergence.

The representation of numbers on computers, and the errors introduced by these representations, are as follows:

The number 257, for example, is expressible as

257 = 2 x 10^2 + 5 x 10^1 + 7 x 10^0

We call 10 the base of this system. Any integer is expressible as a polynomial in the base 10 with integral coefficients between 0 and 9 as

N = (an an-1 an-2 ... a0)_10 = an x 10^n + an-1 x 10^(n-1) + an-2 x 10^(n-2) + ... + a0 x 10^0

Modern computers read pulses sent by electrical components. The state of an electrical impulse is either on or off. It is therefore convenient to represent numbers in computers in the binary system, with base 2, where the integer coefficients may take the values 0 and 1. A nonnegative integer N will be represented in the binary system as

N = (an an-1 an-2 ... a0)_2 = an x 2^n + an-1 x 2^(n-1) + an-2 x 2^(n-2) + ... + a0 x 2^0

where the coefficients ak are either 0 or 1. Note that N is again represented as a polynomial, but now in the base 2. Users of computers prefer to work in the more familiar decimal system. The computer converts their inputs to base 2 (or perhaps base 16), then performs base-2 arithmetic, and finally translates the answer into base 10 before printing it out.

    Conversion of the binary number to decimal may be accomplished as


(11)_2 = 1 x 2^1 + 1 x 2^0 = 3
(1101)_2 = 1 x 2^3 + 1 x 2^2 + 0 x 2^1 + 1 x 2^0 = 13

    and decimal number to binary as

    187 = (187)10 = (10111011)2
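The two conversions above can be sketched in Python as follows; the function names are ours, for illustration only:

```python
def binary_to_decimal(bits):
    """Evaluate a binary digit string as a polynomial in base 2 (Horner's rule)."""
    value = 0
    for b in bits:
        value = value * 2 + int(b)  # shift left one binary place, add next digit
    return value

def decimal_to_binary(n):
    """Convert a nonnegative integer to its binary digit string."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits  # remainder mod 2 gives the next binary digit
        n //= 2
    return bits
```

For the handout's examples, binary_to_decimal("1101") gives 13 and decimal_to_binary(187) gives "10111011".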

    However, if we look into the machine languages, we soon realize that other numbersystems, particularly the octal and hexadecimal systems, are also used. Hexadecimalprovides more efficient use of memory space for real numbers.

    It is easy to convert from octal to binary and back since three binary digits make

    one octal digit. To convert from octal to binary one merely replaces all octal digitsby their binary equivalent; thus

187 = (187)_10 = (1)(10)^2 + (8)(10)^1 + (7)(10)^0
    = (1)_8 (12)_8^2 + (10)_8 (12)_8^1 + (7)_8 (12)_8^0
    = (12)_8 ((12)_8 + (10)_8) + (7)_8
    = (12)_8 (22)_8 + (7)_8
    = (264)_8 + (7)_8
    = (273)_8
    = (2 7 3)_8
    = (010 111 011)_2
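The digit-by-digit replacement just described can be sketched as follows (the helper name octal_to_binary is ours):

```python
def octal_to_binary(octal_digits):
    """Replace each octal digit by its 3-bit binary equivalent."""
    return " ".join(format(int(d), "03b") for d in octal_digits)
```

For the handout's example, octal_to_binary("273") gives "010 111 011".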

    The Three Number Systems

    Besides the various bases for representing numbers - decimal, binary, and octal -there are also three distinct number systems that are used in computing machines.

First, there are the integers, or counting numbers (for example, 0, 1, 2, 3), which are used to index and count and have limited usage in numerical analysis.

Second, there are the fixed-point numbers, for example

367.143 258 765
593,245.678 953
0.001 236 754 56

The fixed-point number system is the one that the programmer has implicitly used during much of his own calculation. Perhaps the only feature that differs between hand and machine calculations is that the machine always carries the same number of digits, whereas in hand calculation the user often changes the number of figures he carries to fit the current needs of the problem.


Third, there is the floating-point number system, which is the one used in almost all practical scientific and engineering computations. This number system differs in significant ways from the fixed-point number system. Typically, the computer word length includes both the mantissa and the exponent; thus the number of digits in the mantissa of a floating-point number is less than that of a fixed-point number.

    Floating-Point Arithmetic

Scientific and engineering calculations are, in nearly all cases, carried out in floating-point arithmetic. The computer has a number of values it chooses from to store as an approximation to the real number. The term real numbers refers to the continuous (and infinite) set of numbers on the number line. When printed as a number with a decimal point, it is either fixed-point or floating-point, in contrast to integers.

    Floating-point numbers have three parts:

    1. the sign (which requires one bit);

2. the fraction part, often called the mantissa but better characterized by the name significand;

3. the exponent part, often called the characteristic.

The significand bits (digits) constitute the fractional part of the number. In almost all cases, numbers are normalized, meaning that the fraction digits are shifted and the exponent adjusted so that the first fraction digit a1 is nonzero, e.g.,

27.39 -> +.2739 x 10^2;
-0.00124 -> -.1240 x 10^-2;
37000 -> +.3700 x 10^5.

Observe that we have normalized the fractions: the first fraction digit is nonzero. Zero is a special case; it usually has a fraction part with all zeros and a zero exponent. This kind of zero is not normalized and never can be.

Most computers permit two or even three types of numbers:

1. single precision, which uses the letter E in the exponent and is usually equivalent to seven to nine significant decimal digits;

2. double precision, which uses the letter D in the exponent instead of E and varies from 14 to 29 significant decimal digits, but is typically about 16 or 17;

3. extended precision, which may be equivalent to 19 to 20 significant decimal digits.
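A rough illustration of the single/double distinction, using Python's struct module to round a double-precision value to IEEE single precision (the helper name to_single is ours):

```python
import struct

def to_single(x):
    """Round a Python float (double precision) through IEEE single precision."""
    # pack as a 4-byte float, unpack back to a Python double
    return struct.unpack("f", struct.pack("f", x))[0]

# 1/3 stored in double precision keeps roughly 16 significant decimal digits;
# squeezed through single precision it keeps only about 7.
one_third_single = to_single(1.0 / 3.0)
```

Comparing one_third_single with 1.0/3.0 shows the lost digits directly; the difference is on the order of 10^-8.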


Calculation in double precision usually doubles the storage requirements and more than doubles the running time as compared with single precision. The finite range of the exponent is also a source of trouble, namely, what are called overflow and underflow, which refer respectively to numbers exceeding the largest-sized and the smallest-sized (non-zero) numbers that can be represented within the system.

Numerical methods provide estimates that are very close to the exact analytical solutions: obviously, an error is introduced into the computation. This error is not a human error, such as a blunder, mistake, or oversight, but rather a discrepancy between the exact and approximate (computed) values. In fact, numerical analysis is a vehicle to study errors in computations. It is not a static discipline. The continuous change in this field is to devise algorithms which are both fast and accurate. These algorithms may become obsolete and may be replaced by algorithms that are more powerful. In the practice of numerical analysis it is important to be aware that computed solutions are not exact mathematical solutions, but numerical methods should be sufficiently accurate¹ (or unbiased) to meet the requirements of a particular scientific problem, and they should also be precise² enough. The precision of a numerical solution can be diminished in several subtle ways. Understanding these difficulties can often guide the practitioner in the proper implementation and/or development of numerical algorithms.

    Error Analysis

Error analysis is the study and evaluation of error. An error in a numerical computation is simply the difference between the actual (true) value of a quantity and its computed (approximate) value. There are three common ways to express the size of the error in a computed result: absolute error, relative error, and percentage error.

Suppose that x̄ is an approximation (computed value) to x. The error is

ε = x − x̄.

Absolute Error:- The absolute error of a given result is defined as

absolute error = |true value − approximate value|
Ea = |x − x̄|

¹ Accuracy is the number of digits to which an answer is correct.
² Precision is the number of digits in which a number is expressed or given, irrespective of the correctness of the digits.


Relative Error:-

relative error = |true value − approximate value| / |true value|

Er = Ea / |x|,  x ≠ 0.

If the actual value is not known, then

Er = Ea / |x̄|,  x̄ ≠ 0,

is often a better indicator of the accuracy.

Percentage error:-

Relative error expressed as a percentage is called the percentage error, defined by

PE = 100 x Er
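The three error measures follow directly from the definitions above; a minimal sketch (function names are ours):

```python
def absolute_error(true_value, approx):
    """Ea = |x - x_approx|."""
    return abs(true_value - approx)

def relative_error(true_value, approx):
    """Er = Ea / |x|; requires a nonzero true value."""
    return abs(true_value - approx) / abs(true_value)

def percentage_error(true_value, approx):
    """PE = 100 * Er."""
    return 100.0 * relative_error(true_value, approx)
```

For example, approximating 10.0 by 9.8 gives an absolute error of 0.2, a relative error of 0.02, and a percentage error of 2%.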

    Significant Digits

In considering rounding errors, it is necessary to be precise in the usage of approximate digits. A significant digit in an approximate number is a digit which gives reliable information about the size of the number. In other words, significant digits are used to express accuracy, i.e., how many digits in the number have meaning.

    Errors

The main sources of error are:

1. Gross errors, which can be avoided by taking enough care;

2. Errors in original data: nothing can be done to overcome such errors by any choice of method, but we need to be aware of such uncertainties; in particular, we may need to perform tests to see how sensitive the results are to changes in the input information;

3. Truncation errors, due to the finite representation of processes;

4. Round-off errors, due to the finite representation of numbers in the machine.


They all cause the same effect: diversion from the exact answer. Some errors are small and may be neglected, while others may be devastating if overlooked. Rounding error is the most basic source of error in a computer. It occurs when a calculator or computer is used to perform real-number calculations. This error arises because the arithmetic performed in a machine involves numbers with only a finite number of digits, say n significant digits, obtained by rounding off the (n+1)th place and dropping all digits after the nth, with the result that calculations are performed with approximate representations of the actual numbers. The error that results from replacing a number with its floating-point form is called round-off error (regardless of whether the rounding or chopping method is used).

When the machine drops digits without rounding, this is called chopping; it can cause serious trouble.

Round-off causes trouble mainly when two numbers of about the same size are subtracted.

A second, more insidious trouble with round-off, especially with chopping, is the presence of internal correlations between numbers in the computation so that, step after step, the small error is always in the same direction.
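A small sketch of the subtraction problem just mentioned: the two functions below are algebraically identical, but for large x the first subtracts two nearly equal numbers and loses essentially all of its significant digits (function names are ours):

```python
import math

def naive(x):
    # sqrt(x^2 + 1) and x agree in almost every digit for large x,
    # so this subtraction cancels catastrophically
    return math.sqrt(x * x + 1.0) - x

def stable(x):
    # algebraically the same quantity, rewritten to avoid the subtraction
    return 1.0 / (math.sqrt(x * x + 1.0) + x)
```

At x = 1e8, naive(x) returns exactly 0.0 in double precision (the 1.0 is lost entirely when added to x squared), while stable(x) returns the correct value near 5e-9.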

    Error Accumulation in Computations

    1. Error Accumulation in Addition,

    2. Error Accumulation in Subtraction,

    3. Error Accumulation in Multiplication,

    4. Error Accumulation in Division,

    5. Errors of Powers and Roots,

    6. Error in Function Evaluation,

    Propagated Error

The local error at any stage of the calculation is propagated throughout the remaining part of the computation, i.e., the error in the succeeding steps of the process is due to the occurrence of an earlier error. Propagated error is more subtle than the other errors: such errors are in addition to the local errors. Propagated error is of critical importance. If errors are magnified continuously as the method continues, eventually they will overshadow the true value, destroying its validity; we call such a method unstable. For a stable method, the desirable kind, errors made at early points die out as the method continues.


    Numerical Cancellation

    Accuracy is lost when two nearly equal numbers are subtracted. Thus care shouldbe taken to avoid such subtraction where possible, because this is the major sourceof error in floating point operations.

    Errors in Converting Values

The numbers that are input to a computer are ordinarily base-10 values. Thus the input must be converted to the computer's internal number base, normally base 2. This conversion itself causes some errors.

Machine eps

One important measure in computer arithmetic is how small a difference between two values the computer can recognize. This quantity is termed the computer eps.
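The computer eps can be estimated by repeatedly halving a trial value until adding it to 1.0 no longer changes the stored result; a minimal sketch (the function name is ours):

```python
def machine_eps():
    """Find the smallest power of 2 whose further halving is lost when added to 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps
```

For IEEE double precision this returns 2^-52, about 2.22e-16.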

    The Solution of Nonlinear Equations

    Bisection Method

In the bisection method, to solve the equation f(x) = 0, we first must know an interval in which a root lies. Therefore, an interval x1 ≤ x ≤ x2 within the interval [a, b] has been found such that

f(x1) f(x2) < 0

and the method undertakes to decrease the size of the interval. This decrease is accomplished by evaluating the function f(x) at the midpoint, (x1 + x2)/2, of the interval, then using the condition

f(x1) f((x1 + x2)/2) = 0 : a zero at (x1 + x2)/2
f(x1) f((x1 + x2)/2) < 0 : new interval (x1, (x1 + x2)/2)
f(x1) f((x1 + x2)/2) > 0 : new interval ((x1 + x2)/2, x2)

The magnitude of this error estimate (which is the interval size after n iteration steps) is precisely

error = (b − a) / 2^n

This error does not take machine errors into account, which are handled separately.

Advantages


The main advantage of the bisection method is that it is guaranteed to work if f(x) is continuous in [a, b] and if the values x = a and x = b actually bracket a root. Another important advantage, which few other root-finding methods share, is that the number of iterations needed to achieve a specified accuracy is known in advance. Each repetition halves the length of the interval, and 10 repetitions, for example, reduce the length of the original interval by a factor of 2^10 = 1024 > 1000 = 10^3. Thus 10 or, at most, 20 repetitions are all that are likely to be required.
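The halving process described above can be sketched as follows (the function name, tolerance, and iteration cap are ours; the stopping test is the interval size):

```python
def bisect(f, x1, x2, tol=1e-10, max_iter=100):
    """Halve an interval [x1, x2] with f(x1)*f(x2) < 0 until it is smaller than tol."""
    if f(x1) * f(x2) >= 0:
        raise ValueError("f(x1) and f(x2) must bracket a root")
    for _ in range(max_iter):
        xm = (x1 + x2) / 2.0
        if f(x1) * f(xm) <= 0:  # root (or a zero at xm) lies in [x1, xm]
            x2 = xm
        else:                   # root lies in [xm, x2]
            x1 = xm
        if abs(x2 - x1) < tol:
            break
    return (x1 + x2) / 2.0
```

For example, bisecting f(x) = x^2 - 2 on [1, 2] converges to sqrt(2) ~ 1.41421356.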

    Objection

This method is slow to converge. The possibilities to end the cycle of repetitions are:

1). |x1 − x2| small: absolute accuracy in x;

2). |(x1 − x2)/x1| small: relative accuracy in x (except for x1 = 0);

3). |f(x1) − f(x2)| small: function values small;

4). Repeat n times: a good method.

Warnings on the Bisection Method

The function must be continuous.

For example, if

f(x) = 1/x

then the bisection process will come down to a small interval about x = 0, and probably none of our tests will have caught the error, because the bisection method does not recognize the difference between a root and a singularity.

The Secant Method

Almost every function can be approximated by a straight line over a small interval. Let x be near the root r. Assume that f(x) is linear in the vicinity of the root r. Choose another point, x1, which is near to x and also near to r (which we don't know yet); then from the obvious similar triangles we get

x2 = x − f(x)(x − x1) / (f(x) − f(x1))

Since f(x) is not exactly linear, x2 is not equal to r, but it should be closer than either of the two points we began with. We can continue to get better estimates of the root if we do this repeatedly, always choosing the two x values nearest to r for drawing the straight line.
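The repeated straight-line step can be sketched as follows (names, tolerance, and the guard against a flat secant line are ours):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Repeatedly replace the oldest point by the straight-line root estimate."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 - f0 == 0.0:
            break  # flat secant line: cannot divide, stop here
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)  # the formula in the text
        x0, x1 = x1, x2
        if abs(x1 - x0) < tol:
            break
    return x1
```

On f(x) = x^2 - 2 with starting points 1 and 2, this converges to sqrt(2) in a handful of iterations.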

    Linear Interpolation (False-Position or Regula Falsi Method)

This method is a combination of the bisection method and the secant method. In this method the assumption f(a) f(b) < 0 is taken from the bisection method, while the concept


of using similar triangles is taken from the secant method.

In the false-position method, we cannot be sure of the number of steps required to decrease the interval by a preassigned amount. The Regula Falsi (false-position) method is the same as the secant method, except that it maintains the condition

f(xn) f(xn−1) < 0,

which is not used in the secant method.
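The combination just described, a secant-style estimate that always keeps a sign-change bracket, can be sketched as follows (names and tolerance are ours):

```python
def false_position(f, a, b, tol=1e-12, max_iter=100):
    """Regula falsi: chord estimate, but always retain a bracket with f(a)*f(b) < 0."""
    fa, fb = f(a), f(b)
    if fa * fb >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    x = a
    for _ in range(max_iter):
        x_old = x
        x = b - fb * (b - a) / (fb - fa)  # chord's intersection with the x-axis
        fx = f(x)
        if fa * fx < 0:      # root lies between a and x: shrink from the right
            b, fb = x, fx
        else:                # root lies between x and b: shrink from the left
            a, fa = x, fx
        if abs(x - x_old) < tol:
            break
    return x
```

Unlike bisection, the interval here often shrinks from one side only, which is exactly the slow, one-sided behavior the modified method below is meant to fix.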

    Modified False-Position Method

A simple modification to the false-position method greatly improves it. The main weakness in the original method is its slow, one-sided approach to the zero. To remedy this, we arbitrarily, at each step, divide the function value that we keep by 2.

    Newton-Raphson Method

One of the most widely used methods of solving equations is the Newton-Raphson method. Starting from an initial estimate x1 which is not too far from a root, we extrapolate along the tangent to its intersection with the x-axis, and take that as the next approximation. This is continued until either the successive x values are sufficiently close, or the value of the function is sufficiently near zero.

The general formula is

xk+1 = xk − f(xk) / f'(xk).

This formula provides a method of going from one guess xk to the next guess xk+1.

Newton's method, when it works, is fine. The method does not always converge; it may jump to another root or oscillate around the desired root.

Thus, in practice, unless the local structure of the function is well understood, Newton's method is to be avoided.

Newton's method also works for complex roots.

Newton's method is widely used because, at least in the near neighborhood of a root, it is quadratically convergent. However, offsetting this is the need for two function evaluations at each step, f(xn) and f'(xn).
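The tangent-line iteration above can be sketched as follows (names and tolerance are ours; note the two evaluations, f and f', per step):

```python
def newton(f, fprime, x, tol=1e-12, max_iter=50):
    """Follow the tangent line at x to the x-axis, repeatedly."""
    for _ in range(max_iter):
        step = f(x) / fprime(x)  # one f and one f' evaluation per step
        x -= step                # x_{k+1} = x_k - f(x_k)/f'(x_k)
        if abs(step) < tol:
            break
    return x
```

On f(x) = x^2 - 2 with f'(x) = 2x and starting guess 1.0, the quadratic convergence is visible: roughly twice as many correct digits per iteration.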

Muller's Method

Muller's method is based on approximating the function in the neighborhood of the root by a quadratic polynomial. A second-degree polynomial is made to fit three points near a root,

(x, f(x)), (x1, f(x1)), (x2, f(x2)),


and the proper zero of this quadratic, found using the quadratic formula, is used as the improved estimate of the root. The process is then repeated using the three points nearest the root being evaluated. The procedure for Muller's method is developed by writing a quadratic equation that fits through three points in the vicinity of the root. Then we get

x_root = x − 2c / (b ± sqrt(b^2 − 4ac))

where

a = (f1 h2 + f2 h1 − f (h2 + h1)) / (h1^2 h2 + h1 h2^2),

b = (f1 − f − a h1^2) / h1,    c = f,

h1 = x1 − x,    h2 = x − x2,

with the sign in the denominator taken to give the largest absolute value of the denominator (i.e., if b > 0, choose plus; if b < 0, choose minus; if b = 0, choose either).

We take the root of the polynomial as one of the set of three points for the next approximation, keeping the three points that are most closely spaced (i.e., if the root is to the right of x, take x, x1, and the root; if to the left, take x, x2, and the root). Always reset the subscripts to make x the middle of the three values.
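One step of the quadratic fit can be sketched directly from the a, b, c formulas above (the function name is ours; we assume b^2 - 4ac >= 0 so the square root is real):

```python
import math

def muller_step(f, x2, x, x1):
    """One Muller step: fit a quadratic through (x2,f2), (x,f), (x1,f1), return its zero."""
    f0, f1, f2 = f(x), f(x1), f(x2)
    h1, h2 = x1 - x, x - x2
    a = (f1 * h2 + f2 * h1 - f0 * (h1 + h2)) / (h1 * h1 * h2 + h1 * h2 * h2)
    b = (f1 - f0 - a * h1 * h1) / h1
    c = f0
    disc = math.sqrt(b * b - 4.0 * a * c)      # assumed real here
    denom = b + disc if b >= 0 else b - disc   # larger-magnitude denominator
    return x - 2.0 * c / denom
```

Because f(x) = x^2 - 2 is itself quadratic, a single step from the points 1, 1.5, 2 lands on sqrt(2) (up to rounding).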

Fixed-Point Iteration

    Fixed point iteration is a possible method for obtaining a root of the equation

    f(x) = 0. (1)

    In this method, we rearrange equation (1) of the form

g(x) − x = 0, or x = g(x), (2)

so that any solution of (2), i.e., any fixed point of g(x), is a solution of (1).
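The rearranged iteration x <- g(x) can be sketched as follows (names and tolerance are ours; convergence depends on the choice of g):

```python
def fixed_point(g, x, tol=1e-12, max_iter=200):
    """Iterate x <- g(x) until successive values agree to within tol."""
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For example, rearranging x^2 = 2 as x = (x + 2/x)/2 gives an iteration that converges rapidly to sqrt(2) from the guess 1.0.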

    Methods to Solve a System of Linear Equations

A matrix is a rectangular array of numbers in which not only the value of each number is important but also its position in the array. The number of its rows and columns describes the size of the matrix. A matrix of n rows and m columns is said to be n x m.

    10

  • 7/23/2019 MTH 375 Handout

    12/67

A =
[ a11  a12  a13  ...  a1m ]
[ a21  a22  a23  ...  a2m ]
[ a31  a32  a33  ...  a3m ]
[ ...  ...  ...  ...  ... ]
[ an1  an2  an3  ...  anm ]
= [aij],  i = 1, 2, 3, ..., n,  j = 1, 2, 3, ..., m

Two matrices of the same size may be added or subtracted. The sum (or difference) of A = [aij] and B = [bij] is the matrix whose elements are the sums (or differences) of the corresponding elements of A and B:

C = A ± B = [aij ± bij] = [cij].

Multiplication of two matrices is defined when the number of columns of the first matrix is equal to the number of rows of the second matrix, i.e., when A is n x m and B is m x r:

[cij] = [aij][bij], or cij = Σ(k=1 to m) aik bkj,  i = 1, 2, 3, ..., n,  j = 1, 2, 3, ..., r.

If A is n x m, B must have m rows, or else they are said to be nonconformable for multiplication and their product is undefined. In general, AB ≠ BA, so the order of factors must be preserved in matrix multiplication. If k is a scalar, then

kA = C, or cij = k aij

A matrix with only one column, n x 1 in size, is termed a column vector, and one of only one row, 1 x m in size, is called a row vector. Normally, the term vector is used for a column vector. If A is n x n, it is called a square matrix.
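The row-by-column product defined above can be sketched with plain lists of rows (the function name is ours):

```python
def mat_mul(A, B):
    """Multiply an n x m matrix A by an m x r matrix B, given as lists of rows."""
    n, m, r = len(A), len(B), len(B[0])
    if len(A[0]) != m:
        raise ValueError("A's columns must equal B's rows (nonconformable)")
    # c_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(r)]
            for i in range(n)]
```

Comparing mat_mul(A, B) with mat_mul(B, A) for two 2 x 2 matrices shows directly that the order of factors matters.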

The set of n linear equations in m unknowns can be written as

a11 x1 + a12 x2 + a13 x3 + ... + a1m xm = b1
a21 x1 + a22 x2 + a23 x3 + ... + a2m xm = b2
a31 x1 + a32 x2 + a33 x3 + ... + a3m xm = b3
...
an1 x1 + an2 x2 + an3 x3 + ... + anm xm = bn
(3)

or, much more simply in matrix notation, as

Ax = b,


    where

A =
[ a11  a12  a13  ...  a1m ]
[ a21  a22  a23  ...  a2m ]
[ a31  a32  a33  ...  a3m ]
[ ...  ...  ...  ...  ... ]
[ an1  an2  an3  ...  anm ]

x =
[ x1 ]
[ x2 ]
[ x3 ]
[ ... ]
[ xm ]

b =
[ b1 ]
[ b2 ]
[ b3 ]
[ ... ]
[ bn ]

A very important special case is the multiplication of two vectors. When a row vector multiplies a column vector, the result is a matrix of one row and one column, a pure number, a scalar; this product is called the scalar product of the vectors, or inner product.

Reverse the order of multiplication of these two vectors and we get a matrix; this product is called the outer product.

If all the elements above the diagonal are zero, a matrix is called lower-triangular; it is called upper-triangular when all the elements below the diagonal are zero. If only the diagonal terms are nonzero, the matrix is called a diagonal matrix. When the diagonal elements are each equal to unity while all off-diagonal elements are zero, the matrix is said to be the identity matrix. Tridiagonal matrices are those that have nonzero elements only on the diagonal and in the positions adjacent to the diagonal. When a matrix is square, a quantity called its trace is defined, which is the sum of the elements on the main diagonal. All the elements of the null matrix are zero. For a matrix defined by A = [aij], its transpose is defined by A^T = [aji]. The inverse of a matrix A is written as A^(-1) and satisfies AA^(-1) = A^(-1)A = I. A matrix that has orthonormal columns is called an orthogonal matrix. A vector whose length is one is called a unit vector. A vector that has all its elements equal to zero except one element, which has a value of unity, is called a unit basis vector. There are three distinct unit basis vectors for order-3 vectors. Null vectors are defined as vectors with all elements zero. The transpose of a vector

u =
[ x1 ]
[ x2 ]
[ x3 ]

is given by u^T = [x1 x2 x3].

    Sigma Notation

Σ(i=1 to n) xi = x1 + x2 + x3 + ... + xn

Σ(i=1 to n) c = c Σ(i=1 to n) 1 = c(1 + 1 + 1 + ... + 1) = cn

Σ(i=1 to 1) xi = x1


Σ(i=1 to n) xi = Σ(j=1 to n) xj = Σ(k=1 to n) xk;  i, j, k are dummy indices.

System (3) can be written as

Σ(j=1 to m) a1j xj = b1
Σ(j=1 to m) a2j xj = b2
Σ(j=1 to m) a3j xj = b3
...
Σ(j=1 to m) anj xj = bn

or, more compactly,

Σ(j=1 to m) aij xj = bi,  i = 1, 2, 3, ..., n

Product Notation

Π(i=1 to n) xi = x1 x2 x3 ... xn

    The Gaussian Elimination Method

A method in which the unknowns from the set of equations are eliminated by combining equations is known as an elimination method. It is called Gaussian elimination if a particular systematic scheme, attributed to Gauss, is used in the elimination process. This method is classified as a direct method. Using Gauss's method, a set of n equations in n unknowns is reduced to an equivalent triangular set, which is then easily solved by back substitution.

As an efficient way of programming Gauss's elimination method for the computer, we write a general procedure for reducing matrices as

aij^(k) = aij^(k-1) − ( akj^(k-1) / akk^(k-1) ) aik^(k-1),   k+1 ≤ j ≤ m,  k+1 ≤ i ≤ n   (4)

where the i's, j's, k's, and so on are as previously defined. The superscripts shown merely correspond to the primes used in identifying successive reduced matrices, and are not needed in a computer program.

    The back substitution procedure may be generalized in the form of the followingset of equations:


xn = anm / ann   (5)

xi = ( aim − Σ(j=i+1 to n) aij xj ) / aii,   i = n−1, n−2, ..., 1   (6)

There are two points yet to be considered. First, we must guard against dividing by zero. Observe that zeros may be created in the diagonal positions even if they are not present in the original matrix of coefficients. A useful strategy to avoid (if possible) such zero divisors is to rearrange the equations so as to put the coefficient of large magnitude on the diagonal at each step. This is called pivoting. Complete pivoting may require both row and column interchanges. This is not frequently done. Partial pivoting, which places a coefficient of larger magnitude on the diagonal by row interchanges only, will guarantee a nonzero divisor if there is a solution to the set of equations, and will have the added advantage of improved arithmetic precision. The diagonal elements that result are called pivot elements.

The second important point is the effect of the magnitude of the pivot elements on the accuracy of the solution. If the magnitude of the pivot element is appreciably smaller than the magnitudes, in general, of the other elements in the matrix, the use of the small pivot element will cause a decrease in the solution accuracy. Therefore, for overall accuracy, each reduction should be made using as the pivot row the row having the largest pivot element. Such a provision should always be incorporated in a computer program that is to solve fairly large numbers of simultaneous equations.

When only a small number of equations are to be solved, the round-off error is small and usually does not substantially affect the accuracy of the results, but if many equations are to be solved simultaneously, the cumulative effect of round-off error can introduce relatively large solution errors. For this reason, the number of simultaneous equations which can be satisfactorily solved by Gauss's elimination method, using seven to ten significant digits in the arithmetic operations, is generally limited to 15 to 20 when most or all of the unknowns are present in all of the equations (the coefficient matrix is dense). On the other hand, if only a few unknowns are present in each equation (the coefficient matrix is sparse), many more equations may be satisfactorily handled.

    may be satisfactorily handled.The number of equations which can be accurately solved also depends to a great

    extent on the condition of the system of equations. If a small relative change inone or more of the coefficients of a system of equations results in a small relativechange in the solution, the system of equations is called a well-conditionedsystem.If, however, a small relative change in one or more of the coefficient values resultsin a large relative change in solution values, the system of equations is said to beill conditioned. Since small changes in the coefficients of an equation may result


from round-off error, the use of double-precision arithmetic and partial pivoting or complete pivoting becomes very important in obtaining meaningful solutions of such sets of equations. There also exists the possibility that the set of equations has no solution, or that the prior procedure will fail to find it. During the triangularization step, if a zero is encountered on the diagonal, we cannot use that row to eliminate coefficients below that zero element. However, in that case we can continue by interchanging rows and eventually achieve an upper triangular matrix of coefficients. The real stumbling block is finding a zero on the diagonal after we have triangularized. If that occurs, the back substitution fails, for we cannot divide by zero. It also means that the determinant is zero: there is no solution.
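The elimination, partial pivoting, and back-substitution steps described above can be sketched together as follows (the function name is ours):

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # build the augmented matrix [A | b] so row operations carry b along
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for k in range(n):
        # partial pivoting: put the largest-magnitude coefficient on the diagonal
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular: no unique solution")
        M[k], M[p] = M[p], M[k]
        # eliminate the coefficients below the pivot
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    # back substitution on the triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
```

For a dense 3 x 3 system such as 2x+y-z=8, -3x-y+2z=-11, -2x+y+2z=-3, this returns the solution (2, 3, -1).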

Gauss-Jordan Elimination Method

This procedure varies from the Gaussian method in that, when an unknown is eliminated, it is eliminated from all the other equations, i.e., from those preceding the pivot equation as well as those following it. This eliminates the necessity of using the back substitution process employed in Gauss's method.

The elements of the new matrix B can be evaluated from the elements of the old matrix A by using the following formulas:

b(i−1, j−1) = aij − (a1j ai1) / a11,   1 < i ≤ n,  1 < j ≤ m,  a11 ≠ 0   (7)

b(n, j−1) = a1j / a11,   1 < j ≤ m,  a11 ≠ 0   (8)

Equation (7) is used to find all elements of the new matrix B except those making up the last row of that matrix. For determining the elements of the last row of the new matrix, equation (8) is used. In these equations:

i = row number of old matrix A
j = column number of old matrix A
n = maximum number of rows
m = maximum number of columns
a = an element of old matrix A
b = an element of new matrix B

Cholesky's Method

Cholesky's method is also known as Crout's method. Crout's method transforms the coefficient matrix, A, into the product of two matrices, L (a lower triangular matrix)


and U (an upper triangular matrix), where U has ones on its main diagonal (the method in which L has the ones on its diagonal is known as Doolittle's method). The general formulas for getting the elements of L and U corresponding to the coefficient matrix for n simultaneous equations can be written as

lij = aij − Σ(k=1 to j−1) lik ukj,   j ≤ i,  i = 1, 2, 3, ..., n   (9)

uij = ( aij − Σ(k=1 to i−1) lik ukj ) / lii,   i < j,  j = 2, 3, ..., n+1   (10)

If we make sure that a11 in the original matrix is nonzero, then the divisions of equations (10) will always be defined, since the lii values will be nonzero. This may be seen by noting that

LU = A

and therefore the determinant of L times the determinant of U equals the determinant of A; that is,

|L| |U| = |A|.

We are assuming independent equations, so the determinant of A is nonzero.

Norm

Discussing multicomponent entities like matrices and vectors, we frequently need a way to express their magnitude, some measure of bigness or smallness. For ordinary numbers, the absolute value tells us how large the number is, but a matrix has many components, each of which may be large or small in magnitude. (We are not talking about the size of a matrix, meaning the number of elements it contains.)

Any good measure of the magnitude of a matrix (the technical term is norm) must have four properties that are intuitively essential:

1. The norm must always have a value greater than or equal to zero, and must be zero only when the matrix is the zero matrix, i.e.,

||A|| ≥ 0, and ||A|| = 0 if and only if A = 0.

2. The norm must be multiplied by |k| if the matrix is multiplied by the scalar k, i.e.,

||kA|| = |k| ||A||.


3. The norm of the sum of two matrices must not exceed the sum of the norms, i.e.,

||A + B|| ≤ ||A|| + ||B||.

4. The norm of the product of two matrices must not exceed the product of the norms, i.e.,

||AB|| ≤ ||A|| ||B||.

The third relationship is called the triangle inequality. The fourth is important when we deal with the product of matrices.

For vectors in two- or three-space, the length satisfies all four requirements and is a good value to use for the norm of a vector. This norm is called the Euclidean norm, and is computed by

sqrt(x1^2 + x2^2 + x3^2).

Its generalized form is

||x||e = sqrt(x1^2 + x2^2 + x3^2 + ... + xn^2) = ( Σ(i=1 to n) xi^2 )^(1/2).

This is not the only way to compute a vector norm, however. The sum of the absolute values of the xi can be used as a norm; the maximum value of the magnitudes of the xi will also serve. These three norms can be interrelated by defining the p-norm as

||x||p = ( Σ(i=1 to n) |xi|^p )^(1/p).

From this it is readily seen that

    \|x\|_1 = \sum_{i=1}^{n} |x_i| = \text{sum of magnitudes};

    \|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} = \text{Euclidean norm};

    \|x\|_\infty = \max_{1 \le i \le n} |x_i| = \text{maximum-magnitude norm}.

Which of these vector norms is best to use may depend on the problem.


Norms of a matrix are developed by a correspondence to vector norms. Matrix norms that correspond to the above, for a matrix A, can be

    \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}| = \text{maximum column sum};

    \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{n} |a_{ij}| = \text{maximum row sum}.

The matrix norm \|A\|_2 that corresponds to the 2-norm of a vector is not readily computed; it is related to the eigenvalues of the matrix. This norm is also called the spectral norm. For an m \times n matrix, we can paraphrase the Euclidean (also called Frobenius) norm as

    \|A\|_e = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}.

Why are norms important? For one thing, they let us express the accuracy of the solution to a set of equations in quantitative terms by stating the norm of the error vector (the true solution minus the approximate solution vector). Norms are also used to study quantitatively the convergence of iterative methods for solving linear systems.
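These definitions can be checked on small examples. The following Python sketch (not part of the handout; the function names are my own) computes the 1-, 2-, and maximum-magnitude vector norms and the corresponding matrix norms:

```python
def vector_norm(x, p):
    # p-norm: p = 1 gives the sum of magnitudes, p = 2 the Euclidean norm
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def vector_norm_inf(x):
    # maximum-magnitude norm
    return max(abs(v) for v in x)

def matrix_norm_1(A):
    # maximum column sum of |a_ij|
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def matrix_norm_inf(A):
    # maximum row sum of |a_ij|
    return max(sum(abs(v) for v in row) for row in A)

def matrix_norm_frobenius(A):
    # Euclidean (Frobenius) norm of a matrix
    return sum(v * v for row in A for v in row) ** 0.5

x = [3, -4]
print(vector_norm(x, 1))      # 7.0
print(vector_norm(x, 2))      # 5.0
print(vector_norm_inf(x))     # 4

A = [[1, -2], [3, 4]]
print(matrix_norm_1(A))       # 6  (column sums are 4 and 6)
print(matrix_norm_inf(A))     # 7  (row sums are 3 and 7)
```

Note that all four norm properties can be verified numerically for such small cases.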

    Iterative Methods

Iterative methods, as opposed to the direct method of solving a set of linear equations by elimination, are preferred in certain cases. Iterative techniques are seldom used for solving linear systems of small dimension, since the time required for sufficient accuracy exceeds that required for direct techniques. When the coefficient matrix is sparse (has many zeros), however, they may be more rapid.

    Jacobi and Gauss-Seidel Methods

Suppose we are given the system of linear equations A x = b, where A is nonsingular, so it can always be rearranged so that the diagonal elements are nonzero and

    x = b + C x,


where

    b = \begin{bmatrix} b_1/a_{11} \\ b_2/a_{22} \\ b_3/a_{33} \\ \vdots \\ b_n/a_{nn} \end{bmatrix}, \qquad
    C = \begin{bmatrix}
    0 & -a_{12}/a_{11} & -a_{13}/a_{11} & -a_{14}/a_{11} & \cdots & -a_{1n}/a_{11} \\
    -a_{21}/a_{22} & 0 & -a_{23}/a_{22} & -a_{24}/a_{22} & \cdots & -a_{2n}/a_{22} \\
    -a_{31}/a_{33} & -a_{32}/a_{33} & 0 & -a_{34}/a_{33} & \cdots & -a_{3n}/a_{33} \\
    \vdots & & & & & \vdots \\
    -a_{n1}/a_{nn} & -a_{n2}/a_{nn} & -a_{n3}/a_{nn} & -a_{n4}/a_{nn} & \cdots & 0
    \end{bmatrix}

Assuming we have an initial estimate x^{(0)} for x, the next estimate x^{(1)} is obtained by substituting x^{(0)} on the right side of the above equation, and the estimate after that is obtained by substituting x^{(1)} on the right side to give x^{(2)}. In this iterative technique, we obtain

    x^{(n+1)} = b + C x^{(n)}.

If the elements of C are small in comparison to 1, A is said to be diagonally dominant. The smaller the elements of C are in relation to 1, the more likely the sequence x^{(0)}, x^{(1)}, x^{(2)}, \ldots, x^{(n)} is to converge. Nonetheless, convergence may also depend on how good the initial approximation is. This method is called the Jacobi Iterative Method. It consists of solving the ith equation in A x = b for x_i to obtain (provided a_{ii} \ne 0)

    x_i = -\sum_{\substack{j=1 \\ j \ne i}}^{n} \frac{a_{ij} x_j}{a_{ii}} + \frac{b_i}{a_{ii}}, \quad \text{for } i = 1, 2, \ldots, n,

and generating each x_i^{(k)} from the components of x^{(k-1)} for k \ge 1 by

    x_i^{(k)} = \frac{-\sum_{\substack{j=1 \\ j \ne i}}^{n} a_{ij} x_j^{(k-1)} + b_i}{a_{ii}}, \quad \text{for } i = 1, 2, \ldots, n.

Note that this method is exactly the same as the method of fixed-point iteration for a single equation, but it is now applied to a set of equations. This method is written in the form

    x^{(k+1)} = G(x^{(k)}) = b + C x^{(k)},

which is identical to the form

    x^{(k+1)} = g(x^{(k)}).


The Jacobi method is also known as the method of simultaneous displacements, because every component of the solution vector is updated at the same time, using the most recent complete set of x-values. The x-values of the next trial (the new x) are not used, even in part, until all of its components have first been found; even though we have computed the new x_1, we still do not use this value in computing the new x_2. In nearly all cases the new values are better than the old and should be used in preference to the poorer values. When this is done, the most recently computed components are used in computing x_i^{(k)}. Since, for i > 1, x_1^{(k)}, x_2^{(k)}, \ldots, x_{i-1}^{(k)} have already been computed and are likely to be better approximations to the actual solutions x_1, x_2, \ldots, x_{i-1} than x_1^{(k-1)}, x_2^{(k-1)}, \ldots, x_{i-1}^{(k-1)}, it seems reasonable to compute x_i^{(k)} using these most recently calculated values; that is,

    x_i^{(k)} = \frac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k)} - \sum_{j=i+1}^{n} a_{ij} x_j^{(k-1)}}{a_{ii}}, \quad \text{for each } i = 1, 2, \ldots, n. \quad (11)

This procedure is called the Gauss-Seidel Iterative Method. In this method our first step is to rearrange the set of equations by solving each equation for one of the variables in terms of the others, exactly as in the Jacobi method. We then proceed to improve each x-value in turn, always using the most recent approximations to the values of the other variables. The rate of convergence is more rapid than that of the Jacobi method.
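A minimal Python sketch of the two iterations (the function names are mine, not the handout's): Jacobi updates every component from the previous iterate, while Gauss-Seidel uses each new component as soon as it is available, per equation (11).

```python
def jacobi(A, b, x0, iterations):
    n = len(b)
    x = list(x0)
    for _ in range(iterations):
        # all components are computed from the OLD vector x
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def gauss_seidel(A, b, x0, iterations):
    n = len(b)
    x = list(x0)
    for _ in range(iterations):
        for i in range(n):
            # x[0..i-1] already hold the newest values, per equation (11)
            x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

# diagonally dominant system 4x + y = 6, x + 3y = 7, exact solution (1, 2)
A, b = [[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]
print(jacobi(A, b, [0.0, 0.0], 50))
print(gauss_seidel(A, b, [0.0, 0.0], 50))   # converges in fewer iterations
```

For this diagonally dominant example both sequences converge toward (1, 2); Gauss-Seidel reaches a given accuracy in roughly half as many iterations.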

Interpolation and Polynomial Approximation

Suppose that we have some tabulated values of a function over a certain range of its independent variable. The problem of interpolation is to find the value of the function for some intermediate value of x not included in the table. This x value is assumed to lie within the range of the tabulated abscissas. If it does not, the problem of finding the corresponding function value is called extrapolation.

Tabular values may be plotted to give us a graphical picture of points through which the function must pass. Connecting the points with a smooth curve gives us a graphical representation of the function. If only a rough approximation of the function value is required for some intermediate x value, it can be obtained by reading the function value directly from the graph. This procedure is called graphical interpolation. If the given values of the independent variable are close together, a sufficiently good graphical approximation to the function might be obtained by connecting the points with straight-line segments. An intermediate function value can also be obtained analytically by a method based on this piecewise linear approximation of the function. From similar triangles,

    \frac{f(x) - f(x_0)}{x - x_0} = \frac{f(x_1) - f(x_0)}{x_1 - x_0}


or

    f(x) = f(x_0) + \frac{f(x_1) - f(x_0)}{x_1 - x_0} (x - x_0) \quad (12)

for x values between x_0 and x_1. The use of this equation is called linear interpolation, familiar to everyone who has used log tables.

If we are given the value of a function f(x) and are asked to find the corresponding value of x, the process is called inverse interpolation. For inverse interpolation the same straight line is used, but the equation is rearranged into the more convenient form

    x = \frac{x_0 f(x_1) - x_1 f(x_0)}{f(x_1) - f(x_0)} + \frac{(x_1 - x_0) f(x)}{f(x_1) - f(x_0)}. \quad (13)
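Equations (12) and (13) are easy to check numerically. A short Python sketch (illustrative only; the helper names are mine):

```python
def linear_interp(x0, f0, x1, f1, x):
    # equation (12): f(x) ~ f(x0) + (f(x1) - f(x0)) / (x1 - x0) * (x - x0)
    return f0 + (f1 - f0) / (x1 - x0) * (x - x0)

def inverse_interp(x0, f0, x1, f1, fx):
    # equation (13): the same straight line solved for x given f(x)
    return (x0 * f1 - x1 * f0) / (f1 - f0) + (x1 - x0) * fx / (f1 - f0)

# between (1, 10) and (3, 30): f(2) = 20, and f = 20 occurs at x = 2
print(linear_interp(1.0, 10.0, 3.0, 30.0, 2.0))    # 20.0
print(inverse_interp(1.0, 10.0, 3.0, 30.0, 20.0))  # 2.0
```

Applying (12) and then (13) with the resulting function value recovers the original x, since both formulas describe the same secant line.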

Here we will discuss polynomial interpolation, the simplest and certainly the most widely used technique for obtaining polynomial approximations. A best polynomial approximation does not give appreciably better results than an appropriate scheme of polynomial interpolation.

    Polynomial Interpolation

Several methods have been developed for finding interpolating polynomials, some of which make use of special properties such as uniform spacing of the abscissas. Also, many of these methods are useful for particular analytical or computational purposes. It is important to remember that there is only one polynomial of degree n which fits a given set of (n+1) points. Hence, the polynomials obtained by the different methods must be the same.

    Polynomial Forms

Definition: A polynomial p(x) of degree n is a function of the form

    p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n \quad (14)

with certain coefficients a_0, a_1, a_2, a_3, \ldots, a_n. This polynomial has (exact) degree n in case its leading coefficient a_n is nonzero.

The power form (14) is the standard way to specify a polynomial in mathematical discussions. It is a very convenient form for differentiating or integrating a polynomial. But in various specific contexts, other forms are more convenient.

The power form may lead to loss of significance; a remedy to this loss is the use of the shifted power form

    p(x) = a_0 + a_1(x - c) + a_2(x - c)^2 + a_3(x - c)^3 + \cdots + a_n(x - c)^n \quad (15)

where c is known as the center.


Derivation of Taylor's Series

Consider the nth-degree polynomial

    p(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 + \cdots + b_n x^n.

Suppose p(x) is known at some point x = c and we are interested in finding the value of p(x) in the neighborhood of this point. Then we can write p(x) in the shifted form

    p(x) = b_0' + b_1'(x - c) + b_2'(x - c)^2 + b_3'(x - c)^3 + b_4'(x - c)^4 + \cdots + b_n'(x - c)^n,

where c is called the center. Since this equation is true for x = c, we have p(c) = b_0'. Differentiating up to n times and evaluating each derivative at x = c determines the remaining coefficients, and we get

    p(x) = p(c) + (x - c) p'(c) + \frac{(x - c)^2}{2!} p''(c) + \frac{(x - c)^3}{3!} p'''(c) + \cdots + \frac{(x - c)^n}{n!} p^{(n)}(c), \quad (16)

which is the Taylor polynomial of degree n about x = c.

A further generalization of the shifted power form, when n data points are given, is the Newton form

    p(x) = a_0 + a_1(x - c_1) + a_2(x - c_1)(x - c_2) + a_3(x - c_1)(x - c_2)(x - c_3) + \cdots + a_n(x - c_1)(x - c_2) \cdots (x - c_n). \quad (17)

This form plays a major role in the construction of an interpolating polynomial. It reduces to the shifted power form if the centers c_1, c_2, c_3, c_4, \ldots, c_n all equal c, and to the power form if the centers all equal zero.

Evaluating equation (17) directly takes n + n(n+1)/2 additions and n(n+1)/2 multiplications. Instead,

    p(x) = a_0 + (x - c_1)\{a_1 + (x - c_2)\{a_2 + (x - c_3)\{a_3 + \cdots + a_n(x - c_n)\}\} \cdots \} \quad (18)

is the nested form, whose evaluation for any particular value of x takes 2n additions and n multiplications.
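The operation-count advantage of the nested form (18) can be seen in a short Python sketch (a minimal implementation of my own, not the handout's): working from the innermost brace outward, each level costs one subtraction, one multiplication, and one addition.

```python
def newton_eval_nested(a, c, x):
    """Evaluate a[0] + a[1](x - c[0]) + a[2](x - c[0])(x - c[1]) + ...
    in nested form: a[0] + (x - c[0]) * (a[1] + (x - c[1]) * (a[2] + ...)).
    Here c holds the centers c_1, ..., c_n (zero-indexed)."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):
        p = a[k] + (x - c[k]) * p   # one multiplication, two additions/subtractions
    return p

# all centers zero reduces to the power form: 1 + 2x + 3x^2 at x = 2 gives 17
print(newton_eval_nested([1, 2, 3], [0, 0], 2.0))  # 17.0
# a single center: 1 + 1*(x - 2) at x = 5 gives 4
print(newton_eval_nested([1, 1], [2], 5.0))        # 4.0
```

With all centers equal this is exactly Horner's rule for the shifted power form (15).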

Lagrangian Polynomials

Data where the x-values are not equispaced often occur as the result of experimental observations or when historical data are examined.

Consider a linear polynomial (the equation of a straight line passing through two distinct data points (x_0, y_0) and (x_1, y_1)):

    p(x) = \frac{x - x_1}{x_0 - x_1} y_0 + \frac{x - x_0}{x_1 - x_0} y_1.


To generalize the concept of linear interpolation, consider the construction of a polynomial of degree n. For this, let (x_0, f_0), (x_1, f_1), (x_2, f_2), (x_3, f_3), \ldots, (x_n, f_n) be (n+1) data points. Here we do not assume uniform spacing between the x-values, nor do we need the x-values arranged in any particular order. The x-values must be distinct, however. The linear polynomial passing through (x_0, f(x_0)) and (x_1, f(x_1)) is constructed by using the quotients

    l_0(x) = \frac{x - x_1}{x_0 - x_1}, \qquad l_1(x) = \frac{x - x_0}{x_1 - x_0}.

When x = x_0, l_0(x_0) = 1 and l_1(x_0) = 0. When x = x_1, l_0(x_1) = 0 and l_1(x_1) = 1.

Now consider three data points (x_0, f_0), (x_1, f_1), (x_2, f_2). Through these points we need to construct l_0(x), l_1(x) and l_2(x) with the property that

    l_0(x_0) = 1, \quad l_0(x_1) = 0, \quad l_0(x_2) = 0,
    l_1(x_0) = 0, \quad l_1(x_1) = 1, \quad l_1(x_2) = 0,
    l_2(x_0) = 0, \quad l_2(x_1) = 0, \quad l_2(x_2) = 1,

i.e., l_i(x_j) = 0 when i \ne j and l_i(x_j) = 1 when i = j. Following the pattern of the quotients of linear interpolation, we obtain

    l_{2,0}(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)}, \qquad
    l_{2,1}(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)}, \qquad
    l_{2,2}(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}.

These quotients satisfy the required conditions. Therefore, the polynomial passing through these three data points becomes

    p_2(x) = f(x_0) l_{2,0}(x) + f(x_1) l_{2,1}(x) + f(x_2) l_{2,2}(x).

For (n+1) data values we need to construct, for each k = 0, 1, 2, \ldots, n, a quotient l_{n,k}(x) with the property that l_{n,k}(x_i) = 0 when i \ne k and l_{n,k}(x_k) = 1. To satisfy l_{n,k}(x_i) = 0 for each i \ne k requires that the numerator of l_{n,k}(x) contain the term

    (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n).

To satisfy l_{n,k}(x_k) = 1, the denominator and numerator of l_{n,k}(x) must be equal when evaluated at x = x_k. Thus,

    l_{n,k}(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0)(x_k - x_1) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)},

    l_{n,k}(x) = \prod_{\substack{i=0 \\ i \ne k}}^{n} \frac{x - x_i}{x_k - x_i} \quad \text{for each } k = 0, 1, 2, 3, \ldots, n. \quad (19)


The interpolating polynomial is easily described now that the form of l_{n,k} is known. This polynomial, called the nth Lagrangian interpolating polynomial, is defined as

    p(x) = f(x_0) l_{n,0}(x) + f(x_1) l_{n,1}(x) + f(x_2) l_{n,2}(x) + \cdots + f(x_n) l_{n,n}(x) = \sum_{k=0}^{n} f(x_k) l_{n,k}(x) \quad (20)

with

    l_{n,k}(x_j) = \begin{cases} 1 & j = k \\ 0 & j \ne k \end{cases} \qquad j = 0, 1, 2, 3, \ldots, n. \quad (21)
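Equations (19) and (20) transcribe almost directly into code. A Python sketch (illustrative; the names are mine), which places no restriction on the spacing of the x-values:

```python
def lagrange_interp(xs, fs, x):
    """Evaluate the Lagrangian interpolating polynomial (20) at x,
    given distinct abscissas xs and function values fs."""
    n = len(xs)
    total = 0.0
    for k in range(n):
        # l_{n,k}(x) = product over i != k of (x - x_i) / (x_k - x_i), eq. (19)
        lk = 1.0
        for i in range(n):
            if i != k:
                lk *= (x - xs[i]) / (xs[k] - xs[i])
        total += fs[k] * lk
    return total

# unevenly spaced data taken from f(x) = x^2; the quadratic is reproduced exactly
xs, fs = [0.0, 1.0, 4.0], [0.0, 1.0, 16.0]
print(lagrange_interp(xs, fs, 2.0))  # 4.0
```

Since the polynomial through (n+1) points is unique, this must agree with any other construction (e.g. Newton's form) for the same data.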

Forward Differences

The first forward difference of a function f(x) with respect to an increment h of the independent variable x is

    \Delta f(x) = f(x + h) - f(x).

The operator \Delta always implies this operation on any function of x on which it operates. Since \Delta f(x) is a function of x, it can be operated on by the operator \Delta, giving

    \Delta^2 f(x) = f(x + 2h) - 2f(x + h) + f(x).

The function \Delta^2 f(x) is called the second forward difference of f(x). Third, fourth, and higher differences are similarly obtained. In summary, the forward difference expressions are

    \Delta f(x_i) = f(x_i + h) - f(x_i)
    \Delta^2 f(x_i) = f(x_i + 2h) - 2f(x_i + h) + f(x_i)
    \Delta^3 f(x_i) = f(x_i + 3h) - 3f(x_i + 2h) + 3f(x_i + h) - f(x_i)
    \Delta^4 f(x_i) = f(x_i + 4h) - 4f(x_i + 3h) + 6f(x_i + 2h) - 4f(x_i + h) + f(x_i)
    \vdots
    \Delta^n f(x_i) = f(x_i + nh) - n f(x_i + (n-1)h) + \frac{n(n-1)}{2!} f(x_i + (n-2)h) - \frac{n(n-1)(n-2)}{3!} f(x_i + (n-3)h) + \cdots + (-1)^n f(x_i),

in which x_i denotes any specific value of x, such as x_0, x_1, x_2, and so forth.


The nth forward difference is often written as

    \Delta^n f(x_i) = f(x_i + nh) - \binom{n}{1} f(x_i + (n-1)h) + \binom{n}{2} f(x_i + (n-2)h) - \binom{n}{3} f(x_i + (n-3)h) + \cdots + (-1)^k \binom{n}{k} f(x_i + (n-k)h) + \cdots + (-1)^n f(x_i),

where

    \binom{n}{k} = \frac{n(n-1)(n-2) \cdots (n-k+1)}{k!}

is the familiar symbol used for binomial coefficients.
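The forward differences of tabulated values are generated mechanically: each new column is the difference of adjacent entries in the previous one. A small Python sketch (mine, not the handout's):

```python
def forward_difference_table(fs):
    """Return [f, delta f, delta^2 f, ...]: each column holds the differences
    of adjacent entries of the previous column."""
    table = [list(fs)]
    while len(table[-1]) > 1:
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

# f(x) = x^2 at x = 0, 1, 2, 3: the second differences are constant (2)
# and the third differences vanish, as expected for a quadratic.
print(forward_difference_table([0, 1, 4, 9]))
# [[0, 1, 4, 9], [1, 3, 5], [2, 2], [0]]
```

The leading entry of column k is \Delta^k f(x_0), which is exactly what the Newton-Gregory forward formula later in this handout consumes.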

Backward Differences

The first backward difference of f(x) with respect to increment h is defined as

    \nabla f(x) = f(x) - f(x - h).

The operator \nabla always implies this operation on the function of x on which it operates, so that

    \nabla[\nabla f(x_i)] = \nabla^2 f(x_i) = f(x_i) - 2f(x_i - h) + f(x_i - 2h),

where x_i denotes any specific value of x, such as x_0, x_1, and so forth. In general, \nabla^n f(x_i) = \nabla[\nabla^{n-1} f(x_i)], and the backward differences of f(x_i) are

    \nabla f(x_i) = f(x_i) - f(x_i - h)
    \nabla^2 f(x_i) = f(x_i) - 2f(x_i - h) + f(x_i - 2h)
    \nabla^3 f(x_i) = f(x_i) - 3f(x_i - h) + 3f(x_i - 2h) - f(x_i - 3h)
    \nabla^4 f(x_i) = f(x_i) - 4f(x_i - h) + 6f(x_i - 2h) - 4f(x_i - 3h) + f(x_i - 4h)
    \vdots
    \nabla^n f(x_i) = f(x_i) - n f(x_i - h) + \frac{n(n-1)}{2!} f(x_i - 2h) - \frac{n(n-1)(n-2)}{3!} f(x_i - 3h) + \cdots + (-1)^k \frac{n(n-1) \cdots (n-k+1)}{k!} f(x_i - kh) + \cdots + (-1)^n \frac{n(n-1) \cdots 3 \cdot 2 \cdot 1}{n!} f(x_i - nh).

The nth backward difference, in terms of binomial coefficients, is written as

    \nabla^n f(x_i) = f(x_i) - \binom{n}{1} f(x_i - h) + \binom{n}{2} f(x_i - 2h) - \binom{n}{3} f(x_i - 3h) + \cdots + (-1)^k \binom{n}{k} f(x_i - kh) + \cdots + (-1)^n f(x_i - nh).


We may also note that

    \nabla f(x_i) = \Delta f(x_i - h), \quad \nabla^2 f(x_i) = \Delta^2 f(x_i - 2h), \quad \nabla^3 f(x_i) = \Delta^3 f(x_i - 3h), \quad \nabla^4 f(x_i) = \Delta^4 f(x_i - 4h), \quad \ldots, \quad \nabla^n f(x_i) = \Delta^n f(x_i - nh).

In general,

    \nabla^k f_s = \Delta^k f_{s-k} \qquad (k = 1, 2, 3, \ldots).

Central Differences

Introducing the central difference operator \delta, the first central difference of f(x) with respect to increment h is defined as

    \delta f(x) = f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right).

The operator \delta always implies this operation on the function of x on which it operates, so that

    \delta^2 f(x_i) = \delta[\delta f(x_i)] = \delta\left[f\left(x_i + \frac{h}{2}\right) - f\left(x_i - \frac{h}{2}\right)\right] = \Delta^2 f(x_i - h),

where x_i denotes any specific value of x, such as x_0, x_1, and so forth. In general, the central differences of f(x_i) are

    \delta f(x_i) = \Delta f\left(x_i - \frac{h}{2}\right)
    \delta^2 f(x_i) = \Delta^2 f(x_i - h)
    \delta^3 f(x_i) = \Delta^3 f\left(x_i - \frac{3h}{2}\right)
    \delta^4 f(x_i) = \Delta^4 f(x_i - 2h)
    \vdots
    \delta^n f(x_i) = \Delta^n f\left(x_i - \frac{nh}{2}\right).

    Divided Differences

There are three disadvantages of using the Lagrangian polynomial method for interpolation.


1. It involves more arithmetic operations than does the divided-difference method.

2. If we desire to add or subtract a point from the set used to construct the polynomial, we essentially have to start over in the computations.

3. Lagrangian polynomials must repeat all of the arithmetic if we interpolate at a new x-value.

The divided-difference method avoids all of this computation.

Consider a function f(x) which is known at several values of x. We do not assume that the x's are evenly spaced or even that the values are arranged in any particular order. The divided difference of f(x) for the two arguments x_k and x_j, written f[x_k, x_j], is defined as

    f[x_k, x_j] = \frac{f(x_k) - f(x_j)}{x_k - x_j} = f[x_j, x_k]

and is called the first divided difference between x_j and x_k.

The divided difference of three arguments is defined as follows:

    f[x_k, x_j, x_l] = \frac{f[x_k, x_j] - f[x_j, x_l]}{x_k - x_l} = f[x_l, x_j, x_k] = f[x_j, x_k, x_l]

is called the second-order divided difference, or divided difference of the three arguments x_k, x_j and x_l. Similarly,

    f[x_k, x_j, x_l, x_n] = \frac{f[x_k, x_j, x_l] - f[x_j, x_l, x_n]}{x_k - x_n} = f[x_l, x_j, x_k, x_n]

is the third-order divided difference. Thus, for (n+1) arguments, the divided difference is defined as

    f[x_0, x_1, x_2, x_3, \ldots, x_n] = \frac{f[x_1, x_2, x_3, \ldots, x_n] - f[x_0, x_1, x_2, \ldots, x_{n-1}]}{x_n - x_0},

which is known as the nth-order divided difference, or divided difference of (n+1) arguments.

A special standard notation used for divided differences is

    f[x_0, x_1] = f_0^{[1]}
    f[x_0, x_1, x_2] = f_0^{[2]}
    \vdots
    f[x_0, x_1, x_2, \ldots, x_n] = f_0^{[n]}


The concept is even extended to a zero-order difference:

    f[x_s] = f_s = f_s^{[0]}.

Newton's General Interpolating Polynomial

Consider the nth-degree polynomial written in a special way,

    p(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + a_3(x - x_0)(x - x_1)(x - x_2) + \cdots + a_n(x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{n-1}), \quad (22)

which is known as the Newton form. If (x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)), (x_3, f(x_3)), \ldots, (x_n, f(x_n)) are (n+1) data points and p_n(x) is an interpolating polynomial, it must match at the (n+1) data points, i.e., p_n(x_i) = f(x_i) for i = 0, 1, 2, \ldots, n. Thus, when x = x_0,

    p(x_0) = a_0, \quad \text{and} \quad p(x_0) = f(x_0)

implies a_0 = f(x_0) = f_0;

when x = x_1,

    p(x_1) = a_0 + a_1(x_1 - x_0), \quad \text{and} \quad p(x_1) = f(x_1) = f_1

implies

    a_1 = \frac{f_1 - f_0}{x_1 - x_0} = f[x_0, x_1];

when x = x_2,

    p(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1), \quad \text{and} \quad p(x_2) = f_2

implies

    a_2 = \frac{f[x_2, x_1] - f[x_1, x_0]}{x_2 - x_0} = f[x_0, x_1, x_2].

Similarly, when x = x_3,

    p(x_3) = a_0 + a_1(x_3 - x_0) + a_2(x_3 - x_0)(x_3 - x_1) + a_3(x_3 - x_0)(x_3 - x_1)(x_3 - x_2), \quad \text{and} \quad p(x_3) = f_3

implies

    a_3 = \frac{f[x_3, x_2, x_1] - f[x_2, x_1, x_0]}{x_3 - x_0} = f[x_0, x_1, x_2, x_3],

and so on. Substituting these values of a_0, a_1, a_2, a_3, \ldots, a_n in (22), we get

    p(x) = f_0 + \sum_{k=0}^{n-1} f[x_0, x_1, x_2, x_3, \ldots, x_{k+1}] (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k)

or

    p(x) = f_0 + \sum_{k=0}^{n-1} f_0^{[k+1]} (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k),


which is known as Newton's general interpolating polynomial, or Newton's form in terms of divided differences.

Now solving for f(x) yields

    f(x) = f_0 + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + f[x_0, x_1, x_2, x_3](x - x_0)(x - x_1)(x - x_2) + \cdots + f[x_0, x_1, x_2, x_3, \ldots, x_n](x - x_0)(x - x_1)(x - x_2) \cdots (x - x_{n-1}) + R(x), \quad (23)

where the remainder term, R(x), is given by

    R(x) = f[x, x_0, x_1, x_2, x_3, \ldots, x_n](x - x_0)(x - x_1)(x - x_2) \cdots (x - x_n). \quad (24)

If f(x) is a polynomial of degree n, the remainder term is zero for all x, i.e., R(x) = 0, because

    f[x, x_0, x_1, x_2, x_3, \ldots, x_n] = 0:

the nth-order divided difference f[x_0, x_1, \ldots, x_n] is then of degree zero (is a constant), so the next difference vanishes.

Omitting the remainder term in equation (23) gives

    p_n(x) = f_0 + \sum_{k=0}^{n-1} f[x_0, x_1, x_2, x_3, \ldots, x_{k+1}] (x - x_0)(x - x_1)(x - x_2) \cdots (x - x_k), \quad (25)

which is Newton's general interpolating polynomial.
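The construction above can be sketched in Python (function names are my own): the divided-difference table is built in place, its top edge supplies the coefficients a_0 = f_0, a_1 = f[x_0, x_1], \ldots of equation (25), and the polynomial is then evaluated in nested form.

```python
def newton_coefficients(xs, fs):
    """Return [f_0, f[x0,x1], f[x0,x1,x2], ...] by building the
    divided-difference table in place."""
    n = len(xs)
    coef = list(fs)
    # after pass j, coef[i] holds f[x_{i-j}, ..., x_i]; coef[j] is f[x_0, ..., x_j]
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate equation (25) in nested form."""
    p = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        p = coef[k] + (x - xs[k]) * p
    return p

xs, fs = [1.0, 2.0, 4.0], [1.0, 4.0, 16.0]   # unevenly spaced data from f(x) = x^2
c = newton_coefficients(xs, fs)
print(c)                        # [1.0, 3.0, 1.0]: p(x) = 1 + 3(x-1) + (x-1)(x-2)
print(newton_eval(xs, c, 3.0))  # 9.0
```

Adding one more data point only appends one coefficient; the existing table entries are reused, which is exactly the advantage over the Lagrangian form noted earlier.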

Error Estimation When the Function Is Unknown: The Next-Term Rule

When dealing with experimental data, the function f(x) is almost always unknown. But we can still estimate the error of interpolation: the nth-order divided difference is itself an approximation for f^{(n)}(x)/n!, which means that the error of interpolation is given approximately by the value of the next term that would be added.

This most important rule for estimating the error of interpolation is known as the next-term rule.


    Newton-Gregory Forward-Interpolating Polynomial

If the values of the function are given at evenly spaced intervals of the independent variable, i.e.,

    x_1 = x_0 + h
    x_2 = x_0 + 2h
    \vdots
    x_n = x_0 + nh,

and for any general value of x we write

    x = x_0 + sh,

where s is a real number, then

    \frac{x - x_0}{h} = s, \qquad \frac{x - x_1}{h} = s - 1, \qquad \frac{x - x_2}{h} = s - 2, \qquad \ldots, \qquad \frac{x - x_{n-1}}{h} = s - (n-1),

and

    f[x_0, x_1] = \frac{\Delta f_0}{h}
    f[x_0, x_1, x_2] = \frac{\Delta^2 f_0}{2! \, h^2}
    \vdots
    f[x_0, x_1, x_2, x_3, \ldots, x_n] = \frac{\Delta^n f_0}{n! \, h^n},

and Newton's general interpolating polynomial transforms into

    p(x) = f_0 + s \, \Delta f_0 + \frac{s(s-1)}{2!} \Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!} \Delta^3 f_0 + \cdots + \frac{s(s-1)(s-2) \cdots (s - (n-1))}{n!} \Delta^n f_0,

gives the Newton-Gregory forward-interpolating polynomial, which may be expressed more compactly by using the binomial coefficient notation; it is

    p(x) = f_0 + \binom{s}{1} \Delta f_0 + \binom{s}{2} \Delta^2 f_0 + \binom{s}{3} \Delta^3 f_0 + \cdots + \binom{s}{n} \Delta^n f_0.
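The formula combines the leading entry of each forward-difference column with the generalized binomial coefficients \binom{s}{k}. A Python sketch (mine, not the handout's), which updates \binom{s}{k} incrementally via \binom{s}{k+1} = \binom{s}{k} \cdot (s-k)/(k+1):

```python
def newton_gregory_forward(x0, h, fs, x):
    """Evaluate the Newton-Gregory forward polynomial through the evenly
    spaced values fs at x0, x0+h, x0+2h, ..."""
    # build the forward-difference columns; diffs[k][0] is delta^k f_0
    diffs, col = [list(fs)], list(fs)
    while len(col) > 1:
        col = [col[i + 1] - col[i] for i in range(len(col) - 1)]
        diffs.append(col)
    s = (x - x0) / h
    total, binom = 0.0, 1.0            # binom starts as (s choose 0) = 1
    for k in range(len(fs)):
        total += binom * diffs[k][0]   # (s choose k) * delta^k f_0
        binom *= (s - k) / (k + 1)     # advance to (s choose k+1)
    return total

# f(x) = x^2 tabulated at x = 0, 1, 2, 3; interpolate at x = 1.5
print(newton_gregory_forward(0.0, 1.0, [0.0, 1.0, 4.0, 9.0], 1.5))  # 2.25
```

For the quadratic data the third and higher differences vanish, so the series terminates after three terms, in line with the next-term rule.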


    Newton-Gregory Backward-Interpolating Polynomial

If the values of the function are given at evenly spaced intervals of the independent variable, i.e.,

    x_{n-1} = x_n - h
    x_{n-2} = x_n - 2h
    \vdots
    x_0 = x_n - nh,

and for any general value of x we write

    x = x_n - sh,

where s is a real number, then

    \frac{x - x_n}{h} = -s, \qquad \frac{x - x_{n-1}}{h} = 1 - s, \qquad \frac{x - x_{n-2}}{h} = 2 - s, \qquad \ldots, \qquad \frac{x - x_1}{h} = (n-1) - s,

and

    f[x_n, x_{n-1}] = \frac{\nabla f_n}{h}
    f[x_n, x_{n-1}, x_{n-2}] = \frac{\nabla^2 f_n}{2! \, h^2}
    \vdots
    f[x_n, x_{n-1}, x_{n-2}, \ldots, x_0] = \frac{\nabla^n f_n}{n! \, h^n},

and Newton's general interpolating polynomial transforms into

    p(x) = f_n - s \, \nabla f_n + \frac{s(s-1)}{2!} \nabla^2 f_n - \frac{s(s-1)(s-2)}{3!} \nabla^3 f_n + \cdots + (-1)^n \frac{s(s-1)(s-2) \cdots (s - (n-1))}{n!} \nabla^n f_n,

which gives the Newton-Gregory backward-interpolating polynomial. It may be expressed more compactly by using the binomial coefficient notation; it is

    p(x) = f_n - \binom{s}{1} \nabla f_n + \binom{s}{2} \nabla^2 f_n - \binom{s}{3} \nabla^3 f_n + \cdots + (-1)^n \binom{s}{n} \nabla^n f_n.


Differences Versus Divided Differences

We can relate divided differences of functional values to function differences when the x-values are evenly spaced, i.e.,

    f[x_0, x_1] = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{\Delta f(x_0)}{h}

    f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0} = \frac{\Delta^2 f(x_0)}{2 h^2}

    f[x_0, x_1, x_2, x_3] = \frac{f[x_1, x_2, x_3] - f[x_0, x_1, x_2]}{x_3 - x_0} = \frac{\Delta^3 f(x_0)}{3! \, h^3}

    \vdots

    f[x_0, x_1, x_2, \ldots, x_n] = \frac{f[x_1, x_2, \ldots, x_n] - f[x_0, x_1, \ldots, x_{n-1}]}{x_n - x_0} = \frac{\Delta^n f(x_0)}{n! \, h^n}.

Interpolating with a Cubic Spline

Often a large number of data points have to be fitted by a single smooth curve, but a Lagrangian or Newton interpolation polynomial of high order is not suitable for this purpose, because the errors of a single polynomial tend to increase drastically as its order becomes large. The oscillatory nature of high-degree polynomials, and the property that a fluctuation over a small portion of the interval can induce large fluctuations over the entire range, restrict their use when approximating functions that arise in many situations. One remedy to the problem is to fit different polynomials to subregions of f(x): divide the interval into a collection of subintervals and construct a (generally) different approximating polynomial on each subinterval. Approximation by functions of this type is called piecewise polynomial approximation.

The simplest piecewise polynomial approximation is piecewise linear interpolation, which consists of joining a set of data points

    (x_0, f_0), (x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n)

by a series of straight lines. This is the method of linear interpolation. The problem with this linear approximation is that at each of the endpoints of the subintervals there is no assurance of differentiability, which means that the interpolating function is not smooth at these points, i.e., the slope is discontinuous there. Often it is clear from the physical conditions that such a smoothness condition is required and that the approximating function must be continuously differentiable.


The most common piecewise polynomial approximation uses a cubic polynomial between each successive pair of data points; cubic spline interpolation is designed to suit this purpose. The drafting spline bends according to the laws of beam flexure so that both the slope and curvature are continuous. Our mathematical spline curve must use polynomials of degree three (or more) to match this behavior. While splines can be of any degree, cubic splines are by far the most popular.

In cubic spline interpolation, a cubic polynomial is used in each interval between two consecutive data points. One cubic polynomial has four free coefficients, so it needs four conditions. Two of them come from the requirement that the polynomial pass through the data points at the two ends of the interval. The other two are the requirements that the first and second derivatives of the polynomial be continuous across each data point.

Start with the data points

    (x_0, f_0), (x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n).

Write the equation for the cubic in the ith interval, which lies between the points (x_i, y_i) and (x_{i+1}, y_{i+1}), in the form

    g_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i. \quad (26)

Thus the cubic spline function we want is of the form

    g(x) = g_i(x) \text{ on the interval } [x_i, x_{i+1}], \text{ for } i = 0, 1, 2, 3, \ldots, n-1,

and meets the conditions

    g_i(x_i) = y_i, \quad i = 0, 1, 2, 3, \ldots, n-1, \quad \text{and} \quad g_{n-1}(x_n) = y_n; \quad (27)
    g_i(x_{i+1}) = g_{i+1}(x_{i+1}), \quad i = 0, 1, 2, 3, \ldots, n-2; \quad (28)
    g_i'(x_{i+1}) = g_{i+1}'(x_{i+1}), \quad i = 0, 1, 2, 3, \ldots, n-2; \quad (29)
    g_i''(x_{i+1}) = g_{i+1}''(x_{i+1}), \quad i = 0, 1, 2, 3, \ldots, n-2. \quad (30)

Equations (27)-(30) say that the cubic spline fits each of the points (27), is continuous (28), and is continuous in slope and curvature, (29) and (30), throughout the region spanned by the points.

Using equation (27) in equation (26) immediately gives

    d_i = y_i, \quad i = 0, 1, 2, 3, \ldots, n-1. \quad (31)

Equation (27) then gives

    y_{i+1} = a_i(x_{i+1} - x_i)^3 + b_i(x_{i+1} - x_i)^2 + c_i(x_{i+1} - x_i) + y_i = a_i h_i^3 + b_i h_i^2 + c_i h_i + y_i, \quad i = 0, 1, 2, 3, \ldots, n-1, \quad (32)


where h_i = x_{i+1} - x_i, the width of the ith interval.

To relate the slopes and curvatures of the joining splines, we differentiate equation (26):

    g_i'(x) = 3a_i(x - x_i)^2 + 2b_i(x - x_i) + c_i, \quad (33)
    g_i''(x) = 6a_i(x - x_i) + 2b_i, \quad \text{for } i = 0, 1, 2, 3, \ldots, n-1. \quad (34)

Let S_i = g_i''(x_i) for i = 0, 1, 2, 3, \ldots, n-1 and S_n = g_{n-1}''(x_n). Then

    S_i = 6a_i(x_i - x_i) + 2b_i = 2b_i;
    S_{i+1} = 6a_i(x_{i+1} - x_i) + 2b_i = 6a_i h_i + 2b_i.

Hence we can write

    b_i = \frac{S_i}{2}, \quad (35)
    a_i = \frac{S_{i+1} - S_i}{6 h_i}. \quad (36)

Substituting the relations for a_i, b_i, d_i given by equations (31), (35) and (36) into equation (26), we get

    g_i(x_{i+1}) = \frac{S_{i+1} - S_i}{6 h_i} h_i^3 + \frac{S_i}{2} h_i^2 + c_i h_i + y_i = y_{i+1},

which implies

    c_i = \frac{y_{i+1} - y_i}{h_i} - \frac{h_i S_{i+1} + 2 h_i S_i}{6}. \quad (37)

Now equation (33) gives the slope at the left end of the interval:

    y_i' = g_i'(x_i) = 3a_i(x_i - x_i)^2 + 2b_i(x_i - x_i) + c_i = c_i. \quad (38)

In the previous interval, from x_{i-1} to x_i, the equation for the cubic spline is

    g_{i-1}(x) = a_{i-1}(x - x_{i-1})^3 + b_{i-1}(x - x_{i-1})^2 + c_{i-1}(x - x_{i-1}) + d_{i-1},
    g_{i-1}'(x) = 3a_{i-1}(x - x_{i-1})^2 + 2b_{i-1}(x - x_{i-1}) + c_{i-1},
    g_{i-1}'(x_i) = 3a_{i-1}(x_i - x_{i-1})^2 + 2b_{i-1}(x_i - x_{i-1}) + c_{i-1}. \quad (39)

Using equation (29), we obtain

    y_i' = 3a_{i-1} h_{i-1}^2 + 2b_{i-1} h_{i-1} + c_{i-1}. \quad (40)


Equating equations (38) and (40), and using (35), (36) and (37), we get

    y_i' = \frac{y_{i+1} - y_i}{h_i} - \frac{h_i S_{i+1} + 2 h_i S_i}{6} = 3 \, \frac{S_i - S_{i-1}}{6 h_{i-1}} h_{i-1}^2 + 2 \, \frac{S_{i-1}}{2} h_{i-1} + \frac{y_i - y_{i-1}}{h_{i-1}} - \frac{h_{i-1} S_i + 2 h_{i-1} S_{i-1}}{6}. \quad (41)

Simplifying this equation, we get

    h_{i-1} S_{i-1} + 2(h_{i-1} + h_i) S_i + h_i S_{i+1} = 6 \left( \frac{y_{i+1} - y_i}{h_i} - \frac{y_i - y_{i-1}}{h_{i-1}} \right) = 6 \left( f[x_i, x_{i+1}] - f[x_{i-1}, x_i] \right). \quad (42)

If all of the intervals are equal in length, this simplifies to a linear difference equation with constant coefficients:

    S_{i-1} + 4 S_i + S_{i+1} = \frac{6 \, \Delta^2 f(x_{i-1})}{h^2}. \quad (43)

Equations (42) and (43) represent n-1 relations in n+1 unknowns, so we need two values of the second derivative, or two more equations involving the second derivatives at some of the points x_i. Often the end values S_0 and S_n are chosen. The conditions frequently used are:

1. S_0 = 0 and S_n = 0. This, called a natural spline, makes the end cubics approach linearity at their extremities.

2. Another frequently used condition is normally called a clamped spline. If f'(x_0) = A and f'(x_n) = B, we get

    At the left end: 2 h_0 S_0 + h_0 S_1 = 6 (f[x_0, x_1] - A);
    At the right end: h_{n-1} S_{n-1} + 2 h_{n-1} S_n = 6 (B - f[x_{n-1}, x_n]).

3. S_0 = S_1 and S_n = S_{n-1}; this is called a parabolically terminated spline.

4. Take S_0 as a linear extrapolation from S_1 and S_2, and S_n as a linear extrapolation from S_{n-1} and S_{n-2}. Only this condition gives cubic spline curves that match exactly to f(x) when f(x) is itself a cubic. We get

    At the left end: \frac{S_1 - S_0}{h_0} = \frac{S_2 - S_1}{h_1}, \quad \text{which implies} \quad S_0 = \frac{(h_0 + h_1) S_1 - h_0 S_2}{h_1}.


    At the right end: \frac{S_n - S_{n-1}}{h_{n-1}} = \frac{S_{n-1} - S_{n-2}}{h_{n-2}}, \quad \text{which implies} \quad S_n = \frac{(h_{n-1} + h_{n-2}) S_{n-1} - h_{n-1} S_{n-2}}{h_{n-2}}.

This condition can, however, give too much curvature in the end intervals.

For each end condition, the coefficient matrix of the system becomes:

Condition 1 (S_0 = 0, S_n = 0):

    \begin{bmatrix}
    2(h_0 + h_1) & h_1 & & & & \\
    h_1 & 2(h_1 + h_2) & h_2 & & & \\
    & h_2 & 2(h_2 + h_3) & h_3 & & \\
    & & h_3 & 2(h_3 + h_4) & h_4 & \\
    & & & \ddots & \ddots & \\
    & & & & h_{n-2} & 2(h_{n-2} + h_{n-1})
    \end{bmatrix}

Condition 2 (f'(x_0) = A, f'(x_n) = B):

    \begin{bmatrix}
    2h_0 & h_0 & & & & \\
    h_0 & 2(h_0 + h_1) & h_1 & & & \\
    & h_1 & 2(h_1 + h_2) & h_2 & & \\
    & & h_2 & 2(h_2 + h_3) & h_3 & \\
    & & & \ddots & \ddots & \\
    & & & & h_{n-1} & 2h_{n-1}
    \end{bmatrix}

Condition 3 (S_0 = S_1, S_n = S_{n-1}):

    \begin{bmatrix}
    3h_0 + 2h_1 & h_1 & & & & \\
    h_1 & 2(h_1 + h_2) & h_2 & & & \\
    & h_2 & 2(h_2 + h_3) & h_3 & & \\
    & & h_3 & 2(h_3 + h_4) & h_4 & \\
    & & & \ddots & \ddots & \\
    & & & & h_{n-2} & 2h_{n-2} + 3h_{n-1}
    \end{bmatrix}

Condition 4 (S_0 and S_n are linear extrapolations):


    \begin{bmatrix}
    \dfrac{(h_0 + h_1)(h_0 + 2h_1)}{h_1} & \dfrac{h_1^2 - h_0^2}{h_1} & & & \\
    h_1 & 2(h_1 + h_2) & h_2 & & \\
    & h_2 & 2(h_2 + h_3) & h_3 & \\
    & & \ddots & \ddots & \\
    & & & \dfrac{h_{n-2}^2 - h_{n-1}^2}{h_{n-2}} & \dfrac{(h_{n-1} + h_{n-2})(h_{n-1} + 2h_{n-2})}{h_{n-2}}
    \end{bmatrix}

After the S_i values are obtained, we can compute a_i, b_i, c_i, and d_i for the cubic in each interval, using

    a_i = \frac{S_{i+1} - S_i}{6 h_i}, \qquad
    b_i = \frac{S_i}{2}, \qquad
    c_i = f[x_i, x_{i+1}] - \frac{h_i S_{i+1} + 2 h_i S_i}{6}, \qquad
    d_i = y_i.
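As a sketch of the whole procedure under the natural end condition (condition 1, S_0 = S_n = 0, an assumption of this example, and the function name is mine), the following Python code solves the tridiagonal system (42) with forward elimination and back substitution, then recovers the coefficients from (35)-(37) and (31):

```python
def natural_cubic_spline(xs, ys):
    """Return [(a_i, b_i, c_i, d_i)] for the cubics (26), assuming the
    natural end condition S_0 = S_n = 0."""
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    S = [0.0] * (n + 1)                  # natural spline: S[0] = S[n] = 0
    if n > 1:
        # interior equations (42): h_{i-1}S_{i-1} + 2(h_{i-1}+h_i)S_i + h_iS_{i+1} = rhs
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # forward elimination, then back substitution (tridiagonal solve);
        # the end terms multiply S_0 = S_n = 0, so they drop out
        for i in range(1, n - 1):
            m = sub[i] / diag[i - 1]
            diag[i] -= m * sup[i - 1]
            rhs[i] -= m * rhs[i - 1]
        S[n - 1] = rhs[-1] / diag[-1]
        for i in range(n - 3, -1, -1):
            S[i + 1] = (rhs[i] - sup[i] * S[i + 2]) / diag[i]
    coeffs = []
    for i in range(n):                   # equations (36), (35), (37), (31)
        a = (S[i + 1] - S[i]) / (6.0 * h[i])
        b = S[i] / 2.0
        c = (ys[i + 1] - ys[i]) / h[i] - h[i] * (S[i + 1] + 2.0 * S[i]) / 6.0
        coeffs.append((a, b, c, ys[i]))
    return coeffs

# first-interval coefficients (a_0, b_0, c_0, d_0) for data from f(x) = x^2
print(natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0])[0])
```

For data lying on a straight line all S_i vanish and each cubic degenerates to that line, which is an easy sanity check on the implementation.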

Numerical Differentiation and Numerical Integration

Numerical integration, or numerical quadrature as it is often called, consists essentially of finding a close approximation to the area under the curve of a function f(x) which has been determined either from experimental data or from a mathematical expression.

Before discussing the general situation of quadrature formulas, we recall definitions and formulas commonly introduced in calculus courses.

Definition 1: Let f be defined on a closed interval [a, b], and let P be a partition of [a, b]. A Riemann sum of f (or f(x)) for P is any expression R_P of the form

    R_P = \sum_{k=1}^{n} f(\xi_k) \, \Delta x_k,

where \xi_k is in [x_{k-1}, x_k] and k = 1, 2, 3, \ldots, n.

Definition 2: Let f be defined on a closed interval [a, b], and let L be a real number. The statement

    \lim_{\|P\| \to 0} \sum_{k} f(\xi_k) \, \Delta x_k = L

means that for every \epsilon > 0 there is a \delta > 0 such that if P is a partition of [a, b] with \|P\| < \delta, then

    \left| \sum_{k} f(\xi_k) \, \Delta x_k - L \right| < \epsilon


for any choice of the numbers \xi_k in the subintervals [x_{k-1}, x_k] of P. The number L is a limit of (Riemann) sums.

Definition 3: Let f be defined on a closed interval [a, b]. The definite integral of f from a to b, denoted by \int_a^b f(x) \, dx, is

    \int_a^b f(x) \, dx = \lim_{\|P\| \to 0} \sum_{k} f(\xi_k) \, \Delta x_k,

provided the limit exists.

Rectangle Rules

Using the above definitions, we approximate the value of \int_a^b f(x) \, dx as a sum of areas of rectangles. In particular, if we use a regular partition (evenly spaced data points) with h = \Delta x = (b - a)/n, then x_k = x_0 + kh for k = 0, 1, 2, 3, \ldots, n, with a = x_0 and b = x_n, and

    \int_a^b f(x) \, dx \approx \sum_{k=1}^{n} f(\xi_k) \, h,

where \xi_k is any number in the kth subinterval [x_{k-1}, x_k]. Each term f(\xi_k) h in the sum is the area of a rectangle of width h and height f(\xi_k). The accuracy of such an approximation to \int_a^b f(x) \, dx by rectangles is affected by both the location of \xi_k within each subinterval and the width h of the rectangles.

By locating each \xi_k at the left-hand endpoint x_{k-1}, we obtain a left endpoint approximation. Alternately, by locating each \xi_k at the right-hand endpoint x_k, we obtain a right endpoint approximation. A third possibility is to let \xi_k be the midpoint of each subinterval, \xi_{k-1/2} = (x_{k-1} + x_k)/2. This choice of location for \xi_k gives a midpoint approximation.

Rectangle Rules: For a regular partition of an interval [a, b] with n subintervals, each of width h = (b - a)/n, the definite integral \int_a^b f(x) \, dx is approximated by

1. The left rectangle rule:

    \int_a^b f(x) \, dx \approx A_l = \sum_{k=1}^{n} f(x_{k-1}) \, h = \frac{b - a}{n} \sum_{k=1}^{n} f(x_{k-1})

2. The right rectangle rule:

    \int_a^b f(x) \, dx \approx A_r = \sum_{k=1}^{n} f(x_k) \, h = \frac{b - a}{n} \sum_{k=1}^{n} f(x_k)

    38

  • 7/23/2019 MTH 375 Handout

    40/67

    3. The midpoint rule:

    ba

    f(x)dx Am=n

    k=1

    f(k1/2)h= b a

    n

    nk=1

    f(k1/2)

    If a function is strictly increasing or strictly decreasing over the interval, then theend points rules give the areas of the inscribed and circumscribed rectangles.
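These three rules translate directly into code. The following Python sketch (our own illustration; the function names are not part of the handout) implements the left, right, and midpoint rules for a function $f$ on $[a, b]$ with $n$ subintervals:

```python
def left_rule(f, a, b, n):
    """Left rectangle rule: sample f at the left endpoint of each subinterval."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def right_rule(f, a, b, n):
    """Right rectangle rule: sample f at the right endpoint of each subinterval."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(1, n + 1))

def midpoint_rule(f, a, b, n):
    """Midpoint rule: sample f at the center of each subinterval."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))
```

For a strictly increasing integrand the left and right rules bracket the true value, as noted above, while the midpoint rule is markedly more accurate for the same $n$.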

The Trapezoidal Rule - A Composite Formula

To evaluate $\int_a^b f(x)\,dx$, subdivide the interval from $a$ to $b$ into $n$ subintervals $\Delta x$ in width. The area under the curve in each subinterval is approximated by the trapezoid formed by replacing the curve by its secant line drawn between the endpoints $x_0 < x_1 < x_2 < \cdots < x_n$ of the curve. The integral is then approximated by the sum of all the trapezoid areas. Let $h$ be the constant $\Delta x$. Since the area of a trapezoid is the sum of the area of a rectangle and the area of a triangle, for each subinterval,

$$\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \Delta x\, f(x_i) + \frac{\Delta x}{2}\big(f(x_{i+1}) - f(x_i)\big) = \frac{h}{2}\big(f(x_i) + f(x_{i+1})\big)$$

is known as the trapezoidal rule, which can also be obtained from the midpoint rule. If $f(x) \geq 0$ on the interval $[a, b]$, then to find the area under the curve $f(x)$ over $[a, b]$, represented by $\int_a^b f(x)\,dx$, subdivide $[a, b]$ into $n$ subintervals of size $h$, so that

$$A_0 = \int_{x_0}^{x_1} f(x)\,dx \approx \frac{h}{2}\big(f(x_0) + f(x_1)\big)$$
$$A_1 = \int_{x_1}^{x_2} f(x)\,dx \approx \frac{h}{2}\big(f(x_1) + f(x_2)\big)$$
$$\vdots$$
$$A_{n-1} = \int_{x_{n-1}}^{x_n} f(x)\,dx \approx \frac{h}{2}\big(f(x_{n-1}) + f(x_n)\big)$$

The total area lying between $x = a$ and $x = b$ is given by

$$A = \int_a^b f(x)\,dx \approx A_0 + A_1 + A_2 + A_3 + \cdots + A_{n-1} \qquad (44)$$

Substituting the above values in equation (44), we get

$$A = \int_a^b f(x)\,dx \approx \frac{h}{2}\big(f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\big)$$

or

$$A = \int_a^b f(x)\,dx \approx \frac{h}{2}\Big(f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\Big) \qquad (45)$$

Equation (45) is called the composite trapezoidal rule. This method, replacing a curve by straight lines, is hardly accurate unless the subintervals are very small.
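Equation (45) requires one evaluation of $f$ per node, with the interior nodes weighted twice. A minimal Python sketch (the function name is ours, not from the text):

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule, equation (45): (h/2)(f0 + 2*interior sum + fn)."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return (h / 2) * (f(a) + 2 * interior + f(b))
```

Since each panel replaces the curve by its secant line, the rule is exact for linear integrands and has error of order $h^2$ otherwise.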

Simpson's Rules

The trapezoidal rule approximates the area under a curve by summing the areas of uniform-width trapezoids formed by connecting successive points on the curve by straight lines. Simpson's rule gives a more accurate approximation by connecting successive groups of three points on the curve by second-degree parabolas, known as Simpson's 1/3 rule, and summing the areas under the parabolas to obtain the approximate area under the curve, or by connecting successive groups of four points on the curve by third-degree polynomials, known as Simpson's 3/8 rule, and summing the areas, to obtain the approximate area under the curve.

Simpson's 1/3 Rule

Consider the area contained in the two strips under the curve of $f(x)$ comprising the three data points $(x_0, y_0)$, $(x_1, y_1)$ and $(x_2, y_2)$. Approximate this area with the area under a parabola passing through these three points.

The general form of the equation of the second-degree parabola connecting the three points is

$$f(x) = ax^2 + bx + c \qquad (46)$$

The integration of equation (46) from $-\Delta x$ to $\Delta x$ gives the area contained in the two strips under the parabola. Hence,

$$A_{2\,\text{strips}} = \int_{-\Delta x}^{\Delta x} (ax^2 + bx + c)\,dx = \Big[\frac{ax^3}{3} + \frac{bx^2}{2} + cx\Big]_{-\Delta x}^{\Delta x} = \frac{2}{3}a(\Delta x)^3 + 2c(\Delta x) \qquad (47)$$

The constants $a$ and $c$ can be determined from the fact that the points $(-\Delta x, y_0)$, $(0, y_1)$ and $(\Delta x, y_2)$ must all satisfy equation (46). The substitution of these three sets of coordinates into equation (46) yields

$$a = \frac{y_0 - 2y_1 + y_2}{2(\Delta x)^2}, \qquad b = \frac{y_2 - y_0}{2\Delta x}, \qquad c = y_1 \qquad (48)$$

The substitution of the first and the third parts of equation (48) into equation (47) yields

$$A_{2\,\text{strips}} = \frac{\Delta x}{3}(y_0 + 4y_1 + y_2) \qquad (49)$$

which gives the area in terms of the three ordinates $y_0, y_1,$ and $y_2$ and the width $\Delta x$ of a single strip. This constitutes Simpson's 1/3 rule for obtaining the approximate area contained in two equal-width strips under a curve.

If the area under a curve between two values of $x$ is divided into $n$ uniform strips ($n$ even), the application of equation (49) shows that

$$A_0 = \frac{\Delta x}{3}(y_0 + 4y_1 + y_2)$$
$$A_2 = \frac{\Delta x}{3}(y_2 + 4y_3 + y_4)$$
$$A_4 = \frac{\Delta x}{3}(y_4 + 4y_5 + y_6) \qquad (50)$$
$$\vdots$$
$$A_{n-2} = \frac{\Delta x}{3}(y_{n-2} + 4y_{n-1} + y_n)$$

Summing these areas, we can write

$$\int_{x_0}^{x_n} f(x)\,dx \approx A_0 + A_2 + A_4 + \cdots + A_{n-2}$$
$$= \frac{\Delta x}{3}(y_0 + 4y_1 + 2y_2 + 4y_3 + 2y_4 + 4y_5 + 2y_6 + \cdots + 4y_{n-1} + y_n)$$
$$= \frac{\Delta x}{3}\Big(y_0 + 4\sum_{i=1,3,5}^{n-1} y_i + 2\sum_{i=2,4,6}^{n-2} y_i + y_n\Big) \qquad (51)$$

where $n$ must be an even number. Equation (51) is called the composite/extended Simpson's 1/3 rule for obtaining the approximate area under a curve. It may be used when the area is divided into an even number of strips of width $\Delta x$.
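Equation (51) can be applied directly to a table of equally spaced ordinates. An illustrative Python sketch (the function name is our own):

```python
def simpson_13(y, dx):
    """Composite Simpson's 1/3 rule, equation (51), on ordinates y0..yn (n even)."""
    n = len(y) - 1
    if n % 2 != 0:
        raise ValueError("Simpson's 1/3 rule needs an even number of strips")
    odd  = sum(y[i] for i in range(1, n, 2))   # weight 4
    even = sum(y[i] for i in range(2, n, 2))   # weight 2
    return (dx / 3) * (y[0] + 4 * odd + 2 * even + y[n])
```

Because each parabola integrates any cubic segment exactly, the composite rule reproduces the integral of a cubic to machine precision.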


Simpson's 3/8 Rule

If an odd number of strips is used, Simpson's three-eighths rule for obtaining the area contained in 3 strips under a curve (with four data points) can be used. The derivation of the three-eighths rule determines the area under a third-degree polynomial connecting four points on the given curve. The general form of the third-degree polynomial is

$$y = ax^3 + bx^2 + cx + d$$

For convenience take the points $(-\frac{3}{2}\Delta x, y_0)$, $(-\frac{1}{2}\Delta x, y_1)$, $(\frac{1}{2}\Delta x, y_2)$, $(\frac{3}{2}\Delta x, y_3)$. Therefore, the range of integration is from $-\frac{3}{2}\Delta x$ to $\frac{3}{2}\Delta x$, i.e.,

$$A_{3\,\text{strips}} = \int_{-\frac{3}{2}\Delta x}^{\frac{3}{2}\Delta x} (ax^3 + bx^2 + cx + d)\,dx = \Big[\frac{ax^4}{4} + \frac{bx^3}{3} + \frac{cx^2}{2} + dx\Big]_{-\frac{3}{2}\Delta x}^{\frac{3}{2}\Delta x} = \frac{9}{4}b(\Delta x)^3 + 3d(\Delta x) \qquad (52)$$

The constants $b$ and $d$ can be determined from the fact that the points $(-\frac{3}{2}\Delta x, y_0)$, $(-\frac{1}{2}\Delta x, y_1)$, $(\frac{1}{2}\Delta x, y_2)$, and $(\frac{3}{2}\Delta x, y_3)$ must all satisfy the cubic polynomial above. Using these four sets of coordinates, we obtain

$$b = \frac{y_0 + y_3 - y_1 - y_2}{4(\Delta x)^2} \qquad (53)$$
$$d = \frac{1}{16}(9y_1 + 9y_2 - y_0 - y_3) \qquad (54)$$

The substitution of $b$ and $d$ from equations (53) and (54) in equation (52) yields

$$A_{3\,\text{strips}} = \frac{3}{8}(\Delta x)(y_0 + 3y_1 + 3y_2 + y_3) \qquad (55)$$

which is Simpson's three-eighths rule for obtaining the approximate area contained in three equal-width strips under a curve. It gives the area in terms of the four ordinates $y_0, y_1, y_2,$ and $y_3$ and the width $\Delta x$ of a single strip.

If the area under a curve is divided into $n$ (a multiple of 3) uniform strips, then the application of equation (55) shows that

$$A_0 = \frac{3\Delta x}{8}(y_0 + 3y_1 + 3y_2 + y_3)$$
$$A_3 = \frac{3\Delta x}{8}(y_3 + 3y_4 + 3y_5 + y_6)$$
$$A_6 = \frac{3\Delta x}{8}(y_6 + 3y_7 + 3y_8 + y_9) \qquad (56)$$
$$\vdots$$
$$A_{n-3} = \frac{3\Delta x}{8}(y_{n-3} + 3y_{n-2} + 3y_{n-1} + y_n)$$

Summing these areas, we can write

$$\int_{x_0}^{x_n} f(x)\,dx \approx A_0 + A_3 + A_6 + \cdots + A_{n-3}$$
$$= \frac{3\Delta x}{8}(y_0 + 3y_1 + 3y_2 + 2y_3 + 3y_4 + 3y_5 + 2y_6 + \cdots + 2y_{n-3} + 3y_{n-2} + 3y_{n-1} + y_n) \qquad (57)$$

Equation (57) is called the composite/extended Simpson's 3/8 rule for obtaining the approximate area under a curve. This rule is applicable to a multiple of three intervals.
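Equation (57) weights interior ordinates 3, 3, 2, 3, 3, 2, and so on, with the multiples of three receiving weight 2. A short Python sketch of the composite rule (illustrative, with our own naming):

```python
def simpson_38(y, dx):
    """Composite Simpson's 3/8 rule, equation (57), on ordinates y0..yn (n a multiple of 3)."""
    n = len(y) - 1
    if n % 3 != 0:
        raise ValueError("Simpson's 3/8 rule needs a multiple of 3 strips")
    total = y[0] + y[n]
    for i in range(1, n):
        # interior ordinates at multiples of 3 get weight 2, the others weight 3
        total += (2 if i % 3 == 0 else 3) * y[i]
    return (3 * dx / 8) * total
```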

    Newton-Cotes Integration Formulas

In developing formulas for numerical integration when the data points are equispaced, the Newton-Gregory forward polynomial is a convenient starting point. The numerical integration methods that are derived by integrating the Newton-Gregory interpolating polynomials are the Newton-Cotes integration formulas; then

$$\int_a^b f(x)\,dx \approx \int_a^b P_n(x_s)\,dx.$$

This formula will not give us the exact answer because the polynomial is not identical with $f(x)$. We get an expression for the error by integrating the error term of $P_n(x_s)$.

So, starting with the Newton-Gregory forward polynomial of degree $n$,

$$P_n(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0 + \cdots + \frac{s(s-1)(s-2)\cdots(s-(n-1))}{n!}\Delta^n f_0$$


For $n = 1$ (i.e., two data points $(x_0, f_0)$, $(x_1, f_1)$),

$$P_1(x_s) = f_0 + s\,\Delta f_0$$

and

$$\int_a^b f(x)\,dx \approx \int_{x_0}^{x_1} P_1(x_s)\,dx = \int_{x_0}^{x_1} (f_0 + s\,\Delta f_0)\,dx$$

As $x = x_0 + sh$, we get $dx = h\,ds$; at $x = x_0$, $s = 0$ and at $x = x_1$, $s = 1$. This gives

$$\int_{x_0}^{x_1} (f_0 + s\,\Delta f_0)\,dx = \int_0^1 (f_0 + s\,\Delta f_0)\,h\,ds = f_0 h\,\big[s\big]_0^1 + \Delta f_0 h\,\Big[\frac{s^2}{2}\Big]_0^1 = \frac{h}{2}(f_0 + f_1)$$

using $\Delta f_0 = f_1 - f_0$. For the error estimation, we use the next-term rule, i.e., Error = (approximately) the value of the next term that would be added to $P_1(x_s)$. This implies

$$\text{Error} = \int_{x_0}^{x_1} \frac{s(s-1)}{2!}\Delta^2 f_0\,dx = \frac{\Delta^2 f_0\,h}{2}\int_0^1 (s^2 - s)\,ds = \frac{\Delta^2 f_0\,h}{2}\Big[\frac{s^3}{3} - \frac{s^2}{2}\Big]_0^1 = -\frac{h}{12}\Delta^2 f_0 = -\frac{h^3}{12}f''$$

since $\frac{\Delta^2 f_0}{h^2} \approx f''$. Taking $\xi \in (x_0, x_1)$ such that $f''(\xi)$ is the maximum value in $(x_0, x_1)$, then

$$\text{Error} = -\frac{h^3}{12}f''(\xi)$$

Therefore, the Newton-Cotes formula for $n = 1$ becomes

$$\int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12}f''(\xi), \qquad \xi \in (x_0, x_1)$$

For $n = 2$ (i.e., three data points $(x_0, f_0)$, $(x_1, f_1)$, $(x_2, f_2)$),

$$P_2(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0$$

and

$$\int_a^b f(x)\,dx \approx \int_{x_0}^{x_2} P_2(x_s)\,dx = \int_{x_0}^{x_2} \Big(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Big)\,dx$$

At $x = x_0$, $s = 0$, and at $x = x_2$, $s = 2$. This gives

$$\int_{x_0}^{x_2} \Big(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Big)\,dx = \int_0^2 \Big(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0\Big)\,h\,ds$$
$$= f_0 h\,\big[s\big]_0^2 + \Delta f_0 h\,\Big[\frac{s^2}{2}\Big]_0^2 + \frac{\Delta^2 f_0\,h}{2}\Big[\frac{s^3}{3} - \frac{s^2}{2}\Big]_0^2 = 2f_0 h + 2h\,\Delta f_0 + \frac{\Delta^2 f_0\,h}{2}\Big(\frac{8}{3} - 2\Big) = \frac{h}{3}(f_0 + 4f_1 + f_2)$$

using $\Delta f_0 = f_1 - f_0$ and $\Delta^2 f_0 = f_2 - 2f_1 + f_0$.

Using the next-term rule for the error evaluation, we get

$$\text{Error} = \int_{x_0}^{x_2} \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\,dx = \int_0^2 \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\,h\,ds = \frac{\Delta^3 f_0\,h}{6}\Big[\frac{s^4}{4} - s^3 + s^2\Big]_0^2 = 0$$

Considering the next term for the error, we get

$$\text{Error} = \int_{x_0}^{x_2} \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,dx = \int_0^2 \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,h\,ds$$
$$= \frac{\Delta^4 f_0\,h}{24}\Big[\frac{s^5}{5} - \frac{6s^4}{4} + \frac{11s^3}{3} - \frac{6s^2}{2}\Big]_0^2 = \frac{\Delta^4 f_0\,h}{24}\Big(-\frac{4}{15}\Big) = -\frac{\Delta^4 f_0\,h}{90}$$

Since $\frac{\Delta^4 f_0}{h^4} \approx f^{iv}$, taking $\xi \in (x_0, x_2)$ such that $f^{iv}(\xi)$ is the maximum value in $(x_0, x_2)$, then

$$\text{Error} = -\frac{h^5}{90}f^{iv}(\xi)$$

Therefore, the Newton-Cotes formula for $n = 2$ becomes

$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}(f_0 + 4f_1 + f_2) - \frac{h^5}{90}f^{iv}(\xi), \qquad \xi \in (x_0, x_2)$$

For $n = 3$ (four data points $(x_0, f_0)$, $(x_1, f_1)$, $(x_2, f_2)$, $(x_3, f_3)$),

$$P_3(x_s) = f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0$$

and

$$\int_a^b f(x)\,dx \approx \int_{x_0}^{x_3} P_3(x_s)\,dx = \int_0^3 \Big(f_0 + s\,\Delta f_0 + \frac{s(s-1)}{2!}\Delta^2 f_0 + \frac{s(s-1)(s-2)}{3!}\Delta^3 f_0\Big)\,h\,ds$$
$$= f_0 h\,\big[s\big]_0^3 + \Delta f_0 h\,\Big[\frac{s^2}{2}\Big]_0^3 + \frac{\Delta^2 f_0\,h}{2}\Big[\frac{s^3}{3} - \frac{s^2}{2}\Big]_0^3 + \frac{\Delta^3 f_0\,h}{6}\Big[\frac{s^4}{4} - s^3 + s^2\Big]_0^3$$
$$= 3h\Big(f_0 + \frac{3}{2}(f_1 - f_0) + \frac{3}{4}(f_2 - 2f_1 + f_0) + \frac{1}{8}(f_3 - 3f_2 + 3f_1 - f_0)\Big) = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3)$$

where $\Delta f_0 = f_1 - f_0$, $\Delta^2 f_0 = f_2 - 2f_1 + f_0$ and $\Delta^3 f_0 = f_3 - 3f_2 + 3f_1 - f_0$.

Considering the next term for the error, we obtain

$$\text{Error} = \int_{x_0}^{x_3} \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,dx = \int_0^3 \frac{s(s-1)(s-2)(s-3)}{4!}\Delta^4 f_0\,h\,ds$$
$$= \frac{\Delta^4 f_0\,h}{24}\Big[\frac{s^5}{5} - \frac{6s^4}{4} + \frac{11s^3}{3} - \frac{6s^2}{2}\Big]_0^3 = -\frac{3}{80}\Delta^4 f_0\,h$$

Taking $\xi \in (x_0, x_3)$ such that $f^{iv}(\xi)$ is the maximum value in $(x_0, x_3)$, then

$$\text{Error} = -\frac{3}{80}h^5 f^{iv}(\xi)$$

Therefore, the Newton-Cotes formula for $n = 3$ is derived as

$$\int_{x_0}^{x_3} f(x)\,dx = \frac{3h}{8}(f_0 + 3f_1 + 3f_2 + f_3) - \frac{3}{80}h^5 f^{iv}(\xi), \qquad \xi \in (x_0, x_3)$$

Since the order of the error of the 3/8 rule is the same as that of the 1/3 rule, there is no gain in accuracy in using the 3/8 rule when one has a free choice between the two rules.
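The error terms derived above predict that halving $h$ divides the trapezoidal ($n = 1$) error by about $2^2 = 4$ and the Simpson ($n = 2$) error by about $2^4 = 16$. These orders can be checked numerically; the sketch below is our own illustration, using $e^x$ on $[0, 1]$ as an assumed test integrand:

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule, error of order h^2."""
    h = (b - a) / n
    return (h / 2) * (f(a) + 2 * sum(f(a + i * h) for i in range(1, n)) + f(b))

def simpson(f, a, b, n):
    """Composite Simpson's 1/3 rule (n even), error of order h^4."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return (h / 3) * s

exact = math.e - 1   # integral of e^x over [0, 1]
et = [abs(trapezoid(math.exp, 0.0, 1.0, n) - exact) for n in (8, 16)]
es = [abs(simpson(math.exp, 0.0, 1.0, n) - exact) for n in (8, 16)]
# doubling n (halving h) shrinks et by ~4 and es by ~16
```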


    Gaussian Quadrature

Gauss observed that if we do not evaluate the function at predetermined $x$-values, a three-term formula will contain six parameters and should correspond to an interpolating polynomial of degree five. Formulas based on this principle are called Gaussian quadrature formulas. They can be applied only when $f(x)$ is known explicitly, so that it can be evaluated at any desired value of $x$. Consider the simple case of a two-term formula containing four unknown parameters:

$$\int_{-1}^{1} f(t)\,dt \approx a f(t_1) + b f(t_2)$$

A symmetrical interval of integration is used to simplify the arithmetic. This formula is valid for any polynomial of degree three; hence it will hold if $f(t) = t^3$, $f(t) = t^2$, $f(t) = t$, and $f(t) = 1$:

$$f(t) = t^3: \quad \int_{-1}^{1} t^3\,dt = 0 = at_1^3 + bt_2^3;$$
$$f(t) = t^2: \quad \int_{-1}^{1} t^2\,dt = \frac{2}{3} = at_1^2 + bt_2^2;$$
$$f(t) = t: \quad \int_{-1}^{1} t\,dt = 0 = at_1 + bt_2;$$
$$f(t) = 1: \quad \int_{-1}^{1} dt = 2 = a + b. \qquad (58)$$

Multiplying the third equation by $t_1^2$ and subtracting from the first, we get

$$0 = 0 + b(t_2^3 - t_2 t_1^2) = b\,t_2\,(t_2 - t_1)(t_2 + t_1).$$

This implies that either $b = 0$, $t_2 = 0$, $t_1 = t_2$, or $t_1 = -t_2$. Only the last of these possibilities is satisfactory; the others are invalid or else reduce our formula to only a single term, so we choose $t_1 = -t_2$. Then

$$a = b = 1, \qquad t_2 = -t_1 = \frac{1}{\sqrt{3}} = 0.5773, \qquad (59)$$

$$\int_{-1}^{1} f(t)\,dt \approx f(-0.5773) + f(0.5773),$$

i.e., adding these two values of the function gives the exact value for the integral of any cubic polynomial over the interval from $-1$ to $1$.

Consider a problem with limits of integration from $a$ to $b$, not from $-1$ to $1$ for which we derived this formula. We must change the interval of integration to $(-1, 1)$ by a change of variable, by the following scheme: let $x = At + B$. The lower limit ($x = a$, $t = -1$) gives $a = -A + B$, and the upper limit ($x = b$, $t = 1$) gives $b = A + B$. Combining these two equations we get $A = \frac{b-a}{2}$ and $B = \frac{b+a}{2}$, i.e.,

$$x = \frac{b-a}{2}t + \frac{b+a}{2}, \qquad \text{so that} \qquad dx = \frac{b-a}{2}\,dt,$$

then

$$\int_a^b f(x)\,dx = \frac{b-a}{2}\int_{-1}^{1} f\Big(\frac{b-a}{2}t + \frac{b+a}{2}\Big)\,dt.$$

Gaussian quadrature can be extended beyond two terms. The formula is then given by

$$\int_{-1}^{1} f(t)\,dt \approx \sum_{i=1}^{n} w_i f(t_i), \qquad \text{for } n \text{ points.}$$

This formula is exact for functions $f(t)$ that are polynomials of degree $2n - 1$ or less.

Moreover, by extending the method we used previously for the 2-point formula, for each $n$ we obtain a system of $2n$ equations:

$$w_1 t_1^k + w_2 t_2^k + w_3 t_3^k + \cdots + w_n t_n^k = \begin{cases} 0, & \text{for } k = 1, 3, 5, \ldots, 2n-1; \\[4pt] \dfrac{2}{k+1}, & \text{for } k = 0, 2, 4, \ldots, 2n-2. \end{cases}$$
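The two-point formula together with the change of interval fits in a few lines of Python (a sketch of our own; the function name is not from the text):

```python
import math

def gauss2(f, a, b):
    """Two-point Gaussian quadrature on [a, b] via the map x = (b-a)/2 * t + (b+a)/2."""
    t = 1.0 / math.sqrt(3)        # nodes +/- 1/sqrt(3), weights a = b = 1
    half = (b - a) / 2
    mid = (b + a) / 2
    return half * (f(mid - half * t) + f(mid + half * t))
```

With only two function evaluations, the result is exact for any cubic integrand, as the derivation above guarantees.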

    Adaptive Quadrature

The composite quadrature rules necessitate the use of equally spaced points. It is useful to introduce a method that adjusts the step size to be smaller over portions of the curve where a larger functional variation occurs. This technique is called adaptive quadrature. The method is based on Simpson's rule.

Simpson's rule uses two subintervals over $[a_k, b_k]$:

$$\int_{a_k}^{b_k} f(x)\,dx \approx \frac{h}{3}\big[f(a_k) + 4f(c_k) + f(b_k)\big] = S(a_k, b_k), \text{ say,} \qquad (60)$$

where $c_k = \frac{1}{2}(a_k + b_k)$ is the center of $[a_k, b_k]$ and $h = \frac{b_k - a_k}{2}$. Furthermore, if $f \in C^4[a_k, b_k]$, then there exists a value $\xi_1 \in [a_k, b_k]$ so that

$$\int_{a_k}^{b_k} f(x)\,dx = S(a_k, b_k) - \frac{h^5}{90}f^{(4)}(\xi_1) \qquad (61)$$

A composite Simpson rule using four subintervals of $[a_k, b_k]$ can be performed by bisecting this interval into two equal subintervals $[a_{k1}, b_{k1}]$ and $[a_{k2}, b_{k2}]$ and applying formula (60) recursively over each piece. Only two additional evaluations of $f(x)$ are needed, and the result is

$$S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) = \frac{h}{6}\big[f(a_{k1}) + 4f(c_{k1}) + f(b_{k1})\big] + \frac{h}{6}\big[f(a_{k2}) + 4f(c_{k2}) + f(b_{k2})\big] \qquad (62)$$

where $a_{k1} = a_k$, $b_{k1} = a_{k2} = c_k$, $b_{k2} = b_k$, $c_{k1}$ is the midpoint of $[a_{k1}, b_{k1}]$ and $c_{k2}$ is the midpoint of $[a_{k2}, b_{k2}]$. In formula (62) the step size is $\frac{h}{2}$, which accounts for the factors $\frac{h}{6}$ on the right side of the equation. Furthermore, if $f \in C^4[a_k, b_k]$, there exists a value $\xi_2 \in [a_k, b_k]$ so that

$$\int_{a_k}^{b_k} f(x)\,dx = S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16}\,\frac{f^{(4)}(\xi_2)}{90} \qquad (63)$$

Assume that $f^{(4)}(\xi_1) \approx f^{(4)}(\xi_2)$; then the right sides of equations (61) and (63) are used to obtain the relation

$$S(a_k, b_k) - \frac{h^5}{90}f^{(4)}(\xi_2) \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16}\,\frac{f^{(4)}(\xi_2)}{90} \qquad (64)$$

which can be written as

$$\frac{h^5}{90}f^{(4)}(\xi_2) \approx \frac{16}{15}\big[S(a_k, b_k) - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\big] \qquad (65)$$

Then (65) is substituted in (63) to obtain the error estimate:

$$\Big|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\Big| \approx \frac{1}{15}\big|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\big| \qquad (66)$$
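A recursive implementation along these lines (an illustrative sketch, not the handout's own code) accepts the two-panel estimate whenever the bound (66) is within tolerance, and otherwise bisects further with the tolerance split between the halves:

```python
def adaptive_simpson(f, a, b, tol=1e-8):
    """Recursive adaptive Simpson quadrature using the 1/15 error estimate of (66)."""
    def S(a, b):
        # single Simpson panel, formula (60)
        c = (a + b) / 2
        h = (b - a) / 2
        return (h / 3) * (f(a) + 4 * f(c) + f(b))

    def refine(a, b, whole, tol):
        c = (a + b) / 2
        left, right = S(a, c), S(c, b)
        # |S1 + S2 - S| < 15*tol means the true error is roughly below tol, by (66)
        if abs(left + right - whole) < 15 * tol:
            return left + right
        return refine(a, c, left, tol / 2) + refine(c, b, right, tol / 2)

    return refine(a, b, S(a, b), tol)
```

Intervals where the integrand is nearly straight are accepted immediately, so the function evaluations concentrate where $f$ varies rapidly.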

    Numerical Integration using Cubic Splines

Cubic splines can be used for finding derivatives and integrals of functions, even when the function is known only as a table of values. The smoothness of splines, because of the requirement that each portion have the same first and second derivatives as its neighbor where they join, can give improved accuracy in some cases.

For the cubic spline that approximates $f(x)$, we can write, for the interval $x_i \leq x \leq x_{i+1}$,

$$f(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i,$$

where the coefficients, determined as in the section on cubic splines, are

$$a_i = \frac{S_{i+1} - S_i}{6(x_{i+1} - x_i)},$$
$$b_i = \frac{S_i}{2},$$
$$c_i = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} - \frac{(x_{i+1} - x_i)S_{i+1} + 2(x_{i+1} - x_i)S_i}{6},$$
$$d_i = f(x_i).$$

Approximating the integral of $f(x)$ over the $n$ intervals where $f(x)$ is approximated by the spline is straightforward:

$$\int_{x_0}^{x_n} f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}} f(x)\,dx$$
$$= \sum_{i=0}^{n-1}\Big[\frac{a_i}{4}(x - x_i)^4 + \frac{b_i}{3}(x - x_i)^3 + \frac{c_i}{2}(x - x_i)^2 + d_i(x - x_i)\Big]_{x_i}^{x_{i+1}}$$
$$= \sum_{i=0}^{n-1}\Big[\frac{a_i}{4}(x_{i+1} - x_i)^4 + \frac{b_i}{3}(x_{i+1} - x_i)^3 + \frac{c_i}{2}(x_{i+1} - x_i)^2 + d_i(x_{i+1} - x_i)\Big].$$

If the intervals are all of the same size ($h = x_{i+1} - x_i$), this equation becomes

$$\int_{x_0}^{x_n} f(x)\,dx = \frac{h^4}{4}\sum_{i=0}^{n-1} a_i + \frac{h^3}{3}\sum_{i=0}^{n-1} b_i + \frac{h^2}{2}\sum_{i=0}^{n-1} c_i + h\sum_{i=0}^{n-1} d_i.$$
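Once the coefficient lists $a_i$, $b_i$, $c_i$, $d_i$ have been computed as above, the equal-interval formula is a single expression. A minimal sketch (our own illustration, assuming the coefficients are supplied as lists):

```python
def spline_integral(a, b, c, d, h):
    """Integral of a cubic spline with equal-width intervals:
    (h^4/4)*sum(a_i) + (h^3/3)*sum(b_i) + (h^2/2)*sum(c_i) + h*sum(d_i)."""
    return (h**4 / 4) * sum(a) + (h**3 / 3) * sum(b) + (h**2 / 2) * sum(c) + h * sum(d)
```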

    Numerical Differentiation

For numerical differentiation we shall use the Taylor series for a function $y = f(x)$ at $(x_i + h)$ expanded about $x_i$:

$$f(x_i + h) = f(x_i) + h f'(x_i) + \frac{h^2}{2!}f''(x_i) + \frac{h^3}{3!}f'''(x_i) + \cdots \qquad (67)$$

where $h = \Delta x$, $f(x_i)$ is the ordinate corresponding to $x_i$, and $(x_i + h)$ is in the region of convergence. The function at $(x_i - h)$ is similarly given by

$$f(x_i - h) = f(x_i) - h f'(x_i) + \frac{h^2}{2!}f''(x_i) - \frac{h^3}{3!}f'''(x_i) + \cdots \qquad (68)$$

Subtracting equation (68) from equation (67), we obtain

$$f'(x_i) = \frac{f(x_i + h) - f(x_i - h)}{2h} - \frac{h^2}{6}f'''(x_i) + \cdots \qquad (69)$$

If we designate equally spaced points to the right of $x_i$ as $x_{i+1}, x_{i+2}$, and so on, and those to the left of $x_i$ as $x_{i-1}, x_{i-2}$, and identify the corresponding ordinates as $f_{i+1}, f_{i+2}, f_{i-1}, f_{i-2}$, respectively, equation (69) can be written in the form

$$f'_i \approx \frac{f_{i+1} - f_{i-1}}{2h} \qquad (70)$$

Equation (70) is called the central-difference approximation of $f'(x)$ at $x_i$, with error of order $h^2$.

If we add equations (67) and (68), we may write the second derivative as

$$f''_i = \frac{f_{i+1} - 2f_i + f_{i-1}}{h^2} - \frac{1}{12}f^{iv}_i h^2 + \cdots \qquad (71)$$

The approximate expression using only the first term to the right of the equal sign is the central-difference approximation of the second derivative of the function at $x_i$, with error of order $h^2$. As with the first derivative, approximations with error of higher order can also be derived.

To obtain an expression for the third derivative, we expand the function at $x_{i-2}$ and at $x_{i+2}$ such that

$$f_{i+2} = f_i + 2h f'_i + \frac{(2h)^2}{2!}f''_i + \frac{(2h)^3}{3!}f'''_i + \cdots \qquad (72)$$

$$f_{i-2} = f_i - 2h f'_i + \frac{(2h)^2}{2!}f''_i - \frac{(2h)^3}{3!}f'''_i + \cdots \qquad (73)$$

With the help of equations (67), (68), (72), and (73) we derive

$$f'''_i \approx \frac{f_{i+2} - 2f_{i+1} + 2f_{i-1} - f_{i-2}}{2h^3} \qquad (74)$$

Equation (74) gives the central-difference approximation for the third derivative of the function $f(x)$ at $x_i$, with error of order $h^2$.

Successively higher derivatives can be obtained by this method, but since they require the solution of an increasingly larger number of simultaneous equations, the process becomes quite tedious. The same technique may also be used to find more accurate expressions for the derivatives, by using additional terms in the Taylor-series expansion.

It can be noted that the central-difference expressions for the various derivatives involve values of the function on both sides of the $x$ value at which the derivative of the function is desired. Using Taylor-series expansions we can also obtain expressions for the derivatives which are entirely in terms of values of the function at $x_i$ and points to the right of $x_i$. These are known as forward-finite-difference expressions. In a similar manner, derivative expressions which are entirely in terms of values of the function at $x_i$ and points to the left of $x_i$ can be found. These are known as backward-finite-difference expressions. In numerical differentiation, forward-finite-difference expressions are used when data to the left of a point at which a derivative is desired are not available, and backward-difference expressions are used when data to the right of the desired point are not available. Central-difference expressions, however, are more accurate than either forward- or backward-difference expressions.
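Equations (70), (71), and (74) can be checked with a few lines of Python (the function names are our own illustration):

```python
def d1_central(f, x, h):
    """First derivative, equation (70): O(h^2) central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2_central(f, x, h):
    """Second derivative, equation (71): O(h^2) central difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def d3_central(f, x, h):
    """Third derivative, equation (74): O(h^2) central difference."""
    return (f(x + 2 * h) - 2 * f(x + h) + 2 * f(x - h) - f(x - 2 * h)) / (2 * h**3)
```

In practice $h$ should not be taken arbitrarily small: below a certain point the truncation error keeps shrinking but floating-point cancellation in the numerator grows, especially for the higher derivatives.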
