03 mathematics


Copyright © 2008, 1997, 1984, 1973, 1963, 1950, 1941, 1934 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-154210-8

The material in this eBook also appears in the print version of this title: 0-07-151126-1.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at (212) 904-4069.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071511261


MATHEMATICS
General . . . 3-3
Miscellaneous Mathematical Constants . . . 3-4
The Real-Number System . . . 3-4
Algebraic Inequalities . . . 3-5

MENSURATION FORMULAS
Plane Geometric Figures with Straight Boundaries . . . 3-6
Plane Geometric Figures with Curved Boundaries . . . 3-6
Solid Geometric Figures with Plane Boundaries . . . 3-7
Solids Bounded by Curved Surfaces . . . 3-7
Miscellaneous Formulas . . . 3-8
Irregular Areas and Volumes . . . 3-8

ELEMENTARY ALGEBRA
Operations on Algebraic Expressions . . . 3-8
The Binomial Theorem . . . 3-9
Progressions . . . 3-9
Permutations, Combinations, and Probability . . . 3-10
Theory of Equations . . . 3-10

ANALYTIC GEOMETRY
Plane Analytic Geometry . . . 3-11
Solid Analytic Geometry . . . 3-13

PLANE TRIGONOMETRY
Angles . . . 3-16
Functions of Circular Trigonometry . . . 3-16
Inverse Trigonometric Functions . . . 3-17
Relations between Angles and Sides of Triangles . . . 3-17
Hyperbolic Trigonometry . . . 3-18
Approximations for Trigonometric Functions . . . 3-18

DIFFERENTIAL AND INTEGRAL CALCULUS
Differential Calculus . . . 3-18
Multivariable Calculus Applied to Thermodynamics . . . 3-21
Integral Calculus . . . 3-22

INFINITE SERIES
Definitions . . . 3-25
Operations with Infinite Series . . . 3-25
Tests for Convergence and Divergence . . . 3-26
Series Summation and Identities . . . 3-26

COMPLEX VARIABLES
Algebra . . . 3-27
Special Operations . . . 3-27
Trigonometric Representation . . . 3-27
Powers and Roots . . . 3-27
Elementary Complex Functions . . . 3-27
Complex Functions (Analytic) . . . 3-28

DIFFERENTIAL EQUATIONS
Ordinary Differential Equations . . . 3-29
Ordinary Differential Equations of the First Order . . . 3-30
Ordinary Differential Equations of Higher Order . . . 3-30
Special Differential Equations . . . 3-31
Partial Differential Equations . . . 3-32

DIFFERENCE EQUATIONS
Elements of the Calculus of Finite Differences . . . 3-34
Difference Equations . . . 3-34

INTEGRAL EQUATIONS
Classification of Integral Equations . . . 3-36
Relation to Differential Equations . . . 3-36
Methods of Solution . . . 3-37

INTEGRAL TRANSFORMS (OPERATIONAL METHODS)
Laplace Transform . . . 3-37
Convolution Integral . . . 3-39
z-Transform . . . 3-39
Fourier Transform . . . 3-39
Fourier Cosine Transform . . . 3-39


Section 3

Mathematics

Bruce A. Finlayson, Ph.D. Rehnberg Professor, Department of Chemical Engineering, University of Washington; Member, National Academy of Engineering (Section Editor, numerical methods and all general material)

Lorenz T. Biegler, Ph.D. Bayer Professor of Chemical Engineering, Carnegie Mellon University (Optimization)


MATRIX ALGEBRA AND MATRIX COMPUTATIONS
Matrix Algebra . . . 3-40
Matrix Computations . . . 3-41

NUMERICAL APPROXIMATIONS TO SOME EXPRESSIONS
Approximation Identities . . . 3-43

NUMERICAL ANALYSIS AND APPROXIMATE METHODS
Introduction . . . 3-43
Numerical Solution of Linear Equations . . . 3-44
Numerical Solution of Nonlinear Equations in One Variable . . . 3-44
Methods for Multiple Nonlinear Equations . . . 3-44
Interpolation and Finite Differences . . . 3-45
Numerical Differentiation . . . 3-47
Numerical Integration (Quadrature) . . . 3-47
Numerical Solution of Ordinary Differential Equations as Initial Value Problems . . . 3-48
Ordinary Differential Equations: Boundary Value Problems . . . 3-51
Numerical Solution of Integral Equations . . . 3-54
Monte Carlo Simulations . . . 3-54
Numerical Solution of Partial Differential Equations . . . 3-54
Fast Fourier Transform . . . 3-59

OPTIMIZATION
Introduction . . . 3-60
Gradient-Based Nonlinear Programming . . . 3-60
Optimization Methods without Derivatives . . . 3-65
Global Optimization . . . 3-66
Mixed Integer Programming . . . 3-67
Development of Optimization Models . . . 3-70

STATISTICS
Introduction . . . 3-70
Enumeration Data and Probability Distributions . . . 3-72
Measurement Data and Sampling Densities . . . 3-73
Tests of Hypothesis . . . 3-78
Least Squares . . . 3-84
Error Analysis of Experiments . . . 3-86
Factorial Design of Experiments and Analysis of Variance . . . 3-86

DIMENSIONAL ANALYSIS

PROCESS SIMULATION
Classification . . . 3-89
Thermodynamics . . . 3-89
Process Modules or Blocks . . . 3-89
Process Topology . . . 3-90
Commercial Packages . . . 3-90


GENERAL

The basic problems of the sciences and engineering fall broadly into three categories:

1. Steady state problems. In such problems the configuration of the system is to be determined. This solution does not change with time but continues indefinitely in the same pattern, hence the name “steady state.” Typical chemical engineering examples include steady temperature distributions in heat conduction, equilibrium in chemical reactions, and steady diffusion problems.

2. Eigenvalue problems. These are extensions of equilibrium problems in which critical values of certain parameters are to be determined in addition to the corresponding steady-state configurations. The determination of eigenvalues may also arise in propagation problems and stability problems. Typical chemical engineering problems include those in heat transfer and resonance in which certain boundary conditions are prescribed.

3. Propagation problems. These problems are concerned with predicting the subsequent behavior of a system from a knowledge of the initial state. For this reason they are often called transient (time-varying) or unsteady-state problems. Chemical engineering examples include the transient state of chemical reactions (kinetics), the propagation of pressure waves in a fluid, the transient behavior of an adsorption column, and the rate of approach to equilibrium of a packed distillation column.

The mathematical treatment of engineering problems involves four basic steps:

1. Formulation. The expression of the problem in mathematical language. That translation is based on the appropriate physical laws governing the process.

2. Solution. Appropriate mathematical and numerical operations are accomplished so that logical deductions may be drawn from the mathematical model.

3. Interpretation. Development of relations between the mathematical results and their meaning in the physical world.

4. Refinement. The recycling of the procedure to obtain better predictions as indicated by experimental checks.

Steps 1 and 2 are of primary interest here. The actual details are left to the various subsections, and only general approaches will be discussed.

The formulation step may result in algebraic equations, difference equations, differential equations, integral equations, or combinations of these. In any event these mathematical models usually arise from statements of physical laws such as the laws of mass and energy conservation in the form

Input of x − output of x + production of x = accumulation of x

or

Rate of input of x − rate of output of x + rate of production of x = rate of accumulation of x

where x = mass, energy, etc. These statements may be abbreviated by the statement

Input − output + production = accumulation
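As a minimal illustration of the balance, the sketch below integrates it for a hypothetical well-mixed tank of constant volume V, with volumetric flow q, inlet concentration c_in, and a first-order consumption reaction (rate constant k) supplying the production term. The model, the parameter values, and the function names are assumptions chosen for illustration; explicit Euler stepping is used only because it is the simplest integrator.

```python
# Minimal sketch: solute balance on a hypothetical well-mixed tank of constant
# volume V with flow q, inlet concentration c_in, and first-order consumption
# (rate constant k). Accumulation = input - output + production.


def simulate_tank(c0, c_in, q, V, k, t_end, dt):
    """Explicit Euler integration of d(cV)/dt = q*c_in - q*c - k*c*V."""
    c = c0
    for _ in range(int(round(t_end / dt))):
        rate_in = q * c_in          # rate of input of solute
        rate_out = q * c            # rate of output of solute
        rate_prod = -k * c * V      # rate of production (negative: consumed)
        accumulation = rate_in - rate_out + rate_prod
        c += dt * accumulation / V  # V is constant, so dc/dt = accumulation / V
    return c


def steady_state(c_in, q, V, k):
    """Zero accumulation: q*c_in - q*c - k*c*V = 0, solved for c."""
    return q * c_in / (q + k * V)


c_final = simulate_tank(c0=0.0, c_in=1.0, q=0.1, V=1.0, k=0.1, t_end=200.0, dt=0.01)
print(c_final, steady_state(c_in=1.0, q=0.1, V=1.0, k=0.1))
```

Setting the accumulation term to zero gives the steady-state value directly (0.5 for these illustrative numbers); the time-stepping loop treats the same balance as a propagation problem and approaches that value.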

Many general laws of the physical universe are expressible by differential equations. Specific phenomena are then singled out from the infinity of solutions of these equations by assigning the individual initial or boundary conditions which characterize the given problem. For steady state or boundary-value problems (Fig. 3-1) the solution must satisfy the differential equation inside the region and the prescribed conditions on the boundary.

In mathematical language, the propagation problem is known as an initial-value problem (Fig. 3-2). Schematically, the problem is characterized by a differential equation plus an open region in which the equation holds. The solution of the differential equation must satisfy the initial conditions plus any “side” boundary conditions.

The description of phenomena in a “continuous” medium such as a gas or a fluid often leads to partial differential equations. In particular, phenomena of “wave” propagation are described by a class of partial differential equations called “hyperbolic,” and these are essentially different in their properties from other classes such as those that describe equilibrium (“elliptic”) or diffusion and heat transfer (“parabolic”). Prototypes are:

1. Elliptic. Laplace’s equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0$$

Poisson’s equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = g(x, y)$$

These do not contain the variable t (time) explicitly; accordingly, their solutions represent equilibrium configurations. Laplace’s equation corresponds to a “natural” equilibrium, while Poisson’s equation corresponds to an equilibrium under the influence of g(x, y). Steady heat-transfer and mass-transfer problems are elliptic.

2. Parabolic. The heat equation

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$

describes unsteady or propagation states of diffusion as well as heat transfer.

3. Hyperbolic. The wave equation

$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$$

describes wave propagation of all types when the assumption is made that the wave amplitude is small and that interactions are linear.
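For a second-order equation in two independent variables, A u_xx + B u_xy + C u_yy + (lower-order terms) = 0, the class is commonly read from the sign of B² − 4AC. The sketch below assumes constant coefficients and, because it handles only two independent variables, takes the heat and wave equations in one space dimension (x, t); the function name is hypothetical.

```python
def classify_second_order(A, B, C):
    """Classify A*u_xx + B*u_xy + C*u_yy + (lower-order terms) = 0.

    Uses the sign of the discriminant B**2 - 4*A*C; coefficients are assumed
    constant, so the class is the same everywhere in the region.
    """
    disc = B * B - 4 * A * C
    if disc < 0:
        return "elliptic"
    if disc == 0:
        return "parabolic"
    return "hyperbolic"


# Laplace: u_xx + u_yy = 0
print(classify_second_order(1, 0, 1))    # elliptic
# Heat, one space dimension, with y = t: u_xx - u_t = 0 (no u_tt term)
print(classify_second_order(1, 0, 0))    # parabolic
# Wave, one space dimension, with y = t: u_xx - u_tt = 0
print(classify_second_order(1, 0, -1))   # hyperbolic
```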


FIG. 3-1 Boundary conditions.

FIG. 3-2 Propagation problem.



The solution phase has been characterized in the past by a concentration on methods to obtain analytic solutions to the mathematical equations. These efforts have been most fruitful in the area of linear equations such as those just given. However, many natural phenomena are nonlinear. While a few nonlinear problems can be solved analytically, most cannot. In those cases, numerical methods are used. Because software implementing these methods is widely available, the engineer has quite good tools at hand.

Numerical methods almost never fail to provide an answer to any particular situation, but they can never furnish a general solution of any problem.

The mathematical details outlined here include both analytic and numerical techniques useful in obtaining solutions to problems.

Our discussion to this point has been confined to those areas in which the governing laws are well known. However, in many areas, information on the governing laws is lacking and statistical methods are used. Broadly speaking, statistical methods may be of use whenever conclusions are to be drawn or decisions made on the basis of experimental evidence. Since statistics could be defined as the technology of the scientific method, it is primarily concerned with the first two aspects of the method, namely, the performance of experiments and the drawing of conclusions from experiments. Traditionally the field is divided into two areas:

1. Design of experiments. When conclusions are to be drawn or decisions made on the basis of experimental evidence, statistical techniques are most useful when experimental data are subject to errors. First, the design of experiments may then often be carried out in such a fashion as to avoid some of the sources of experimental error and make the necessary allowances for that portion which is unavoidable. Second, the results can be presented in terms of probability statements which express the reliability of the results. Third, a statistical approach frequently forces a more thorough evaluation of the experimental aims and leads to a more definitive experiment than would otherwise have been performed.

2. Statistical inference. The broad problem of statistical inference is to provide measures of the uncertainty of conclusions drawn from experimental data. This area uses the theory of probability, enabling scientists to assess the reliability of their conclusions in terms of probability statements.

Both of these areas, the mathematical and the statistical, are intimately intertwined when applied to any given situation. The methods of one are often combined with the other. To be used successfully, both must result in a numerical answer to a problem; that is, they constitute the means to an end. Increasingly, the numerical answer is being obtained from the mathematics with the aid of computers. The mathematical notation is given in Table 3-1.

MISCELLANEOUS MATHEMATICAL CONSTANTS

Numerical values of the constants that follow are approximate to the number of significant digits given.

π = 3.1415926536    Pi
e = 2.7182818285    Napierian (natural) logarithm base
γ = 0.5772156649    Euler’s constant

ln π = 1.1447298858    Napierian (natural) logarithm of pi, base e
log π = 0.4971498727    Briggsian (common) logarithm of pi, base 10

Radian = 57.2957795131°
Degree = 0.0174532925 rad
Minute = 0.0002908882 rad
Second = 0.0000048481 rad

$$\gamma = \lim_{n \to \infty} \left( \sum_{m=1}^{n} \frac{1}{m} - \ln n \right) = 0.577215\ldots$$
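The tabulated values can be spot-checked with Python's standard math module, and the limit defining γ can be approached with partial sums. This is a sketch only: the difference between the partial harmonic sum and ln n converges slowly (the error behaves roughly like 1/(2n)), so n = 10⁶ gives about six correct digits. The function name euler_gamma_estimate is illustrative.

```python
import math


def euler_gamma_estimate(n):
    """Partial-sum estimate of Euler's constant: sum_{m=1}^{n} 1/m - ln n."""
    harmonic = math.fsum(1.0 / m for m in range(1, n + 1))
    return harmonic - math.log(n)


# Tabulated constants reproduced from the math module
print(f"{math.pi:.10f}")                   # 3.1415926536
print(f"{math.e:.10f}")                    # 2.7182818285
print(f"{math.log(math.pi):.10f}")         # 1.1447298858
print(f"{math.log10(math.pi):.10f}")       # 0.4971498727
print(f"{math.degrees(1.0):.10f}")         # 57.2957795131 (degrees in one radian)
print(f"{math.radians(1.0):.10f}")         # 0.0174532925 (one degree in radians)
print(f"{math.radians(1.0 / 60):.10f}")    # 0.0002908882 (one minute)
print(f"{math.radians(1.0 / 3600):.10f}")  # 0.0000048481 (one second)

for n in (10, 1000, 10**6):
    print(n, euler_gamma_estimate(n))      # decreases toward 0.5772156649
```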

THE REAL-NUMBER SYSTEM

The natural numbers, or counting numbers, are the positive integers: 1, 2, 3, 4, 5, . . . . The negative integers are −1, −2, −3, . . . .

A number in the form a/b, where a and b are integers, b ≠ 0, is a rational number. A real number that cannot be written as the quotient of two integers is called an irrational number, e.g., √2, √3, √5, π, e, ∛2.

TABLE 3-1 Mathematical Signs, Symbols, and Abbreviations

± (∓)    plus or minus (minus or plus)
:    divided by, ratio sign
::    proportional sign
<    less than
≮    not less than
>    greater than
≯    not greater than
≅    approximately equals, congruent
∼    similar to
≡    equivalent to
≠    not equal to
≐    approaches, is approximately equal to
∝    varies as
∞    infinity
∴    therefore
√    square root
∛    cube root
ⁿ√    nth root
∠    angle
⊥    perpendicular to
∥    parallel to
|x|    numerical value of x
log or log10    common logarithm or Briggsian logarithm
loge or ln    natural logarithm or hyperbolic logarithm or Napierian logarithm
e    base (2.718) of natural system of logarithms
a°    an angle a degrees
a′    a prime, an angle a minutes
a″    a double prime, an angle a seconds, a second
sin    sine
cos    cosine
tan    tangent
ctn or cot    cotangent
sec    secant
csc    cosecant
vers    versed sine
covers    coversed sine
exsec    exsecant
sin⁻¹    anti sine or angle whose sine is
sinh    hyperbolic sine
cosh    hyperbolic cosine
tanh    hyperbolic tangent
sinh⁻¹    anti hyperbolic sine or angle whose hyperbolic sine is
f(x) or φ(x)    function of x
Δx    increment of x
Σ    summation of
dx    differential of x
dy/dx or y′    derivative of y with respect to x
d²y/dx² or y″    second derivative of y with respect to x
dⁿy/dxⁿ    nth derivative of y with respect to x
∂y/∂x    partial derivative of y with respect to x
∂ⁿy/∂xⁿ    nth partial derivative of y with respect to x
∂ⁿz/∂x∂y    nth partial derivative with respect to x and y
∫    integral of
∫ₐᵇ    integral between the limits a and b
ẏ    first derivative of y with respect to time
ÿ    second derivative of y with respect to time
Δ or ∇²    the “Laplacian,” ∂²/∂x² + ∂²/∂y² + ∂²/∂z²
δ    sign of a variation
∮    sign for integration around a closed path


There is a one-to-one correspondence between the set of real numbers and the set of points on an infinite line (coordinate line).

Order among Real Numbers; Inequalities
a > b means that a − b is a positive real number.
If a < b and b < c, then a < c.
If a < b, then a + c < b + c for any real number c.
If a < b and c > 0, then ac < bc.
If a < b and c < 0, then ac > bc.
If a < b and c < d, then a + c < b + d.
If 0 < a < b and 0 < c < d, then ac < bd.
If a < b and ab > 0, then 1/a > 1/b.
If a < b and ab < 0, then 1/a < 1/b.

Absolute Value For any real number x,

$$|x| = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$$

Properties
If |x| = a, where a > 0, then x = a or x = −a.
|x| = |−x|; −|x| ≤ x ≤ |x|; |xy| = |x| |y|.
If |x| < c, then −c < x < c, where c > 0.
||x| − |y|| ≤ |x + y| ≤ |x| + |y|.
√(x²) = |x|.

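Proportions If a/b = c/d, then

$$\frac{a + b}{b} = \frac{c + d}{d}, \qquad \frac{a - b}{b} = \frac{c - d}{d}, \qquad \frac{a - b}{a + b} = \frac{c - d}{c + d}$$

Indeterminate Forms

Form    Example
0/0    (sin x)/x, x → 0
∞/∞    eˣ/x, x → ∞
(∞)(0)    xe⁻ˣ, x → ∞
0⁰    xˣ, x → 0+
∞⁰    (tan x)^(cos x), x → ½π−
1^∞    (1 + x)^(1/x), x → 0+
∞ − ∞    √(x + 1) − √(x − 1), x → ∞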
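The entries with finite limits can be probed numerically by evaluating each expression along a sequence approaching the limit point. A sketch, assuming the limits 1, 0, 1, 1, e, and 0 for the 0/0, (∞)(0), 0⁰, ∞⁰, 1^∞, and ∞ − ∞ examples; the ∞/∞ example (eˣ/x) grows without bound and is omitted. Numerical probing only suggests a limit; it does not prove one. The helper name probe is illustrative.

```python
import math


def probe(f, xs):
    """Evaluate f along a sequence xs that approaches the limit point."""
    return [f(x) for x in xs]


to_zero = [1e-2, 1e-4, 1e-6]
half_pi = math.pi / 2

# (form, expression, approach sequence, expected limit)
cases = [
    ("0/0      sin x / x", lambda x: math.sin(x) / x, to_zero, 1.0),
    ("(inf)(0) x e^-x", lambda x: x * math.exp(-x), [10.0, 50.0, 100.0], 0.0),
    ("0^0      x^x", lambda x: x ** x, to_zero, 1.0),
    # here the argument h is the distance below pi/2, so x = pi/2 - h
    ("inf^0    (tan x)^(cos x)",
     lambda h: math.tan(half_pi - h) ** math.cos(half_pi - h), to_zero, 1.0),
    ("1^inf    (1 + x)^(1/x)", lambda x: (1.0 + x) ** (1.0 / x), to_zero, math.e),
    ("inf-inf  sqrt(x+1) - sqrt(x-1)",
     lambda x: math.sqrt(x + 1) - math.sqrt(x - 1), [1e2, 1e6, 1e10], 0.0),
]

for label, f, xs, limit in cases:
    values = probe(f, xs)
    print(f"{label:32s} last value {values[-1]:.6f}   limit {limit:.6f}")
```

The 0⁰ and ∞⁰ examples converge slowly (the error at a distance of 10⁻⁶ is still about 10⁻⁵), and the ∞ − ∞ example loses digits to cancellation at large x; rewriting it as 2/(√(x + 1) + √(x − 1)) avoids that.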

Integral Exponents (Powers and Roots) If m and n are positive integers and a, b are numbers or functions, then the following properties hold:

$$a^{-n} = \frac{1}{a^n} \qquad a \ne 0$$

$$(ab)^n = a^n b^n$$

$$(a^n)^m = a^{nm}, \qquad a^n a^m = a^{n+m}$$

$$\sqrt[n]{a} = a^{1/n} \qquad a > 0$$

$$\sqrt[m]{\sqrt[n]{a}} = \sqrt[mn]{a} \qquad a > 0$$

$$a^{m/n} = \left(a^m\right)^{1/n} = \sqrt[n]{a^m} \qquad a > 0$$

$$a^0 = 1 \qquad a \ne 0$$

$$0^a = 0 \qquad a > 0$$

Logarithms

$$\log ab = \log a + \log b \qquad a > 0,\ b > 0$$

$$\log a^n = n \log a$$

$$\log \frac{a}{b} = \log a - \log b$$

$$\log \sqrt[n]{a} = \frac{1}{n} \log a$$

The common logarithm (base 10) is denoted log a or log10 a. The natural logarithm (base e) is denoted ln a (or in some texts loge a). If the text is ambiguous (perhaps using log x for ln x), test the formula by evaluating it.
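One way to carry out the suggested test, sketched below with hypothetical helper names: the identities above hold in every base, so they cannot distinguish ln from log10, but evaluating the unknown function at a known point can. Since log_b 10 = ln 10 / ln b, the base is recovered as b = 10^(1/log_b 10).

```python
import math


def check_log_rules(log, a=8.0, b=3.0, n=5):
    """True if the tabulated logarithm identities hold for this log function."""
    return all([
        math.isclose(log(a * b), log(a) + log(b)),
        math.isclose(log(a ** n), n * log(a)),
        math.isclose(log(a / b), log(a) - log(b)),
        math.isclose(log(a ** (1.0 / n)), log(a) / n),
    ])


def base_of(log):
    """log_b(10) = ln 10 / ln b, so the base is b = 10 ** (1 / log_b(10))."""
    return 10.0 ** (1.0 / log(10.0))


# The identities hold in any base, so they cannot tell ln from log10 ...
print(check_log_rules(math.log), check_log_rules(math.log10))  # True True
# ... but evaluating the function at a known point can.
print(base_of(math.log), base_of(math.log10))  # about 2.71828 and 10.0
```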

Roots If a is a real number and n is a positive integer, then x is called the nth root of a if xⁿ = a. The number of nth roots is n, but not all of them are necessarily real. The principal nth root means the following: (1) if a > 0, the principal nth root is the unique positive root; (2) if a < 0 and n is odd, it is the unique negative root; and (3) if a < 0 and n is even, it is any of the complex roots. In case (1), the root can be found on a calculator by taking y = (ln a)/n and then x = eʸ; in case (2), apply the same procedure to |a| and change the sign of the result. In case (3), see the section on complex variables.
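A sketch of the three cases for a real a and positive integer n. Cases (1) and (2) use x = e^((ln |a|)/n) with the sign of a restored. For case (3), where any complex root qualifies, the code returns the one with argument π/n; that choice is an assumed convention matching the principal branch of cmath.log. The a = 0 branch and the function name are illustrative additions.

```python
import cmath
import math


def principal_root(a, n):
    """Principal nth root of a real number a (n a positive integer)."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    if a > 0:                                  # case (1): unique positive root
        return math.exp(math.log(a) / n)
    if a == 0:
        return 0.0
    if n % 2 == 1:                             # case (2): unique negative root
        return -math.exp(math.log(-a) / n)
    # case (3): no real root; return the complex root with argument pi/n
    return cmath.exp(cmath.log(a) / n)


print(principal_root(8.0, 3))    # about 2.0
print(principal_root(-27.0, 3))  # about -3.0
print(principal_root(-4.0, 2))   # about 2j (tiny real part from rounding)
```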

ALGEBRAIC INEQUALITIES

Arithmetic-Geometric Inequality Let Aₙ and Gₙ denote respectively the arithmetic and the geometric means of a set of positive numbers a1, a2, . . . , an. Then Aₙ ≥ Gₙ, i.e.,

$$\frac{a_1 + a_2 + \cdots + a_n}{n} \ge (a_1 a_2 \cdots a_n)^{1/n}$$

The equality holds only if all of the numbers ai are equal.

Carleman’s Inequality The arithmetic and geometric means just defined satisfy the inequality

$$\sum_{r=1}^{n} (a_1 a_2 \cdots a_r)^{1/r} \le n e A_n$$

where e is the best possible constant in this inequality.

Cauchy-Schwarz Inequality Let a = (a1, a2, . . . , an), b = (b1, b2, . . . , bn), where the ai’s and bi’s are real or complex numbers. Then

$$\left| \sum_{k=1}^{n} a_k b_k \right|^2 \le \left( \sum_{k=1}^{n} |a_k|^2 \right) \left( \sum_{k=1}^{n} |b_k|^2 \right)$$

The equality holds if, and only if, the vectors a, b are linearly dependent (i.e., one vector is a scalar times the other vector).
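The inequalities above are easy to spot-check numerically. The sketch below, with hypothetical helper names and a randomly generated sample, evaluates both sides of the arithmetic-geometric, Carleman, and Cauchy-Schwarz inequalities. A check on one sample illustrates the statements but is not a proof.

```python
import math
import random


def arithmetic_mean(xs):
    return math.fsum(xs) / len(xs)


def geometric_mean(xs):
    # Computed through logarithms to avoid overflow of the raw product
    return math.exp(math.fsum(math.log(x) for x in xs) / len(xs))


def carleman_lhs(xs):
    """Sum over r = 1..n of (a1 a2 ... ar)**(1/r)."""
    total, log_prod = 0.0, 0.0
    for r, x in enumerate(xs, start=1):
        log_prod += math.log(x)
        total += math.exp(log_prod / r)
    return total


def cauchy_schwarz_gap(a, b):
    """Right side minus left side of |sum a_k b_k|**2 <= sum|a_k|**2 * sum|b_k|**2."""
    lhs = abs(sum(x * y for x, y in zip(a, b))) ** 2
    rhs = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
    return rhs - lhs


random.seed(0)
sample = [random.uniform(0.1, 10.0) for _ in range(20)]        # positive reals
other = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]

print(arithmetic_mean(sample) >= geometric_mean(sample))                      # True
print(carleman_lhs(sample) <= len(sample) * math.e * arithmetic_mean(sample))  # True
print(cauchy_schwarz_gap(sample, other) >= 0.0)                               # True
```

For linearly dependent vectors such as (1, 2, 3) and (2, 4, 6) the Cauchy-Schwarz gap is exactly zero, matching the equality condition.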

Minkowski’s Inequality Let a1, a2, . . . , an and b1, b2, . . . , bn be any two sets of complex numbers. Then for any real number p > 1,

$$\left( \sum_{k=1}^{n} |a_k + b_k|^p \right)^{1/p} \le \left( \sum_{k=1}^{n} |a_k|^p \right)^{1/p} + \left( \sum_{k=1}^{n} |b_k|^p \right)^{1/p}$$

Hölder’s Inequality Let a1, a2, . . . , an and b1, b2, . . . , bn be any two sets of complex numbers, and let p and q be positive numbers with 1/p + 1/q = 1. Then

$$\left| \sum_{k=1}^{n} a_k b_k \right| \le \left( \sum_{k=1}^{n} |a_k|^p \right)^{1/p} \left( \sum_{k=1}^{n} |b_k|^q \right)^{1/q}$$

The equality holds if, and only if, the sequences |a1|^p, |a2|^p, . . . , |an|^p and |b1|^q, |b2|^q, . . . , |bn|^q are proportional and the argument (angle) of the complex numbers akbk is independent of k. This last condition is of course automatically satisfied if a1, . . . , an and b1, . . . , bn are positive numbers.

Lagrange’s Inequality Let a1, a2, . . . , an and b1, b2, . . . , bn be real numbers. Then

$$\left( \sum_{k=1}^{n} a_k b_k \right)^2 = \left( \sum_{k=1}^{n} a_k^2 \right) \left( \sum_{k=1}^{n} b_k^2 \right) - \sum_{1 \le k < j \le n} (a_k b_j - a_j b_k)^2$$
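A similar check for Lagrange's identity and for the Hölder and Minkowski inequalities, again with hypothetical helper names and made-up complex test vectors. itertools.combinations enumerates exactly the index pairs k < j that appear in the correction term, and q = p/(p − 1) is the conjugate exponent satisfying 1/p + 1/q = 1. A small tolerance absorbs floating-point rounding.

```python
import itertools
import math


def lagrange_identity_sides(a, b):
    """Both sides of Lagrange's identity for real sequences a, b."""
    lhs = math.fsum(x * y for x, y in zip(a, b)) ** 2
    cross = math.fsum((a[k] * b[j] - a[j] * b[k]) ** 2
                      for k, j in itertools.combinations(range(len(a)), 2))  # k < j
    rhs = math.fsum(x * x for x in a) * math.fsum(y * y for y in b) - cross
    return lhs, rhs


def p_norm(v, p):
    return math.fsum(abs(x) ** p for x in v) ** (1.0 / p)


def holder_holds(a, b, p, tol=1e-12):
    q = p / (p - 1.0)  # conjugate exponent: 1/p + 1/q = 1
    return abs(sum(x * y for x, y in zip(a, b))) <= p_norm(a, p) * p_norm(b, q) + tol


def minkowski_holds(a, b, p, tol=1e-12):
    return p_norm([x + y for x, y in zip(a, b)], p) <= p_norm(a, p) + p_norm(b, p) + tol


print(lagrange_identity_sides([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # (1024.0, 1024.0)
a = [1 + 2j, -3j, 0.5, 2.0]
b = [0.5j, 1.0, -2 + 1j, 3.0]
print(holder_holds(a, b, p=3.0), minkowski_holds(a, b, p=1.5))     # True True
```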

Example Two chemical engineers, John and Mary, purchase stock in the same company at times t1, t2, . . . , tn, when the price per share is respectively p1, p2, . . . , pn. Their methods of investment are different, however: John purchases x shares each time, whereas Mary invests P dollars each time (fractional shares can be purchased). Who is doing better?

While one can argue intuitively that the average cost per share for Mary does not exceed that for John, we illustrate a mathematical proof using inequalities. The average cost per share for John is equal to

$$\frac{\text{Total money invested}}{\text{Number of shares purchased}} = \frac{x \sum_{i=1}^{n} p_i}{nx} = \frac{1}{n} \sum_{i=1}^{n} p_i$$

The average cost per share for Mary is

$$\frac{nP}{\sum_{i=1}^{n} P/p_i} = \frac{n}{\sum_{i=1}^{n} 1/p_i}$$

Thus the average cost per share for John is the arithmetic mean of p1, p2, . . . , pn, whereas that for Mary is the harmonic mean of these n numbers. Since the harmonic mean is less than or equal to the arithmetic mean for any set of positive numbers and the two means are equal only if p1 = p2 = ⋅⋅⋅ = pn, we conclude that the average cost per share for Mary is less than that for John if at least two of the prices pi are distinct. One can also give a proof based on the Cauchy-Schwarz inequality. To this end, define the vectors

$$\mathbf{a} = \left(p_1^{-1/2}, p_2^{-1/2}, \ldots, p_n^{-1/2}\right) \qquad \mathbf{b} = \left(p_1^{1/2}, p_2^{1/2}, \ldots, p_n^{1/2}\right)$$

Then a ⋅ b = 1 + ⋅⋅⋅ + 1 = n, and so by the Cauchy-Schwarz inequality

$$(\mathbf{a} \cdot \mathbf{b})^2 = n^2 \le \left( \sum_{i=1}^{n} \frac{1}{p_i} \right) \left( \sum_{i=1}^{n} p_i \right)$$

with the equality holding only if p1 = p2 = ⋅⋅⋅ = pn. Therefore

$$\frac{n}{\sum_{i=1}^{n} 1/p_i} \le \frac{\sum_{i=1}^{n} p_i}{n}$$
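The two investment strategies can be compared directly in code. In this sketch the prices, the share count x, and the dollar amount P are hypothetical; John's average cost reduces to the arithmetic mean and Mary's to the harmonic mean, which Python's statistics.harmonic_mean also computes.

```python
import math
import statistics


def john_average_cost(prices, shares_each_time=1.0):
    """Buys x shares at each price: total money invested / shares purchased."""
    invested = math.fsum(shares_each_time * p for p in prices)
    shares = shares_each_time * len(prices)
    return invested / shares  # arithmetic mean of the prices


def mary_average_cost(prices, dollars_each_time=100.0):
    """Invests P dollars at each price, buying fractional shares P / p_i."""
    invested = dollars_each_time * len(prices)
    shares = math.fsum(dollars_each_time / p for p in prices)
    return invested / shares  # harmonic mean of the prices


prices = [10.0, 20.0, 40.0]  # hypothetical prices p1, p2, p3
print(john_average_cost(prices))          # 23.333...
print(mary_average_cost(prices))          # 17.142857...
print(statistics.harmonic_mean(prices))   # same as Mary's average cost
```

Neither x nor P affects the result, which is why both cancel in the formulas above; only the spread of the prices determines how much Mary gains.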


FIG. 3-3 Parallelogram. FIG. 3-4 Regular polygon. FIG. 3-5 Circle.

MENSURATION FORMULAS


Let A denote areas and V volumes in the following.

PLANE GEOMETRIC FIGURES WITH STRAIGHT BOUNDARIES

Triangles (see also “Plane Trigonometry”) A = ½bh, where b = base, h = altitude.

Rectangle A = ab, where a and b are the lengths of the sides.

Parallelogram (opposite sides parallel) A = ah = ab sin α, where a, b are the lengths of the sides, h the height, and α the angle between the sides. See Fig. 3-3.

Rhombus (equilateral parallelogram) A = ½ab, where a, b are the lengths of the diagonals.

Trapezoid (four sides, two parallel) A = ½(a + b)h, where the lengths of the parallel sides are a and b, and h = height.

Quadrilateral (four-sided) A = ½ab sin θ, where a, b are the lengths of the diagonals and the acute angle between them is θ.

Regular Polygon of n Sides See Fig. 3-4.

$$A = \frac{1}{4} n l^2 \cot \frac{180^\circ}{n} \qquad \text{where } l = \text{length of each side}$$

$$R = \frac{l}{2} \csc \frac{180^\circ}{n} \qquad \text{where } R \text{ is the radius of the circumscribed circle}$$

$$r = \frac{l}{2} \cot \frac{180^\circ}{n} \qquad \text{where } r \text{ is the radius of the inscribed circle}$$

Radius r of Circle Inscribed in Triangle with Sides a, b, c

$$r = \sqrt{\frac{(s - a)(s - b)(s - c)}{s}} \qquad \text{where } s = \tfrac{1}{2}(a + b + c)$$

Radius R of Circumscribed Circle

$$R = \frac{abc}{4\sqrt{s(s - a)(s - b)(s - c)}}$$

Area of Regular Polygon of n Sides Inscribed in a Circle of Radius r

$$A = \frac{n r^2}{2} \sin \frac{360^\circ}{n}$$

Perimeter of Inscribed Regular Polygon

$$P = 2 n r \sin \frac{180^\circ}{n}$$

Area of Regular Polygon Circumscribed about a Circle of Radius r

$$A = n r^2 \tan \frac{180^\circ}{n}$$

Perimeter of Circumscribed Regular Polygon

$$P = 2 n r \tan \frac{180^\circ}{n}$$
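A sketch that evaluates the triangle and regular-polygon formulas above, with angles converted from degrees to radians (180°/n = π/n). The function names are illustrative; the circumscribed radius uses the square-root (Heron) area expression from its formula.

```python
import math


def triangle_radii(a, b, c):
    """Inscribed radius r and circumscribed radius R of a triangle with sides a, b, c."""
    s = 0.5 * (a + b + c)                                 # semiperimeter
    r = math.sqrt((s - a) * (s - b) * (s - c) / s)
    R = a * b * c / (4.0 * math.sqrt(s * (s - a) * (s - b) * (s - c)))
    return r, R


def regular_polygon(n, l):
    """Area A, circumradius R, and inradius r of a regular n-gon with side l."""
    t = math.pi / n                                       # 180 deg / n in radians
    A = 0.25 * n * l ** 2 / math.tan(t)
    R = 0.5 * l / math.sin(t)
    r = 0.5 * l / math.tan(t)
    return A, R, r


def polygon_perimeters(n, r):
    """Perimeters of regular n-gons inscribed in and circumscribed about a circle of radius r."""
    t = math.pi / n
    return 2.0 * n * r * math.sin(t), 2.0 * n * r * math.tan(t)


print(triangle_radii(3.0, 4.0, 5.0))      # r = 1, R = 2.5 (up to rounding)
print(regular_polygon(4, 2.0))            # square of side 2: A = 4, R = sqrt 2, r = 1
for n in (6, 24, 96):
    print(n, polygon_perimeters(n, 1.0))   # brackets 2*pi = 6.28318...
```

As n increases, the inscribed and circumscribed perimeters close in on 2πr from below and above, the idea behind Archimedes' bounds on π.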

PLANE GEOMETRIC FIGURES WITH CURVED BOUNDARIES

Circle (Fig. 3-5) Let
C = circumference
r = radius
D = diameter
A = area
S = arc length subtended by θ
l = chord length subtended by θ
H = maximum rise of arc above chord, r − H = d
θ = central angle (rad) subtended by arc S

$$C = 2\pi r = \pi D \qquad (\pi = 3.14159\ldots)$$

$$S = r\theta = \tfrac{1}{2} D\theta$$

$$l = 2\sqrt{r^2 - d^2} = 2r \sin\frac{\theta}{2} = 2d \tan\frac{\theta}{2}$$

$$d = \tfrac{1}{2}\sqrt{4r^2 - l^2} = \tfrac{1}{2} l \cot\frac{\theta}{2}$$

$$\theta = \frac{S}{r} = 2\cos^{-1}\frac{d}{r} = 2\sin^{-1}\frac{l}{D}$$

$$A\,(\text{circle}) = \pi r^2 = \tfrac{1}{4}\pi D^2$$

$$A\,(\text{sector}) = \tfrac{1}{2} r S = \tfrac{1}{2} r^2 \theta$$

$$A\,(\text{segment}) = A\,(\text{sector}) - A\,(\text{triangle}) = \tfrac{1}{2} r^2 (\theta - \sin\theta)$$
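The circle relations can be collected in one helper, sketched below for a given radius r and central angle θ in radians (function name hypothetical). The center-to-chord distance follows from θ = 2 cos⁻¹(d/r), i.e., d = r cos(θ/2); the printed checks confirm two of the alternative chord formulas.

```python
import math


def circle_chord_quantities(r, theta):
    """Arc, chord, and area quantities for a circle of radius r and central angle theta (rad)."""
    d = r * math.cos(theta / 2.0)  # center-to-chord distance, from theta = 2 cos^-1(d/r)
    return {
        "S": r * theta,                              # arc length
        "l": 2.0 * r * math.sin(theta / 2.0),        # chord length
        "d": d,
        "H": r - d,                                  # maximum rise of arc above chord
        "sector": 0.5 * r * r * theta,
        "segment": 0.5 * r * r * (theta - math.sin(theta)),
    }


q = circle_chord_quantities(3.0, 1.2)
print(q)
# Cross-checks against the alternative forms of the chord length
print(math.isclose(q["l"], 2.0 * math.sqrt(3.0 ** 2 - q["d"] ** 2)))  # True
print(math.isclose(q["l"], 2.0 * q["d"] * math.tan(0.6)))             # True
```

For θ = π the segment and sector areas coincide (half the circle), which is a quick sanity check on the segment formula.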

Ring (area between two circles of radii $r_1$ and $r_2$) The circles need not be concentric, but one of the circles must enclose the other.

$$A = \pi(r_1 + r_2)(r_1 - r_2), \qquad r_1 > r_2$$

Ellipse (Fig. 3-6) Let the semiaxes of the ellipse be a and b, with a ≥ b.

$$A = \pi ab, \qquad C = 4aE(e)$$

where $e^2 = 1 - b^2/a^2$ and E(e) is the complete elliptic integral of the second kind,

$$E(e) = \frac{\pi}{2}\left[1 - \left(\frac{1}{2}\right)^2 e^2 - \cdots\right]$$

[an approximation for the circumference is $C \approx 2\pi\sqrt{(a^2 + b^2)/2}$].
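The series for E(e) converges slowly for elongated ellipses, so in practice the integral $E(e) = \int_0^{\pi/2}\sqrt{1 - e^2\sin^2 t}\,dt$ is usually evaluated numerically. A minimal sketch, assuming a midpoint-rule quadrature (our choice; the integrand is smooth and periodic, so this converges quickly) and a ≥ b; function names are hypothetical.

```python
import math

def complete_elliptic_E(e: float, n: int = 2000) -> float:
    """E(e) = integral over [0, pi/2] of sqrt(1 - e^2 sin^2 t), midpoint rule."""
    h = (math.pi / 2) / n
    return h * sum(math.sqrt(1.0 - (e * math.sin((k + 0.5) * h)) ** 2) for k in range(n))

def ellipse_perimeter(a: float, b: float) -> float:
    """C = 4 a E(e), e^2 = 1 - b^2/a^2, assuming a >= b."""
    e = math.sqrt(1.0 - (b / a) ** 2)
    return 4.0 * a * complete_elliptic_E(e)

def ellipse_perimeter_approx(a: float, b: float) -> float:
    """Rough approximation C ~ 2 pi sqrt((a^2 + b^2)/2)."""
    return 2.0 * math.pi * math.sqrt((a * a + b * b) / 2.0)
```

For a = 2, b = 1 the integral gives C ≈ 9.6884, while the approximation gives about 9.9346, roughly 2.5 percent high; the approximation is best for nearly circular ellipses.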

Parabola (Fig. 3-7)

$$\text{Length of arc } EFG = \sqrt{4x^2 + y^2} + \frac{y^2}{2x}\ln\frac{2x + \sqrt{4x^2 + y^2}}{y}$$

$$\text{Area of section } EFG = \tfrac{4}{3}xy$$

Catenary (the curve formed by a cord of uniform weight suspended freely between two points A, B; Fig. 3-8)

$$y = a\cosh(x/a)$$

With A and B at x = −L and x = +L, the length of arc between points A and B is equal to $2a\sinh(L/a)$. The sag of the cord is $D = a\cosh(L/a) - a$.

SOLID GEOMETRIC FIGURES WITH PLANE BOUNDARIES

Cube Volume = $a^3$; total surface area = $6a^2$; diagonal = $a\sqrt{3}$, where a = length of one side of the cube.

Rectangular Parallelepiped Volume = abc; surface area = $2(ab + ac + bc)$; diagonal = $\sqrt{a^2 + b^2 + c^2}$, where a, b, c are the lengths of the sides.

Prism Volume = (area of base) × (altitude); lateral surface area = (perimeter of right section) × (lateral edge).

Pyramid Volume = $\tfrac{1}{3}$(area of base) × (altitude); lateral area of regular pyramid = $\tfrac{1}{2}$(perimeter of base) × (slant height) = $\tfrac{1}{2}$(number of sides)(length of one side)(slant height).

Frustum of Pyramid (formed from the pyramid by cutting off the top with a plane parallel to the base)

$$V = \tfrac{1}{3}\left(A_1 + A_2 + \sqrt{A_1A_2}\right)h$$

where h = altitude and $A_1$, $A_2$ are the areas of the two bases; lateral area of a regular figure = $\tfrac{1}{2}$(sum of the perimeters of the bases) × (slant height).

Volume and Surface Area of Regular Polyhedra with Edge l

Type of surface             Name                  Volume          Surface area
4 equilateral triangles     Tetrahedron           0.1179 $l^3$    1.7321 $l^2$
6 squares                   Hexahedron (cube)     1.0000 $l^3$    6.0000 $l^2$
8 equilateral triangles     Octahedron            0.4714 $l^3$    3.4641 $l^2$
12 pentagons                Dodecahedron          7.6631 $l^3$    20.6458 $l^2$
20 equilateral triangles    Icosahedron           2.1817 $l^3$    8.6603 $l^2$
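The tabulated coefficients are decimal values of standard closed forms, for example $1/(6\sqrt{2})$ and $\sqrt{3}$ for the tetrahedron. A minimal sketch that recomputes them; note that the dodecahedron area closed form $3\sqrt{25 + 10\sqrt{5}}$ evaluates to 20.64573, which rounds to 20.6457 rather than the tabulated 20.6458. The dictionary and function names are illustrative.

```python
import math

# (volume coefficient, surface-area coefficient) for unit edge l = 1
POLYHEDRA = {
    "Tetrahedron": (1 / (6 * math.sqrt(2)), math.sqrt(3)),
    "Hexahedron": (1.0, 6.0),
    "Octahedron": (math.sqrt(2) / 3, 2 * math.sqrt(3)),
    "Dodecahedron": ((15 + 7 * math.sqrt(5)) / 4, 3 * math.sqrt(25 + 10 * math.sqrt(5))),
    "Icosahedron": (5 * (3 + math.sqrt(5)) / 12, 5 * math.sqrt(3)),
}

def polyhedron(name: str, l: float) -> tuple[float, float]:
    """Return (volume, surface area) of a regular polyhedron with edge l."""
    kv, ka = POLYHEDRA[name]
    return kv * l ** 3, ka * l ** 2
```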

SOLIDS BOUNDED BY CURVED SURFACES

Cylinders (Fig. 3-9) V = (area of base) × (altitude); lateral surface area = (perimeter of right section) × (lateral edge).

Right Circular Cylinder V = π(radius)² × (altitude); lateral surface area = 2π(radius) × (altitude).

Truncated Right Circular Cylinder

$$V = \pi r^2h, \qquad \text{lateral area} = 2\pi rh, \qquad h = \tfrac{1}{2}(h_1 + h_2)$$

Hollow Cylinders Volume = $\pi h(R^2 - r^2)$, where r and R are the internal and external radii and h is the height of the cylinder.

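Sphere (Fig. 3-10)

$$V\,(\text{sphere}) = \tfrac{4}{3}\pi R^3 = \tfrac{1}{6}\pi D^3$$

$$V\,(\text{spherical sector}) = \tfrac{2}{3}\pi R^2h_i, \qquad i = 2 \text{ (open spherical sector)}, \quad i = 1 \text{ (spherical cone)}$$

$$V\,(\text{spherical segment of one base}) = \tfrac{1}{6}\pi h_1(3r_2^2 + h_1^2)$$

$$V\,(\text{spherical segment of two bases}) = \tfrac{1}{6}\pi h_2(3r_1^2 + 3r_2^2 + h_2^2)$$

$$A\,(\text{sphere}) = 4\pi R^2 = \pi D^2$$

$$A\,(\text{zone}) = 2\pi Rh = \pi Dh$$

A (lune on the surface included between two great circles, the inclination of which is θ radians) = $2R^2\theta$.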

Cone V = $\tfrac{1}{3}$(area of base) × (altitude).

Right Circular Cone $V = (\pi/3)r^2h$, where h is the altitude and r is the radius of the base; curved surface area = $\pi r\sqrt{r^2 + h^2}$; curved surface of the frustum of a right cone = $\pi(r_1 + r_2)\sqrt{h^2 + (r_1 - r_2)^2}$, where $r_1$, $r_2$ are the radii of the base and top, respectively, and h is the altitude; volume of the frustum of a right cone = $\pi(h/3)(r_1^2 + r_1r_2 + r_2^2) = (h/3)(A_1 + A_2 + \sqrt{A_1A_2})$, where $A_1$ = area of base and $A_2$ = area of top.

Ellipsoid $V = \tfrac{4}{3}\pi abc$, where a, b, c are the lengths of the semiaxes.

Torus (obtained by rotating a circle of radius r about a line whose distance is R > r from the center of the circle)

$$V = 2\pi^2Rr^2, \qquad \text{Surface area} = 4\pi^2Rr$$


FIG. 3-6 Ellipse.

FIG. 3-7 Parabola.

FIG. 3-8 Catenary. FIG. 3-9 Cylinder. FIG. 3-10 Sphere.


Prolate Spheroid (formed by rotating an ellipse about its major axis [2a])

$$\text{Surface area} = 2\pi b^2 + 2\pi\frac{ab}{e}\sin^{-1}e, \qquad V = \tfrac{4}{3}\pi ab^2$$

where a, b are the semimajor and semiminor axes and e is the eccentricity, $e = \sqrt{1 - b^2/a^2} < 1$.

Oblate Spheroid (formed by the rotation of an ellipse about its minor axis [2b]) Data as given previously.

$$\text{Surface area} = 2\pi a^2 + \pi\frac{b^2}{e}\ln\frac{1 + e}{1 - e}, \qquad V = \tfrac{4}{3}\pi a^2b$$

For process vessels, the formulas reduce to the following:

Hemisphere

$$V = \frac{\pi}{12}D^3, \qquad A = \frac{\pi}{2}D^2$$

For a hemisphere (concave up) partially filled to a depth $h_1$, use the formulas for a spherical segment with one base, which simplify to

$$V = \pi h_1^2\left(R - \frac{h_1}{3}\right) = \pi h_1^2\left(\frac{D}{2} - \frac{h_1}{3}\right)$$

$$A = 2\pi Rh_1 = \pi Dh_1$$

For a hemisphere (concave down) partially filled from the bottom, use the formulas for a spherical segment of two bases, one of which is a plane through the center, where h = distance from the center plane to the surface of the partially filled hemisphere.

$$V = \pi h\left(R^2 - \frac{h^2}{3}\right) = \pi h\left(\frac{D^2}{4} - \frac{h^2}{3}\right)$$

$$A = 2\pi Rh = \pi Dh$$
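Liquid inventory in a hemispherical head is a common use of these two formulas. A minimal sketch with illustrative function names; both functions reduce to the full hemisphere $\tfrac{2}{3}\pi R^3$ at h = R, and a bowl filled to depth R − h plus a dome slab of height h also adds up to the full hemisphere, which gives a quick consistency check.

```python
import math

def bowl_volume(R: float, h1: float) -> float:
    """Concave-up hemisphere of radius R filled to depth h1: V = pi h1^2 (R - h1/3)."""
    if not 0.0 <= h1 <= R:
        raise ValueError("depth must lie between 0 and R")
    return math.pi * h1 ** 2 * (R - h1 / 3.0)

def dome_volume(R: float, h: float) -> float:
    """Concave-down hemisphere filled a height h from its center plane: V = pi h (R^2 - h^2/3)."""
    if not 0.0 <= h <= R:
        raise ValueError("h must lie between 0 and R")
    return math.pi * h * (R ** 2 - h ** 2 / 3.0)
```

Usage: for a head with D = 2 m (R = 1 m) filled to 0.5 m, `bowl_volume(1.0, 0.5)` gives 5π/24, about 0.6545 m³.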

Cone For a partially filled cone, use the same formulas as for right circular cones, but use r and h for the region filled.

Ellipsoid If the base of a vessel is one-half of an oblate spheroid (the cross section fitting to a cylinder is a circle with radius D/2, and the minor axis is smaller), then use the formulas for one-half of an oblate spheroid. For heads whose semiminor axis (head depth) is D/3 or D/4, these give

$$V \approx 0.1745D^3, \qquad S \approx 1.236D^2 \qquad (\text{semiminor axis } D/3)$$

$$V \approx 0.1309D^3, \qquad S \approx 1.084D^2 \qquad (\text{semiminor axis } D/4)$$
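The two numerical cases follow from halving the oblate-spheroid formulas with a = D/2. A minimal sketch, assuming S means the curved surface of the head only (half the spheroid surface) and b is the semiminor axis (head depth); the function name is hypothetical.

```python
import math

def half_oblate_spheroid(D: float, b: float) -> tuple[float, float]:
    """Volume and curved surface of half an oblate spheroid (vessel head).

    a = D/2 matches the cylinder; b is the semiminor axis (head depth), b < a.
    """
    a = D / 2.0
    e = math.sqrt(1.0 - (b / a) ** 2)                   # eccentricity
    V = 0.5 * (4.0 / 3.0) * math.pi * a * a * b          # half of (4/3) pi a^2 b
    S = 0.5 * (2.0 * math.pi * a * a
               + math.pi * (b * b / e) * math.log((1 + e) / (1 - e)))
    return V, S
```

With D = 1, `half_oblate_spheroid(1.0, 1/3)` returns V = π/18 and S near 1.236, and `half_oblate_spheroid(1.0, 1/4)` returns V = π/24 and S near 1.084, matching the coefficients above.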

MISCELLANEOUS FORMULAS

See also “Differential and Integral Calculus.”

Volume of a Solid of Revolution (the solid generated by rotating a plane area about the x axis)

$$V = \pi\int_a^b [f(x)]^2\,dx$$

where y = f(x) is the equation of the plane curve and a ≤ x ≤ b.

Area of a Surface of Revolution

$$S = 2\pi\int_a^b y\,ds$$

where $ds = \sqrt{1 + (dy/dx)^2}\,dx$ and y = f(x) is the equation of the plane curve rotated about the x axis to generate the surface.
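When f(x) has no convenient antiderivative, both integrals can be evaluated numerically. A minimal sketch, assuming composite Simpson's rule (the same rule given under “Irregular Areas” below) and that the derivative dy/dx is supplied by the caller; function names are hypothetical.

```python
import math
from typing import Callable

def simpson(g: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def solid_of_revolution(f, df, a, b, n=200):
    """Return (V, S) for y = f(x), a <= x <= b, rotated about the x axis; df = dy/dx."""
    V = math.pi * simpson(lambda x: f(x) ** 2, a, b, n)
    S = 2 * math.pi * simpson(lambda x: f(x) * math.sqrt(1 + df(x) ** 2), a, b, n)
    return V, S
```

Usage: rotating y = x/2 on 0 ≤ x ≤ 2 gives a cone with r = 1, h = 2, so the results should equal the right-circular-cone formulas, $V = 2\pi/3$ and $S = \pi\sqrt{5}$.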

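Area Bounded by f(x), the x Axis, and the Lines x = a, x = b

$$A = \int_a^b f(x)\,dx \qquad [f(x) \ge 0]$$

Length of Arc of a Plane Curve If y = f(x),

$$\text{Length of arc } s = \int_a^b\sqrt{1 + \left(\frac{dy}{dx}\right)^2}\,dx$$

If x = g(y),

$$\text{Length of arc } s = \int_c^d\sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy$$

If x = f(t), y = g(t),

$$\text{Length of arc } s = \int_{t_0}^{t_1}\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt$$

In general, $(ds)^2 = (dx)^2 + (dy)^2$.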
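The parametric form covers the other two cases (take t = x or t = y). A minimal sketch, assuming composite Simpson's rule and caller-supplied derivatives; the function name is hypothetical. The usage checks it against the catenary arc length $2a\sinh(L/a)$ from the mensuration formulas, parametrizing the catenary by x = t.

```python
import math

def arc_length(dxdt, dydt, t0, t1, n=200):
    """s = integral of sqrt((dx/dt)^2 + (dy/dt)^2) dt by composite Simpson's rule (n even)."""
    h = (t1 - t0) / n
    speed = lambda t: math.hypot(dxdt(t), dydt(t))
    total = speed(t0) + speed(t1)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(t0 + k * h)
    return total * h / 3

# Catenary y = a cosh(x/a) between x = -L and x = +L, parametrized by x = t.
a, L = 1.0, 1.0
s_num = arc_length(lambda t: 1.0, lambda t: math.sinh(t / a), -L, L)
s_exact = 2 * a * math.sinh(L / a)
sag = a * math.cosh(L / a) - a          # D = a cosh(L/a) - a
```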

IRREGULAR AREAS AND VOLUMES

Irregular Areas Let $y_0, y_1, \ldots, y_n$ be the lengths of a series of equally spaced parallel chords and h be their distance apart (Fig. 3-11). The area of the figure is given approximately by either of the following:

$$A_T = \frac{h}{2}\left[(y_0 + y_n) + 2(y_1 + y_2 + \cdots + y_{n-1})\right] \qquad \text{(trapezoidal rule)}$$

$$A_S = \frac{h}{3}\left[(y_0 + y_n) + 4(y_1 + y_3 + y_5 + \cdots + y_{n-1}) + 2(y_2 + y_4 + \cdots + y_{n-2})\right] \qquad (n \text{ even, Simpson's rule})$$

The greater the value of n, the greater the accuracy of the approximation.

Irregular Volumes To find the volume, replace the y's by cross-sectional areas $A_j$ and use the results in the preceding equations.


FIG. 3-11 Irregular area.

ELEMENTARY ALGEBRA

REFERENCES: Stillwell, J. C., Elements of Algebra, CRC Press, New York (1994); Rich, B., and P. Schmidt, Schaum's Outline of Elementary Algebra, McGraw-Hill, New York (2004).

OPERATIONS ON ALGEBRAIC EXPRESSIONS

An algebraic expression will here be denoted as a combination of letters and numbers such as

$$3ax - 3xy + 7x^2 + 7x^{3/2} - 2.8xy$$

Addition and Subtraction Only like terms can be added or subtracted in two algebraic expressions.

Example $(3x + 4xy - x^2) + (3x^2 + 2x - 8xy) = 5x - 4xy + 2x^2$.

Example (2x + 3xy − 4x1/2) + (3x + 6x − 8xy) = 2x + 3x + 6x − 5xy − 4x1/2.

Multiplication Multiplication of algebraic expressions is term by term, and corresponding terms are combined.

Example $(2x + 3y - 2xy)(3 + 3y) = 6x + 9y + 9y^2 - 6xy^2$.

Division This operation is analogous to that in arithmetic.

Example Divide $3e^{2x} + e^x + 1$ (the dividend) by $e^x + 1$ (the divisor). The first term of the quotient is $3e^x$; subtracting $3e^x(e^x + 1) = 3e^{2x} + 3e^x$ from the dividend leaves $-2e^x + 1$. The next term of the quotient is −2; subtracting $-2(e^x + 1) = -2e^x - 2$ leaves the remainder 3. The quotient is therefore $3e^x - 2$ with remainder 3, and

$$3e^{2x} + e^x + 1 = (e^x + 1)(3e^x - 2) + 3$$

Operations with Zero All numerical computations (except division) can be done with zero: a + 0 = 0 + a = a; a − 0 = a; 0 − a = −a; (a)(0) = 0; $a^0 = 1$ if a ≠ 0; 0/a = 0, a ≠ 0. a/0 and 0/0 have no meaning.

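Fractional Operations

$$-\frac{x}{y} = -\frac{-x}{-y} = \frac{x}{-y} = \frac{-x}{y}; \qquad \frac{x}{y} = \frac{-x}{-y}; \qquad \frac{x}{y} = \frac{ax}{ay}, \text{ if } a \ne 0.$$

$$\frac{x}{y} + \frac{z}{y} = \frac{x + z}{y}; \qquad \frac{x}{y}\cdot\frac{z}{t} = \frac{xz}{yt}; \qquad \frac{x/y}{z/t} = \frac{x}{y}\cdot\frac{t}{z} = \frac{xt}{yz}$$

Factoring Factoring is the process of reducing a given expression to the product of two or more simpler expressions called factors. Some of the more common expressions are factored here: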

(1) $x^2 - y^2 = (x - y)(x + y)$

(2) $x^2 + 2xy + y^2 = (x + y)^2$

(3) $x^3 - y^3 = (x - y)(x^2 + xy + y^2)$

(4) $x^3 + y^3 = (x + y)(x^2 - xy + y^2)$

(5) $x^4 - y^4 = (x - y)(x + y)(x^2 + y^2)$

(6) $x^5 + y^5 = (x + y)(x^4 - x^3y + x^2y^2 - xy^3 + y^4)$

(7) $x^n - y^n = (x - y)(x^{n-1} + x^{n-2}y + x^{n-3}y^2 + \cdots + y^{n-1})$

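Laws of Exponents $(a^n)^m = a^{nm}$; $a^{n+m} = a^n\cdot a^m$; $a^{n/m} = (a^n)^{1/m}$; $a^{n-m} = a^n/a^m$; $a^{1/m} = \sqrt[m]{a}$; $a^{1/2} = \sqrt{a}$; $\sqrt{x^2} = |x|$ (absolute value of x). For x > 0, y > 0, $\sqrt{xy} = \sqrt{x}\sqrt{y}$; for x > 0, $\sqrt[n]{x^m} = x^{m/n}$ and $\sqrt[n]{1/x} = 1/\sqrt[n]{x}$.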

THE BINOMIAL THEOREM

If n is a positive integer,

$$(a + b)^n = a^n + na^{n-1}b + \frac{n(n - 1)}{2!}a^{n-2}b^2 + \frac{n(n - 1)(n - 2)}{3!}a^{n-3}b^3 + \cdots + b^n = \sum_{j=0}^{n}\binom{n}{j}a^{n-j}b^j$$

where $\binom{n}{j} = \dfrac{n!}{j!\,(n - j)!}$ = number of combinations of n things taken j at a time, $n! = 1\cdot 2\cdot 3\cdot 4\cdots n$, and $0! = 1$.

Example Find the sixth term of $(x + 2y)^{12}$. The sixth term is obtained by setting j = 5. It is

$$\binom{12}{5}x^{12-5}(2y)^5 = 792x^7(2y)^5$$

Example $\displaystyle\sum_{j=0}^{14}\binom{14}{j} = (1 + 1)^{14} = 2^{14}$.

If n is not a positive integer, the sum formula no longer applies and an infinite series results for $(a + b)^n$. The coefficients are obtained from the first formula in this case.

Example $(1 + x)^{1/2} = 1 + \tfrac{1}{2}x - \tfrac{1}{2}\cdot\tfrac{1}{4}x^2 + \tfrac{1}{2}\cdot\tfrac{1}{4}\cdot\tfrac{3}{6}x^3 - \cdots$ (convergent for $x^2 < 1$).

Additional discussion is under “Infinite Series.”
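The coefficient n(n − 1)⋯(n − j + 1)/j! from the first formula works for any n, which is how the fractional-exponent series is generated. A minimal sketch using exact rational arithmetic from Python's standard `fractions` module; `gen_binom` is a hypothetical helper name.

```python
import math
from fractions import Fraction

def gen_binom(n: Fraction, j: int) -> Fraction:
    """n(n-1)...(n-j+1)/j!, valid for any rational n."""
    c = Fraction(1)
    for k in range(j):
        c *= (n - k) / (k + 1)
    return c

# Sixth term of (x + 2y)^12: j = 5, so the coefficient of x^7 y^5 is C(12,5) * 2^5.
coef = math.comb(12, 5) * 2 ** 5

# First four coefficients of (1 + x)^(1/2): 1, 1/2, -1/8, 1/16.
sqrt_coeffs = [gen_binom(Fraction(1, 2), j) for j in range(4)]
```

For a positive integer n, `gen_binom(Fraction(n), j)` agrees with `math.comb(n, j)`.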

PROGRESSIONS

An arithmetic progression is a succession of terms such that each term, except the first, is derivable from the preceding by the addition of a quantity d called the common difference. All arithmetic progressions have the form a, a + d, a + 2d, a + 3d, . . . . With a = first term, l = last term, d = common difference, n = number of terms, and s = sum of the terms, the following relations hold:

$$l = a + (n - 1)d = \frac{s}{n} + \frac{(n - 1)d}{2}$$

$$s = \frac{n}{2}[2a + (n - 1)d] = \frac{n}{2}(a + l) = \frac{n}{2}[2l - (n - 1)d]$$

$$a = l - (n - 1)d = \frac{s}{n} - \frac{(n - 1)d}{2} = \frac{2s}{n} - l$$

$$d = \frac{l - a}{n - 1} = \frac{2(s - an)}{n(n - 1)} = \frac{2(nl - s)}{n(n - 1)}$$

$$n = \frac{l - a}{d} + 1 = \frac{2s}{l + a}$$

The arithmetic mean or average of two numbers a, b is (a + b)/2; of n numbers $a_1, \ldots, a_n$ it is $(a_1 + a_2 + \cdots + a_n)/n$.

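A geometric progression is a succession of terms such that each term, except the first, is derivable from the preceding by the multiplication of a quantity r called the common ratio. All such progressions have the form $a, ar, ar^2, \ldots, ar^{n-1}$. With a = first term, l = last term, r = ratio, n = number of terms, and s = sum of the terms, the following relations hold:

$$l = ar^{n-1} = \frac{a + (r - 1)s}{r} = \frac{(r - 1)s\,r^{n-1}}{r^n - 1}$$

$$s = \frac{a(r^n - 1)}{r - 1} = \frac{rl - a}{r - 1} = \frac{lr^n - l}{r^n - r^{n-1}} = \frac{a(1 - r^n)}{1 - r}$$

$$a = \frac{l}{r^{n-1}} = \frac{(r - 1)s}{r^n - 1}, \qquad r = \frac{s - a}{s - l}, \qquad \log r = \frac{\log l - \log a}{n - 1}$$

$$n = \frac{\log l - \log a}{\log r} + 1 = \frac{\log[a + (r - 1)s] - \log a}{\log r}$$

The geometric mean of two nonnegative numbers a, b is $\sqrt{ab}$; of n numbers it is $(a_1a_2\cdots a_n)^{1/n}$. The geometric mean of a set of positive numbers is less than or equal to the arithmetic mean.

Example Find the sum of $1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots + \tfrac{1}{64}$. Here a = 1, r = ½, n = 7. Thus

$$s = \frac{\tfrac{1}{2}\left(\tfrac{1}{64}\right) - 1}{\tfrac{1}{2} - 1} = \frac{127}{64}$$

$$s = a + ar + ar^2 + \cdots + ar^{n-1} = \frac{a}{1 - r} - \frac{ar^n}{1 - r}$$

If |r| < 1, then

$$\lim_{n\to\infty}s = \frac{a}{1 - r}$$

which is called the sum of the infinite geometric progression.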
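The worked example can be checked with exact fractions, and the infinite-sum limit with floats. A minimal sketch; `gp_sum` is an illustrative name, and the r = 1 branch (sum = na) is added only to avoid dividing by zero.

```python
from fractions import Fraction

def gp_sum(a, r, n):
    """Sum of n terms a, ar, ..., ar^(n-1): a(1 - r^n)/(1 - r), or n*a when r = 1."""
    if r == 1:
        return a * n
    return a * (1 - r ** n) / (1 - r)

s7 = gp_sum(Fraction(1), Fraction(1, 2), 7)   # 1 + 1/2 + ... + 1/64, exactly
s_inf_approx = gp_sum(1.0, 0.5, 60)           # approaches a/(1 - r) = 2 as n grows
```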

Example The present worth (PW) of a series of cash flows $C_k$ at the end of year k is

$$PW = \sum_{k=1}^{n}\frac{C_k}{(1 + i)^k}$$

where i is an assumed interest rate. (Thus the present worth always requires specification of an interest rate.) If all the payments are the same, $C_k = R$, the present worth is

$$PW = R\sum_{k=1}^{n}\frac{1}{(1 + i)^k}$$

This can be rewritten as

$$PW = \frac{R}{1 + i}\sum_{k=1}^{n}\frac{1}{(1 + i)^{k-1}} = \frac{R}{1 + i}\sum_{j=0}^{n-1}\frac{1}{(1 + i)^j}$$


This is a geometric series with r = 1/(1 + i) and a = R/(1 + i). The formulas above give

$$PW\;(= s) = \frac{R}{i}\,\frac{(1 + i)^n - 1}{(1 + i)^n}$$

The same formula applies to the value of an annuity (PW) now, to provide for equal payments R at the end of each of n years, with interest rate i.
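The closed form can be checked against the direct sum. A minimal sketch; the function names are illustrative, and cash flows are assumed to occur at the end of years 1, 2, …, n as in the text.

```python
def present_worth(cash_flows, i):
    """PW = sum of C_k / (1 + i)^k, with C_k paid at the end of year k = 1, 2, ..."""
    return sum(c / (1 + i) ** k for k, c in enumerate(cash_flows, start=1))

def annuity_pw(R, i, n):
    """Equal payments R for n years: PW = (R/i) [(1 + i)^n - 1] / (1 + i)^n."""
    return R / i * ((1 + i) ** n - 1) / (1 + i) ** n
```

For example, R = 100 for n = 3 years at i = 0.10 gives PW = 1000 × 0.331/1.331, about 248.69.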

A progression of the form $a, (a + d)r, (a + 2d)r^2, (a + 3d)r^3$, etc., is a combined arithmetic and geometric progression. The sum of n such terms is

$$s = \frac{a - [a + (n - 1)d]r^n}{1 - r} + \frac{rd(1 - r^{n-1})}{(1 - r)^2}$$

If |r| < 1,

$$\lim_{n\to\infty}s = \frac{a}{1 - r} + \frac{rd}{(1 - r)^2}$$

The nonzero numbers a, b, c, etc., form a harmonic progression if their reciprocals 1/a, 1/b, 1/c, etc., form an arithmetic progression.

Example The progression 1, 1/3, 1/5, 1/7, . . . , 1/31 is harmonic since 1, 3, 5, 7, . . . , 31 form an arithmetic progression.

The harmonic mean of two numbers a, b is 2ab/(a + b).

PERMUTATIONS, COMBINATIONS, AND PROBABILITY

Each separate arrangement of all or a part of a set of things is called a permutation. The number of permutations of n things taken r at a time is written

$$P(n, r) = \frac{n!}{(n - r)!} = n(n - 1)(n - 2)\cdots(n - r + 1)$$

Each separate selection of objects that is possible irrespective of the order in which they are arranged is called a combination. The number of combinations of n things taken r at a time is written $C(n, r) = n!/[r!\,(n - r)!]$.

An important relation is $r!\,C(n, r) = P(n, r)$.

If an event can occur in p ways and fail to occur in q ways, all ways being equally likely, the probability of its occurrence is p/(p + q), and that of its failure q/(p + q).

Example Two dice may be thrown in 36 separate ways. What is the probability of throwing such that their sum is 7? Seven may arise in 6 ways: 1 and 6, 2 and 5, 3 and 4, 4 and 3, 5 and 2, 6 and 1. The probability of throwing a 7 is therefore 6/36 = 1/6.
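The dice example and the relation r! C(n, r) = P(n, r) can be confirmed by brute force with the Python standard library (`math.comb` and `math.perm` require Python 3.8 or later). A minimal sketch:

```python
import math
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two dice and count sums of 7.
outcomes = list(product(range(1, 7), repeat=2))
favorable = sum(1 for d1, d2 in outcomes if d1 + d2 == 7)
p_seven = Fraction(favorable, len(outcomes))

# Check r! C(n, r) = P(n, r) for all 0 <= r <= n < 10.
relation_holds = all(
    math.factorial(r) * math.comb(n, r) == math.perm(n, r)
    for n in range(10) for r in range(n + 1)
)
```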

THEORY OF EQUATIONS

Linear Equations A linear equation is one of the first degree (i.e., only the first powers of the variables are involved), and the process of obtaining definite values for the unknown is called solving the equation. Every linear equation in one variable can be written Ax + B = 0, with solution x = −B/A when A ≠ 0. Linear equations in n variables have the form

$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$$
$$a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m$$

The solution of the system may then be found by elimination or matrix methods if a solution exists (see “Matrix Algebra and Matrix Computations”).

Quadratic Equations Every quadratic equation in one variable is expressible in the form $ax^2 + bx + c = 0$, $a \ne 0$. This equation has two solutions, say, $x_1$, $x_2$, given by

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

If a, b, c are real, the discriminant $b^2 - 4ac$ gives the character of the roots. If $b^2 - 4ac > 0$, the roots are real and unequal. If $b^2 - 4ac < 0$, the roots are complex conjugates. If $b^2 - 4ac = 0$, the roots are real and equal. Two quadratic equations in two variables can in general be solved only by numerical methods (see “Numerical Analysis and Approximate Methods”).
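Evaluated literally in floating point, the formula loses accuracy in the smaller root when $b^2 \gg 4ac$, because $-b$ and $\sqrt{b^2 - 4ac}$ nearly cancel. A minimal sketch, assuming real coefficients; for real roots it uses a common rearrangement (not given in the text above): compute the larger-magnitude root first, then obtain the other from $x_1x_2 = c/a$. The function name is hypothetical.

```python
import cmath
import math

def quadratic_roots(a: float, b: float, c: float):
    """Roots of a x^2 + b x + c = 0 for real a != 0, b, c."""
    if a == 0:
        raise ValueError("a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:                                   # complex-conjugate pair
        root = cmath.sqrt(disc)
        return (-b + root) / (2 * a), (-b - root) / (2 * a)
    # Real roots: avoid cancellation by adding terms of the same sign.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    if q == 0:                                     # b = c = 0: double root at 0
        return 0.0, 0.0
    return q / a, c / q                            # x1 = q/a, x2 = c/q since x1*x2 = c/a
```

For $x^2 + 10^8x + 1 = 0$ this returns the small root close to $-10^{-8}$ to nearly full precision, where the textbook form would lose most of its digits.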

Cubic Equations A cubic equation in one variable has the form $x^3 + bx^2 + cx + d = 0$. Every cubic equation having complex coefficients has three complex roots. If the coefficients are real numbers, then at least one of the roots must be real. The cubic equation $x^3 + bx^2 + cx + d = 0$ may be reduced by the substitution $x = y - (b/3)$ to the form $y^3 + py + q = 0$, where $p = \tfrac{1}{3}(3c - b^2)$ and $q = \tfrac{1}{27}(27d - 9bc + 2b^3)$. This equation has the solutions

$$y_1 = A + B, \qquad y_2 = -\tfrac{1}{2}(A + B) + \frac{i\sqrt{3}}{2}(A - B), \qquad y_3 = -\tfrac{1}{2}(A + B) - \frac{i\sqrt{3}}{2}(A - B)$$

where $i^2 = -1$,

$$A = \sqrt[3]{-\frac{q}{2} + \sqrt{R}}, \qquad B = \sqrt[3]{-\frac{q}{2} - \sqrt{R}}, \qquad R = \left(\frac{p}{3}\right)^3 + \left(\frac{q}{2}\right)^2$$

If b, c, d are all real and if R > 0, there are one real root and two conjugate complex roots; if R = 0, there are three real roots, of which at least two are equal; if R < 0, there are three real unequal roots. If R < 0, these formulas are impractical. In this case, the roots are given by

$$y_k = \mp 2\sqrt{-\frac{p}{3}}\cos\left(\frac{\varphi}{3} + 120^\circ k\right), \qquad k = 0, 1, 2$$

where

$$\varphi = \cos^{-1}\sqrt{\frac{q^2/4}{-p^3/27}}$$

and the upper sign applies if q > 0, the lower if q < 0.

Example $y^3 - 7y + 7 = 0$: p = −7, q = 7, R < 0. Hence

$$y_k = -\sqrt{\frac{28}{3}}\cos\left(\frac{\varphi}{3} + 120^\circ k\right)$$

where $\varphi = \cos^{-1}\sqrt{27/28}$ and $\varphi/3 = 3.6311315^\circ = 3^\circ 37' 52''$.

The roots are approximately −3.048917, 1.692021, and 1.356896.
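The trigonometric form for R < 0 translates directly into code. A minimal sketch; the function name is illustrative, and the sign convention follows the text (minus when q > 0, plus when q < 0; when q = 0 either sign yields the same set of roots). Applied to the worked example it reproduces −3.048917, 1.692021 and 1.356896 for k = 0, 1, 2.

```python
import math

def depressed_cubic_trig(p: float, q: float) -> list[float]:
    """Three real roots of y^3 + p y + q = 0 when R = (p/3)^3 + (q/2)^2 < 0."""
    R = (p / 3) ** 3 + (q / 2) ** 2
    if R >= 0:
        raise ValueError("trigonometric form requires R < 0")
    phi = math.acos(math.sqrt((q * q / 4) / (-(p ** 3) / 27)))
    sign = -1.0 if q > 0 else 1.0            # upper sign for q > 0, lower for q < 0
    m = 2.0 * math.sqrt(-p / 3)
    return [sign * m * math.cos(phi / 3 + math.radians(120 * k)) for k in range(3)]

roots = depressed_cubic_trig(-7.0, 7.0)      # the worked example y^3 - 7y + 7 = 0
```

For a general cubic $x^3 + bx^2 + cx + d = 0$, compute p and q as above, call the function, and shift each root back by $x = y - b/3$.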

Example Many equations of state involve solving cubic equations for the compressibility factor Z. For example, the Redlich-Kwong-Soave equation of state requires solving

$$Z^3 - Z^2 + cZ + d = 0, \qquad d < 0$$

where c and d depend on critical constants of the chemical species. In this case, only positive solutions, Z > 0, are desired.

Quartic Equations See Abramowitz and Stegun (1972, p. 17).

General Polynomials of the nth Degree Denote the general polynomial equation of degree n by

$$P(x) = a_0x^n + a_1x^{n-1} + \cdots + a_{n-1}x + a_n = 0$$

If n > 4, there is no general formula, in terms of radicals, that gives the roots of the general equation. For fourth and higher order (even third order), the roots can be found numerically (see “Numerical Analysis and Approximate Methods”). However, there are some general theorems that may prove useful.

Remainder Theorems When P(x) is a polynomial and P(x) is divided by x − a until a remainder independent of x is obtained, this remainder is equal to P(a).

Example $P(x) = 2x^4 - 3x^2 + 7x - 2$ when divided by x + 1 (here a = −1) results in $P(x) = (x + 1)(2x^3 - 2x^2 - x + 8) - 10$, where −10 is the remainder. It is easy to see that P(−1) = −10.
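Dividing by x − a is conveniently done by synthetic division (Horner's scheme), which produces the quotient coefficients and, as its last value, the remainder P(a). A minimal sketch, assuming coefficients are listed from the highest power down with zeros for missing powers; the function name is illustrative.

```python
def synthetic_division(coeffs: list[float], a: float) -> tuple[list[float], float]:
    """Divide P(x) by (x - a); coeffs run from a_0 (highest power) to a_n.

    Returns (quotient coefficients, remainder); the remainder equals P(a).
    """
    values = []
    carry = 0
    for c in coeffs:
        carry = carry * a + c        # Horner step
        values.append(carry)
    return values[:-1], values[-1]

# 2x^4 + 0x^3 - 3x^2 + 7x - 2 divided by x + 1 (a = -1)
quotient, remainder = synthetic_division([2, 0, -3, 7, -2], -1)
```

The same routine applies to the factor theorem below: dividing $x^3 - 6x^2 + 11x - 6$ by x − 3 leaves remainder 0 and quotient $x^2 - 3x + 2$.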

Factor Theorem If P(a) is zero, the polynomial P(x) has the factor x − a. In other words, if a is a root of P(x) = 0, then x − a is a factor of P(x).

If a number a is found to be a root of P(x) = 0, the division of P(x) by (x − a) leaves a polynomial of degree one less than that of the original equation, i.e., P(x) = Q(x)(x − a). Roots of Q(x) = 0 are clearly roots of P(x) = 0.

Example $P(x) = x^3 - 6x^2 + 11x - 6 = 0$ has the root 3. Then $P(x) = (x - 3)(x^2 - 3x + 2)$. The roots of $x^2 - 3x + 2 = 0$ are 1 and 2. The roots of P(x) are therefore 1, 2, 3.

Fundamental Theorem of Algebra Every polynomial of degree n ≥ 1 has exactly n real or complex roots, counting multiplicities.

Every polynomial equation $a_0x^n + a_1x^{n-1} + \cdots + a_n = 0$ with rational coefficients may be rewritten as a polynomial equation of the same degree with integral coefficients by multiplying each coefficient by the least common multiple of the denominators of the coefficients.

Example The coefficients of $\tfrac{3}{2}x^4 + \tfrac{7}{3}x^3 - \tfrac{5}{6}x^2 + 2x - \tfrac{1}{6} = 0$ are rational numbers. The least common multiple of the denominators is 2 × 3 = 6. Therefore, the equation is equivalent to $9x^4 + 14x^3 - 5x^2 + 12x - 1 = 0$.


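Determinants Consider the system of two linear equations

$$a_{11}x_1 + a_{12}x_2 = b_1$$
$$a_{21}x_1 + a_{22}x_2 = b_2$$

If the first equation is multiplied by $a_{22}$ and the second by $-a_{12}$ and the results added, we obtain

$$(a_{11}a_{22} - a_{21}a_{12})x_1 = b_1a_{22} - b_2a_{12}$$

The expression $a_{11}a_{22} - a_{21}a_{12}$ may be represented by the symbol

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}$$

This symbol is called a determinant of second order. The value of the square array of $n^2$ quantities $a_{ij}$, where i = 1, . . . , n is the row index and j = 1, . . . , n the column index, written in the form

$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & \cdots & a_{2n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{vmatrix}$$

is called a determinant. The $n^2$ quantities $a_{ij}$ are called the elements of the determinant. In the determinant |A| let the ith row and jth column be deleted and a new determinant be formed having n − 1 rows and columns. This new determinant is called the minor of $a_{ij}$, denoted $M_{ij}$.

Example The minor of $a_{23}$ in a third-order determinant is

$$M_{23} = \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}$$

The cofactor $A_{ij}$ of the element $a_{ij}$ is the signed minor of $a_{ij}$ determined by the rule $A_{ij} = (-1)^{i+j}M_{ij}$. The value of |A| is obtained by forming either of the equivalent expressions $\sum_{j=1}^{n}a_{ij}A_{ij}$ (expansion along row i) or $\sum_{i=1}^{n}a_{ij}A_{ij}$ (expansion along column j), where the elements $a_{ij}$ must be taken from a single row or a single column of A.

Example

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{31}A_{31} + a_{32}A_{32} + a_{33}A_{33}$$

$$= a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

In general, the $A_{ij}$ will be determinants of order n − 1, but they may in turn be expanded by the rule. Also,

$$\sum_{j=1}^{n}a_{ji}A_{jk} = \sum_{j=1}^{n}a_{ij}A_{kj} = \begin{cases} |A| & i = k \\ 0 & i \ne k \end{cases}$$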
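Cofactor expansion is naturally recursive: each cofactor is itself a determinant of order n − 1. A minimal sketch expanding along the first row, with 0-based indices (the sign $(-1)^{i+j}$ is unchanged by the shift). Its cost grows like n!, so it suits only small arrays; larger determinants are normally computed by elimination. Function names are illustrative.

```python
def minor(M: list[list[float]], i: int, j: int) -> list[list[float]]:
    """Delete row i and column j (0-based) to form the minor array."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M: list[list[float]]) -> float:
    """|A| by cofactor expansion along the first row: sum over j of a_0j * A_0j."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(n))

M = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
```

Expanding the same array along the third row, as in the example above, gives 1(0·2 − 1·3) − 1(2·2 − 1·1) + 4(2·3 − 0·1) = 18, matching `det(M)`.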

Fundamental Properties of Determinants

1. The value of a determinant |A| is not changed if the rows and columns are interchanged.

2. If the elements of one row (or one column) of a determinant are all zero, the value of |A| is zero.

3. If the elements of one row (or column) of a determinant are multiplied by the same constant factor, the value of the determinant is multiplied by this factor.

4. If one determinant is obtained from another by interchanging any two rows (or columns), the value of either is the negative of the value of the other.

5. If two rows (or columns) of a determinant are identical, the value of the determinant is zero.

6. If two determinants are identical except for one row (or column), the sum of their values is given by a single determinant obtained by adding corresponding elements of the dissimilar rows (or columns) and leaving unchanged the remaining elements.

7. The value of a determinant is not changed if one row (or column) is multiplied by a constant and added to another row (or column).


ANALYTIC GEOMETRY

REFERENCES: Fuller, G., Analytic Geometry, 7th ed., Addison Wesley Longman (1994); Larson, R., R. P. Hostetler, and B. H. Edwards, Calculus with Analytic Geometry, 7th ed., Houghton Mifflin (2001); Riddle, D. F., Analytic Geometry, 6th ed., Thompson Learning (1996); Spiegel, M. R., and J. Liu, Mathematical Handbook of Formulas and Tables, 2d ed., McGraw-Hill (1999); Thomas, G. B., Jr., and R. L. Finney, Calculus and Analytic Geometry, 9th ed., Addison-Wesley (1996).

Analytic geometry uses algebraic equations and methods to study geometric problems. It also permits one to visualize algebraic equations in terms of geometric curves, which frequently clarifies abstract concepts.

PLANE ANALYTIC GEOMETRY

Coordinate Systems The basic concept of analytic geometry is the establishment of a one-to-one correspondence between the points of the plane and number pairs (x, y). This correspondence may be done in a number of ways. The rectangular or cartesian coordinate system consists of two straight lines intersecting at right angles (Fig. 3-12). A point is designated by (x, y), where x (the abscissa) is the distance of the point from the y axis measured parallel to the x axis, positive if to the right, negative to the left; y (the ordinate) is the distance of the point from the x axis, measured parallel to the y axis, positive if above, negative if below the x axis. The quadrants are labeled 1, 2, 3, 4 in the drawing, the coordinates of points in the various quadrants having the depicted signs. Another common coordinate system is the polar coordinate system (Fig. 3-13). In this system the position of a point is designated by the pair (r, θ), $r = \sqrt{x^2 + y^2}$ being the distance to the origin O(0, 0) and θ being the angle the line r makes with the positive x axis (polar axis). To change from polar to rectangular coordinates, use $x = r\cos\theta$ and $y = r\sin\theta$. To change from rectangular to polar coordinates, use $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$ if x ≠ 0, choosing the value of θ that places the point in the quadrant given by the signs of x and y; θ = π/2 if x = 0 and y > 0 (θ = −π/2 if x = 0 and y < 0). The distance between two points $(x_1, y_1)$, $(x_2, y_2)$ is defined by $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ in rectangular coordinates or by $d = \sqrt{r_1^2 + r_2^2 - 2r_1r_2\cos(\theta_1 - \theta_2)}$ in polar coordinates. Other coordinate systems are sometimes used. For example, on the surface of a sphere latitude and longitude prove useful.
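The quadrant bookkeeping for $\tan^{-1}(y/x)$ is handled automatically by the standard-library function `math.atan2(y, x)`, which returns θ in (−π, π]. A minimal sketch of both conversions and the polar distance formula; function names are illustrative, and the `max(0.0, ...)` guard only protects against tiny negative round-off when the two points coincide.

```python
import math

def to_polar(x: float, y: float) -> tuple[float, float]:
    """(r, theta) with theta in (-pi, pi], correct in all four quadrants."""
    return math.hypot(x, y), math.atan2(y, x)

def to_rect(r: float, theta: float) -> tuple[float, float]:
    """x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def polar_distance(r1: float, t1: float, r2: float, t2: float) -> float:
    """d = sqrt(r1^2 + r2^2 - 2 r1 r2 cos(t1 - t2))  (law of cosines)."""
    return math.sqrt(max(0.0, r1 * r1 + r2 * r2 - 2 * r1 * r2 * math.cos(t1 - t2)))
```

For the points (3, 4) and (−1, 1), the polar distance computed from `to_polar` agrees with the rectangular distance 5.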

The Straight Line (Fig. 3-14) The slope m of a straight line is the tangent of the inclination angle θ made with the positive x axis. If $(x_1, y_1)$ and $(x_2, y_2)$ are any two points on the line, slope = $m = (y_2 - y_1)/(x_2 - x_1)$. The slope of a line parallel to the x axis is zero; parallel to the y axis, it is undefined. Two lines are parallel if and only if they have the same slope. Two lines are perpendicular if and only if the product of their slopes is −1 (the exception being the case when the lines are parallel to the coordinate axes). Every equation of the type Ax + By + C = 0, with A and B not both zero, represents a straight line, and every straight line has an equation of this form. A straight line is determined by a variety of conditions:

FIG. 3-12 Rectangular coordinates. FIG. 3-13 Polar coordinates. FIG. 3-14 Straight line.

Given conditions                              Equation of line
(1) Parallel to x axis                        y = constant
(2) Parallel to y axis                        x = constant
(3) Point $(x_1, y_1)$ and slope m            $y - y_1 = m(x - x_1)$
(4) Intercept on y axis (0, b), m             $y = mx + b$
(5) Intercept on x axis (a, 0), m             $y = m(x - a)$
(6) Two points $(x_1, y_1)$, $(x_2, y_2)$     $y - y_1 = \dfrac{y_2 - y_1}{x_2 - x_1}(x - x_1)$
(7) Two intercepts (a, 0), (0, b)             $x/a + y/b = 1$

The angle β a line with slope $m_1$ makes with a line having slope $m_2$ is given by $\tan\beta = (m_2 - m_1)/(m_1m_2 + 1)$. A line is determined if the length and direction of the perpendicular to it (the normal) from the origin are given (see Fig. 3-15). Let p = length of the perpendicular and α the angle that the perpendicular makes with the positive x axis. The equation of the line is $x\cos\alpha + y\sin\alpha = p$. The equation of a line perpendicular to a given line of slope m and passing through a point $(x_1, y_1)$ is $y - y_1 = -(1/m)(x - x_1)$. The distance from a point $(x_1, y_1)$ to a line with equation Ax + By + C = 0 is

$$d = \frac{|Ax_1 + By_1 + C|}{\sqrt{A^2 + B^2}}$$

Occasionally some nonlinear algebraic equations can be reduced to linear equations under suitable substitutions or changes of variables. In other words, certain curves become the graphs of lines if the scales or coordinate axes are appropriately transformed.

Example Consider $y = bx^n$. Taking logarithms, $\log y = n\log x + \log b$. Let Y = log y, X = log x, B = log b. The equation then has the form Y = nX + B, which is a linear equation. Consider $k = k_0\exp(-E/RT)$; taking logarithms, $\ln k = \ln k_0 - E/(RT)$. Let $Y = \ln k$, $B = \ln k_0$, $m = -E/R$, and X = 1/T, and the result is Y = mX + B. Next consider $y = a + bx^n$. If the substitution $t = x^n$ is made, then the graph of y is a straight line versus t.
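Once the data are transformed to X = 1/T and Y = ln k, E and $k_0$ follow from the slope and intercept of a straight line. A minimal sketch, assuming an ordinary least-squares line (a technique not described in the text above) and the gas constant R = 8.314 J/(mol·K); the function name is hypothetical. With exact Arrhenius data the fit recovers E and $k_0$; with measured data it gives the best straight line in the transformed variables.

```python
import math

R_GAS = 8.314  # J/(mol K), assumed units for E in J/mol

def arrhenius_fit(T: list[float], k: list[float]) -> tuple[float, float]:
    """Fit Y = mX + B with X = 1/T, Y = ln k; return (E, k0) using m = -E/R, B = ln k0."""
    X = [1.0 / t for t in T]
    Y = [math.log(v) for v in k]
    n = len(X)
    xm, ym = sum(X) / n, sum(Y) / n
    m = sum((x - xm) * (y - ym) for x, y in zip(X, Y)) / sum((x - xm) ** 2 for x in X)
    B = ym - m * xm
    return -m * R_GAS, math.exp(B)
```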

Asymptotes The limiting position of the tangent to a curve as the point of contact tends to an infinite distance from the origin is called an asymptote. If the equation of a given curve can be expanded in a Laurent power series such that

$$f(x) = \sum_{k=0}^{n}a_kx^k + \sum_{k=1}^{\infty}\frac{b_k}{x^k} \qquad\text{and}\qquad \lim_{x\to\infty}\left[f(x) - \sum_{k=0}^{n}a_kx^k\right] = 0$$

then the equation of the asymptote is $y = \sum_{k=0}^{n}a_kx^k$. If n = 1, then the asymptote is (in general oblique) a line. In this case, the equation of the asymptote may be written as

$$y = mx + b, \qquad m = \lim_{x\to\infty}f'(x), \qquad b = \lim_{x\to\infty}[f(x) - xf'(x)]$$

Geometric Properties of a Curve When the Equation Is Given The analysis of the properties of an equation is facilitated by investigating the equation with the following techniques:

1. Points of maximum, minimum, and inflection. These may be investigated by means of the calculus.

2. Symmetry. Let F(x, y) = 0 be the equation of the curve.

Condition on F(x, y)        Symmetry
F(x, y) = F(−x, y)          With respect to y axis
F(x, y) = F(x, −y)          With respect to x axis
F(x, y) = F(−x, −y)         With respect to origin
F(x, y) = F(y, x)           With respect to the line y = x

3. Extent. Only real values of x and y are considered in obtaining the points (x, y) whose coordinates satisfy the equation. The extent of the curve may be limited by the condition that negative numbers do not have real square roots.

4. Intercepts. Find those points where the curve crosses the coordinate axes.

5. Asymptotes. See preceding discussion.

6. Direction at a point. This may be found from the derivative of the function at a point. This concept is useful for distinguishing among a family of similar curves.

Example $y^2 = (x^2 + 1)/(x^2 - 1)$ is symmetric with respect to the x and y axes, the origin, and the line y = x. It has the vertical asymptotes x = ±1. When x = 0, $y^2 = -1$; so there are no y intercepts. If y = 0, $(x^2 + 1)/(x^2 - 1) = 0$, which has no real solution; so there are no x intercepts. If |x| < 1, $y^2$ is negative; so |x| > 1. From $x^2 = (y^2 + 1)/(y^2 - 1)$, y = ±1 are horizontal asymptotes and |y| > 1. As x → 1⁺, y → ±∞; as x → +∞, y → ±1. The graph is given in Fig. 3-16.

Conic Sections The curves included in this group are obtained from plane sections of the cone. They include the circle, ellipse, parabola, hyperbola, and degeneratively the point and straight line. A conic is the locus of a point whose distance from a fixed point called the focus is in a constant ratio to its distance from a fixed line, called the directrix. This ratio is the eccentricity e. If e = 0, the conic is a circle; if 0 < e < 1, the conic is an ellipse; if e = 1, the conic is a parabola; if e > 1, the conic is a hyperbola. Every conic section is representable by an equation of second degree. Conversely, every equation of second degree in two variables represents a conic. The general equation of the second degree is $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$. Let Δ be defined as the determinant

$$\Delta = \begin{vmatrix} 2A & B & D \\ B & 2C & E \\ D & E & 2F \end{vmatrix}$$

The table characterizes the curve represented by the equation.

Δ ≠ 0, $B^2 - 4AC < 0$: a circle if AΔ < 0, A = C, and B = 0; otherwise an ellipse if AΔ < 0; no locus if AΔ > 0
Δ ≠ 0, $B^2 - 4AC = 0$: a parabola
Δ ≠ 0, $B^2 - 4AC > 0$: a hyperbola
Δ = 0, $B^2 - 4AC < 0$: a point
Δ = 0, $B^2 - 4AC = 0$: 2 parallel lines if $Q = D^2 + E^2 - 4(A + C)F > 0$; 1 straight line if Q = 0; no locus if Q < 0
Δ = 0, $B^2 - 4AC > 0$: 2 intersecting straight lines

FIG. 3-15 Determination of line. FIG. 3-16 Graph of $y^2 = (x^2 + 1)/(x^2 - 1)$.

Example $3x^2 + 4xy - 2y^2 + 3x - 2y + 7 = 0$.

$$\Delta = \begin{vmatrix} 6 & 4 & 3 \\ 4 & -4 & -2 \\ 3 & -2 & 14 \end{vmatrix} = -596 \ne 0, \qquad B^2 - 4AC = 40 > 0$$

The curve is therefore a hyperbola.
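The classification table reduces to a few sign tests on Δ, $B^2 - 4AC$, AΔ and Q. A minimal sketch with illustrative function names; it compares against zero exactly, so it assumes integer (or exactly representable) coefficients, and a tolerance would be needed for measured or computed coefficients.

```python
def det3(m):
    """3x3 determinant by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def classify_conic(A, B, C, D, E, F):
    """Classify A x^2 + B xy + C y^2 + D x + E y + F = 0 following the table above."""
    delta = det3([[2 * A, B, D], [B, 2 * C, E], [D, E, 2 * F]])
    disc = B * B - 4 * A * C
    if delta != 0:
        if disc < 0:
            if A * delta > 0:
                return "no locus"
            return "circle" if (A == C and B == 0) else "ellipse"
        return "parabola" if disc == 0 else "hyperbola"
    if disc < 0:
        return "point"
    if disc > 0:
        return "two intersecting lines"
    Q = D * D + E * E - 4 * (A + C) * F
    if Q > 0:
        return "two parallel lines"
    return "one line" if Q == 0 else "no locus"
```

Usage: `classify_conic(3, 4, -2, 3, -2, 7)` reproduces the example (Δ = −596, $B^2 - 4AC$ = 40, a hyperbola).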

The following tabulation gives the form of the more common equa-tions.

Polar equation Type of curve

(1) r = a Circle, Fig. 3-17(2) r = 2a cos θ Circle, Fig. 3-18(3) r = 2a sin θ Circle, Fig. 3-19(4) r2 − 2br cos (θ − β) + b2 − a2 = 0 Circle at (b, β), radius a

e = 1 parabola, Fig. 3-22(5) r = 0 < e < 1 ellipse, Fig. 3-20

e > 1 hyperbola, Fig. 3-21

Parametric Equations It is frequently useful to write the equa-tions of a curve in terms of an auxiliary variable called a parameter.For example, a circle of radius a, center at (0, 0), can be written inthe equivalent form x = a cos φ, y = a sin φ where φ is the parameter.

ke1 − e cos θ

3−214

4−4−2

643

Similarly, x = a cos φ, y = b sin φ are the parametric equations of theellipse x2/a2 + y2/b2 = 1 with parameter φ.

SOLID ANALYTIC GEOMETRY

Some common equations in parametric form are given below.

(1) Circle (Fig. 3-23). Parameter is angle θ: $x = h + a\cos\theta$, $y = k + a\sin\theta$; Cartesian form $(x - h)^2 + (y - k)^2 = a^2$.

(2) Ellipse (Fig. 3-20). Parameter is angle φ: $x = h + a\cos\varphi$, $y = k + b\sin\varphi$; Cartesian form $\dfrac{(x - h)^2}{a^2} + \dfrac{(y - k)^2}{b^2} = 1$.

(3) Circle. Parameter is $t = dy/dx$ = slope of tangent at (x, y): $x = \dfrac{-at}{\sqrt{t^2 + 1}}$, $y = \dfrac{a}{\sqrt{t^2 + 1}}$; Cartesian form $x^2 + y^2 = a^2$.

(4) Parabola (Fig. 3-22): $x^2 = y + k$.

(5) Hyperbola with the origin at the center (Fig. 3-21): $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} = 1$.

(6) Catenary (such as a hanging cable under gravity). Parameter s = arc length from (0, a) to (x, y): $x = a\sinh^{-1}(s/a)$, $y^2 = a^2 + s^2$; Cartesian form $y = a\cosh(x/a)$.

(7) Cycloid (Fig. 3-24): $x = a(\varphi - \sin\varphi)$, $y = a(1 - \cos\varphi)$.

FIG. 3-17 Circle center (0,0) r = a. FIG. 3-18 Circle center (a,0) r = 2a cos θ. FIG. 3-19 Circle center (0,a) r = 2a sin θ.

FIG. 3-20 Ellipse, 0 < e < 1. FIG. 3-21 Hyperbola, e > 1. FIG. 3-22 Parabola, e = 1.

Coordinate Systems The commonly used coordinate systems are three in number. Others may be used in specific problems [see Morse, P. M., and H. Feshbach, Methods of Theoretical Physics, vols. I and II, McGraw-Hill, New York (1953)]. The rectangular (cartesian) system (Fig. 3-25) consists of mutually orthogonal axes x, y, z. A triple of numbers (x, y, z) is used to represent each point. The cylindrical coordinate system (r, θ, z; Fig. 3-26) is frequently used to locate a point in space. These are essentially the polar coordinates (r, θ) coupled with the z coordinate. As before, x = r cos θ, y = r sin θ, z = z and $r^2 = x^2 + y^2$, y/x = tan θ. If r is held constant and θ and z are allowed to vary, the locus of (r, θ, z) is a right circular cylinder of radius r along the z axis. In a plane z = constant, the locus of r = C is a circle, and θ = constant is a plane containing the z axis and making an angle θ with the xz plane. Cylindrical coordinates are convenient to use when the problem has an axis of symmetry.

The spherical coordinate system is convenient if there is a point of symmetry in the system. This point is taken as the origin, and the coordinates (ρ, φ, θ) are as illustrated in Fig. 3-27. The relations are $x = \rho\sin\varphi\cos\theta$, $y = \rho\sin\varphi\sin\theta$, $z = \rho\cos\varphi$, and $r = \rho\sin\varphi$. θ = constant is a plane containing the z axis and making an angle θ with the xz plane. φ = constant is a cone with vertex at the origin O. ρ = constant is the surface of a sphere of radius ρ, center at the origin O. Every point in the space may be given spherical coordinates restricted to the ranges 0 ≤ φ ≤ π, ρ ≥ 0, 0 ≤ θ < 2π.
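The conversion formulas above can be exercised with a short Python sketch. The function names are illustrative, not from any library; the sketch assumes angles in radians, φ measured from the positive z axis, and θ measured from the xz plane, as in the text.

```python
import math

def cylindrical_to_cartesian(r, theta, z):
    """(r, θ, z) -> (x, y, z) using x = r cos θ, y = r sin θ."""
    return r * math.cos(theta), r * math.sin(theta), z

def spherical_to_cartesian(rho, phi, theta):
    """(ρ, φ, θ) -> (x, y, z) using x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def cartesian_to_spherical(x, y, z):
    """Inverse map returning ρ >= 0, 0 <= φ <= π, 0 <= θ < 2π."""
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / rho) if rho > 0 else 0.0
    theta = math.atan2(y, x) % (2.0 * math.pi)
    return rho, phi, theta

x, y, z = spherical_to_cartesian(2.0, math.pi / 3, math.pi / 4)
print(cartesian_to_spherical(x, y, z))  # recovers (2, π/3, π/4) up to round-off
```

Using `atan2` rather than `atan(y/x)` places θ in the correct quadrant without special cases.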

Lines and Planes The distance between two points (x1, y1, z1), (x2, y2, z2) is $d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$. There is nothing in the geometry of three dimensions quite analogous to the slope of a line in the plane case. Instead of specifying the direction of a line by a trigonometric function evaluated for one angle, a trigonometric function evaluated for three angles is used. The angles α, β, γ that a line segment makes with the positive x, y, and z axes, respectively, are called the direction angles of the line, and cos α, cos β, cos γ are called the direction cosines. Let (x1, y1, z1), (x2, y2, z2) be on the line. Then cos α = (x2 − x1)/d, cos β = (y2 − y1)/d, cos γ = (z2 − z1)/d, where d = the distance between the two points. Clearly $\cos^2\alpha + \cos^2\beta + \cos^2\gamma = 1$. If two lines are specified by the direction cosines (cos α1, cos β1, cos γ1), (cos α2, cos β2, cos γ2), then the angle θ between the lines is given by cos θ = cos α1 cos α2 + cos β1 cos β2 + cos γ1 cos γ2. Thus the lines are perpendicular if and only if θ = 90°, that is, cos α1 cos α2 + cos β1 cos β2 + cos γ1 cos γ2 = 0. The equation of a line with direction cosines (cos α, cos β, cos γ) passing through (x1, y1, z1) is (x − x1)/cos α = (y − y1)/cos β = (z − z1)/cos γ.
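A minimal Python sketch of the direction-cosine relations follows. The helper names are hypothetical; `math.dist` (Python 3.8+) supplies the distance d, and the cosine of the angle is clamped to [−1, 1] to guard against round-off before calling `acos`.

```python
import math

def direction_cosines(p1, p2):
    """(cos α, cos β, cos γ) of the segment from p1 to p2: coordinate differences divided by d."""
    d = math.dist(p1, p2)
    return tuple((b - a) / d for a, b in zip(p1, p2))

def angle_between(c1, c2):
    """Angle θ in degrees from cos θ = cos α1 cos α2 + cos β1 cos β2 + cos γ1 cos γ2."""
    cos_theta = sum(a * b for a, b in zip(c1, c2))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
    return math.degrees(math.acos(cos_theta))

c1 = direction_cosines((0, 0, 0), (1, 2, 2))   # d = 3, so (1/3, 2/3, 2/3)
c2 = direction_cosines((0, 0, 0), (2, -1, 0))
print(sum(c * c for c in c1))   # cos²α + cos²β + cos²γ, about 1
print(angle_between(c1, c2))    # about 90, since 1(2) + 2(-1) + 2(0) = 0
```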

The equation of every plane is of the form Ax + By + Cz + D = 0. The numbers
$$\frac{A}{\sqrt{A^2 + B^2 + C^2}},\quad \frac{B}{\sqrt{A^2 + B^2 + C^2}},\quad \frac{C}{\sqrt{A^2 + B^2 + C^2}}$$
are direction cosines of the normal lines to the plane. The plane through the point (x1, y1, z1) whose normals have these as direction cosines is A(x − x1) + B(y − y1) + C(z − z1) = 0.

Example Find the equation of the plane through (1, 5, −2) perpendicular to the line (x + 9)/7 = (y − 3)/−1 = z/8. The numbers (7, −1, 8) are called direction numbers. They are a constant multiple of the direction cosines: $\cos\alpha = 7/\sqrt{114}$, $\cos\beta = -1/\sqrt{114}$, $\cos\gamma = 8/\sqrt{114}$. The plane has the equation 7(x − 1) − 1(y − 5) + 8(z + 2) = 0 or 7x − y + 8z + 14 = 0.

The distance from the point (x1, y1, z1) to the plane Ax + By + Cz + D = 0 is
$$d = \frac{|Ax_1 + By_1 + Cz_1 + D|}{\sqrt{A^2 + B^2 + C^2}}$$

Space Curves Space curves are usually specified as the set of points whose coordinates are given parametrically by a system of equations x = f(t), y = g(t), z = h(t) in the parameter t.

Example The equation of a straight line in space is (x − x1)/a = (y − y1)/b = (z − z1)/c. Since all these quantities must be equal (say, to t), we may write x = x1 + at, y = y1 + bt, z = z1 + ct, which represent the parametric equations of the line.

Example The equations x = a cos βt, y = a sin βt, z = bt, with a, β, b positive constants, represent a circular helix.
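The worked plane example and the point-to-plane distance formula can be checked with a small Python sketch. The function names are illustrative assumptions; the normal vector is taken directly from the direction numbers (7, −1, 8), which is valid because only the ratios A : B : C matter.

```python
import math

def plane_through_point(normal, point):
    """Coefficients (A, B, C, D) of A(x − x1) + B(y − y1) + C(z − z1) = 0."""
    A, B, C = normal
    x1, y1, z1 = point
    return A, B, C, -(A * x1 + B * y1 + C * z1)

def point_plane_distance(point, plane):
    """d = |A x1 + B y1 + C z1 + D| / sqrt(A² + B² + C²)."""
    A, B, C, D = plane
    x1, y1, z1 = point
    return abs(A * x1 + B * y1 + C * z1 + D) / math.sqrt(A * A + B * B + C * C)

plane = plane_through_point((7, -1, 8), (1, 5, -2))
print(plane)                                   # (7, -1, 8, 14), i.e. 7x − y + 8z + 14 = 0
print(point_plane_distance((0, 0, 0), plane))  # 14/sqrt(114), distance of the origin
```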

FIG. 3-23 Circle. FIG. 3-24 Cycloid. FIG. 3-25 Cartesian coordinates. FIG. 3-26 Cylindrical coordinates. FIG. 3-27 Spherical coordinates. FIG. 3-28 Parabolic cylinder.


Surfaces The locus of points (x, y, z) satisfying f(x, y, z) = 0, broadly speaking, may be interpreted as a surface. The simplest surface is the plane. The next simplest is a cylinder, which is a surface generated by a straight line moving parallel to a given line and passing through a given curve.

Example The parabolic cylinder $y = x^2$ (Fig. 3-28) is generated by a straight line parallel to the z axis passing through $y = x^2$ in the plane z = 0.

A surface whose equation is a quadratic in the variables x, y, and z is called a quadric surface. Some of the more common such surfaces are tabulated and pictured in Figs. 3-28 to 3-36.

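FIG. 3-29 Ellipsoid. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} + \dfrac{z^2}{c^2} = 1$ (sphere if a = b = c)

FIG. 3-30 Hyperboloid of one sheet. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} - \dfrac{z^2}{c^2} = 1$

FIG. 3-31 Hyperboloid of two sheets. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} - \dfrac{z^2}{c^2} = -1$

FIG. 3-32 Cone. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} - \dfrac{z^2}{c^2} = 0$

FIG. 3-33 Elliptic paraboloid. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} + cz = 0$

FIG. 3-34 Hyperbolic paraboloid. $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} + cz = 0$

FIG. 3-35 Elliptic cylinder. $\dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} = 1$

FIG. 3-36 Hyperbolic cylinder. $\dfrac{x^2}{a^2} - \dfrac{y^2}{b^2} = 1$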


PLANE TRIGONOMETRY

REFERENCES: Gelfand, I. M., and M. Saul, Trigonometry, Birkhäuser, Boston (2001); Heineman, E. Richard, and J. Dalton Tarwater, Plane Trigonometry, 7th ed., McGraw-Hill (1993).

ANGLES

An angle is generated by the rotation of a line about a fixed center from some initial position to some terminal position. If the rotation is clockwise, the angle is negative; if it is counterclockwise, the angle is positive. Angle size is unlimited. If α, β are two angles such that α + β = 90°, they are complementary; they are supplementary if α + β = 180°. Angles are most commonly measured in the sexagesimal system or by radian measure. In the first system there are 360 degrees in one complete revolution; one degree = 1⁄90 of a right angle. The degree is subdivided into 60 minutes; the minute is subdivided into 60 seconds. In the radian system one radian is the angle at the center of a circle subtended by an arc whose length is equal to the radius of the circle. Thus 2π rad = 360°; 1 rad = 57.29578°; 1° = 0.01745 rad; 1 min = 0.00029089 rad. The advantage of radian measure is that it is dimensionless. The quadrants are conventionally labeled as Fig. 3-37 shows.
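The conversion factors quoted above follow from 2π rad = 360°. A short Python sketch using the standard `math.radians` and `math.degrees` functions reproduces them; `dms_to_radians` is a hypothetical helper for degree-minute-second input.

```python
import math

def dms_to_radians(degrees, minutes=0.0, seconds=0.0):
    """Convert a sexagesimal angle (degrees, minutes, seconds) to radians."""
    return math.radians(degrees + minutes / 60.0 + seconds / 3600.0)

print(math.degrees(1.0))      # 57.2957795..., i.e. 1 rad ≈ 57.29578°
print(math.radians(1.0))      # 0.0174532..., i.e. 1° ≈ 0.01745 rad
print(dms_to_radians(0, 1))   # 0.000290888..., i.e. 1 min ≈ 0.00029089 rad
```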

FUNCTIONS OF CIRCULAR TRIGONOMETRY

The trigonometric functions of angles are the ratios between the various sides of the reference triangles shown in Fig. 3-38 for the various quadrants. Clearly $r = \sqrt{x^2 + y^2} \ge 0$. The fundamental functions (see Figs. 3-39, 3-40, 3-41) are

Sine of θ = sin θ = y/r          Secant of θ = sec θ = r/x
Cosine of θ = cos θ = x/r        Cosecant of θ = csc θ = r/y
Tangent of θ = tan θ = y/x       Cotangent of θ = cot θ = x/y

Values of the Trigonometric Functions for Common Angles

θ, deg    θ, rad    sin θ    cos θ    tan θ
0         0         0        1        0
30        π/6       1/2      √3/2     √3/3
45        π/4       √2/2     √2/2     1
60        π/3       √3/2     1/2      √3
90        π/2       1        0        +∞

If 90° ≤ θ ≤ 180°, sin θ = sin (180° − θ); cos θ = −cos (180° − θ); tan θ = −tan (180° − θ). If 180° ≤ θ ≤ 270°, sin θ = −sin (θ − 180°); cos θ = −cos (θ − 180°); tan θ = tan (θ − 180°). If 270° ≤ θ ≤ 360°, sin θ = −sin (360° − θ); cos θ = cos (360° − θ); tan θ = −tan (360° − θ). The reciprocal relations may be used to find the values of the other functions.

If it is desired to find the angle when a function of it is given, the procedure is as follows. There will in general be two angles between 0° and 360° corresponding to the given value of the function.

Given (a > 0)    Find an acute angle θ0 such that    Required angles are
sin θ = +a       sin θ0 = a                           θ0 and 180° − θ0
cos θ = +a       cos θ0 = a                           θ0 and 360° − θ0
tan θ = +a       tan θ0 = a                           θ0 and 180° + θ0
sin θ = −a       sin θ0 = a                           180° + θ0 and 360° − θ0
cos θ = −a       cos θ0 = a                           180° − θ0 and 180° + θ0
tan θ = −a       tan θ0 = a                           180° − θ0 and 360° − θ0
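The lookup table above translates directly into code. This Python sketch (the function `angles_for` is a hypothetical name) finds the acute reference angle θ0 with the standard inverse functions and then applies the table row for the sign of the given value.

```python
import math

def angles_for(func, value):
    """Angles θ in [0°, 360°) with func(θ) = value, following the table above.

    func is "sin", "cos", or "tan"; for sin and cos, |value| must not exceed 1.
    At the extremes (|value| = 1 for sin or cos) the two angles coincide up to round-off.
    """
    a = abs(value)
    if func == "sin":
        t0 = math.degrees(math.asin(a))          # acute reference angle θ0
        pair = (t0, 180 - t0) if value >= 0 else (180 + t0, 360 - t0)
    elif func == "cos":
        t0 = math.degrees(math.acos(a))
        pair = (t0, 360 - t0) if value >= 0 else (180 - t0, 180 + t0)
    elif func == "tan":
        t0 = math.degrees(math.atan(a))
        pair = (t0, 180 + t0) if value >= 0 else (180 - t0, 360 - t0)
    else:
        raise ValueError("func must be 'sin', 'cos', or 'tan'")
    return sorted(t % 360 for t in pair)         # reduce to [0°, 360°)

print(angles_for("sin", -0.5))   # approximately [210, 330]
print(angles_for("cos", -0.5))   # approximately [120, 240]
print(angles_for("tan", -1.0))   # approximately [135, 315]
```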

Relations between Functions of a Single Angle sec θ = 1/cos θ; csc θ = 1/sin θ; tan θ = sin θ/cos θ = sec θ/csc θ = 1/cot θ; $\sin^2\theta + \cos^2\theta = 1$; $1 + \tan^2\theta = \sec^2\theta$; $1 + \cot^2\theta = \csc^2\theta$. For 0 ≤ θ ≤ 90° the following results hold:
$$\sin\theta = 2\sin\frac{\theta}{2}\cos\frac{\theta}{2} \quad\text{and}\quad \cos\theta = \cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2}$$

FIG. 3-37 Quadrants. FIG. 3-38 Triangles. FIG. 3-39 Graph of y = sin x. FIG. 3-40 Graph of y = cos x. FIG. 3-41 Graph of y = tan x.

The cofunction property is very important: cos θ = sin (90° − θ), sin θ = cos (90° − θ), tan θ = cot (90° − θ), cot θ = tan (90° − θ), etc.

Functions of Negative Angles sin (−θ) = −sin θ, cos (−θ) = cos θ, tan (−θ) = −tan θ, sec (−θ) = sec θ, csc (−θ) = −csc θ, cot (−θ) = −cot θ.

Identities

Sum and Difference Formulas Let x, y be two angles. $\sin(x \pm y) = \sin x\cos y \pm \cos x\sin y$; $\cos(x \pm y) = \cos x\cos y \mp \sin x\sin y$; $\tan(x \pm y) = (\tan x \pm \tan y)/(1 \mp \tan x\tan y)$; $\sin x \pm \sin y = 2\sin\tfrac12(x \pm y)\cos\tfrac12(x \mp y)$; $\cos x + \cos y = 2\cos\tfrac12(x + y)\cos\tfrac12(x - y)$; $\cos x - \cos y = -2\sin\tfrac12(x + y)\sin\tfrac12(x - y)$; $\tan x \pm \tan y = \sin(x \pm y)/(\cos x\cos y)$; $\sin^2 x - \sin^2 y = \cos^2 y - \cos^2 x = \sin(x + y)\sin(x - y)$; $\cos^2 x - \sin^2 y = \cos^2 y - \sin^2 x = \cos(x + y)\cos(x - y)$; $\sin(45^\circ + x) = \cos(45^\circ - x)$; $\sin(45^\circ - x) = \cos(45^\circ + x)$; $\tan(45^\circ \pm x) = \cot(45^\circ \mp x)$.
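A quick way to guard against sign slips when using these formulas is to check them numerically. The Python sketch below evaluates both sides of several identities above at random angles (in radians) and reports the worst discrepancy; the identity table and the name `worst_errors` are illustrative, and agreement only at round-off level is expected.

```python
import math
import random

# (lhs, rhs) pairs for some of the sum, difference, and product identities above.
IDENTITIES = {
    "sin(x + y)": (lambda x, y: math.sin(x + y),
                   lambda x, y: math.sin(x) * math.cos(y) + math.cos(x) * math.sin(y)),
    "cos(x - y)": (lambda x, y: math.cos(x - y),
                   lambda x, y: math.cos(x) * math.cos(y) + math.sin(x) * math.sin(y)),
    "sin x + sin y": (lambda x, y: math.sin(x) + math.sin(y),
                      lambda x, y: 2 * math.sin((x + y) / 2) * math.cos((x - y) / 2)),
    "cos x - cos y": (lambda x, y: math.cos(x) - math.cos(y),
                      lambda x, y: -2 * math.sin((x + y) / 2) * math.sin((x - y) / 2)),
    "sin^2 x - sin^2 y": (lambda x, y: math.sin(x) ** 2 - math.sin(y) ** 2,
                          lambda x, y: math.sin(x + y) * math.sin(x - y)),
    "cos^2 x - sin^2 y": (lambda x, y: math.cos(x) ** 2 - math.sin(y) ** 2,
                          lambda x, y: math.cos(x + y) * math.cos(x - y)),
}

def worst_errors(n_samples=2000, seed=1):
    """Largest |lhs - rhs| for each identity over random angles in [-2π, 2π]."""
    rng = random.Random(seed)
    samples = [(rng.uniform(-2 * math.pi, 2 * math.pi), rng.uniform(-2 * math.pi, 2 * math.pi))
               for _ in range(n_samples)]
    return {name: max(abs(lhs(x, y) - rhs(x, y)) for x, y in samples)
            for name, (lhs, rhs) in IDENTITIES.items()}

for name, err in worst_errors().items():
    print(f"{name:18s} max error {err:.1e}")
```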

Multiple and Half Angle Identities Let x = angle. $\sin 2x = 2\sin x\cos x$; $\sin x = 2\sin\frac{x}{2}\cos\frac{x}{2}$; $\cos 2x = \cos^2 x - \sin^2 x = 1 - 2\sin^2 x = 2\cos^2 x - 1$; $\tan 2x = (2\tan x)/(1 - \tan^2 x)$; $\sin 3x = 3\sin x - 4\sin^3 x$; $\cos 3x = 4\cos^3 x - 3\cos x$; $\tan 3x = (3\tan x - \tan^3 x)/(1 - 3\tan^2 x)$; $\sin 4x = 4\sin x\cos x - 8\sin^3 x\cos x$; $\cos 4x = 8\cos^4 x - 8\cos^2 x + 1$.

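$$\sin\frac{x}{2} = \pm\sqrt{\tfrac12(1 - \cos x)}\qquad \cos\frac{x}{2} = \pm\sqrt{\tfrac12(1 + \cos x)}$$
$$\tan\frac{x}{2} = \frac{1 - \cos x}{\sin x} = \frac{\sin x}{1 + \cos x} = \pm\sqrt{\frac{1 - \cos x}{1 + \cos x}}$$

INVERSE TRIGONOMETRIC FUNCTIONS

$y = \sin^{-1}x = \arcsin x$ is the angle y whose sine is x.

Example $y = \sin^{-1}\tfrac12$; y is 30°. The complete solution of the equation x = sin y is $y = (-1)^n\sin^{-1}x + n(180^\circ)$, where $\sin^{-1}x$, with $-\pi/2 \le \sin^{-1}x \le \pi/2$, is the principal value of the angle whose sine is x. The range of principal values of $\cos^{-1}x$ is $0 \le \cos^{-1}x \le \pi$, and that of $\tan^{-1}x$ is $-\pi/2 \le \tan^{-1}x \le \pi/2$. If these restrictions are allowed to hold, the following formulas result (the radical forms assume 0 < x < 1 in the first two lines and x > 0 in the third):
$$\sin^{-1}x = \cos^{-1}\sqrt{1 - x^2} = \tan^{-1}\frac{x}{\sqrt{1 - x^2}} = \cot^{-1}\frac{\sqrt{1 - x^2}}{x} = \sec^{-1}\frac{1}{\sqrt{1 - x^2}} = \csc^{-1}\frac{1}{x} = \frac{\pi}{2} - \cos^{-1}x$$
$$\cos^{-1}x = \sin^{-1}\sqrt{1 - x^2} = \tan^{-1}\frac{\sqrt{1 - x^2}}{x} = \cot^{-1}\frac{x}{\sqrt{1 - x^2}} = \sec^{-1}\frac{1}{x} = \csc^{-1}\frac{1}{\sqrt{1 - x^2}} = \frac{\pi}{2} - \sin^{-1}x$$
$$\tan^{-1}x = \sin^{-1}\frac{x}{\sqrt{1 + x^2}} = \cos^{-1}\frac{1}{\sqrt{1 + x^2}} = \cot^{-1}\frac{1}{x} = \sec^{-1}\sqrt{1 + x^2} = \csc^{-1}\frac{\sqrt{1 + x^2}}{x}$$

RELATIONS BETWEEN ANGLES AND SIDES OF TRIANGLES

Solutions of Triangles (Fig. 3-42) Let a, b, c denote the sides and α, β, γ the angles opposite the sides in the triangle. Let 2s = a + b + c, A = area, r = radius of the inscribed circle, R = radius of the circumscribed circle, and h = altitude. In any triangle α + β + γ = 180°.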

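FIG. 3-42 Triangle. FIG. 3-43 Right triangle. FIG. 3-44 Oblique triangle.

Law of Sines $\dfrac{\sin\alpha}{a} = \dfrac{\sin\beta}{b} = \dfrac{\sin\gamma}{c} = \dfrac{1}{2R}$.

Law of Tangents
$$\frac{a + b}{a - b} = \frac{\tan\frac12(\alpha + \beta)}{\tan\frac12(\alpha - \beta)};\quad \frac{b + c}{b - c} = \frac{\tan\frac12(\beta + \gamma)}{\tan\frac12(\beta - \gamma)};\quad \frac{a + c}{a - c} = \frac{\tan\frac12(\alpha + \gamma)}{\tan\frac12(\alpha - \gamma)}$$

Law of Cosines $a^2 = b^2 + c^2 - 2bc\cos\alpha$; $b^2 = a^2 + c^2 - 2ac\cos\beta$; $c^2 = a^2 + b^2 - 2ab\cos\gamma$.

Other Relations In this subsection, where appropriate, two more formulas can be generated by replacing a by b, b by c, c by a, α by β, β by γ, and γ by α. $\cos\alpha = (b^2 + c^2 - a^2)/2bc$; $a = b\cos\gamma + c\cos\beta$; $\sin\alpha = \dfrac{2}{bc}\sqrt{s(s - a)(s - b)(s - c)}$;
$$\sin\frac{\alpha}{2} = \sqrt{\frac{(s - b)(s - c)}{bc}};\quad \cos\frac{\alpha}{2} = \sqrt{\frac{s(s - a)}{bc}};$$
$$A = \tfrac12 bh = \tfrac12 ab\sin\gamma = \frac{a^2\sin\beta\sin\gamma}{2\sin\alpha} = \sqrt{s(s - a)(s - b)(s - c)} = rs$$
where $r = \sqrt{\dfrac{(s - a)(s - b)(s - c)}{s}}$, $R = a/(2\sin\alpha) = abc/4A$; $h = c\sin\alpha = a\sin\gamma = 2rs/b$.

Right Triangle (Fig. 3-43) Given one side and any acute angle α or any two sides, the remaining parts can be obtained from the following formulas:
$$a = \sqrt{(c + b)(c - b)} = c\sin\alpha = b\tan\alpha\qquad b = \sqrt{(c + a)(c - a)} = c\cos\alpha = a\cot\alpha$$
$$c = \sqrt{a^2 + b^2},\quad \sin\alpha = \frac{a}{c},\quad \cos\alpha = \frac{b}{c},\quad \tan\alpha = \frac{a}{b},\quad \beta = 90^\circ - \alpha$$
$$A = \tfrac12 ab = \frac{a^2}{2\tan\alpha} = \frac{b^2\tan\alpha}{2} = \frac{c^2\sin 2\alpha}{4}$$

Oblique Triangles (Fig. 3-44) There are four possible cases.

1. Given b, c and the included angle α,
$$\tfrac12(\beta + \gamma) = 90^\circ - \tfrac12\alpha;\quad \tan\tfrac12(\beta - \gamma) = \frac{b - c}{b + c}\tan\tfrac12(\beta + \gamma)$$
$$\beta = \tfrac12(\beta + \gamma) + \tfrac12(\beta - \gamma);\quad \gamma = \tfrac12(\beta + \gamma) - \tfrac12(\beta - \gamma);\quad a = \frac{b\sin\alpha}{\sin\beta}$$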

2. Given the three sides a, b, c, $s = \tfrac12(a + b + c)$;
$$r = \sqrt{\frac{(s - a)(s - b)(s - c)}{s}};\quad \tan\tfrac12\alpha = \frac{r}{s - a};\quad \tan\tfrac12\beta = \frac{r}{s - b};\quad \tan\tfrac12\gamma = \frac{r}{s - c}$$

3. Given any two sides a, c and an angle opposite one of them α: sin γ = (c sin α)/a; β = 180° − α − γ; b = (a sin β)/(sin α). There may be two solutions here: γ may have two values γ1, γ2, with γ1 < 90° and γ2 = 180° − γ1 > 90°. If α + γ2 > 180°, use only γ1. This case is impossible if the computed sin γ > 1.

4. Given any side c and two angles α and β: γ = 180° − α − β; a = (c sin α)/(sin γ); b = (c sin β)/(sin γ).
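Case 2 can be sketched in Python directly from the half-angle formulas, with the area taken as A = rs (equivalent to Heron's formula). The function name `solve_sss` is illustrative; the sketch assumes positive side lengths and rejects inputs that violate the triangle inequality.

```python
import math

def solve_sss(a, b, c):
    """Case 2: three sides -> (α, β, γ in degrees, area A, inradius r).

    Uses s = (a + b + c)/2, r = sqrt((s-a)(s-b)(s-c)/s), tan(α/2) = r/(s - a), etc.
    """
    s = 0.5 * (a + b + c)
    if min(s - a, s - b, s - c) <= 0:
        raise ValueError("sides do not form a triangle")
    r = math.sqrt((s - a) * (s - b) * (s - c) / s)
    alpha, beta, gamma = (math.degrees(2 * math.atan(r / (s - side))) for side in (a, b, c))
    area = r * s  # A = rs, the same value as sqrt(s(s-a)(s-b)(s-c))
    return alpha, beta, gamma, area, r

alpha, beta, gamma, area, r = solve_sss(3.0, 4.0, 5.0)
print(alpha, beta, gamma)   # about 36.87, 53.13, 90
print(area, r)              # 6.0 and 1.0 for the 3-4-5 right triangle
```

The law of cosines offers an independent check: for this triangle, 5² should equal 3² + 4² − 2(3)(4) cos γ.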

HYPERBOLIC TRIGONOMETRY

The hyperbolic functions are certain combinations of the exponentials $e^x$ and $e^{-x}$.
$$\cosh x = \frac{e^x + e^{-x}}{2};\quad \sinh x = \frac{e^x - e^{-x}}{2};\quad \tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
$$\coth x = \frac{1}{\tanh x} = \frac{\cosh x}{\sinh x} = \frac{e^x + e^{-x}}{e^x - e^{-x}};\quad \operatorname{sech} x = \frac{1}{\cosh x} = \frac{2}{e^x + e^{-x}};\quad \operatorname{csch} x = \frac{1}{\sinh x} = \frac{2}{e^x - e^{-x}}$$

Fundamental Relationships $\sinh x + \cosh x = e^x$; $\cosh x - \sinh x = e^{-x}$; $\cosh^2 x - \sinh^2 x = 1$; $\operatorname{sech}^2 x + \tanh^2 x = 1$; $\coth^2 x - \operatorname{csch}^2 x = 1$; $\sinh 2x = 2\sinh x\cosh x$; $\cosh 2x = \cosh^2 x + \sinh^2 x = 1 + 2\sinh^2 x = 2\cosh^2 x - 1$; $\tanh 2x = (2\tanh x)/(1 + \tanh^2 x)$; $\sinh(x \pm y) = \sinh x\cosh y \pm \cosh x\sinh y$; $\cosh(x \pm y) = \cosh x\cosh y \pm \sinh x\sinh y$; $2\sinh^2(x/2) = \cosh x - 1$; $2\cosh^2(x/2) = \cosh x + 1$; $\sinh(-x) = -\sinh x$; $\cosh(-x) = \cosh x$; $\tanh(-x) = -\tanh x$.

When u = a cosh x, v = a sinh x, then $u^2 - v^2 = a^2$, which is the equation for a hyperbola. In other words, the hyperbolic functions in the parametric equations u = a cosh x, v = a sinh x have the same relation to the hyperbola $u^2 - v^2 = a^2$ that the equations u = a cos θ, v = a sin θ have to the circle $u^2 + v^2 = a^2$.

Inverse Hyperbolic Functions If x = sinh y, then y is the inverse hyperbolic sine of x, written $y = \sinh^{-1}x$ or arcsinh x.
$$\sinh^{-1}x = \ln\left(x + \sqrt{x^2 + 1}\right);\quad \cosh^{-1}x = \ln\left(x \pm \sqrt{x^2 - 1}\right);\quad \tanh^{-1}x = \frac12\ln\frac{1 + x}{1 - x};$$
$$\coth^{-1}x = \frac12\ln\frac{x + 1}{x - 1};\quad \operatorname{sech}^{-1}x = \ln\frac{1 + \sqrt{1 - x^2}}{x};\quad \operatorname{csch}^{-1}x = \ln\frac{1 + \sqrt{1 + x^2}}{x}$$

Magnitude of the Hyperbolic Functions cosh x ≥ 1 with equality only for x = 0; −∞ < sinh x < ∞; −1 < tanh x < 1. $\cosh x \sim e^x/2$ as x → ∞; $\sinh x \sim e^x/2$ as x → ∞.

APPROXIMATIONS FOR TRIGONOMETRIC FUNCTIONS

For small values of θ (θ measured in radians), sin θ ≈ θ, tan θ ≈ θ, and $\cos\theta \approx 1 - \theta^2/2$. The limiting behavior of the ratios of these functions as θ → 0 is given by
$$\lim_{\theta\to 0}\frac{\sin\theta}{\theta} = 1;\qquad \lim_{\theta\to 0}\frac{\sin\theta}{\tan\theta} = 1$$
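The logarithmic forms above can be compared against Python's standard `math.asinh`, `math.acosh`, and `math.atanh`. This is a sketch: the `*_log` helper names are hypothetical, each comment states the domain assumed for real principal values, and the reciprocal relations coth⁻¹x = tanh⁻¹(1/x), sech⁻¹x = cosh⁻¹(1/x), csch⁻¹x = sinh⁻¹(1/x) are used for the functions the standard library lacks.

```python
import math

def asinh_log(x):                 # all real x
    return math.log(x + math.sqrt(x * x + 1))

def acosh_log(x):                 # x >= 1, taking the '+' sign
    return math.log(x + math.sqrt(x * x - 1))

def atanh_log(x):                 # |x| < 1
    return 0.5 * math.log((1 + x) / (1 - x))

def acoth_log(x):                 # |x| > 1
    return 0.5 * math.log((x + 1) / (x - 1))

def asech_log(x):                 # 0 < x <= 1
    return math.log((1 + math.sqrt(1 - x * x)) / x)

def acsch_log(x):                 # x > 0
    return math.log((1 + math.sqrt(1 + x * x)) / x)

checks = [
    (asinh_log(0.7), math.asinh(0.7)),
    (acosh_log(2.5), math.acosh(2.5)),
    (atanh_log(0.4), math.atanh(0.4)),
    (acoth_log(3.0), math.atanh(1 / 3.0)),
    (asech_log(0.6), math.acosh(1 / 0.6)),
    (acsch_log(1.2), math.asinh(1 / 1.2)),
]
for ours, lib in checks:
    print(f"{ours:.15f}  {lib:.15f}")

# Small-angle behaviour: both ratios approach 1 as θ → 0 (θ in radians).
for theta in (0.1, 0.01, 0.001):
    print(theta, math.sin(theta) / theta, math.sin(theta) / math.tan(theta))
```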

DIFFERENTIAL AND INTEGRAL CALCULUS

REFERENCES: Char, B. W., et al., Maple V Language Reference Manual, Springer-Verlag, New York (1991); Wolfram, S., The Mathematica Book, 5th ed., Wolfram Media (2003).

DIFFERENTIAL CALCULUS

An Example of Functional Notation Suppose that a storage warehouse of 16,000 ft³ is required. The construction costs per square foot are $10, $3, and $2 for walls, roof, and floor, respectively. What are the minimum-cost dimensions? With h = height, x = width, and y = length, the respective costs are

Walls = 2 × 10hy + 2 × 10hx = 20h(y + x)
Roof = 3xy
Floor = 2xy
Total cost = 2xy + 3xy + 20h(x + y) = 5xy + 20h(x + y)   (3-1)

and the restriction is

Total volume = xyh = 16,000   (3-2)

Solving for h from Eq. (3-2),

h = volume/xy = 16,000/xy   (3-3)

Cost = 5xy + (320,000/xy)(y + x) = 5xy + 320,000(1/x + 1/y)   (3-4)

In this form it can be shown that the minimum cost will occur for x = y; therefore

Cost = 5x² + 640,000(1/x)

Setting d(Cost)/dx = 10x − 640,000/x² = 0 gives x³ = 64,000, so the smallest cost occurs when x = 40:

Cost = 5(1600) + 640,000/40 = $24,000

The dimensions are then x = 40 ft, y = 40 ft, h = 16,000/(40 × 40) = 10 ft. Symbolically, the original cost relationship is written

Cost = f(x, y, h) = 5xy + 20h(y + x)

and the volume relation

Volume = g(x, y, h) = xyh = 16,000

In terms of the derived general relationships (3-1) and (3-2), x, y, and h are independent variables, while cost and volume are dependent variables. That is, the cost and volume become fixed with the specification of dimensions. However, corresponding to the given restriction of the problem, relative to volume, the function g(x, y, h) = xyh becomes a constraint function. In place of three independent and two dependent variables, the problem reduces to two independent (volume has been constrained) and two dependent variables, as in functions (3-3) and (3-4). Further, the requirement of minimum cost reduces the problem to three dependent variables (x, y, h) and no degrees of freedom, that is, no freedom of independent selection.
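The warehouse result can be confirmed numerically. The Python sketch below eliminates h through Eq. (3-3) and minimizes the cost along x = y by golden-section search; the search method and the helper names are choices made for this illustration, not taken from the text, and they rely on the reduced cost 5x² + 640,000/x being convex for x > 0.

```python
import math

VOLUME = 16_000.0  # ft^3, required warehouse volume

def total_cost(x, y, volume=VOLUME):
    """Eq. (3-1) with h eliminated through Eq. (3-3): 5xy + 20h(x + y), h = V/(xy)."""
    h = volume / (x * y)
    return 5.0 * x * y + 20.0 * h * (x + y)

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0   # 1/φ ≈ 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) < f(d):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    return 0.5 * (a + b)

# With x = y the cost reduces to 5x^2 + 640,000/x.
x_opt = golden_section_min(lambda x: total_cost(x, x), 1.0, 200.0)
h_opt = VOLUME / x_opt**2
print(f"x = y = {x_opt:.3f} ft, h = {h_opt:.3f} ft, cost = ${total_cost(x_opt, x_opt):,.2f}")
```

The printed values should match the hand calculation of x = y = 40 ft, h = 10 ft, and a cost of $24,000.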

Limits The limit of function f(x) as x approaches a (a is finite or else x is said to increase without bound) is the number N:
$$\lim_{x\to a} f(x) = N$$
This states that f(x) can be calculated as close to N as desirable by making x sufficiently close to a. This does not put any restriction on f(x) when x = a. Alternatively, for any given positive number ε, a number δ can be found such that 0 < |a − x| < δ implies that |N − f(x)| < ε.

The following operations with limits (when they exist) are valid:

$\lim_{x\to a} bf(x) = b\lim_{x\to a} f(x)$

$\lim_{x\to a}[f(x) + g(x)] = \lim_{x\to a} f(x) + \lim_{x\to a} g(x)$

$\lim_{x\to a}[f(x)g(x)] = \lim_{x\to a} f(x)\cdot\lim_{x\to a} g(x)$

$\lim_{x\to a}\dfrac{f(x)}{g(x)} = \dfrac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)}$ if $\lim_{x\to a} g(x) \ne 0$

See “Indeterminate Forms” below when g(a) = 0.

Continuity A function f(x) is continuous at the point x = a if
$$\lim_{h\to 0}[f(a + h) - f(a)] = 0$$
Rigorously, f(x) is continuous at x = a if for any positive ε there exists a δ > 0 such that |f(a + h) − f(a)| < ε for all h with |h| < δ. For example, the function (sin x)/x is not defined at x = 0 and therefore is said to be discontinuous there. Discontinuities are classified into three types:

1. Removable: y = (sin x)/x at x = 0
2. Infinite: y = 1/x at x = 0
3. Jump: $y = 10/(1 + e^{1/x})$ at x = 0 (y → 0⁺ as x → 0⁺; y → 10 as x → 0⁻)

Derivative The function f(x) has a derivative at x = a, which can be denoted as f′(a), if
$$\lim_{h\to 0}\frac{f(a + h) - f(a)}{h}$$
exists. This implies continuity at x = a. The converse is not true: a function may be continuous but not have a derivative (for example, |x| at x = 0). The derivative function is
$$f'(x) = \frac{df}{dx} = \lim_{h\to 0}\frac{f(x + h) - f(x)}{h}$$

Differentiation Define Δy = f(x + Δx) − f(x). Then, dividing by Δx,
$$\frac{\Delta y}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x}$$
Call
$$\lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x} = \frac{dy}{dx}$$
then
$$\frac{dy}{dx} = \lim_{\Delta x\to 0}\frac{f(x + \Delta x) - f(x)}{\Delta x}$$

Example Find the derivative of y = sin x.
$$\frac{dy}{dx} = \lim_{\Delta x\to 0}\frac{\sin(x + \Delta x) - \sin x}{\Delta x} = \lim_{\Delta x\to 0}\frac{\sin x\cos\Delta x + \sin\Delta x\cos x - \sin x}{\Delta x}$$
$$= \lim_{\Delta x\to 0}\frac{\sin x(\cos\Delta x - 1)}{\Delta x} + \lim_{\Delta x\to 0}\frac{\sin\Delta x\cos x}{\Delta x} = \cos x$$
since $\lim_{\Delta x\to 0}(\sin\Delta x)/\Delta x = 1$ and $\lim_{\Delta x\to 0}(\cos\Delta x - 1)/\Delta x = 0$.

Differential Operations The following differential operations are valid: f, g, . . . are differentiable functions of x, c and n are constants; e is the base of the natural logarithms.

$\dfrac{dc}{dx} = 0$   (3-5)

$\dfrac{dx}{dx} = 1$   (3-6)

$\dfrac{d}{dx}(f + g) = \dfrac{df}{dx} + \dfrac{dg}{dx}$   (3-7)

$\dfrac{d}{dx}(f \times g) = f\dfrac{dg}{dx} + g\dfrac{df}{dx}$   (3-8)

$\dfrac{d}{dx}\left(\dfrac{f}{g}\right) = \dfrac{g(df/dx) - f(dg/dx)}{g^2}$ if $g \ne 0$   (3-9)

$\dfrac{d}{dx}f^n = nf^{n-1}\dfrac{df}{dx}$   (3-10)

$\dfrac{dy}{dx} = \dfrac{1}{dx/dy}$   (3-11)

$\dfrac{df}{dx} = \dfrac{df}{dv}\times\dfrac{dv}{dx}$ (chain rule)   (3-12)

$\dfrac{df^g}{dx} = gf^{g-1}\dfrac{df}{dx} + f^g\ln f\,\dfrac{dg}{dx}$   (3-13)

$\dfrac{da^x}{dx} = (\ln a)\,a^x$   (3-14)

Example Derive dy/dx for $x^2 + y^3 = x + xy + A$.

Here
$$2x + 3y^2\frac{dy}{dx} = 1 + y + x\frac{dy}{dx} + 0$$
by rules (3-10), (3-10), (3-6), (3-8), and (3-5), respectively. Thus
$$\frac{dy}{dx} = \frac{2x - 1 - y}{x - 3y^2}$$

Differentials

$de^x = e^x\,dx$   (3-15a)
$d(a^x) = a^x\ln a\,dx$   (3-15b)
$d\ln x = (1/x)\,dx$   (3-16)
$d\log x = (\log e/x)\,dx$   (3-17)
$d\sin x = \cos x\,dx$   (3-18)
$d\cos x = -\sin x\,dx$   (3-19)
$d\tan x = \sec^2 x\,dx$   (3-20)
$d\cot x = -\csc^2 x\,dx$   (3-21)
$d\sec x = \tan x\sec x\,dx$   (3-22)
$d\csc x = -\cot x\csc x\,dx$   (3-23)
$d\sin^{-1}x = (1 - x^2)^{-1/2}\,dx$   (3-24)
$d\cos^{-1}x = -(1 - x^2)^{-1/2}\,dx$   (3-25)
$d\tan^{-1}x = (1 + x^2)^{-1}\,dx$   (3-26)
$d\cot^{-1}x = -(1 + x^2)^{-1}\,dx$   (3-27)
$d\sec^{-1}x = x^{-1}(x^2 - 1)^{-1/2}\,dx$   (3-28)
$d\csc^{-1}x = -x^{-1}(x^2 - 1)^{-1/2}\,dx$   (3-29)
$d\sinh x = \cosh x\,dx$   (3-30)
$d\cosh x = \sinh x\,dx$   (3-31)
$d\tanh x = \operatorname{sech}^2 x\,dx$   (3-32)
$d\coth x = -\operatorname{csch}^2 x\,dx$   (3-33)
$d\operatorname{sech} x = -\operatorname{sech} x\tanh x\,dx$   (3-34)
$d\operatorname{csch} x = -\operatorname{csch} x\coth x\,dx$   (3-35)
$d\sinh^{-1}x = (x^2 + 1)^{-1/2}\,dx$   (3-36)
$d\cosh^{-1}x = (x^2 - 1)^{-1/2}\,dx$   (3-37)
$d\tanh^{-1}x = (1 - x^2)^{-1}\,dx$   (3-38)
$d\coth^{-1}x = -(x^2 - 1)^{-1}\,dx$   (3-39)
$d\operatorname{sech}^{-1}x = -(1/x)(1 - x^2)^{-1/2}\,dx$   (3-40)
$d\operatorname{csch}^{-1}x = -x^{-1}(x^2 + 1)^{-1/2}\,dx$   (3-41)
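Entries in the differentials list can be spot-checked with a central-difference approximation, [f(x + h) − f(x − h)]/(2h), whose error is of order h². The Python sketch below compares a few entries at one point; the choice of entries, the base a = 3, the point x = 0.4, and the step h are arbitrary assumptions made for illustration.

```python
import math

def central_difference(f, x, h=1e-5):
    """Approximate df/dx at x by [f(x + h) − f(x − h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# (function, analytic derivative) pairs for a few entries of the list above.
CHECKS = {
    "(3-15b) 3**x": (lambda x: 3.0 ** x, lambda x: 3.0 ** x * math.log(3.0)),
    "(3-17) log10 x": (math.log10, lambda x: math.log10(math.e) / x),
    "(3-24) asin x": (math.asin, lambda x: (1 - x * x) ** -0.5),
    "(3-32) tanh x": (math.tanh, lambda x: 1.0 / math.cosh(x) ** 2),
    "(3-36) asinh x": (math.asinh, lambda x: (x * x + 1) ** -0.5),
}

x0 = 0.4
for label, (f, dfdx) in CHECKS.items():
    print(f"{label:16s} numeric {central_difference(f, x0):.8f}  analytic {dfdx(x0):.8f}")
```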

Example Find dy/dx for $y = x^{1/2}\cos(1 - x^2)$. Using

$\dfrac{dy}{dx} = x^{1/2}\dfrac{d}{dx}\cos(1 - x^2) + \cos(1 - x^2)\dfrac{d}{dx}x^{1/2}$   (3-8)

$\dfrac{d}{dx}\cos(1 - x^2) = -\sin(1 - x^2)\dfrac{d}{dx}(1 - x^2)$   (3-19)

$= -\sin(1 - x^2)(0 - 2x)$   (3-5), (3-10)

$\dfrac{d}{dx}x^{1/2} = \tfrac12 x^{-1/2}$   (3-10)

$\dfrac{dy}{dx} = 2x^{3/2}\sin(1 - x^2) + \tfrac12 x^{-1/2}\cos(1 - x^2)$

Example Find the derivative of tan x with respect to sin x.

v = sin x, y = tan x. Using

$\dfrac{d\tan x}{d\sin x} = \dfrac{dy}{dv} = \dfrac{dy}{dx}\dfrac{dx}{dv}$   (3-12)

$= \dfrac{d\tan x}{dx}\cdot\dfrac{1}{d\sin x/dx}$   (3-11)

$= \sec^2 x/\cos x$   (3-18), (3-20)

Very often in experimental sciences and engineering, functions and their derivatives are available only through their numerical values. In particular, through measurements we may know the values of a function and its derivative only at certain points. In such cases the preceding operational rules for derivatives, including the chain rule, can be applied numerically.

Example Given the following table of values for differentiable functions f and g, evaluate the following quantities:

x    f(x)    f′(x)    g(x)    g′(x)
1    3       1        4       −4
3    0       2        4       7
4    −2      10       3       6

$\dfrac{d}{dx}[f(x) + g(x)]\Big|_{x=4} = f'(4) + g'(4) = 10 + 6 = 16$

$\left(\dfrac{f}{g}\right)'(1) = \dfrac{f'(1)g(1) - f(1)g'(1)}{[g(1)]^2} = \dfrac{1\cdot 4 - 3(-4)}{(4)^2} = \dfrac{16}{16} = 1$

Higher Differentials The first derivative of f(x) with respect to x is denoted by f′ or df/dx. The derivative of the first derivative is called the second derivative of f(x) with respect to x and is denoted by f″, $f^{(2)}$, or $d^2f/dx^2$; and similarly for the higher-order derivatives.

Example Given $f(x) = 3x^3 + 2x + 1$, calculate all derivative values at x = 3.

$\dfrac{df(x)}{dx} = 9x^2 + 2$; at x = 3, f′(3) = 9(9) + 2 = 83

$\dfrac{d^2f(x)}{dx^2} = 18x$; at x = 3, f″(3) = 18(3) = 54

$\dfrac{d^3f(x)}{dx^3} = 18$; at x = 3, f‴(3) = 18

$\dfrac{d^nf(x)}{dx^n} = 0$ for n ≥ 4

If f′(x) > 0 on (a, b), then f is increasing on (a, b). If f′(x) < 0 on (a, b), then f is decreasing on (a, b).

The graph of a function y = f(x) is concave up on (a, b) if f′ is increasing on (a, b); it is concave down if f′ is decreasing on (a, b).

If f″(x) exists on (a, b) and f″(x) > 0 there, then f is concave up on (a, b). If f″(x) < 0, then f is concave down on (a, b).

An inflection point is a point at which a function changes the direction of its concavity.
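The tabular example above can be reproduced in a few lines of Python. The table values are copied from the example; the dictionary layout and function names are illustrative assumptions.

```python
# Tabulated values from the example: x -> (f(x), f'(x), g(x), g'(x))
TABLE = {
    1: (3, 1, 4, -4),
    3: (0, 2, 4, 7),
    4: (-2, 10, 3, 6),
}

def sum_derivative(x):
    """[f + g]'(x) = f'(x) + g'(x), rule (3-7)."""
    f, fp, g, gp = TABLE[x]
    return fp + gp

def quotient_derivative(x):
    """(f/g)'(x) = [f'(x) g(x) − f(x) g'(x)] / g(x)², rule (3-9); requires g(x) ≠ 0."""
    f, fp, g, gp = TABLE[x]
    return (fp * g - f * gp) / g**2

print(sum_derivative(4))        # 16
print(quotient_derivative(1))   # 1.0
```

Only tabulated points can be used this way; derivatives at other x would require interpolation, which the example does not address.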

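Indeterminate Forms: L’Hospital’s Theorem Forms of the type 0/0, ∞/∞, 0 × ∞, etc., are called indeterminate. To find the limiting values that the corresponding functions approach, L’Hospital’s theorem is useful: If two functions f(x) and g(x) both become zero at x = a, then the limit of their quotient is equal to the limit of the quotient of their separate derivatives, if that limit exists or is +∞ or −∞. The same rule applies when f(x) and g(x) both become infinite (the ∞/∞ form).

Example Find $\lim_{x\to 0}\dfrac{\sin x}{x}$.

Here $\lim_{x\to 0}\dfrac{\sin x}{x} = \lim_{x\to 0}\dfrac{d\sin x/dx}{dx/dx} = \lim_{x\to 0}\dfrac{\cos x}{1} = 1$

Example Find $\lim_{x\to\infty}\dfrac{(1.1)^x}{x^{1000}}$.

$\lim_{x\to\infty}\dfrac{(1.1)^x}{x^{1000}} = \lim_{x\to\infty}\dfrac{d(1.1)^x/dx}{dx^{1000}/dx} = \lim_{x\to\infty}\dfrac{(\ln 1.1)(1.1)^x}{1000x^{999}}$

Obviously $\lim_{x\to\infty}\dfrac{(1.1)^x}{x^{1000}} = \infty$, since repeated application of the rule will reduce the denominator to the finite number 1000! while the numerator remains infinitely large.

Example Find $\lim_{x\to\infty} x^3e^{-x}$.

$\lim_{x\to\infty} x^3e^{-x} = \lim_{x\to\infty}\dfrac{x^3}{e^x} = \lim_{x\to\infty}\dfrac{6}{e^x} = 0$ (after three applications of the rule)

Example Find $\lim_{x\to 0}(1 - x)^{1/x}$.

Let $y = (1 - x)^{1/x}$; then $\ln y = (1/x)\ln(1 - x)$ and

$\lim_{x\to 0}(\ln y) = \lim_{x\to 0}\dfrac{\ln(1 - x)}{x} = \lim_{x\to 0}\dfrac{-1/(1 - x)}{1} = -1$

Therefore, $\lim_{x\to 0} y = e^{-1}$.

Partial Derivative The abbreviation z = f(x, y) means that z is a function of the two variables x and y. The derivative of z with respect to x, treating y as a constant, is called the partial derivative with respect to x and is usually denoted as ∂z/∂x or ∂f(x, y)/∂x or simply $f_x$. Partial differentiation, like full differentiation, is quite simple to apply. By contrast, the solution of partial differential equations is appreciably more difficult than that of ordinary differential equations.

Example Find ∂z/∂x and ∂z/∂y for $z = ye^{x^2} + xe^y$.

$\dfrac{\partial z}{\partial x} = y\dfrac{\partial e^{x^2}}{\partial x} + e^y\dfrac{\partial x}{\partial x} = 2xye^{x^2} + e^y$

$\dfrac{\partial z}{\partial y} = e^{x^2}\dfrac{\partial y}{\partial y} + x\dfrac{\partial e^y}{\partial y} = e^{x^2} + xe^y$

Order of Differentiation It is generally true that the order of differentiation is immaterial for any number of differentiations or variables, provided the function and the appropriate derivatives are continuous. For z = f(x, y) it follows:
$$\frac{\partial^3 f}{\partial y^2\,\partial x} = \frac{\partial^3 f}{\partial y\,\partial x\,\partial y} = \frac{\partial^3 f}{\partial x\,\partial y^2}$$

General Form for Partial Differentiation

1. Given f(x, y) and x = g(t), y = h(t). Then
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
$$\frac{d^2f}{dt^2} = \frac{\partial^2 f}{\partial x^2}\left(\frac{dx}{dt}\right)^2 + 2\frac{\partial^2 f}{\partial x\,\partial y}\frac{dx}{dt}\frac{dy}{dt} + \frac{\partial^2 f}{\partial y^2}\left(\frac{dy}{dt}\right)^2 + \frac{\partial f}{\partial x}\frac{d^2x}{dt^2} + \frac{\partial f}{\partial y}\frac{d^2y}{dt^2}$$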

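Example Find df/dt for f = xy, x = ρ sin t, y = ρ cos t.
$$\frac{df}{dt} = \frac{\partial(xy)}{\partial x}\frac{d(\rho\sin t)}{dt} + \frac{\partial(xy)}{\partial y}\frac{d(\rho\cos t)}{dt} = y(\rho\cos t) + x(-\rho\sin t) = \rho^2\cos^2 t - \rho^2\sin^2 t$$

2. Given f(x, y) and x = g(t, s), y = h(t, s). Then
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}$$
$$\frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}$$

Differentiation of Composite Function

Rule 1. Given f(x, y) = 0, then $\dfrac{dy}{dx} = -\dfrac{\partial f/\partial x}{\partial f/\partial y}$, provided $\partial f/\partial y \ne 0$.

Rule 2. Given f(u) where u = g(x), then
$$\frac{df}{dx} = f'(u)\frac{du}{dx}$$
$$\frac{d^2f}{dx^2} = f''(u)\left(\frac{du}{dx}\right)^2 + f'(u)\frac{d^2u}{dx^2}$$

Example Find df/dx for $f = \sin^2 u$ and $u = \sqrt{1 - x^2}$.
$$\frac{df}{dx} = \frac{d\sin^2 u}{du}\,\frac{d\sqrt{1 - x^2}}{dx} = 2\sin u\cos u\cdot\tfrac12(-2x)(1 - x^2)^{-1/2} = -2\sin u\cos u\,\frac{\sqrt{1 - u^2}}{u}$$

Rule 3. Given f(u) where u = g(x, y), then
$$\frac{\partial f}{\partial x} = f'(u)\frac{\partial u}{\partial x};\qquad \frac{\partial f}{\partial y} = f'(u)\frac{\partial u}{\partial y}$$
$$\frac{\partial^2 f}{\partial x^2} = f''\left(\frac{\partial u}{\partial x}\right)^2 + f'\frac{\partial^2 u}{\partial x^2}$$
$$\frac{\partial^2 f}{\partial x\,\partial y} = f''\frac{\partial u}{\partial x}\frac{\partial u}{\partial y} + f'\frac{\partial^2 u}{\partial x\,\partial y}$$
$$\frac{\partial^2 f}{\partial y^2} = f''\left(\frac{\partial u}{\partial y}\right)^2 + f'\frac{\partial^2 u}{\partial y^2}$$

MULTIVARIABLE CALCULUS APPLIED TO THERMODYNAMICS

Many of the functional relationships needed in thermodynamics are direct applications of the rules of multivariable calculus. This section reviews those rules in the context of the needs of thermodynamics. These ideas were expounded in one of the classic books on chemical engineering thermodynamics [see Hougen, O. A., et al., Part II, “Thermodynamics,” in Chemical Process Principles, 2d ed., Wiley, New York (1959)].

State Functions State functions depend only on the state of the system, not on past history or how one got there. If z is a function of two variables, x and y, then z(x, y) is a state function, since z is known once x and y are specified. The differential of z is
$$dz = M\,dx + N\,dy$$
The line integral
$$\int_C (M\,dx + N\,dy)$$
is independent of the path in x-y space if and only if

$\dfrac{\partial M}{\partial y} = \dfrac{\partial N}{\partial x}$   (3-42)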
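Path independence can be illustrated numerically. This Python sketch assumes the state function z = ye^{x²} + xe^y from the earlier partial-derivative example, so M = ∂z/∂x and N = ∂z/∂y are known in closed form; it integrates M dx + N dy along two different paths from (0, 0) to (1, 1) with a simple midpoint rule and also checks condition (3-42) by finite differences. The paths and step counts are arbitrary choices for the illustration.

```python
import math

def M(x, y):   # ∂z/∂x for z = y e^{x^2} + x e^y
    return 2 * x * y * math.exp(x * x) + math.exp(y)

def N(x, y):   # ∂z/∂y
    return math.exp(x * x) + x * math.exp(y)

def z(x, y):
    return y * math.exp(x * x) + x * math.exp(y)

def line_integral(path, n=2000):
    """∫_C (M dx + N dy) along path(s) -> (x, y), s in [0, 1], by the midpoint rule."""
    total = 0.0
    for i in range(n):
        x0, y0 = path(i / n)
        x1, y1 = path((i + 1) / n)
        xm, ym = path((i + 0.5) / n)
        total += M(xm, ym) * (x1 - x0) + N(xm, ym) * (y1 - y0)
    return total

def straight(s):   # (0, 0) -> (1, 1) along y = x
    return s, s

def curved(s):     # a different path with the same end points
    return s * s, math.sin(s * math.pi / 2)

I1, I2 = line_integral(straight), line_integral(curved)
print(I1, I2, z(1, 1) - z(0, 0))   # all three should agree (2e ≈ 5.43656)

# Eq. (3-42): ∂M/∂y and ∂N/∂x agree (both equal 2x e^{x^2} + e^y here).
h, xp, yp = 1e-6, 0.5, 0.2
print((M(xp, yp + h) - M(xp, yp - h)) / (2 * h), (N(xp + h, yp) - N(xp - h, yp)) / (2 * h))
```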

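The total differential can be written as

$dz = \left(\dfrac{\partial z}{\partial x}\right)_y dx + \left(\dfrac{\partial z}{\partial y}\right)_x dy$   (3-43)

and the following condition guarantees path independence:

$\left[\dfrac{\partial}{\partial y}\left(\dfrac{\partial z}{\partial x}\right)_y\right]_x = \left[\dfrac{\partial}{\partial x}\left(\dfrac{\partial z}{\partial y}\right)_x\right]_y$ or $\dfrac{\partial^2 z}{\partial y\,\partial x} = \dfrac{\partial^2 z}{\partial x\,\partial y}$   (3-44)

Example Suppose z is constant and apply Eq. (3-43).
$$0 = \left(\frac{\partial z}{\partial x}\right)_y dx + \left(\frac{\partial z}{\partial y}\right)_x dy \qquad (z \text{ constant})$$
Rearrangement gives

$\left(\dfrac{\partial z}{\partial x}\right)_y = -\left(\dfrac{\partial z}{\partial y}\right)_x\left(\dfrac{\partial y}{\partial x}\right)_z = -\dfrac{(\partial y/\partial x)_z}{(\partial y/\partial z)_x}$   (3-45)

Alternatively, divide Eq. (3-43) by dy when holding some other variable w constant to obtain

$\left(\dfrac{\partial z}{\partial y}\right)_w = \left(\dfrac{\partial z}{\partial x}\right)_y\left(\dfrac{\partial x}{\partial y}\right)_w + \left(\dfrac{\partial z}{\partial y}\right)_x$   (3-46)

Also divide both numerator and denominator of a partial derivative by dw while holding a variable y constant to get

$\left(\dfrac{\partial z}{\partial x}\right)_y = \dfrac{(\partial z/\partial w)_y}{(\partial x/\partial w)_y} = \left(\dfrac{\partial z}{\partial w}\right)_y\left(\dfrac{\partial w}{\partial x}\right)_y$   (3-47)

Thermodynamic State Functions In thermodynamics, the state functions include the internal energy, U; enthalpy, H; and Helmholtz and Gibbs free energies, A and G, respectively, defined as follows:

H = U + pV
A = U − TS
G = H − TS = U + pV − TS = A + pV

S is the entropy, T the absolute temperature, p the pressure, and V the volume. These are also state functions, in that the entropy is specified once two variables (like T and p) are specified, for example. Likewise, V is specified once T and p are specified; it is therefore a state function.

All applications are for closed systems with constant mass. If a process is reversible and only p-V work is done, the first law and differentials can be expressed as follows:

dU = T dS − p dV
dH = T dS + V dp
dA = −S dT − p dV
dG = −S dT + V dp

Alternatively, if the internal energy is considered a function of S and V, then the differential is
$$dU = \left(\frac{\partial U}{\partial S}\right)_V dS + \left(\frac{\partial U}{\partial V}\right)_S dV$$
This is the equivalent of Eq. (3-43) and gives the following definitions:
$$T = \left(\frac{\partial U}{\partial S}\right)_V,\qquad p = -\left(\frac{\partial U}{\partial V}\right)_S$$
Since the internal energy is a state function, Eq. (3-44) must be satisfied:
$$\frac{\partial^2 U}{\partial V\,\partial S} = \frac{\partial^2 U}{\partial S\,\partial V}$$
This is
$$\left(\frac{\partial T}{\partial V}\right)_S = -\left(\frac{\partial p}{\partial S}\right)_V$$
This is one of the Maxwell relations, and the other Maxwell relations can be derived in a similar fashion by applying Eq. (3-44).

See Sec. 4, Thermodynamics, “Constant-Composition Systems.”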
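Eq. (3-45) and a Maxwell relation can be checked numerically for a simple model. This Python sketch assumes one mole of ideal gas (pV = RT, with R rounded to 8.314 J/(mol·K)) and an assumed ideal-gas entropy S(T, V) = C_V ln T + R ln V + const with C_V = 1.5R; neither model comes from the text, and the partial derivatives are estimated by central differences.

```python
import math

R = 8.314  # J/(mol K), molar gas constant (rounded)

# Assumed model: one mole of ideal gas, so p and V are explicit functions.
def p_of(T, V):
    return R * T / V

def V_of(T, p):
    return R * T / p

def deriv(f, x, h):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

T0, V0 = 300.0, 0.025            # K, m^3/mol
p0 = p_of(T0, V0)

dp_dT_V = deriv(lambda T: p_of(T, V0), T0, 1e-3)    # (∂p/∂T)_V
dV_dT_p = deriv(lambda T: V_of(T, p0), T0, 1e-3)    # (∂V/∂T)_p
dV_dp_T = deriv(lambda p: V_of(T0, p), p0, 1.0)     # (∂V/∂p)_T

# Eq. (3-45) with z = p, x = T, y = V: (∂p/∂T)_V = −(∂V/∂T)_p / (∂V/∂p)_T
print(dp_dT_V, -dV_dT_p / dV_dp_T)

# Maxwell relation from dA = −S dT − p dV: (∂S/∂V)_T = (∂p/∂T)_V.
Cv = 1.5 * R   # assumed heat capacity (monatomic ideal gas)

def S_of(T, V):
    return Cv * math.log(T) + R * math.log(V)

dS_dV_T = deriv(lambda V: S_of(T0, V), V0, 1e-6)
print(dS_dV_T, dp_dT_V)
```

For this model both printed pairs should equal R/V ≈ 332.6 Pa/K.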

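Partial Derivatives of All Thermodynamic Functions The various partial derivatives of the thermodynamic functions can be classified into six groups. In the general formulas below, the variables U, H, A, G, or S are denoted by Greek letters, while the variables V, T, or p are denoted by Latin letters.

Type I (3 possibilities plus reciprocals)

General: $(\partial a/\partial b)_c$; Specific: $(\partial p/\partial T)_V$

Eq. (3-45) gives
$$\left(\frac{\partial p}{\partial T}\right)_V = -\left(\frac{\partial V}{\partial T}\right)_p\left(\frac{\partial p}{\partial V}\right)_T = -\frac{(\partial V/\partial T)_p}{(\partial V/\partial p)_T}$$

Type II (30 possibilities plus reciprocals)

General: $(\partial\alpha/\partial b)_c$; Specific: $(\partial G/\partial T)_V$

The differential for G gives
$$\left(\frac{\partial G}{\partial T}\right)_V = -S + V\left(\frac{\partial p}{\partial T}\right)_V$$
Using the other equations for U, H, A, or S gives the other possibilities.

Type III (15 possibilities plus reciprocals)

General: $(\partial a/\partial b)_\alpha$; Specific: $(\partial V/\partial T)_S$

First expand the derivative using Eq. (3-45).
$$\left(\frac{\partial V}{\partial T}\right)_S = -\left(\frac{\partial V}{\partial S}\right)_T\left(\frac{\partial S}{\partial T}\right)_V = -\frac{(\partial S/\partial T)_V}{(\partial S/\partial V)_T}$$
Then evaluate the numerator and denominator as type II derivatives.
$$\left(\frac{\partial V}{\partial T}\right)_S = -\frac{C_V/T}{-(\partial V/\partial T)_p/(\partial V/\partial p)_T} = \frac{C_V}{T}\,\frac{(\partial V/\partial p)_T}{(\partial V/\partial T)_p}$$
These derivatives are of importance for reversible, adiabatic processes (such as in an ideal turbine or compressor), since then the entropy is constant. An example is the Joule-Thomson coefficient for constant H:
$$\left(\frac{\partial T}{\partial p}\right)_H = \frac{1}{C_p}\left[-V + T\left(\frac{\partial V}{\partial T}\right)_p\right]$$

Type IV (30 possibilities plus reciprocals)

General: $(\partial\alpha/\partial\beta)_c$; Specific: $(\partial G/\partial A)_p$

Use Eq. (3-47) to introduce a new variable.
$$\left(\frac{\partial G}{\partial A}\right)_p = \left(\frac{\partial G}{\partial T}\right)_p\left(\frac{\partial T}{\partial A}\right)_p = \frac{(\partial G/\partial T)_p}{(\partial A/\partial T)_p}$$
This operation has created two type II derivatives; by substitution we obtain
$$\left(\frac{\partial G}{\partial A}\right)_p = \frac{S}{S + p(\partial V/\partial T)_p}$$

Type V (60 possibilities plus reciprocals)

General: $(\partial\alpha/\partial b)_\beta$; Specific: $(\partial G/\partial p)_A$

Start from the differential for dG. Then we get
$$\left(\frac{\partial G}{\partial p}\right)_A = -S\left(\frac{\partial T}{\partial p}\right)_A + V$$
The derivative $(\partial T/\partial p)_A$ is type III and can be evaluated by using Eq. (3-45).
$$\left(\frac{\partial G}{\partial p}\right)_A = S\,\frac{(\partial A/\partial p)_T}{(\partial A/\partial T)_p} + V$$
The two type II derivatives are then evaluated.
$$\left(\frac{\partial G}{\partial p}\right)_A = \frac{Sp(\partial V/\partial p)_T}{S + p(\partial V/\partial T)_p} + V$$
These derivatives are also of interest for free expansions or isentropic changes.

Type VI (30 possibilities plus reciprocals)

General: $(\partial\alpha/\partial\beta)_\gamma$; Specific: $(\partial G/\partial A)_H$

We use Eq. (3-47) to obtain two type V derivatives.
$$\left(\frac{\partial G}{\partial A}\right)_H = \frac{(\partial G/\partial T)_H}{(\partial A/\partial T)_H}$$
These can then be evaluated using the procedures for Type V derivatives.

INTEGRAL CALCULUS

Indefinite Integral If f′(x) is the derivative of f(x), an antiderivative of f′(x) is f(x). Symbolically, the indefinite integral of f′(x) is
$$\int f'(x)\,dx = f(x) + c$$
where c is an arbitrary constant to be determined by the problem. By virtue of the known formulas for differentiation the following relationships hold (a is a constant):

$\int(du + dv + dw) = \int du + \int dv + \int dw$   (3-48)
$\int a\,dv = a\int dv$   (3-49)
$\int v^n\,dv = \dfrac{v^{n+1}}{n + 1} + c$ (n ≠ −1)   (3-50)
$\int\dfrac{dv}{v} = \ln|v| + c$   (3-51)
$\int a^v\,dv = \dfrac{a^v}{\ln a} + c$   (3-52)
$\int e^v\,dv = e^v + c$   (3-53)
$\int\sin v\,dv = -\cos v + c$   (3-54)
$\int\cos v\,dv = \sin v + c$   (3-55)
$\int\sec^2 v\,dv = \tan v + c$   (3-56)
$\int\csc^2 v\,dv = -\cot v + c$   (3-57)
$\int\sec v\tan v\,dv = \sec v + c$   (3-58)
$\int\csc v\cot v\,dv = -\csc v + c$   (3-59)
$\int\dfrac{dv}{v^2 + a^2} = \dfrac{1}{a}\tan^{-1}\dfrac{v}{a} + c$   (3-60)
$\int\dfrac{dv}{\sqrt{a^2 - v^2}} = \sin^{-1}\dfrac{v}{a} + c$   (3-61)
$\int\dfrac{dv}{v^2 - a^2} = \dfrac{1}{2a}\ln\left|\dfrac{v - a}{v + a}\right| + c$   (3-62)
$\int\dfrac{dv}{\sqrt{v^2 \pm a^2}} = \ln\left|v + \sqrt{v^2 \pm a^2}\right| + c$   (3-63)
$\int\sec v\,dv = \ln|\sec v + \tan v| + c$   (3-64)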

csc v dv = ln (csc v − cot v) + c (3-65)

Example Find ∫ (3x2 + ex − 10) dx using Eq. (3-48). ∫ (3x2 + ex − 10) dx =3 ∫ x2 dx + ∫ ex dx − 10 ∫ dx = x3 + ex − 10x + c (by Eqs. 3-50, 3-53).

Example Find . Let v = 2 − 3x2; dv = −6x dx

Thus = 7 = − = − , with v 2 3x2 and dv 6x dx

= − ln |v| + c = − ln |2 − 3x2| + c

Example—Constant of Integration By definition the derivative ofx3 is 3x2, and x3 is therefore the integral of 3x2. However, if f = x3 + 10, it followsthat f ′ = 3x2, and x3 + 10 is therefore also the integral of 3x2. For this reason theconstant c in ∫ 3x2 dx = x3 + c must be determined by the problem conditions,i.e., the value of f for a specified x.

Methods of Integration In practice, commonly encountered functions can rarely be integrated directly. For example, the integrand in $\int \sqrt{\sin x}\,dx$, which appears quite simple, has no elementary function whose derivative is $\sqrt{\sin x}$. In general, there is no explicit way of determining whether a particular function can be integrated into an elementary form. As a whole, integration is a trial-and-error proposition that depends on the effort and ingenuity of the practitioner. The following are general procedures that can be used to find the elementary forms of the integral when they exist. When they do not exist or cannot be found either from tabulated integration formulas or directly, the only recourse is series expansion, as illustrated later. Indefinite integrals cannot be evaluated numerically unless they are redefined as definite integrals (see “Definite Integral”), i.e., $F(x) = \int f(x)\,dx$ (indefinite), whereas $F(x) = \int_a^x f(t)\,dt$ (definite).

Direct Formula Many integrals can be solved by transforming the integrand into one of the forms given previously.

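Example Find $\int x^2\sqrt{3x^3 + 10}\,dx$. Let $v = 3x^3 + 10$, for which $dv = 9x^2\,dx$. Thus

$$\int x^2\sqrt{3x^3 + 10}\,dx = \int (3x^3 + 10)^{1/2}(x^2\,dx) = \frac{1}{9}\int (3x^3 + 10)^{1/2}(9x^2\,dx) = \frac{1}{9}\int v^{1/2}\,dv$$

$$= \frac{1}{9}\,\frac{v^{3/2}}{3/2} + c \quad \text{[by Eq. (3-50)]}$$

$$= \frac{2}{27}(3x^3 + 10)^{3/2} + c$$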

Trigonometric Substitution This technique is particularly well adapted to integrands containing radicals. The function is transformed into a trigonometric form, in which it may be more easily matched to the trigonometric identities and the integration formulas. These radicals and their substitutions are

$\sqrt{x^2 - a^2}$ Let $x = a\sec\theta$

$\sqrt{x^2 + a^2}$ Let $x = a\tan\theta$

$\sqrt{a^2 - x^2}$ Let $x = a\sin\theta$

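Example Find $\int \dfrac{\sqrt{4 - 9x^2}}{x^2}\,dx$. Let $x = \tfrac{2}{3}\sin\theta$; then $dx = \tfrac{2}{3}\cos\theta\,d\theta$.

$$\int \frac{\sqrt{4 - 9x^2}}{x^2}\,dx = 3\int \frac{\sqrt{(2/3)^2 - x^2}}{x^2}\,dx = 3\int \frac{(2/3)\sqrt{1 - \sin^2\theta}}{(2/3)^2\sin^2\theta}\cdot\frac{2}{3}\cos\theta\,d\theta$$

$$= 3\int \frac{\cos^2\theta}{\sin^2\theta}\,d\theta = 3\int \cot^2\theta\,d\theta$$

$$= -3\cot\theta - 3\theta + c \qquad \text{by trigonometric transform}$$

$$= -\frac{\sqrt{4 - 9x^2}}{x} - 3\sin^{-1}\frac{3x}{2} + c \qquad \text{in terms of } x$$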

Algebraic Substitution Functions containing elements of the type $(a + bx)^{1/n}$ are best handled by the algebraic transformation $y^n = a + bx$.

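Example Find $\int \dfrac{x\,dx}{(3 + 4x)^{1/4}}$. Let $3 + 4x = y^4$; then $4\,dx = 4y^3\,dy$ and

$$\int \frac{x\,dx}{(3 + 4x)^{1/4}} = \int \frac{\left[(y^4 - 3)/4\right]y^3\,dy}{y} = \frac{1}{4}\int y^2(y^4 - 3)\,dy$$

$$= \frac{1}{4}\,\frac{y^7}{7} - \frac{3}{4}\,\frac{y^3}{3} + c = \frac{1}{28}(3 + 4x)^{7/4} - \frac{1}{4}(3 + 4x)^{3/4} + c$$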

General The number of possible transformations one might use is unlimited, and no specific overall rules can be given. Success in handling integration problems depends primarily upon experience and ingenuity. The following example illustrates the extent to which alternative approaches are possible.

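Example Find $\int \dfrac{dx}{e^x - 1}$. Let $e^x = y$; then $e^x\,dx = dy$, or $dx = (1/y)\,dy$.

$$\int \frac{dx}{e^x - 1} = \int \frac{(1/y)\,dy}{y - 1} = \int \frac{dy}{y^2 - y} = \ln\left|\frac{y - 1}{y}\right| + c = \ln\left|\frac{e^x - 1}{e^x}\right| + c$$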
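A convenient way to confirm an antiderivative found by substitution is to differentiate it numerically and compare the result with the integrand. The sketch below does this for the substitution examples above; the helper name `central_diff`, the step size, and the test points are illustrative choices, not part of the original text, and the constant of integration is dropped because it does not affect the derivative.

```python
import math

def central_diff(F, x, h=1e-5):
    """Central-difference estimate of F'(x); the truncation error is O(h**2)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)

# name: (antiderivative F, integrand f, test point x); constants of integration omitted.
cases = {
    "7x/(2 - 3x^2)": (lambda x: -7.0 / 6.0 * math.log(abs(2 - 3 * x**2)),
                      lambda x: 7 * x / (2 - 3 * x**2), 0.5),
    "x^2 sqrt(3x^3 + 10)": (lambda x: 2.0 / 27.0 * (3 * x**3 + 10) ** 1.5,
                            lambda x: x**2 * math.sqrt(3 * x**3 + 10), 0.5),
    "sqrt(4 - 9x^2)/x^2": (lambda x: -math.sqrt(4 - 9 * x**2) / x - 3 * math.asin(1.5 * x),
                           lambda x: math.sqrt(4 - 9 * x**2) / x**2, 0.3),
    "x/(3 + 4x)^(1/4)": (lambda x: (3 + 4 * x) ** 1.75 / 28.0 - (3 + 4 * x) ** 0.75 / 4.0,
                         lambda x: x / (3 + 4 * x) ** 0.25, 0.5),
    "1/(e^x - 1)": (lambda x: math.log((math.exp(x) - 1) / math.exp(x)),
                    lambda x: 1.0 / (math.exp(x) - 1), 0.5),
}

rel_errors = {}
for name, (F, f, x) in cases.items():
    rel_errors[name] = abs(central_diff(F, x) - f(x)) / abs(f(x))
    print(f"{name:22s} relative error {rel_errors[name]:.1e}")
```

Each relative error should be many orders of magnitude below 1. A wrong sign or coefficient in an antiderivative shows up immediately as an error of order 1.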

Partial Fractions Rational functions are of the type $f(x)/g(x)$, where $f(x)$ and $g(x)$ are polynomial expressions of degrees $m$ and $n$, respectively. If the degree of $f$ is equal to or higher than that of $g$ ($m \ge n$), first perform the algebraic division; the remainder will then be at least one degree lower than the denominator. Consider the following types:

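Type 1 Denominator reducible to unequal linear factors. For example,

$$\frac{1}{x^3 - x^2 - 4x + 4} = \frac{1}{(x + 2)(x - 2)(x - 1)} = \frac{A}{x + 2} + \frac{B}{x - 2} + \frac{C}{x - 1}$$

$$= \frac{A(x - 2)(x - 1) + B(x + 2)(x - 1) + C(x + 2)(x - 2)}{(x + 2)(x - 2)(x - 1)}$$

$$= \frac{x^2(A + B + C) + x(-3A + B) + (2A - 2B - 4C)}{(x + 2)(x - 2)(x - 1)}$$

Equate coefficients and solve for A, B, and C.

$$A + B + C = 0 \qquad -3A + B = 0 \qquad 2A - 2B - 4C = 1$$

$$A = \tfrac{1}{12}, \qquad B = \tfrac{1}{4}, \qquad C = -\tfrac{1}{3}$$

$$\frac{1}{x^3 - x^2 - 4x + 4} = \frac{1}{12(x + 2)} + \frac{1}{4(x - 2)} - \frac{1}{3(x - 1)}$$

Hence

$$\int \frac{dx}{x^3 - x^2 - 4x + 4} = \int \frac{dx}{12(x + 2)} + \int \frac{dx}{4(x - 2)} - \int \frac{dx}{3(x - 1)}$$

Parts An extremely useful formula for integration is the relation

$$d(uv) = u\,dv + v\,du$$

and
$$uv = \int u\,dv + \int v\,du$$

or
$$\int u\,dv = uv - \int v\,du$$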



No general rule can be given for splitting an integrand into the parts $u$ and $dv$; only experience guides the choice. The technique is particularly useful for trigonometric and exponential functions.

Example Find $\int xe^x\,dx$. Let

$$u = x \qquad dv = e^x\,dx$$

$$du = dx \qquad v = e^x$$

Therefore
$$\int xe^x\,dx = xe^x - \int e^x\,dx$$

$$= xe^x - e^x + c$$

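Example Find $\int e^x\sin x\,dx$. Let

$$u = e^x \qquad dv = \sin x\,dx$$

$$du = e^x\,dx \qquad v = -\cos x$$

$$\int e^x\sin x\,dx = -e^x\cos x + \int e^x\cos x\,dx$$

Again
$$u = e^x \qquad dv = \cos x\,dx$$

$$du = e^x\,dx \qquad v = \sin x$$

$$\int e^x\sin x\,dx = -e^x\cos x + e^x\sin x - \int e^x\sin x\,dx + c$$

Moving the integral on the right to the left side and dividing by 2,

$$\int e^x\sin x\,dx = \frac{e^x}{2}(\sin x - \cos x) + \frac{c}{2}$$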
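The two integration-by-parts results can be checked against direct numerical quadrature of the corresponding definite integrals. This is a minimal sketch using composite Simpson's rule; the `simpson` helper and the limits $[0, 1]$ and $[0, \pi]$ are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def F1(x):
    return x * math.exp(x) - math.exp(x)                      # antiderivative of x e^x

def F2(x):
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x))    # antiderivative of e^x sin x

num1 = simpson(lambda x: x * math.exp(x), 0.0, 1.0)
num2 = simpson(lambda x: math.exp(x) * math.sin(x), 0.0, math.pi)
print(num1, F1(1.0) - F1(0.0))        # both close to 1
print(num2, F2(math.pi) - F2(0.0))    # both close to (e^pi + 1)/2
```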

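Series Expansion When an explicit function cannot be found, the integration can sometimes be carried out by a series expansion.

Example Find $\int e^{-x^2}\,dx$. Since

$$e^{-x^2} = 1 - x^2 + \frac{x^4}{2!} - \frac{x^6}{3!} + \cdots$$

$$\int e^{-x^2}\,dx = \int dx - \int x^2\,dx + \int \frac{x^4}{2!}\,dx - \int \frac{x^6}{3!}\,dx + \cdots$$

$$= x - \frac{x^3}{3} + \frac{x^5}{5 \cdot 2!} - \frac{x^7}{7 \cdot 3!} + \cdots \qquad \text{for all } x$$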
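The term-by-term series can be summed directly. As an independent check, the sketch below relies on the known relation $\int_0^x e^{-t^2}\,dt = (\sqrt{\pi}/2)\operatorname{erf} x$ and Python's `math.erf`; the function name and the 40-term cutoff are illustrative assumptions.

```python
import math

def int_exp_neg_sq(x, terms=40):
    """Sum of the first `terms` terms of x - x^3/3 + x^5/(5 2!) - x^7/(7 3!) + ...,
    i.e., the term-by-term integral of exp(-t^2) from 0 to x."""
    return sum((-1) ** n * x ** (2 * n + 1) / ((2 * n + 1) * math.factorial(n))
               for n in range(terms))

for x in (0.5, 1.0, 2.0):
    print(x, int_exp_neg_sq(x), math.sqrt(math.pi) / 2 * math.erf(x))
```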

Definite Integral The concept and derivation of the definite integral are completely different from those for the indefinite integral; by definition they are different types of operations. However, as it turns out, the formal operation ∫ treats the integrand in the same way for both.

Consider the function $f(x) = 10 - 10e^{-2x}$. Define $x_1 = a$ and $x_n = b$, and suppose it is desired to compute the area between the curve and the coordinate axis $y = 0$, bounded by $x_1 = a$ and $x_n = b$. With a sufficiently large number of rectangles this area can be approximated as closely as desired by the sum

$$\sum_{i=1}^{n-1} f(\xi_i)(x_{i+1} - x_i) = f(\xi_1)(x_2 - a) + f(\xi_2)(x_3 - x_2) + \cdots + f(\xi_{n-1})(b - x_{n-1}), \qquad x_i \le \xi_i \le x_{i+1}$$

The definite integral of $f(x)$ is defined as

$$\int_a^b f(x)\,dx = \lim_{n\to\infty}\sum_{i=1}^{n-1} f(\xi_i)(x_{i+1} - x_i)$$

where the points $x_1, x_2, \ldots, x_n$ are equally spaced, so that every subinterval width tends to zero as $n \to \infty$. Thus, the value of a definite integral depends on the limits $a$, $b$ and on any selected variable coefficients in the function, but not on the dummy variable of integration $x$. Symbolically,

$$F(x) = \int f(x)\,dx \qquad \text{indefinite integral, where } dF/dx = f(x)$$

or
$$F(a, b) = \int_a^b f(x)\,dx \qquad \text{definite integral}$$

$$F(\alpha) = \int_a^b f(x, \alpha)\,dx$$
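The limit-of-sums definition can be seen at work for $f(x) = 10 - 10e^{-2x}$. In the sketch below, the limits $a = 0$, $b = 2$ and the midpoint choice of $\xi_i$ are illustrative assumptions; the exact value comes from the antiderivative $F(x) = 10x + 5e^{-2x}$.

```python
import math

def f(x):
    return 10.0 - 10.0 * math.exp(-2.0 * x)

def riemann_sum(f, a, b, n):
    """Sum of f(xi_i)(x_{i+1} - x_i) over n equally spaced points x_1 = a, ..., x_n = b,
    with xi_i taken at the midpoint of each of the n - 1 subintervals."""
    h = (b - a) / (n - 1)
    return sum(f(a + (i + 0.5) * h) * h for i in range(n - 1))

a, b = 0.0, 2.0                                      # illustrative limits
F = lambda x: 10.0 * x + 5.0 * math.exp(-2.0 * x)    # dF/dx = f(x)
exact = F(b) - F(a)                                  # 15 + 5 e^{-4}

errors = {}
for n in (5, 50, 500, 5000):
    approx = riemann_sum(f, a, b, n)
    errors[n] = abs(approx - exact)
    print(f"n = {n:5d}  sum = {approx:.8f}  error = {errors[n]:.2e}")
```

The error shrinks roughly a hundredfold for each tenfold increase in n, as expected for the midpoint choice of $\xi_i$.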

There are certain restrictions on the integration definition, “The function f(x) must be continuous in the finite interval (a, b) with at most a finite number of finite discontinuities,” which must be observed before integration formulas can be generally applied. Relaxing two of these restrictions gives rise to so-called improper integrals, which require special handling. These occur when

1. The limits of integration are not both finite, e.g., $\int_0^\infty e^{-x}\,dx$.

2. The function becomes infinite within the interval of integration, e.g., $\int_0^1 \dfrac{dx}{x}$.

Techniques for determining when integration is valid under these conditions are available in the references.

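Properties The fundamental theorem of calculus states

$$\int_a^b f(x)\,dx = F(b) - F(a) \qquad \text{where } dF(x)/dx = f(x)$$

Other properties of the definite integral are

$$\int_a^b c\,f(x)\,dx = c\int_a^b f(x)\,dx$$

$$\int_a^b [f_1(x) + f_2(x)]\,dx = \int_a^b f_1(x)\,dx + \int_a^b f_2(x)\,dx$$

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$$

$$\int_a^b f(x)\,dx = (b - a)f(\xi) \qquad \text{for some } \xi \text{ in } (a, b) \ (f \text{ continuous on } [a, b])$$

$$\frac{\partial}{\partial b}\int_a^b f(x)\,dx = f(b)$$

$$\frac{\partial}{\partial a}\int_a^b f(x)\,dx = -f(a)$$

$$\frac{dF(\alpha)}{d\alpha} = \int_a^b \frac{\partial f(x, \alpha)}{\partial\alpha}\,dx \qquad \text{if } a \text{ and } b \text{ are constant}$$

$$\int_a^b dx\int_c^d f(x, \alpha)\,d\alpha = \int_c^d d\alpha\int_a^b f(x, \alpha)\,dx \tag{3-66}$$

When $F(x) = \int_{a(x)}^{b(x)} f(x, y)\,dy$, the Leibniz rule gives

$$\frac{dF}{dx} = f[x, b(x)]\frac{db}{dx} - f[x, a(x)]\frac{da}{dx} + \int_{a(x)}^{b(x)} \frac{\partial f}{\partial x}\,dy$$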
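The Leibniz rule can be verified numerically by comparing a finite-difference derivative of $F(x)$ with the right-hand side of the rule. The integrand $f(x, y) = \sin xy$ and the limits $a(x) = x$, $b(x) = x^2$ below are hypothetical choices made only for illustration, and Simpson's rule stands in for the exact integrals.

```python
import math

def simpson(g, lo, hi, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n))
    return s * h / 3.0

# Hypothetical example: f(x, y) = sin(xy), a(x) = x, b(x) = x**2.
def f(x, y):
    return math.sin(x * y)

def df_dx(x, y):
    return y * math.cos(x * y)          # partial derivative of f with respect to x

def a(x):
    return x

def da(x):
    return 1.0

def b(x):
    return x**2

def db(x):
    return 2.0 * x

def F(x):
    return simpson(lambda y: f(x, y), a(x), b(x))

def leibniz(x):
    """Right-hand side of the Leibniz rule."""
    return (f(x, b(x)) * db(x) - f(x, a(x)) * da(x)
            + simpson(lambda y: df_dx(x, y), a(x), b(x)))

x0, h = 1.3, 1e-4
finite_diff = (F(x0 + h) - F(x0 - h)) / (2 * h)
print(finite_diff, leibniz(x0))
```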

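Example Find $\int_0^2 \dfrac{dx}{(x - 1)^2}$. Direct application of the formula would yield the incorrect value

$$\int_0^2 \frac{dx}{(x - 1)^2} = -\frac{1}{x - 1}\bigg|_0^2 = -2$$

It should be noted that $f(x) = 1/(x - 1)^2$ becomes unbounded as $x \to 1$, so by case 2 above the integral is improper; it in fact diverges and hence is said not to exist. (A negative value is impossible in any case, since the integrand is positive everywhere.)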
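One way to see that the value −2 is meaningless is to cut a small gap of half-width $\varepsilon$ around $x = 1$ and integrate the two proper pieces exactly. The sketch below (the function name `truncated` is illustrative) shows that the result, $2/\varepsilon - 2$, grows without bound as $\varepsilon \to 0$.

```python
def truncated(eps):
    """Integral of 1/(x - 1)**2 over [0, 1 - eps] plus [1 + eps, 2], each proper piece
    evaluated exactly with the antiderivative -1/(x - 1); the result is 2/eps - 2."""
    G = lambda x: -1.0 / (x - 1.0)
    return (G(1.0 - eps) - G(0.0)) + (G(2.0) - G(1.0 + eps))

naive = (-1.0 / (2.0 - 1.0)) - (-1.0 / (0.0 - 1.0))   # the formal (incorrect) value
print("naive:", naive)
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, truncated(eps))
```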

Methods of Integration All the methods of integration available for the indefinite integral can be used for definite integrals. In addition, several others are available for the latter integrals and are indicated below.

Change of Variable This substitution is basically the same as previously indicated for indefinite integrals. However, for definite integrals, the limits of integration must also be changed: i.e., for $x = \varphi(t)$,

$$\int_a^b f(x)\,dx = \int_{t_0}^{t_1} f[\varphi(t)]\,\varphi'(t)\,dt$$

where $t = t_0$ when $x = a$, and $t = t_1$ when $x = b$.



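Example Find $\int_0^4 \sqrt{16 - x^2}\,dx$. Let

$$x = 4\sin\theta \quad (x = 0,\ \theta = 0) \qquad dx = 4\cos\theta\,d\theta \quad (x = 4,\ \theta = \pi/2)$$

Then
$$\int_0^4 \sqrt{16 - x^2}\,dx = 16\int_0^{\pi/2}\cos^2\theta\,d\theta = 16\left[\tfrac{1}{2}\theta + \tfrac{1}{4}\sin 2\theta\right]_0^{\pi/2} = 4\pi$$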
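Besides giving a closed form, the change of variable also helps numerically: $\sqrt{16 - x^2}$ has an infinite slope at $x = 4$, whereas $16\cos^2\theta$ is smooth. The sketch below applies the same composite Simpson's rule (an illustrative helper, with an assumed 200 subintervals) to both forms.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n))
    return s * h / 3.0

exact = 4 * math.pi
# Original variable: the integrand has an infinite slope at x = 4.
in_x = simpson(lambda x: math.sqrt(max(16.0 - x * x, 0.0)), 0.0, 4.0)
# After x = 4 sin(theta): a smooth integrand on [0, pi/2].
in_theta = simpson(lambda t: 16.0 * math.cos(t) ** 2, 0.0, math.pi / 2)
print(f"error in x:     {abs(in_x - exact):.1e}")
print(f"error in theta: {abs(in_theta - exact):.1e}")
```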

Integration It is sometimes useful to generate a double integral to solve a problem. By this approach, the interchange of the order of integration expressed by Eq. (3-66) can be used.

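Example Find $\int_0^1 \dfrac{x^b - x^a}{\ln x}\,dx$.

Consider
$$\int_0^1 x^\alpha\,dx = \frac{1}{\alpha + 1} \qquad (\alpha > -1)$$

Then multiplying both sides by $d\alpha$ and integrating between $a$ and $b$,

$$\int_a^b d\alpha\int_0^1 x^\alpha\,dx = \int_a^b \frac{d\alpha}{\alpha + 1} = \ln\frac{b + 1}{a + 1}$$

But also
$$\int_a^b d\alpha\int_0^1 x^\alpha\,dx = \int_0^1 dx\int_a^b x^\alpha\,d\alpha = \int_0^1 \frac{x^b - x^a}{\ln x}\,dx$$

Therefore
$$\int_0^1 \frac{x^b - x^a}{\ln x}\,dx = \ln\frac{b + 1}{a + 1}$$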
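The result can be spot-checked numerically for particular exponents. The choice $a = 1$, $b = 2$ below is an illustrative assumption (any $a, b > -1$ would do). The midpoint rule is used so that the integrand is never evaluated at $x = 0$ or at the removable singularity $x = 1$.

```python
import math

def integrand(x, a, b):
    """(x**b - x**a)/ln x; the removable singularity at x = 1 has the limit b - a."""
    if x == 1.0:
        return b - a
    return (x**b - x**a) / math.log(x)

def midpoint(g, lo, hi, n=20000):
    """Composite midpoint rule; never evaluates g at the endpoints lo and hi."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

a, b = 1.0, 2.0    # illustrative exponents (assumed)
numeric = midpoint(lambda x: integrand(x, a, b), 0.0, 1.0)
print(numeric, math.log((b + 1) / (a + 1)))
```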


INFINITE SERIES

REFERENCES: de Bruijn, N. G., Asymptotic Methods in Analysis, Dover, New York (1981); Folland, G. B., Advanced Calculus, Prentice-Hall, Upper Saddle River, N.J. (2002); Gradshteyn, I. S., and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic, New York (2000); Kaplan, W., Advanced Calculus, 5th ed., Addison-Wesley, Redwood City, Calif. (2003).

DEFINITIONS

A succession of numbers or terms that are formed according to some definite rule is called a sequence. The indicated sum of the terms of a sequence is called a series. A series of the form $a_0 + a_1(x - c) + a_2(x - c)^2 + \cdots + a_n(x - c)^n + \cdots$ is called a power series.

Consider the sum of a finite number of terms in the geometric series (a special case of a power series).

$$S_n = a + ar + ar^2 + ar^3 + \cdots + ar^{n-1} \tag{3-67}$$

For any number of terms $n$, the sum equals

$$S_n = a\,\frac{1 - r^n}{1 - r} \qquad r \ne 1$$

(for $r = 1$, $S_n = na$). In this form, the geometric series is assumed finite.

In the form of Eq. (3-67), it can further be defined that the terms in the series be nonending and therefore an infinite series.

$$S = a + ar + ar^2 + \cdots + ar^n + \cdots \tag{3-68}$$

However, the defined sum of the terms [Eq. (3-67)]

$$S_n = a\,\frac{1 - r^n}{1 - r} \qquad r \ne 1$$

while valid for any finite value of $r$ and $n$, now takes on a different interpretation. In this sense it is necessary to consider the limit of $S_n$ as $n$ increases indefinitely:

$$S = \lim_{n\to\infty} S_n = a\lim_{n\to\infty}\frac{1 - r^n}{1 - r}$$

For this, the infinite series is said to converge if $S_n$ approaches a fixed finite value as $n$ approaches infinity. Otherwise, the series is divergent.

On this basis an analysis of

$$S = a\lim_{n\to\infty}\frac{1 - r^n}{1 - r}$$

shows that if $r$ is less than 1 but greater than −1, then $r^n \to 0$ and the infinite series converges to $S = a/(1 - r)$. For values outside the range $-1 < r < 1$, the series is divergent because the limit does not exist. The range $-1 < r < 1$ is called the region of convergence. (We assume $a \ne 0$.)
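The closed form for $S_n$ and the limit $a/(1 - r)$ are easy to compare with brute-force summation. The function names and the values $a = 3$, $r = -0.6$ below are illustrative assumptions.

```python
def geometric_partial_sum(a, r, n):
    """S_n = a + a r + ... + a r**(n - 1), summed term by term."""
    return sum(a * r**k for k in range(n))

def geometric_closed_form(a, r, n):
    """S_n = a (1 - r**n) / (1 - r), valid for r != 1."""
    return a * (1 - r**n) / (1 - r)

a, r = 3.0, -0.6                      # illustrative values with -1 < r < 1
for n in (5, 20, 80):
    print(n, geometric_partial_sum(a, r, n), geometric_closed_form(a, r, n))
print("a/(1 - r) =", a / (1 - r))     # the limit S of Eq. (3-68)
```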

There are also two types of convergent series. Consider the new series

$$S = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + (-1)^{n+1}\frac{1}{n} + \cdots \tag{3-69}$$

It can be shown that the series (3-69) does converge, to the value $S = \ln 2$. However, if each term is replaced by its absolute value, the series becomes unbounded and therefore divergent (unbounded divergent):

$$S = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \cdots \tag{3-70}$$

In this case the series (3-69) is defined as a conditionally convergent series. If the replacement series of absolute values also converges, the series is defined to converge absolutely.

Series (3-69) is further defined as an alternating series, while series (3-70) is referred to as a positive series.
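The difference between conditional and absolute convergence shows up clearly in partial sums. In the sketch below (names are illustrative), the partial sums of series (3-69) approach $\ln 2$ with an error no larger than the first omitted term, $1/(n + 1)$, while those of series (3-70) keep growing, roughly like $\ln n$.

```python
import math

def partial_sums(n):
    """Partial sums after n terms of series (3-69) and of series (3-70)."""
    alternating = sum((-1) ** (k + 1) / k for k in range(1, n + 1))
    positive = sum(1.0 / k for k in range(1, n + 1))
    return alternating, positive

for n in (10, 1000, 100000):
    alt, pos = partial_sums(n)
    print(f"n = {n:6d}  (3-69) - ln 2 = {alt - math.log(2):+.2e}  (3-70) = {pos:.4f}")
```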

OPERATIONS WITH INFINITE SERIES

1. The convergence or divergence of an infinite series is unaffected by the removal of a finite number of finite terms. This is a trivial theorem but useful to remember, especially when using the comparison test to be described in the subsection “Tests for Convergence and Divergence.”

2. If a series is conditionally convergent, its sum can be made to have any arbitrary value by a suitable rearrangement of the series; it can in fact be made divergent or oscillatory (Riemann’s theorem).

3. A series of positive terms, if convergent, has a sum independent of the order of its terms; but if divergent, it remains divergent however its terms are rearranged.

4. An oscillatory series can always be made to converge by grouping terms.

5. A power series can be inverted, provided the first-degree term is not zero. Given

$$y = b_1x + b_2x^2 + b_3x^3 + b_4x^4 + b_5x^5 + b_6x^6 + b_7x^7 + \cdots$$

then
$$x = B_1y + B_2y^2 + B_3y^3 + B_4y^4 + B_5y^5 + B_6y^6 + B_7y^7 + \cdots$$

where
$$B_1 = 1/b_1 \qquad B_2 = -b_2/b_1^3 \qquad B_3 = (1/b_1^5)(2b_2^2 - b_1b_3)$$

$$B_4 = (1/b_1^7)(5b_1b_2b_3 - b_1^2b_4 - 5b_2^3)$$

Additional coefficients are available in the references.

6. Two series may be added or subtracted term by term provided each is a convergent series. The joint sum is equal to the sum (or difference) of the individual sums.



7. The sum of two divergent series can be convergent. However, the sum of a convergent series and a divergent series must be divergent.

8. A power series may be integrated term by term to represent the integral of the function within an interval of the region of convergence. If $f(x) = a_0 + a_1x + a_2x^2 + \cdots$, then

$$\int_{x_1}^{x_2} f(x)\,dx = \int_{x_1}^{x_2} a_0\,dx + \int_{x_1}^{x_2} a_1x\,dx + \int_{x_1}^{x_2} a_2x^2\,dx + \cdots$$

9. A power series may be differentiated term by term and represents the function $df(x)/dx$ within the same radius of convergence as $f(x)$ (convergence at the endpoints of the interval may be lost).
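The inversion coefficients of item 5 can be checked with exact rational arithmetic on series whose inverses are known in closed form. The function name is illustrative; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as Fr

def inversion_coefficients(b1, b2, b3, b4):
    """B1..B4 of x = B1 y + B2 y^2 + B3 y^3 + B4 y^4 + ... for the series
    y = b1 x + b2 x^2 + b3 x^3 + b4 x^4 + ... (item 5; requires b1 != 0)."""
    if b1 == 0:
        raise ValueError("the first-degree coefficient b1 must be nonzero")
    B1 = 1 / b1
    B2 = -b2 / b1**3
    B3 = (2 * b2**2 - b1 * b3) / b1**5
    B4 = (5 * b1 * b2 * b3 - b1**2 * b4 - 5 * b2**3) / b1**7
    return B1, B2, B3, B4

# y = e^x - 1 = x + x^2/2! + x^3/3! + x^4/4! + ...; inverse is ln(1 + y) = y - y^2/2 + y^3/3 - y^4/4 + ...
print(inversion_coefficients(Fr(1), Fr(1, 2), Fr(1, 6), Fr(1, 24)))
# y = x + x^2; inverse is (-1 + sqrt(1 + 4y))/2 = y - y^2 + 2y^3 - 5y^4 + ...
print(inversion_coefficients(Fr(1), Fr(1), Fr(0), Fr(0)))
```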

TESTS FOR CONVERGENCE AND DIVERGENCE

In general, the problem of determining whether a given series will converge can require a great deal of ingenuity and resourcefulness. There is no all-inclusive test that can be applied to all series. The only recourse is to apply one or more of the developed theorems in an attempt to ascertain the convergence or divergence of the series under study. The following tests are given in relative order of effectiveness. For examples, see references on advanced calculus.

1. Comparison Test. A series will converge if the absolute value of each term (with the possible exception of a finite number of terms) is less than the corresponding term of a known convergent series of positive terms. Similarly, a positive series is divergent if it is termwise larger than a known divergent series of positive terms.

2. nth-Term Test. A series is divergent if the nth term of the series does not approach zero as n becomes increasingly large.

3. Ratio Test. If the absolute value of the ratio of the (n + 1)th term to the nth term, as n becomes unbounded, approaches

a. A number less than 1, the series is absolutely convergent

b. A number greater than 1, the series is divergent

c. A number equal to 1, the test is inconclusive

4. Alternating-Series Leibniz Test. If the terms of a series are alternately positive and negative and never increase in absolute value, the series will converge, provided that the terms tend to zero as a limit. (The convergence may be only conditional, as for series 3-69.)

5. Cauchy’s Root Test. If the nth root of the absolute value of the nth term, as n becomes unbounded, approaches

a. A number less than 1, the series is absolutely convergent

b. A number greater than 1, the series is divergent

c. A number equal to 1, the test is inconclusive

6. Maclaurin’s Integral Test. Suppose $\sum a_n$ is a series of positive terms and $f$ is a continuous decreasing function such that $f(x) \ge 0$ for $1 \le x < \infty$ and $f(n) = a_n$. Then the series and the improper integral $\int_1^\infty f(x)\,dx$ either both converge or both diverge.

SERIES SUMMATION AND IDENTITIES

Sums for the First n Numbers to Integer Powers

$$\sum_{j=1}^{n} j = \frac{n(n + 1)}{2} = 1 + 2 + 3 + 4 + \cdots + n$$

$$\sum_{j=1}^{n} j^2 = \frac{n(n + 1)(2n + 1)}{6} = 1^2 + 2^2 + 3^2 + 4^2 + \cdots + n^2$$

$$\sum_{j=1}^{n} j^3 = \frac{n^2(n + 1)^2}{4} = 1^3 + 2^3 + 3^3 + \cdots + n^3$$

Arithmetic Progression

$$\sum_{k=1}^{n} [a + (k - 1)d] = a + (a + d) + (a + 2d) + (a + 3d) + \cdots + [a + (n - 1)d] = na + \frac{1}{2}n(n - 1)d$$

Geometric Progression

$$\sum_{j=1}^{n} ar^{j-1} = a + ar + ar^2 + ar^3 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r} \qquad r \ne 1$$
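The closed forms above can be confirmed for many values of $n$ with exact integer arithmetic. In this sketch the parameter values $a$, $d$, $r$ are arbitrary illustrative choices, and the geometric identity is multiplied through by $1 - r$ to avoid division.

```python
def check_closed_forms(n, a=3, d=4, r=2):
    """Compare direct sums with the closed forms above, using exact integer arithmetic."""
    js = range(1, n + 1)
    return all((
        sum(js) == n * (n + 1) // 2,
        sum(j**2 for j in js) == n * (n + 1) * (2 * n + 1) // 6,
        sum(j**3 for j in js) == (n * (n + 1)) ** 2 // 4,
        sum(a + (k - 1) * d for k in js) == n * a + n * (n - 1) * d // 2,
        # geometric progression, multiplied through by (1 - r) to stay in integers
        sum(a * r ** (j - 1) for j in js) * (1 - r) == a * (1 - r**n),
    ))

print(all(check_closed_forms(n) for n in range(1, 50)))   # True
```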

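Harmonic Progression

$$\sum_{k=0}^{n} \frac{1}{a + kd} = \frac{1}{a} + \frac{1}{a + d} + \frac{1}{a + 2d} + \frac{1}{a + 3d} + \frac{1}{a + 4d} + \cdots + \frac{1}{a + nd}$$

The reciprocals of the terms of the arithmetic-progression series are called a harmonic progression. No general summation formula in elementary closed form is available for this series.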

Binomial Series (see also Elementary Algebra)

$$(1 \pm x)^n = 1 \pm nx + \frac{n(n - 1)}{2!}x^2 \pm \frac{n(n - 1)(n - 2)}{3!}x^3 + \cdots \qquad (x^2 < 1)$$

Taylor’s Series

$$f(h + x) = f(h) + xf'(h) + \frac{x^2}{2!}f''(h) + \frac{x^3}{3!}f'''(h) + \cdots$$

or
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \frac{f'''(x_0)}{3!}(x - x_0)^3 + \cdots$$

Example Find a series expansion for $f(x) = \ln(1 + x)$ about $x_0 = 0$.

$$f'(x) = (1 + x)^{-1}, \quad f''(x) = -(1 + x)^{-2}, \quad f'''(x) = 2(1 + x)^{-3}, \text{ etc.}$$

thus
$$f(0) = 0, \quad f'(0) = 1, \quad f''(0) = -1, \quad f'''(0) = 2, \text{ etc.}$$

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots + (-1)^{n+1}\frac{x^n}{n} + \cdots$$

which converges for $-1 < x \le 1$.

Maclaurin’s Series

$$f(x) = f(0) + xf'(0) + \frac{x^2}{2!}f''(0) + \frac{x^3}{3!}f'''(0) + \cdots$$

This is simply a special case of Taylor’s series when $h$ is set to zero.

Exponential Series

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots \qquad -\infty < x < \infty$$

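Logarithmic Series

$$\ln x = \frac{x - 1}{x} + \frac{1}{2}\left(\frac{x - 1}{x}\right)^2 + \frac{1}{3}\left(\frac{x - 1}{x}\right)^3 + \cdots \qquad (x > \tfrac{1}{2})$$

$$\ln x = 2\left[\frac{x - 1}{x + 1} + \frac{1}{3}\left(\frac{x - 1}{x + 1}\right)^3 + \cdots\right] \qquad (x > 0)$$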

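Trigonometric Series*

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \qquad -\infty < x < \infty$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots \qquad -\infty < x < \infty$$

$$\sin^{-1} x = x + \frac{x^3}{6} + \frac{1}{2}\cdot\frac{3}{4}\cdot\frac{x^5}{5} + \frac{1}{2}\cdot\frac{3}{4}\cdot\frac{5}{6}\cdot\frac{x^7}{7} + \cdots \qquad (x^2 < 1)$$

$$\tan^{-1} x = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots \qquad (x^2 < 1)$$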
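Truncated versions of these Maclaurin series can be compared with library values. The sketch below (the function name and the 20-term cutoff are illustrative assumptions) evaluates each series at $x = 0.5$, well inside the region of convergence of the $\tan^{-1} x$ series.

```python
import math

def maclaurin(kind, x, terms=20):
    """Partial sum (first `terms` terms) of one of the Maclaurin series above."""
    if kind == "exp":
        return sum(x**n / math.factorial(n) for n in range(terms))
    if kind == "sin":
        return sum((-1) ** n * x ** (2 * n + 1) / math.factorial(2 * n + 1) for n in range(terms))
    if kind == "cos":
        return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n) for n in range(terms))
    if kind == "atan":  # converges for x**2 < 1, and only slowly as |x| approaches 1
        return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))
    raise ValueError(f"unknown series: {kind}")

pairs = (("exp", math.exp), ("sin", math.sin), ("cos", math.cos), ("atan", math.atan))
for kind, ref in pairs:
    print(kind, maclaurin(kind, 0.5), ref(0.5))
```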

Taylor Series The Taylor series for a function of two variables, expanded about the point $(x_0, y_0)$, is

$$f(x, y) = f(x_0, y_0) + \left.\frac{\partial f}{\partial x}\right|_{x_0, y_0}(x - x_0) + \left.\frac{\partial f}{\partial y}\right|_{x_0, y_0}(y - y_0)$$

$$+ \frac{1}{2!}\left[\left.\frac{\partial^2 f}{\partial x^2}\right|_{x_0, y_0}(x - x_0)^2 + 2\left.\frac{\partial^2 f}{\partial x\,\partial y}\right|_{x_0, y_0}(x - x_0)(y - y_0) + \left.\frac{\partial^2 f}{\partial y^2}\right|_{x_0, y_0}(y - y_0)^2\right] + \cdots$$

* The tan x series has awkward coefficients and should be computed as $\tan x = (\text{sign})\,\dfrac{\sin x}{\sqrt{1 - \sin^2 x}}$, with the sign taken as that of $\cos x$.

Partial Sums of Infinite Series, and How They Grow Calculus textbooks devote much space to tests for convergence and divergence of series that are of little practical value, since a convergent series either converges rapidly, in which case almost any test (among those presented in the preceding subsections) will do; or it converges slowly, in which case it is not going to be of much use unless there is some way to get at its sum without adding up an unreasonable number of terms. To find out, as accurately as possible, how fast a convergent series converges and how fast a divergent series diverges, see Boas, R. P., Jr., Am. Math. Mon. 84: 237–258 (1977).


COMPLEX VARIABLES


Numbers of the form $z = x + iy$, where $x$ and $y$ are real and $i^2 = -1$, are called complex numbers. The numbers $z = x + iy$ are representable in the plane as shown in Fig. 3-45. The following definitions and terminology are used:

1. Distance OP = r = modulus of z, written |z|: $|z| = \sqrt{x^2 + y^2}$.

2. x is the real part of z.

3. y is the imaginary part of z.

4. The angle θ, 0 ≤ θ < 2π, measured counterclockwise from the positive x axis to OP is the argument of z. θ = arctan y/x = arcsin y/r = arccos x/r if x ≠ 0 (with the quadrant fixed by the signs of x and y); θ = π/2 if x = 0 and y > 0, and θ = 3π/2 if x = 0 and y < 0.

5. The numbers r, θ are the polar coordinates of z.

6. $\bar z = x - iy$ is the complex conjugate of z.

ALGEBRA

Let $z_1 = x_1 + iy_1$, $z_2 = x_2 + iy_2$.

Equality $z_1 = z_2$ if and only if $x_1 = x_2$ and $y_1 = y_2$.

Addition $z_1 + z_2 = (x_1 + x_2) + i(y_1 + y_2)$.

Subtraction $z_1 - z_2 = (x_1 - x_2) + i(y_1 - y_2)$.

Multiplication $z_1 \cdot z_2 = (x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1)$.

Division $\dfrac{z_1}{z_2} = \dfrac{x_1x_2 + y_1y_2}{x_2^2 + y_2^2} + i\,\dfrac{x_2y_1 - x_1y_2}{x_2^2 + y_2^2}$, $z_2 \ne 0$.

SPECIAL OPERATIONS

$z\bar z = x^2 + y^2 = |z|^2$; $\overline{z_1 \pm z_2} = \bar z_1 \pm \bar z_2$; $\bar{\bar z} = z$; $\overline{z_1z_2} = \bar z_1\bar z_2$; $|z_1 \cdot z_2| = |z_1| \cdot |z_2|$; $\arg(z_1 \cdot z_2) = \arg z_1 + \arg z_2$; $\arg(z_1/z_2) = \arg z_1 - \arg z_2$ (both modulo $2\pi$); $i^{4n} = 1$ for $n$ any integer; $i^{2n} = -1$ where $n$ is any odd integer; $z + \bar z = 2x$; $z - \bar z = 2iy$.

Every complex quantity can be expressed in the form $x + iy$.

TRIGONOMETRIC REPRESENTATION

By referring to Fig. 3-45, there results $x = r\cos\theta$, $y = r\sin\theta$, so that $z = x + iy = r(\cos\theta + i\sin\theta)$, which is called the polar form of the complex number. $\cos\theta + i\sin\theta = e^{i\theta}$. Hence $z = x + iy = re^{i\theta}$ and $\bar z = x - iy = re^{-i\theta}$. Two important results from this are $\cos\theta = (e^{i\theta} + e^{-i\theta})/2$ and $\sin\theta = (e^{i\theta} - e^{-i\theta})/2i$. Let $z_1 = r_1e^{i\theta_1}$, $z_2 = r_2e^{i\theta_2}$. This form is convenient for multiplication, for $z_1z_2 = r_1r_2e^{i(\theta_1 + \theta_2)}$, and for division, for $z_1/z_2 = (r_1/r_2)e^{i(\theta_1 - \theta_2)}$, $z_2 \ne 0$.

POWERS AND ROOTS

If $n$ is a positive integer, $z^n = (re^{i\theta})^n = r^ne^{in\theta} = r^n(\cos n\theta + i\sin n\theta)$. If $n$ is a positive integer,

$$z^{1/n} = r^{1/n}e^{i[(\theta + 2k\pi)/n]} = r^{1/n}\left[\cos\frac{\theta + 2k\pi}{n} + i\sin\frac{\theta + 2k\pi}{n}\right]$$

and selecting values of $k = 0, 1, 2, 3, \ldots, n - 1$ gives the $n$ distinct values of $z^{1/n}$. The $n$ roots of a complex quantity are uniformly spaced around a circle of radius $r^{1/n}$ in the complex plane, in a symmetric fashion.

Example Find the three cube roots of −8. Here $r = 8$, $\theta = \pi$. The roots are $z_0 = 2(\cos\pi/3 + i\sin\pi/3) = 1 + i\sqrt{3}$, $z_1 = 2(\cos\pi + i\sin\pi) = -2$, $z_2 = 2(\cos 5\pi/3 + i\sin 5\pi/3) = 1 - i\sqrt{3}$.
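The root formula translates directly into code with Python's `cmath` module. Note that `cmath.phase` returns the argument in $(-\pi, \pi]$ rather than the $0 \le \theta < 2\pi$ convention used here; since $k$ runs over $0, \ldots, n - 1$, the same set of roots results. The function name is illustrative.

```python
import cmath
import math

def nth_roots(w, n):
    """The n values of w**(1/n): r**(1/n) * exp(i (theta + 2 k pi)/n), k = 0, ..., n - 1."""
    r, theta = abs(w), cmath.phase(w)
    return [r ** (1.0 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n) for k in range(n)]

roots = nth_roots(-8, 3)
for z in roots:
    print(f"{z.real:+.6f} {z.imag:+.6f}i   z**3 = {z**3:.6f}")
```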

ELEMENTARY COMPLEX FUNCTIONS

Polynomials A polynomial in $z$, $a_nz^n + a_{n-1}z^{n-1} + \cdots + a_0$, where $n$ is a positive integer, is simply a sum of complex numbers times integral powers of $z$, which have already been defined. Every polynomial of degree $n$ has precisely $n$ complex roots, provided each multiple root of multiplicity $m$ is counted $m$ times.

Exponential Functions The exponential function $e^z$ is defined by the equation $e^z = e^{x + iy} = e^x \cdot e^{iy} = e^x(\cos y + i\sin y)$. Properties: $e^0 = 1$; $e^{z_1} \cdot e^{z_2} = e^{z_1 + z_2}$; $e^{z_1}/e^{z_2} = e^{z_1 - z_2}$; $e^{z + 2k\pi i} = e^z$.

Trigonometric Functions $\sin z = (e^{iz} - e^{-iz})/2i$; $\cos z = (e^{iz} + e^{-iz})/2$; $\tan z = \sin z/\cos z$; $\cot z = \cos z/\sin z$; $\sec z = 1/\cos z$; $\csc z = 1/\sin z$. Fundamental identities for these functions are the same as their real counterparts. Thus $\cos^2 z + \sin^2 z = 1$, $\cos(z_1 \pm z_2) = \cos z_1\cos z_2 \mp \sin z_1\sin z_2$, $\sin(z_1 \pm z_2) = \sin z_1\cos z_2 \pm \cos z_1\sin z_2$. The sine and cosine of $z$ are periodic functions of period $2\pi$; thus $\sin(z + 2\pi) = \sin z$. For computation purposes $\sin z = \sin(x + iy) = \sin x\cosh y + i\cos x\sinh y$, where $\sin x$, $\cosh y$, etc., are the real trigonometric and hyperbolic functions. Similarly, $\cos z = \cos x\cosh y - i\sin x\sinh y$. If $x = 0$ in the results given, $\cos iy = \cosh y$, $\sin iy = i\sinh y$.

Example Find all solutions of $\sin z = 3$. From previous data $\sin z = \sin x\cosh y + i\cos x\sinh y = 3$. Equating real and imaginary parts, $\sin x\cosh y = 3$, $\cos x\sinh y = 0$. The second equation can hold for $y = 0$ or for $x = (2n + 1)\pi/2$, $n = 0, \pm 1, \pm 2, \ldots$. If $y = 0$, $\cosh 0 = 1$ and $\sin x = 3$ is impossible for real $x$. Therefore, $x = (2n + 1)\pi/2$. However, $\sin x = -1$ at $x = 3\pi/2, 7\pi/2, \ldots$, while $\cosh y \ge 1$, so $\sin x$ must equal $+1$. Hence $x = \pi/2, 5\pi/2, \ldots$, that is, $x = (4n + 1)\pi/2$, and then $\cosh y = 3$, so $y = \pm\cosh^{-1} 3$. The solution is $z = [(4n + 1)\pi]/2 \pm i\cosh^{-1} 3$, $n = 0, \pm 1, \pm 2, \ldots$.

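Example Find all solutions of $e^z = -i$. $e^z = e^x(\cos y + i\sin y) = -i$. Equating real and imaginary parts gives $e^x\cos y = 0$, $e^x\sin y = -1$. From the first, $y$ is an odd multiple of $\pi/2$: $y = \pi/2, 3\pi/2, \ldots$. But $e^x > 0$, so the second requires $\sin y = -1$. Therefore, $y = 3\pi/2, 7\pi/2, -\pi/2, \ldots$, and then $e^x = 1$, so $x = 0$. The solution is $z = i[(4n + 3)\pi]/2$, $n = 0, \pm 1, \pm 2, \ldots$.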
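Both results are easy to confirm by substitution with `cmath`. The sketch below checks a few values of $n$ and both signs of $\cosh^{-1} 3$; the ranges tested are illustrative.

```python
import cmath
import math

y = math.acosh(3.0)   # cosh^{-1} 3 = ln(3 + sqrt(8))
for n in (-1, 0, 1):
    for sign in (1, -1):
        z = (4 * n + 1) * math.pi / 2 + sign * 1j * y
        print(f"sin({z:.4f}) = {cmath.sin(z):.6f}")   # 3 to within rounding error

for n in (-1, 0, 1):
    z = 1j * (4 * n + 3) * math.pi / 2
    print(f"exp({z:.4f}) = {cmath.exp(z):.6f}")       # -i to within rounding error
```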

Two important facets of these functions should be recognized. First, unlike its real counterpart, $\sin z$ is unbounded; second, $e^z$ takes all complex values except 0.


FIG. 3-45 Complex plane.


Hyperbolic Functions $\sinh z = (e^z - e^{-z})/2$; $\cosh z = (e^z + e^{-z})/2$; $\tanh z = \sinh z/\cosh z$; $\coth z = \cosh z/\sinh z$; $\operatorname{csch} z = 1/\sinh z$; $\operatorname{sech} z = 1/\cosh z$. Identities are: $\cosh^2 z - \sinh^2 z = 1$; $\sinh(z_1 + z_2) = \sinh z_1\cosh z_2 + \cosh z_1\sinh z_2$; $\cosh(z_1 + z_2) = \cosh z_1\cosh z_2 + \sinh z_1\sinh z_2$; $\cosh z + \sinh z = e^z$; $\cosh z - \sinh z = e^{-z}$. The hyperbolic sine and hyperbolic cosine are periodic functions with the imaginary period $2\pi i$. That is, $\sinh(z + 2\pi i) = \sinh z$.

Logarithms The logarithm of $z$ is $\log z = \log|z| + i(\theta + 2n\pi)$, where $\log|z|$ is taken to the base $e$ and $\theta$ is the principal argument of $z$, that is, the particular argument lying in the interval $0 \le \theta < 2\pi$. The logarithm of $z$ is infinitely many valued. If $n = 0$, the resulting logarithm is called the principal value. The familiar laws $\log z_1z_2 = \log z_1 + \log z_2$, $\log z_1/z_2 = \log z_1 - \log z_2$, and $\log z^n = n\log z$ hold for the principal value only up to an integer multiple of $2\pi i$; the first two hold exactly when each side is interpreted as its full set of values.

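General powers of $z$ are defined by $z^\alpha = e^{\alpha\log z}$. Since $\log z$ is infinitely many valued, so too is $z^\alpha$, unless $\alpha$ is a rational number: for $\alpha = p/q$ in lowest terms, $z^\alpha$ has exactly $q$ distinct values (a single value when $\alpha$ is an integer).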

DeMoivre’s formula can be derived from properties of $e^z$.

$$z^n = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta)$$

Thus
$$(\cos\theta + i\sin\theta)^n = \cos n\theta + i\sin n\theta$$

COMPLEX FUNCTIONS (ANALYTIC)

In the real-number system, $a$ greater than $b$ ($a > b$) and $b$ less than $c$ ($b < c$) define an order relation. These relations have no meaning for complex numbers; the absolute value is used for ordering. Some important relations follow: $|z| \ge |x|$; $|z| \ge |y|$; $|z_1 + z_2| \le |z_1| + |z_2|$; $|z_1 - z_2| \ge \big||z_1| - |z_2|\big|$; $|z| \ge (|x| + |y|)/\sqrt{2}$. Parts of the complex plane, commonly called regions or domains, are described by using inequalities.

Example $|z - 3| \le 5$. This is equivalent to $(x - 3)^2 + y^2 \le 25$, which is the set of all points within and on the circle centered at $x = 3$, $y = 0$ of radius 5.

Example $|z - 1| \le x$ (squaring both sides, $(x - 1)^2 + y^2 \le x^2$) represents the set of all points inside and on the parabola $2x = y^2 + 1$ or, equivalently, $2x \ge y^2 + 1$.

Functions of a Complex Variable If $z = x + iy$, $w = u + iv$, and if for each value of $z$ in some region of the complex plane one or more values of $w$ are defined, then $w$ is said to be a function of $z$, $w = f(z)$. Some of these functions have already been discussed, e.g., $\sin z$, $\log z$. All functions are reducible to the form $w = u(x, y) + iv(x, y)$, where $u$, $v$ are real functions of the real variables $x$ and $y$.

Example $z^3 = (x + iy)^3 = x^3 + 3x^2(iy) + 3x(iy)^2 + (iy)^3 = (x^3 - 3xy^2) + i(3x^2y - y^3)$.

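Differentiation The derivative of $w = f(z)$ is

$$\frac{dw}{dz} = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z}$$

and for the derivative to exist the limit must be the same no matter how $\Delta z$ approaches zero. If $w_1$, $w_2$ are differentiable functions of $z$, the following rules apply:

$$\frac{d(w_1 \pm w_2)}{dz} = \frac{dw_1}{dz} \pm \frac{dw_2}{dz} \qquad \frac{d(w_1w_2)}{dz} = w_2\frac{dw_1}{dz} + w_1\frac{dw_2}{dz}$$

$$\frac{d(w_1/w_2)}{dz} = \frac{w_2(dw_1/dz) - w_1(dw_2/dz)}{w_2^2} \qquad \text{and} \qquad \frac{dw_1^n}{dz} = nw_1^{n-1}\frac{dw_1}{dz}$$

For $w = f(z)$ to be differentiable, it is necessary that $\partial u/\partial x = \partial v/\partial y$ and $\partial v/\partial x = -\partial u/\partial y$. The last two equations are called the Cauchy-Riemann equations. The derivative is then

$$\frac{dw}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}$$

If $f(z)$ possesses a derivative at $z_0$ and at every point in some neighborhood of $z_0$, then $f(z)$ is said to be analytic or holomorphic at $z_0$. If the Cauchy-Riemann equations are satisfied and

$$u,\ v,\ \frac{\partial u}{\partial x},\ \frac{\partial u}{\partial y},\ \frac{\partial v}{\partial x},\ \frac{\partial v}{\partial y}$$

are continuous in a region of the complex plane, then $f(z)$ is analytic in that region.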

Example $w = z\bar z = x^2 + y^2$. Here $u = x^2 + y^2$, $v = 0$. $\partial u/\partial x = 2x$, $\partial u/\partial y = 2y$, $\partial v/\partial x = \partial v/\partial y = 0$. These are continuous everywhere, but the Cauchy-Riemann equations hold only at the origin. Therefore, $w$ is nowhere analytic, but it is differentiable at $z = 0$ only.

Example $w = e^z = e^x\cos y + ie^x\sin y$. $u = e^x\cos y$, $v = e^x\sin y$. $\partial u/\partial x = e^x\cos y$, $\partial u/\partial y = -e^x\sin y$, $\partial v/\partial x = e^x\sin y$, $\partial v/\partial y = e^x\cos y$. The continuity and Cauchy-Riemann requirements are satisfied for all finite $z$. Hence $e^z$ is analytic (except at ∞) and $dw/dz = \partial u/\partial x + i(\partial v/\partial x) = e^z$.

Example $w = \dfrac{1}{z} = \dfrac{x - iy}{x^2 + y^2} = \dfrac{x}{x^2 + y^2} - i\,\dfrac{y}{x^2 + y^2}$

It is easy to see that $dw/dz$ exists except at $z = 0$. Thus $1/z$ is analytic except at $z = 0$.

Singular Points If $f(z)$ is analytic in a region except at certain points, those points are called singular points.

Example $1/z$ has a singular point at zero.

Example $\tan z$ has singular points at $z = (2n + 1)(\pi/2)$, $n = 0, \pm 1, \pm 2, \ldots$.

The derivatives of the common functions, given earlier, are the same as their real counterparts.

Example $(d/dz)(\log z) = 1/z$, $(d/dz)(\sin z) = \cos z$.

Harmonic Functions Both the real and the imaginary parts of any analytic function $f = u + iv$ satisfy Laplace’s equation $\partial^2\phi/\partial x^2 + \partial^2\phi/\partial y^2 = 0$. A function which possesses continuous second partial derivatives and satisfies Laplace’s equation is called a harmonic function.

Example $e^z = e^x\cos y + ie^x\sin y$. $u = e^x\cos y$, $\partial u/\partial x = e^x\cos y$, $\partial^2u/\partial x^2 = e^x\cos y$, $\partial u/\partial y = -e^x\sin y$, $\partial^2u/\partial y^2 = -e^x\cos y$. Clearly $\partial^2u/\partial x^2 + \partial^2u/\partial y^2 = 0$. Similarly, $v = e^x\sin y$ is also harmonic.

If $w = u + iv$ is analytic, the curves $u(x, y) = c$ and $v(x, y) = k$ intersect at right angles if $w'(z) \ne 0$.
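The Cauchy-Riemann equations, Laplace's equation, and the orthogonality of the curves $u = c$ and $v = k$ can all be checked numerically for $w = e^z$ with finite differences. The test point, step sizes, and helper names below are illustrative assumptions. Orthogonality is tested through the dot product of the gradients of $u$ and $v$.

```python
import math

def u(x, y):
    return math.exp(x) * math.cos(y)      # real part of e^z

def v(x, y):
    return math.exp(x) * math.sin(y)      # imaginary part of e^z

def partials(g, x, y, h=1e-5):
    """Central-difference estimates of dg/dx and dg/dy."""
    return ((g(x + h, y) - g(x - h, y)) / (2 * h),
            (g(x, y + h) - g(x, y - h)) / (2 * h))

def laplacian(g, x, y, h=1e-3):
    """Five-point finite-difference Laplacian."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4 * g(x, y)) / h**2

x0, y0 = 0.7, -1.2                       # arbitrary test point
ux, uy = partials(u, x0, y0)
vx, vy = partials(v, x0, y0)
print("Cauchy-Riemann residuals:", ux - vy, vx + uy)
print("Laplacians of u and v:", laplacian(u, x0, y0), laplacian(v, x0, y0))
print("grad u . grad v:", ux * vx + uy * vy)
```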

Integration In much of the work with complex variables a simple extension of integration called line or curvilinear integration is of fundamental importance. Since any complex line integral can be expressed in terms of real line integrals, we define only real line integrals. Let $F(x, y)$ be a real, continuous function of $x$ and $y$, and let $c$ be any continuous curve of finite length joining the points A and B (Fig. 3-46). $F(x, y)$ is not related to the curve $c$. Divide $c$ into $n$ segments $\Delta s_i$, whose projection on the $x$ axis is $\Delta x_i$ and on the $y$ axis is $\Delta y_i$. (The lengths $\Delta s_i$ are positive, whereas $\Delta x_i$ and $\Delta y_i$ carry signs set by the direction of travel along $c$.) Let $(\varepsilon_i, \eta_i)$ be the coordinates of an arbitrary point on $\Delta s_i$. The limits of the sums

$$\lim_{\Delta s_i\to 0}\sum_{i=1}^{n} F(\varepsilon_i, \eta_i)\,\Delta s_i = \int_c F(x, y)\,ds$$

$$\lim_{\Delta s_i\to 0}\sum_{i=1}^{n} F(\varepsilon_i, \eta_i)\,\Delta x_i = \int_c F(x, y)\,dx$$

$$\lim_{\Delta s_i\to 0}\sum_{i=1}^{n} F(\varepsilon_i, \eta_i)\,\Delta y_i = \int_c F(x, y)\,dy$$

are known as line integrals. Much of the initial strangeness of these integrals will vanish if it is observed that the ordinary definite integral $\int_a^b f(x)\,dx$ is just a line integral in which the curve $c$ is a line segment on the $x$ axis and $F(x, y)$ is a function of $x$ alone. The evaluation of line integrals can be reduced to evaluation of ordinary integrals.

FIG. 3-46 Line integral.

Example $\int_c y(1 + x)\,dy$, where $c$: $y = 1 - x^2$ from $(-1, 0)$ to $(1, 0)$. Clearly $y = 1 - x^2$, $dy = -2x\,dx$. Thus $\int_c y(1 + x)\,dy = -2\int_{-1}^{1}(1 - x^2)(1 + x)x\,dx = -8/15$.

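Example $\int_c x^2y\,ds$, where $c$ is the square whose vertices are $(0, 0)$, $(1, 0)$, $(1, 1)$, $(0, 1)$. Here $ds = \sqrt{dx^2 + dy^2}$, which is always positive: when $dx = 0$, $ds = |dy|$, and when $dy = 0$, $ds = |dx|$. From $(0, 0)$ to $(1, 0)$, $y = 0$ and the integrand vanishes; likewise on the side $x = 0$. Similar arguments for the other two sides give

$$\int_c x^2y\,ds = \int_0^1 0\cdot x^2\,dx + \int_0^1 y\,dy + \int_0^1 x^2\,dx + \int_0^1 0\cdot y\,dy = \tfrac{1}{2} + \tfrac{1}{3} = \tfrac{5}{6}$$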
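Both line integrals can be evaluated numerically by parametrizing the curve. In the sketch below (helper names and the 200-subinterval Simpson rule are illustrative assumptions), the parabola is parametrized by $x$ itself, and each side of the square by a parameter $t \in [0, 1]$ of unit speed, so that $ds = dt$ is positive on every side.

```python
def simpson(g, lo, hi, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n))
    return s * h / 3.0

# First example: y(1 + x) dy along y = 1 - x^2 from (-1, 0) to (1, 0);
# with x as the parameter, y = 1 - x^2 and dy = -2x dx.
ex1 = simpson(lambda x: (1 - x**2) * (1 + x) * (-2 * x), -1.0, 1.0)

# Second example: x^2 y ds around the unit square; each side has unit speed in t,
# so ds = dt > 0 regardless of the direction of travel.
sides = [
    lambda t: (t, 0.0),          # (0, 0) -> (1, 0)
    lambda t: (1.0, t),          # (1, 0) -> (1, 1)
    lambda t: (1.0 - t, 1.0),    # (1, 1) -> (0, 1)
    lambda t: (0.0, 1.0 - t),    # (0, 1) -> (0, 0)
]
ex2 = sum(simpson(lambda t, p=p: p(t)[0] ** 2 * p(t)[1], 0.0, 1.0) for p in sides)
print(ex1, -8 / 15)
print(ex2, 5 / 6)
```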

Let $f(z)$ be any function of $z$, analytic or not, and $c$ any curve as above. The complex integral is calculated as $\int_c f(z)\,dz = \int_c (u\,dx - v\,dy) + i\int_c (v\,dx + u\,dy)$, where $f(z) = u(x, y) + iv(x, y)$. Properties of line integrals are the same as those for ordinary integrals. That is, $\int_c [f(z) \pm g(z)]\,dz = \int_c f(z)\,dz \pm \int_c g(z)\,dz$; $\int_c kf(z)\,dz = k\int_c f(z)\,dz$ for any constant $k$, etc.

Example $\int_c (x^2 + iy)\,dz$ along $c$: $y = x$, from 0 to $1 + i$. This becomes

$$\int_c (x^2 + iy)\,dz = \int_c (x^2\,dx - y\,dy) + i\int_c (y\,dx + x^2\,dy)$$

$$= \int_0^1 x^2\,dx - \int_0^1 x\,dx + i\int_0^1 x\,dx + i\int_0^1 x^2\,dx = -\tfrac{1}{6} + \tfrac{5i}{6}$$
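Parametrizing the path as $z(t) = t(1 + i)$, $0 \le t \le 1$, turns the complex integral into $\int_0^1 f[z(t)]\,z'(t)\,dt$, which can be evaluated numerically. The helper names and the Simpson's-rule discretization below are illustrative assumptions.

```python
def complex_simpson(g, lo, hi, n=200):
    """Composite Simpson's rule for a complex-valued function of a real parameter."""
    h = (hi - lo) / n
    s = g(lo) + g(hi) + sum((4 if i % 2 else 2) * g(lo + i * h) for i in range(1, n))
    return s * h / 3.0

def f(z):
    """f(z) = x^2 + i y, where x = Re z and y = Im z (not an analytic function)."""
    return z.real ** 2 + 1j * z.imag

def z_of_t(t):
    return t * (1 + 1j)       # the path y = x from 0 to 1 + i

dz_dt = 1 + 1j

value = complex_simpson(lambda t: f(z_of_t(t)) * dz_dt, 0.0, 1.0)
print(value)                  # close to -1/6 + 5i/6
```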

Conformal Mapping Every function of a complex variable $w = f(z) = u(x, y) + iv(x, y)$ transforms the $x, y$ plane into the $u, v$ plane in some manner. A conformal transformation is one in which angles between curves are preserved in magnitude and sense. Every analytic function, except at those points where $f'(z) = 0$, is a conformal transformation. See Fig. 3-47.

Example $w = z^2$. $u + iv = (x^2 - y^2) + 2ixy$, or $u = x^2 - y^2$, $v = 2xy$. These are the transformation equations between the $(x, y)$ and $(u, v)$ planes. Lines parallel to the $x$ axis, $y = c_1$, map into curves in the $u, v$ plane with parametric equations $u = x^2 - c_1^2$, $v = 2c_1x$. Eliminating $x$, $u = (v^2/4c_1^2) - c_1^2$, which represents a family of parabolas with the origin of the $w$ plane as focus, the line $v = 0$ as axis, and opening to the right. Similar arguments apply to $x = c_2$.

The principles of complex variables are useful in the solution of a variety of applied problems, including Laplace transforms and process control (Sec. 8).


FIG. 3-47 Conformal transformation.

DIFFERENTIAL EQUATIONS

REFERENCES: Ames, W. F., Nonlinear Partial Differential Equations in Engineering, Academic Press, New York (1965); Aris, R., and N. R. Amundson, Mathematical Methods in Chemical Engineering, vol. 2, First-Order Partial Differential Equations with Applications, Prentice-Hall, Englewood Cliffs, N.J. (1973); Asmar, N., and G. C. Jones, Applied Complex Analysis with Partial Differential Equations, Prentice-Hall, Upper Saddle River, N.J. (2002); Boyce, W. E., and R. C. Di Prima, Elementary Differential Equations and Boundary Value Problems, 7th ed., Wiley, New York (2004); Braun, M., Differential Equations and Their Applications: An Introduction to Applied Mathematics, 4th ed., Springer-Verlag, New York (1993); Bronson, R., and G. Costa, Schaum’s Outline of Differential Equations, 3d ed., McGraw-Hill, New York (2007); Brown, J. W., and R. V. Churchill, Fourier Series and Boundary Value Problems, 6th ed., McGraw-Hill, New York (2000); Courant, R., and D. Hilbert, Methods of Mathematical Physics, vols. I and II, Interscience, New York (1953, 1962); Duffy, D., Green’s Functions with Applications, Chapman and Hall/CRC (2001); Kreyszig, E., Advanced Engineering Mathematics, 8th ed., Wiley, New York (1999); Morse, P. M., and H. Feshbach, Methods of Theoretical Physics, vols. I and II, McGraw-Hill, New York (1953); Polyanin, A. D., Handbook of Linear Partial Differential Equations for Engineers and Scientists, Chapman and Hall/CRC (2002); Polyanin, A. D., and V. F. Zaitsev, Handbook of Exact Solutions for Ordinary Differential Equations, 2d ed., Chapman and Hall/CRC (2002); Ramkrishna, D., and N. R. Amundson, Linear Operator Methods in Chemical Engineering with Applications to Transport and Chemical Reaction Systems, Prentice-Hall, Englewood Cliffs, N.J. (1985).

The natural laws in any scientific or technological field are not regarded as precise and definitive until they have been expressed in mathematical form. Such a form, often an equation, is a relation between the quantity of interest, say, product yield, and independent variables such as time and temperature upon which yield depends. When it happens that this equation involves, besides the function itself, one or more of its derivatives, it is called a differential equation.

Example The rate of the homogeneous bimolecular reaction A + B → C (rate constant k) is characterized by the differential equation dx/dt = k(a − x)(b − x), where a = initial concentration of A, b = initial concentration of B, and x = x(t) = concentration of C as a function of time t.

Example The differential equation of heat conduction in a moving fluid with velocity components vx, vy is

$$\frac{\partial u}{\partial t} + v_x\frac{\partial u}{\partial x} + v_y\frac{\partial u}{\partial y} = \frac{K}{\rho c_p}\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)$$

where u = u(x, y, t) = temperature, K = thermal conductivity, ρ = density, and cp = specific heat at constant pressure.

ORDINARY DIFFERENTIAL EQUATIONS

When the function involved in the equation depends upon only one variable, its derivatives are ordinary derivatives and the differential equation is called an ordinary differential equation. When the function depends upon several independent variables, then the equation is called a partial differential equation. The theories of ordinary and partial differential equations are quite different. In almost every respect the latter is more difficult.

Whichever the type, a differential equation is said to be of nth order if it involves derivatives of order n but no higher. The equation in the first example is of first order and that in the second example of second order. The degree of a differential equation is the power to which the derivative of the highest order is raised after the equation has been cleared of fractions and radicals in the dependent variable and its derivatives.

A relation between the variables, involving no derivatives, is called a solution of the differential equation if this relation, when substituted in the equation, satisfies the equation. A solution of an ordinary differential equation which includes the maximum possible number of “arbitrary” constants is called the general solution. The maximum number of “arbitrary” constants is exactly equal to the order of the differential equation. If any set of specific values of the constants is chosen, the result is called a particular solution.

Example The general solution of (d²x/dt²) + k²x = 0 is x = A cos kt + B sin kt, where A, B are arbitrary constants. A particular solution is x = ½ cos kt + 3 sin kt.

In the case of some equations still other solutions exist called singular solutions. A singular solution is any solution of the differential equation which is not included in the general solution.

Example y = x(dy/dx) − ¼(dy/dx)² has the general solution y = cx − ¼c², where c is an arbitrary constant; y = x² is a singular solution, as is easily verified.
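The “easily verified” claim can be checked numerically: both members of the one-parameter family and the parabola y = x² make the residual of the equation vanish, and the family member with c = 2x0 is the tangent line to y = x² at x0, which is why the parabola is the envelope. A minimal sketch; the values c = 3 and x0 = 1.5 are arbitrary choices.

```python
def clairaut_residual(y, dy, x):
    """Residual of y = x*y' - (1/4)*(y')**2 at the point x (zero for a solution)."""
    p = dy(x)
    return y(x) - (x * p - 0.25 * p * p)

c = 3.0  # arbitrary constant of the general solution (illustrative)
general = (lambda x: c * x - 0.25 * c * c, lambda x: c)
singular = (lambda x: x * x, lambda x: 2.0 * x)

for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, clairaut_residual(*general, x), clairaut_residual(*singular, x))

# Envelope property: at x0 the family member with c = 2*x0 is the tangent to y = x**2.
x0 = 1.5
tangent = lambda x: 2 * x0 * x - x0 ** 2
member = lambda x: (2 * x0) * x - 0.25 * (2 * x0) ** 2
```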

ORDINARY DIFFERENTIAL EQUATIONS OF THE FIRST ORDER

Equations with Separable Variables Every differential equation of the first order and of the first degree can be written in the form M(x, y) dx + N(x, y) dy = 0. If the equation can be transformed so that M does not involve y and N does not involve x, then the variables are said to be separated. The solution can then be obtained by quadrature, which means that y = ∫ f(x) dx + c, which may or may not be expressible in simpler form.

Example Two liquids A and B are boiling together in a vessel. Experimentally it is found that the ratio of the rates at which A and B are evaporating at any time is proportional to the ratio of the amount of A (say, x) to the amount of B (say, y) still in the liquid state. This physical law is expressible as (dy/dt)/(dx/dt) = ky/x or dy/dx = ky/x, where k is a proportionality constant. This equation may be written dy/y = k(dx/x), in which the variables are separated. The solution is ln y = k ln x + ln c or y = cxᵏ.

Exact Equations The equation M(x, y) dx + N(x, y) dy = 0 is exact if and only if ∂M/∂y = ∂N/∂x. In this case there exists a function w = f(x, y) such that ∂f/∂x = M, ∂f/∂y = N, and f(x, y) = C is the required solution. f(x, y) is found as follows: treat y as though it were constant and evaluate ∫ M(x, y) dx. Then treat x as though it were constant and evaluate ∫ N(x, y) dy. The sum of all unlike terms in these two integrals (including no repetitions) is f(x, y).

Example (2xy − cos x) dx + (x² − 1) dy = 0 is exact, for ∂M/∂y = 2x, ∂N/∂x = 2x. ∫ M dx = ∫ (2xy − cos x) dx = x²y − sin x, ∫ N dy = ∫ (x² − 1) dy = x²y − y. The solution is x²y − sin x − y = C, as may easily be verified.
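The verification can be sketched with central differences: the exactness test ∂M/∂y = ∂N/∂x holds, and the potential f(x, y) = x²y − sin x − y reproduces M and N as its partial derivatives. The helper `partial`, its step size, and the sample points are our illustrative choices.

```python
import math

def M(x, y):
    return 2 * x * y - math.cos(x)

def N(x, y):
    return x ** 2 - 1

def F(x, y):
    """Potential f(x, y) from the example; its level sets F = C are the solutions."""
    return x ** 2 * y - math.sin(x) - y

def partial(g, x, y, wrt, h=1e-5):
    """Central-difference partial derivative of g(x, y) with respect to 'x' or 'y'."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

points = [(0.3, 1.2), (-1.0, 0.5), (2.0, -0.7)]
for x, y in points:
    exactness = partial(M, x, y, "y") - partial(N, x, y, "x")   # dM/dy - dN/dx
    fx_error = partial(F, x, y, "x") - M(x, y)                  # df/dx - M
    fy_error = partial(F, x, y, "y") - N(x, y)                  # df/dy - N
    print(x, y, exactness, fx_error, fy_error)
```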

Linear Equations A differential equation is said to be linear when it is of first degree in the dependent variable and its derivatives. The general linear first-order differential equation has the form dy/dx + P(x)y = Q(x). Its general solution is

$$y = e^{-\int P\,dx}\left[\int Q\,e^{\int P\,dx}\,dx + C\right]$$

Example A tank initially holds 200 gal of a salt solution in which 100 lb is dissolved. Six gallons of brine containing 4 lb of salt run into the tank per minute. If mixing is perfect and the output rate is 4 gal/min, what is the amount A of salt in the tank at time t? The differential equation of A is dA/dt + [2/(100 + t)]A = 4. Its general solution is A = (4/3)(100 + t) + C/(100 + t)². At t = 0, A = 100; so the particular solution is A = (4/3)(100 + t) − (1/3)10⁶/(100 + t)².
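The sketch below integrates the salt balance numerically and compares it with the particular solution. It reads the inflow as 4 lb of salt per minute in total, which is what the differential equation encodes, with the volume growing as V = 200 + 2t gal. The helper names are ours.

```python
def A_exact(t):
    """Particular solution from the example: lb of salt at time t (min)."""
    return (4.0 / 3.0) * (100.0 + t) - (1.0 / 3.0) * 1.0e6 / (100.0 + t) ** 2

def dA_dt(t, A):
    """Salt balance: 4 lb/min in; 4 gal/min out at concentration A/V, V = 200 + 2t gal."""
    return 4.0 - 4.0 * A / (200.0 + 2.0 * t)

def rk4(f, t0, y0, t1, n=2000):
    """Integrate dy/dt = f(t, y) from t0 to t1 with n classical RK4 steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

for t in (10.0, 60.0, 200.0):
    print(t, A_exact(t), rk4(dA_dt, 0.0, 100.0, t))
```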

ORDINARY DIFFERENTIAL EQUATIONS OF HIGHER ORDER

The higher-order differential equations, especially those of order 2, are of great importance because of physical situations describable by them.

Equation y(n) = f(x)* Such a differential equation can be solved by n integrations. The solution will contain n arbitrary constants.

Linear Differential Equations with Constant Coefficients and Right-Hand Member Zero (Homogeneous) The solution of y″ + ay′ + by = 0 depends upon the nature of the roots of the characteristic equation m² + am + b = 0 obtained by substituting the trial solution y = e^(mx) in the equation.

Distinct Real Roots If the roots of the characteristic equation are distinct real roots, r1 and r2, say, the solution is y = Ae^(r1x) + Be^(r2x), where A and B are arbitrary constants.

Example y″ + 4y′ + 3y = 0. The characteristic equation is m² + 4m + 3 = 0. The roots are −3 and −1, and the general solution is y = Ae^(−3x) + Be^(−x).

Multiple Real Roots If r1 = r2, the solution of the differential equation is y = e^(r1x)(A + Bx).

Example y″ + 4y′ + 4y = 0. The characteristic equation is m² + 4m + 4 = 0 with roots −2 and −2. The solution is y = e^(−2x)(A + Bx).

Complex Roots If the characteristic roots are p ± iq, then the solution is y = e^(px)(A cos qx + B sin qx).
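The three cases above can be told apart mechanically from the discriminant of the characteristic equation. A minimal sketch; the function name, the tolerance used to decide that two real roots coincide, and the returned text form are our own choices.

```python
import cmath

def solve_constant_coeff(a, b, tol=1e-12):
    """Classify y'' + a y' + b y = 0 from the roots of m^2 + a m + b = 0.

    Returns (kind, roots, form); form is a readable general solution.
    """
    disc = a * a - 4.0 * b
    if abs(disc) <= tol:                       # multiple real root
        r = -a / 2.0
        return "repeated", (r, r), f"y = e^({r}x)(A + Bx)"
    r1 = (-a + cmath.sqrt(disc)) / 2.0
    r2 = (-a - cmath.sqrt(disc)) / 2.0
    if disc > 0:                               # distinct real roots
        return "distinct real", (r1.real, r2.real), f"y = Ae^({r1.real}x) + Be^({r2.real}x)"
    p, q = r1.real, abs(r1.imag)               # complex pair p ± iq
    return "complex", (p, q), f"y = e^({p}x)(A cos {q}x + B sin {q}x)"

for a, b in [(4, 3), (4, 4), (2, 5)]:
    print(solve_constant_coeff(a, b))
```

The first two calls reproduce the worked examples; the third, y″ + 2y′ + 5y = 0, is an extra illustrative case with roots −1 ± 2i.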

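Example The differential equation My″ + Ay′ + ky = 0 represents the vibration of a linear system of mass M, spring constant k, and damping constant A. If A < 2√(kM), the roots of the characteristic equation Mm² + Am + k = 0 are complex,

$$m = -\frac{A}{2M} \pm i\sqrt{\frac{k}{M} - \left(\frac{A}{2M}\right)^2}$$

and the solution is

$$y = e^{-At/(2M)}\left[c_1\cos\left(\sqrt{\frac{k}{M} - \left(\frac{A}{2M}\right)^2}\,t\right) + c_2\sin\left(\sqrt{\frac{k}{M} - \left(\frac{A}{2M}\right)^2}\,t\right)\right]$$

This solution is oscillatory, representing undercritical damping.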
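To see the underdamped solution in action, the sketch fixes c1 and c2 from assumed initial conditions y(0) = y0, y′(0) = v0, which give c1 = y0 and c2 = (v0 + (A/2M)y0)/ω with ω the square-root frequency above. It then compares the result with an RK4 integration of the original equation. The parameter values are illustrative and satisfy A < 2√(kM).

```python
import math

def underdamped(t, M, A, k, y0, v0):
    """Closed form for M y'' + A y' + k y = 0 with A < 2*sqrt(k*M)."""
    alpha = A / (2 * M)
    omega = math.sqrt(k / M - alpha ** 2)
    c1 = y0
    c2 = (v0 + alpha * y0) / omega          # from y'(0) = -alpha*c1 + omega*c2
    return math.exp(-alpha * t) * (c1 * math.cos(omega * t) + c2 * math.sin(omega * t))

def simulate(t_end, M, A, k, y0, v0, n=20000):
    """RK4 on the first-order system y' = v, v' = -(A v + k y)/M."""
    h = t_end / n
    y, v = y0, v0

    def f(y, v):
        return v, -(A * v + k * y) / M

    for _ in range(n):
        k1 = f(y, v)
        k2 = f(y + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = f(y + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = f(y + h * k3[0], v + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return y

M, A, k = 1.0, 0.4, 4.0      # illustrative: A = 0.4 < 2*sqrt(kM) = 4
for t in (1.0, 5.0, 10.0):
    print(t, underdamped(t, M, A, k, 1.0, 0.0), simulate(t, M, A, k, 1.0, 0.0))
```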

All these results generalize to homogeneous linear differential equations with constant coefficients of order higher than 2. These equations (especially of order 2) have been much used because of the ease of solution. Oscillations, electric circuits, diffusion processes, and heat-flow problems are a few examples for which such equations are useful.

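Second-Order Equations: Dependent Variable Missing Such an equation is of the form

$$F\left(x, \frac{dy}{dx}, \frac{d^2y}{dx^2}\right) = 0$$

It can be reduced to a first-order equation by substituting p = dy/dx and dp/dx = d²y/dx².

Second-Order Equations: Independent Variable Missing Such an equation is of the form

$$F\left(y, \frac{dy}{dx}, \frac{d^2y}{dx^2}\right) = 0$$

Set

$$\frac{dy}{dx} = p, \qquad \frac{d^2y}{dx^2} = p\frac{dp}{dy}$$

The result is a first-order equation in p,

$$F\left(y, p, p\frac{dp}{dy}\right) = 0$$

Example The capillary curve for one vertical plate is given by

$$\frac{d^2y}{dx^2} = \frac{4y}{c^2}\left[1 + \left(\frac{dy}{dx}\right)^2\right]^{3/2}$$

Its solution by this technique is

$$x + \sqrt{c^2 - y^2} - \sqrt{c^2 - h_0^2} = \frac{c}{2}\cosh^{-1}\frac{c}{y} - \frac{c}{2}\cosh^{-1}\frac{c}{h_0}$$

where c, h0 are physical constants.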

Example The equation governing chemical reaction in a porous catalyst in plane geometry of thickness L is

$$D\frac{d^2c}{dx^2} = k f(c), \qquad \frac{dc}{dx}(0) = 0, \qquad c(L) = c_0$$

where D is a diffusion coefficient, k is a reaction rate parameter, c is the concentration, k f(c) is the rate of reaction, and c0 is the concentration at the boundary. Making the substitution p = dc/dx gives (Finlayson, 1980, p. 92)

$$p\frac{dp}{dc} = \frac{k}{D} f(c)$$

Integrating gives

$$\frac{p^2}{2} = \frac{k}{D}\int_{c(0)}^{c} f(c)\,dc$$

If the reaction is very fast, c(0) ≈ 0 and the average reaction rate is related to p(L). This variable is given by

$$p(L) = \left[\frac{2k}{D}\int_0^{c_0} f(c)\,dc\right]^{1/2}$$

Thus, the average reaction rate can be calculated without solving the complete problem.

*The superscript (n) means n derivatives.
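The fast-reaction formula for p(L) needs only a one-dimensional quadrature. The sketch evaluates it with Simpson’s rule and, for first-order kinetics f(c) = c, compares it with the exact slab result p(L) = c0√(k/D) tanh(√(k/D) L). That exact result is a standard one we bring in for checking, not one given in the text. The parameter values are illustrative and chosen so that √(k/D) L = 20, i.e., a fast reaction.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def flux_fast_reaction(f, k, D, c0):
    """p(L) = [2k/D * integral_0^c0 f(c) dc]^(1/2), assuming c(0) ~ 0."""
    return math.sqrt(2.0 * k / D * simpson(f, 0.0, c0))

k, D, L, c0 = 400.0, 1.0, 1.0, 1.0        # illustrative values; sqrt(k/D)*L = 20
p_first = flux_fast_reaction(lambda c: c, k, D, c0)
p_exact = c0 * math.sqrt(k / D) * math.tanh(math.sqrt(k / D) * L)   # exact for f(c) = c
p_second = flux_fast_reaction(lambda c: c * c, k, D, c0)            # second-order kinetics
print(p_first, p_exact, p_second)
```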

Linear Nonhomogeneous Differential Equations

Linear Differential Equations: Right-Hand Member f(x) ≠ 0 Again the specific remarks for y″ + ay′ + by = f(x) apply to differential equations of similar type but higher order. We shall discuss two general methods.

Method of Undetermined Coefficients Use of this method is limited to equations exhibiting both constant coefficients and particular forms of the function f(x). In most cases f(x) will be a sum or product of functions of the type constant, xⁿ (n a positive integer), e^(mx), cos kx, sin kx. When this is the case, the solution of the equation is y = H(x) + P(x), where H(x) is a solution of the homogeneous equation found by the method of the preceding subsection and P(x) is a particular integral found by using the following table subject to these conditions: (1) When f(x) consists of the sum of several terms, the appropriate form of P(x) is the sum of the particular integrals corresponding to these terms individually. (2) When a term in any of the trial integrals listed is already a part of the homogeneous solution, the indicated form of the particular integral is multiplied by x (by x² if the term would still duplicate part of the homogeneous solution, as happens for a double root).

Form of Particular Integral

If f(x) is                                  Then P(x) is
a (constant)                                A (constant)
axⁿ                                         An xⁿ + An−1 xⁿ⁻¹ + ⋯ + A1 x + A0
ae^(rx)                                     Be^(rx)
c cos kx, d sin kx                          A cos kx + B sin kx
gxⁿe^(rx) cos kx, hxⁿe^(rx) sin kx          (An xⁿ + ⋯ + A0)e^(rx) cos kx + (Bn xⁿ + ⋯ + B0)e^(rx) sin kx

Since the form of the particular integral is known, the constants may be evaluated by substitution in the differential equation.

Example y″ + 2y′ + y = 3e^(2x) − cos x + x³. The characteristic equation is (m + 1)² = 0 so that the homogeneous solution is y = (c1 + c2x)e^(−x). To find a particular solution we use the trial solution from the table, y = a1e^(2x) + a2 cos x + a3 sin x + a4x³ + a5x² + a6x + a7. Substituting this in the differential equation, collecting and equating like terms, there results a1 = ⅓, a2 = 0, a3 = −½, a4 = 1, a5 = −6, a6 = 18, and a7 = −24. The solution is y = (c1 + c2x)e^(−x) + ⅓e^(2x) − ½ sin x + x³ − 6x² + 18x − 24.
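Substituting the result back is the surest check on the coefficients. The sketch codes the solution and its first two derivatives by hand and evaluates the residual y″ + 2y′ + y − f(x) at a few points, both for the particular integral alone and with an arbitrary homogeneous part added. The values of c1 and c2 are illustrative.

```python
import math

def y_particular(x):
    """Particular integral found in the example."""
    return math.exp(2 * x) / 3 - 0.5 * math.sin(x) + x**3 - 6 * x**2 + 18 * x - 24

def dy_particular(x):
    return 2 * math.exp(2 * x) / 3 - 0.5 * math.cos(x) + 3 * x**2 - 12 * x + 18

def d2y_particular(x):
    return 4 * math.exp(2 * x) / 3 + 0.5 * math.sin(x) + 6 * x - 12

def residual(x, c1=0.0, c2=0.0):
    """y'' + 2y' + y - (3e^{2x} - cos x + x^3) for y = (c1 + c2 x)e^{-x} + particular integral."""
    e = math.exp(-x)
    y = (c1 + c2 * x) * e + y_particular(x)
    dy = (c2 - c1 - c2 * x) * e + dy_particular(x)
    d2y = (c1 - 2 * c2 + c2 * x) * e + d2y_particular(x)
    rhs = 3 * math.exp(2 * x) - math.cos(x) + x**3
    return d2y + 2 * dy + y - rhs

for x in (-1.0, 0.0, 0.7, 2.0):
    print(x, residual(x), residual(x, c1=1.5, c2=-0.8))
```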

Method of Variation of Parameters This method is applicable to any linear equation. The technique is developed for a second-order equation but immediately extends to higher order. Let the equation be y″ + a(x)y′ + b(x)y = R(x) and let the solution of the homogeneous equation, found by some method, be y = c1 f1(x) + c2 f2(x). It is now assumed that a particular integral of the differential equation is of the form P(x) = uf1 + vf2 where u, v are functions of x to be determined by two equations. One equation results from the requirement that uf1 + vf2 satisfy the differential equation, and the other is a degree of freedom open to the analyst. The best choice proves to be

u′f1 + v′f2 = 0 and u′f1′ + v′f2′ = R(x)

Then

$$u' = \frac{du}{dx} = -\frac{R(x)\,f_2}{f_1 f_2' - f_2 f_1'}, \qquad v' = \frac{dv}{dx} = \frac{R(x)\,f_1}{f_1 f_2' - f_2 f_1'}$$

and since f1, f2, and R are known, u and v may be found by direct integration.
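The recipe for u′ and v′ translates directly into two quadratures. In the sketch below, u and v are taken as integrals from a chosen lower limit x0; a different lower limit only adds a homogeneous solution. The test problem y″ + y = x, with f1 = cos x and f2 = sin x, is our own illustration. Its particular integral for u(0) = v(0) = 0 works out to x − sin x.

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def particular_vp(f1, df1, f2, df2, R, x, x0=0.0, n=400):
    """P(x) = u f1 + v f2 with u' = -R f2 / W, v' = R f1 / W, W = f1 f2' - f2 f1'."""
    W = lambda s: f1(s) * df2(s) - f2(s) * df1(s)
    u = simpson(lambda s: -R(s) * f2(s) / W(s), x0, x, n)
    v = simpson(lambda s: R(s) * f1(s) / W(s), x0, x, n)
    return u * f1(x) + v * f2(x)

# Hypothetical test problem: y'' + y = x with f1 = cos x, f2 = sin x (so W = 1).
P = lambda x: particular_vp(math.cos, lambda s: -math.sin(s), math.sin, math.cos, lambda s: s, x)
for x in (0.5, 1.0, 2.0):
    print(x, P(x), x - math.sin(x))
```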

Perturbation Methods If the ordinary differential equation has a parameter that is small and is not multiplying the highest derivative, perturbation methods can give solutions for small values of the parameter.

Example Consider the differential equation for reaction and diffusion in a catalyst; the reaction is second order: c″ = ac², c′(0) = 0, c(1) = 1. The solution is expanded in the following Taylor series in a.

$$c(x, a) = c_0(x) + a\,c_1(x) + a^2 c_2(x) + \cdots$$

The goal is to find equations governing the functions ci(x) and solve them. Substitution into the equations gives the following equations:

$$c_0''(x) + a\,c_1''(x) + a^2 c_2''(x) + \cdots = a\left[c_0(x) + a\,c_1(x) + a^2 c_2(x) + \cdots\right]^2$$

$$c_0'(0) + a\,c_1'(0) + a^2 c_2'(0) + \cdots = 0$$

$$c_0(1) + a\,c_1(1) + a^2 c_2(1) + \cdots = 1$$

Like terms in powers of a are collected to form the individual problems.

$$c_0'' = 0, \qquad c_0'(0) = 0, \qquad c_0(1) = 1$$

$$c_1'' = c_0^2, \qquad c_1'(0) = 0, \qquad c_1(1) = 0$$

$$c_2'' = 2c_0c_1, \qquad c_2'(0) = 0, \qquad c_2(1) = 0$$

The solution proceeds in turn.

$$c_0(x) = 1, \qquad c_1(x) = \frac{x^2 - 1}{2}, \qquad c_2(x) = \frac{5 - 6x^2 + x^4}{12}$$
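One way to judge the expansion is to solve the boundary-value problem numerically and compare. The sketch uses a shooting method (RK4 plus bisection on the unknown c(0)), which is our choice of solver and not from the text. With a = 0.1 the three-term series should agree with the numerical c(0) to roughly a³ and clearly beat the two-term value 1 − a/2.

```python
def c_at_one(s, a, n=1000):
    """RK4 from x = 0 to x = 1 for c'' = a c^2, c(0) = s, c'(0) = 0; returns c(1)."""
    h = 1.0 / n
    c, p = s, 0.0                      # p = c'
    for _ in range(n):
        k1c, k1p = p, a * c * c
        k2c, k2p = p + 0.5 * h * k1p, a * (c + 0.5 * h * k1c) ** 2
        k3c, k3p = p + 0.5 * h * k2p, a * (c + 0.5 * h * k2c) ** 2
        k4c, k4p = p + h * k3p, a * (c + h * k3c) ** 2
        c += h * (k1c + 2 * k2c + 2 * k3c + k4c) / 6.0
        p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return c

def shoot(a, lo=0.0, hi=1.0, iters=50):
    """Bisection on s = c(0) so that c(1) = 1 (c(1) increases with s)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if c_at_one(mid, a) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_perturb(x, a):
    """Three-term expansion c0 + a c1 + a^2 c2 from the text."""
    return 1.0 + a * (x * x - 1.0) / 2.0 + a * a * (5.0 - 6.0 * x * x + x ** 4) / 12.0

a = 0.1
s = shoot(a)
print("c(0) by shooting:      ", s)
print("c(0) by perturbation:  ", c_perturb(0.0, a))
print("two-term value 1 - a/2:", 1.0 - a / 2.0)
```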

SPECIAL DIFFERENTIAL EQUATIONS [SEE ABRAMOWITZ AND STEGUN (1972)]

Euler’s Equation The linear equation xⁿy⁽ⁿ⁾ + a1xⁿ⁻¹y⁽ⁿ⁻¹⁾ + ⋯ + an−1xy′ + any = R(x) can be reduced to a linear equation with constant coefficients by the change of variable x = e^t. To solve the homogeneous equation substitute y = x^r into it, cancel the powers of x, which are the same for all terms, and solve the resulting polynomial for r. In case of multiple or complex roots there result the forms y = x^r(log x)^k, k = 0, 1, . . . , m − 1, for a root r of multiplicity m, and y = x^α[cos (β log x) + i sin (β log x)] for a root r = α + iβ; the real and imaginary parts of the latter are real solutions.
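For the second-order case x²y″ + a1xy′ + a2y = 0, substituting y = x^r gives r(r − 1) + a1r + a2 = 0. The sketch solves that quadratic and confirms with finite differences that the resulting real solutions satisfy the equation. The two sample equations and the helper names are our illustrations.

```python
import cmath
import math

def euler_exponents(a1, a2):
    """Roots of r(r - 1) + a1*r + a2 = 0 for x^2 y'' + a1 x y' + a2 y = 0."""
    b = a1 - 1.0
    d = cmath.sqrt(b * b - 4.0 * a2)
    return (-b + d) / 2.0, (-b - d) / 2.0

def residual(y, a1, a2, x, h=1e-4):
    """x^2 y'' + a1 x y' + a2 y evaluated with central differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / h ** 2
    return x * x * d2 + a1 * x * d1 + a2 * y(x)

# Real roots: x^2 y'' - 2x y' + 2y = 0 gives r = 2 and r = 1 (y = x^2 and y = x).
print(euler_exponents(-2.0, 2.0))

# Complex roots: x^2 y'' + x y' + 4y = 0 gives r = ±2i, so y = cos(2 log x) is a real solution.
r1, r2 = euler_exponents(1.0, 4.0)
alpha, beta = r1.real, r1.imag
y_cos = lambda x: x ** alpha * math.cos(beta * math.log(x))
for x in (0.5, 1.3, 3.0):
    print(x, residual(y_cos, 1.0, 4.0, x))
```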

Bessel’s Equation The linear equation x²(d²y/dx²) + (1 − 2α)x(dy/dx) + [β²γ²x^(2γ) + (α² − p²γ²)]y = 0 is the general Bessel equation. By series methods, not to be discussed here, this equation can be shown to have the solution

$$y = A x^{\alpha} J_p(\beta x^{\gamma}) + B x^{\alpha} J_{-p}(\beta x^{\gamma}) \qquad p \text{ not an integer or zero}$$

$$y = A x^{\alpha} J_p(\beta x^{\gamma}) + B x^{\alpha} Y_p(\beta x^{\gamma}) \qquad p \text{ an integer}$$

where

$$J_p(x) = \left(\frac{x}{2}\right)^{p}\sum_{k=0}^{\infty}\frac{(-1)^k (x/2)^{2k}}{k!\,\Gamma(p + k + 1)}, \qquad J_{-p}(x) = \left(\frac{x}{2}\right)^{-p}\sum_{k=0}^{\infty}\frac{(-1)^k (x/2)^{2k}}{k!\,\Gamma(k + 1 - p)} \qquad p \text{ not an integer}$$

$$\Gamma(n) = \int_0^{\infty} x^{n-1}e^{-x}\,dx \qquad n > 0$$

is the gamma function. For p an integer

$$J_p(x) = \left(\frac{x}{2}\right)^{p}\sum_{k=0}^{\infty}\frac{(-1)^k (x/2)^{2k}}{k!\,(p + k)!} \qquad \text{(Bessel function of the first kind of order } p\text{)}$$

$$Y_p(x) = \frac{J_p(x)\cos(p\pi) - J_{-p}(x)}{\sin(p\pi)}$$

(replace right-hand side by limiting value if p is an integer or zero). The series converge for all x. Much of the importance of Bessel’s equation and Bessel functions lies in the fact that the solutions of numerous linear differential equations can be expressed in terms of them.
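The series definitions can be summed directly for moderate x. The sketch uses `math.gamma` and checks three results against independent facts we assume rather than take from the text: the first zero of J0 near 2.404825557695773, and the half-order identities J½(x) = √(2/(πx)) sin x and Y½(x) = −√(2/(πx)) cos x. It is meant for x > 0 and modest argument sizes only, not as a production Bessel routine.

```python
import math

def bessel_j(p, x, terms=60):
    """Series J_p(x) = (x/2)^p * sum_k (-1)^k (x/2)^(2k) / (k! Gamma(p + k + 1)).

    For x > 0 and any real p for which no Gamma argument is a nonpositive
    integer (so J_{-p} only when p is not an integer).
    """
    half = x / 2.0
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * half ** (2 * k) / (math.factorial(k) * math.gamma(p + k + 1))
    return half ** p * total

def bessel_y_nonint(p, x):
    """Y_p(x) = [J_p(x) cos(p pi) - J_{-p}(x)] / sin(p pi), for p not an integer."""
    return (bessel_j(p, x) * math.cos(p * math.pi) - bessel_j(-p, x)) / math.sin(p * math.pi)

x = 1.3
print(bessel_j(0, 2.404825557695773))
print(bessel_j(0.5, x), math.sqrt(2 / (math.pi * x)) * math.sin(x))
print(bessel_y_nonint(0.5, x), -math.sqrt(2 / (math.pi * x)) * math.cos(x))
```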


Legendre’s Equation The Legendre equation (1 − x²)y″ − 2xy′ + n(n + 1)y = 0, n ≥ 0, has the solution Pn for n an integer.

The polynomials Pn are the so-called Legendre polynomials, P0(x) = 1, P1(x) = x, P2(x) = ½(3x² − 1), P3(x) = ½(5x³ − 3x), . . . .

For n positive and not an integer, see Abramowitz and Stegun (1972).

Laguerre’s Equation The Laguerre equation x(d²y/dx²) + (c − x)(dy/dx) − ay = 0 is satisfied by the confluent hypergeometric function. See Abramowitz and Stegun (1972) and Kreyszig (1999).

Hermite’s Equation The Hermite equation y″ − 2xy′ + 2ny = 0 is satisfied by the Hermite polynomial of degree n, y = AHn(x), if n is a positive integer or zero. H0(x) = 1, H1(x) = 2x, H2(x) = 4x² − 2, H3(x) = 8x³ − 12x, H4(x) = 16x⁴ − 48x² + 12, Hr+1(x) = 2xHr(x) − 2rHr−1(x).

Chebyshev’s Equation The equation (1 − x²)y″ − xy′ + n²y = 0 for n a positive integer or zero is satisfied by the nth Chebyshev polynomial y = ATn(x). T0(x) = 1, T1(x) = x, T2(x) = 2x² − 1, T3(x) = 4x³ − 3x, T4(x) = 8x⁴ − 8x² + 1; Tr+1(x) = 2xTr(x) − Tr−1(x).
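The two recurrences above can generate the polynomials exactly with integer coefficient lists (lowest power first). Substituting the results into Hermite’s and Chebyshev’s equations then gives an identically zero residual. The list representation and helper names are our own.

```python
def padd(p, q):
    """Add two coefficient lists (lowest power first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def pscale(p, s):
    return [s * c for c in p]

def px(p):
    """Multiply by x."""
    return [0] + p

def pderiv(p):
    return [i * p[i] for i in range(1, len(p))] or [0]

def trim(p):
    """Drop trailing zero coefficients; an identically zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def hermite(n):
    """H_n from H_{r+1} = 2x H_r - 2r H_{r-1}."""
    prev, cur = [1], [0, 2]
    if n == 0:
        return prev
    for r in range(1, n):
        prev, cur = cur, padd(pscale(px(cur), 2), pscale(prev, -2 * r))
    return cur

def chebyshev(n):
    """T_n from T_{r+1} = 2x T_r - T_{r-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for r in range(1, n):
        prev, cur = cur, padd(pscale(px(cur), 2), pscale(prev, -1))
    return cur

def hermite_residual(n):
    """y'' - 2x y' + 2n y for y = H_n."""
    y = hermite(n); d1 = pderiv(y); d2 = pderiv(d1)
    return trim(padd(padd(d2, pscale(px(d1), -2)), pscale(y, 2 * n)))

def chebyshev_residual(n):
    """(1 - x^2) y'' - x y' + n^2 y for y = T_n."""
    y = chebyshev(n); d1 = pderiv(y); d2 = pderiv(d1)
    return trim(padd(padd(padd(d2, pscale(px(px(d2)), -1)), pscale(px(d1), -1)), pscale(y, n * n)))

print(hermite(4), chebyshev(4))
print([hermite_residual(n) for n in range(6)], [chebyshev_residual(n) for n in range(6)])
```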

PARTIAL DIFFERENTIAL EQUATIONS

The analysis of situations involving two or more independent variables frequently results in a partial differential equation.

Example The equation ∂T/∂t = K(∂²T/∂x²) represents the unsteady one-dimensional conduction of heat.

Example The equation for the unsteady transverse motion of a uniform beam clamped at the ends is

$$\frac{\partial^4 y}{\partial x^4} + \frac{\rho}{EI}\frac{\partial^2 y}{\partial t^2} = 0$$

Example The expansion of a gas behind a piston is characterized by the simultaneous equations

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} + \frac{c^2}{\rho}\frac{\partial \rho}{\partial x} = 0 \qquad\text{and}\qquad \frac{\partial \rho}{\partial t} + u\frac{\partial \rho}{\partial x} + \rho\frac{\partial u}{\partial x} = 0$$

The partial differential equation ∂²f/∂x ∂y = 0 can be solved by two integrations yielding the solution f = g(x) + h(y), where g(x) and h(y) are arbitrary differentiable functions. This result is an example of the fact that the general solution of partial differential equations involves arbitrary functions in contrast to the solution of ordinary differential equations, which involve only arbitrary constants. A number of methods are available for finding the general solution of a partial differential equation. In most applications of partial differential equations the general solution is of limited use. In such applications the solution of a partial differential equation must satisfy both the equation and certain auxiliary conditions called initial and/or boundary conditions, which are dictated by the problem. Examples of these include those in which the wall temperature is a fixed constant T(x0) = T0, there is no diffusion across a nonpermeable wall, and the like. In ordinary differential equations these auxiliary conditions allow definite numbers to be assigned to the constants of integration.

Partial Differential Equations of Second and Higher Order Many of the applications to scientific problems fall naturally into partial differential equations of second order, although there are important exceptions in elasticity, vibration theory, and elsewhere.

A second-order differential equation can be written as

$$a\frac{\partial^2 u}{\partial x^2} + b\frac{\partial^2 u}{\partial x\,\partial y} + c\frac{\partial^2 u}{\partial y^2} = f$$

where a, b, c, and f depend upon x, y, u, ∂u/∂x, and ∂u/∂y. This equation is hyperbolic, parabolic, or elliptic, depending on whether the discriminant b² − 4ac is >0, =0, or <0, respectively. Since a, b, c, and f depend on the solution, the type of equation can be different at different x and y locations. If the equation is hyperbolic, discontinuities can be propagated. See Courant and Hilbert (1953, 1962) and LeVeque, R. J., Numerical Methods for Conservation Laws, Birkhäuser, Basel (1992).

Phenomena of propagation such as vibrations are characterized by equations of “hyperbolic” type which are essentially different in their properties from other classes such as those which describe equilibrium (elliptic) or unsteady diffusion and heat transfer (parabolic). Prototypes are as follows:

Elliptic Laplace’s equation ∂²u/∂x² + ∂²u/∂y² = 0 and Poisson’s equation ∂²u/∂x² + ∂²u/∂y² = g(x, y) do not contain the variable time explicitly and consequently represent equilibrium configurations. Laplace’s equation is satisfied by static electric or magnetic potential at points free from electric charges or magnetic poles. Other important functions satisfying Laplace’s equation are the velocity potential of the irrotational motion of an incompressible fluid, used in hydrodynamics; the steady temperature at points in a homogeneous solid; and the steady state of diffusion through a homogeneous body.

Parabolic The heat equation ∂T/∂t = ∂²T/∂x² + ∂²T/∂y² represents nonequilibrium or unsteady states of heat conduction and diffusion.

Hyperbolic The wave equation ∂²u/∂t² = c²(∂²u/∂x² + ∂²u/∂y²) represents wave propagation of many varied types.
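The discriminant test is easy to apply pointwise. The sketch classifies the three prototypes, writing time as y where needed. It then applies the test to the Tricomi equation y ∂²u/∂x² + ∂²u/∂y² = 0, a standard textbook example that we add here (it is not from this section) to show the type changing with location.

```python
def classify(a, b, c):
    """Type of a u_xx + b u_xy + c u_yy = f at one point, from b^2 - 4ac."""
    disc = b * b - 4.0 * a * c
    if disc > 0:
        return "hyperbolic"
    if disc < 0:
        return "elliptic"
    return "parabolic"

print(classify(1, 0, 1))       # Laplace: u_xx + u_yy = 0
print(classify(1, 0, 0))       # heat equation u_xx - u_y = 0 with y playing the role of t
print(classify(4, 0, -1))      # wave equation 4 u_xx - u_yy = 0 (c = 2, y = t)

# Tricomi equation y u_xx + u_yy = 0: the type depends on the sign of y.
for y in (0.5, 0.0, -0.5):
    print(y, classify(y, 0, 1))
```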

Quasilinear first-order differential equations are like

$$a\frac{\partial u}{\partial x} + b\frac{\partial u}{\partial y} = f$$

where a, b, and f depend on x, y, and u, with a² + b² ≠ 0. This equation can be solved using the method of characteristics, which writes the solution in terms of a parameter s, which defines a path for the characteristic.

$$\frac{dx}{ds} = a, \qquad \frac{dy}{ds} = b, \qquad \frac{du}{ds} = f$$

These equations are integrated from some initial conditions. For a specified value of s, the values of x and y give the location where the solution is u. The equation is semilinear if a and b depend just on x and y (and not u), and the equation is linear if a, b, and f all depend on x and y, but not u. Quasilinear equations can give rise to shock propagation, and conditions have been derived to deduce the presence of shocks; see Courant and Hilbert (1953, 1962); Rhee, H. K., R. Aris, and N. R. Amundson, First-Order Partial Differential Equations, vol. I, Theory and Applications of Single Equations, Prentice-Hall, Englewood Cliffs, N.J. (1986); and LeVeque (1992), ibid.
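The characteristic equations form an ordinary initial-value problem, so any ODE integrator can trace them. The sketch uses RK4 (our choice) on two illustrative cases. The first is the inviscid Burgers equation u ∂u/∂x + ∂u/∂y = 0, with y playing the role of time; it shows two characteristics carrying different u arriving at the same point, which is how a shock forms. The second is a linear equation with f = −u, where u decays as e^(−s) along each characteristic.

```python
import math

def trace_characteristic(a, b, f, x0, y0, u0, s_end, n=1000):
    """Integrate dx/ds = a, dy/ds = b, du/ds = f (each a function of x, y, u) by RK4."""
    h = s_end / n
    z = [x0, y0, u0]

    def rhs(state):
        x, y, u = state
        return [a(x, y, u), b(x, y, u), f(x, y, u)]

    for _ in range(n):
        k1 = rhs(z)
        k2 = rhs([zi + 0.5 * h * ki for zi, ki in zip(z, k1)])
        k3 = rhs([zi + 0.5 * h * ki for zi, ki in zip(z, k2)])
        k4 = rhs([zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h * (p + 2 * q + 2 * r + w) / 6.0 for zi, p, q, r, w in zip(z, k1, k2, k3, k4)]
    return z

# Quasilinear case: a = u, b = 1, f = 0, so u is constant along straight characteristics.
burgers = (lambda x, y, u: u, lambda x, y, u: 1.0, lambda x, y, u: 0.0)
fast = trace_characteristic(*burgers, 0.0, 0.0, 2.0, 1.0)   # starts at x = 0 with u = 2
slow = trace_characteristic(*burgers, 1.0, 0.0, 1.0, 1.0)   # starts at x = 1 with u = 1
print(fast, slow)   # both reach x = 2 at y = 1 with different u: the solution cannot stay smooth

# Linear case: a = 1, b = 2, f = -u gives u = u0 exp(-s) along x = x0 + s, y = y0 + 2s.
decay = trace_characteristic(lambda x, y, u: 1.0, lambda x, y, u: 2.0,
                             lambda x, y, u: -u, 0.0, 0.0, 1.0, 1.5)
print(decay, math.exp(-1.5))
```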

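An example of a linear hyperbolic equation is the advection equation for flow of contaminants when the x and y velocity components are u and v, respectively.

$$\frac{\partial c}{\partial t} + u\frac{\partial c}{\partial x} + v\frac{\partial c}{\partial y} = 0$$

The equations for flow and adsorption in a packed bed or chromatography column give a quasilinear equation.

$$\phi\frac{\partial c}{\partial t} + \phi u\frac{\partial c}{\partial x} + (1 - \phi)\frac{df}{dc}\frac{\partial c}{\partial t} = 0$$

Here n = f(c) is the relation between concentration on the adsorbent and fluid concentration.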

The solution of problems involving partial differential equations often revolves about an attempt to reduce the partial differential equation to one or more ordinary differential equations. The solutions of the ordinary differential equations are then combined (if possible) so that the boundary conditions as well as the original partial differential equation are simultaneously satisfied. Three of these techniques are illustrated.

Similarity Variables The physical meaning of the term “similarity” relates to internal similitude, or self-similitude. Thus, similar solutions in boundary-layer flow over a horizontal flat plate are those for which the horizontal component of velocity u has the property that two velocity profiles located at different coordinates x differ only by a scale factor. The mathematical interpretation of the term similarity is a transformation of variables carried out so that a reduction in the number of independent variables is achieved. There are essentially two methods for finding similarity variables, “separation of variables” (not the classical concept) and the use of “continuous transformation groups.” The basic theory is available in Ames (1965).

Example The equation ∂θ/∂x = (A/y)(∂²θ/∂y²) with the boundary conditions θ = 0 at x = 0, y > 0; θ = 0 at y = ∞, x > 0; θ = 1 at y = 0, x > 0 represents the nondimensional temperature θ of a fluid moving past an infinitely wide flat plate immersed in the fluid. Turbulent transfer is neglected, as is molecular transport except in the y direction. It is now assumed that the equation and the boundary conditions can be satisfied by a solution of the form θ = f(y/xⁿ) = f(u), where θ = 0 at u = ∞ and θ = 1 at u = 0. The purpose here is to replace the independent variables x and y by the single variable u when it is hoped that a value of n exists which will allow x and y to be completely eliminated in the equation. In this case since u = y/xⁿ, there results after some calculation ∂θ/∂x = −(nu/x)(dθ/du), ∂²θ/∂y² = (1/x²ⁿ)(d²θ/du²), and when these are substituted in the equation, −(1/x)nu(dθ/du) = (1/x³ⁿ)(A/u)(d²θ/du²). For this to be a function of u only, choose n = ⅓. There results (d²θ/du²) + (u²/3A)(dθ/du) = 0. Two integrations and use of the boundary conditions for this ordinary differential equation give the solution

$$\theta = \frac{\int_u^{\infty}\exp(-u^3/9A)\,du}{\int_0^{\infty}\exp(-u^3/9A)\,du}$$
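The similarity solution is a ratio of two quadratures, which the sketch evaluates with Simpson’s rule. The infinite upper limit is cut off where the integrand has fallen below e^(−216). As a check, the denominator is compared with the closed form (9A)^(1/3) Γ(4/3), which follows from the substitution u = (9A)^(1/3) s. That closed form and the value A = 1 are our assumptions for illustration.

```python
import math

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def upper_limit(A):
    # exp(-u^3/9A) < exp(-216) beyond u = 6*(9A)^(1/3); treat that point as infinity
    return 6.0 * (9.0 * A) ** (1.0 / 3.0)

def integrand(u, A):
    return math.exp(-u ** 3 / (9.0 * A))

def denominator(A):
    return simpson(lambda u: integrand(u, A), 0.0, upper_limit(A))

def theta(u, A):
    """Nondimensional temperature theta(u), u = y / x^(1/3)."""
    top = upper_limit(A)
    if u >= top:
        return 0.0
    return simpson(lambda s: integrand(s, A), u, top) / denominator(A)

A = 1.0  # illustrative value
print(denominator(A), (9.0 * A) ** (1.0 / 3.0) * math.gamma(4.0 / 3.0))
print([round(theta(u, A), 6) for u in (0.0, 0.5, 1.0, 2.0, 4.0)])
```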

Group Method The type of transformation can be deduced using group theory. For a complete exposition, see Ames (1965) and Hill, J. M., Differential Equations and Group Methods for Scientists and Engineers, CRC Press, New York (1992); a shortened version is in Finlayson (1980, 2003). Basically, a similarity transformation should be considered when one of the independent variables has no physical scale (perhaps it goes to infinity). The boundary conditions must also simplify (and combine) since each transformation leads to a differential equation with one fewer independent variable.

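Example A similarity variable is found for the problem

$$\frac{\partial c}{\partial t} = \frac{\partial}{\partial x}\left[D(c)\frac{\partial c}{\partial x}\right], \qquad c(0,t) = 1, \quad c(\infty,t) = 0, \quad c(x,0) = 0$$

Note that the length dimension goes to infinity, so that there is no length scale in the problem statement; this is a clue to try a similarity transformation. The transformation examined here is

$$\bar t = a^{\alpha} t, \qquad \bar x = a^{\beta} x, \qquad \bar c = a^{\gamma} c$$

With this substitution, the equation becomes

$$a^{\alpha-\gamma}\frac{\partial \bar c}{\partial \bar t} = a^{2\beta-\gamma}\frac{\partial}{\partial \bar x}\left[D(a^{-\gamma}\bar c)\frac{\partial \bar c}{\partial \bar x}\right]$$

Group theory says a system is conformally invariant if it has the same form in the new variables; here, that is

$$\gamma = 0, \qquad \alpha - \gamma = 2\beta - \gamma, \qquad \text{or} \qquad \alpha = 2\beta$$

The invariants are

$$\eta = \frac{x}{t^{\delta}}, \qquad \delta = \frac{\beta}{\alpha}$$

and the solution is

$$c(x, t) = f(\eta)\,t^{\gamma/\alpha}$$

We can take γ = 0 and δ = β/α = ½. Note that the boundary conditions combine because the point x = ∞ and t = 0 give the same value of η and the conditions on c at x = ∞ and t = 0 are the same. We thus make the transformation

$$\eta = \frac{x}{\sqrt{4D_0 t}}, \qquad c(x, t) = f(\eta)$$

The use of the 4 and D0 makes the analysis below simpler. The result is

$$\frac{d}{d\eta}\left[\frac{D(c)}{D_0}\frac{df}{d\eta}\right] + 2\eta\frac{df}{d\eta} = 0, \qquad f(0) = 1, \quad f(\infty) = 0$$

Thus, we solve a two-point boundary value problem instead of a partial differential equation. When the diffusivity is constant, the solution is the error function, a tabulated function.

$$c(x,t) = 1 - \operatorname{erf}\eta = \operatorname{erfc}\eta, \qquad \operatorname{erf}\eta = \frac{\int_0^{\eta} e^{-\xi^2}\,d\xi}{\int_0^{\infty} e^{-\xi^2}\,d\xi}$$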
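For constant diffusivity, the result can be checked two ways: in the original variables with finite differences, and in the reduced equation f″ + 2ηf′ = 0 using the analytic derivative f′ = −(2/√π)e^(−η²). The sketch uses Python’s `math.erfc`. D0 = 1, the step sizes, and the sample points are illustrative choices.

```python
import math

def c_erfc(x, t, D0=1.0):
    """Constant-diffusivity similarity solution c = erfc(x / sqrt(4 D0 t))."""
    return math.erfc(x / math.sqrt(4.0 * D0 * t))

def pde_residual(x, t, D0=1.0, h=1e-3, k=1e-6):
    """dc/dt - D0 d2c/dx2 by central differences (should be close to 0)."""
    dc_dt = (c_erfc(x, t + k, D0) - c_erfc(x, t - k, D0)) / (2 * k)
    d2c_dx2 = (c_erfc(x + h, t, D0) - 2 * c_erfc(x, t, D0) + c_erfc(x - h, t, D0)) / h ** 2
    return dc_dt - D0 * d2c_dx2

def ode_residual(eta):
    """f'' + 2 eta f' for f = erfc(eta), using f' = -(2/sqrt(pi)) exp(-eta^2)."""
    fp = -2.0 / math.sqrt(math.pi) * math.exp(-eta * eta)
    fpp = 4.0 * eta / math.sqrt(math.pi) * math.exp(-eta * eta)
    return fpp + 2.0 * eta * fp

for x, t in [(0.5, 0.2), (1.0, 0.5), (0.3, 0.1)]:
    print(x, t, c_erfc(x, t), pde_residual(x, t))
print([ode_residual(eta) for eta in (0.0, 0.5, 1.5)])
```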

Separation of Variables This is a powerful, well-utilized method which is applicable in certain circumstances. It consists of assuming that the solution for a partial differential equation has the form U = f(x)g(y). If it is then possible to obtain an ordinary differential equation on one side of the equation depending only on x and on the other side only on y, the partial differential equation is said to be separable in the variables x, y. If this is the case, one side of the equation is a function of x alone and the other of y alone. The two can be equal only if each is a constant, say λ. Thus the problem has again been reduced to the solution of ordinary differential equations.

Example Laplace’s equation ∂²V/∂x² + ∂²V/∂y² = 0 plus the boundary conditions V(0, y) = 0, V(l, y) = 0, V(x, ∞) = 0, V(x, 0) = f(x) represents the steady-state potential in a thin plate (in z direction) of infinite extent in the y direction and of width l in the x direction. A potential f(x) is impressed (at y = 0) from x = 0 to x = l, and the sides are grounded. To obtain a solution of this boundary-value problem assume V(x, y) = f(x)g(y). Substitution in the differential equation yields f″(x)g(y) + f(x)g″(y) = 0, or g″(y)/g(y) = −f″(x)/f(x) = λ² (say). This system becomes g″(y) − λ²g(y) = 0 and f″(x) + λ²f(x) = 0. The solutions of these ordinary differential equations are respectively g(y) = Ae^(λy) + Be^(−λy), f(x) = C sin λx + D cos λx. Then f(x)g(y) = (Ae^(λy) + Be^(−λy))(C sin λx + D cos λx). Now V(0, y) = 0 so that f(0)g(y) = (Ae^(λy) + Be^(−λy))D = 0 for all y. Hence D = 0. The solution then has the form sin λx (Ae^(λy) + Be^(−λy)) where the multiplicative constant C has been eliminated. Since V(l, y) = 0, sin λl (Ae^(λy) + Be^(−λy)) = 0. Clearly the bracketed function of y is not zero, for the solution would then be the identically zero solution. Hence sin λl = 0 or λn = nπ/l, n = 1, 2, . . . where λn = nth eigenvalue.

The solution now has the form sin (nπx/l)(Ae^(nπy/l) + Be^(−nπy/l)). Since V(x, ∞) = 0, A must be taken to be zero because e^(nπy/l) becomes arbitrarily large as y → ∞. The solution then reads Bn sin (nπx/l)e^(−nπy/l), where Bn is the multiplicative constant. The differential equation is linear and homogeneous so that

$$\sum_{n=1}^{\infty} B_n e^{-n\pi y/l}\sin\frac{n\pi x}{l}$$

is also a solution. Satisfaction of the last boundary condition is ensured by taking

$$B_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx = \text{Fourier sine coefficients of } f(x)$$

Further, for y > 0 the exponential factors make the convergence and differentiability of this series quite easy to establish. Thus the solution is

$$V(x, y) = \sum_{n=1}^{\infty} B_n e^{-n\pi y/l}\sin\frac{n\pi x}{l}$$
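The series solution can be sketched numerically: compute the Fourier sine coefficients by quadrature, then sum the decaying modes. For the illustrative case f(x) = 1, the series has the closed form V = (2/π) arctan[sin(πx/l)/sinh(πy/l)], which we use only as an independent check (it follows from a standard series identity and is not given in the text). Truncating at 60 terms is adequate for y ≥ 0.2.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0

def sine_coefficients(f, l, N, m=4000):
    """B_n = (2/l) * integral_0^l f(x) sin(n pi x / l) dx for n = 1..N."""
    return [2.0 / l * simpson(lambda x, n=n: f(x) * math.sin(n * math.pi * x / l), 0.0, l, m)
            for n in range(1, N + 1)]

def V(x, y, B, l):
    """Truncated series sum_n B_n exp(-n pi y / l) sin(n pi x / l)."""
    return sum(Bn * math.exp(-n * math.pi * y / l) * math.sin(n * math.pi * x / l)
               for n, Bn in enumerate(B, start=1))

l = 1.0
B = sine_coefficients(lambda x: 1.0, l, N=60)   # illustrative impressed potential f(x) = 1
exact = lambda x, y: 2.0 / math.pi * math.atan(math.sin(math.pi * x / l) / math.sinh(math.pi * y / l))
for x, y in [(0.3, 0.2), (0.5, 0.5), (0.8, 1.0)]:
    print(x, y, V(x, y, B, l), exact(x, y))
```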

Example The diffusion problem in a slab of thickness L

$$\frac{\partial c}{\partial t} = D\frac{\partial^2 c}{\partial x^2}, \qquad c(0, t) = 1, \quad c(L, t) = 0, \quad c(x, 0) = 0$$

can be solved by separation of variables. First transform the problem so that the boundary conditions are homogeneous (having zeros on the right-hand side). Let

$$c(x, t) = 1 - \frac{x}{L} + u(x, t)$$

Then u(x, t) satisfies

$$\frac{\partial u}{\partial t} = D\frac{\partial^2 u}{\partial x^2}, \qquad u(x, 0) = \frac{x}{L} - 1, \quad u(0, t) = 0, \quad u(L, t) = 0$$

Assume a solution of the form u(x, t) = X(x) T(t), which gives

$$\frac{1}{DT}\frac{dT}{dt} = \frac{1}{X}\frac{d^2X}{dx^2}$$

Since both sides are constant, this gives the following ordinary differential equations to solve.

$$\frac{1}{DT}\frac{dT}{dt} = -\lambda, \qquad \frac{1}{X}\frac{d^2X}{dx^2} = -\lambda$$

The solution of these is

$$T = A e^{-\lambda D t}, \qquad X = B\cos\sqrt{\lambda}\,x + E\sin\sqrt{\lambda}\,x$$

The combined solution for u(x,t) is

$$u = A\left(B\cos\sqrt{\lambda}\,x + E\sin\sqrt{\lambda}\,x\right)e^{-\lambda D t}$$

Apply the boundary condition that u(0,t) = 0 to give B = 0. Then the solution is

$$u = A\left(\sin\sqrt{\lambda}\,x\right)e^{-\lambda D t}$$

where the multiplicative constant E has been eliminated. Apply the boundary condition at x = L.

$$0 = A\left(\sin\sqrt{\lambda}\,L\right)e^{-\lambda D t}$$

This can be satisfied by choosing A = 0, which gives only the trivial solution u ≡ 0. However, it can also be satisfied by choosing λ such that

$$\sin\sqrt{\lambda}\,L = 0, \qquad \sqrt{\lambda}\,L = n\pi$$

Thus

$$\lambda = \frac{n^2\pi^2}{L^2}$$

The combined solution can now be written as

$$u = A\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2 D t/L^2}$$

Since the initial condition must be satisfied, we use an infinite series of these functions.

$$u = \sum_{n=1}^{\infty} A_n\sin\frac{n\pi x}{L}\,e^{-n^2\pi^2 D t/L^2}$$

At t = 0, we satisfy the initial condition.

$$\frac{x}{L} - 1 = \sum_{n=1}^{\infty} A_n\sin\frac{n\pi x}{L}$$

This is done by multiplying the equation by sin(mπx/L) and integrating over x: 0 → L. (This is the same as minimizing the mean-square error of the initial condition.) This gives

$$A_m\frac{L}{2} = \int_0^L\left(\frac{x}{L} - 1\right)\sin\frac{m\pi x}{L}\,dx$$

Evaluating the integral gives Am = −2/(mπ), which completes the solution.
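The finished series is straightforward to evaluate. The sketch sums it with the coefficients An = −2/(nπ) and checks two limits. At long times it should approach the linear profile 1 − x/L. At very short times it should match the method-of-images result erfc(x/√(4Dt)) − erfc((2L − x)/√(4Dt)), which we bring in only as an independent check. D = L = 1 and the 200-term truncation are illustrative choices.

```python
import math

def slab_concentration(x, t, D, L, N=200):
    """c(x,t) = 1 - x/L + sum_n A_n sin(n pi x/L) exp(-n^2 pi^2 D t/L^2), A_n = -2/(n pi)."""
    total = 0.0
    for n in range(1, N + 1):
        A_n = -2.0 / (n * math.pi)
        total += A_n * math.sin(n * math.pi * x / L) * math.exp(-(n * math.pi / L) ** 2 * D * t)
    return 1.0 - x / L + total

D, L = 1.0, 1.0
# Long time: the transient has decayed, leaving the steady profile 1 - x/L.
print(slab_concentration(0.4, 10.0, D, L))

# Short time near x = 0: compare with the first two image terms.
x, t = 0.05, 0.001
images = math.erfc(x / math.sqrt(4 * D * t)) - math.erfc((2 * L - x) / math.sqrt(4 * D * t))
print(slab_concentration(x, t, D, L), images)
```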

Integral-Transform Method A number of integral transforms are used in the solution of differential equations. Only one, the Laplace transform, will be discussed here [for others, see “Integral Transforms (Operational Methods)”]. The one-sided Laplace transform indicated by L[f(t)] is defined by the equation

$$L[f(t)] = \int_0^{\infty} f(t)\,e^{-st}\,dt$$

It has numerous important properties. The ones of interest here are L[f′(t)] = sL[f(t)] − f(0); L[f″(t)] = s²L[f(t)] − sf(0) − f′(0); L[f⁽ⁿ⁾(t)] = sⁿL[f(t)] − sⁿ⁻¹f(0) − sⁿ⁻²f′(0) − ⋯ − f⁽ⁿ⁻¹⁾(0) for ordinary derivatives. For partial derivatives an indication of which variable is being transformed avoids confusion. Thus, if y = y(x, t),

$$L_t\left[\frac{\partial y}{\partial t}\right] = sL[y(x, t)] - y(x, 0)$$

whereas

$$L_t\left[\frac{\partial y}{\partial x}\right] = \frac{d\,L_t[y(x, t)]}{dx}$$

since L[y(x, t)] is “really” only a function of x. Otherwise the results are similar. These facts coupled with the linearity of the transform, i.e., L[af(t) + bg(t)] = aL[f(t)] + bL[g(t)], make it a useful device in solving some linear differential equations. Its use reduces the solution of ordinary differential equations to the solution of algebraic equations for L[y]. The inverse transform must be obtained either from tables or by use of complex inversion methods.
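The derivative rules can be confirmed numerically for a concrete function. The sketch computes a truncated Laplace transform by Simpson’s rule for the illustrative choice f(t) = cos 2t at s = 3, whose exact transform is s/(s² + 4) = 3/13. It then checks L[f′] = sL[f] − f(0) and L[f″] = s²L[f] − sf(0) − f′(0). Truncating at T = 40 is our assumption, harmless here because e^(−120) is negligible.

```python
import math

def laplace(f, s, T=40.0, n=20000):
    """Truncated numerical Laplace transform: Simpson's rule for int_0^T f(t) e^{-st} dt."""
    h = T / n
    total = f(0.0) + f(T) * math.exp(-s * T)
    for i in range(1, n):
        t = i * h
        total += (4 if i % 2 else 2) * f(t) * math.exp(-s * t)
    return total * h / 3.0

s = 3.0
f = lambda t: math.cos(2 * t)
df = lambda t: -2 * math.sin(2 * t)
d2f = lambda t: -4 * math.cos(2 * t)

Lf, Ldf, Ld2f = laplace(f, s), laplace(df, s), laplace(d2f, s)
print(Lf, 3 / 13)                                      # exact transform s/(s^2 + 4)
print(Ldf, s * Lf - f(0.0))                            # first-derivative rule
print(Ld2f, s ** 2 * Lf - s * f(0.0) - df(0.0))        # second-derivative rule
```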

Example The equation ∂c/∂t = D(∂²c/∂x²) represents the diffusion in a semi-infinite medium, x ≥ 0. Under the boundary conditions c(0, t) = c0, c(x, 0) = 0 find a solution of the diffusion equation. By taking the Laplace transform of both sides with respect to t,

$$\int_0^{\infty}\frac{\partial^2 c}{\partial x^2}\,e^{-st}\,dt = \frac{1}{D}\int_0^{\infty}\frac{\partial c}{\partial t}\,e^{-st}\,dt$$

or

$$\frac{d^2F}{dx^2} = \frac{1}{D}\left[sF - c(x, 0)\right] = \frac{sF}{D}$$

where F(x, s) = Lt[c(x, t)]. Hence

$$\frac{d^2F}{dx^2} - \frac{s}{D}F = 0$$

The other boundary condition transforms into F(0, s) = c0/s. Finally the solution of the ordinary differential equation for F subject to F(0, s) = c0/s and F remains finite as x → ∞ is F(x, s) = (c0/s)e^(−√(s/D) x). Reference to a table shows that the function having this as its Laplace transform is

$$c(x, t) = c_0\left[1 - \frac{2}{\sqrt{\pi}}\int_0^{x/(2\sqrt{Dt})} e^{-u^2}\,du\right] = c_0\,\operatorname{erfc}\frac{x}{\sqrt{4Dt}}$$

Matched-Asymptotic Expansions Sometimes the coefficient in front of the highest derivative is a small number. Special perturbation techniques can then be used, provided the proper scaling laws are found. See Kevorkian, J., and J. D. Cole, Perturbation Methods in Applied Mathematics, Springer-Verlag, New York (1981); and Lagerstrom, P. A., Matched Asymptotic Expansions: Ideas and Techniques, Springer-Verlag, New York (1988).

DIFFERENCE EQUATIONS

REFERENCES: Elaydi, Saber, and S. N. Elaydi, An Introduction to Difference Equations, Springer-Verlag, New York (1999); Fulford, G., P. Forrester, and A. Jones, Modelling with Differential and Difference Equations, Cambridge University Press, New York (1997); Goldberg, S., Introduction to Difference Equations, Dover (1986); Kelley, W. G., and A. C. Peterson, Difference Equations: An Introduction with Applications, 2d ed., Harcourt/Academic Press, San Diego (2001).

Certain situations are such that the independent variable does not vary continuously but has meaning only for discrete values. Typical illustrations occur in the stagewise processes found in chemical engineering such as distillation, staged extraction systems, and absorption columns. In each of these the operation is characterized by a finite between-stage change of the dependent variable in which the independent variable is the integral number of the stage. The importance of difference equations is twofold: (1) to analyze problems of the type described and (2) to obtain approximate solutions of problems which lead, in their formulation, to differential equations. In this subsection only problems of analysis are considered; the application to approximate solutions is considered under “Numerical Analysis and Approximate Methods.”

ELEMENTS OF THE CALCULUS OF FINITE DIFFERENCES

Let y = f(x) be defined for discrete equidistant values of x, which will be denoted by $x_n$. The corresponding value of y will be written $y_n = f(x_n)$. The first forward difference of f(x) is denoted by ∆f(x) = f(x + h) − f(x), where $h = x_n - x_{n-1}$ is the interval length.

Example Let f(x) = x². Then ∆f(x) = (x + h)² − x² = 2hx + h².

The second forward difference is obtained by taking the difference of the first; thus ∆∆f(x) = ∆²f(x) = ∆f(x + h) − ∆f(x) = f(x + 2h) − 2f(x + h) + f(x).

Example f(x) = x², ∆²f(x) = ∆[∆f(x)] = ∆(2hx) + ∆(h²) = [2h(x + h) − 2hx] + [h² − h²] = 2h².
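The difference definitions translate directly into code. This minimal Python sketch (the function names are illustrative, not from the text) applies them to f(x) = x² with the assumed values x = 3, h = 0.5, so the two worked examples can be checked numerically.

```python
def forward_diff(f, x, h, n=1):
    """n-th forward difference: Δ^n f(x) = Δ^(n-1) f(x + h) - Δ^(n-1) f(x)."""
    if n == 0:
        return f(x)
    return forward_diff(f, x + h, h, n - 1) - forward_diff(f, x, h, n - 1)

def backward_diff(f, x, h):
    """Backward difference ∇f(x) = f(x) - f(x - h)."""
    return f(x) - f(x - h)

def central_diff(f, x, h):
    """Central difference δf(x) = f(x + h/2) - f(x - h/2)."""
    return f(x + h / 2) - f(x - h / 2)

def square(x):
    return x ** 2

x, h = 3.0, 0.5
print(forward_diff(square, x, h))       # compare with 2hx + h^2
print(forward_diff(square, x, h, 2))    # compare with 2h^2
print(backward_diff(square, x, h), central_diff(square, x, h))
```

The values chosen are exactly representable in binary floating point, so the differences match the closed forms exactly rather than approximately.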

Similarly, the nth forward difference is defined by the relation ∆ⁿf(x) = ∆[∆ⁿ⁻¹f(x)]. Other difference relations are also quite useful. Some of these are ∇f(x) = f(x) − f(x − h), which is called the backward difference, and δf(x) = f[x + (h/2)] − f[x − (h/2)], called the central difference. Some properties of the operator ∆ are quite important. If C is any constant, ∆C = 0; if f(x) is any function of period h, ∆f(x) = 0 (in fact, periodic functions of period h play the same role here as constants do in the differential calculus); ∆[f(x) + g(x)] = ∆f(x) + ∆g(x); ∆ᵐ[∆ⁿf(x)] = ∆ᵐ⁺ⁿf(x); ∆[f(x)g(x)] = f(x)∆g(x) + g(x + h)∆f(x); and

$$\Delta\left[\frac{f(x)}{g(x)}\right] = \frac{g(x)\,\Delta f(x) - f(x)\,\Delta g(x)}{g(x)\,g(x + h)}$$

Example ∆(x sin x) = x∆ sin x + sin(x + h)∆x = 2x sin(h/2) cos[x + (h/2)] + h sin(x + h).

DIFFERENCE EQUATIONS

A difference equation is a relation between the differences and the independent variable, φ(∆ⁿy, ∆ⁿ⁻¹y, . . . , ∆y, y, x) = 0, where φ is some given function. The general case in which the interval between the successive points is any real number h, instead of 1, can be reduced to that with interval size 1 by the substitution x = hx′. Hence all further difference-equation work will assume the interval size between successive points is 1.

Example f(x + 1) − (α + 1)f(x) + αf(x − 1) = 0. Common notation usually is $y_x = f(x)$. This equation is then written $y_{x+1} - (\alpha + 1)y_x + \alpha y_{x-1} = 0$.

Example $y_{x+2} + 2y_xy_{x+1} + y_x = x^2$.

Example $y_{x+1} - y_x = 2x$.

The order of the difference equation is the difference between the largest and smallest arguments when written in the form of the second example. The first and second examples are both of order 2, while the third example is of order 1. A linear difference equation involves no products or other nonlinear functions of the dependent variable and its differences. The first and third examples are linear, while the second example is nonlinear.

A solution of a difference equation is a relation between the variables which satisfies the equation. If the difference equation is of order n, the general solution involves n arbitrary constants. The techniques for solving difference equations resemble techniques used for differential equations.

Equation ∆ⁿy = a The solution of ∆ⁿy = a, where a is a constant, is a polynomial of degree n plus an arbitrary periodic function of period 1. That is, $y = (ax^n/n!) + c_1x^{n-1} + c_2x^{n-2} + \cdots + c_n + f(x)$, where f(x + 1) = f(x).

Example ∆³y = 6. The solution is $y = x^3 + c_1x^2 + c_2x + c_3 + f(x)$; $c_1, c_2, c_3$ are arbitrary constants, and f(x) is an arbitrary periodic function of period 1.

Equation $y_{x+1} - y_x = \varphi(x)$ This equation states that the first difference of the unknown function is equal to the given function φ(x). The solution, by analogy with solving the differential equation dy/dx = φ(x) by integration, is obtained by “finite integration” or summation. When there are only a finite number of data points, this is easily accomplished by writing $y_x = y_0 + \sum_{t=1}^{x}\varphi(t - 1)$, where the data points are numbered from 1 to x. This is the only situation considered here.

Examples If φ(x) = 1, $y_x = x$. If φ(x) = x, $y_x = x(x - 1)/2$. If $\varphi(x) = a^x$, a ≠ 1, $y_x = (a^x - 1)/(a - 1)$. In all cases $y_0 = 0$.

Other examples may be evaluated by using summation, that is, $y_2 = y_1 + \varphi(1)$, $y_3 = y_2 + \varphi(2) = y_1 + \varphi(1) + \varphi(2)$, $y_4 = y_3 + \varphi(3) = y_1 + \varphi(1) + \varphi(2) + \varphi(3)$, . . . , $y_x = y_1 + \sum_{t=1}^{x-1}\varphi(t)$.

Example $y_{x+1} - ry_x = 1$, r constant, x > 0, and $y_0 = 1$. Then $y_1 = 1 + r$, $y_2 = 1 + r + r^2$, . . . , $y_x = 1 + r + \cdots + r^x = (1 - r^{x+1})/(1 - r)$ for r ≠ 1, and $y_x = 1 + x$ for r = 1.
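A quick way to confirm the closed form is to march the recurrence forward and compare. In this sketch the helper names and the sample ratios r are illustrative assumptions.

```python
import math

def iterate(r, x_max, y0=1.0):
    """March y_{x+1} = r*y_x + 1 forward from y_0, returning [y_0, ..., y_{x_max}]."""
    ys = [y0]
    for _ in range(x_max):
        ys.append(r * ys[-1] + 1.0)
    return ys

def closed_form(r, x):
    """y_x = (1 - r**(x+1)) / (1 - r) for r != 1, and 1 + x for r == 1 (with y_0 = 1)."""
    return 1.0 + x if r == 1 else (1 - r ** (x + 1)) / (1 - r)

for r in (0.5, 1, 2, -3):
    ys = iterate(r, 10)
    print(r, ys[10], math.isclose(ys[10], closed_form(r, 10)))
```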

Linear Difference Equations The linear difference equation of order n has the form $P_ny_{x+n} + P_{n-1}y_{x+n-1} + \cdots + P_1y_{x+1} + P_0y_x = Q(x)$ with $P_n \ne 0$ and $P_0 \ne 0$, and $P_j$, j = 0, . . . , n, are functions of x.

Constant Coefficients and Q(x) = 0 (Homogeneous) The solution is obtained by trying a solution of the form $y_x = c\beta^x$. When this trial solution is substituted in the difference equation, a polynomial of degree n results for β. If the solutions of this polynomial are denoted by $\beta_1, \beta_2, \ldots, \beta_n$, then the following cases result: (1) if all the $\beta_j$’s are real and unequal, the solution is $y_x = \sum_{j=1}^{n} c_j\beta_j^x$, where $c_1, \ldots, c_n$ are arbitrary constants; (2) if the roots are real and repeated, say, $\beta_j$ has multiplicity m, then the partial solution corresponding to $\beta_j$ is $\beta_j^x(c_1 + c_2x + \cdots + c_mx^{m-1})$; (3) if the roots are complex conjugates, say, $a + ib = pe^{i\theta}$ and $a - ib = pe^{-i\theta}$ with $p = \sqrt{a^2 + b^2}$, the partial solution corresponding to this pair is $p^x(c_1\cos\theta x + ic_2\sin\theta x)$; and (4) if the roots are multiple complex conjugates, say, $a + ib = pe^{i\theta}$ and $a - ib = pe^{-i\theta}$ are m-fold, then the partial solution corresponding to these is $p^x[(c_1 + c_2x + \cdots + c_mx^{m-1})\cos\theta x + i(d_1 + d_2x + \cdots + d_mx^{m-1})\sin\theta x]$.

Example The equation $y_{x+1} - (\alpha + 1)y_x + \alpha y_{x-1} = 0$, $y_0 = c_0$, and $y_{m+1} = x_{m+1}/k$ represents the steady-state composition of transferable material in the raffinate stream of a staged countercurrent liquid-liquid extraction system.

Clearly y is a function of the stage number x. α is a combination of system constants. By using the trial solution $y_x = c\beta^x$, there results $\beta^2 - (\alpha + 1)\beta + \alpha = 0$, so that $\beta_1 = 1$, $\beta_2 = \alpha$. The general solution is $y_x = c_1 + c_2\alpha^x$. By using the side conditions, $c_1 = c_0 - c_2$, $c_2 = (y_{m+1} - c_0)/(\alpha^{m+1} - 1)$. The desired solution is

$$\frac{y_x - c_0}{y_{m+1} - c_0} = \frac{\alpha^x - 1}{\alpha^{m+1} - 1}$$
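The stage-by-stage profile can be tabulated from the closed form and checked against the original recurrence. The parameter values below (α = 1.5, c0 = 0.1, $y_{m+1}$ = 0.4, m = 5) are hypothetical, chosen only to exercise the formula, and the sketch assumes α ≠ 1.

```python
def raffinate_profile(alpha, c0, y_end, m):
    """Closed-form y_x, x = 0..m+1, from (y_x - c0)/(y_{m+1} - c0) = (alpha**x - 1)/(alpha**(m+1) - 1).

    Requires alpha != 1 (otherwise the two characteristic roots coincide).
    """
    scale = (y_end - c0) / (alpha ** (m + 1) - 1)
    return [c0 + scale * (alpha ** x - 1) for x in range(m + 2)]

# Hypothetical parameters: alpha = 1.5, c0 = 0.1, y_{m+1} = 0.4, m = 5 stages.
alpha, c0, y_end, m = 1.5, 0.1, 0.4, 5
ys = raffinate_profile(alpha, c0, y_end, m)

# Residual of y_{x+1} - (alpha + 1) y_x + alpha y_{x-1} = 0 at the interior stages.
residuals = [ys[x + 1] - (alpha + 1) * ys[x] + alpha * ys[x - 1] for x in range(1, m + 1)]
print([round(y, 4) for y in ys])
print(max(abs(r) for r in residuals))
```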

Example $y_{x+3} - 3y_{x+2} + 4y_x = 0$. By setting $y_x = c\beta^x$, there results $\beta^3 - 3\beta^2 + 4 = 0$, or $\beta_1 = -1$, $\beta_2 = 2$, $\beta_3 = 2$. The general solution is $y_x = c_1(-1)^x + 2^x(c_2 + c_3x)$.

Example $y_{x+1} - 2y_x + 2y_{x-1} = 0$. $\beta_1 = 1 + i$, $\beta_2 = 1 - i$, so $p = \sqrt{1 + 1} = \sqrt{2}$ and θ = π/4. The solution is $y_x = 2^{x/2}[c_1\cos(x\pi/4) + ic_2\sin(x\pi/4)]$.
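Both general solutions can be verified by substituting them back into their difference equations. For the complex-root case this sketch uses the real form $2^{x/2}[A\cos(x\pi/4) + B\sin(x\pi/4)]$, i.e., it assumes the factor i has been absorbed into a real constant B; the constants themselves are arbitrary choices.

```python
import math

def y_cubic(x, c1, c2, c3):
    """General solution c1(-1)^x + 2^x (c2 + c3 x) of y_{x+3} - 3y_{x+2} + 4y_x = 0."""
    return c1 * (-1) ** x + 2 ** x * (c2 + c3 * x)

def y_complex(x, A, B):
    """Real form 2^(x/2) [A cos(x pi/4) + B sin(x pi/4)] for y_{x+1} - 2y_x + 2y_{x-1} = 0."""
    return 2 ** (x / 2) * (A * math.cos(x * math.pi / 4) + B * math.sin(x * math.pi / 4))

# Arbitrary (assumed) constants; the residuals should vanish for every x.
for x in range(1, 8):
    r1 = y_cubic(x + 3, 1, -2, 3) - 3 * y_cubic(x + 2, 1, -2, 3) + 4 * y_cubic(x, 1, -2, 3)
    r2 = y_complex(x + 1, 0.7, -1.2) - 2 * y_complex(x, 0.7, -1.2) + 2 * y_complex(x - 1, 0.7, -1.2)
    print(x, r1, r2)
```

With integer constants the first residual is computed in exact integer arithmetic; the second is a floating-point check.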

Constant Coefficients and Q(x) ≠ 0 (Nonhomogeneous) In this case the general solution is found by first obtaining the homogeneous solution, say, $y_x^H$, and adding to it any particular solution with Q(x) ≠ 0, say, $y_x^P$. There are several means of obtaining the particular solution.

Method of Undetermined Coefficients If Q(x) is a product or linear combination of products of the functions $e^{bx}$, $a^x$, $x^p$ (p a positive integer or zero), cos cx, and sin cx, this method may be used. The “families” [$a^x$], [$e^{bx}$], [sin cx, cos cx], and [$x^p, x^{p-1}, \ldots, x, 1$] are defined for each of the above functions in the following way: the family of a term $f_x$ is the set of all functions that result from applying to $f_x$ the operations $a^{x+y}$, cos c(x + y), sin c(x + y), $(x + y)^p$, together with their linear combinations. The technique involves the following steps: (1) Solve the homogeneous system. (2) Construct the family of each term. (3) If the family has no representative in the homogeneous solution, assume $y_x^P$ is a linear combination of the families of each term and determine the constants so that the equation is satisfied. (4) If a family has a representative in the homogeneous solution, multiply each member of the family by the smallest integral power of x for which all such representatives are removed, and revert to step 3.

Example $y_{x+1} - 3y_x + 2y_{x-1} = 1 + a^x$, a ≠ 0. The homogeneous solution is $y_x^H = c_1 + c_2 2^x$. The family of 1 is 1 and of $a^x$ is $a^x$. However, 1 is a solution of the homogeneous system. Therefore, try $y_x^P = Ax + Ba^x$. Substituting in the equation gives −A = 1 and $Ba^{x-1}(a - 1)(a - 2) = a^x$, so there results

$$y_x = c_1 + c_2 2^x - x + \frac{a}{(a - 1)(a - 2)}\,a^x, \qquad a \ne 1,\ a \ne 2$$

If a = 1, $y_x = c_1 + c_2 2^x - 2x$. If a = 2, $y_x = c_1 + c_2 2^x - x + x2^x$.

Example The family of $x^23^x$ is [$x^23^x$, $x3^x$, $3^x$].
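The three cases of the worked example (general a, a = 1, a = 2) can be checked by substituting them into the equation. A minimal sketch; the helper names and the sample constants c1 = 1.5, c2 = −0.25 are assumptions for illustration.

```python
def general_solution(x, a, c1=0.0, c2=0.0):
    """y_x for y_{x+1} - 3y_x + 2y_{x-1} = 1 + a**x (a != 0), following the worked example."""
    if a == 1:
        return c1 + c2 * 2 ** x - 2 * x
    if a == 2:
        return c1 + c2 * 2 ** x - x + x * 2 ** x
    return c1 + c2 * 2 ** x - x + a / ((a - 1) * (a - 2)) * a ** x

def residual(x, a, c1=0.0, c2=0.0):
    """Left side minus right side of the difference equation at stage x."""
    y = lambda k: general_solution(k, a, c1, c2)
    return y(x + 1) - 3 * y(x) + 2 * y(x - 1) - (1 + a ** x)

for a in (1, 2, 3, 0.5, -2):
    print(a, max(abs(residual(x, a, 1.5, -0.25)) for x in range(1, 9)))
```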

Method of Variation of Parameters This technique is applicable to general linear difference equations. It is illustrated for the second-order system $y_{x+2} + Ay_{x+1} + By_x = \varphi(x)$. Assume that the homogeneous solution has been found by some technique and write $y_x^H = c_1u_x + c_2v_x$. Assume that a particular solution is $y_x^P = D_xu_x + E_xv_x$. $E_x$ and $D_x$ can be found by solving the equations

$$E_{x+1} - E_x = \frac{u_{x+1}\,\varphi(x)}{u_{x+1}v_{x+2} - u_{x+2}v_{x+1}}$$

$$D_{x+1} - D_x = \frac{v_{x+1}\,\varphi(x)}{v_{x+1}u_{x+2} - v_{x+2}u_{x+1}}$$

by summation. The general solution is then $y_x = y_x^P + y_x^H$.

Variable Coefficients The method of variation of parameters applies equally well to the linear difference equation with variable coefficients. Techniques are therefore needed to solve the homogeneous system with variable coefficients.
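The variation-of-parameters formulas can be carried out literally by summation. The sketch below uses exact rational arithmetic on a hypothetical test equation, $y_{x+2} - 3y_{x+1} + 2y_x = x^2$, whose homogeneous solutions are $u_x = 1$ and $v_x = 2^x$; the starting values $D_0 = E_0 = 0$ are an assumed choice (any other start only adds a homogeneous solution).

```python
from fractions import Fraction

def particular_solution(u, v, phi, x_max):
    """Variation of parameters for y_{x+2} + A y_{x+1} + B y_x = phi(x).

    u, v are two independent homogeneous solutions (callables of integer x).
    D_x and E_x are built by summing the difference formulas from D_0 = E_0 = 0.
    Returns [y^P_0, ..., y^P_{x_max}].
    """
    D, E = [Fraction(0)], [Fraction(0)]
    for x in range(x_max):
        dE = u(x + 1) * phi(x) / (u(x + 1) * v(x + 2) - u(x + 2) * v(x + 1))
        dD = v(x + 1) * phi(x) / (v(x + 1) * u(x + 2) - v(x + 2) * u(x + 1))
        E.append(E[-1] + dE)
        D.append(D[-1] + dD)
    return [D[x] * u(x) + E[x] * v(x) for x in range(x_max + 1)]

# Hypothetical example: y_{x+2} - 3y_{x+1} + 2y_x = x**2 (characteristic roots 1 and 2).
u = lambda x: Fraction(1)
v = lambda x: Fraction(2) ** x
phi = lambda x: Fraction(x) ** 2

yp = particular_solution(u, v, phi, 12)
residuals = [yp[x + 2] - 3 * yp[x + 1] + 2 * yp[x] - phi(x) for x in range(10)]
print(residuals)   # exact rational arithmetic, so these residuals are exact
```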

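Equation $y_{x+1} - a_xy_x = 0$ By assuming that this equation is valid for x ≥ 0 and $y_0 = c$, the solution is $y_x = c\prod_{n=1}^{x} a_{n-1}$.

Example $y_{x+1} + \frac{x + 2}{x + 1}\,y_x = 0$. The solution is

$$y_x = c\prod_{n=1}^{x}\left(-\frac{n + 1}{n}\right) = c(-1)^x\,\frac{2}{1}\cdot\frac{3}{2}\cdots\frac{x + 1}{x} = (-1)^xc(x + 1)$$

Example $y_{x+1} - xy_x = 0$. Here $a_0 = 0$, so the product is started at x = 1; taking $y_1 = c$, the solution is $y_x = c(x - 1)!$.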


Reduction of Order If one homogeneous solution, say, $u_x$, can be found by inspection or otherwise, an equation of lower order can be obtained by the substitution $v_x = y_x/u_x$. The resultant equation must be satisfied by $v_x$ = constant or $\Delta v_x = 0$. Thus the equation will be of reduced order if the new variable $U_x = \Delta(y_x/u_x)$ is introduced.

Example $(x + 2)y_{x+2} - (x + 3)y_{x+1} + y_x = 0$. By observation $u_x = 1$ is a solution. Set $U_x = \Delta y_x = y_{x+1} - y_x$. There results $(x + 2)U_{x+1} - U_x = 0$, which is of order one lower than the original equation. The complete solution for $y_x$ is finally

$$y_x = c_0\sum_{n=0}^{x}\frac{1}{n!} + c_1$$

Factorization If the difference equation can be factored, then the general solution can be obtained by solving two or more successive equations of lower order. Consider $y_{x+2} + A_xy_{x+1} + B_xy_x = \varphi(x)$. If there exist $a_x$, $b_x$ such that $a_{x+1} + b_x = -A_x$ and $a_xb_x = B_x$, then the difference equation may be written $y_{x+2} - (a_{x+1} + b_x)y_{x+1} + a_xb_xy_x = \varphi(x)$. First solve $U_{x+1} - b_xU_x = \varphi(x)$ and then $y_{x+1} - a_xy_x = U_x$. (For constant coefficients the conditions reduce to a + b = −A and ab = B.)

Example $y_{x+2} - (2x + 2)y_{x+1} + (x^2 + x)y_x = 0$. Set $a_x = x$, $b_x = x + 1$, so that $a_{x+1} + b_x = 2x + 2$ and $a_xb_x = x^2 + x$. Solve $u_{x+1} - (x + 1)u_x = 0$ and then $y_{x+1} - xy_x = u_x$.

Substitution If it is possible to rearrange a difference equation so that it takes the form $af_{x+2}y_{x+2} + bf_{x+1}y_{x+1} + cf_xy_x = \varphi(x)$ with a, b, c constants, then the substitution $u_x = f_xy_x$ reduces the equation to one with constant coefficients.

Example $(x + 2)^2y_{x+2} - 3(x + 1)^2y_{x+1} + 2x^2y_x = 0$. Set $u_x = x^2y_x$. The equation becomes $u_{x+2} - 3u_{x+1} + 2u_x = 0$, which is linear and easily solved by previous methods.


The substitution $u_x = y_x/f_x$ reduces $af_xf_{x+1}y_{x+2} + bf_xf_{x+2}y_{x+1} + cf_{x+1}f_{x+2}y_x = \varphi(x)$ to an equation with constant coefficients.

Example $x(x + 1)y_{x+2} + 3x(x + 2)y_{x+1} - 4(x + 1)(x + 2)y_x = x$. Set $u_x = y_x/f_x = y_x/x$. Then $y_x = xu_x$, $y_{x+1} = (x + 1)u_{x+1}$, and $y_{x+2} = (x + 2)u_{x+2}$. Substitution in the equation yields $x(x + 1)(x + 2)u_{x+2} + 3x(x + 2)(x + 1)u_{x+1} - 4x(x + 1)(x + 2)u_x = x$, or $u_{x+2} + 3u_{x+1} - 4u_x = 1/[(x + 1)(x + 2)]$, which is a linear equation with constant coefficients.

Nonlinear Difference Equations: Riccati Difference Equation The Riccati equation $y_{x+1}y_x + ay_{x+1} + by_x + c = 0$ is a nonlinear difference equation which can be solved by reduction to linear form. Set y = z + h. The equation becomes $z_{x+1}z_x + (h + a)z_{x+1} + (h + b)z_x + h^2 + (a + b)h + c = 0$. If h is selected as a root of $h^2 + (a + b)h + c = 0$ and the equation is divided by $z_{x+1}z_x$, there results $[(h + b)/z_{x+1}] + [(h + a)/z_x] + 1 = 0$. This is a linear equation with constant coefficients in $1/z_x$. The solution is

$$y_x = h + \frac{1}{C\left(-\dfrac{a + h}{b + h}\right)^{x} - \dfrac{1}{a + b + 2h}}$$

where C is an arbitrary constant.

Example This equation is obtained in distillation problems, among others, in which the number of theoretical plates is required. If the relative volatility is assumed to be constant, the plates are theoretically perfect, and the molal liquid and vapor rates are constant, then a material balance around the nth plate of the enriching section yields a Riccati difference equation.
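The closed-form Riccati solution can be compared with direct iteration of the equation solved for $y_{x+1}$. In this sketch the coefficients a = 1, b = 2, c = −3 and the start $y_0$ = 2 are illustrative assumptions; h is taken as one real root of $h^2 + (a + b)h + c = 0$, and C is fixed from $y_0$.

```python
import math

def riccati_iterate(a, b, c, y0, n):
    """March y_{x+1} = -(b y_x + c) / (y_x + a), i.e. the Riccati equation solved for y_{x+1}."""
    ys = [y0]
    for _ in range(n):
        ys.append(-(b * ys[-1] + c) / (ys[-1] + a))
    return ys

def riccati_closed_form(a, b, c, y0, x):
    """h + 1 / (C r**x - 1/(a + b + 2h)) with r = -(a + h)/(b + h); C is fixed by y_0."""
    h = (-(a + b) + math.sqrt((a + b) ** 2 - 4 * c)) / 2   # one real root (assumed to exist)
    r = -(a + h) / (b + h)
    k = 1 / (a + b + 2 * h)
    C = 1 / (y0 - h) + k
    return h + 1 / (C * r ** x - k)

a, b, c, y0 = 1.0, 2.0, -3.0, 2.0    # illustrative coefficients
ys = riccati_iterate(a, b, c, y0, 10)
for x in range(11):
    print(x, ys[x], riccati_closed_form(a, b, c, y0, x))
```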


INTEGRAL EQUATIONS

REFERENCES: Courant, R., and D. Hilbert, Methods of Mathematical Physics, vol. I, Interscience, New York (1953); Linz, P., Analytical and Numerical Methods for Volterra Equations, SIAM Publications, Philadelphia (1985); Porter, D., and D. S. G. Stirling, Integral Equations: A Practical Treatment from Spectral Theory to Applications, Cambridge University Press (1990); Stakgold, I., Green’s Functions and Boundary Value Problems, 2d ed., Interscience, New York (1997).

An integral equation is any equation in which the unknown function appears under the sign of integration and possibly outside the sign of integration. If derivatives of the dependent variable appear elsewhere in the equation, the equation is said to be integrodifferential.

CLASSIFICATION OF INTEGRAL EQUATIONS

Volterra integral equations have an integral with a variable limit. The Volterra equation of the second kind is

$$u(x) = f(x) + \lambda\int_a^x K(x, t)u(t)\,dt$$

whereas a Volterra equation of the first kind, in which the unknown appears only under the integral sign, is

$$f(x) = \lambda\int_a^x K(x, t)u(t)\,dt$$

Equations of the first kind are very sensitive to solution errors, so they present severe numerical problems. Volterra equations are similar to initial-value problems.

A Fredholm equation of the second kind is

$$u(x) = f(x) + \lambda\int_a^b K(x, t)u(t)\,dt$$

whereas a Fredholm equation of the first kind is

$$f(x) = \int_a^b K(x, t)u(t)\,dt$$

The limits of integration are fixed, and these problems are analogous to boundary-value problems.

An eigenvalue problem is a homogeneous equation of the second kind, and solutions exist only for certain λ:

$$u(x) = \lambda\int_a^b K(x, t)u(t)\,dt$$

See Linz (1985) for further information and existence proofs.

If the unknown function u appears in the equation in any way except to the first power, the integral equation is said to be nonlinear. The equation $u(x) = f(x) + \int_a^b K(x, t)[u(t)]^{3/2}\,dt$ is nonlinear. The differential equation du/dx = g(x, u) is equivalent to the nonlinear integral equation $u(x) = c + \int_a^x g[t, u(t)]\,dt$, where c = u(a).

An integral equation is said to be singular when either one or both of the limits of integration become infinite or if K(x, t) becomes infinite for one or more points of the interval under discussion.

Example $u(x) = x + \int_0^\infty \cos(xt)u(t)\,dt$ and $f(x) = \int_0^x \frac{u(t)}{x - t}\,dt$ are both singular. The kernel of the first equation is cos(xt), and that of the second is $(x - t)^{-1}$.

RELATION TO DIFFERENTIAL EQUATIONS

The Leibniz rule (see “Integral Calculus”) can be used to show the equivalence of the initial-value problem consisting of the second-order differential equation $d^2y/dx^2 + A(x)(dy/dx) + B(x)y = f(x)$ together with the prescribed initial conditions $y(a) = y_0$, $y'(a) = y'_0$ to the integral equation

$$y(x) = \int_a^x K(x, t)y(t)\,dt + F(x)$$

where $K(x, t) = (t - x)[B(t) - A'(t)] - A(t)$ and

$$F(x) = \int_a^x (x - t)f(t)\,dt + [A(a)y_0 + y'_0](x - a) + y_0$$

This integral equation is a Volterra equation of the second kind. Thus the initial-value problem is equivalent to a Volterra integral equation of the second kind.

Example $d^2y/dx^2 + x^2(dy/dx) + xy = x$, y(0) = 1, y′(0) = 0. Here $A(x) = x^2$, B(x) = x, f(x) = x. The equivalent integral equation is $y(x) = \int_0^x K(x, t)y(t)\,dt + F(x)$, where $K(x, t) = t(x - t) - t^2$ and $F(x) = \int_0^x (x - t)t\,dt + 1 = x^3/6 + 1$. Combining these, $y(x) = \int_0^x t(x - 2t)y(t)\,dt + x^3/6 + 1$.
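The equivalence can be checked numerically by solving the Volterra equation with the trapezoidal rule and the original initial-value problem with a classical Runge-Kutta scheme, then comparing y(1). Both solvers below are hand-written illustrations (not methods prescribed by the text), and the grid size n = 400 is an assumed choice.

```python
def solve_volterra_trapezoid(K, F, x_end, n):
    """Solve y(x) = F(x) + ∫_0^x K(x, t) y(t) dt on [0, x_end] with the trapezoidal rule.

    The unknown y_i appears in its own quadrature term, so each step is solved for it explicitly.
    """
    h = x_end / n
    xs = [i * h for i in range(n + 1)]
    ys = [F(0.0)]
    for i in range(1, n + 1):
        xi = xs[i]
        known = 0.5 * K(xi, xs[0]) * ys[0] + sum(K(xi, xs[j]) * ys[j] for j in range(1, i))
        ys.append((F(xi) + h * known) / (1.0 - 0.5 * h * K(xi, xi)))
    return ys[-1]

def solve_ode_rk4(x_end, n):
    """Classical RK4 for y'' + x^2 y' + x y = x, y(0) = 1, y'(0) = 0."""
    def rhs(x, y, yp):
        return yp, x - x * x * yp - x * y
    h = x_end / n
    x, y, yp = 0.0, 1.0, 0.0
    for _ in range(n):
        k1 = rhs(x, y, yp)
        k2 = rhs(x + h / 2, y + h / 2 * k1[0], yp + h / 2 * k1[1])
        k3 = rhs(x + h / 2, y + h / 2 * k2[0], yp + h / 2 * k2[1])
        k4 = rhs(x + h, y + h * k3[0], yp + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        yp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        x += h
    return y

K = lambda x, t: t * (x - 2 * t)      # kernel from the worked example
F = lambda x: x ** 3 / 6 + 1
y_int = solve_volterra_trapezoid(K, F, 1.0, 400)
y_ode = solve_ode_rk4(1.0, 400)
print(y_int, y_ode, abs(y_int - y_ode))
```

The trapezoidal scheme is second-order accurate, so the agreement improves roughly fourfold each time n is doubled.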

Eigenvalue problems can also be related. For example, the problem $(d^2y/dx^2) + \lambda y = 0$ with y(0) = 0, y(a) = 0 is equivalent to the integral equation $y(x) = \lambda\int_0^a K(x, t)y(t)\,dt$, where K(x, t) = (t/a)(a − x) when t < x and K(x, t) = (x/a)(a − t) when t > x. The differential equation may be recovered from the integral equation by differentiating the integral equation by using the Leibniz rule.

METHODS OF SOLUTION

In general, the solution of integral equations is not easy, and a few exact and approximate methods are given here. Often numerical methods must be employed, as discussed in “Numerical Solution of Integral Equations.”

Equations of Convolution Type The equation $u(x) = f(x) + \lambda\int_0^x K(x - t)u(t)\,dt$ is a special case of the linear integral equation of the second kind of Volterra type. The integral part is the convolution integral discussed under “Integral Transforms (Operational Methods),” so the solution can be accomplished by Laplace transforms: L[u(x)] = L[f(x)] + λL[u(x)]L[K(x)], or

$$L[u(x)] = \frac{L[f(x)]}{1 - \lambda L[K(x)]}, \qquad u(x) = L^{-1}\left\{\frac{L[f(x)]}{1 - \lambda L[K(x)]}\right\}$$

Equations of the type considered here occur quite frequently in practice in what can be called “cause-and-effect” systems.

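Example In a certain linear system, the effect E(t) due to a cause C = λE at time τ is a function only of the elapsed time t − τ. If the system has the activity level 1 at time t < 0, the relation between the cause λE and the effect E is given by the integral equation $E(t) = 1 + \lambda\int_0^t K(t - \tau)E(\tau)\,d\tau$. Let K(t − τ) = t − τ. Then $E(t) = 1 + \lambda\int_0^t (t - \tau)E(\tau)\,d\tau$. By using the transform method,

$$E(t) = L^{-1}\left\{\frac{L[1]}{1 - \lambda L[K(t)]}\right\} = L^{-1}\left\{\frac{1/p}{1 - \lambda/p^2}\right\} = L^{-1}\left\{\frac{p}{p^2 - \lambda}\right\} = \cosh\sqrt{\lambda}\,t$$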
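The transform result can be checked by substituting $E(t) = \cosh\sqrt{\lambda}\,t$ back into the integral equation and evaluating the convolution numerically. A minimal sketch, assuming λ = 2 and using a hand-written composite Simpson's rule (the helper `simpson` is illustrative, not from the text).

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    total += 4 * sum(g(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(g(a + i * h) for i in range(2, n, 2))
    return total * h / 3

lam = 2.0                                     # assumed value of λ (> 0)
E = lambda t: math.cosh(math.sqrt(lam) * t)   # candidate solution from the transform method

for t in (0.5, 1.0, 2.0):
    rhs = 1 + lam * simpson(lambda tau: (t - tau) * E(tau), 0.0, t)
    print(t, E(t), rhs)
```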

Method of Successive Approximations Consider the equation $y(x) = f(x) + \lambda\int_a^b K(x, t)y(t)\,dt$. In this method a unique solution is obtained in sequence form as follows: Substitute $y_0(t)$ for y(t) in the right-hand member of the equation. Upon integration there results $y_1(x) = f(x) + \lambda\int_a^b K(x, t)y_0(t)\,dt$. Continue in like manner by replacing $y_0$ by $y_1$, $y_1$ by $y_2$, etc. A sequence of functions $y_0(x), y_1(x), y_2(x), \ldots$ is obtained which satisfies the equations

$$y_n(x) = f(x) + \lambda\int_a^b K(x, t)y_{n-1}(t)\,dt$$

Then

$$y_n(x) = f(x) + \lambda\int_a^b K(x, t)f(t)\,dt + \lambda^2\int_a^b K(x, t)\int_a^b K(t, t_1)f(t_1)\,dt_1\,dt + \lambda^3\int_a^b K(x, t)\int_a^b K(t, t_1)\int_a^b K(t_1, t_2)f(t_2)\,dt_2\,dt_1\,dt + \cdots + R_n$$

where $R_n$ is the remainder, and

$$|R_n| \le |\lambda|^nM^n(b - a)^n\max_{a \le x \le b}|y_0|$$

where M = maximum value of |K| in the rectangle a ≤ t ≤ b, a ≤ x ≤ b. If |λ|M(b − a) < 1, $\lim_{n\to\infty} R_n = 0$. Then $y_n(x) \to y(x)$, which is the unique solution.
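The iteration can be carried out on a grid, replacing each integral by a quadrature sum. This sketch assumes Simpson's rule, a starting function $y_0 = f$, and a hypothetical test problem $y(x) = x + 0.5\int_0^1 xt\,y(t)\,dt$. Here M = 1 and |λ|M(b − a) = 0.5 < 1, so the convergence condition above holds; the exact solution of this problem is y = 1.2x.

```python
def successive_approximations(f, K, lam, a, b, n=20, iterations=50):
    """Iterate y_k(x) = f(x) + lam * ∫_a^b K(x, t) y_{k-1}(t) dt on a uniform grid.

    The integral uses composite Simpson's rule (n must be even), and the
    starting function is taken (by assumption) to be y_0 = f.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    w = [h / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    y = [f(x) for x in xs]
    for _ in range(iterations):
        y = [f(xi) + lam * sum(wj * K(xi, tj) * yj for wj, tj, yj in zip(w, xs, y)) for xi in xs]
    return xs, y

# Hypothetical test problem: y(x) = x + 0.5 ∫_0^1 x t y(t) dt.
xs, ys = successive_approximations(lambda x: x, lambda x, t: x * t, 0.5, 0.0, 1.0)
print(ys[-1])
```

Because the integrand here is a quadratic in t, Simpson's rule integrates it exactly, so the grid values converge to the exact solution rather than to an approximation of it.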


INTEGRAL TRANSFORMS (OPERATIONAL METHODS)

REFERENCES: Brown, J. W., and R. V. Churchill, Fourier Series and Boundary Value Problems, 6th ed., McGraw-Hill, New York (2000); Churchill, R. V., Operational Mathematics, 3d ed., McGraw-Hill, New York (1972); Davies, B., Integral Transforms and Their Applications, 3d ed., Springer (2002); Duffy, D. G., Transform Methods for Solving Partial Differential Equations, Chapman & Hall/CRC, New York (2004); Varma, A., and M. Morbidelli, Mathematical Methods in Chemical Engineering, Oxford, New York (1997).

The term “operational method” implies a procedure of solving differential and difference equations by which the boundary or initial conditions are automatically satisfied in the course of the solution. The technique offers a very powerful tool in the applications of mathematics, but it is limited to linear problems.

Most integral transforms are special cases of the equation $g(s) = \int_a^b f(t)K(s, t)\,dt$, in which g(s) is said to be the transform of f(t) and K(s, t) is called the kernel of the transform. A tabulation of the more important kernels and the interval (a, b) of applicability follows. The first three transforms are considered here.

Name of transform | (a, b) | K(s, t)
Laplace | (0, ∞) | $e^{-st}$
Fourier | (−∞, ∞) | $e^{-ist}$
Fourier cosine | (0, ∞) | cos st
Fourier sine | (0, ∞) | sin st
Mellin | (0, ∞) | $t^{s-1}$
Hankel | (0, ∞) | $tJ_\nu(st)$, ν ≥ −1/2

LAPLACE TRANSFORM

The Laplace transform of a function f(t) is defined by $F(s) = L[f(t)] = \int_0^\infty e^{-st}f(t)\,dt$, where s is a complex variable. Note that the transform is an improper integral and therefore may not exist for all continuous functions and all values of s. We restrict consideration to those values of s and those functions f for which this improper integral converges. The Laplace transform is used in process control (see Sec. 8).

The function L[f(t)] = g(s) is called the direct transform, and $L^{-1}[g(s)] = f(t)$ is called the inverse transform. Both the direct and the inverse transforms are tabulated for many often-occurring functions. In general,

$$L^{-1}[g(s)] = \frac{1}{2\pi i}\int_{\gamma - i\infty}^{\gamma + i\infty} e^{st}g(s)\,ds$$

where γ is a real constant larger than the real parts of all singularities of g(s); evaluating this integral requires a knowledge of complex variables, the theory of residues, and contour integration.

A function is said to be piecewise continuous on an interval if it has only a finite number of finite (or jump) discontinuities. A function f on 0 < t < ∞ is said to be of exponential growth at infinity if there exist constants M and α such that $|f(t)| \le Me^{\alpha t}$ for sufficiently large t.

Sufficient Conditions for the Existence of Laplace Transform Suppose f is a function which is (1) piecewise continuous on every finite interval 0 < t < T, (2) of exponential growth at infinity, and (3) such that $\int_0^\delta |f(t)|\,dt$ exists (is finite) for every finite δ > 0. Then the Laplace transform of f exists for all complex numbers s with sufficiently large real part.

Note that condition 3 is automatically satisfied if f is assumed to be piecewise continuous on every finite interval 0 ≤ t < T. The function $f(t) = t^{-1/2}$ is not piecewise continuous on 0 ≤ t ≤ T but satisfies conditions 1 to 3.

Let Λ denote the class of all functions on 0 < t < ∞ which satisfy conditions 1 to 3.


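Example Let f(t) be the Heaviside step function at $t = t_0$; i.e., f(t) = 0 for $t \le t_0$, and f(t) = 1 for $t > t_0$. Then

$$L[f(t)] = \int_{t_0}^\infty e^{-st}\,dt = \lim_{T\to\infty}\int_{t_0}^{T} e^{-st}\,dt = \lim_{T\to\infty}\frac{1}{s}\left(e^{-st_0} - e^{-sT}\right) = \frac{e^{-st_0}}{s}$$

provided s > 0.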

Example Let $f(t) = e^{at}$, t ≥ 0, where a is a real number. Then $L[e^{at}] = \int_0^\infty e^{-(s-a)t}\,dt = 1/(s - a)$, provided Re s > a.

Properties of the Laplace Transform
1. The Laplace transform is a linear operator: L[af(t) + bg(t)] = aL[f(t)] + bL[g(t)] for any constants a, b and any two functions f and g whose Laplace transforms exist.

2. The Laplace transform of a real-valued function is real for real s. If f(t) is a complex-valued function, f(t) = u(t) + iv(t), where u and v are real, then L[f(t)] = L[u(t)] + iL[v(t)]. Thus, for real s, L[u(t)] is the real part of L[f(t)], and L[v(t)] is the imaginary part of L[f(t)].

3. The Laplace transform of a function in the class Λ has derivatives of all orders, and $L[t^kf(t)] = (-1)^k\,d^kF(s)/ds^k$, k = 1, 2, 3, . . . .

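Example $\int_0^\infty e^{-st}\sin at\,dt = \dfrac{a}{s^2 + a^2}$, s > 0. By property 3,

$$-\frac{d}{ds}\left(\frac{a}{s^2 + a^2}\right) = \frac{2as}{(s^2 + a^2)^2} = \int_0^\infty e^{-st}\,t\sin at\,dt = L[t\sin at]$$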

Example By applying property 3 with f(t) = 1 and using the preceding results, we obtain

$$L[t^k] = (-1)^k\frac{d^k}{ds^k}\left(\frac{1}{s}\right) = \frac{k!}{s^{k+1}}$$

provided Re s > 0; k = 1, 2, . . . . Similarly, we obtain

$$L[t^ke^{at}] = (-1)^k\frac{d^k}{ds^k}\left(\frac{1}{s - a}\right) = \frac{k!}{(s - a)^{k+1}}$$

4. Frequency-shift property (or, equivalently, the transform of an exponentially modulated function). If F(s) is the Laplace transform of a function f(t) in the class Λ, then for any constant a, $L[e^{at}f(t)] = F(s - a)$.

Example $L[te^{-at}] = \dfrac{1}{(s + a)^2}$, s > 0.

5. Time-shift property. Let u(t − a) be the unit step function at t = a. Then $L[f(t - a)u(t - a)] = e^{-as}F(s)$.

6. Transform of a derivative. Let f be a differentiable function such that both f and f′ belong to the class Λ. Then L[f′(t)] = sF(s) − f(0).

7. Transform of a higher-order derivative. Let f be a function which has continuous derivatives up to order n on (0, ∞), and suppose that f and its derivatives up to order n belong to the class Λ. Then $L[f^{(j)}(t)] = s^jF(s) - s^{j-1}f(0) - s^{j-2}f'(0) - \cdots - sf^{(j-2)}(0) - f^{(j-1)}(0)$ for j = 1, 2, . . . , n.

Example $L[f''(t)] = s^2L[f(t)] - sf(0) - f'(0)$

$L[f'''(t)] = s^3L[f(t)] - s^2f(0) - sf'(0) - f''(0)$

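Example Solve $y'' + y = 2e^t$, y(0) = y′(0) = 2. $L[y''] = -y'(0) - sy(0) + s^2L[y] = -2 - 2s + s^2L[y]$. Thus

$$-2 - 2s + s^2L[y] + L[y] = 2L[e^t] = \frac{2}{s - 1}$$

$$L[y] = \frac{2s^2}{(s - 1)(s^2 + 1)} = \frac{1}{s - 1} + \frac{s}{s^2 + 1} + \frac{1}{s^2 + 1}$$

Hence $y = e^t + \cos t + \sin t$.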
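As an independent check on the transform solution, the initial-value problem can be integrated numerically and compared with $e^t + \cos t + \sin t$. This sketch uses a hand-written classical Runge-Kutta integrator (an assumed choice, not part of the text) with 2000 steps on 0 ≤ t ≤ 2.

```python
import math

def rk4_system(rhs, t0, state, t_end, n):
    """Integrate state' = rhs(t, state) from t0 to t_end with n classical RK4 steps."""
    h = (t_end - t0) / n
    t = t0
    for _ in range(n):
        k1 = rhs(t, state)
        k2 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs(t + h / 2, [s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs(t + h, [s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4)]
        t += h
    return state

# y'' + y = 2e^t written as a first-order system in (y, y').
rhs = lambda t, s: [s[1], 2 * math.exp(t) - s[0]]
y_num, yp_num = rk4_system(rhs, 0.0, [2.0, 2.0], 2.0, 2000)
y_exact = math.exp(2.0) + math.cos(2.0) + math.sin(2.0)
yp_exact = math.exp(2.0) - math.sin(2.0) + math.cos(2.0)
print(y_num, y_exact)
```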

A short table (Table 3-2) of very common Laplace transforms and inverse transforms follows. The references include more detailed tables. NOTE: $\Gamma(n + 1) = \int_0^\infty x^ne^{-x}\,dx$ (gamma function); $J_n(t)$ = Bessel function of the first kind of order n.

8. $L\left[\int_a^t f(t)\,dt\right] = \dfrac{1}{s}L[f(t)] + \dfrac{1}{s}\int_a^0 f(t)\,dt$


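Example Find f(t) if $L[f(t)] = \dfrac{1}{s^2}\,\dfrac{1}{s^2 - a^2}$. Since $L\left[\dfrac{1}{a}\sinh at\right] = \dfrac{1}{s^2 - a^2}$, applying property 8 twice gives

$$f(t) = \int_0^t\!\!\int_0^t \frac{1}{a}\sinh at\,dt\,dt = \frac{1}{a^2}\left(\frac{\sinh at}{a} - t\right)$$

9. $$L\left[\frac{f(t)}{t}\right] = \int_s^\infty g(s)\,ds \qquad L\left[\frac{f(t)}{t^k}\right] = \underbrace{\int_s^\infty\cdots\int_s^\infty}_{k\ \text{integrals}} g(s)\,(ds)^k$$

Example $$L\left[\frac{\sin at}{t}\right] = \int_s^\infty L[\sin at]\,ds = \int_s^\infty \frac{a\,ds}{s^2 + a^2} = \cot^{-1}\frac{s}{a}$$

10. The unit step function

$$u(t - a) = \begin{cases}0 & t < a\\ 1 & t > a\end{cases} \qquad L[u(t - a)] = e^{-as}/s$$

11. The unit impulse function is

$$\delta(t - a) = u'(t - a) = \begin{cases}\infty & t = a\\ 0 & \text{elsewhere}\end{cases} \qquad L[u'(t - a)] = e^{-as}$$

12. $L^{-1}[e^{-as}g(s)] = f(t - a)u(t - a)$ (second shift theorem).

13. If f(t) is periodic of period b, i.e., f(t + b) = f(t), then

$$L[f(t)] = \frac{1}{1 - e^{-bs}}\int_0^b e^{-st}f(t)\,dt$$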

Example The partial differential equations relating gas composition to position and time in a gas chromatograph are ∂y/∂n + ∂x/∂θ = 0, ∂y/∂n = x − y, where x = mx′, $n = (k_GaP/G_m)h$, $\theta = (mk_GaP/\rho_B)t$, and $G_m$ = molar velocity, y = mole fraction of the component in the gas phase, $\rho_B$ = bulk density, h = distance from the entrance, P = pressure, $k_G$ = mass-transfer coefficient, and m = slope of the equilibrium line. These equations are equivalent to $\partial^2y/\partial n\,\partial\theta + \partial y/\partial n + \partial y/\partial\theta = 0$, where the boundary conditions considered here are y(0, θ) = 0 and x(n, 0) = y(n, 0) + (∂y/∂n)(n, 0) = δ(0) (see property 11). The problem is conveniently solved by using the Laplace transform of y with respect to n; write $g(s, \theta) = \int_0^\infty e^{-ns}y(n, \theta)\,dn$. Operating on the partial differential equation gives s(dg/dθ) − (∂y/∂θ)(0, θ) + sg − y(0, θ) + dg/dθ = 0, or (s + 1)(dg/dθ) + sg = (∂y/∂θ)(0, θ) + y(0, θ) = 0. The second boundary condition gives g(s, 0) + sg(s, 0) − y(0, 0) = 1, or g(s, 0) + sg(s, 0) = 1 (L[δ(0)] = 1). A solution of the ordinary differential equation for g consistent with this second condition is

$$g(s, \theta) = \frac{1}{s + 1}\,e^{-s\theta/(s+1)}$$

Inversion of this transform gives the solution $y(n, \theta) = e^{-(n+\theta)}I_0(2\sqrt{n\theta})$, where $I_0$ = zero-order Bessel function of an imaginary argument. For large u, $I_n(u) \sim e^u/\sqrt{2\pi u}$. Hence for large n,

$$y(n, \theta) \sim \frac{\exp\left[-(\sqrt{\theta} - \sqrt{n})^2\right]}{2\pi^{1/2}(n\theta)^{1/4}}$$

or, for sufficiently large n, the peak concentration occurs near θ = n.

TABLE 3-2 Laplace Transforms

f(t) | g(s)
1 | $1/s$
$t^n$ (n a positive integer) | $n!/s^{n+1}$
$t^n$ (n ≠ positive integer) | $\Gamma(n + 1)/s^{n+1}$
cos at | $s/(s^2 + a^2)$
sin at | $a/(s^2 + a^2)$
cosh at | $s/(s^2 - a^2)$
sinh at | $a/(s^2 - a^2)$
$e^{-at}$ | $1/(s + a)$
$e^{-bt}\cos at$ | $(s + b)/[(s + b)^2 + a^2]$
$e^{-bt}\sin at$ | $a/[(s + b)^2 + a^2]$
$\operatorname{erfc}\dfrac{k}{2\sqrt{t}}$ | $\dfrac{1}{s}e^{-k\sqrt{s}}$
$e^{-at}(1 - at)$ | $s/(s + a)^2$
$\dfrac{t\sin at}{2a}$ | $\dfrac{s}{(s^2 + a^2)^2}$
$\dfrac{1}{2a^2}\sin at\sinh at$ | $\dfrac{s}{s^4 + 4a^4}$
$\cos at\cosh at$ | $\dfrac{s^3}{s^4 + 4a^4}$
$\dfrac{1}{2a}(\sinh at + \sin at)$ | $\dfrac{s^2}{s^4 - a^4}$
$\dfrac{1}{2}(\cosh at + \cos at)$ | $\dfrac{s^3}{s^4 - a^4}$
$\dfrac{\sin at}{t}$ | $\tan^{-1}\dfrac{a}{s}$
$J_0(at)$ | $\dfrac{1}{\sqrt{s^2 + a^2}}$
$\dfrac{J_n(at)}{t}$ (n > 0) | $\dfrac{1}{na^n}\left(\sqrt{s^2 + a^2} - s\right)^n$
$J_0(2\sqrt{at})$ | $\dfrac{1}{s}e^{-a/s}$

Other applications of Laplace transforms are given under “Differential Equations.”

CONVOLUTION INTEGRAL

The convolution integral (faltung) of two functions f(t), r(t) is $x(t) = f(t)\circ r(t) = \int_0^t f(\tau)r(t - \tau)\,d\tau$.

Example $t\circ\sin t = \int_0^t \tau\sin(t - \tau)\,d\tau = t - \sin t$.

$$L[f(t)]\,L[h(t)] = L[f(t)\circ h(t)]$$
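The worked convolution can be reproduced by direct numerical quadrature. This is a minimal sketch; the helper name `convolve` and the choice of composite Simpson's rule with 400 panels are assumptions for illustration.

```python
import math

def convolve(f, r, t, n=400):
    """(f ∘ r)(t) = ∫_0^t f(τ) r(t - τ) dτ by composite Simpson's rule (n even)."""
    h = t / n
    g = lambda tau: f(tau) * r(t - tau)
    s = g(0.0) + g(t)
    s += 4 * sum(g(i * h) for i in range(1, n, 2))
    s += 2 * sum(g(i * h) for i in range(2, n, 2))
    return s * h / 3

for t in (0.5, 1.0, 3.0):
    print(t, convolve(lambda x: x, math.sin, t), t - math.sin(t))
```

Swapping the arguments gives the same result, reflecting the symmetry f ∘ r = r ∘ f of the convolution integral.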

Z-TRANSFORM

See Ogunnaike, Babatunde A., and W. Harmon Ray, Process Dynamics, Modeling, and Control, Oxford University Press (1994); Seborg, D., T. F. Edgar, and D. A. Mellichamp, Process Dynamics and Control, 2d ed., Wiley, New York (2003). The z-transform is useful when data are available only at discrete points. Let

$$f^*(t) = f(t_k)$$

be the value of f at the sample points

$$t_k = k\,\Delta t, \qquad k = 0, 1, 2, \ldots$$

Then the function $f^*(t)$ is

$$f^*(t) = \sum_{k=0}^{\infty} f(t_k)\,\delta(t - t_k)$$

Take the Laplace transform of this:

$$g^*(s) = L[f^*(t)] = \sum_{k=0}^{\infty} f(t_k)\,e^{-st_k} = \sum_{k=0}^{\infty} f(t_k)\,e^{-s\Delta t\,k}$$

For convenience, replace $e^{s\Delta t}$ by z and call $g^*(z)$ the z-transform of $f^*(t)$:

$$g^*(z) = \sum_{k=0}^{\infty} f(t_k)\,z^{-k}$$

The z-transform is used in process control when the signals are at intervals of ∆t. A brief table (Table 3-3) is provided here.

The z-transform can also be used to solve difference equations, just as the Laplace transform can be used to solve differential equations.

Example The difference equation for y(k) is

$$y(k) + a_1y(k - 1) + a_2y(k - 2) = b_1u(k)$$

Take the z-transform (with y and u taken as zero before k = 0):

$$(1 + a_1z^{-1} + a_2z^{-2})\,y^*(z) = b_1u^*(z)$$

Then

$$y^*(z) = \frac{b_1u^*(z)}{1 + a_1z^{-1} + a_2z^{-2}}$$

The inverse transform must be found, usually from a table of inverse transforms.
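The defining sum $g^*(z) = \sum f(t_k)z^{-k}$ can be checked against closed-form entries of Table 3-3 by truncating the series at a test point with |z| > 1. The sampling interval ∆t = 0.1, decay constant a = 0.5, and test point z = 1.5 + 0.5i are assumptions chosen for illustration.

```python
import math

def z_transform(f, dt, z, terms=2000):
    """Truncated g*(z) = sum_{k=0}^{terms-1} f(k*dt) * z**(-k).

    Adequate here because |z| > 1 and the sampled signals grow at most linearly.
    """
    return sum(f(k * dt) * z ** (-k) for k in range(terms))

dt, a = 0.1, 0.5           # assumed sampling interval and decay constant
z = 1.5 + 0.5j             # test point with |z| > 1

exp_series = z_transform(lambda t: math.exp(-a * t), dt, z)
exp_table = 1 / (1 - math.exp(-a * dt) / z)          # 1 / (1 - e^{-a dt} z^{-1})
ramp_series = z_transform(lambda t: t, dt, z)
ramp_table = dt / z / (1 - 1 / z) ** 2               # dt z^{-1} / (1 - z^{-1})^2
print(abs(exp_series - exp_table), abs(ramp_series - ramp_table))
```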

FOURIER TRANSFORM

REFERENCES: Bateman, H., Tables of Integral Transforms, vol. I, McGraw-Hill, New York (1954); Varma, A., and M. Morbidelli, Mathematical Methods in Chemical Engineering, Oxford, New York (1997).

The Fourier transform is given by

$$F[f(t)] = \int_{-\infty}^{\infty} f(t)\,e^{-ist}\,dt = g(s)$$

and its inverse by

$$F^{-1}[g(s)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} g(s)\,e^{ist}\,ds = f(t)$$

In brief, the condition for the Fourier transform to exist is that $\int_{-\infty}^{\infty}|f(t)|\,dt < \infty$, although certain functions may have a Fourier transform even if this is violated.

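Example The function

$$f(t) = \begin{cases}1 & -a \le t \le a\\ 0 & \text{elsewhere}\end{cases}$$

has

$$F[f(t)] = \int_{-a}^{a} e^{-ist}\,dt = \int_0^a e^{ist}\,dt + \int_0^a e^{-ist}\,dt = 2\int_0^a \cos st\,dt = \frac{2\sin sa}{s}$$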

Properties of the Fourier Transform Let F[f(t)] = g(s); $F^{-1}[g(s)] = f(t)$.

1. $F[f^{(n)}(t)] = (is)^nF[f(t)]$.
2. F[af(t) + bh(t)] = aF[f(t)] + bF[h(t)].
3. F[f(−t)] = g(−s).
4. $F[f(at)] = \dfrac{1}{a}\,g\!\left(\dfrac{s}{a}\right)$, a > 0.
5. $F[e^{-iwt}f(t)] = g(s + w)$.
6. $F[f(t + t_1)] = e^{ist_1}g(s)$.
7. F[f(t)] = G(is) + G(−is) if f(t) = f(−t) (f even)
   F[f(t)] = G(is) − G(−is) if f(t) = −f(−t) (f odd)
   where G(s) = L[f(t)]. This result allows the use of the Laplace-transform tables to obtain the Fourier transforms.

Example Find $F[e^{-a|t|}]$ by property 7. $e^{-a|t|}$ is even, and $L[e^{-at}] = 1/(s + a)$. Therefore, $F[e^{-a|t|}] = 1/(is + a) + 1/(-is + a) = 2a/(s^2 + a^2)$.
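The result of property 7 can be confirmed by evaluating the Fourier integral numerically. For an even function the integral reduces to $2\int_0^\infty f(t)\cos st\,dt$; this sketch truncates it at an assumed cutoff T = 40 (the neglected tail is of order $e^{-40}$) and uses composite Simpson's rule, with a = 1 assumed.

```python
import math

def fourier_even(f, s, T=40.0, n=8000):
    """F[f](s) = ∫ f(t) e^{-ist} dt for an even f, evaluated as 2 ∫_0^T f(t) cos(st) dt.

    Composite Simpson's rule with n (even) panels; the tail beyond T is neglected,
    which is acceptable only for rapidly decaying f.
    """
    h = T / n
    g = lambda t: f(t) * math.cos(s * t)
    total = g(0.0) + g(T)
    total += 4 * sum(g(i * h) for i in range(1, n, 2))
    total += 2 * sum(g(i * h) for i in range(2, n, 2))
    return 2 * total * h / 3

a = 1.0                                    # assumed decay constant
for s in (0.0, 0.5, 2.0):
    numeric = fourier_even(lambda t: math.exp(-a * abs(t)), s)
    print(s, numeric, 2 * a / (s ** 2 + a ** 2))
```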

FOURIER COSINE TRANSFORM

The Fourier cosine transform is given by

$$F_c[f(t)] = g(s) = \int_0^\infty f(t)\cos st\,dt$$


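TABLE 3-3 z-Transforms

f(k) | g*(z)
1(k) | $\dfrac{1}{1 - z^{-1}}$
$k\,\Delta t$ | $\dfrac{\Delta t\,z^{-1}}{(1 - z^{-1})^2}$
$(k\,\Delta t)^{n-1}$ | $\lim_{a\to 0}(-1)^{n-1}\dfrac{\partial^{n-1}}{\partial a^{n-1}}\left[\dfrac{1}{1 - e^{-a\Delta t}z^{-1}}\right]$
$e^{-ak\Delta t}$ | $\dfrac{1}{1 - e^{-a\Delta t}z^{-1}}$
$\sin ak\,\Delta t$ | $\dfrac{z^{-1}\sin a\Delta t}{1 - 2z^{-1}\cos a\Delta t + z^{-2}}$
$\cos ak\,\Delta t$ | $\dfrac{1 - z^{-1}\cos a\Delta t}{1 - 2z^{-1}\cos a\Delta t + z^{-2}}$
$e^{-bk\Delta t}\cos ak\,\Delta t$ | $\dfrac{1 - z^{-1}e^{-b\Delta t}\cos a\Delta t}{1 - 2z^{-1}e^{-b\Delta t}\cos a\Delta t + z^{-2}e^{-2b\Delta t}}$
$e^{-bk\Delta t}\sin ak\,\Delta t$ | $\dfrac{z^{-1}e^{-b\Delta t}\sin a\Delta t}{1 - 2z^{-1}e^{-b\Delta t}\cos a\Delta t + z^{-2}e^{-2b\Delta t}}$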

and its inverse by

$$F_c^{-1}[g(s)] = f(t) = \frac{2}{\pi}\int_0^\infty g(s)\cos st\,ds$$

Inverse of a Matrix A square matrix A is said to have an inverse if there exists a matrix B such that AB = BA = I, where I is the identity matrix of order n.

The inverse B is a square matrix of the order of A, designated by $A^{-1}$. Thus $AA^{-1} = A^{-1}A = I$. A square matrix A has an inverse if and only if A is nonsingular.

Certain relations are important:

(1) $(AB)^{-1} = B^{-1}A^{-1}$

(2) $(AB)^T = B^TA^T$

(3) $(A^{-1})^T = (A^T)^{-1}$

(4) $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$

Scalar Multiplication Let c be any real or complex number. Then $cA = (ca_{ij})$.

Adjugate Matrix of a Matrix Let $A_{ij}$ denote the cofactor of the element $a_{ij}$ in the determinant of the matrix A. The matrix $B^T$, where $B = (A_{ij})$, is called the adjugate matrix of A, written adj A = $B^T$. The elements $b_{ij}$ are calculated by taking the matrix A, deleting the ith row and jth column, and calculating the determinant of the remaining matrix times $(-1)^{i+j}$. Then $A^{-1} = \text{adj}\,A/|A|$. This definition may be used to calculate $A^{-1}$. However, it is very laborious, and the inversion is usually accomplished by numerical techniques shown under “Numerical Analysis and Approximate Methods.”

Linear Equations in Matrix Form Every set of n nonhomogeneous linear equations in n unknowns

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\ &\ \ \vdots\\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n \end{aligned}$$

can be written in matrix form as AX = B, where $A = (a_{ij})$, $X^T = [x_1 \cdots x_n]$, and $B^T = [b_1 \cdots b_n]$. The solution for the unknowns is $X = A^{-1}B$, provided A is nonsingular.
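For small systems, the adjugate formula and X = A⁻¹B can be carried out exactly with rational arithmetic. The sketch below is illustrative only: the helper names and the 3 × 3 system are assumptions. Cofactor expansion grows factorially with n, which is why, as noted above, inversion is normally done by numerical techniques.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row (practical only for small n)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adjugate(A):
    """adj A = B^T, where B holds the cofactors (-1)^(i+j) * det(minor(A, i, j))."""
    n = len(A)
    B = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)] for i in range(n)]
    return [[B[j][i] for j in range(n)] for i in range(n)]

def inverse(A):
    """A^-1 = adj A / |A|, in exact rational arithmetic."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[Fraction(x) / d for x in row] for row in adjugate(A)]

def solve(A, b):
    """X = A^-1 B for a nonsingular square system."""
    Ainv = inverse(A)
    return [sum(Ainv[i][k] * b[k] for k in range(len(b))) for i in range(len(b))]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]   # illustrative system
b = [4, 5, 6]
print(solve(A, b))
```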

Special Square Matrices
1. A triangular matrix is a matrix all of whose elements above or below the main diagonal (set of elements $a_{11}, \ldots, a_{nn}$) are zero. If A is triangular, $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
2. A diagonal matrix is one such that all elements both above and below the main diagonal are zero (i.e., $a_{ij} = 0$ for all i ≠ j). If all diagonal elements are equal, the matrix is called scalar. If A is diagonal with every $a_{ii} \ne 0$, then $A^{-1}$ is diagonal with elements $1/a_{ii}$.
3. If $a_{ij} = a_{ji}$ for all i and j (i.e., $A = A^T$), the matrix is symmetric.
4. If $a_{ij} = -a_{ji}$ for i ≠ j but the $a_{ii}$ are not all zero, the matrix is skew.
5. If $a_{ij} = -a_{ji}$ for all i and j (so that $a_{ii} = 0$), the matrix is skew symmetric.
6. If $A^T = A^{-1}$, the matrix A is orthogonal.
7. If the matrix $A^* = (\bar{a}_{ij})^T$, where $\bar{a}_{ij}$ = complex conjugate of $a_{ij}$, $A^*$ is the hermitian transpose of A.
8. If $A = A^{-1}$, A is involutory.
9. If $A = A^*$, A is hermitian.
10. If $A = -A^*$, A is skew hermitian.
11. If $A^{-1} = A^*$, A is unitary.

If A is any matrix, then $AA^T$ and $A^TA$ are square symmetric matrices, usually of different order.


The Fourier sine transform Fs is obtainable by replacing the cosine by the sine in these integrals. These transforms can be used to solve linear differential equations; see the transform references.

MATRIX ALGEBRA AND MATRIX COMPUTATIONS

REFERENCES: Anton, H., and C. Rorres, Elementary Linear Algebra with Applications, 9th ed., Wiley (2004); Bernstein, D. S., Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press, Princeton, N.J. (2005); Kolman, B., and D. R. Hill, Introductory Linear Algebra: An Applied First Course, 8th ed., Prentice-Hall, Englewood Cliffs, N.J. (2004); Lay, D. C., Linear Algebra and Its Applications, 3d ed., Addison-Wesley (2002); Lipschutz, S., and M. Lipson, Schaum's Outline of Linear Algebra, McGraw-Hill, New York (2000); Noble, B., and J. W. Daniel, Applied Linear Algebra, 3d ed., Prentice-Hall, Englewood Cliffs, N.J. (1987); Press, W. H., et al., Numerical Recipes, Cambridge University Press, Cambridge (1986).

MATRIX ALGEBRA

Matrices A rectangular array of mn quantities, arranged in m rows and n columns,

$$A = (a_{ij}) = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ a_{21} & \cdots & a_{2n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

is called a matrix. The elements aij may be real or complex. The notation aij means the element in the ith row and jth column; i is called the row index, j the column index. If m = n, the matrix is said to be square and of order n. A matrix, even if it is square, does not have a numerical value, as a determinant does. However, if the matrix A is square, a determinant can be formed which has the same elements as the matrix A. This is called the determinant of the matrix and is written det (A) or |A|. If A is square and det (A) ≠ 0, A is said to be nonsingular; if det (A) = 0, A is said to be singular. A matrix A has rank r if and only if it has a nonvanishing determinant of order r (formed from r of its rows and r of its columns) and no nonvanishing determinant of order > r.

Equality of Matrices Let A = (aij), B = (bij). Two matrices A and B are equal (=) if and only if they are identical; that is, they have the same number of rows and the same number of columns and equal corresponding elements (aij = bij for all i and j).

Addition and Subtraction The operations of addition (+) and subtraction (−) of two or more matrices are possible if and only if they have the same number of rows and columns. Thus A ± B = (aij ± bij); i.e., addition and subtraction are of corresponding elements.

Transposition The matrix obtained from A by interchanging the rows and columns of A is called the transpose of A, written A′ or Aᵀ.

Example

$$A = \begin{pmatrix} 2 & 1\\ 1 & 3\\ 6 & 4 \end{pmatrix},\qquad A^T = \begin{pmatrix} 2 & 1 & 6\\ 1 & 3 & 4 \end{pmatrix}$$

Note that (Aᵀ)ᵀ = A.

Multiplication Let A = (aij), i = 1, . . . , m1; j = 1, . . . , m2, and B = (bij), i = 1, . . . , n1; j = 1, . . . , n2. The product AB is defined if and only if the number of columns of A (m2) equals the number of rows of B (n1), i.e., n1 = m2. For two such matrices the product P = AB is defined by summing the element-by-element products of a row of A by a column of B. This is the row by column rule. Thus

$$p_{ij} = \sum_{k=1}^{n_1} a_{ik}b_{kj}$$

The resulting matrix has m1 rows and n2 columns.

Example

$$\begin{pmatrix} -2 & 0\\ 0 & 1\\ 1 & 5\\ 3 & 6 \end{pmatrix}\begin{pmatrix} 2 & 1 & 4\\ 3 & 1 & 5 \end{pmatrix} = \begin{pmatrix} -4 & -2 & -8\\ 3 & 1 & 5\\ 17 & 6 & 29\\ 24 & 9 & 42 \end{pmatrix}$$

It is helpful to remember that the element pij is formed from the ith row of the first matrix and the jth column of the second matrix. The matrix product is not commutative. That is, AB ≠ BA in general.
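The row-by-column rule translates directly into code. A minimal Python sketch (the helpers `matmul` and `transpose` are illustrative names, not a library API), using plain lists of rows; applied to a 4 × 2 matrix times a 2 × 3 matrix it yields the 4 × 3 product the rule requires.

```python
def matmul(a, b):
    """Row-by-column rule: p_ij = sum_k a_ik * b_kj for an (m1 x m2)(n1 x n2) product."""
    m1, m2 = len(a), len(a[0])
    n1, n2 = len(b), len(b[0])
    if m2 != n1:
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(a[i][k] * b[k][j] for k in range(m2)) for j in range(n2)]
            for i in range(m1)]


def transpose(a):
    """Interchange the rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]


A = [[-2, 0], [0, 1], [1, 5], [3, 6]]   # 4 x 2
B = [[2, 1, 4], [3, 1, 5]]              # 2 x 3
for row in matmul(A, B):                # 4 x 3 result
    print(row)
# [-4, -2, -8]
# [3, 1, 5]
# [17, 6, 29]
# [24, 9, 42]
```

Calling `matmul(B, A)` raises `ValueError`, since B has three columns but A has four rows; this is one concrete way in which AB and BA differ.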

Example Let

$$A = \begin{pmatrix} 5 & 1 & 3 & 0\\ 3 & 4 & 1 & 5\\ 2 & -2 & 0 & 1 \end{pmatrix},\qquad A^T = \begin{pmatrix} 5 & 3 & 2\\ 1 & 4 & -2\\ 3 & 1 & 0\\ 0 & 5 & 1 \end{pmatrix}$$

$$AA^T = \begin{pmatrix} 35 & 22 & 8\\ 22 & 51 & 3\\ 8 & 3 & 9 \end{pmatrix},\qquad A^TA = \begin{pmatrix} 38 & 13 & 18 & 17\\ 13 & 21 & 7 & 18\\ 18 & 7 & 10 & 5\\ 17 & 18 & 5 & 26 \end{pmatrix}$$

Using a program such as MATLAB, these are easily calculated.

Matrix Calculus
Differentiation Let the elements of A = [aij(t)] be differentiable functions of t. Then

$$\frac{dA}{dt} = \left[\frac{da_{ij}(t)}{dt}\right]$$

Example

$$A = \begin{pmatrix} \cos t & \sin t\\ \sin t & -\cos t \end{pmatrix},\qquad \frac{dA}{dt} = \begin{pmatrix} -\sin t & \cos t\\ \cos t & \sin t \end{pmatrix}$$

Integration The integral ∫ A dt = [∫ aij(t) dt].

Example

$$A = \begin{pmatrix} t & t^2\\ 2 & e^t \end{pmatrix},\qquad \int A\,dt = \begin{pmatrix} t^2/2 & t^3/3\\ 2t & e^t \end{pmatrix}$$

The matrix B = A − λI is called the characteristic (eigen) matrix of A. Here A is square of order n, λ is a scalar parameter, and I is the n × n identity. det B = det (A − λI) = 0 is the characteristic (eigen) equation for A. The characteristic equation is always of the same degree as the order of A. The roots of the characteristic equation are called the eigenvalues of A.

Example

$$A = \begin{pmatrix} 1 & 3\\ 2 & 8 \end{pmatrix},\qquad B = \begin{pmatrix} 1 & 3\\ 2 & 8 \end{pmatrix} - \begin{pmatrix} \lambda & 0\\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} 1-\lambda & 3\\ 2 & 8-\lambda \end{pmatrix}$$

is the characteristic matrix, and f(λ) = det (B) = det (A − λI) = (1 − λ)(8 − λ) − 6 = 2 − 9λ + λ² = 0 is the characteristic equation. The eigenvalues of A are the roots of λ² − 9λ + 2 = 0, which are (9 ± √73)/2.

A nonzero matrix Xᵢ, which has one column and n rows, called a column vector, satisfying the equation

(A − λᵢI)Xᵢ = 0

and associated with the ith characteristic root λᵢ is called an eigenvector.

Vector and Matrix Norms To carry out error analysis for approximate and iterative methods for the solutions of linear systems, one needs notions for vectors in Rⁿ and for matrices that are analogous to the notion of length of a geometric vector. Let Rⁿ denote the set of all vectors with n components, x = (x1, . . . , xn). In dealing with matrices it is convenient to treat vectors in Rⁿ as columns, and so x = (x1, . . . , xn)ᵀ; however, we shall here write them simply as row vectors. A norm on Rⁿ is a real-valued function f defined on Rⁿ with the following properties:
1. f(x) ≥ 0 for all x ∈ Rⁿ.
2. f(x) = 0 if and only if x = (0, 0, . . . , 0).
3. f(ax) = |a| f(x) for all real numbers a and x ∈ Rⁿ.
4. f(x + y) ≤ f(x) + f(y) for all x, y ∈ Rⁿ.

The usual notation for a norm is f(x) = ‖x‖. The condition number of a matrix is

$$\kappa(A) \equiv \|A\|\,\|A^{-1}\|$$

where

$$\|A\| = \sup_{x \neq 0}\frac{\|Ax\|}{\|x\|} = \max_k \sum_{j=1}^{n}|a_{jk}|$$

(the second equality holds for the vector norm ‖x‖ = |x1| + ⋅ ⋅ ⋅ + |xn|).

The condition number is useful when doing numerical calculations. If the computer's floating-point precision is 10⁻⁶, then κ = 10⁶ indicates an ill-conditioned matrix. If the floating-point precision is 10⁻¹², then a matrix with κ = 10¹² may be ill-conditioned. Two other measures are useful and are more easily calculated:

$$\text{Ratio} = \frac{\max_k |a_{kk}^{(k)}|}{\min_k |a_{kk}^{(k)}|},\qquad V = \frac{|\det A|}{\alpha_1\alpha_2\cdots\alpha_n},\qquad \alpha_i = \left(a_{i1}^2 + a_{i2}^2 + \cdots + a_{in}^2\right)^{1/2}$$

where a_kk^(k) are the diagonal elements of the LU decomposition.
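For a 2 × 2 matrix the characteristic equation is the quadratic λ² − (a11 + a22)λ + det A = 0, and the matrix norm defined above (the largest column sum of absolute values) gives the condition number directly. A small Python sketch under those assumptions, applied to A with rows (1, 3) and (2, 8), whose characteristic equation is λ² − 9λ + 2 = 0; `eig2`, `norm1`, and `inv2` are hypothetical helpers that handle only 2 × 2 matrices with real eigenvalues.

```python
import math


def eig2(a):
    """Eigenvalues of a 2 x 2 matrix from lambda^2 - tr(A) lambda + det(A) = 0 (real case)."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (tr - root) / 2, (tr + root) / 2


def norm1(a):
    """Matrix norm max_k sum_j |a_jk| (largest column sum), induced by the vector 1-norm."""
    return max(sum(abs(row[k]) for row in a) for k in range(len(a[0])))


def inv2(a):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


A = [[1, 3], [2, 8]]
print(eig2(A))                     # (9 - sqrt(73))/2 and (9 + sqrt(73))/2
print(norm1(A) * norm1(inv2(A)))   # 55.0
```

Here ‖A‖ = 11 and ‖A⁻¹‖ = 5, so κ(A) = 55 in this norm, far below the 10⁶ level that would signal ill-conditioning at single precision.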

MATRIX COMPUTATIONS

The principal topics in linear algebra involve systems of linear equations, matrices, vector spaces, linear transformations, eigenvalues and eigenvectors, and least-squares problems. The calculations are routinely done on a computer.

LU Factorization of a Matrix To every m × n matrix A there exists a permutation matrix P, a lower triangular matrix L with unit diagonal elements, and an m × n (upper triangular) echelon matrix U such that PA = LU. The Gauss elimination is in essence an algorithm to determine U, P, and L. The permutation matrix P may be needed since it may be necessary in carrying out the Gauss elimination to interchange two rows of A to produce a (nonzero) pivot, such as if we start with

$$A = \begin{pmatrix} 0 & 1\\ 2 & 6 \end{pmatrix}$$

If A is a square matrix and if the principal submatrices of A are all nonsingular, then we may choose P as the identity in the preceding factorization and obtain A = LU. This factorization is unique if L is normalized (as assumed previously) so that it has unit elements on the main diagonal.

Solution of Ax = b by Using LU Factorization Suppose that the indicated system is compatible and that A = LU (the case PA = LU is similarly handled and amounts to rearranging the equations). Let z = Ux. Then Ax = LUx = b implies that Lz = b. Thus to solve Ax = b, we first solve Lz = b for z and then solve Ux = z for x. This procedure does not require that A be invertible and can be used to determine all solutions of a compatible system Ax = b. Note that the systems Lz = b and Ux = z are both in triangular form and thus can be easily solved.

The LU decomposition is essentially a Gaussian elimination, arranged for maximum efficiency. The chief reason for doing an LU decomposition is that it takes fewer multiplications than would be needed to find an inverse. Also, once the LU decomposition has been found, it is possible to solve for multiple right-hand sides with little increase in work. The multiplication count for an n × n matrix and m right-hand sides is

$$\text{operation count} = \frac{n^3}{3} - \frac{n}{3} + mn^2$$

If an inverse is desired, it can be calculated by finding the LU decomposition and then solving n problems with right-hand sides consisting of all zeros except one entry. Thus, with m = n, 4n³/3 − n/3 multiplications are required for the inverse. The determinant is given by

$$\det A = \prod_{i=1}^{n} a_{ii}^{(i)}$$

where a_ii^(i) are the diagonal elements obtained in the LU decomposition (the sign changes once for each row interchange made during pivoting).

A tridiagonal matrix is one in which the only nonzero entries lie on the main diagonal and the diagonals just above and just below the main diagonal. The set of equations can be written as

$$a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i$$

The LU decomposition is

b′₁ = b₁
for k = 2, n do
    a′ₖ = aₖ/b′ₖ₋₁,   b′ₖ = bₖ − (aₖ/b′ₖ₋₁) cₖ₋₁
enddo
d′₁ = d₁
for k = 2, n do
    d′ₖ = dₖ − a′ₖ d′ₖ₋₁
enddo
xₙ = d′ₙ/b′ₙ
for k = n − 1, 1 do
    xₖ = (d′ₖ − cₖ xₖ₊₁)/b′ₖ
enddo

The operation count for an n × n matrix with m right-hand sides is

2(n − 1) + m(3n − 2)

If

|bᵢ| > |aᵢ| + |cᵢ|

no pivoting is necessary, and this is true for many boundary-value problems and partial differential equations.
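A minimal Python sketch of both procedures, assuming dense list-of-rows storage: `lu_decompose` performs Gaussian elimination with partial pivoting (PA = LU, with the row interchanges recorded in `perm`), `lu_solve` does the forward solve Lz = Pb and the back solve Ux = z, and `thomas` follows the tridiagonal algorithm step by step. The names are illustrative; production codes would call a library such as LAPACK instead.

```python
def lu_decompose(a):
    """Factor PA = LU by Gaussian elimination with partial pivoting.

    Returns (lu, perm): the multipliers of L (unit diagonal, not stored) lie below the
    diagonal of lu, U occupies the diagonal and above, and row i of PA is row perm[i] of A.
    """
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))   # largest available pivot
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                              # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Forward solve Lz = Pb, then back solve Ux = z."""
    n = len(lu)
    z = [0.0] * n
    for i in range(n):
        z[i] = b[perm[i]] - sum(lu[i][j] * z[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def thomas(a, b, c, d):
    """Solve a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i (a[0] and c[-1] are unused)."""
    n = len(b)
    bp = [float(b[0])] + [0.0] * (n - 1)
    dp = [float(d[0])] + [0.0] * (n - 1)
    for k in range(1, n):
        ap = a[k] / bp[k - 1]            # a'_k
        bp[k] = b[k] - ap * c[k - 1]     # b'_k
        dp[k] = d[k] - ap * dp[k - 1]    # d'_k
    x = [0.0] * n
    x[-1] = dp[-1] / bp[-1]
    for k in range(n - 2, -1, -1):
        x[k] = (dp[k] - c[k] * x[k + 1]) / bp[k]
    return x


lu, perm = lu_decompose([[0, 1], [2, 6]])   # zero (1,1) element forces a row interchange
print(perm, lu_solve(lu, perm, [1, 8]))     # [1, 0] [1.0, 1.0]
print(thomas([0, 1, 1, 1], [4, 4, 4, 4], [1, 1, 1, 0], [6, 12, 18, 19]))
```

Because the factors are stored once, each additional right-hand side costs only the two triangular solves, which is the mn² term in the operation count. The tridiagonal test system is diagonally dominant (|bᵢ| = 4 > |aᵢ| + |cᵢ| = 2), so `thomas` needs no pivoting.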

Sparse matrices are ones in which the majority of the elements are zero. If the structure of the matrix is exploited, the solution time on a computer is greatly reduced. See Duff, I. S., J. K. Reid, and A. M. Erisman (eds.), Direct Methods for Sparse Matrices, Clarendon Press, Oxford (1986); Saad, Y., Iterative Methods for Sparse Linear Systems, 2d ed., Society for Industrial and Applied Mathematics, Philadelphia (2003). The conjugate gradient method is one method for solving sparse matrix problems, since it involves the matrix only through products of the matrix times a vector, so the sparseness of the matrix is easy to exploit. The conjugate gradient method is an iterative method that, for a symmetric positive definite n × n matrix and in exact arithmetic, converges in at most n iterations.
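A minimal Python sketch of the conjugate gradient method, assuming A is symmetric positive definite (the case in which the at-most-n-iterations property holds in exact arithmetic). A enters only through `matvec(a, p)`, so a sparse storage scheme would change only that function. The function names and the default tolerance are illustrative choices; in floating point a few extra iterations may be needed, hence the `max_iter` parameter.

```python
def matvec(a, x):
    """Matrix-vector product; a sparse version would loop over stored nonzeros only."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(a, b, tol=1e-12, max_iter=None):
    """Solve Ax = b for symmetric positive definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = [float(bi) for bi in b]       # residual b - Ax
    p = r[:]                          # first search direction
    rs = dot(r, r)
    for _ in range(max_iter if max_iter is not None else n):
        if rs ** 0.5 < tol:
            break
        ap = matvec(a, p)
        alpha = rs / dot(p, ap)       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]   # next A-conjugate direction
        rs = rs_new
    return x


print(conjugate_gradient([[4, 1], [1, 3]], [1, 2]))   # close to [1/11, 7/11]
```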

Matrix methods, in particular finding the rank of the matrix, can be used to find the number of independent reactions in a reaction set. If the stoichiometric numbers for the reactions and molecules are put in the form of a matrix, the rank of the matrix gives the number of independent reactions. See Amundson, N. R., Mathematical Methods in Chemical Engineering, Prentice-Hall, Englewood Cliffs, N.J. (1966, p. 50).
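As an illustration, a hypothetical three-reaction set can be checked in Python. The sketch computes the rank by Gaussian elimination in exact rational arithmetic (`fractions.Fraction`), so round-off cannot decide whether a pivot is zero. Rows are reactions, columns are species, and entries are stoichiometric numbers (negative for reactants); the reactions are chosen only for the example.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix by Gaussian elimination with exact Fraction arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


# Columns: CO, O2, CO2, H2, H2O
reactions = [
    [-1, Fraction(-1, 2), 1, 0, 0],       # CO + 1/2 O2 -> CO2
    [0, Fraction(-1, 2), 0, -1, 1],       # H2 + 1/2 O2 -> H2O
    [-1, 0, 1, 1, -1],                    # CO + H2O -> CO2 + H2
]
print(rank(reactions))   # 2: the third reaction is the first minus the second
```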

QR Factorization of a Matrix If A is an m × n matrix with m ≥ n, there exists an m × m unitary matrix Q = [q1, q2, . . . , qm] and an m × n right triangular matrix R such that A = QR. The QR factorization is frequently used in actual computations when the other transformations are unstable.

Singular-Value Decomposition If A is an m × n matrix with m ≥ n and rank k ≤ n, consider the two following matrices:

AA* and A*A

An m × m unitary matrix U is formed from the eigenvectors uᵢ of the first matrix:

U = [u1, u2, . . . , um]

An n × n unitary matrix V is formed from the eigenvectors vᵢ of the second matrix:

V = [v1, v2, . . . , vn]

Then the matrix A can be decomposed into

A = UΣV*

where Σ is an m × n matrix whose only nonzero elements are the diagonal elements dii = σᵢ > 0 for 1 ≤ i ≤ k. The eigenvalues of Σ*Σ are σᵢ². The vectors uᵢ for k + 1 ≤ i ≤ m and vᵢ for k + 1 ≤ i ≤ n are eigenvectors associated with the eigenvalue zero; the eigenvalues for 1 ≤ i ≤ k are σᵢ². The values of σᵢ are called the singular values of matrix A. If A is real, then U and V are real and hence orthogonal matrices. The value of the singular-value decomposition comes when a process is represented by a linear transformation and the elements of A, aij, are the contribution to an output i for a particular variable as input variable j. The input may be the size of a disturbance, and the output is the gain [Seborg, D. E., T. F. Edgar, and D. A. Mellichamp, Process Dynamics and Control, 2d ed., Wiley, New York (2004)]. If the rank is less than n, not all the variables are independent and they cannot all be controlled. Furthermore, if the singular values are widely separated, the process is sensitive to small changes in the elements of the matrix, and the process will be difficult to control.

Example Consider the following example from Noble and Daniel [Applied Linear Algebra, Prentice-Hall (1987)] with the MATLAB commands to do the analysis. Define the following real matrix with m = 3 and n = 2 (whose rank k = 1).

>> a = [1 1; 2 2; 2 2]
>> a1 = a'*a            % n x n (2 x 2)
a1 =
     9     9
     9     9
>> a2 = a*a'            % m x m (3 x 3)
a2 =
     2     4     4
     4     8     8
     4     8     8
>> [v,d1] = eig(a1)     % v is n x n (2 x 2)
v =
   -0.7071    0.7071
    0.7071    0.7071
d1 =
     0     0
     0    18
>> [u,d2] = eig(a2)     % u is m x m (3 x 3)
u =
    0.8944    0.2981    0.3333
   -0.4472    0.5963    0.6667
         0   -0.7454    0.6667
d2 =
     0     0     0
     0     0     0
     0     0    18

Thus σ1² = 18, and the eigenvectors are the columns of v and u. The second column of v is associated with the eigenvalue σ1² = 18, and the third column of u is associated with the eigenvalue σ1² = 18.

If A is square and nonsingular, the vector x that minimizes

‖Ax − b‖     (3-71)

is obtained by solving the linear equations

x = A⁻¹b

When A is not square, then the solution to

Ax = b

is

x = Vy

where yᵢ = b′ᵢ/σᵢ for i = 1, . . . , k, b′ = Uᵀb, and yk+1, yk+2, . . . , yn are arbitrary. The matrices U and V are those obtained in the singular-value decomposition. Every such x minimizes the norm in Eq. (3-71); taking yk+1, yk+2, . . . , yn equal to zero gives the minimizing x with the smallest length ‖x‖. These techniques can be used to monitor process variables. See Montgomery, D. C., Introduction to Statistical Quality Control, 4th ed., Wiley, New York (2001); Piovoso, M. J., and K. A. Hoo, "Multivariate Statistics for Process Control," IEEE Control Systems 22(5):8 (2002).

Principal Component Analysis (PCA) PCA is used to recognize patterns in data and reduce the dimensionality of the problem. Let the matrix A now represent data with the columns of A representing different variables and the rows representing different samples. The covariance matrix is defined as

$$\text{cov}(A) = \frac{A^TA}{m - 1}$$

This is, apart from the factor 1/(m − 1), the same matrix AᵀA discussed with the singular-value decomposition. For data analysis, though, it is necessary to adjust the columns to have zero mean by subtracting from each entry in the column the average of the column entries. Once this is done, the loadings are the vᵢ and satisfy

cov(A) vᵢ = σᵢ² vᵢ

and the score vector uᵢ is given by

A vᵢ = σᵢ uᵢ

In process analysis, the columns of A represent different measurement techniques (temperatures, pressures, etc.) and the rows represent the measurement output at different times. In that case the columns of A are adjusted to have a zero mean and a variance of 1.0 (by dividing each entry in the column by the standard deviation of the column). The goal is to represent the essential variation of the process with as few variables as possible. The uᵢ, vᵢ pairs are arranged in descending order according to the associated σᵢ. The σᵢ² can be thought of as the variance captured by each pair, so the first uᵢ, vᵢ pair captures the greatest amount of variation in the data. Instead of having to deal with n variables, one can capture most of the variation of the data by using only the first few pairs. An excellent example of this is given by Wise, B. M., and B. R. Kowalski, "Process Chemometrics," Chap. 8 in Process Analytical Chemistry, F. McLennan and B. Kowalski (eds.), Blackie Academic & Professional, London (1995). When modeling a slurry-fed ceramic melter, they were able to capture 97 percent of the variation by using only four eigenvalues and eigenvectors, even though there were 16 variables (columns) measured.

NUMERICAL APPROXIMATIONS TO SOME EXPRESSIONS

APPROXIMATION IDENTITIES

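For the following relationships the sign ≈ means approximately equal to, when X is small. These equations are derived by using a Taylor series (see "Series Summation and Identities").

$$\frac{1}{1 \pm X} \approx 1 \mp X \qquad\qquad \sqrt{1 \pm X} \approx 1 \pm \frac{X}{2}$$

$$(1 \pm X)^n \approx 1 \pm nX \qquad\qquad (1 \pm X)^{-n} \approx 1 \mp nX$$

$$(a \pm X)^2 \approx a^2 \pm 2aX \qquad\qquad e^X \approx 1 + X$$

$$\sin X \approx X \;(X \text{ in radians}) \qquad\qquad \tan X \approx X$$

$$\sqrt{Y(Y + X)} \approx \frac{2Y + X}{2} \qquad\qquad \sqrt{Y^2 + X^2} \approx Y + \frac{X^2}{2Y} \qquad (X/Y \text{ small})$$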
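A quick numerical check of several of these identities, assuming X = 0.01. This Python sketch is for illustration only; the printed error is of order X² for the first-order formulas and of order X³ for sin X and tan X, consistent with their Taylor-series origin.

```python
import math

X = 0.01   # a "small" X; the identities hold only to leading order in X
checks = [
    ("1/(1 + X) ~ 1 - X", 1 / (1 + X), 1 - X),
    ("sqrt(1 + X) ~ 1 + X/2", math.sqrt(1 + X), 1 + X / 2),
    ("(1 + X)**3 ~ 1 + 3X", (1 + X) ** 3, 1 + 3 * X),
    ("(1 + X)**-3 ~ 1 - 3X", (1 + X) ** -3, 1 - 3 * X),
    ("exp(X) ~ 1 + X", math.exp(X), 1 + X),
    ("sin(X) ~ X", math.sin(X), X),
    ("tan(X) ~ X", math.tan(X), X),
]
for name, exact, approx in checks:
    print(f"{name:24s} exact={exact:.8f}  approx={approx:.8f}  error={exact - approx:+.1e}")
```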

NUMERICAL ANALYSIS AND APPROXIMATE METHODS

REFERENCES: Buchanan, G. R., Schaum's Outline of Finite Element Analysis, McGraw-Hill, New York (1995); Burden, R. L., J. D. Faires, and A. C. Reynolds, Numerical Analysis, 8th ed., Brooks/Cole (2004); Chapra, S. C., and R. P. Canale, Numerical Methods for Engineers, 5th ed., McGraw-Hill, New York (2006); Finlayson, B. A., Nonlinear Analysis in Chemical Engineering, McGraw-Hill (1980), Ravenna Park (2003); Finlayson, B. A., and L. T. Biegler, "Mathematics in Chemical Engineering," Ullmann's Encyclopedia of Industrial Chemistry, vol. 20, VCH, Weinheim (2006); Gunzburger, M. D., Finite Element Methods for Viscous Incompressible Flows, Academic Press (1989); Kardestuncer, H., and D. H. Norrie (eds.), Finite Element Handbook, McGraw-Hill, New York (1987); Lau, H. T., A Numerical Library in C for Scientists and Engineers, CRC Press (1994); Lau, H. T., A Numerical Library in Java for Scientists and Engineers, CRC Press (2004); Morton, K. W., and D. F. Mayers, Numerical Solution of Partial Differential Equations, Cambridge University Press (1994); Press, W. H., et al., Numerical Recipes, Cambridge University Press, Cambridge (1986); Quarteroni, A., and A. Valli, Numerical Approximation of Partial Differential Equations, Springer (1997); Reddy, J. N., and D. K. Gartling, The Finite Element Method in Heat Transfer and Fluid Dynamics, 2d ed., CRC Press (2000); Scheid, F., Schaum's Outline of Numerical Analysis, 2d ed., McGraw-Hill, New York (1989); Schiesser, W. E., The Numerical Method of Lines, Academic Press (1991); Shampine, L., Numerical Solution of Ordinary Differential Equations, Chapman & Hall (1994); Zienkiewicz, O. C., R. L. Taylor, and J. Z. Zhu, The Finite Element Method: Its Basis and Fundamentals, vol. 1, 6th ed., Elsevier Butterworth-Heinemann (2005); Zienkiewicz, O. C., and R. L. Taylor, The Finite Element Method: Solid Mechanics, vol. 2, 5th ed., Butterworth-Heinemann (2000); Zienkiewicz, O. C., and R. L. Taylor, The Finite Element Method: Fluid Mechanics, vol. 2, 5th ed., Butterworth-Heinemann (2000).

INTRODUCTION

The goal of approximate and numerical methods is to provide convenient techniques for obtaining useful information from mathematical formulations of physical problems. Often this mathematical statement is not solvable by analytical means. Or perhaps analytic solutions are available but in a form that is inconvenient for direct interpretation. In the first case it is necessary either to attempt to approximate the problem satisfactorily by one which will be amenable to analysis, to obtain an approximate solution to the original problem by numerical means, or to use the two techniques in combination.

Numerical techniques therefore do not yield exact results in the sense of the mathematician. Since most numerical calculations are inexact, the concept of error is an important feature. The error associated with an approximate value is defined as

True value = approximate value + error

The four sources of error are as follows:
1. Gross errors. These result from unpredictable human, mechanical, or electrical mistakes.
2. Round-off errors. These are the consequence of using a number specified by m correct digits to approximate a number which requires more than m digits for its exact specification. For example, approximate the irrational number √2 by 1.414. Such errors are often present in experimental data, in which case they may be called inherent errors, due either to empiricism or to the fact that the computer dictates the number of digits. Such errors may be especially damaging in areas such as matrix inversion or the numerical solution of partial differential equations when the number of algebraic operations is extremely large.
3. Truncation errors. These errors arise from the substitution of a finite number of steps for an infinite sequence of steps which would yield the exact result. To illustrate this error, consider the infinite series for e⁻ˣ:

$$e^{-x} = 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + E_T(x),\qquad E_T = \frac{1}{24}e^{-\varepsilon}x^4,\quad 0 < \varepsilon < x$$

where E_T is the truncation error. If x is positive, ε is also positive. Hence e^(−ε) < 1. The approximation e⁻ˣ ≈ 1 − x + x²/2 − x³/6 is in error by a positive amount smaller than (1/24)x⁴.
4. Inherited errors. These arise as a result of errors occurring in the previous steps of the computational algorithm.
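The bound 0 < E_T < x⁴/24 for x > 0 can be confirmed numerically. A short Python sketch; `truncated_exp_neg` is a hypothetical helper name for the four-term series.

```python
import math


def truncated_exp_neg(x):
    """Four-term Taylor approximation e^-x ~ 1 - x + x^2/2 - x^3/6."""
    return 1 - x + x ** 2 / 2 - x ** 3 / 6


for x in (0.1, 0.5, 1.0, 2.0):
    error = math.exp(-x) - truncated_exp_neg(x)   # E_T(x) = true value - approximation
    bound = x ** 4 / 24
    print(f"x = {x}: E_T = {error:.3e}, bound x^4/24 = {bound:.3e}")
```

For each x the error is positive and below x⁴/24, as the argument with e^(−ε) < 1 predicts.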

The study of errors in a computation is related to the theory of probability. In what follows, a relation for the error will be given in certain instances.

A variety of general-purpose computer programs are available commercially. Mathematica (http://www.wolfram.com/), Maple (http://www.maplesoft.com/), and Mathcad (http://www.mathcad.com/) all have the capability of doing symbolic manipulation so that algebraic solutions can be obtained. For example, Mathematica can solve some ordinary and partial differential equations analytically; Maple can make simple graphs and do linear algebra and simple computations; and Mathcad can do simple calculations. In this section, examples are given for the use of MATLAB (http://www.mathworks.com/), which is a package of numerical analysis tools, some of which are accessed by simple commands and others of which are accessed by writing programs in the MATLAB language (m-files). Spreadsheets can also be used to solve certain problems, and these are described below too. A popular program used in chemical engineering education is Polymath (http://www.polymath-software.com/), which can numerically solve sets of linear or nonlinear equations, ordinary differential equations as initial-value problems, and perform data analysis and regression.


NUMERICAL SOLUTION OF LINEAR EQUATIONS

See the section entitled “Matrix Algebra and Matrix Computation.”

NUMERICAL SOLUTION OF NONLINEAR EQUATIONS IN ONE VARIABLE

Special Methods for Polynomials Consider a polynomial equation of degree n:

$$P(x) = a_0x^n + a_1x^{n-1} + a_2x^{n-2} + \cdots + a_{n-1}x + a_n = 0 \qquad (3\text{-}72)$$

with real coefficients. P(x) has exactly n roots (counting multiplicities), which may be real or complex. If the roots are complex, they occur in pairs with their complex conjugates.

One can obtain an upper and lower bound for the real roots by the following device: If a0 > 0 in Eq. (3-72), if the first negative coefficient is preceded by k coefficients which are positive or zero, and if G is the greatest of the absolute values of the negative coefficients, then each real root is less than

$$1 + \sqrt[k]{G/a_0}$$

Example P(x) = x⁵ + 3x⁴ − 7x² − 40x + 2 = 0. Here a0 = 1, G = 40, and k = 3 since we must supply 0 as the coefficient for x³. Thus 1 + ∛40 ≈ 4.42 is an upper bound for the real roots. The largest real root is 2.19.

A lower bound to the real roots may be found by applying the criterion to the equation P(−x).

Example P(−x) = −x⁵ + 3x⁴ − 7x² + 40x + 2 = 0, which is equivalent to x⁵ − 3x⁴ + 7x² − 40x − 2 = 0 since a0 must be positive. Then a0 = 1, G = 40, and k = 1. Hence −(1 + 40) = −41 is a lower bound. The smallest real root is −3.41. Thus all real roots are between −41 and 4.42.
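A sketch of the bounding rule in Python, applied to the example polynomial (coefficients listed from the highest power down, with the missing x³ term entered as 0). `upper_root_bound` and `real_root_bounds` are illustrative names; the lower bound reuses the upper-bound rule on P(−x), normalized so its leading coefficient is positive.

```python
def upper_root_bound(coeffs):
    """Bound 1 + (G/a0)**(1/k) on the real roots of a0 x^n + a1 x^(n-1) + ... + an.

    coeffs lists a0, a1, ..., an (zeros included) with a0 > 0; k is the number of
    coefficients preceding the first negative one and G the largest |negative coefficient|.
    """
    if coeffs[0] <= 0:
        raise ValueError("a0 must be positive")
    negatives = [c for c in coeffs if c < 0]
    if not negatives:
        return 0.0                        # P(x) > 0 for x > 0, so no real root is positive
    k = next(i for i, c in enumerate(coeffs) if c < 0)
    g = max(-c for c in negatives)
    return 1 + (g / coeffs[0]) ** (1 / k)


def real_root_bounds(coeffs):
    """(lower, upper) bounds on the real roots; the lower one comes from P(-x)."""
    n = len(coeffs) - 1
    reflected = [c * (-1) ** (n - i) for i, c in enumerate(coeffs)]   # P(-x)
    if reflected[0] < 0:
        reflected = [-c for c in reflected]                          # make a0 positive
    return -upper_root_bound(reflected), upper_root_bound(coeffs)


print(real_root_bounds([1, 3, 0, -7, -40, 2]))   # about (-41.0, 4.42)
```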

One last result is helpful in getting an estimate of how many positive and negative real roots there are.

Descartes' Rule The number of positive real roots of a polynomial with real coefficients is either equal to the number of changes in sign v of its coefficients or is less than v by a positive even integer. The number of negative roots of P(x) is either equal to the number of variations of sign of P(−x) or is less than this by a positive even integer.

Example f(x) = x⁴ − 13x² + 4x − 2 = 0 has three changes in sign; therefore, there are either three positive roots or one. f(−x) = x⁴ − 13x² − 4x − 2 has one change in sign. Therefore, there is one negative root. Using MATLAB, one defines the vector of coefficients in descending powers of x,

c(i) = a(i−1)

and uses the command roots(c) to find all the roots.
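Counting sign variations is easy to automate. A minimal Python sketch (zero coefficients are skipped when counting); the coefficient list is in descending powers, the same ordering MATLAB's `roots` expects. The helper names are illustrative.

```python
def sign_changes(coeffs):
    """Number of sign variations in a coefficient list, ignoring zero coefficients."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def reflect(coeffs):
    """Coefficients of P(-x), highest power first."""
    n = len(coeffs) - 1
    return [c * (-1) ** (n - i) for i, c in enumerate(coeffs)]


f = [1, 0, -13, 4, -2]            # x^4 - 13x^2 + 4x - 2
print(sign_changes(f))            # 3: three or one positive roots
print(sign_changes(reflect(f)))   # 1: exactly one negative root
```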

Methods for Nonlinear Equations in One Variable
Successive Substitutions Let f(x) = 0 be the nonlinear equation to be solved. If this is rewritten as x = F(x), then an iterative scheme can be set up in the form xₖ₊₁ = F(xₖ). To start the iteration, an initial guess must be obtained graphically or otherwise. The convergence or divergence of the procedure depends upon the method of writing x = F(x), of which there will usually be several forms. However, if a is a root of f(x) = 0 and if |F′(a)| < 1, then for any initial approximation sufficiently close to a, the method converges to a. This process is called first order because the error in xₖ₊₁ is proportional to the first power of the error in xₖ for large k.

One way of writing the equation is xₖ₊₁ = xₖ + β f(xₖ). The choice of β is made such that |1 + β df/dx(a)| < 1. Convergence is guaranteed by the theorem given for simultaneous equations.
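A minimal Python sketch of the relaxed iteration xₖ₊₁ = xₖ + β f(xₖ). The test function f(x) = x − cos x and the value β = −0.5 are hypothetical choices for illustration: at the root a ≈ 0.739, f′(a) = 1 + sin a ≈ 1.67, so |1 + βf′(a)| ≈ 0.16 < 1 and the iteration converges.

```python
import math


def successive_substitution(f, x0, beta, tol=1e-10, max_iter=200):
    """Iterate x_{k+1} = x_k + beta * f(x_k) until successive iterates agree to tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = x + beta * f(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence: try another beta or a better initial guess")


root, iterations = successive_substitution(lambda x: x - math.cos(x), x0=1.0, beta=-0.5)
print(root, iterations)
```

With β = +0.5 instead, |1 + βf′(a)| ≈ 1.84 > 1 and the same call raises `RuntimeError`, illustrating how the choice of β decides convergence.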

Methods of Perturbation Let f(x) = 0 be the equation. In general, the iterative relation is

xₖ₊₁ = xₖ − f(xₖ)/αₖ

where the iteration begins with x₀ as an initial approximation and αₖ as some functional, derived below.

Newton-Raphson Procedure This variant chooses αₖ = f′(xₖ), where f′ = df/dx, and geometrically consists of replacing the graph of f(x) by the tangent line at x = xₖ in each successive step. If f′(x) and f″(x) have the same sign throughout an interval a ≤ x ≤ b containing the solution, with f(a), f(b) of opposite signs, then the process converges starting from any x₀ in the interval a ≤ x ≤ b. The process is second order.

Example

$$f(x) = x - 1 + \frac{(0.5)^x - 0.5}{0.3},\qquad f'(x) = 1 - 2.3105\,(0.5)^x$$

An approximate root (obtained graphically) is 2.

Step k     xₖ         f(xₖ)      f′(xₖ)
0          2          0.1667     0.4224
1          1.6054     0.0342     0.2407
2          1.4632     0.0055     0.1620

Method of False Position This variant is commenced by finding x₀ and x₁ such that f(x₀), f(x₁) are of opposite signs. Then α₁ = slope of the secant line joining [x₀, f(x₀)] and [x₁, f(x₁)], so that

$$x_2 = x_1 - f(x_1)\,\frac{x_1 - x_0}{f(x_1) - f(x_0)}$$

In each of the following steps, αₖ is the slope of the line joining [xₖ, f(xₖ)] to the most recently determined point where f(xⱼ) has the opposite sign from that of f(xₖ). This method is of first order. If one uses the most recently determined point (regardless of sign), the method is a secant method.

Method of Wegstein This is a variant of the method of successive substitutions which forces and/or accelerates convergence. The iterative procedure xₖ₊₁ = F(xₖ) is revised by setting x̄ₖ₊₁ = F(xₖ) and then taking xₖ₊₁ = qxₖ + (1 − q)x̄ₖ₊₁, where q is a suitably chosen number which may be taken as constant throughout or may be adjusted at each step. Wegstein found that suitable q's are:

Behavior of successive substitution process      Range of optimum q
Oscillatory convergence                          0 < q < 1/2
Oscillatory divergence without Wegstein          1/2 < q < 1
Monotonic convergence                            q < 0
Monotonic divergence without Wegstein            1 < q

At each step q may be calculated to give a locally optimum value by setting

$$q = \frac{x_{k+1} - x_k}{x_{k+1} - 2x_k + x_{k-1}}$$

The Wegstein method is a secant method applied to g(x) ≡ x − F(x). In Microsoft Excel, roots are found by using Goal Seek or Solver. Assign one cell to be x, put the equation for f(x) in another cell, and let Goal Seek or Solver find the value of x that makes the equation cell zero. In MATLAB, the process is similar except that a function (m-file) is defined and the command fzero('f',x0) provides the solution x, starting from the initial guess x0.

METHODS FOR MULTIPLE NONLINEAR EQUATIONS

Method of Successive Substitutions Write a system of equations as

αᵢ = fᵢ(α), or α = f(α)

The following theorem guarantees convergence. Let α be the solution to αᵢ = fᵢ(α). Assume that given h > 0, there exists a number 0 < µ < 1 such that

$$\sum_{j=1}^{n}\left|\frac{\partial f_i}{\partial x_j}\right| \le \mu \quad \text{for } |x_i - \alpha_i| < h,\; i = 1, \ldots, n$$

and let

$$x_i^{k+1} = f_i(x^k)$$

Then xᵢᵏ → αᵢ as k increases [see Finlayson (2003)].
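The one-variable Newton-Raphson example given earlier in this subsection can be reproduced with a few lines of Python. The function and derivative are those of the example, with f′ written as 1 + ln(0.5)(0.5)ˣ/0.3 (ln 0.5/0.3 ≈ −2.3105); the helper name `newton` is illustrative. Printing the first three iterates gives rows that can be compared with the table.

```python
import math


def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration x_{k+1} = x_k - f(x_k)/f'(x_k); returns all iterates."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        step = f(x) / fprime(x)
        xs.append(x - step)
        if abs(step) < tol:
            return xs
    raise RuntimeError("Newton-Raphson did not converge")


def f(x):
    return x - 1 + (0.5 ** x - 0.5) / 0.3


def fprime(x):
    return 1 + math.log(0.5) * 0.5 ** x / 0.3   # = 1 - 2.3105 (0.5)^x


xs = newton(f, fprime, 2.0)
for k, x in enumerate(xs[:3]):
    print(f"{k}  {x:.4f}  {f(x):.4f}  {fprime(x):.4f}")
```

Continuing past the tabulated steps, the iterates settle near x ≈ 1.427. The equation also has the root x = 1 (where (0.5)ˣ = 0.5), so which root is found depends on the starting guess.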

Newton-Raphson Method To solve the set of equations

Fᵢ(x1, x2, . . . , xn) = 0, or Fᵢ(xⱼ) = 0, or Fᵢ(x) = 0

one uses a truncated Taylor series to give

$$0 = F_i(x^k) + \sum_{j=1}^{n}\left.\frac{\partial F_i}{\partial x_j}\right|_{x^k}\left(x_j^{k+1} - x_j^k\right)$$

Thus one solves iteratively from one point to another:

$$\sum_{j=1}^{n} A_{ij}^k\left(x_j^{k+1} - x_j^k\right) = -F_i(x^k),\qquad A_{ij}^k = \left.\frac{\partial F_i}{\partial x_j}\right|_{x^k}$$

This method requires solution of sets of linear equations until the functions are zero to some tolerance or the changes of the solution between iterations are small enough. Convergence is guaranteed provided the norm of the matrix A is bounded, F(x) is bounded for the initial guess, and the second derivative of F(x) with respect to all variables is bounded. See Finlayson (2003).
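A minimal Python sketch of the multivariable Newton-Raphson step: each iteration solves Σⱼ Aᵢⱼᵏ Δxⱼ = −Fᵢ(xᵏ) by Gaussian elimination and then updates x. The two-equation system x² + y² = 4, xy = 1, its Jacobian, and the starting guess are hypothetical choices made only to exercise the code; the function names are illustrative.

```python
def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]   # augmented
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson for F(x) = 0: solve sum_j A_ij dx_j = -F_i with A_ij = dF_i/dx_j."""
    x = [float(v) for v in x0]
    for _ in range(max_iter):
        dx = solve_linear(J(x), [-fi for fi in F(x)])
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(di) for di in dx) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def F(v):      # hypothetical system: x^2 + y^2 = 4 and x*y = 1
    x, y = v
    return [x * x + y * y - 4.0, x * y - 1.0]


def J(v):      # Jacobian matrix A_ij = dF_i/dx_j
    x, y = v
    return [[2 * x, 2 * y], [y, x]]


print(newton_system(F, J, [2.0, 0.5]))
```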

Method of Continuity (Homotopy) In the case of n equations in n unknowns, when n is large, determining the approximate solution may involve considerable effort. In such a case the method of continuity is admirably suited for use on digital computers. It consists basically of the introduction of an extra variable into the n equations

fᵢ(x1, x2, . . . , xn) = 0     i = 1, . . . , n     (3-73)

and replacing them by

fᵢ(x1, x2, . . . , xn, λ) = 0     i = 1, . . . , n     (3-74)

where λ is introduced in such a way that the functions (3-74) depend in a simple way upon λ and reduce to an easily solvable system for λ = 0 and to the original equations (3-73) for λ = 1. A system of ordinary differential equations, with independent variable λ, is then constructed by differentiating Eqs. (3-74) with respect to λ. There results

$$\sum_{j=1}^{n}\frac{\partial f_i}{\partial x_j}\frac{dx_j}{d\lambda} + \frac{\partial f_i}{\partial \lambda} = 0 \qquad (3\text{-}75)$$

where x1, . . . , xn are considered as functions of λ. Equations (3-75) are integrated, with initial conditions obtained from Eqs. (3-74) with λ = 0, from λ = 0 to λ = 1. If the solution can be continued to λ = 1, the values of x1, . . . , xn for λ = 1 will be a solution of the original equations. If the integration becomes infinite, the parameter λ must be introduced in a different fashion. Integration of the differential equations (which are usually nonlinear in λ) may be accomplished by using techniques described under "Numerical Solution of Ordinary Differential Equations."
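A one-variable sketch of the method of continuity, assuming the particular (Newton) homotopy f(x, λ) = F(x) − (1 − λ)F(x₀): it is satisfied by x = x₀ at λ = 0 and reduces to F(x) = 0 at λ = 1, and Eq. (3-75) becomes F′(x) dx/dλ = −F(x₀). The ODE is integrated here with the classical fourth-order Runge-Kutta method. The test equation x³ − 2x − 5 = 0, the starting point, and the step count are illustrative; with n unknowns, each derivative evaluation would require a linear solve instead of a division.

```python
def homotopy_root(F, dF, x0, steps=20):
    """Follow x(lam) from lam = 0 (x = x0) to lam = 1 (F(x) = 0) with RK4.

    Homotopy f(x, lam) = F(x) - (1 - lam) * F(x0), so Eq. (3-75) reads
    F'(x) dx/dlam + F(x0) = 0, i.e. dx/dlam = -F(x0) / F'(x).
    """
    c = F(x0)

    def rhs(x):
        return -c / dF(x)

    x, h = float(x0), 1.0 / steps
    for _ in range(steps):
        k1 = rhs(x)
        k2 = rhs(x + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h * k2)
        k4 = rhs(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


x = homotopy_root(lambda x: x ** 3 - 2 * x - 5, lambda x: 3 * x ** 2 - 2, x0=2.0)
print(x)   # close to the real root (about 2.0946) of x^3 - 2x - 5 = 0
```

Because the path is traced with finite steps, the result can be polished with one or two Newton-Raphson iterations if more accuracy is needed.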

Other Methods Other methods can be found in the literature. See Chan, T. F. C., and H. B. Keller, SIAM J. Sci. Stat. Comput. 3:173–194 (1982); Seader, J. D., "Computer Modeling of Chemical Processes," AIChE Monograph Series 81(15) (1986).

INTERPOLATION AND FINITE DIFFERENCES

Linear Interpolation If a function f(x) is approximately linear in a certain range, then the ratio

$$\frac{f(x_1) - f(x_0)}{x_1 - x_0} = f[x_0, x_1]$$

is approximately independent of x₀, x₁ in the range. The linear approximation to the function f(x), x₀ < x < x₁, then leads to the interpolation formula

$$\begin{aligned} f(x) &\approx f(x_0) + (x - x_0)\,f[x_0, x_1]\\ &\approx f(x_0) + \frac{x - x_0}{x_1 - x_0}\,[f(x_1) - f(x_0)]\\ &\approx \frac{1}{x_1 - x_0}\,[(x_1 - x)f(x_0) - (x_0 - x)f(x_1)] \end{aligned}$$

Divided Differences of Higher Order and Higher-Order Interpolation The first-order divided difference f[x0, x1] was defined previously. Divided differences of second and higher order are defined iteratively by

$$f[x_0, x_1, x_2] = \frac{f[x_1, x_2] - f[x_0, x_1]}{x_2 - x_0}$$

$$f[x_0, x_1, \ldots, x_k] = \frac{f[x_1, \ldots, x_k] - f[x_0, x_1, \ldots, x_{k-1}]}{x_k - x_0}$$

and a convenient form for computational purposes is

$$f[x_0, x_1, \ldots, x_k] = {\sum_{j=0}^{k}}' \frac{f(x_j)}{(x_j - x_0)(x_j - x_1)\cdots(x_j - x_k)}$$

for any k ≥ 0, where the ′ means that the factor (xj − xj) is omitted from the denominator. For example,

$$f[x_0, x_1, x_2] = \frac{f(x_0)}{(x_0 - x_1)(x_0 - x_2)} + \frac{f(x_1)}{(x_1 - x_0)(x_1 - x_2)} + \frac{f(x_2)}{(x_2 - x_0)(x_2 - x_1)}$$

If the accuracy afforded by a linear approximation is inadequate, a generally more accurate result may be based upon the assumption that f(x) may be approximated by a polynomial of degree 2 or higher over certain ranges. This assumption leads to Newton’s fundamental interpolation formula with divided differences

$$f(x) \approx f(x_0) + (x - x_0)\,f[x_0, x_1] + (x - x_0)(x - x_1)\,f[x_0, x_1, x_2] + \cdots + (x - x_0)(x - x_1)\cdots(x - x_{n-1})\,f[x_0, x_1, \ldots, x_n] + E_n(x)$$

where

$$E_n(x) = \text{error} = \frac{1}{(n+1)!}\,f^{(n+1)}(\varepsilon)\,\pi(x)$$

with min(x0, x1, . . . , xn, x) < ε < max(x0, x1, . . . , xn, x) and π(x) = (x − x0)(x − x1) ⋅⋅⋅ (x − xn). To use the previous equation most effectively, one may first form a divided-difference table. For example, for third-order interpolation the difference table is

x0   f(x0)
               f[x0, x1]
x1   f(x1)                  f[x0, x1, x2]
               f[x1, x2]                       f[x0, x1, x2, x3]
x2   f(x2)                  f[x1, x2, x3]
               f[x2, x3]
x3   f(x3)

Each entry is found by taking the difference between the diagonally adjacent entries to its left. That difference is then divided by the difference of the abscissas corresponding to the ordinates intercepted by the diagonals passing through the calculated entry.

Equally Spaced Forward Differences If the abscissas are equally spaced, i.e., xj − xj − 1 = ∆x for all j, then the first differences are denoted by ∆f(x0) = f(x1) − f(x0) or ∆y0 = y1 − y0, where y = f(x). The differences of these first differences, called second differences, are denoted by ∆²y0, ∆²y1, . . . , ∆²yn. Thus

$$\Delta^2 y_0 = \Delta y_1 - \Delta y_0 = y_2 - y_1 - y_1 + y_0 = y_2 - 2y_1 + y_0$$

And in general

$$\Delta^j y_0 = \sum_{n=0}^{j} (-1)^n \binom{j}{n}\, y_{j-n}$$

where $\binom{j}{n} = \frac{j!}{n!\,(j-n)!}$ are the binomial coefficients.

If the abscissas are equally spaced,

$$x_{n+1} - x_n = \Delta x, \qquad y_n = y(x_n)$$

then the first and second differences are denoted by

$$\Delta y_n = y_{n+1} - y_n$$

$$\Delta^2 y_n = \Delta y_{n+1} - \Delta y_n = y_{n+2} - 2y_{n+1} + y_n$$
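The following is a minimal Python sketch of the divided-difference table and Newton's formula. The function names are illustrative. The table is stored as its top diagonal only, overwritten in place, and the polynomial is evaluated by nested multiplication. The sum formula is included as a cross-check.

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]], the top diagonal of the table."""
    n = len(xs)
    coef = list(fs)
    for k in range(1, n):                       # k = order of the difference
        for i in range(n - 1, k - 1, -1):       # overwrite from the bottom up
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef

def divided_difference_sum(xs, fs):
    """f[x0,...,xk] from the sum formula, omitting the factor (xj - xj)."""
    total = 0.0
    for j, (xj, fj) in enumerate(zip(xs, fs)):
        denom = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                denom *= xj - xm
        total += fj / denom
    return total

def newton_interp(xs, coef, x):
    """Evaluate Newton's divided-difference formula (without E_n) by nesting."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

xs = [0.0, 1.0, 3.0, 4.0]
fs = [x**3 - 2*x + 1 for x in xs]               # a cubic, so interpolation is exact
coef = divided_differences(xs, fs)
print(coef, newton_interp(xs, coef, 2.0))
```

For these cubic data the coefficients come out as 1, −1, 4, 1, and the formula reproduces f(2) = 5 exactly. The third-order divided difference equals the leading coefficient of the cubic.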

NUMERICAL ANALYSIS AND APPROXIMATE METHODS 3-45


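A new variable is defined

$$\alpha = \frac{x_\alpha - x_0}{\Delta x}$$

and the finite interpolation formula through the points y0, y1, . . . , yn is written as follows:

$$y_\alpha = y_0 + \alpha\,\Delta y_0 + \frac{\alpha(\alpha - 1)}{2!}\,\Delta^2 y_0 + \cdots + \frac{\alpha(\alpha - 1)\cdots(\alpha - n + 1)}{n!}\,\Delta^n y_0 \qquad \text{(3-76)}$$

Keeping only the first two terms gives a straight line through (x0, y0)–(x1, y1); keeping the first three terms gives a quadratic function of position going through those points plus (x2, y2). The value α = 0 gives x = x0; α = 1 gives x = x1, and so on.

Equally Spaced Backward Differences Backward differences are defined by

$$\nabla y_n = y_n - y_{n-1}$$

$$\nabla^2 y_n = \nabla y_n - \nabla y_{n-1} = y_n - 2y_{n-1} + y_{n-2}$$

The interpolation polynomial of order n through the points y0, y−1, . . . , y−n is

$$y_\alpha = y_0 + \alpha\,\nabla y_0 + \frac{\alpha(\alpha + 1)}{2!}\,\nabla^2 y_0 + \cdots + \frac{\alpha(\alpha + 1)\cdots(\alpha + n - 1)}{n!}\,\nabla^n y_0$$

The value α = 0 gives x = x0; α = −1 gives x = x−1, and so on. Alternatively, the interpolation polynomial of order n through the points y1, y0, y−1, . . . , y−n is

$$y_\alpha = y_1 + (\alpha - 1)\,\nabla y_1 + \frac{\alpha(\alpha - 1)}{2!}\,\nabla^2 y_1 + \cdots + \frac{(\alpha - 1)\,\alpha\,(\alpha + 1)\cdots(\alpha + n - 2)}{n!}\,\nabla^n y_1 \qquad \text{(3-77)}$$

Now α = 1 gives x = x1; α = 0 gives x = x0.

Central Differences The central difference denoted by

$$\delta f(x) = f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right)$$

$$\delta^2 f(x) = \delta f\left(x + \frac{h}{2}\right) - \delta f\left(x - \frac{h}{2}\right) = f(x + h) - 2f(x) + f(x - h)$$

$$\delta^n f(x) = \delta^{n-1} f\left(x + \frac{h}{2}\right) - \delta^{n-1} f\left(x - \frac{h}{2}\right)$$

is useful for calculating at the interior points of tabulated data.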
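The following is a minimal Python sketch of Eq. (3-76). The function names are illustrative. It builds ∆y0, ∆²y0, . . . by repeated differencing, checks them against the binomial-coefficient sum, and accumulates the coefficients α(α − 1)⋯(α − k + 1)/k! term by term.

```python
from math import comb

def forward_differences(ys):
    """Return [y0, Δy0, Δ²y0, ..., Δⁿy0] by repeated differencing."""
    row, out = list(ys), [ys[0]]
    while len(row) > 1:
        row = [row[i + 1] - row[i] for i in range(len(row) - 1)]
        out.append(row[0])
    return out

def delta_j(ys, j):
    """Δʲy0 from the binomial-coefficient sum."""
    return sum((-1) ** n * comb(j, n) * ys[j - n] for n in range(j + 1))

def newton_forward(x, x0, dx, ys):
    """Eq. (3-76) with α = (x - x0)/Δx."""
    alpha = (x - x0) / dx
    total, coeff = 0.0, 1.0
    for k, d in enumerate(forward_differences(ys)):
        total += coeff * d
        coeff *= (alpha - k) / (k + 1)      # next coefficient α(α-1)...(α-k)/(k+1)!
    return total

ys = [x**3 for x in (1.0, 1.5, 2.0, 2.5)]   # equally spaced, Δx = 0.5
print(forward_differences(ys), newton_forward(1.75, 1.0, 0.5, ys))
```

The data are cubic, so the third differences are constant (0.75 = 3!·0.5³). The four-term formula therefore returns 1.75³ exactly.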

Lagrange Interpolation Formulas A global polynomial is defined over the entire region of space

$$P_m(x) = \sum_{j=0}^{m} c_j x^j$$

This polynomial is of degree m (highest power is x^m) and order m + 1 (m + 1 parameters cj). If we are given a set of m + 1 points

$$y_1 = f(x_1),\quad y_2 = f(x_2),\quad \ldots,\quad y_{m+1} = f(x_{m+1})$$

then Lagrange’s formula gives a polynomial of degree m that goes through the m + 1 points:

$$P_m(x) = y_1\,\frac{(x - x_2)(x - x_3)\cdots(x - x_{m+1})}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_{m+1})} + y_2\,\frac{(x - x_1)(x - x_3)\cdots(x - x_{m+1})}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_{m+1})} + \cdots + y_{m+1}\,\frac{(x - x_1)(x - x_2)\cdots(x - x_m)}{(x_{m+1} - x_1)(x_{m+1} - x_2)\cdots(x_{m+1} - x_m)}$$

Note that each coefficient of yj is a polynomial of degree m. It vanishes at every defining point except xj, where it takes the value 1.0:

$$P_m(x_j) = y_j, \qquad j = 1, 2, \ldots, m + 1$$
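The following is a minimal Python sketch of Lagrange's formula. The function name is illustrative, and points are numbered from 0 rather than 1. Each basis factor (x − xk)/(xj − xk) vanishes at xk and equals 1 at xj.

```python
def lagrange(xs, ys, x):
    """Sum of y_j times the Lagrange basis polynomial for x_j."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                basis *= (x - xk) / (xj - xk)   # zero at x_k, one at x_j
        total += yj * basis
    return total

xs = [1.0, 1.5, 2.0, 2.5]
ys = [x**3 for x in xs]
print(lagrange(xs, ys, 1.75))   # cubic data, so the degree-3 result is exact
```

The polynomial of degree m through m + 1 distinct points is unique. Lagrange's formula therefore gives the same values as Newton's formulas for the same data; only the arithmetic is arranged differently.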

If the function f(x) is known, the error in the approximation is, per Anderson, E., et al., LAPACK Users’ Guide, SIAM (1992),

$$|\text{error}(x)| \le \frac{|x_{m+1} - x_1|^{m+1}}{(m+1)!}\,\max_{x_1 \le x \le x_{m+1}} \left|f^{(m+1)}(x)\right|$$

The evaluation of Pm(x) at a point other than at the defining points can be made with Neville’s algorithm [Press et al. (1986)]. Let P1 be the value at x of the unique function passing through the point (x1, y1); or P1 = y1. Let P12 be the value at x of the unique polynomial passing through the points x1 and x2. Likewise, Pijk . . . r is the unique polynomial passing through the points xi, xj, xk, . . . , xr. Then use the table

x1   y1 = P1
                 P12
x2   y2 = P2            P123
                 P23             P1234
x3   y3 = P3            P234
                 P34
x4   y4 = P4

These entries are defined using

$$P_{i(i+1)\ldots(i+m)} = \frac{(x - x_{i+m})\,P_{i(i+1)\ldots(i+m-1)} + (x_i - x)\,P_{(i+1)(i+2)\ldots(i+m)}}{x_i - x_{i+m}}$$

For example, consider P1234. The terms on the right-hand side involve P123 and P234. The “parents,” P123 and P234, already agree at points 2 and 3. Here i = 1, m = 3; thus, the parents already agree at xi + 1, . . . , xi + m − 1. The formula makes Pi(i + 1) . . . (i + m) agree with the function at the additional points xi + m and xi. Thus, Pi(i + 1) . . . (i + m) agrees with the function at all the points xi, xi + 1, . . . , xi + m.
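The following is a minimal Python sketch of Neville's algorithm. The function name is illustrative. It keeps one column of the table at a time: after pass m, p[i] holds P_{i(i+1)…(i+m)} evaluated at x, with points numbered from 0.

```python
import math

def neville(xs, ys, x):
    """Neville's algorithm; returns the value at x of the polynomial through all points."""
    p = list(ys)                                   # column 0: P_i = y_i
    n = len(xs)
    for m in range(1, n):
        for i in range(n - m):                     # p[i+1] still holds the previous column
            p[i] = ((x - xs[i + m]) * p[i] + (xs[i] - x) * p[i + 1]) / (xs[i] - xs[i + m])
    return p[0]

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(v) for v in xs]
print(neville(xs, ys, 0.6), math.sin(0.6))
```

For sin x on five points the error bound is |π(0.6)|·max|f⁽⁵⁾|/5! ≈ 1×10⁻⁵. The result agrees with sin 0.6 to roughly that level.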

Spline Functions Splines are functions that match given values at the points x1, . . . , xNT and have continuous derivatives up to some order at the knots, or the points x2, . . . , xNT − 1. Cubic splines are most common; see de Boor, C., A Practical Guide to Splines, Springer (1978). The function is represented by a cubic polynomial within each interval (xi, xi + 1) and has continuous first and second derivatives at the knots. Two more conditions can be specified arbitrarily. These are usually the second derivatives at the two end points, which are commonly taken as zero; this gives the natural cubic splines.

Take yi = y(xi) at each of the points xi, and let ∆xi = xi + 1 − xi. Then, in the interval (xi, xi + 1), the function is represented as a cubic polynomial.

$$C_i(x) = a_{0i} + a_{1i}\,x + a_{2i}\,x^2 + a_{3i}\,x^3$$

The interpolating function takes on specified values at the knots and has continuous first and second derivatives at the knots. Within the ith interval, the function is

$$C_i(x) = C_i(x_i) + C_i'(x_i)\,(x - x_i) + C_i''(x_i)\,\frac{(x - x_i)^2}{2} + [C_i''(x_{i+1}) - C_i''(x_i)]\,\frac{(x - x_i)^3}{6\,\Delta x_i}$$

where Ci(xi) = yi. The second derivative C″i(xi) = y″i is found by solving the following tridiagonal system of equations:

$$y''_{i-1}\,\Delta x_{i-1} + 2\,y''_i\,(\Delta x_{i-1} + \Delta x_i) + y''_{i+1}\,\Delta x_i = 6\left(\frac{y_{i+1} - y_i}{\Delta x_i} - \frac{y_i - y_{i-1}}{\Delta x_{i-1}}\right)$$

Since the continuity conditions apply only for i = 2, . . . , NT − 1, we have only NT − 2 conditions for the NT values of y″i. Two additional conditions are needed. These are usually taken as the value of the second derivative at each end of the domain, y″1 and y″NT. If these values are zero, we get the natural cubic splines. They can also be set to achieve some other purpose, such as making the first derivative match some desired condition at the two ends. With these values taken as zero in the natural cubic spline, we have a system of NT − 2 tridiagonal equations, which is easily solved. Once the second derivatives are known at each of the knots, the first derivatives are given by

$$y'_i = \frac{y_{i+1} - y_i}{\Delta x_i} - y''_i\,\frac{\Delta x_i}{3} - y''_{i+1}\,\frac{\Delta x_i}{6}$$

The function itself is then known within each element.

3-46 MATHEMATICS
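The following is a minimal Python sketch of the natural cubic spline, with illustrative function names. It solves the tridiagonal system with the standard Thomas elimination, takes y″ = 0 at both ends, and then evaluates Ci(x) using the first-derivative formula.

```python
import bisect

def natural_spline(xs, ys):
    """Second derivatives y''_i at the knots of the natural cubic spline."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]          # Δx_i
    ypp = [0.0] * n                                         # y'' = 0 at both ends
    if n < 3:
        return ypp
    # Rows for the interior knots i = 1, ..., n-2 (0-based); the end terms vanish.
    sub = [h[i - 1] for i in range(1, n - 1)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n - 1)]
    sup = [h[i] for i in range(1, n - 1)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n - 1)]
    m = n - 2
    for k in range(1, m):                                   # forward elimination
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    sol = [0.0] * m
    sol[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):                          # back substitution
        sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
    ypp[1:n - 1] = sol
    return ypp

def spline_eval(xs, ys, ypp, x):
    """Evaluate C_i(x) on the interval that contains x."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    dx = xs[i + 1] - xs[i]
    yp = (ys[i + 1] - ys[i]) / dx - ypp[i] * dx / 3.0 - ypp[i + 1] * dx / 6.0
    t = x - xs[i]
    return ys[i] + yp * t + ypp[i] * t**2 / 2.0 + (ypp[i + 1] - ypp[i]) * t**3 / (6.0 * dx)

xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
ypp = natural_spline(xs, ys)
print(ypp, spline_eval(xs, ys, ypp, 0.5))
```

For three knots the system has a single equation, 4y″2 = −12, which gives y″2 = −3. If the data lie on a straight line, every y″i is zero and the spline reduces to linear interpolation.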

NUMERICAL DIFFERENTIATION

Numerical differentiation should be avoided whenever possible, particularly when data are empirical and subject to appreciable observation errors. Errors in data can affect numerical derivatives quite strongly; i.e., differentiation is a roughening process. When such a calculation must be made, it is usually desirable first to smooth the data to a certain extent.

Use of Interpolation Formula If the data are given over equidistant values of the independent variable x, an interpolation formula such as the Newton formula [Eq. (3-76) or (3-77)] may be used and the resulting formula differentiated analytically. If the independent variable is not at equidistant values, then Lagrange’s formulas must be used. Differentiating three-point Lagrange interpolation formulas gives the following differentiation formulas for equally spaced tabular points:

Three-Point Formulas Let x0, x1, x2 be the three points, with spacing h = x1 − x0 = x2 − x1.

$$f'(x_0) = \frac{1}{2h}\,[-3f(x_0) + 4f(x_1) - f(x_2)] + \frac{h^2}{3}\,f'''(\varepsilon)$$

$$f'(x_1) = \frac{1}{2h}\,[-f(x_0) + f(x_2)] - \frac{h^2}{6}\,f'''(\varepsilon)$$

$$f'(x_2) = \frac{1}{2h}\,[f(x_0) - 4f(x_1) + 3f(x_2)] + \frac{h^2}{3}\,f'''(\varepsilon)$$

where the last term is an error term, with min_j xj < ε < max_j xj.
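The following is a minimal Python sketch of the three-point formulas, with the error terms dropped. The function name is illustrative. The check uses f = eˣ at x1 = 0: because the central formula's error is of order h², halving h should reduce the error by about a factor of 4.

```python
import math

def three_point(f0, f1, f2, h):
    """Three-point estimates of f'(x0), f'(x1), f'(x2) for spacing h."""
    d0 = (-3.0 * f0 + 4.0 * f1 - f2) / (2.0 * h)
    d1 = (-f0 + f2) / (2.0 * h)
    d2 = (f0 - 4.0 * f1 + 3.0 * f2) / (2.0 * h)
    return d0, d1, d2

# Error of the central-point formula for f = exp at x1 = 0 (true derivative 1)
err = [abs(three_point(math.exp(-h), 1.0, math.exp(h), h)[1] - 1.0) for h in (0.1, 0.05)]
print(err, err[0] / err[1])
```

For a quadratic the third derivative vanishes, so all three formulas are exact. For example, x² sampled at 1, 1.5, and 2 gives the slopes 2, 3, and 4.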

Smoothing Techniques These techniques approximate the tabular data by a least-squares fit using some known functional form, usually a polynomial (for the concept of least squares see “Statistics”). Instead of approximating f(x) by a single least-squares polynomial of degree n over the entire range of the tabulation, it is often desirable to replace each tabulated value by the value of a least-squares polynomial of degree n fitted to a subrange of 2M + 1 points. That subrange is centered, when possible, at the point whose entry is to be modified. Thus each smoothed value replaces a tabulated value. Let fj = f(xj) be the tabular points and yj the smoothed values.

First-Degree Least Squares with Three Points

$$y_0 = \tfrac{1}{6}\,[5f_0 + 2f_1 - f_2]$$

$$y_1 = \tfrac{1}{3}\,[f_0 + f_1 + f_2]$$

$$y_2 = \tfrac{1}{6}\,[-f_0 + 2f_1 + 5f_2]$$

The derivatives at all the points are

$$f'_0 = f'_1 = f'_2 = \frac{1}{2h}\,[y_2 - y_0]$$

Second-Degree Least Squares with Five Points For five evenly spaced points x−2, x−1, x0, x1, and x2 (separated by distance h) and their ordinates f−2, f−1, f0, f1, and f2, assume a parabola is fit by least squares. Then the derivative at the center point is

$$f'_0 = \frac{1}{10h}\,[-2f_{-2} - f_{-1} + f_1 + 2f_2]$$

The derivatives at the other points are

$$f'_{-2} = \frac{1}{70h}\,[-54f_{-2} + 13f_{-1} + 40f_0 + 27f_1 - 26f_2]$$

$$f'_{-1} = \frac{1}{70h}\,[-34f_{-2} + 3f_{-1} + 20f_0 + 17f_1 - 6f_2]$$

$$f'_1 = \frac{1}{70h}\,[6f_{-2} - 17f_{-1} - 20f_0 - 3f_1 + 34f_2]$$

$$f'_2 = \frac{1}{70h}\,[26f_{-2} - 27f_{-1} - 40f_0 - 13f_1 + 54f_2]$$
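The following is a minimal Python sketch of the three-point smoothing formulas and the five-point least-squares derivatives. The function names are illustrative. If the data already lie on a parabola, the least-squares parabola is that parabola, so its derivatives must be reproduced exactly.

```python
def smooth3_linear(f0, f1, f2):
    """First-degree least-squares smoothing of three equally spaced values."""
    return ((5*f0 + 2*f1 - f2) / 6.0, (f0 + f1 + f2) / 3.0, (-f0 + 2*f1 + 5*f2) / 6.0)

def ls5_derivatives(fm2, fm1, f0, f1, f2, h):
    """Derivatives at x-2, ..., x2 of the least-squares parabola (spacing h)."""
    return (
        (-54*fm2 + 13*fm1 + 40*f0 + 27*f1 - 26*f2) / (70.0 * h),
        (-34*fm2 + 3*fm1 + 20*f0 + 17*f1 - 6*f2) / (70.0 * h),
        (-2*fm2 - fm1 + f1 + 2*f2) / (10.0 * h),
        (6*fm2 - 17*fm1 - 20*f0 - 3*f1 + 34*f2) / (70.0 * h),
        (26*fm2 - 27*fm1 - 40*f0 - 13*f1 + 54*f2) / (70.0 * h),
    )

# Values of f = 3x^2 - x + 2 at x = -1, -0.5, 0, 0.5, 1 (h = 0.5)
vals = [3*x*x - x + 2 for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print(ls5_derivatives(*vals, 0.5))   # slopes 6x - 1 are -7, -4, -1, 2, 5
```

Smoothing data that are already collinear leaves them unchanged. In that case (y2 − y0)/(2h) is the slope of the line.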

Numerical Derivatives The results given above can be used to obtain numerical derivatives when solving problems on the computer, in particular for the Newton-Raphson method and homotopy methods. Suppose one has a program, subroutine, or other function evaluation device that will calculate f given x. One can estimate the value of the first derivative at x0 using

$$\left.\frac{df}{dx}\right|_{x_0} \approx \frac{f[x_0(1 + \varepsilon)] - f[x_0]}{\varepsilon\,x_0}$$

(a first-order formula) or

$$\left.\frac{df}{dx}\right|_{x_0} \approx \frac{f[x_0(1 + \varepsilon)] - f[x_0(1 - \varepsilon)]}{2\,\varepsilon\,x_0}$$

(a second-order formula). The value of ε is important; a value of 10⁻⁶ is typical, but smaller or larger values may be necessary depending on the computer precision and the application. One must also be sure that the value of x0 is not zero and use a different increment in that case.
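The following is a minimal Python sketch of the two relative-increment formulas, with illustrative function names. The text only says to use "a different increment" at x0 = 0. The absolute increment ε used in that case here is an assumption, as is the use of |x0| to fix the sign of the step.

```python
import math

def _increment(x0, eps):
    # Relative increment eps*|x0| as in the text; the absolute fallback eps
    # at x0 = 0 is an assumption for illustration.
    return eps * abs(x0) if x0 != 0.0 else eps

def dfdx_first_order(f, x0, eps=1e-6):
    dx = _increment(x0, eps)
    return (f(x0 + dx) - f(x0)) / dx

def dfdx_second_order(f, x0, eps=1e-6):
    dx = _increment(x0, eps)
    return (f(x0 + dx) - f(x0 - dx)) / (2.0 * dx)

print(dfdx_first_order(math.exp, 1.0) - math.e, dfdx_second_order(math.exp, 1.0) - math.e)
```

With ε = 10⁻⁶, the first-order estimate of d(eˣ)/dx at x = 1 is in error by about 10⁻⁶. The second-order estimate is limited mainly by round-off, about 10⁻¹⁰, which illustrates why the choice of ε depends on machine precision.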

NUMERICAL INTEGRATION (QUADRATURE)

A multitude of formulas have been developed to accomplish numerical integration, which consists of computing the value of a definite integral from a set of numerical values of the integrand.

Newton-Cotes Integration Formulas (Equally Spaced Ordinates) for Functions of One Variable The definite integral $\int_a^b f(x)\,dx$ is to be evaluated.

Trapezoidal Rule This formula consists of subdividing the interval a ≤ x ≤ b into n subintervals a to a + h, a + h to a + 2h, . . . and replacing the graph of f(x) by the result of joining the ends of adjacent ordinates by line segments. If fj = f(xj) = f(a + jh), f0 = f(a), fn = f(b), the integration formula is

$$\int_a^b f(x)\,dx = \frac{h}{2}\,[f_0 + 2f_1 + 2f_2 + \cdots + 2f_{n-1} + f_n] + E_n$$

where

$$|E_n| = \frac{n h^3}{12}\,|f''(\varepsilon)| = \frac{(b - a)^3}{12 n^2}\,|f''(\varepsilon)|, \qquad a < \varepsilon < b$$

This procedure is not of high accuracy. However, if f″(x) is continuous in a < x < b, the error goes to zero as 1/n², n → ∞.
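The following is a minimal Python sketch of the composite trapezoidal rule, with an illustrative function name. It is checked on the integral of Eq. (3-78) below, whose closed form (by integration by parts) is 1/2 − e⁻¹(sin 1 + cos 1)/2. Doubling n should reduce the error by about a factor of 4, consistent with the 1/n² behavior.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals of width h = (b - a)/n."""
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))
    return h / 2.0 * (f(a) + 2.0 * interior + f(b))

f = lambda x: math.exp(-x) * math.sin(x)
exact = 0.5 - math.exp(-1.0) * (math.sin(1.0) + math.cos(1.0)) / 2.0
errors = [abs(trapezoid(f, 0.0, 1.0, n) - exact) for n in (10, 20)]
print(errors, errors[0] / errors[1])    # ratio near 4, since the error is ~1/n^2
```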

Parabolic Rule (Simpson’s Rule) This procedure consists of subdividing the interval a < x < b into n/2 subintervals, each of length 2h, where n is an even integer. By using the notation as above the integration formula is

$$\int_a^b f(x)\,dx = \frac{h}{3}\,[f_0 + 4f_1 + 2f_2 + 4f_3 + \cdots + 4f_{n-3} + 2f_{n-2} + 4f_{n-1} + f_n] + E_n$$

where

$$|E_n| = \frac{n h^5}{180}\,|f^{(IV)}(\varepsilon)| = \frac{(b - a)^5}{180 n^4}\,|f^{(IV)}(\varepsilon)|, \qquad a < \varepsilon < b$$

This method approximates f(x) by a parabola on each subinterval. This rule is generally more accurate than the trapezoidal rule. It is the most widely used integration formula.
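The following is a minimal Python sketch of composite Simpson's rule, with an illustrative function name. The coefficients alternate 4, 2, 4, . . . on the interior ordinates. The rule is exact for cubics. For Eq. (3-78) with n = 10, the error bound (b − a)⁵ max|f⁽ⁱᵛ⁾|/(180n⁴) is below 10⁻⁶.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule; n (number of intervals of width h) must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4.0 if j % 2 else 2.0) * f(a + j * h)   # odd ordinates x4, even x2
    return h / 3.0 * s

f = lambda x: math.exp(-x) * math.sin(x)
exact = 0.5 - math.exp(-1.0) * (math.sin(1.0) + math.cos(1.0)) / 2.0
print(simpson(f, 0.0, 1.0, 10) - exact)
```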

Gaussian Quadrature Gaussian quadrature provides a highly accurate formula based on irregularly spaced points, but the integral needs to be transformed onto the interval 0 to 1.

$$x = a + (b - a)\,u, \qquad dx = (b - a)\,du$$

$$\int_a^b f(x)\,dx = (b - a)\int_0^1 f(a + (b - a)\,u)\,du$$

$$\int_0^1 f(u)\,du = \sum_{i=1}^{m} W_i\,f(u_i)$$

The quadrature is exact when f is a polynomial of degree 2m − 1 in x. Because there are m weights and m Gauss points, we have 2m parameters that are chosen to exactly represent a polynomial of degree 2m − 1, which has 2m parameters. The Gauss points and weights are given in the table.

Gaussian Quadrature Points and Weights

m    ui               Wi
2    0.21132 48654    0.50000 00000
     0.78867 51346    0.50000 00000
3    0.11270 16654    0.27777 77778
     0.50000 00000    0.44444 44445
     0.88729 83346    0.27777 77778
4    0.06943 18442    0.17392 74226
     0.33000 94783    0.32607 25774
     0.66999 05218    0.32607 25774
     0.93056 81558    0.17392 74226
5    0.04691 00771    0.11846 34425
     0.23076 53450    0.23931 43353
     0.50000 00000    0.28444 44444
     0.76923 46551    0.23931 43353
     0.95308 99230    0.11846 34425

Example Calculate the value of the following integral.

$$I = \int_0^1 e^{-x}\sin x\,dx \qquad \text{(3-78)}$$

Using the Gaussian quadrature formulas gives the following values for various values of m. Clearly, three internal points, requiring evaluation of the integrand at only three points, give excellent results.

m    I
2    0.24609643062368
3    0.24583487736505
4    0.24583700444293
5    0.24583700700700
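The following is a minimal Python sketch that applies the tabulated points and weights, with an illustrative function name. The integration interval is mapped onto 0 to 1 as in the transformation above, and the loop reproduces the example's results for m = 2 to 5.

```python
import math

# Points u_i and weights W_i on [0, 1], copied from the table (10 digits).
GAUSS_01 = {
    2: ([0.2113248654, 0.7886751346], [0.5, 0.5]),
    3: ([0.1127016654, 0.5, 0.8872983346], [0.2777777778, 0.4444444445, 0.2777777778]),
    4: ([0.0694318442, 0.3300094783, 0.6699905218, 0.9305681558],
        [0.1739274226, 0.3260725774, 0.3260725774, 0.1739274226]),
    5: ([0.0469100771, 0.2307653450, 0.5, 0.7692346551, 0.9530899230],
        [0.1184634425, 0.2393143353, 0.2844444444, 0.2393143353, 0.1184634425]),
}

def gauss(f, a, b, m):
    """(b - a) * sum of W_i f(a + (b - a) u_i)."""
    u, w = GAUSS_01[m]
    return (b - a) * sum(wi * f(a + (b - a) * ui) for ui, wi in zip(u, w))

f = lambda x: math.exp(-x) * math.sin(x)
for m in (2, 3, 4, 5):
    print(m, gauss(f, 0.0, 1.0, m))
```

The three-point rule integrates x⁵ exactly, to within the ten-digit rounding of the tabulated values, as the degree 2m − 1 property requires.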

Romberg’s Method Romberg’s method uses extrapolation techniques to improve the answer [Press et al. (1986)]. Let I1 be the value of the integral obtained using interval size h = ∆x, I2 the value of I obtained when using interval size h/2, and I0 the true value of I. Then the error in a method is approximately proportional to h^m, or

$$I_1 \approx I_0 + c\,h^m$$

$$I_2 \approx I_0 + c\left(\frac{h}{2}\right)^m$$

Replacing the ≈ by an equality (an approximation) and solving for c and I0 gives

$$I_0 = \frac{2^m I_2 - I_1}{2^m - 1}$$

This process can also be used to obtain I1, I2, . . . , by halving h each time, and then calculating new estimates from each pair, calling them J1, J2, . . . ; that is, in the formula above, replace I0 with J1. The formulas are reapplied for each pair of J to obtain K1, K2, . . . The process continues until the required tolerance is obtained.

I1    I2    I3    I4
   J1    J2    J3
      K1    K2
         L1

Romberg’s method is most useful for a low-order method (small m) because significant improvement is then possible.

Example Evaluate the same integral (3-78) by using the trapezoid rule and then apply the Romberg method. Use 11, 21, 41, and 81 points with m = 2. To achieve six-digit accuracy, any result from J2 through L1 is suitable, even though the base results (I1 through I4) are not that accurate.

I1 = 0.24491148225216    I2 = 0.24560560017077    I3 = 0.24577915369183    I4 = 0.24582254357310
J1 = 0.24583697281030    J2 = 0.24583700486552    J3 = 0.24583700686685
K1 = 0.24583701555059    K2 = 0.24583700753396
L1 = 0.24583700486175
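The following is a minimal Python sketch of the Romberg table built from trapezoid results, with illustrative function names. By default it keeps the same m at every level, as the example does. The optional `raise_order=True` switches to the classical Romberg table, which uses 2^m = 4, 16, 64, . . . at successive levels because the trapezoid error expansion contains only even powers of h.

```python
import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h / 2.0 * (f(a) + 2.0 * sum(f(a + j * h) for j in range(1, n)) + f(b))

def romberg_table(f, a, b, n0=10, levels=4, m=2, raise_order=False):
    """Rows I, J, K, L, ... from trapezoid results with n0, 2*n0, 4*n0, ... intervals.
    Each new row combines adjacent pairs as (2**m * I2 - I1) / (2**m - 1)."""
    row = [trapezoid(f, a, b, n0 * 2**k) for k in range(levels)]
    table = [row]
    while len(row) > 1:
        c = 2**m
        row = [(c * row[k + 1] - row[k]) / (c - 1) for k in range(len(row) - 1)]
        table.append(row)
        if raise_order:
            m += 2                      # classical Romberg: next level removes h^(m+2)
    return table

f = lambda x: math.exp(-x) * math.sin(x)
for r in romberg_table(f, 0.0, 1.0):    # 11, 21, 41, 81 points, m = 2 throughout
    print(r)
```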

Computer Methods These methods are easily programmed in a spreadsheet program such as Microsoft Excel. In MATLAB, the trapezoid rule can be calculated by using the command trapz(x,y), where x is a vector of x values xi and y is a vector of values y(xi). Alternatively, use the commands

F = @(x) exp(-x).*sin(x)
Q = quad(F,0,1)


Singularities When the integrand has singularities, a variety of techniques can be tried. The integral may be divided into one part that can be integrated analytically near the singularity and another part that is integrated numerically. Sometimes a change of argument allows analytical integration. Series expansion might be helpful, too. When the domain is infinite, it is possible to use Gauss-Laguerre (semi-infinite domain) or Gauss-Hermite (infinite domain) quadrature. Also a transformation can be made. For example, let u = 1/x and then

$$\int_a^b f(x)\,dx = \int_{1/b}^{1/a} \frac{1}{u^2}\,f\!\left(\frac{1}{u}\right)du, \qquad ab > 0$$
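The following is a minimal Python sketch of the u = 1/x substitution, with illustrative function names. The transformed integrand is evaluated with a composite three-point Gauss-Legendre rule. That choice is ours, made because its points are interior, so the integrand is never evaluated at u = 0 when b is infinite. The test integral is ∫₁^∞ dx/(1 + x²) = π/4.

```python
import math

def gauss3_composite(g, lo, hi, panels=10):
    """Composite 3-point Gauss-Legendre rule; g is never evaluated at lo or hi."""
    r = math.sqrt(3.0 / 5.0)
    nodes = [(1.0 - r) / 2.0, 0.5, (1.0 + r) / 2.0]    # the m = 3 points on [0, 1]
    weights = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]
    h = (hi - lo) / panels
    total = 0.0
    for k in range(panels):
        left = lo + k * h
        total += h * sum(w * g(left + h * u) for u, w in zip(nodes, weights))
    return total

def integrate_inverted(f, a, b=math.inf, panels=10):
    """Integral of f from a to b via u = 1/x (requires a*b > 0; a > 0 if b is infinite)."""
    lo = 0.0 if math.isinf(b) else 1.0 / b
    hi = 1.0 / a
    return gauss3_composite(lambda u: f(1.0 / u) / u**2, lo, hi, panels)

print(integrate_inverted(lambda x: 1.0 / (1.0 + x * x), 1.0), math.pi / 4)
```

After the substitution the integrand 1/(1 + u²) is smooth on [0, 1], so a modest number of panels gives about nine correct digits.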

Two-Dimensional Formula Two-dimensional integrals can be calculated by breaking down the integral into one-dimensional integrals.

$$\int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx = \int_a^b G(x)\,dx$$

$$G(x) = \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy$$

Gaussian quadrature can also be used in two dimensions, provided the integration is on a square or can be transformed to one. (Domain transformations might be used to convert the domain to a square.)

$$\int_0^1 \int_0^1 f(x, y)\,dx\,dy = \sum_{i=1}^{m_x} W_{x_i} \sum_{j=1}^{m_y} W_{y_j}\,f(x_i, y_j)$$
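The following is a minimal Python sketch of both approaches, with illustrative function names and the three-point Gauss rule (mx = my = 3) in each direction. The first applies the product formula on the unit square. The second computes G(x) by inner Gauss quadrature between g1(x) and g2(x) and then integrates G(x).

```python
import math

R = math.sqrt(3.0 / 5.0)
NODES = [(1.0 - R) / 2.0, 0.5, (1.0 + R) / 2.0]     # 3-point Gauss points on [0, 1]
WEIGHTS = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

def gauss_square(f):
    """Product rule: sum_i W_xi sum_j W_yj f(x_i, y_j) on the unit square."""
    return sum(wx * wy * f(x, y)
               for x, wx in zip(NODES, WEIGHTS)
               for y, wy in zip(NODES, WEIGHTS))

def gauss_1d(g, lo, hi):
    return (hi - lo) * sum(w * g(lo + (hi - lo) * u) for u, w in zip(NODES, WEIGHTS))

def iterated(f, a, b, g1, g2):
    """Integral of G(x) from a to b, where G(x) is the integral of f(x, y) dy from g1(x) to g2(x)."""
    G = lambda x: gauss_1d(lambda y: f(x, y), g1(x), g2(x))
    return gauss_1d(G, a, b)

print(gauss_square(lambda x, y: x**2 * y**2))                            # 1/9
print(iterated(lambda x, y: x + y, 0.0, 1.0, lambda x: 0.0, lambda x: x))  # 1/2
```

Both test integrands are low-degree polynomials in each variable, so the three-point rule integrates them exactly.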

NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS AS INITIAL VALUE PROBLEMS

A differential equation for a function that depends on only one variable, often time, is called an ordinary differential equation. The general solution to the differential equation includes many possibilities; the boundary or initial conditions are needed to specify which of those are desired. If all conditions are at one point, then the problem is an initial value problem and can be integrated from that point on. If some of the conditions are available at one point and others at another point, then the ordinary differential equations become two-point boundary value problems, which are treated in the next section. Initial value problems as ordinary differential equations arise in control of lumped parameter models, transient models of stirred tank reactors, and in all models where there are no spatial gradients in the unknowns.

A higher-order differential equation

$$y^{(n)} + F(y^{(n-1)}, y^{(n-2)}, \ldots, y', y) = 0$$

with initial conditions

$$G_i(y^{(n-1)}(0), y^{(n-2)}(0), \ldots, y'(0), y(0)) = 0, \qquad i = 1, \ldots, n$$

can be converted into a set of first-order equations using

$$y_i \equiv y^{(i-1)} = \frac{d^{(i-1)}y}{dt^{(i-1)}} = \frac{d}{dt}\,y^{(i-2)} = \frac{dy_{i-1}}{dt}$$

The higher-order equation can be written as a set of first-order equations.

$$\frac{dy_1}{dt} = y_2$$

$$\frac{dy_2}{dt} = y_3$$

$$\frac{dy_3}{dt} = y_4$$

$$\cdots$$

$$\frac{dy_n}{dt} = -F(y_n, y_{n-1}, \ldots, y_2, y_1)$$

The initial conditions would have to be specified for variables y1(0), . . . , yn(0), or equivalently y(0), . . . , y^(n − 1)(0). The set of equations is then written as

$$\frac{d\mathbf{y}}{dt} = \mathbf{f}(\mathbf{y}, t)$$

All the methods in this section are described for a single equation; the methods apply to multiple equations.
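The following is a minimal Python sketch of the conversion, with an illustrative function name. Given F, it returns the right-hand side of the first-order system for Y = [y1, . . . , yn] = [y, y′, . . . , y^(n−1)]. The last component calls F with its arguments in the order yn, . . . , y1.

```python
def first_order_system(F):
    """For y^(n) + F(y^(n-1), ..., y', y) = 0, return rhs(t, Y) with
    Y = [y1, ..., yn] = [y, y', ..., y^(n-1)] and dY/dt = rhs(t, Y)."""
    def rhs(t, Y):
        dY = list(Y[1:])                  # dy_i/dt = y_{i+1}, i = 1, ..., n-1
        dY.append(-F(*reversed(Y)))       # dy_n/dt = -F(y_n, ..., y_2, y_1)
        return dY
    return rhs

# y'' + y = 0, i.e., F(y', y) = y
rhs = first_order_system(lambda yp, y: y)
print(rhs(0.0, [1.0, 0.0]))               # [dy1/dt, dy2/dt] at y = 1, y' = 0
```

The returned function has the form f(t, y) that the integration methods below expect, so the same code handles any number of equations.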

Euler’s method is first-order.

$$y_{n+1} = y_n + \Delta t\,f(y_n)$$

and errors are proportional to ∆t. The second-order Adams-Bashforth method is

$$y_{n+1} = y_n + \frac{\Delta t}{2}\,[3f(y_n) - f(y_{n-1})]$$

Errors are proportional to ∆t², and higher-order methods are available. Notice that the higher-order explicit methods require knowing the solution (or the right-hand side) evaluated at times in the past. Since these were calculated to get to the current time, this presents no problem except for starting the problem. Then it may be necessary to use Euler’s method with a very small step size for several steps in order to generate starting values at a succession of time points. The error terms, order of the method, function evaluations per step, and stability limitations are listed in Finlayson (2003). The advantage of the high-order Adams-Bashforth method is that it uses only one function evaluation per step yet achieves high-order accuracy. The disadvantage is the necessity of using another method to start.
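The following is a minimal Python sketch of both methods on dy/dt = −y, whose exact solution is e⁻ᵗ. The function names are illustrative. The Adams-Bashforth start uses several small Euler steps, as the text suggests; the count of ten is an assumption. Halving ∆t should roughly halve the Euler error and quarter the Adams-Bashforth error.

```python
import math

def euler(f, y0, dt, nsteps):
    y = y0
    for _ in range(nsteps):
        y = y + dt * f(y)                      # y_{n+1} = y_n + Δt f(y_n)
    return y

def adams_bashforth2(f, y0, dt, nsteps, substeps=10):
    """Second-order Adams-Bashforth; the starting value at t = Δt comes from
    `substeps` small Euler steps (substep count is an illustrative choice)."""
    f_prev = f(y0)
    y = euler(f, y0, dt / substeps, substeps)
    for _ in range(nsteps - 1):
        f_curr = f(y)
        y = y + 0.5 * dt * (3.0 * f_curr - f_prev)
        f_prev = f_curr
    return y

f = lambda y: -y
exact = math.exp(-1.0)
err_e = [abs(euler(f, 1.0, dt, round(1 / dt)) - exact) for dt in (0.01, 0.005)]
err_ab = [abs(adams_bashforth2(f, 1.0, dt, round(1 / dt)) - exact) for dt in (0.01, 0.005)]
print(err_e[0] / err_e[1], err_ab[0] / err_ab[1])   # about 2 and about 4
```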

Runge-Kutta methods are explicit methods that use several function evaluations for each time step. Runge-Kutta methods are traditionally written for f(t, y). The first-order Runge-Kutta method is Euler’s method. A second-order Runge-Kutta method is

$$y_{n+1} = y_n + \frac{\Delta t}{2}\,[f_n + f(t_n + \Delta t,\; y_n + \Delta t\,f_n)]$$

while the midpoint scheme is also a second-order Runge-Kutta method.

$$y_{n+1} = y_n + \Delta t\,f\!\left(t_n + \frac{\Delta t}{2},\; y_n + \frac{\Delta t}{2}\,f_n\right)$$

A popular fourth-order Runge-Kutta method is given by the Runge-Kutta-Fehlberg formulas. These pair a fourth-order formula with a fifth-order formula that uses the same function evaluations. The popular integration package RKF45 is based on this method.

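$$k_1 = \Delta t\,f(t_n, y_n)$$

$$k_2 = \Delta t\,f\!\left(t_n + \frac{\Delta t}{4},\; y_n + \frac{k_1}{4}\right)$$

$$k_3 = \Delta t\,f\!\left(t_n + \frac{3}{8}\Delta t,\; y_n + \frac{3}{32}k_1 + \frac{9}{32}k_2\right)$$

$$k_4 = \Delta t\,f\!\left(t_n + \frac{12}{13}\Delta t,\; y_n + \frac{1932}{2197}k_1 - \frac{7200}{2197}k_2 + \frac{7296}{2197}k_3\right)$$

$$k_5 = \Delta t\,f\!\left(t_n + \Delta t,\; y_n + \frac{439}{216}k_1 - 8k_2 + \frac{3680}{513}k_3 - \frac{845}{4104}k_4\right)$$

$$k_6 = \Delta t\,f\!\left(t_n + \frac{\Delta t}{2},\; y_n - \frac{8}{27}k_1 + 2k_2 - \frac{3544}{2565}k_3 + \frac{1859}{4104}k_4 - \frac{11}{40}k_5\right)$$

$$y_{n+1} = y_n + \frac{25}{216}k_1 + \frac{1408}{2565}k_3 + \frac{2197}{4104}k_4 - \frac{1}{5}k_5$$

$$z_{n+1} = y_n + \frac{16}{135}k_1 + \frac{6656}{12825}k_3 + \frac{28561}{56430}k_4 - \frac{9}{50}k_5 + \frac{2}{55}k_6$$

The value of y_{n+1} − z_{n+1} is an estimate of the error in y_{n+1} and can be used in step-size control schemes.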
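The following is a minimal Python sketch of one Runge-Kutta-Fehlberg step using the coefficients above. The function name is illustrative, and this is not the RKF45 package itself. The sketch uses a fixed step and omits the step-size control that a package would add.

```python
import math

def rkf45_step(f, t, y, dt):
    """One Runge-Kutta-Fehlberg step; returns (y_{n+1}, z_{n+1}, y_{n+1} - z_{n+1})."""
    k1 = dt * f(t, y)
    k2 = dt * f(t + dt / 4, y + k1 / 4)
    k3 = dt * f(t + 3 * dt / 8, y + 3 * k1 / 32 + 9 * k2 / 32)
    k4 = dt * f(t + 12 * dt / 13, y + 1932 * k1 / 2197 - 7200 * k2 / 2197 + 7296 * k3 / 2197)
    k5 = dt * f(t + dt, y + 439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513 - 845 * k4 / 4104)
    k6 = dt * f(t + dt / 2, y - 8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                + 1859 * k4 / 4104 - 11 * k5 / 40)
    y_next = y + 25 * k1 / 216 + 1408 * k3 / 2565 + 2197 * k4 / 4104 - k5 / 5
    z_next = (y + 16 * k1 / 135 + 6656 * k3 / 12825 + 28561 * k4 / 56430
              - 9 * k5 / 50 + 2 * k6 / 55)
    return y_next, z_next, y_next - z_next

f = lambda t, y: -y
t, y = 0.0, 1.0
for _ in range(10):                            # fixed Δt = 0.1, no step-size control
    y, z, err = rkf45_step(f, t, y, 0.1)
    t += 0.1
print(y - math.exp(-1.0), err)
```

Each step costs six evaluations of f, k1 through k6. In return, the error estimate y_{n+1} − z_{n+1} comes at no extra cost.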

Usually one would use a high-order method to achieve high accuracy. The Runge-Kutta-Fehlberg method is popular because it is high order and does not require a starting method (as does an Adams-Bashforth method). However, it does require six function evaluations per time step, or six times as many as a fourth-order Adams-Bashforth method. For problems in which the function evaluations are a significant portion of the calculation time, this might be important. Given the speed and availability of desktop computers, the efficiency of the methods is most important only for very large problems that are going to be solved many times. For other problems, the most important criterion for choosing a method is probably the time the user spends setting up the problem.

The stability limits for the explicit methods are based on the largest eigenvalue of the linearized system of equations

$$\frac{dy_i}{dt} = \sum_{j=1}^{n} A_{ij}\,y_j, \qquad A_{ij} = \frac{\partial f_i}{\partial y_j}$$

For linear problems, the eigenvalues do not change, so the stability and oscillation limits must be satisfied for every eigenvalue of the matrix A. When solving nonlinear problems, the equations are linearized about the solution at the local time. The analysis then applies for small changes in time, after which a new analysis about the new solution must be made. Thus, for nonlinear problems, the eigenvalues keep changing, and the largest stable time step changes, too. The stability limits are:

Euler method, |λ| ∆t ≤ 2
Runge-Kutta, 2nd order, |λ| ∆t < 2
Runge-Kutta-Fehlberg, |λ| ∆t < 3.0

Richardson extrapolation can be used to improve the accuracy of a method. Suppose we step forward one step ∆t with a pth-order method. Then redo the problem, this time stepping forward from the same initial point, but in two steps of length ∆t/2, thus ending at the same point. Call the solution of the one-step calculation y1 and the solution of the two-step calculation y2. Then an improved solution at the new time is given by

$$y = \frac{2^p y_2 - y_1}{2^p - 1}$$

This gives a good estimate provided ∆t is small enough that the method is truly convergent with order p. This process can also be repeated in the same way Romberg’s method was used for quadrature.
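The following is a minimal Python sketch of Richardson extrapolation for one step, with illustrative function names. The one-step method is passed in as a function and applied with Euler's method (p = 1) to dy/dt = −y. The one-step result is 0.9 and the two-half-step result is 0.9025, so the extrapolated value is 2(0.9025) − 0.9 = 0.905, compared with e^−0.1 ≈ 0.90484.

```python
import math

def euler_step(f, t, y, dt):
    return y + dt * f(t, y)

def richardson(step, f, t, y, dt, p):
    """Combine one step of size dt (y1) with two steps of size dt/2 (y2)
    taken from the same starting point, for a method of order p."""
    y1 = step(f, t, y, dt)
    y2 = step(f, t + dt / 2, step(f, t, y, dt / 2), dt / 2)
    return (2**p * y2 - y1) / (2**p - 1)

f = lambda t, y: -y
improved = richardson(euler_step, f, 0.0, 1.0, 0.1, p=1)
print(improved, math.exp(-0.1))
```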

The error term in the various methods can be used to deduce a step size that will give a user-specified accuracy. Most packages today are based on a user-specified tolerance; the step size is changed during the calculation to achieve that accuracy. The accuracy itself is not guaranteed, but it improves as the tolerance is decreased.

Implicit Methods By using different interpolation formulas involving y_{n+1}, it is possible to derive implicit integration methods. Implicit methods result in a nonlinear equation to be solved for y_{n+1}, so iterative methods must be used. The backward Euler method is a first-order method.

$$y_{n+1} = y_n + \Delta t\,f(y_{n+1})$$

Errors are proportional to ∆t for small ∆t. The trapezoid rule is a second-order method.

$$y_{n+1} = y_n + \frac{\Delta t}{2}\,[f(y_n) + f(y_{n+1})]$$

Errors are proportional to ∆t² for small ∆t. When the trapezoid rule is used with the finite difference method for solving partial differential equations, it is called the Crank-Nicolson method. The implicit methods are stable for any step size but do require the solution of a set of nonlinear equations, which must be solved iteratively. The set of equations can be solved using the successive substitution method or Newton-Raphson method. See Bogacki, M. B., K. Alejski, and J. Szymanewski, Comp. Chem. Eng. 13:1081–1085 (1989) for an application to dynamic distillation problems.
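The following is a minimal Python sketch of both implicit methods for a single equation, with illustrative function names. Each step is solved with Newton-Raphson, using an explicit Euler value as the initial guess. The parameter θ = 1 gives backward Euler and θ = 0.5 gives the trapezoid rule.

On dy/dt = −1000y with ∆t = 0.1, so that |λ|∆t = 100, explicit Euler would multiply y by −99 each step. Backward Euler multiplies it by 1/101. The trapezoid rule multiplies it by −49/51, which stays bounded but oscillates.

```python
def implicit_step(f, dfdy, t, y, dt, theta=1.0, tol=1e-12, maxit=20):
    """theta = 1: backward Euler; theta = 0.5: trapezoid rule. Newton-Raphson on
    g(Y) = Y - y - dt*[theta*f(t+dt, Y) + (1-theta)*f(t, y)] = 0."""
    fn = f(t, y)
    Y = y + dt * fn                                   # explicit Euler initial guess
    for _ in range(maxit):
        g = Y - y - dt * (theta * f(t + dt, Y) + (1.0 - theta) * fn)
        delta = g / (1.0 - dt * theta * dfdy(t + dt, Y))
        Y -= delta
        if abs(delta) < tol:
            return Y
    raise RuntimeError("Newton-Raphson did not converge; reduce the time step")

def integrate(f, dfdy, y0, t_end, nsteps, theta):
    t, y, dt = 0.0, y0, t_end / nsteps
    for _ in range(nsteps):
        y = implicit_step(f, dfdy, t, y, dt, theta)
        t += dt
    return y

stiff, dstiff = (lambda t, y: -1000.0 * y), (lambda t, y: -1000.0)
print(integrate(stiff, dstiff, 1.0, 1.0, 10, 1.0))     # decays like (1/101)^10
```

On the nonlinear problem dy/dt = −y², whose exact solution is 1/(1 + t), the trapezoid rule is noticeably more accurate than backward Euler at the same step size.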

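The best packages for stiff equations (see below) use Gear’s backward difference formulas. The formulas of various orders are [Gear, C. W., Numerical Initial Value Problems in Ordinary Differential Equations, Prentice-Hall, Englewood Cliffs, N.J. (1971)]

$$\text{(1)}\quad y_{n+1} = y_n + \Delta t\,f(y_{n+1})$$

$$\text{(2)}\quad y_{n+1} = \frac{4}{3}y_n - \frac{1}{3}y_{n-1} + \frac{2}{3}\Delta t\,f(y_{n+1})$$

$$\text{(3)}\quad y_{n+1} = \frac{18}{11}y_n - \frac{9}{11}y_{n-1} + \frac{2}{11}y_{n-2} + \frac{6}{11}\Delta t\,f(y_{n+1})$$

$$\text{(4)}\quad y_{n+1} = \frac{48}{25}y_n - \frac{36}{25}y_{n-1} + \frac{16}{25}y_{n-2} - \frac{3}{25}y_{n-3} + \frac{12}{25}\Delta t\,f(y_{n+1})$$

$$\text{(5)}\quad y_{n+1} = \frac{300}{137}y_n - \frac{300}{137}y_{n-1} + \frac{200}{137}y_{n-2} - \frac{75}{137}y_{n-3} + \frac{12}{137}y_{n-4} + \frac{60}{137}\Delta t\,f(y_{n+1})$$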
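The following is a minimal Python sketch of Gear's second-order formula (2) for an autonomous scalar equation, with illustrative function names. Each implicit equation is solved by Newton-Raphson. The second starting value comes from formula (1), backward Euler; this is our start-up choice, since the text does not prescribe one.

```python
def bdf2(f, dfdy, y0, dt, nsteps, tol=1e-12, maxit=20):
    """Gear's formula (2): y_{n+1} = 4/3 y_n - 1/3 y_{n-1} + 2/3 dt f(y_{n+1})."""
    def solve(const, gamma, Y):
        # Newton-Raphson on g(Y) = Y - const - gamma*f(Y) = 0
        for _ in range(maxit):
            delta = (Y - const - gamma * f(Y)) / (1.0 - gamma * dfdy(Y))
            Y -= delta
            if abs(delta) < tol:
                return Y
        raise RuntimeError("Newton-Raphson did not converge; reduce the time step")
    y_prev = y0
    y = solve(y0, dt, y0)                          # formula (1) for the first step
    for _ in range(nsteps - 1):
        y_new = solve(4.0 / 3.0 * y - 1.0 / 3.0 * y_prev, 2.0 / 3.0 * dt, y)
        y_prev, y = y, y_new
    return y

print(bdf2(lambda y: -y * y, lambda y: -2.0 * y, 1.0, 0.01, 100))   # exact y(1) = 0.5
```

On the stiff equation dy/dt = −1000(y − 1) with ∆t = 0.1, the formula settles onto y = 1 without oscillation. This reflects the stiff stability that makes these formulas the basis of stiff solvers.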

Stiffness The concept of stiffness is described for a system of linear equations.

$$\frac{d\mathbf{y}}{dt} = \mathbf{A}\,\mathbf{y}$$

Let λi be the eigenvalues of the matrix A. The stiffness ratio is defined as

$$SR = \frac{\max_i |\mathrm{Re}(\lambda_i)|}{\min_i |\mathrm{Re}(\lambda_i)|} \qquad \text{(3-79)}$$

SR = 20 is not stiff, SR = 10³ is stiff, and SR = 10⁶ is very stiff. If the problem is nonlinear, then the solution is expanded about the current state.

$$\frac{dy_i}{dt} = f_i[\mathbf{y}(t_n)] + \sum_{j=1}^{n} \frac{\partial f_i}{\partial y_j}\,[y_j - y_j(t_n)]$$

The question of stiffness then depends on the solution at the current time. Consequently, nonlinear problems can be stiff during one time period and not stiff during another. While the chemical engineer may not actually calculate the eigenvalues, it is useful to know that they determine the stability and accuracy of the numerical scheme and the step size used.
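The following is a minimal Python sketch of Eq. (3-79) for a 2 × 2 system, with illustrative function names. The eigenvalues are taken from the characteristic equation λ² − (tr A)λ + det A = 0. The example matrix has eigenvalues −1 and −1000, so SR = 1000, which counts as stiff on the scale above.

```python
import cmath

def eig2(A):
    """Eigenvalues of a 2 x 2 matrix from its characteristic equation."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def stiffness_ratio(eigenvalues):
    """Eq. (3-79); undefined (division by zero) if some Re(lambda_i) = 0."""
    re = [abs(lam.real) for lam in eigenvalues]
    return max(re) / min(re)

A = [[-500.5, 499.5], [499.5, -500.5]]      # eigenvalues -1 and -1000
print(eig2(A), stiffness_ratio(eig2(A)))
```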

Problems are stiff when the time constants for different phenomena have very different magnitudes. Consider flow through a packed bed reactor. The time constants for different phenomena are:

1. Time for device flow-through

$$t_{\text{flow}} = \frac{L}{u} = \frac{\phi A L}{Q}$$

where Q is the volumetric flow rate, A is the cross-sectional area, L is the length of the packed bed, φ is the void fraction, and u = Q/(φA) is the interstitial velocity;

2. Time for reaction

$$t_{\text{rxn}} = \frac{1}{k}$$

where k is a rate constant (time⁻¹);

3. Time for diffusion inside the catalyst

$$t_{\text{internal diffusion}} = \frac{\varepsilon R^2}{D_e}$$

where ε is the porosity of the catalyst, R is the catalyst radius, and De is the effective diffusion coefficient inside the catalyst;

4. Time for heat transfer is

$$t_{\text{internal heat transfer}} = \frac{\rho_s C_s R^2}{k_e} = \frac{R^2}{\alpha}$$

where ρs is the catalyst density, Cs is the catalyst heat capacity per unit mass, ke is the effective thermal conductivity of the catalyst, and α is the thermal diffusivity. For example, consider the model of a catalytic converter for an automobile [Ferguson, N. B., and B. A. Finlayson, AIChE J. 20:539–550 (1974)]. Its time constants were 0.3 seconds for internal diffusion, 21 seconds for internal heat transfer, and 0.003 seconds for device flow-through. The device flow-through is so fast that it might as well be instantaneous. The stiffness is approximately 7000 (21 s/0.003 s). Implicit methods must be used to integrate the equations. Alternatively, a quasistate model can be developed [Ramirez, W. F., Computational Methods for Process Simulations, 2d ed., Butterworth-Heinemann, Boston (1997)].

Differential-Algebraic Systems Sometimes models involve ordinary differential equations subject to some algebraic constraints. For example, the equations governing one equilibrium stage (as in a distillation column) are

$$M\,\frac{dx^n}{dt} = V^{n+1}y^{n+1} - L^n x^n - V^n y^n + L^{n-1}x^{n-1}$$

$$x^{n-1} - x^n = E^n\,(x^{n-1} - x^{*,n})$$

$$\sum_{i=1}^{N} x_i = 1$$

where x and y are the mole fractions in the liquid and vapor, respectively; L and V are liquid and vapor flow rates, respectively; M is the holdup; and the superscript is the stage number. The efficiency is E, and the concentration in equilibrium with the vapor is x*. The first equation is an ordinary differential equation for the mass of one component on the stage, while the third equation represents a constraint that the mole fractions add to one. This is a differential-algebraic system of equations.

Differential-algebraic equations can be written in the general notation

$$F\left(t, y, \frac{dy}{dt}\right) = 0$$

To solve the general problem using the backward Euler method, replace the nonlinear differential equation with the nonlinear algebraic equation for one step.

$$F\left(t_{n+1}, y_{n+1}, \frac{y_{n+1} - y_n}{\Delta t}\right) = 0$$

This equation must be solved for y_{n+1}. The Newton-Raphson method can be used, and if convergence is not achieved within a few iterations, the time step can be reduced and the step repeated. In actuality, the higher-order backward-difference Gear methods are used in DASSL [Ascher, U. M., and L. R. Petzold, Computer Methods for Ordinary Differential Equations and Differential-Algebraic Equations, SIAM, Philadelphia (1998); and Brenan, K. E., S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations, North Holland: Elsevier (1989)].
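The following is a minimal Python sketch of one backward Euler step for F(t, y, dy/dt) = 0, with illustrative function names. The step is solved by Newton-Raphson with a finite-difference Jacobian and a small Gaussian elimination. The test system is a hypothetical index-1 example, dy1/dt = −y1 + y2 with 0 = y1 + y2 − 1, so y1 should approach 0.5 while the constraint holds at every step.

```python
def solve_linear(J, r):
    """Gaussian elimination with partial pivoting for a small dense system J x = r."""
    n = len(r)
    M = [row[:] + [ri] for row, ri in zip(J, r)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            w = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= w * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def backward_euler_dae_step(F, t, y, dt, tol=1e-10, maxit=10, h_fd=1e-7):
    """Solve G(Y) = F(t + dt, Y, (Y - y)/dt) = 0 for Y by Newton-Raphson."""
    n = len(y)
    G = lambda Z: F(t + dt, Z, [(zi - yi) / dt for zi, yi in zip(Z, y)])
    Y = list(y)                                     # previous value as the guess
    for _ in range(maxit):
        g = G(Y)
        J = [[0.0] * n for _ in range(n)]
        for j in range(n):                          # finite-difference Jacobian
            Yp = list(Y)
            Yp[j] += h_fd
            gp = G(Yp)
            for i in range(n):
                J[i][j] = (gp[i] - g[i]) / h_fd
        delta = solve_linear(J, g)
        Y = [yi - di for yi, di in zip(Y, delta)]
        if max(abs(d) for d in delta) < tol:
            return Y
    raise RuntimeError("Newton-Raphson did not converge; reduce the time step")

# Hypothetical index-1 example: y1' + y1 - y2 = 0 and y1 + y2 - 1 = 0
F = lambda t, y, yp: [yp[0] + y[0] - y[1], y[0] + y[1] - 1.0]
y = [0.0, 1.0]                                      # consistent initial values
for n in range(100):
    y = backward_euler_dae_step(F, n * 0.1, y, 0.1)
print(y)
```

Note that the initial values must satisfy the algebraic constraint. This is the kind of restriction on initial values discussed below.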

Differential-algebraic systems are more complicated than differential systems because the solution may not always be defined. Pantelides et al. [Comp. Chem. Eng. 12:449–454 (1988)] introduced the term index to identify the possible problems. The index is defined as the minimum number of times the equations need to be differentiated with respect to time to convert the system to a set of ordinary differential equations. These higher derivatives may not exist, and the process places limits on which variables can be given initial values. Sometimes the initial values must be constrained by the algebraic equations. For a differential-algebraic system modeling a distillation tower, Pantelides et al. show that the index depends on the specification of pressure for the column. Byrne and Ponzi [Comp. Chem. Eng. 12:377–382 (1988); also Chan, T. F. C., and H. B. Keller, SIAM J. Sci. Stat. Comput. 3:173–194 (1982)] also list several chemical engineering examples of differential-algebraic systems and solve one involving two-phase flow.

Computer Software Efficient computer packages are available for solving ordinary differential equations as initial value problems. The packages are widely available and good enough that most chemical engineers use them and do not write their own. Here we discuss three of them: RKF45, LSODE, and EPISODE. In each of the packages, the user specifies the differential equation to be solved and a desired error criterion. The package then integrates in time and adjusts the step size to achieve the error criterion within the limitations imposed by stability.

A popular explicit Runge-Kutta package is RKF45. Notice there that an estimate of the truncation error at each step is available. Then the step size can be reduced until this estimate is below the user-specified tolerance. The method is thus automatic, and the user is assured that each step meets the tolerance. Note, however, that the tolerance is set on the local truncation error, namely from one step to another. The user is usually interested instead in the global truncation error, or the error after several steps. The global error is generally made smaller by making the tolerance smaller, but the absolute accuracy is not the same as the tolerance. If the problem is stiff, then very small step sizes are used, and the computation becomes very lengthy. The RKF45 code discovers this and returns control to the user with a message indicating the problem is too hard to solve with RKF45.

A popular implicit package is LSODE, a version of Gear’s method(Gear, ibid.) written by Alan Hindmarsh at Lawrence Livermore Lab-oratory. In this package, the user specifies the differential equation tobe solved and the tolerance desired. Now the method is implicit andtherefore stable for any step size. The accuracy may not be acceptable,however, and sets of nonlinear equations must be solved. Thus, in prac-tice the step size is limited but not nearly so much as in the Runge-Kutta methods. In these packages, both the step size and order of themethod are adjusted by the package. Suppose we are calculating witha kth order method. The truncation error is determined by the (k + 1)thorder derivative. This is estimated using difference formulas and thevalues of the right-hand sides at previous times. An estimate is alsomade for the kth and (k + 2)th derivative. Then it is possible to estimatethe error in a (k − 1)th order method, a kth order method, and a (k + 1)thorder method. Furthermore, the step size needed to satisfy the toler-ance with each of these methods can be determined. Then we canchoose the method and step size for the next step that achieves thebiggest step, with appropriate adjustments due to the different workrequired for each order. The package generally starts with a very smallstep size and a first-order method, the backward Euler method. Thenit integrates along, adjusting the order up (and later down) dependingon the error estimates. The user is thus assured that the local trunca-tion error meets the tolerance. There is a further difficulty, since theset of nonlinear equations must be solved. Usually a good guess of thesolution is available, since the solution is evolving in time and past his-tory can be extrapolated. Thus, the Newton-Raphson method will usu-ally converge. The package protects itself, though, by only doing a fewiterations. 
If convergence is not reached within this many iterations, then the step size is reduced and the calculation is redone for that time step. The convergence theorem for the Newton-Raphson method (p. 3-50) indicates that the method will converge if the step size is small enough. Thus the method is guaranteed to work. Further economies are possible. The Jacobian needed in the Newton-Raphson method can be fixed over several time steps. Then, if the iteration does not converge, the Jacobian can be reevaluated at the current time step. If the iteration still does not converge, then the step size is reduced and a new Jacobian is evaluated. The successive substitution method can also be used, which is even faster, except that it may not converge. However, it, too, will converge if the time step is small enough.
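The strategy of doing only a few Newton-Raphson iterations and cutting the step when they fail can be illustrated with the backward Euler method, the first-order method these packages start with. The sketch is fixed-order (it does not raise or lower the order as LSODE does), and the stiff test problem dy/dt = −1000y³, the iteration limit of four, and the step-doubling rule after a success are illustrative assumptions.

```python
import math


def backward_euler_newton(f, dfdy, t0, y0, t_end, h_max=0.01, max_iter=4, tol=1e-10):
    """Backward Euler for a scalar ODE dy/dt = f(t, y).

    Each step solves g(Y) = Y - y - h*f(t + h, Y) = 0 with Newton-Raphson,
    starting from the previous value y. If Newton has not converged within
    max_iter iterations, the step is halved and that step is redone.
    Returns (y(t_end), number of step halvings).
    """
    t, y, h = t0, y0, h_max
    halvings = 0
    while t < t_end:
        h_step = min(h, t_end - t)
        Y = y                                        # initial guess: last value
        converged = False
        for _ in range(max_iter):
            g = Y - y - h_step * f(t + h_step, Y)
            dg = 1.0 - h_step * dfdy(t + h_step, Y)  # dg/dY (the 1x1 Jacobian)
            dY = -g / dg
            Y += dY
            if abs(dY) <= tol * max(1.0, abs(Y)):
                converged = True
                break
        if converged:
            t = t_end if h_step >= t_end - t else t + h_step
            y = Y
            h = min(2.0 * h_step, h_max)             # allow the step to grow again
        else:
            h = 0.5 * h_step                         # reject, halve, redo the step
            halvings += 1
            if h < 1e-14:
                raise RuntimeError("step size became too small")
    return y, halvings


def rate(t, y):
    """Assumed stiff test problem: dy/dt = -1000 y^3 (exact y = 1/sqrt(1 + 2000 t))."""
    return -1000.0 * y**3


def rate_dy(t, y):
    """Derivative of the right-hand side with respect to y."""
    return -3000.0 * y**2


y_end, cuts = backward_euler_newton(rate, rate_dy, 0.0, 1.0, 1.0)
print(f"y(1) = {y_end:.6f}, exact = {1.0 / math.sqrt(2001.0):.6f}, step halvings = {cuts}")
```

The first step of 0.01 fails to converge in four iterations and is halved, which is the self-protection described in the text; later, when the solution changes slowly, the full step converges.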

The Runge-Kutta methods give extremely good accuracy, especially when the step size is kept small for stability reasons. If the problem is stiff, though, backward difference implicit methods must be used. Many chemical reactor problems are stiff, necessitating the use of implicit methods. In the MATLAB suite of ODE solvers, ode45 uses an embedded Runge-Kutta pair of the same kind as RKF45 (the Dormand-Prince pair), while ode15s uses an improved backward difference method. Shampine and Reichelt [SIAM J. Sci. Comp. 18:1–22 (1997)] give details of the programs in MATLAB. Fortunately, many packages are available. On the NIST web page http://gams.nist.gov/, choose “problem decision tree” and then “differential and integral equations” to find packages that can be downloaded. On the Netlib web site http://www.netlib.org/, choose “ode” to find packages that can be downloaded. Using Microsoft Excel to solve ordinary differential equations is cumbersome, except for the simplest problems.

Stability, Bifurcations, Limit Cycles Some aspects of this subject involve the solution of nonlinear equations; other aspects involve the integration of ordinary differential equations; applications include chaos and fractals as well as unusual operation of some chemical engineering equipment. Kubicek, M., and M. Marek, Computational Methods in Bifurcation Theory and Dissipative Structures, Springer-Verlag, Berlin (1983), give an excellent introduction to the subject and the details needed to apply the methods. Chan, T. F. C., and H. B. Keller, SIAM J. Sci. Stat. Comput. 3:173–194 (1982), give more details of the algorithms. A concise survey with some chemical engineering examples is given in Doherty, M. F., and J. M. Ottino, Chem. Eng. Sci. 43:139–183 (1988). Bifurcation results are closely connected with stability of the steady states, which is essentially a transient phenomenon.

Sensitivity Analysis When solving differential equations, it is frequently necessary to know the solution as well as the sensitivity of the solution to the value of a parameter. Such information is useful when doing parameter estimation (to find the best set of parameters for a model) and for deciding whether a parameter needs to be measured accurately. See Finlayson et al. (2006).
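One common way to obtain the sensitivity is to differentiate the differential equation with respect to the parameter and integrate the resulting sensitivity equation along with the original one. A minimal sketch, assuming the hypothetical model dy/dt = −ky with y(0) = 1: the sensitivity s = ∂y/∂k then obeys ds/dt = −ks − y with s(0) = 0, and its exact value is −t exp(−kt). A fixed-step fourth-order Runge-Kutta integrator is used, and a central-difference perturbation of k gives an independent check.

```python
import math


def rk4(rhs, t0, z0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta for a system dz/dt = rhs(t, z)."""
    h = (t_end - t0) / n_steps
    t, z = t0, list(z0)
    for _ in range(n_steps):
        k1 = rhs(t, z)
        k2 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
        k3 = rhs(t + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
        k4 = rhs(t + h, [zi + h * ki for zi, ki in zip(z, k3)])
        z = [zi + h / 6 * (a + 2 * b + 2 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z


def model_with_sensitivity(k):
    """Augmented system z = [y, s] with s = dy/dk for dy/dt = -k y."""
    def rhs(t, z):
        y, s = z
        return [-k * y, -k * s - y]   # ds/dt = (df/dy) s + df/dk
    return rhs


k, t_end = 2.0, 1.0
y, s = rk4(model_with_sensitivity(k), 0.0, [1.0, 0.0], t_end, 100)

# Independent check: perturb k and difference the two solutions.
dk = 1e-5
y_plus = rk4(model_with_sensitivity(k + dk), 0.0, [1.0, 0.0], t_end, 100)[0]
y_minus = rk4(model_with_sensitivity(k - dk), 0.0, [1.0, 0.0], t_end, 100)[0]
s_fd = (y_plus - y_minus) / (2 * dk)

print(f"s = {s:.6f}, finite difference = {s_fd:.6f}, exact = {-t_end * math.exp(-k * t_end):.6f}")
```

Integrating the sensitivity equation costs one extra equation per parameter, whereas the perturbation check costs two extra full solutions per parameter.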

Molecular Dynamics Special integration methods have been developed for molecular dynamics calculations because of the structure of the equations. A very large number of equations are to be integrated, with the following form based on the interactions between molecules.

$$m_i \frac{d^2 r_i}{dt^2} = F_i(r), \qquad F_i(r) = -\nabla_i V$$

The symbol $m_i$ is the mass of the ith particle, $r_i$ is the position of the ith particle, $F_i$ is the force acting on the ith particle, and V is the potential energy that depends upon the location of all the particles (but not their velocities). Since the major part of the calculation lies in the evaluation of the forces, or potentials, a method must be used that minimizes the number of times the forces are calculated to move from one time to another. Rewrite this equation in the form of an acceleration as

$$\frac{d^2 r_i}{dt^2} = \frac{1}{m_i} F_i(r) \equiv a_i$$

In the Verlet method, this equation is written by using central finite differences (see “Interpolation and Finite Differences”). Note that the accelerations do not depend upon the velocities.

$$r_i(t + \Delta t) = 2 r_i(t) - r_i(t - \Delta t) + a_i(t)\,\Delta t^2$$

The calculations are straightforward, and no explicit velocity is needed. The storage requirement is modest, and the precision is modest (it is a second-order method). Note that one must start the calculation with values of r at times t and t − ∆t.
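The Verlet position update is short enough to write out directly. The sketch assumes a single particle in a one-dimensional harmonic potential, V = r²/2 with m = 1, so that a = −r (a hypothetical test case whose exact solution is r = cos t). The second starting position r(−∆t) is estimated from a Taylor series, which is one common way to supply it.

```python
import math


def verlet(accel, r0, v0, dt, n_steps):
    """Position (Stormer-)Verlet: r(t+dt) = 2 r(t) - r(t-dt) + a(t) dt^2."""
    # The method needs r at t and t - dt; estimate r(-dt) from a Taylor series.
    r_prev = r0 - v0 * dt + 0.5 * accel(r0) * dt**2
    r = r0
    for _ in range(n_steps):
        r_next = 2.0 * r - r_prev + accel(r) * dt**2   # one force evaluation per step
        r_prev, r = r, r_next
    return r


n = 1000
dt = 2.0 * math.pi / n                  # integrate over one period
r_end = verlet(lambda r: -r, 1.0, 0.0, dt, n)
print(f"r(2*pi) = {r_end:.8f} (exact 1)")
```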

In the Verlet velocity method, an equation is written for the velocity, too.

$$\frac{dv_i}{dt} = a_i$$

The trapezoid rule [see “Numerical Integration (Quadrature)”] is applied to obtain

$$v_i(t + \Delta t) = v_i(t) + \tfrac{1}{2}\left[a_i(t) + a_i(t + \Delta t)\right]\Delta t$$

The position of the particles is expanded in a Taylor series.

$$r_i(t + \Delta t) = r_i(t) + v_i(t)\,\Delta t + \tfrac{1}{2} a_i(t)\,\Delta t^2$$

Beginning with values of r and v at time 0, one calculates the new positions and then the new velocities. This method is second-order in ∆t, too. For additional details, see Allen, M. P., and D. J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford (1989); Frenkel, D., and B. Smit, Understanding Molecular Simulation, Academic Press (2002); Haile, J. M., Molecular Dynamics Simulation, Wiley (1992); Leach, A. R., Molecular Modelling: Principles and Applications, Prentice-Hall (2001); Schlick, T., Molecular Modeling and Simulations, Springer, New York (2002).
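The velocity form follows the same order of operations: new positions from the Taylor series, one force evaluation at the new positions, then the trapezoid rule for the velocities. The sketch reuses the assumed harmonic test case (m = 1, a = −r), for which the energy v²/2 + r²/2 should stay close to its initial value of 0.5.

```python
import math


def velocity_verlet(accel, r0, v0, dt, n_steps):
    """Velocity Verlet: Taylor series for r, trapezoid rule for v."""
    r, v = r0, v0
    a = accel(r)
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt**2     # new position from r, v, a at time t
        a_new = accel(r)                     # the only force evaluation this step
        v = v + 0.5 * (a + a_new) * dt       # trapezoid rule for dv/dt = a
        a = a_new
    return r, v


n = 1000
dt = 2.0 * math.pi / n
r_end, v_end = velocity_verlet(lambda r: -r, 1.0, 0.0, dt, n)
energy = 0.5 * v_end**2 + 0.5 * r_end**2    # should stay close to the initial 0.5
print(f"r = {r_end:.8f}, v = {v_end:.8f}, energy = {energy:.8f}")
```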

ORDINARY DIFFERENTIAL EQUATIONS-BOUNDARY VALUE PROBLEMS

Diffusion problems in one dimension lead to boundary value problems. The boundary conditions are applied at two different spatial locations: at one side the concentration may be fixed and at the other side the flux may be fixed. Because the conditions are specified at two different locations, the problems are not initial value in character. It is not possible to begin at one position and integrate directly because at least one of the conditions is specified somewhere else and there are not enough conditions to begin the calculation. Thus, methods have been developed especially for boundary value problems.

Boundary value methods provide a description of the solution either by providing values at specific locations or by an expansion in a series of functions. Thus, the key issues are the method of representing the solution, the number of points or terms in the series, and how the approximation converges to the exact answer, i.e., how the error changes with the number of points or number of terms in the series. These issues are discussed for each of the methods: finite difference, orthogonal collocation, and Galerkin finite element methods.

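Finite Difference Method To apply the finite difference method, we first spread grid points through the domain. Figure 3-48 shows a uniform mesh of n points (nonuniform meshes are possible, too). The unknown, here c(x), at a grid point $x_i$ is assigned the symbol $c_i = c(x_i)$. The finite difference method can be derived easily by using a Taylor expansion of the solution about this point. Expressions for the derivatives are:

$$\left.\frac{dc}{dx}\right|_i = \frac{c_{i+1} - c_i}{\Delta x} - \frac{\Delta x}{2}\left.\frac{d^2c}{dx^2}\right|_i + \cdots, \qquad \left.\frac{dc}{dx}\right|_i = \frac{c_i - c_{i-1}}{\Delta x} + \frac{\Delta x}{2}\left.\frac{d^2c}{dx^2}\right|_i + \cdots$$

$$\left.\frac{dc}{dx}\right|_i = \frac{c_{i+1} - c_{i-1}}{2\Delta x} - \frac{\Delta x^2}{3!}\left.\frac{d^3c}{dx^3}\right|_i + \cdots$$

The truncation error in the first two expressions is proportional to ∆x, and the methods are said to be first-order. The truncation error in the third expression is proportional to $\Delta x^2$, and the method is said to be second-order. Usually the last equation is used to ensure the best accuracy. The finite difference representation of the second derivative is:

$$\left.\frac{d^2c}{dx^2}\right|_i = \frac{c_{i+1} - 2c_i + c_{i-1}}{\Delta x^2} - \frac{2\Delta x^2}{4!}\left.\frac{d^4c}{dx^4}\right|_i + \cdots$$

The truncation error is proportional to $\Delta x^2$. To solve a differential equation, it is evaluated at a point i, and then these expressions are inserted for the derivatives.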
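The order of each formula can be checked numerically: halving ∆x should roughly halve the error of a first-order formula and quarter the error of a second-order one. A short check, using c(x) = sin x at x = 1 as an assumed test function:

```python
import math


def fd_errors(c, dc, d2c, x, dx):
    """Absolute errors of the four difference formulas at the point x."""
    forward = (c(x + dx) - c(x)) / dx
    backward = (c(x) - c(x - dx)) / dx
    central = (c(x + dx) - c(x - dx)) / (2.0 * dx)
    second = (c(x + dx) - 2.0 * c(x) + c(x - dx)) / dx**2
    return {
        "forward": abs(forward - dc(x)),
        "backward": abs(backward - dc(x)),
        "central": abs(central - dc(x)),
        "second derivative": abs(second - d2c(x)),
    }


coarse = fd_errors(math.sin, math.cos, lambda x: -math.sin(x), 1.0, 0.1)
fine = fd_errors(math.sin, math.cos, lambda x: -math.sin(x), 1.0, 0.05)
ratios = {name: coarse[name] / fine[name] for name in coarse}
for name, ratio in ratios.items():
    print(f"{name:18s} error ratio when dx is halved: {ratio:.2f}")
```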

Example Consider the equation for convection, diffusion, and reaction in a tubular reactor.

$$\frac{1}{Pe}\frac{d^2c}{dx^2} - \frac{dc}{dx} = Da\,R(c)$$

The finite difference representation is

$$\frac{1}{Pe}\,\frac{c_{i+1} - 2c_i + c_{i-1}}{\Delta x^2} - \frac{c_{i+1} - c_{i-1}}{2\Delta x} = Da\,R(c_i)$$

This equation is written for i = 2 to n − 1, or the internal points. The equations would then be coupled but would also involve the values of $c_1$ and $c_n$, as well. These are determined from the boundary conditions.

If the boundary condition involves a derivative, it is important that the derivatives be evaluated using points that exist. Three possibilities exist:

$$\left.\frac{dc}{dx}\right|_1 = \frac{c_2 - c_1}{\Delta x}$$

$$\left.\frac{dc}{dx}\right|_1 = \frac{-3c_1 + 4c_2 - c_3}{2\Delta x}$$

The third alternative is to add a false point, outside the domain, as $c_0 = c(x = -\Delta x)$.

$$\left.\frac{dc}{dx}\right|_1 = \frac{c_2 - c_0}{2\Delta x}$$

Since this equation introduces a new variable, $c_0$, another equation is needed and is obtained by writing the finite difference equation for i = 1, too.

The sets of equations can be solved using the Newton-Raphson method. The first form of the derivative gives a tridiagonal system of equations, and the standard routines for solving tridiagonal equations suffice. For the other two options, some manipulation is necessary to put them into a tridiagonal form.
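A standard routine for the tridiagonal case is the Thomas algorithm, Gaussian elimination specialized to three diagonals and used without pivoting. The sketch applies it to the tubular-reactor example under assumptions that are not in the text: a first-order reaction R(c) = c, an inlet value c(0) = 1, an outlet condition dc/dx = 0 written with the first (one-sided) form, and Pe = 10, Da = 1. Because this R(c) is linear, no Newton-Raphson iteration is needed, and an analytic solution is available for comparison.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system: a = sub-diagonal (a[0] unused), b = diagonal,
    c = super-diagonal (c[-1] unused), d = right-hand side."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]          # forward elimination
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def tubular_reactor(pe, da, n):
    """(1/Pe) c'' - c' = Da c on n nodes, c(0) = 1, (c_n - c_{n-1})/dx = 0."""
    dx = 1.0 / (n - 1)
    a, b, c, d = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    b[0], d[0] = 1.0, 1.0                             # inlet: c_1 = 1
    for i in range(1, n - 1):
        a[i] = 1.0 / (pe * dx**2) + 1.0 / (2.0 * dx)  # multiplies c_{i-1}
        b[i] = -2.0 / (pe * dx**2) - da               # multiplies c_i
        c[i] = 1.0 / (pe * dx**2) - 1.0 / (2.0 * dx)  # multiplies c_{i+1}
    a[-1], b[-1] = -1.0, 1.0                          # outlet: c_n - c_{n-1} = 0
    return thomas(a, b, c, d)


def exact(x, pe, da):
    """Analytic solution for the same (assumed) boundary conditions."""
    s = math.sqrt(1.0 + 4.0 * da / pe)
    r1, r2 = 0.5 * pe * (1.0 + s), 0.5 * pe * (1.0 - s)
    ratio = -r2 * math.exp(r2) / (r1 * math.exp(r1))  # A/B from c'(1) = 0
    b_coef = 1.0 / (1.0 + ratio)                       # from A + B = 1
    return ratio * b_coef * math.exp(r1 * x) + b_coef * math.exp(r2 * x)


for n in (51, 201):
    cs = tubular_reactor(10.0, 1.0, n)
    err = max(abs(ci - exact(i / (n - 1), 10.0, 1.0)) for i, ci in enumerate(cs))
    print(f"n = {n:4d}: c(1) = {cs[-1]:.5f}, max error = {err:.2e}")
```

Without pivoting, the Thomas algorithm is safe here because the rows are diagonally dominant as long as Pe·∆x ≤ 2.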

Frequently, the transport coefficients, such as the diffusion coefficient or the thermal conductivity, depend on the dependent variable, concentration or temperature, respectively. Then the differential equation might look like

$$\frac{d}{dx}\left(D(c)\frac{dc}{dx}\right) = 0$$

This could be written as two equations.

$$-\frac{dJ}{dx} = 0, \qquad J = -D(c)\frac{dc}{dx}$$

Because the coefficient depends on c, the equations are more complicated. A finite difference method can be written in terms of the fluxes at the midpoints, i + 1/2.

$$-\frac{J_{i+1/2} - J_{i-1/2}}{\Delta x} = 0, \qquad J_{i+1/2} = -D(c_{i+1/2})\frac{c_{i+1} - c_i}{\Delta x}$$

These are combined to give the complete equation.

$$\frac{D(c_{i+1/2})(c_{i+1} - c_i) - D(c_{i-1/2})(c_i - c_{i-1})}{\Delta x^2} = 0$$

This represents a set of nonlinear algebraic equations that can be solved with the Newton-Raphson method. However, in this case, a viable iterative strategy is to evaluate the transport coefficients at the last value (iteration k) and then solve for iteration k + 1:

$$\frac{D(c^k_{i+1/2})(c^{k+1}_{i+1} - c^{k+1}_i) - D(c^k_{i-1/2})(c^{k+1}_i - c^{k+1}_{i-1})}{\Delta x^2} = 0$$

The advantage of this approach is that it is easier to program than a full Newton-Raphson method. If the transport coefficients do not vary radically, then the method converges. If the method does not converge, then it may be necessary to use the full Newton-Raphson method.

There are two common ways to evaluate the transport coefficient at the midpoint: use the average value of the solution on each side to evaluate the diffusivity, or use the average value of the diffusivity on each side. Both methods have truncation error $\Delta x^2$ (Finlayson, 2003). The spacing of the grid points need not be uniform; see Finlayson (2003) and Finlayson et al. (2006) for the formulas in that case.
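The lagged-coefficient iteration can be sketched as follows. The diffusivity D(c) = 1 + c, the boundary values c(0) = 0 and c(1) = 1, the 11-point mesh, and the stopping tolerance are assumptions chosen because the exact solution, c = −1 + (1 + 3x)^(1/2), is known. The midpoint diffusivity uses the first option above (the average of the neighboring nodal values). Each pass freezes D at iteration k and solves one linear tridiagonal system for iteration k + 1.

```python
import math


def thomas(a, b, c, d):
    """Tridiagonal solve (a: sub-, b: main, c: super-diagonal, d: right-hand side)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def lagged_coefficient_solve(D, n=11, c_left=0.0, c_right=1.0, tol=1e-12, max_iter=200):
    """Solve d/dx(D(c) dc/dx) = 0 with D frozen at the previous iterate."""
    x = [i / (n - 1) for i in range(n)]
    c = [c_left + (c_right - c_left) * xi for xi in x]   # start from a straight line
    for k in range(max_iter):
        # D(c_{i+1/2}) from the average of the two nodal values at iteration k
        d_mid = [D(0.5 * (c[i] + c[i + 1])) for i in range(n - 1)]
        a, b, cc, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        b[0], rhs[0] = 1.0, c_left
        b[-1], rhs[-1] = 1.0, c_right
        for i in range(1, n - 1):
            a[i] = d_mid[i - 1]                  # D_{i-1/2} multiplies c_{i-1}
            b[i] = -(d_mid[i - 1] + d_mid[i])
            cc[i] = d_mid[i]                     # D_{i+1/2} multiplies c_{i+1}
        c_new = thomas(a, b, cc, rhs)
        change = max(abs(u - v) for u, v in zip(c_new, c))
        c = c_new
        if change < tol:
            return x, c, k + 1
    raise RuntimeError("lagged-coefficient iteration did not converge")


x, c, iterations = lagged_coefficient_solve(lambda conc: 1.0 + conc)
worst = max(abs(ci - (math.sqrt(1.0 + 3.0 * xi) - 1.0)) for xi, ci in zip(x, c))
print(f"converged in {iterations} iterations; max nodal error = {worst:.1e}")
```

Because D is linear in c in this example, the discrete flux D(c_{i+1/2})(c_{i+1} − c_i) equals the exact difference of the integral of D between the nodes, so the converged nodal values should match the exact solution to round-off; for a more strongly varying D, the error would instead decrease as ∆x².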

Example A reaction diffusion problem is solved with the finite difference method.

$$\frac{d^2c}{dx^2} = \phi^2 c, \qquad \frac{dc}{dx}(0) = 0, \qquad c(1) = 1$$

The solution is derived for φ = 2. It is solved several times, first with two intervals and three points (at x = 0, 0.5, 1), then with four intervals, then with eight intervals. The reason is that when an exact solution is not known, one must use several ∆x and see that the solution converges as ∆x approaches zero. With two intervals, the equations are as follows. The points are $x_1 = 0$, $x_2 = 0.5$, and $x_3 = 1.0$, and the solution at those points is $c_1$, $c_2$, and $c_3$, respectively. A false boundary is used at $x_0 = -0.5$.

$$\frac{c_0 - c_2}{2\Delta x} = 0, \qquad \frac{c_0 - 2c_1 + c_2}{\Delta x^2} - \phi^2 c_1 = 0, \qquad \frac{c_1 - 2c_2 + c_3}{\Delta x^2} - \phi^2 c_2 = 0, \qquad c_3 = 1$$

The solution is $c_1 = 0.2857$, $c_2 = 0.4286$, and $c_3 = 1.0$. The problem is solved again with four and then eight intervals. The value of concentration at x = 0 takes the following values for different ∆x. These values are extrapolated using the Richardson extrapolation technique: assuming the error is proportional to $\Delta x^2$, the two finest results give c(0) ≈ 0.267131 + (0.267131 − 0.271043)/3 = 0.265827. Using this value as the best estimate of the exact solution, the errors in the solution are tabulated versus ∆x. Clearly the errors go as $\Delta x^2$ (decreasing by a factor of about 4 when ∆x decreases by a factor of 2), thus validating the solution. The exact solution is 0.265802.

n − 1    ∆x       c(0)        Error in c(0)
2        0.5      0.285714    0.01989
4        0.25     0.271043    0.00522
8        0.125    0.267131    0.00130
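The tabulated values can be reproduced in a few lines. This sketch eliminates the false-boundary value using the condition (c_0 − c_2)/(2∆x) = 0, that is, c_0 = c_2, which puts the system in tridiagonal form. It then applies Richardson extrapolation to the two finest results, assuming the error is proportional to ∆x², and compares with the exact solution cosh(φx)/cosh φ.

```python
import math


def thomas(a, b, c, d):
    """Tridiagonal solve (a: sub-, b: main, c: super-diagonal, d: right-hand side)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def reaction_diffusion_fd(phi, intervals):
    """c'' = phi^2 c, c'(0) = 0 (false boundary), c(1) = 1; returns nodal values."""
    n = intervals + 1
    dx = 1.0 / intervals
    diag = -(2.0 + (phi * dx) ** 2)
    a, b, c, d = [0.0] * n, [diag] * n, [0.0] * n, [0.0] * n
    c[0] = 2.0                     # node 1: c_0 - 2c_1 + c_2 with c_0 = c_2
    for i in range(1, n - 1):
        a[i], c[i] = 1.0, 1.0      # c_{i-1} - (2 + phi^2 dx^2) c_i + c_{i+1} = 0
    b[-1], d[-1] = 1.0, 1.0        # c(1) = 1
    return thomas(a, b, c, d)


phi = 2.0
c0 = {m: reaction_diffusion_fd(phi, m)[0] for m in (2, 4, 8)}
richardson = c0[8] + (c0[8] - c0[4]) / 3.0       # error assumed proportional to dx^2
exact = 1.0 / math.cosh(phi)                      # c(x) = cosh(phi x)/cosh(phi)
for m, value in c0.items():
    print(f"n - 1 = {m}: c(0) = {value:.6f}, error = {value - richardson:.5f}")
print(f"Richardson: {richardson:.6f}, exact: {exact:.6f}")
```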

Finite Difference Methods Solved with Spreadsheets A convenient way to solve the finite difference equations for simple problems is to use a computer spreadsheet. The equations for the problem solved in the example can be cast into the following form

$$c_1 = \frac{2c_2}{2 + \phi^2\Delta x^2}$$

$$c_i = \frac{c_{i+1} + c_{i-1}}{2 + \phi^2\Delta x^2}$$

$$c_{n+1} = 1$$

FIG. 3-48 Finite difference mesh; ∆x uniform.

Let us solve the problem using 6 nodes, or 5 intervals. Then the connection between the cell in the spreadsheet and the nodal value is shown in Fig. 3-49. The following equations are placed into the various cells, where phi and dx refer to cells containing φ and ∆x.

A1: =2*B1/(2+(phi*dx)^2)
B1: =(A1+C1)/(2+(phi*dx)^2)
F1: =1

The equation in cell B1 is copied into cells C1 through E1. Then turn on the iteration scheme in the spreadsheet and watch the solution converge. Whether or not convergence is achieved can depend on how you write the equations, so some experimentation may be necessary. Theorems for convergence of the successive substitution method are useful in this regard.
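What the spreadsheet does when iteration is switched on can be imitated in a few lines: each cell formula is re-evaluated in turn, using the newest values of its neighbors, until nothing changes. This is successive substitution of the Gauss-Seidel type. The sketch assumes the cells are recalculated left to right and updated in place; a real spreadsheet may use a different recalculation order, which can change the number of sweeps but not the converged values.

```python
def spreadsheet_sweeps(phi, nodes=6, tol=1e-12, max_sweeps=1000):
    """Mimic cells A1..F1: A1 = 2*B1/(2 + (phi*dx)^2),
    B1..E1 = (left + right)/(2 + (phi*dx)^2), F1 = 1."""
    dx = 1.0 / (nodes - 1)
    denom = 2.0 + (phi * dx) ** 2
    cells = [0.0] * (nodes - 1) + [1.0]        # A1..E1 start at 0; F1 = 1
    for sweep in range(1, max_sweeps + 1):
        old = cells[:]
        cells[0] = 2.0 * cells[1] / denom       # A1 (false-boundary equation)
        for i in range(1, nodes - 1):           # B1..E1 (interior equations)
            cells[i] = (cells[i - 1] + cells[i + 1]) / denom
        if max(abs(new - prev) for new, prev in zip(cells, old)) < tol:
            return cells, sweep
    raise RuntimeError("no convergence; try rearranging the equations")


values, sweeps = spreadsheet_sweeps(2.0)
print(f"{sweeps} sweeps; A1 = c(0) = {values[0]:.6f}")
print(" ".join(f"{v:.4f}" for v in values))
```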

Orthogonal Collocation The orthogonal collocation method has found widespread application in chemical engineering, particularly for chemical reaction engineering. In the collocation method, the dependent variable is expanded in a series of orthogonal polynomials. See “Interpolation and Finite Differences: Lagrange Interpolation Formulas.”

$$c(x) = \sum_{m=0}^{N} a_m P_m(x)$$

The differential equation is evaluated at certain collocation points. The collocation points are the roots of an orthogonal polynomial, as first used by Lanczos [Lanczos, C., J. Math. Phys. 17:123–199 (1938); and Lanczos, C., Applied Analysis, Prentice-Hall (1956)]. A major improvement was proposed by Villadsen and Stewart [Villadsen, J. V., and W. E. Stewart, Chem. Eng. Sci. 22:1483–1501 (1967)], who proposed that the entire solution process be done in terms of the solution at the collocation points rather than the coefficients in the expansion. This method is especially useful for reaction-diffusion problems that frequently arise when modeling chemical reactors. It is highly efficient when the solution is smooth, but the finite difference method is preferred when the solution changes steeply in some region of space. The error decreases very rapidly as N is increased since it is proportional to $[1/(1 - N)]^{N-1}$. See Finlayson (2003) and Villadsen, J. V., and M. Michelsen, Solution of Differential Equation Models by Polynomial Approximations, Prentice-Hall (1978).
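The rapid fall of the error with N can be seen on the reaction-diffusion example above (d²c/dx² = φ²c, dc/dx(0) = 0, c(1) = 1, φ = 2). This is a minimal sketch under several assumptions not taken from the text. The trial function is a plain power series of degree N + 1 rather than a series of orthogonal polynomials. The collocation points are the roots of shifted Legendre polynomials on [0, 1]. The unknowns are the coefficients rather than the values at the collocation points, as in the Villadsen-Stewart formulation. Symmetric problems such as this slab are often treated with polynomials in x² instead.

```python
import math


def solve_dense(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= factor * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


# Collocation points: roots of shifted Legendre polynomials on [0, 1] (an assumed choice).
ROOTS = {
    1: [0.5],
    2: [0.5 - math.sqrt(3.0) / 6.0, 0.5 + math.sqrt(3.0) / 6.0],
    3: [0.5 - math.sqrt(15.0) / 10.0, 0.5, 0.5 + math.sqrt(15.0) / 10.0],
}


def collocation_c0(phi, N):
    """c(x) = sum_{m=0}^{N+1} a_m x^m; enforce c'(0) = 0, c(1) = 1, and
    c'' - phi^2 c = 0 at the N collocation points. Returns c(0) = a_0."""
    size = N + 2
    rows = [[1.0 if m == 1 else 0.0 for m in range(size)]]      # c'(0) = 0
    rhs = [0.0]
    for x in ROOTS[N]:                                           # residual = 0
        rows.append([(m * (m - 1) * x ** (m - 2) if m >= 2 else 0.0) - phi**2 * x**m
                     for m in range(size)])
        rhs.append(0.0)
    rows.append([1.0] * size)                                    # c(1) = 1
    rhs.append(1.0)
    return solve_dense(rows, rhs)[0]


exact = 1.0 / math.cosh(2.0)
for N in (1, 2, 3):
    value = collocation_c0(2.0, N)
    print(f"N = {N}: c(0) = {value:.6f}, error = {abs(value - exact):.1e}")
```

With only three interior points the error in c(0) is already well below that of the eight-interval finite difference solution in the earlier example.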

Galerkin Finite Element Method In the finite element method, the domain is divided into elements and an expansion is made for the solution on each finite element. In the Galerkin finite element method an additional idea is introduced: the Galerkin method is used to solve the equation. The Galerkin method is explained before the finite element basis set is introduced, using the equations for reaction and diffusion in a porous catalyst pellet.

$$\frac{d^2c}{dx^2} = \phi^2 R(c), \qquad \frac{dc}{dx}(0) = 0, \qquad c(1) = 1$$

The unknown solution is expanded in a series of known functions $b_i(x)$ with unknown coefficients $a_i$.

$$c(x) = \sum_{i=1}^{NT} a_i b_i(x)$$

The trial solution is substituted into the differential equation to obtain the residual.

$$\text{Residual} = \sum_{i=1}^{NT} a_i \frac{d^2 b_i}{dx^2} - \phi^2 R\left(\sum_{i=1}^{NT} a_i b_i(x)\right)$$

The residual is then made orthogonal to the set of basis functions.

$$\int_0^1 b_j(x)\left[\sum_{i=1}^{NT} a_i \frac{d^2 b_i}{dx^2} - \phi^2 R\left(\sum_{i=1}^{NT} a_i b_i(x)\right)\right] dx = 0, \qquad j = 1, \ldots, NT$$

This is the process that makes the method a Galerkin method. The basis for the orthogonality condition is that a function that is made orthogonal to each member of a complete set is then zero. The residual is being made orthogonal, and if the basis functions are complete and you use infinitely many of them, then the residual is zero. Once the residual is zero, the problem is solved.

This equation is integrated by parts to give the following equation

$$-\sum_{i=1}^{NT} \left[\int_0^1 \frac{db_j}{dx}\frac{db_i}{dx}\,dx\right] a_i = \phi^2 \int_0^1 b_j(x)\, R\left(\sum_{i=1}^{NT} a_i b_i(x)\right) dx, \qquad j = 1, \ldots, NT - 1 \tag{3-80}$$

This equation defines the Galerkin method, and a solution that satisfies this equation (for all j = 1, . . . , ∞) is called a weak solution. For an approximate solution, the equation is written once for each member of the trial function, j = 1, . . . , NT − 1, and the boundary condition is applied.

$$\sum_{i=1}^{NT} a_i b_i(1) = c_B$$

The Galerkin finite element method results when the Galerkin method is combined with a finite element trial function. The domain is divided into elements separated by nodes, as in the finite difference method. The solution is approximated by a linear (or sometimes quadratic) function of position within the element. These approximations are substituted into Eq. (3-80) to provide the Galerkin finite element equations. For example, with the grid shown in Fig. 3-48, a linear interpolation would be used between points $x_i$ and $x_{i+1}$.

$$c(x) = c_i(1 - u) + c_{i+1}u, \qquad u = \frac{x - x_i}{x_{i+1} - x_i}$$

A finite element method based on these functions would have an error proportional to $\Delta x^2$. The finite element representations for the first derivative and second derivative are the same as in the finite difference method, but this is not true for other functions or derivatives. With quadratic finite elements, take the region from $x_{i-1}$ to $x_{i+1}$ as one element, with u now running from 0 at $x_{i-1}$ to 1 at $x_{i+1}$. Then the interpolation would be

$$c(x) = c_{i-1}N_1(u) + c_i N_2(u) + c_{i+1}N_3(u)$$

$$N_1(u) = 2(u - 1)\left(u - \tfrac{1}{2}\right), \qquad N_2(u) = 4u(1 - u), \qquad N_3(u) = 2u\left(u - \tfrac{1}{2}\right)$$

A finite element method based on these functions would have an error proportional to $\Delta x^3$. Thus, it would converge faster than one based on linear interpolation. A variety of other finite element functions can be used as well, including B splines (see “Interpolation and Finite Differences: Spline Functions”).
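For the linear reaction R(c) = c, Eq. (3-80) with linear elements on a uniform mesh produces a tridiagonal system. The element integrals give (1/∆x)(−1, 2, −1) from the derivative term and (∆x/6)(1, 4, 1) from the reaction term. The latter spreads the reaction over three nodes instead of using the single nodal value of the finite difference method, which is the difference noted above for other functions. The sketch assumes φ = 2 so the results can be compared with the finite difference example. The condition dc/dx(0) = 0 is natural in the Galerkin form and needs no extra equation, while c(1) = 1 replaces the last Galerkin equation.

```python
import math


def thomas(a, b, c, d):
    """Tridiagonal solve (a: sub-, b: main, c: super-diagonal, d: right-hand side)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def galerkin_linear(phi, elements):
    """Galerkin FE, linear elements: c'' = phi^2 c, c'(0) = 0, c(1) = 1.
    Row j of Eq. (3-80): -sum_i K_ji c_i - phi^2 sum_i M_ji c_i = 0, with
    K from int b_j' b_i' dx and M from int b_j b_i dx."""
    n = elements + 1
    h = 1.0 / elements
    p2 = phi ** 2
    off = 1.0 / h - p2 * h / 6.0                  # coefficient of c_{j-1} and c_{j+1}
    a, b, c, d = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    b[0], c[0] = -1.0 / h - p2 * h / 3.0, off    # node at x = 0 touches one element
    for j in range(1, n - 1):
        a[j], b[j], c[j] = off, -2.0 / h - 4.0 * p2 * h / 6.0, off
    b[-1], d[-1] = 1.0, 1.0                      # essential condition c(1) = 1
    return thomas(a, b, c, d)


exact = 1.0 / math.cosh(2.0)
for m in (2, 4, 8, 16, 32):
    value = galerkin_linear(2.0, m)[0]
    print(f"{m:3d} elements: c(0) = {value:.6f}, error = {value - exact:+.2e}")
```

The error falls by about 4 each time the element size is halved, consistent with the ∆x² statement above.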

Adaptive Meshes In many two-point boundary value problems, the difficulty in the problem is the formation of a boundary layer region, or a region in which the solution changes very dramatically. In such cases, it is prudent to use small mesh spacing there, either with the finite difference method or the finite element method. If the region is known a priori, small mesh spacings can be assumed at the boundary layer. If the region is not known, though, other techniques must be used. These techniques are known as adaptive mesh techniques. The mesh size is made small where some property of the solution is large. For example, if the truncation error of the method is nth order, then the nth-order derivative of the solution is evaluated and a small mesh is used where it is large. Alternatively, the residual (the differential equation with the numerical solution substituted into it) can be used as a criterion. It is also possible to define the error that is expected from a method one order higher and one order lower. Then a decision about whether to increase or decrease the order of the method can be made, taking into account the relative work of the different orders. This provides a method of adjusting both the mesh spacing (∆x, sometimes called h) and the degree of polynomial (p). Such methods are called h-p methods. Many finite element programs have the capability to do this mesh refinement automatically.

FIG. 3-49 Finite difference method using spreadsheets.

Singular Problems and Infinite Domains If the solution being sought has a singularity, it may be difficult to find a good numerical solution. Sometimes even the location of the singularity may not be known. One method of solving such problems is to refine the mesh near the singularity, relying on the better approximation due to a smaller ∆x. Another approach is to incorporate the singular trial function into the approximation. Thus, if the solution approaches f(x) as x goes to zero and f(x) becomes infinite, one may define a new variable u(x) = y(x) − f(x) and derive an equation for u. The differential equation is more complicated, but the solution is better near the singularity. See Press et al. (1986).

Sometimes the domain is semi-infinite, as in boundary layer flow. The domain can be transformed from the x domain (0 to ∞) to the η domain (1 to 0) using the transformation η = exp(−x). Another approach is to use a variable mesh, perhaps with the same transformation. For example, use η = exp(−βx) and a constant mesh size in η; the value of β is found experimentally. Still another approach is to solve on a finite mesh in which the last point is far enough away that its location does not influence the solution. A location that is far enough away must be found by trial and error.

Packages to solve boundary value problems are available on the Internet. On the NIST web page http://gams.nist.gov/, choose “problem decision tree” and then “differential and integral equations” and then “ordinary differential equations” and “multipoint boundary value problems.” On the Netlib web site http://www.netlib.org/, search on “boundary value problem.” Any spreadsheet that has an iteration capability can be used with the finite difference method. Some packages for partial differential equations also have a capability for solving one-dimensional boundary value problems [e.g., Comsol Multiphysics (formerly FEMLAB)].

NUMERICAL SOLUTION OF INTEGRAL EQUATIONS

This subsection considers a method of solving numerically the Fredholm integral equation of the second kind:

$$u(x) = f(x) + \lambda \int_a^b k(x, t)\,u(t)\,dt \qquad \text{for } u(x) \tag{3-81}$$

The method discussed arises because a definite integral can be closely approximated by any of several numerical integration formulas (each of which arises by approximating the function by some polynomial over an interval). Thus the definite integral in Eq. (3-81) can be replaced by an integration formula, and Eq. (3-81) may be written

$$u(x) = f(x) + \lambda(b - a)\sum_{i=1}^{n} c_i\,k(x, t_i)\,u(t_i) \tag{3-82}$$

where $t_1, \ldots, t_n$ are points of subdivision of the t axis, a ≤ t ≤ b, and the c’s are coefficients whose values depend upon the type of numerical integration formula used. Now Eq. (3-82) must hold for all values of x, a ≤ x ≤ b, so it must hold for $x = t_1, x = t_2, \ldots, x = t_n$. Substituting for x successively $t_1, t_2, \ldots, t_n$ and setting $u(t_i) = u_i$, $f(t_i) = f_i$, we get n linear algebraic equations for the n unknowns $u_1, \ldots, u_n$. That is,

$$u_i = f_i + \lambda(b - a)\left[c_1 k(t_i, t_1)u_1 + c_2 k(t_i, t_2)u_2 + \cdots + c_n k(t_i, t_n)u_n\right], \qquad i = 1, 2, \ldots, n$$

These $u_i$ may be solved for by the methods under “Numerical Solution of Linear Equations and Associated Problems” and substituted into Eq. (3-82) to yield an approximate solution for Eq. (3-81).

Because of the work involved in solving large systems of simultaneous linear equations, it is desirable that only a small number of u’s be computed. Thus the Gaussian integration formulas are useful because of the economy they offer.
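A minimal sketch of the procedure with the three-point Gaussian formula, whose weights c_i = 5/18, 8/18, 5/18 sum to 1 as the λ(b − a) factor in Eq. (3-82) requires. The test equation, with k(x, t) = xt, λ = 1, a = 0, b = 1, and f(x) = 2x/3, is an assumption chosen because its exact solution is u(x) = x and the three-point formula integrates the product k·u exactly.

```python
import math


def solve_dense(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= factor * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def fredholm_gauss3(f, k, lam, a, b):
    """Discretize u(x) = f(x) + lam * int_a^b k(x, t) u(t) dt at 3 Gauss points."""
    s = math.sqrt(0.6)
    t = [a + 0.5 * (b - a) * (1.0 + z) for z in (-s, 0.0, s)]   # Gauss points on [a, b]
    c = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]                   # weights summing to 1
    n = len(t)
    # u_i - lam (b - a) sum_j c_j k(t_i, t_j) u_j = f_i
    A = [[(1.0 if i == j else 0.0) - lam * (b - a) * c[j] * k(t[i], t[j])
          for j in range(n)] for i in range(n)]
    u = solve_dense(A, [f(ti) for ti in t])

    def u_of_x(x):
        """Eq. (3-82): use the nodal values to evaluate u anywhere in [a, b]."""
        return f(x) + lam * (b - a) * sum(cj * k(x, tj) * uj for cj, tj, uj in zip(c, t, u))

    return t, u, u_of_x


t_pts, u_pts, u = fredholm_gauss3(lambda x: 2.0 * x / 3.0, lambda x, t: x * t, 1.0, 0.0, 1.0)
print([f"{ui:.12f}" for ui in u_pts], f"u(0.3) = {u(0.3):.12f}")
```

Only three unknowns are computed, yet Eq. (3-82) then gives u at any x, which is the economy the Gaussian formulas offer.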

Solutions for Volterra equations are done in a similar fashion, except that the solution can proceed point by point, or in small groups of points depending on the quadrature scheme. See Linz, P., Analytical and Numerical Methods for Volterra Equations, SIAM, Philadelphia (1985). There are methods that are analogous to the usual methods for integrating differential equations (Runge-Kutta, predictor-corrector, Adams methods, etc.). Explicit methods are fast and efficient until the time step must be made very small to meet the stability requirements. Then implicit methods are used, even though sets of simultaneous algebraic equations must be solved. The major part of the calculation is the evaluation of integrals, however, so that the added time to solve the algebraic equations is not excessive. Thus, implicit methods tend to be preferred. Volterra equations of the first kind are not well posed, and small errors in the solution can have disastrous consequences. The boundary element method uses Green’s functions and integral equations to solve differential equations. See Brebbia, C. A., and J. Dominguez, Boundary Elements: An Introductory Course, 2d ed., Computational Mechanics Publications, Southampton (1992); and Mackerle, J., and C. A. Brebbia (eds.), Boundary Element Reference Book, Springer-Verlag (1988).

MONTE CARLO SIMULATIONS

Some physical problems, such as those involving interaction of molecules, are usually formulated as integral equations. Monte Carlo methods are especially well suited to their solution. This section cannot give a comprehensive treatment of such methods, but their use in calculating the value of an integral will be illustrated. Suppose we wish to calculate the integral

$$G = \int_{\Omega_0} g(x)\,f(x)\,dx$$

where the distribution function f(x) satisfies:

$$f(x) \ge 0, \qquad \int_{\Omega_0} f(x)\,dx = 1$$

The distribution function f(x) can be taken as constant; for example, $1/\Omega_0$. We choose variables $x_1, x_2, \ldots, x_N$ randomly from f(x) and form the arithmetic mean

$$G_N = \frac{1}{N}\sum_i g(x_i)$$

The quantity $G_N$ is an estimate of G, and the fundamental theorem of Monte Carlo guarantees that the expected value of $G_N$ is G, if G exists [Kalos, M. H., and P. A. Whitlock, Monte Carlo Methods, vol. 1, Wiley, New York (1986)]. The error in the calculation is given by

$$\varepsilon = \frac{\sigma_1}{N^{1/2}}$$

where $\sigma_1^2$ is calculated from

$$\sigma_1^2 = \int_{\Omega_0} g^2(x)\,f(x)\,dx - G^2$$

Thus the number of terms needed to achieve a specified accuracy ε can be calculated once an estimate of $\sigma_1^2$ is known.

$$N = \frac{\sigma_1^2}{\varepsilon^2}$$

Various methods, such as importance sampling, can be used to reduce the number of calculations needed. See also Lapeyre, B., Introduction to Monte-Carlo Methods for Transport and Diffusion Equations, Oxford University Press (2003), and Liu, J. S., Monte Carlo Strategies in Scientific Computing, Springer (2001). Some computer programs are available that perform simple Monte Carlo calculations using Microsoft Excel.
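The estimate G_N, its error ε, and the sample size N needed for a target accuracy can all be computed from one run. The sketch assumes g(x) = x² with f(x) uniform on [0, 1] (Ω0 = 1), for which G = 1/3 and σ1² = 1/5 − 1/9 = 4/45. σ1² is estimated from the same samples, and Python's `random.Random` is used with a fixed seed so the run is repeatable.

```python
import math
import random


def monte_carlo(g, draw, n, seed=1):
    """Return (G_N, estimated sigma_1^2, error estimate sigma_1 / sqrt(N))."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(n):
        value = g(draw(rng))          # g(x_i) with x_i drawn from f(x)
        total += value
        total_sq += value * value
    g_n = total / n
    sigma2 = total_sq / n - g_n**2     # sample estimate of sigma_1^2
    return g_n, sigma2, math.sqrt(sigma2 / n)


g_n, sigma2, eps = monte_carlo(lambda x: x * x, lambda rng: rng.random(), 100_000)
target = 1e-3
print(f"G_N = {g_n:.5f} +/- {eps:.5f} (exact 1/3); sigma_1^2 = {sigma2:.4f} (exact 4/45)")
print(f"N for an error of {target}: about {sigma2 / target**2:.0f}")
```

Because ε falls only as N^(−1/2), halving the error costs four times as many samples, which is why variance-reduction methods matter.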

NUMERICAL SOLUTION OF PARTIAL DIFFERENTIAL EQUATIONS

The numerical methods for partial differential equations can be classified according to the type of equation (see “Partial Differential Equations”): parabolic, elliptic, and hyperbolic. This section uses the finite difference method to illustrate the ideas, and these results can be programmed for simple problems. For more complicated problems, though, it is common to rely on computer packages. Thus, some discussion is given to the issues that arise when using computer packages.

Parabolic Equations in One Dimension By combining the techniques applied to initial value problems and boundary value problems, it is possible to easily solve parabolic equations in one dimension. The method is often called the method of lines. It is illustrated here using the finite difference method, but the Galerkin finite element method and the orthogonal collocation method can also be combined with initial value methods in similar ways. The analysis is done by example. The finite volume method is described under Hyperbolic Equations.

Example Consider the diffusion equation, with boundary and initial conditions.

$$\frac{\partial c}{\partial t} = D\frac{\partial^2 c}{\partial x^2}$$

$$c(x, 0) = 0, \qquad c(0, t) = 1, \qquad c(1, t) = 0$$

We denote by $c_i$ the value of $c(x_i, t)$ at any time. Thus, $c_i$ is a function of time, and differential equations in $c_i$ are ordinary differential equations. By evaluating the diffusion equation at the ith node and replacing the derivative with a finite difference equation, the following working equation is derived for each node i, i = 2, . . . , n (see Fig. 3-50).

$$\frac{dc_i}{dt} = D\,\frac{c_{i+1} - 2c_i + c_{i-1}}{\Delta x^2}$$

This can be written in the general form of a set of ordinary differential equations by defining the matrix $\mathbf{A}$.

$$\frac{d\mathbf{c}}{dt} = \mathbf{A}\mathbf{c}$$

This set of ordinary differential equations can be solved using any of the standard methods, and the stability of the integration of these equations is governed by the largest eigenvalue of $\mathbf{A}$. When Euler’s method is used to integrate in time, the equations become

$$\frac{c_i^{n+1} - c_i^n}{\Delta t} = D\,\frac{c_{i+1}^n - 2c_i^n + c_{i-1}^n}{\Delta x^2}$$

where $c_i^n = c(x_i, t_n)$. Notice that if the solution is known at every point at one time n, then it is a straightforward calculation to find the solution at every point at the new time n + 1.

If Euler’s method is used for integration, the time step is limited by

$$\Delta t \le \frac{2}{|\lambda|_{\max}}$$

whereas if the Runge-Kutta-Fehlberg method is used, the 2 in the numerator is replaced by 3.0. The largest eigenvalue of $\mathbf{A}$ is bounded by Gerschgorin’s theorem.

$$|\lambda|_{\max} \le \max_{2 < j < n} \sum_{i=2}^{n} |A_{ji}| = \frac{4D}{\Delta x^2}$$

This gives the well-known stability limit

$$\Delta t \le \frac{1}{2}\,\frac{\Delta x^2}{D}$$

The smallest eigenvalue is independent of ∆x (it is $D\pi^2/L^2$), so that the ratio of largest to smallest eigenvalue is proportional to $1/\Delta x^2$. Thus, the problem becomes stiff as ∆x approaches zero. See Eq. (3-79).

The effect of the increased stiffness is that a smaller and smaller time step (∆t) must be taken as the mesh is refined (∆x → 0). At the same time, the number of points is increasing, so that the computation becomes very lengthy. Implicit methods are used to overcome this problem.
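The explicit scheme and its stability limit can be checked directly. This sketch assumes D = 1 and 21 nodes (∆x = 0.05), so the limit is ∆t ≤ ∆x²/(2D) = 0.00125. A step inside the limit approaches the steady profile c = 1 − x, while a step 20% beyond it lets the highest-frequency mode grow without bound.

```python
def explicit_diffusion(D, nodes, dt, t_end):
    """Euler in time, central differences in space: c(0,t)=1, c(1,t)=0, c(x,0)=0."""
    dx = 1.0 / (nodes - 1)
    r = D * dt / dx**2                   # stability requires r <= 1/2
    c = [1.0] + [0.0] * (nodes - 1)
    for _ in range(round(t_end / dt)):
        c = [c[0]] + [c[i] + r * (c[i + 1] - 2.0 * c[i] + c[i - 1])
                      for i in range(1, nodes - 1)] + [c[-1]]
    return c


nodes = 21
stable = explicit_diffusion(1.0, nodes, 0.001, 1.0)      # r = 0.4
unstable = explicit_diffusion(1.0, nodes, 0.0015, 1.0)   # r = 0.6
steady = [1.0 - i / (nodes - 1) for i in range(nodes)]
print(f"r = 0.4: max deviation from 1 - x = {max(abs(u - v) for u, v in zip(stable, steady)):.1e}")
print(f"r = 0.6: max |c| = {max(abs(u) for u in unstable):.1e}")
```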

Write a finite difference form for the time derivative and average the right-hand sides, evaluated at the old and new time.

$$\frac{c_i^{n+1} - c_i^n}{\Delta t} = D(1 - \theta)\,\frac{c_{i+1}^n - 2c_i^n + c_{i-1}^n}{\Delta x^2} + D\theta\,\frac{c_{i+1}^{n+1} - 2c_i^{n+1} + c_{i-1}^{n+1}}{\Delta x^2}$$

Now the equations are of the form

$$-\frac{D\Delta t\,\theta}{\Delta x^2}\,c_{i+1}^{n+1} + \left(1 + 2\,\frac{D\Delta t\,\theta}{\Delta x^2}\right)c_i^{n+1} - \frac{D\Delta t\,\theta}{\Delta x^2}\,c_{i-1}^{n+1} = c_i^n + \frac{D\Delta t(1 - \theta)}{\Delta x^2}\left(c_{i+1}^n - 2c_i^n + c_{i-1}^n\right)$$

and require solving a set of simultaneous equations, which have a tridiagonal structure. Using θ = 0 gives the Euler method (as above), θ = 0.5 gives the Crank-Nicolson method, and θ = 1 gives the backward Euler method. The Crank-Nicolson method is also the same as applying the trapezoid rule to do the integration. The stability limit is given by

$$\frac{D\Delta t}{\Delta x^2} \le \frac{0.5}{1 - 2\theta}$$

which applies for θ < 0.5; for θ ≥ 0.5 the method is stable for any time step.

FIG. 3-50 Computational molecules. h = ∆x = ∆y.

The price of using implicit methods is that one now has a system of equations to solve at each time step, and the solution methods are more complicated (particularly for nonlinear problems) than the straightforward explicit methods. Phenomena that happen quickly can also be obliterated or smoothed over by using a large time step, so implicit methods are not suitable in all cases. The engineer must decide whether to track those fast phenomena and choose an appropriate method that handles the time scales that are important in the problem.
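Each implicit step is one tridiagonal solve. The sketch implements the θ-method equations of this subsection for the same assumed test problem (D = 1, 21 nodes, c(0, t) = 1, c(1, t) = 0), with ∆t = 0.01, eight times the explicit limit. Both Crank-Nicolson (θ = 0.5) and backward Euler (θ = 1) remain stable and approach c = 1 − x; with a step this large, the fast initial transient near x = 0 is resolved only coarsely, which is the trade-off described above.

```python
def thomas(a, b, c, d):
    """Tridiagonal solve (a: sub-, b: main, c: super-diagonal, d: right-hand side)."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def theta_diffusion(D, nodes, dt, t_end, theta):
    """Theta method for dc/dt = D d2c/dx2 with c(0,t) = 1, c(1,t) = 0, c(x,0) = 0."""
    dx = 1.0 / (nodes - 1)
    r = D * dt / dx**2
    interior = nodes - 2
    # Left-hand side: -r*theta c_{i-1} + (1 + 2 r theta) c_i - r*theta c_{i+1}
    a = [0.0] + [-r * theta] * interior + [0.0]
    b = [1.0] + [1.0 + 2.0 * r * theta] * interior + [1.0]
    c_up = [0.0] + [-r * theta] * interior + [0.0]
    c = [1.0] + [0.0] * (nodes - 1)
    for _ in range(round(t_end / dt)):
        rhs = [c[0]] + [c[i] + r * (1.0 - theta) * (c[i + 1] - 2.0 * c[i] + c[i - 1])
                        for i in range(1, nodes - 1)] + [c[-1]]
        c = thomas(a, b, c_up, rhs)
    return c


nodes = 21
steady = [1.0 - i / (nodes - 1) for i in range(nodes)]
for theta in (0.5, 1.0):
    c_end = theta_diffusion(1.0, nodes, 0.01, 1.0, theta)
    dev = max(abs(u - v) for u, v in zip(c_end, steady))
    print(f"theta = {theta}: max deviation from 1 - x at t = 1: {dev:.1e}")
```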

Other methods can be used in space, such as the finite element method, the orthogonal collocation method, or the method of orthogonal collocation on finite elements. One simply combines the methods for ordinary differential equations (see “Ordinary Differential Equations-Boundary Value Problems”) with the methods for initial value problems (see “Numerical Solution of Ordinary Differential Equations as Initial Value Problems”). Fast Fourier transforms can also be used on regular grids (see “Fast Fourier Transform”).

Elliptic Equations Elliptic equations can be solved with both finite difference and finite element methods. One-dimensional elliptic problems are two-point boundary value problems. Two- and three-dimensional elliptic problems are often solved with iterative methods when the finite difference method is used and direct methods when the finite element method is used. So there are two aspects to consider: how the equations are discretized to form sets of algebraic equations and how the algebraic equations are then solved.

The prototype elliptic problem is steady-state heat conduction or diffusion,

$$k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right) = Q$$

possibly with a heat generation term per unit volume, Q. The boundary conditions taken here are T = f(x, y) on the boundary (S) with f a known function. Illustrations are given for constant thermal conductivity k while Q is a known function of position. The finite difference formulation is given using the following nomenclature:

$$T_{i,j} = T(i\Delta x, j\Delta y)$$

The finite difference formulation is then (see Fig. 3-50)

$$k\left(\frac{T_{i+1,j} - 2T_{i,j} + T_{i-1,j}}{\Delta x^2} + \frac{T_{i,j+1} - 2T_{i,j} + T_{i,j-1}}{\Delta y^2}\right) = Q_{i,j} \tag{3-83}$$

$$T_{i,j} = f(x_i, y_j) \quad \text{on } S$$

If the boundary is parallel to a coordinate axis, any derivative is evaluated as in the section on boundary value problems, using a one-sided difference, a centered difference, or a false boundary. If the boundary is more irregular and not parallel to a coordinate line, then more complicated expressions are needed and the finite element method may be the better method.

Equation (3-83) provides a set of linear equations that must be solved. These equations and their boundary conditions may be written in matrix form as

$$ A\,\mathbf{t} = \mathbf{f} $$

where t is the set of temperatures at all the points, f is the set of heat generation terms at all points (together with the known boundary values), and A is formed from the coefficients of $T_{i,j}$ in Eq. (3-83). The solution can be obtained simply by solving the set of linear equations. For three-dimensional problems, the matrix A is sparse, and iterative methods are used. These include Gauss-Seidel, alternating direction, overrelaxation methods, conjugate gradient, and multigrid methods. In Gauss-Seidel methods, one writes the equation for $T_{i,j}$ in terms of the other temperatures and cycles through all the points over and over. In the alternating direction method, one solves along one line (that is, x = constant), keeping the side values fixed, then repeats this for all lines, and then repeats the whole process. Multigrid methods solve the problem on successively refined grids, which has advantages for both convergence and error estimation. Conjugate gradient methods frequently use a preconditioned matrix: the equation is multiplied by another matrix, which is chosen so that the resulting problem is easier to solve than the original one. Finding such matrices is an art, but it can speed convergence. The generalized minimal residual method is described in http://mathworld.wolfram.com/GeneralizedMinimalResidualMethod.html. Additional resources can be found at http://www.netlib.org/linalg/html_templates/Templates.html. When the problem is nonlinear, the iterative methods may not converge, or the mesh may have to be refined before they converge, so some experimentation is sometimes necessary.
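As an illustration of one of the iterative methods named above, here is a minimal sketch of the unpreconditioned conjugate gradient method applied to Eq. (3-83). Assumptions: the unit square with ∆x = ∆y = h = 1/n, Dirichlet values on all four sides, and a matrix A that is never formed explicitly. The helper names (`neg_laplacian`, `cg_poisson`) are hypothetical; the operator is negated so that the system is symmetric positive definite, as conjugate gradients require. The test case T = x² + y² with Q/k = 4 is chosen only because the five-point formula reproduces it exactly.

```python
import math


def neg_laplacian(u, n, h):
    """Apply the negated five-point operator of Eq. (3-83) at interior nodes.

    u is an (n+1) x (n+1) list of lists; boundary entries of the result are 0.
    """
    out = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n):
        for j in range(1, n):
            out[i][j] = (4.0 * u[i][j] - u[i + 1][j] - u[i - 1][j]
                         - u[i][j + 1] - u[i][j - 1]) / h**2
    return out


def dot(a, b, n):
    """Inner product over interior nodes only."""
    return sum(a[i][j] * b[i][j] for i in range(1, n) for j in range(1, n))


def cg_poisson(n, q_over_k, boundary, rtol=1e-12, max_iter=1000):
    """Solve Eq. (3-83) on the unit square (h = 1/n) by conjugate gradients."""
    h = 1.0 / n
    # Split T = T_b + u: T_b holds the boundary values, u vanishes on the boundary.
    tb = [[boundary(i * h, j * h) if i in (0, n) or j in (0, n) else 0.0
           for j in range(n + 1)] for i in range(n + 1)]
    atb = neg_laplacian(tb, n, h)
    # Right-hand side of A u = b, where A is the negated five-point operator.
    b = [[-atb[i][j] - q_over_k(i * h, j * h) if 0 < i < n and 0 < j < n else 0.0
          for j in range(n + 1)] for i in range(n + 1)]
    u = [[0.0] * (n + 1) for _ in range(n + 1)]
    r = [row[:] for row in b]          # residual b - A u for u = 0
    p = [row[:] for row in r]          # first search direction
    rr = dot(r, r, n)
    b_norm = math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= rtol * b_norm:
            break
        ap = neg_laplacian(p, n, h)
        alpha = rr / dot(p, ap, n)     # step length along p
        for i in range(1, n):
            for j in range(1, n):
                u[i][j] += alpha * p[i][j]
                r[i][j] -= alpha * ap[i][j]
        rr_new = dot(r, r, n)
        beta = rr_new / rr             # makes the next direction A-conjugate
        for i in range(1, n):
            for j in range(1, n):
                p[i][j] = r[i][j] + beta * p[i][j]
        rr = rr_new
    return [[tb[i][j] + u[i][j] for j in range(n + 1)] for i in range(n + 1)]
```

For example, `cg_poisson(8, lambda x, y: 4.0, lambda x, y: x**2 + y**2)` returns nodal values equal to x² + y² up to round-off. A preconditioner, as mentioned in the text, would multiply the residual by an approximate inverse of A before each direction update. It is omitted here to keep the sketch short.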

Spreadsheets can be used to solve two-dimensional problems on rectangular grids. The equation for $T_{i,j}$ is obtained by rearranging Eq. (3-83):

$$ 2\left(1 + \frac{\Delta x^2}{\Delta y^2}\right) T_{i,j} = T_{i+1,j} + T_{i-1,j} + \frac{\Delta x^2}{\Delta y^2}\left(T_{i,j+1} + T_{i,j-1}\right) - \Delta x^2\,\frac{Q_{i,j}}{k} $$

This equation is inserted into a cell and copied throughout the space represented by all the cells; when the iteration feature is turned on, the solution is obtained.
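The cell-by-cell iteration amounts to a Gauss-Seidel method based on the rearranged equation. Below is a minimal sketch, assuming a rectangular domain with Dirichlet values on all four sides. Each interior value is overwritten as soon as it is computed, which is roughly what a spreadsheet's iterative recalculation does (the exact recalculation order of any particular spreadsheet program is an assumption here). The function name and arguments are hypothetical.

```python
def gauss_seidel_poisson(nx, ny, lx, ly, q_over_k, boundary,
                         tol=1e-12, max_sweeps=20000):
    """Gauss-Seidel sweeps using the rearranged form of Eq. (3-83).

    Returns (T, sweeps), where T[i][j] approximates T(i*dx, j*dy).
    """
    dx, dy = lx / nx, ly / ny
    ratio = (dx / dy) ** 2                      # the factor dx^2 / dy^2
    T = [[0.0] * (ny + 1) for _ in range(nx + 1)]
    for i in range(nx + 1):                     # Dirichlet values T = f(x, y) on S
        for j in range(ny + 1):
            if i in (0, nx) or j in (0, ny):
                T[i][j] = boundary(i * dx, j * dy)
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(1, nx):
            for j in range(1, ny):
                new = (T[i + 1][j] + T[i - 1][j]
                       + ratio * (T[i][j + 1] + T[i][j - 1])
                       - dx**2 * q_over_k(i * dx, j * dy)) / (2.0 * (1.0 + ratio))
                max_change = max(max_change, abs(new - T[i][j]))
                T[i][j] = new                   # new value is used immediately
        if max_change < tol:
            return T, sweep
    raise RuntimeError("Gauss-Seidel did not converge")
```

With ∆x ≠ ∆y (for example, an 8 × 6 grid on a 1.0 × 1.5 rectangle), the factor ∆x²/∆y² in the rearranged equation is exercised directly. The stopping test on the largest change per sweep is one simple convergence criterion among several.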

The Galerkin finite element method (FEM) is useful for solving elliptic problems and is particularly effective when the domain or geometry is irregular. As an example, cover the domain with triangles and define a trial function on each triangle. The trial function takes the value 1.0 at one corner and 0.0 at the other corners and is linear in between. See Fig. 3-51. These trial functions on each triangle are pieced together to give a trial function on the whole domain. General treatments of the finite element method are available (see references). The steps in the solution method are similar to those described for boundary value problems, except that now the problems are much bigger, so the numerical analysis must be done very carefully to be efficient. Most engineers, though, just use an existing finite element program rather than writing one. There are three major caveats that must be addressed. The first is that the solution depends on the mesh laid down, and the only way to assess the accuracy of the solution is to solve the problem again with a more refined mesh. The second is that the solution obeys the shape of the trial function inside the element. Thus, if linear functions are used on triangles, a three-dimensional view of the solution, plotting the solution versus x and y, consists of a series of triangular planes joined together at the edges, as in a geodesic dome. The third caveat is that the Galerkin finite element method is applied to both the differential equations and the boundary conditions. Computer programs are usually quite general and may allow the user to specify boundary conditions that are not realistic. Also, the natural boundary condition (one involving derivatives) is satisfied automatically at any boundary node where no other boundary condition is set. Thus, the user of finite element codes must be very clear about which boundary conditions and differential equations are built into the computer code. When the problem is nonlinear, the Newton-Raphson method is used to iterate from an initial guess. Nonlinear problems lead to complicated integrals, which are usually evaluated using Gaussian quadrature.
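The linear trial functions of Fig. 3-51 can be evaluated with a short sketch. Assumptions: a single triangle given by its three vertices, and hypothetical function names. The three functions are the barycentric coordinates of the point, computed from signed areas. Each is 1 at its own vertex and 0 at the other two, and together they reproduce any linear field exactly, which is why the solution inside each element is a flat plane.

```python
def linear_trial_functions(vertices, x, y):
    """Evaluate the three linear trial functions of one triangle at (x, y).

    Function m equals 1 at vertex m, 0 at the other two vertices, and varies
    linearly in between (these are the barycentric coordinates of the point).
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)   # twice the signed area
    if area2 == 0:
        raise ValueError("degenerate triangle")
    n1 = ((x2 - x) * (y3 - y) - (x3 - x) * (y2 - y)) / area2
    n2 = ((x3 - x) * (y1 - y) - (x1 - x) * (y3 - y)) / area2
    n3 = ((x1 - x) * (y2 - y) - (x2 - x) * (y1 - y)) / area2
    return n1, n2, n3


def interpolate(vertices, nodal_values, x, y):
    """Piecewise-linear approximation inside one element: sum of T_m * N_m(x, y)."""
    return sum(t * nm for t, nm in zip(nodal_values,
                                       linear_trial_functions(vertices, x, y)))
```

Assembling the global Galerkin equations would additionally require the gradients of these functions (constant on each triangle) and a loop over all elements. That step is not shown here.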

One nice feature of the finite element method is the use of natural boundary conditions. It may be possible to solve the problem on a domain that is shorter than needed to reach some limiting condition (such as at an outflow boundary). The externally applied flux is still applied at the shorter domain, and the solution inside the truncated domain is still valid. Examples are given in Chang, M. W., and B. A. Finlayson, Int. J. Num. Methods Eng. 15, 935–942 (1980), and Finlayson, B. A. (1992). The effect of this is to allow solutions in domains that are smaller, thus saving computation time and permitting the solution in semi-infinite domains.

The trial functions in the finite element method are not limited to linear ones. Quadratic functions and even higher-order functions are frequently used. The same considerations hold as for boundary value problems: the higher-order trial functions converge faster, but require more work. It is possible to refine both the mesh size h and the polynomial degree p of the trial functions in an hp method. Some problems have constraints on some of the variables. For flow problems, the pressure must usually be approximated by a trial function that is one order lower than the polynomial used to approximate the velocity.

Hyperbolic Equations The most common situation yielding hyperbolic equations involves unsteady phenomena with convection. Two typical equations are the convective diffusive equation

$$ \frac{\partial c}{\partial t} + u\,\frac{\partial c}{\partial x} = D\,\frac{\partial^2 c}{\partial x^2} $$

and the chromatography equation.


(See “Partial Differential Equations.”) If the diffusion coefficient is zero, the convective diffusion equation is hyperbolic. If D is small, the phenomenon may be essentially hyperbolic, even though the equations are parabolic. Thus the numerical methods for hyperbolic equations may be useful even for parabolic equations.

Equations for several methods are given here, as taken from the book by Finlayson [Finlayson, B. A. (1992)]. If the convective term is treated with a centered difference expression, the solution exhibits oscillations from node to node, and these only go away if a very fine grid is used. The simplest way to avoid the oscillations with a hyperbolic equation is to use upstream derivatives. If the flow is from left to right, this would give

$$ \frac{dc_i}{dt} + u\,\frac{c_i - c_{i-1}}{\Delta x} = D\,\frac{c_{i+1} - 2c_i + c_{i-1}}{\Delta x^2} $$

The effect of using upstream derivatives is to add artificial or numerical diffusion to the model. This can be seen by rearranging the upstream finite difference equation into a centered form:

$$ \frac{dc_i}{dt} + u\,\frac{c_{i+1} - c_{i-1}}{2\Delta x} = \left(D + \frac{u\,\Delta x}{2}\right)\frac{c_{i+1} - 2c_i + c_{i-1}}{\Delta x^2} $$

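Thus the diffusion coefficient has been changed from

$$ D \quad \text{to} \quad D + \frac{u\,\Delta x}{2} $$

Alternatively, the diffusion coefficient has been multiplied by the factor

$$ D' = D\left(1 + \frac{u\,\Delta x}{2D}\right) = D\left(1 + \frac{Pe_{\text{cell}}}{2}\right) $$

where

$$ Pe_{\text{cell}} = \frac{u\,\Delta x}{D} = \frac{uL}{D}\,\frac{\Delta x}{L} = Pe\,\frac{\Delta x}{L} $$

is called the cell Peclet number.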

When the diffusion coefficient is very small (or diffusion is slow compared with convection), the Peclet number will be large. In that case, extraneous diffusion will be included in the solution unless the mesh size ∆x is small compared with the characteristic length of the problem. To avoid this problem (by keeping the term Pe_cell/2 small), very fine meshes must be used, and the smaller the diffusion coefficient, the smaller the required mesh size.
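Both effects, oscillations from centered differences and extra diffusion from upstream differences, can be seen in a small sketch. Assumptions: steady state, constant u and D, c = 0 at x = 0 and c = 1 at x = 1, a uniform mesh, and hypothetical function names. The tridiagonal systems are solved with the Thomas algorithm. At Pe_cell = 4, the centered scheme produces negative concentrations that alternate in sign from node to node, while the upstream scheme stays monotone at the cost of the added diffusion u∆x/2.

```python
def thomas(sub, diag, sup, rhs):
    """Solve a tridiagonal system; sub[0] and sup[-1] are not used."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def steady_profile(pe_cell, n_interior, scheme):
    """Steady u dc/dx = D d2c/dx2 with c(0) = 0, c(1) = 1; returns all nodal values.

    Coefficients come from multiplying each difference equation by dx^2 / D.
    """
    if scheme == "central":
        lo, mid, hi = 1.0 + pe_cell / 2.0, -2.0, 1.0 - pe_cell / 2.0
    elif scheme == "upstream":
        lo, mid, hi = 1.0 + pe_cell, -(2.0 + pe_cell), 1.0
    else:
        raise ValueError("scheme must be 'central' or 'upstream'")
    rhs = [0.0] * n_interior
    rhs[-1] = -hi * 1.0        # known c = 1 at x = 1 moved to the right side
    interior = thomas([lo] * n_interior, [mid] * n_interior, [hi] * n_interior, rhs)
    return [0.0] + interior + [1.0]


def effective_diffusivity(D, u, dx):
    """D' = D (1 + Pe_cell / 2) introduced by upstream differencing."""
    return D * (1.0 + (u * dx / D) / 2.0)
```

With nine interior nodes (∆x = 0.1), u = 1, and D = 0.025, Pe_cell = 4. The centered solution then behaves like (−3)^i, while the upstream scheme behaves as if the diffusivity were 0.075 instead of 0.025.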

A variety of other methods are used to obtain a good solution without using extremely fine meshes. The flux correction methods keep track of the flux of material into and out of a cell (from one node to another) and put limits on the flux to make sure that no more material leaves the cell than is there originally plus the input amount. See Finlayson, ibid., for many examples.

FIG. 3-51 Trial functions for Galerkin finite element method: linear polynomial on triangle. [Figure not reproduced: surface plots over 0 ≤ x, y ≤ 1 of a trial function with values from 0 to 1.]

All the methods have a limit on the time step that is set by the convection term. Essentially, the time step should not be so large that material is carried farther than one mesh spacing ∆x in a single step. This is usually expressed as a Courant number limitation.

$$ Co = \frac{u\,\Delta t}{\Delta x} \le 1 $$

Some methods require a smaller limit, depending upon the amount of diffusion present (see Finlayson, ibid., Appendix).
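The way diffusion tightens the limit can be illustrated for one specific scheme. The sketch below assumes explicit Euler time stepping with upstream convection and centered diffusion; this is a choice made for the example, not a method prescribed by the text. For that scheme, each new nodal value is a nonnegative weighted average of old values when Co + 2D∆t/∆x² ≤ 1, so the solution cannot overshoot. The function names, the inlet condition, and the zero-gradient outflow condition are all hypothetical.

```python
def max_stable_dt(u, D, dx):
    """Largest dt with Co + 2*D*dt/dx**2 <= 1 for explicit upstream differencing."""
    return 1.0 / (u / dx + 2.0 * D / dx**2)


def upstream_step(c, u, D, dx, dt, c_inlet):
    """One explicit Euler step of dc/dt + u dc/dx = D d2c/dx2 (flow left to right)."""
    co = u * dt / dx                 # Courant number
    d = D * dt / dx**2               # diffusion number
    new = c[:]
    new[0] = c_inlet                 # Dirichlet inlet value
    for i in range(1, len(c) - 1):
        new[i] = (c[i] - co * (c[i] - c[i - 1])
                  + d * (c[i + 1] - 2.0 * c[i] + c[i - 1]))
    new[-1] = new[-2]                # zero-gradient outflow
    return new
```

For u = 1, D = 0.01, and ∆x = 0.05, this gives Co ≈ 0.71 rather than 1, because the diffusion number uses up the rest of the allowance. Above this ∆t, the weights are no longer all nonnegative, and the monotonicity argument no longer applies.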

In the finite element method, Petrov-Galerkin methods are used to minimize the unphysical oscillations. The Petrov-Galerkin method essentially adds a small amount of diffusion in the flow direction to smooth the unphysical oscillations. The amount of diffusion is usually proportional to ∆x, so that it becomes negligible as the mesh size is reduced. The value of the Petrov-Galerkin method lies in being able to obtain a smooth solution when the mesh size is large, so that the computation is feasible. This is not so crucial in one-dimensional problems, but it is essential in two- and three-dimensional problems and purely hyperbolic problems.

Finite Volume Methods Finite volume methods are utilized extensively in computational fluid dynamics. In this method, a mass balance is made over a cell, accounting for the change in what is in the cell and for the flow in and out. Figure 3-52 illustrates the geometry of the ith cell. A mass balance made on this cell (with area A perpendicular to the paper) is

$$ A\,\Delta x\left(c_i^{n+1} - c_i^{n}\right) = \Delta t\,A\left(J_{i-1/2} - J_{i+1/2}\right) $$

where J is the flux due to convection and diffusion, positive in the +x direction.

$$ J = uc - D\,\frac{\partial c}{\partial x}, \qquad J_{i-1/2} = u_{i-1/2}\,c_{i-1/2} - D\,\frac{c_i - c_{i-1}}{\Delta x} $$

The concentration at the edge of the cell is taken as

$$ c_{i-1/2} = \tfrac{1}{2}\left(c_i + c_{i-1}\right) $$

Rearrangement for the case when the velocity u is the same for all nodes gives

$$ \frac{c_i^{n+1} - c_i^{n}}{\Delta t} = -u\,\frac{c_{i+1} - c_{i-1}}{2\Delta x} + \frac{D}{\Delta x^2}\left(c_{i+1} - 2c_i + c_{i-1}\right) $$

This is the same equation obtained by using the finite difference method. This is not always true, but the finite volume equations are easy to derive. In two and three dimensions, the mesh need not be rectangular, as long as it is possible to compute the velocity normal to an edge of the cell. The finite volume method is useful for applications involving filling, such as injection molding, when only part of the cell is filled with fluid. Such applications do involve some approximations, since the interface is not tracked precisely, but they are useful engineering approximations.
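The conservation property of the finite volume form can be checked with a short sketch. Assumptions: a periodic one-dimensional domain (so no material crosses the ends), constant u, explicit Euler time stepping, the averaged face concentration given above, and a hypothetical function name. Each face flux is subtracted from one cell and added to its neighbor, so the total amount of material is unchanged up to round-off. Each cell update also matches the finite difference formula above.

```python
def fv_step(c, u, D, dx, dt):
    """One explicit finite volume step on a periodic grid of equal cells.

    face[i] holds J at the face i-1/2 between cell i-1 and cell i.
    """
    n = len(c)
    face = []
    for i in range(n):
        left = c[i - 1]                          # i = 0 wraps to the last cell
        c_face = 0.5 * (c[i] + left)             # c_{i-1/2} = (c_i + c_{i-1}) / 2
        face.append(u * c_face - D * (c[i] - left) / dx)
    # Accumulation in cell i = flux in at face i-1/2 minus flux out at face i+1/2.
    return [c[i] + dt / dx * (face[i] - face[(i + 1) % n]) for i in range(n)]
```

On a nonuniform or multidimensional mesh, the same bookkeeping applies face by face, which is why the text notes that the mesh need not be rectangular.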

Parabolic Equations in Two or Three Dimensions Computations become much more lengthy when there are two or more spatial dimensions. For example, we may have the unsteady heat conduction equation

$$ \rho C_p\,\frac{\partial T}{\partial t} = k\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right) - Q $$

Most engineers use computer packages to solve such problems. If there is both convection and diffusion in the problem, the same considerations apply: a fine mesh is needed when the Peclet number is large. The upstream weighting and Petrov-Galerkin methods can be used, but it is important to apply the smoothing only in the direction of flow, since smoothing in the direction transverse to the flow would be incorrect. Some transverse smoothing is unavoidable, but the engineer needs to be sure that the smoothing is just enough to allow a good solution without creating large errors.

Computer Software When you are choosing computer software to solve your problem, there are a number of important considerations. The first decision is whether to use an approximate engineering flow model, developed from correlations, or to solve the partial differential equations that govern the problem. Correlations are quick and easy to apply, but they may not be appropriate to your problem or give the needed detail. When you are using a computer package to solve partial differential equations, the first task is always to generate a mesh covering the problem domain. This is not a trivial task, and special methods have been developed to permit importation of a geometry from a computer-aided design (CAD) program. Then the mesh must be created automatically. If the boundary is irregular, the finite element method is especially well suited, although special embedding techniques can be used in finite difference methods (which are designed to be solved on rectangular meshes). Another capability to consider is the ability to track free surfaces that move during the computation. This phenomenon introduces the same complexity that occurs in problems with a large Peclet number, with the added difficulty that the free surface moves between mesh points and improper representation can lead to unphysical oscillations. The method used to solve the equations is important, and both explicit and implicit methods (as described above) can be used. Implicit methods may introduce unacceptable extra diffusion, so the engineer needs to examine the solution carefully. The methods used to smooth unphysical oscillations from node to node are also important, and the engineer needs to verify that the added diffusion or smoothing does not give inaccurate solutions. Since current-day problems are mostly nonlinear, convergence is always an issue, because the problems are solved iteratively. Robust programs provide several methods for convergence, each of which is best in some circumstance or other. It is wise to have a program that includes many iterative methods. If the iterative solver is not very robust, the only recourse for solving a steady-state problem may be to integrate the time-dependent problem to steady state. The solution time may be long, and the final result may be further from convergence than would be the case if a robust iterative solver were used.

A variety of computer programs are available on the Internet, some free. First consider general-purpose programs. On the NIST web page http://gams.nist.gov/, choose “problem decision tree,” then “differential and integral equations,” and then “partial differential equations.” The programs are organized by type of problem (elliptic, parabolic, and hyperbolic) and by the number of spatial dimensions (one or more than one). On the Netlib web site http://www.netlib.org/, search on “partial differential equation.” Lau (1994, 2004) provides many programs in C++ (also see http://www.nr.com/). The multiphysics program Comsol Multiphysics (formerly FEMLAB) also solves many standard equations arising in mathematical physics.

Computational fluid dynamics (CFD) programs are more specialized, and most have been designed to solve sets of equations that are appropriate to specific industries. They can then include approximations and correlations for some features that would be difficult to solve for directly. Four major packages widely used are Fluent (http://www.fluent.com/), CFX (now part of ANSYS), Comsol Multiphysics (formerly FEMLAB) (http://www.comsol.com/), and ANSYS (http://www.ansys.com/). Of these, Comsol Multiphysics is particularly useful because it has a convenient graphical user interface, permits easy mesh generation and refinement (including adaptive mesh refinement), allows the user to add phenomena and equations easily, permits solution by continuation methods (thus enhancing convergence), and has extensive graphical output capabilities. Other packages are also available (see http://cfd-online.com/), and these may contain features and correlations specific to the engineer’s industry. One important point to note is that for turbulent flow, all the programs contain approximations, using the k-epsilon models of turbulence or large eddy simulations; the direct numerical simulation of turbulence is too slow to apply to very big problems, although it does give insight (independent of any approximations) that is useful for interpreting turbulent phenomena. Thus, the method used to include those turbulent correlations is important, and the method also may affect convergence or accuracy.

FIG. 3-52 Nomenclature for finite volume method. [Figure not reproduced: the (i − 1)st and ith cells, each of width ∆x, with nodes i − 1 and i and cell faces at i − 1/2 and i + 1/2.]

FAST FOURIER TRANSFORM

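The Fourier transform and inverse transform are

$$ Y(\omega) = \int_{-\infty}^{\infty} y(t)\,e^{i\omega t}\,dt $$

$$ y(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} Y(\omega)\,e^{-i\omega t}\,d\omega $$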

Suppose a signal y(t) is sampled at equal intervals

$$ y_n = y(n\Delta), \qquad n = \ldots, -2, -1, 0, 1, 2, \ldots $$

where ∆ is the sampling interval (e.g., the time between samples).

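The Nyquist critical frequency or critical angular frequency is

$$ f_c = \frac{1}{2\Delta}, \qquad \omega_c = \frac{\pi}{\Delta} $$

If a function y(t) is bandwidth-limited to frequencies smaller than $f_c$, such that

$$ Y(\omega) = 0 \quad \text{for } |\omega| \ge \omega_c $$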

then the function is completely determined by its samples $y_n$. Thus, the entire information content of a signal can be recorded by sampling at a rate $\Delta^{-1} = 2f_c$. If the function is not bandwidth-limited, then aliasing occurs: once a sampling interval ∆ is chosen, information corresponding to frequencies greater than $f_c$ is simply folded (aliased) into the range $-f_c$ to $f_c$. The way to detect this in a Fourier transform is to see if the transform approaches zero at $f_c$; if not, aliasing has occurred, and a higher sampling rate is needed.
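A small numerical illustration of aliasing follows. Assumptions: a pure sinusoid and hypothetical helper names. Sampled every ∆ = 0.1 s (so f_c = 5 Hz), a 7-Hz sine gives exactly the same samples as a −3-Hz sine, because the two frequencies differ by the sampling rate 1/∆ = 10 Hz.

```python
import math


def alias_frequency(f, delta):
    """Apparent frequency of a sinusoid of frequency f sampled at interval delta.

    The result lies in the Nyquist band [-fc, fc], with fc = 1 / (2 delta).
    """
    fs = 1.0 / delta                       # sampling rate
    return f - fs * round(f / fs)          # shift by a multiple of fs into the band


def samples(f, delta, n):
    """n equally spaced samples of sin(2 pi f t)."""
    return [math.sin(2.0 * math.pi * f * k * delta) for k in range(n)]
```

Once sampled, nothing distinguishes the 7-Hz signal from the −3-Hz one. This is why the text recommends checking that the transform approaches zero near f_c.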

Next, suppose we have N samples, where N is even,

$$ y_k = y(t_k), \qquad t_k = k\Delta, \qquad k = 0, 1, 2, \ldots, N-1 $$

and the sampling interval is ∆. With only N values $y_k$, it is not possible to determine the complete Fourier transform Y(ω). We calculate the value $Y(\omega_n)$ at the discrete points

$$ \omega_n = \frac{2\pi n}{N\Delta}, \qquad n = -\frac{N}{2}, \ldots, 0, \ldots, \frac{N}{2} $$

$$ Y_n = \sum_{k=0}^{N-1} y_k\,e^{2\pi i k n/N}, \qquad Y(\omega_n) \approx \Delta\,Y_n $$

The discrete inverse Fourier transform is

$$ y_k = \frac{1}{N}\sum_{n=0}^{N-1} Y_n\,e^{-2\pi i k n/N} $$

The fast Fourier transform (FFT) is used to calculate the Fourier transform as well as the inverse Fourier transform. A discrete Fourier transform of length N can be written as the sum of two discrete Fourier transforms, each of length N/2.

$$ Y_k = Y_k^{e} + W^{k}\,Y_k^{o} $$

Here $Y_k$ is the kth component of the Fourier transform of y, and $Y_k^e$ is the kth component of the Fourier transform of the even components of $y_j$ and is of length N/2. Similarly, $Y_k^o$ is the kth component of the Fourier transform of the odd components of $y_j$ and is of length N/2. W is a constant, which is taken to the kth power.

$$ W = e^{2\pi i/N} $$

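Since $Y_k$ has N components, while $Y_k^e$ and $Y_k^o$ have N/2 components, $Y_k^e$ and $Y_k^o$ are repeated once to give N components in the calculation of $Y_k$. This decomposition can be used recursively. Thus, $Y_k^e$ is split into even and odd terms of length N/4:

$$ Y_k^{e} = Y_k^{ee} + W^{2k}\,Y_k^{eo} $$

$$ Y_k^{o} = Y_k^{oe} + W^{2k}\,Y_k^{oo} $$

(These transforms have length N/2, so their factor is $e^{2\pi i k/(N/2)} = W^{2k}$.)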

This process is continued until there is only one component. For this reason, the number N is taken as a power of 2. The vector $y_j$ is padded with zeros, if need be, to make $N = 2^p$ for some p. The standard Fourier transform takes $N^2$ operations to calculate, whereas the fast Fourier transform takes only $N \log_2 N$. For large N, the difference is significant: at N = 100 it is a factor of 15, and at N = 1000 it is a factor of 100.
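The even/odd splitting can be written directly as a recursive radix-2 FFT. This is a minimal sketch (function names hypothetical, standard library only) that uses the sign convention of this section, $Y_n = \sum_k y_k e^{2\pi i k n/N}$. It pads with zeros to a power of 2 and is checked against the direct $N^2$ sum. Production libraries use iterative, cache-friendly variants, and their sign and normalization conventions may differ from the one used here.

```python
import cmath


def dft(y, sign=+1):
    """Direct O(N^2) sum: Y_n = sum_k y_k exp(sign * 2 pi i k n / N)."""
    n = len(y)
    return [sum(y[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def fft(y, sign=+1):
    """Recursive radix-2 FFT: Y_k = Y_k^e + W^k Y_k^o with W = exp(sign 2 pi i / N)."""
    n = len(y)
    if n == 1:
        return [complex(y[0])]
    if n % 2:
        raise ValueError("length must be a power of 2; pad with zeros first")
    even = fft(y[0::2], sign)          # transform of the even-indexed samples
    odd = fft(y[1::2], sign)           # transform of the odd-indexed samples
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]   # W^k Y_k^o
        out[k] = even[k] + t
        out[k + half] = even[k] - t    # Y^e, Y^o repeat and W^(k + N/2) = -W^k
    return out


def inverse_fft(Y):
    """y_k = (1/N) sum_n Y_n exp(-2 pi i k n / N)."""
    return [v / len(Y) for v in fft(Y, sign=-1)]


def pad_to_power_of_two(y):
    """Append zeros so that the length is 2^p."""
    n = 1
    while n < len(y):
        n *= 2
    return list(y) + [0.0] * (n - len(y))
```

Each level of the recursion does O(N) work, and there are log₂ N levels. This is the source of the N log₂ N operation count quoted above.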

The discrete Fourier transform can also be used for differentiating a function, and this is used in the spectral method for solving differential equations [Gottlieb, D., and S. A. Orszag, Numerical Analysis of Spectral Methods: Theory and Applications, SIAM, Philadelphia (1977); Trefethen, L. N., Spectral Methods in Matlab, SIAM, Philadelphia (2000)]. Suppose we have a grid of equidistant points

$$ x_n = n\,\Delta x, \qquad n = 0, 1, 2, \ldots, 2N-1, \qquad \Delta x = \frac{L}{2N} $$

The solution is known at each of these grid points, $y(x_n)$. First the discrete Fourier transform is taken:

$$ Y_k = \frac{1}{2N}\sum_{n=0}^{2N-1} y(x_n)\,e^{-2ik\pi x_n/L}, \qquad k = -N, -N+1, \ldots, 0, \ldots, N-1, N $$

The inverse transformation is

$$ y(x) = \sum_{k=-N}^{N} Y_k\,e^{2ik\pi x/L} $$

Differentiate this to get

$$ \frac{dy}{dx} = \sum_{k=-N}^{N} Y_k\,\frac{2\pi i k}{L}\,e^{2ik\pi x/L} $$

Thus at the grid points

$$ \left.\frac{dy}{dx}\right|_n = \sum_{k=-N}^{N} Y_k\,\frac{2\pi i k}{L}\,e^{2ik\pi x_n/L} $$

The process works as follows. From the solution at all grid points, the Fourier transform $Y_k$ is obtained using the FFT. This is then multiplied by $2\pi i k/L$ to obtain the Fourier transform of the derivative:

$$ Y_k' = \frac{2\pi i k}{L}\,Y_k $$

Then the inverse Fourier transform is taken using the FFT, giving the value of the derivative at each of the grid points:

$$ \left.\frac{dy}{dx}\right|_n = \sum_{k=-N}^{N} Y_k'\,e^{2ik\pi x_n/L} $$
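The three steps (transform, multiply by 2πik/L, inverse transform) can be sketched as follows. Assumptions: a smooth periodic function sampled at 2N points on [0, L) and a hypothetical function name. For brevity, the transforms are written as direct sums; an FFT gives the same numbers faster. The unpaired k = −N coefficient is set to zero before differentiation. This is a common convention, but it is an assumption here, not part of the text.

```python
import cmath
import math


def spectral_derivative(y, length):
    """dy/dx at the 2N grid points x_n = n L / (2N) of a periodic function."""
    m = len(y)                               # m = 2N samples
    half = m // 2                            # N
    ks = range(-half, half)                  # k = -N, ..., N - 1
    # Forward transform with the 1/(2N) factor, so y(x) = sum_k Y_k exp(2 pi i k x / L)
    Y = {k: sum(y[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m)) / m
         for k in ks}
    # Multiply by 2 pi i k / L; the unpaired k = -N mode is dropped (an assumption)
    dY = {k: (0.0 if k == -half else 2j * math.pi * k / length * Y[k]) for k in ks}
    # Inverse transform at the grid points; the imaginary part is only round-off
    return [sum(dY[k] * cmath.exp(2j * math.pi * k * n / m) for k in ks).real
            for n in range(m)]
```

For a function containing only wavenumbers with |k| < N, such as sin(2πx/L) + 0.5 cos(6πx/L) on 16 points, the result agrees with the exact derivative to round-off. This is the spectral accuracy that makes the method attractive for smooth periodic problems.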

The spectral method is used for direct numerical simulation (DNS) of turbulence. The Fourier transform is taken of the differential equation, and the resulting equation is solved. Then the inverse transformation gives the solution. When there are nonlinear terms, they are calculated at each node in physical space, and the Fourier transform is taken of the result. This technique is especially suited to time-dependent problems, and the major computational effort is in the fast Fourier transform.


OPTIMIZATION

REFERENCES: General references include the following textbooks. For nonlinear programming, Fletcher, R., Practical Methods of Optimization, Wiley (1987); Nocedal, J., and S. J. Wright, Numerical Optimization, Springer, New York (1999); Conn, A. R., N. Gould, and P. Toint, Trust Region Methods, SIAM, Philadelphia (2000); Edgar, T. F., D. M. Himmelblau, and L. S. Lasdon, Optimization of Chemical Processes, McGraw-Hill (2002). For linear programming, Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton, N.J. (1963); Hillier, F., and G. J. Lieberman, Introduction to Operations Research, Holden-Day, San Francisco (1974). For mixed integer programming, Biegler, L. T., I. E. Grossmann, and A. W. Westerberg, Systematic Methods for Chemical Process Design, Prentice-Hall, Englewood Cliffs, N.J. (1997); Nemhauser, G. L., and L. A. Wolsey, Integer and Combinatorial Optimization, Wiley-Interscience, New York (1988). For global optimization, Floudas, C. A., Deterministic Global Optimization: Theory, Algorithms and Applications, Kluwer Academic Publishers (2000); Horst, R., and H. Tuy, Global Optimization: Deterministic Approaches, Springer-Verlag, Berlin (1993); Tawarmalani, M., and N. Sahinidis, Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications, Kluwer Academic Publishers (2002). Many useful resources including descriptions, trial software, and examples can be found on the NEOS server (http://www-neos.mcs.anl.gov) maintained at Argonne National Laboratory. Background material for this section includes the two previous sections on matrix algebra and numerical analysis.

INTRODUCTION

Optimization is a key enabling tool for decision making in chemical engineering. It has evolved from a methodology of academic interest into a technology that continues to have a significant impact on engineering research and practice. Optimization algorithms form the core tools for (1) experimental design, parameter estimation, model development, and statistical analysis; (2) process synthesis, analysis, design, and retrofit; (3) model predictive control and real-time optimization; and (4) planning, scheduling, and the integration of process operations into the supply chain.

As shown in Fig. 3-53, optimization problems that arise in chemical engineering can be classified in terms of continuous and discrete variables. For the former, nonlinear programming (NLP) problems form the most general case, and widely applied specializations include linear programming (LP) and quadratic programming (QP). An important distinction for NLP is whether the optimization problem is convex or nonconvex. A nonconvex NLP problem may have multiple local optima, and an important question is whether a global solution is required for the NLP. Another important distinction is whether the problem is assumed to be differentiable or not.

FIG. 3-53 Classes of optimization problems and algorithms. [Figure not reproduced: a classification chart with the entries Optimization; Mixed Integer (Discrete); NLP (Continuous); MINLP; MILP; IP; Differentiable; Nondifferentiable; Convex; Nonconvex; LP; QP; Local; Global; Direct Search.]

Mixed integer problems also include discrete variables. These can be written as mixed integer nonlinear programs (MINLP), or as mixed integer linear programs (MILP) if all variables appear linearly in the constraint and objective functions. For the latter, an important case occurs when all the variables are integer; this gives rise to an integer programming (IP) problem. IP problems can be further classified into many special problems (e.g., assignment, traveling salesperson, etc.), which are not shown in Fig. 3-53. Similarly, the MINLP problem also gives rise to special problem classes, although here the main distinction is whether its relaxation is convex or nonconvex.

The ingredients of formulating optimization problems include a mathematical model of the system, an objective function that quantifies a criterion to be extremized, variables that can serve as decisions, and, optionally, inequality constraints on the system. When represented in algebraic form, the general formulation of discrete/continuous optimization problems can be written as the following mixed integer optimization problem:

$$ \begin{aligned} \min\ & f(x, y) \\ \text{subject to}\ & h(x, y) = 0 \\ & g(x, y) \le 0 \\ & x \in \Re^n,\ y \in \{0, 1\}^t \end{aligned} \qquad (3\text{-}84) $$

where f(x, y) is the objective function (e.g., cost, energy consumption, etc.), h(x, y) = 0 are the equations that describe the performance of the system (e.g., material balances, production rates), and the inequality constraints g(x, y) ≤ 0 can define process specifications or constraints for feasible plans and schedules. Note that the operator max f(x) is equivalent to min [−f(x)]. We define the real n vector x to represent the continuous variables, while the t vector y represents the discrete variables, which, without loss of generality, are often restricted to take values of 0 or 1 to define logical or discrete decisions, such as assignment of equipment and sequencing of tasks. (These variables can also be formulated to take on other integer values.) Problem (3-84) corresponds to a mixed integer nonlinear program when any of the functions involved are nonlinear. If all functions are linear, it corresponds to a mixed integer linear program. If there are no 0–1 variables, then problem (3-84) reduces to a nonlinear program (3-85) or linear program (3-97), depending on whether the functions are linear.

We start with continuous variable optimization and consider in the next section the solution of NLP problems with differentiable objective and constraint functions. If only local solutions are required for the NLP problem, then very efficient large-scale methods can be considered. This is followed by methods that are not based on local optimality criteria; we consider direct search optimization methods that do not require derivatives, as well as deterministic global optimization methods. Following this, we consider the solution of mixed integer problems and outline the main characteristics of algorithms for their solution. Finally, we conclude with a discussion of optimization modeling software and its implementation on engineering models.

GRADIENT-BASED NONLINEAR PROGRAMMING

For continuous variable optimization we consider (3-84) without the discrete variables y. The general NLP problem (3-85) is presented here:

$$ \begin{aligned} \min\ & f(x) \\ \text{subject to}\ & h(x) = 0 \\ & g(x) \le 0 \end{aligned} \qquad (3\text{-}85) $$

and we assume that the functions f(x), h(x), and g(x) have continuous first and second derivatives. A key characteristic of (3-85) is whether the problem is convex or not, i.e., whether it has a convex objective function and a convex feasible region. A function φ(x) of x in some domain X is convex if and only if for all points $x_1, x_2 \in X$

$$ \phi[\alpha x_1 + (1 - \alpha)x_2] \le \alpha\,\phi(x_1) + (1 - \alpha)\,\phi(x_2) \qquad (3\text{-}86) $$

holds for all α ∈ (0, 1). [Strict convexity requires that the inequality (3-86) be strict for $x_1 \ne x_2$.] Convex feasible regions require g(x) to be a convex function and h(x) to be linear. If (3-85) is a convex problem, then any local solution is guaranteed to be a global solution to (3-85). Moreover, if the objective function is strictly convex, then this solution x* is unique. On the other hand, nonconvex problems may have multiple local solutions, i.e., feasible solutions that minimize the objective function within some neighborhood about the solution.

We consider first methods that find only local solutions to nonconvex problems, as more difficult (and expensive) search procedures are required to find a global solution. Local methods are currently very efficient and have been developed to deal with very large NLP problems. Moreover, by considering the structure of convex NLP problems (including LP and QP problems), even more powerful methods can be applied. To study these methods, we first consider conditions for local optimality.

Local Optimality Conditions: A Kinematic Interpretation Instead of a formal development of conditions that define a local optimum, we present a more intuitive kinematic illustration. Consider the contour plot of the objective function f(x), given in Fig. 3-54, as a smooth valley in the space of the variables $x_1$ and $x_2$. For the contour plot of this unconstrained problem Min f(x), consider a ball rolling in this valley to the lowest point of f(x), denoted by x*. This point is at least a local minimum and is defined by a point with a zero gradient and at least nonnegative curvature in all (nonzero) directions p. We use the first-derivative (gradient) vector ∇f(x) and second-derivative (Hessian) matrix $\nabla_{xx} f(x)$ to state the necessary first- and second-order conditions for unconstrained optimality:

$$ \nabla_x f(x^*) = 0, \qquad p^T \nabla_{xx} f(x^*)\,p \ge 0 \quad \text{for all } p \ne 0 \qquad (3\text{-}87) $$

These necessary conditions for local optimality can be strengthened to sufficient conditions by making the inequality in (3-87) strict (i.e., positive curvature in all directions). Equivalently, the sufficient (necessary) curvature conditions can be stated as follows: $\nabla_{xx} f(x^*)$ has all positive (nonnegative) eigenvalues and is therefore a positive definite (positive semidefinite) matrix.

Now consider the imposition of inequality [g(x) ≤ 0] and equality constraints [h(x) = 0] in Fig. 3-55. Continuing the kinematic interpretation, the inequality constraints g(x) ≤ 0 act as “fences” in the valley, and equality constraints h(x) = 0 act as “rails.” Consider now a ball, constrained on a rail and within fences, to roll to its lowest point. This stationary point occurs when the normal forces exerted by the fences [−∇g(x*)] and rails [−∇h(x*)] on the ball are balanced by the force of gravity [−∇f(x*)]. This condition can be stated by the following Karush-Kuhn-Tucker (KKT) necessary conditions for constrained optimality:

Balance of Forces It is convenient to define the Lagrange function $L(x, \lambda, \nu) = f(x) + h(x)^T\lambda + g(x)^T\nu$, along with “weights” or multipliers λ and ν for the constraints. The stationarity condition (balance of forces acting on the ball) is then given by

$$ \nabla L(x, \lambda, \nu) = \nabla f(x) + \nabla h(x)\,\lambda + \nabla g(x)\,\nu = 0 \qquad (3\text{-}88) $$

Feasibility Both inequality and equality constraints must be satisfied (ball must lie on the rail and within the fences):

$$ h(x) = 0, \qquad g(x) \le 0 \qquad (3\text{-}89) $$

Complementarity Inequality constraints are either active (satisfied as equalities) or inactive (strictly satisfied), in which case they are irrelevant to the solution. In the latter case the corresponding KKT multiplier must be zero. This is written as

$$ \nu^T g(x) = 0, \qquad \nu \ge 0 \qquad (3\text{-}90) $$

Constraint Qualification For a local optimum to satisfy the KKT conditions, an additional regularity condition is required on the constraints. This can be defined in several ways. A typical condition is that the active constraints at x* be linearly independent; i.e., the matrix $[\nabla h(x^*) \mid \nabla g_A(x^*)]$ has full column rank, where $g_A$ is the vector of inequality constraints with elements that satisfy $g_{A,i}(x^*) = 0$. With this constraint qualification, the KKT multipliers (λ, ν) are guaranteed to be unique at the optimal solution.

Second-Order Conditions As with unconstrained optimization,nonnegative (positive) curvature is necessary (sufficient) in all theallowable (i.e., constrained) nonzero directions p. The necessarysecond-order conditions can be stated as

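$$p^T \nabla_{xx} L(x^*, \lambda^*, \nu^*)\, p \ge 0 \quad \text{for all } p \ne 0 \text{ with } \nabla h(x^*)^T p = 0 \text{ and } \nabla g_A(x^*)^T p = 0 \tag{3-91}$$

This form assumes strict complementarity (νi > 0 for every active inequality); an active constraint with a zero multiplier imposes only ∇gi(x*)ᵀp ≤ 0.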

and the corresponding sufficient conditions require the first inequality in (3-91) to be strict. Note that for the example in Fig. 3-54, the allowable directions p span the entire space for x, while in Fig. 3-55 there are no allowable directions p.

Example To illustrate the KKT conditions, consider the following unconstrained NLP problem:

$$\min_x\; (x_1)^2 - 4x_1 + \tfrac{3}{2}(x_2)^2 - 7x_2 + x_1 x_2 + 7 - \ln x_1 - \ln x_2 \tag{3-92}$$

corresponding to the contour plot in Fig. 3-54. The optimal solution can be found by solving the first-order conditions (3-87):

$$\nabla f(x) = \begin{bmatrix} 2x_1 - 4 + x_2 - 1/x_1 \\ 3x_2 - 7 + x_1 - 1/x_2 \end{bmatrix} = 0 \;\Rightarrow\; x^* = \begin{bmatrix} 1.3475 \\ 2.0470 \end{bmatrix} \tag{3-93}$$

and f(x*) = −2.8742. Checking the second-order conditions leads to

$$\nabla_{xx} f(x^*) = \begin{bmatrix} 2 + 1/(x_1^*)^2 & 1 \\ 1 & 3 + 1/(x_2^*)^2 \end{bmatrix} = \begin{bmatrix} 2.5507 & 1 \\ 1 & 3.2387 \end{bmatrix} \quad \text{(positive definite)} \tag{3-94}$$

FIG. 3-54 Unconstrained minimum. [Contour plot of (3-92) over 1 ≤ x1 ≤ 10, 1 ≤ x2 ≤ 10, with contours labeled −2, 0, 2, 10, 50, and 100 and the minimum x* marked.]

FIG. 3-55 Constrained minimum. [The same contours, with the feasible region g(x) ≤ 0, the curve h(x*) = 0, and the gradients ∇f(x*), ∇g(x*), and ∇h(x*) drawn at the constrained minimum x*.]

Now consider the constrained NLP problem

$$\begin{aligned} \min_x\;& (x_1)^2 - 4x_1 + \tfrac{3}{2}(x_2)^2 - 7x_2 + x_1 x_2 + 7 - \ln x_1 - \ln x_2 \\ \text{subject to}\;& 4 - x_1 x_2 \le 0 \\ & 2x_1 - x_2 = 0 \end{aligned} \tag{3-95}$$

that corresponds to the plot in Fig. 3-55. The optimal solution can be found by applying the first-order KKT conditions (3-88) to (3-90):

$$\nabla L(x, \lambda, \nu) = \nabla f(x) + \nabla h(x)\lambda + \nabla g(x)\nu = \begin{bmatrix} 2x_1 - 4 + x_2 - 1/x_1 \\ 3x_2 - 7 + x_1 - 1/x_2 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \end{bmatrix}\lambda + \begin{bmatrix} -x_2 \\ -x_1 \end{bmatrix}\nu = 0$$

$$g(x) = 4 - x_1 x_2 \le 0, \qquad h(x) = 2x_1 - x_2 = 0$$

$$g(x)\,\nu = (4 - x_1 x_2)\,\nu = 0, \qquad \nu \ge 0 \tag{3-96}$$

$$x^* = \begin{bmatrix} 1.4142 \\ 2.8284 \end{bmatrix}, \qquad \lambda^* = 1.036, \qquad \nu^* = 1.068$$

and f(x*) = −1.8421. Checking the second-order conditions (3-91) leads to

$$\nabla_{xx} L(x^*, \lambda^*, \nu^*) = \nabla_{xx}\left[f(x^*) + h(x^*)\lambda^* + g(x^*)\nu^*\right] = \begin{bmatrix} 2 + 1/(x_1^*)^2 & 1 - \nu^* \\ 1 - \nu^* & 3 + 1/(x_2^*)^2 \end{bmatrix} = \begin{bmatrix} 2.5 & -0.068 \\ -0.068 & 3.125 \end{bmatrix}$$

$$[\nabla h(x^*) \mid \nabla g_A(x^*)]^T p = \begin{bmatrix} 2 & -1 \\ -2.8284 & -1.4142 \end{bmatrix} p = 0, \qquad p \ne 0$$

However, because [∇h(x*) | ∇gA(x*)] is nonsingular, no nonzero vector p satisfies these conditions, so there are no allowable directions. Hence, the sufficient second-order conditions [pᵀ∇xxL(x*, λ*, ν*)p > 0 for all allowable p] are vacuously satisfied for this problem.

Convex Cases of NLP Problems Linear programs and quadratic programs are special cases of (3-85) that allow for more efficient solution, based on application of the KKT conditions (3-88) through (3-91). Because these are convex problems, any locally optimal solution is a global solution. In particular, if the objective and constraint functions in (3-85) are linear, then the following linear program (LP)

$$\begin{aligned} \min_x\;& c^T x \\ \text{subject to}\;& Ax = b \\ & Cx \le d \end{aligned} \tag{3-97}$$

can be solved in a finite number of steps, and the optimal solution lies at a vertex of the polyhedron described by the linear constraints. This is shown in Fig. 3-56; in so-called dual degenerate cases, multiple vertices can be alternate optimal solutions with the same value of the objective function. The standard method to solve (3-97) is the simplex method, developed in the late 1940s (see Dantzig, 1963), although, starting from Karmarkar’s discovery in 1984, interior point methods have become quite advanced and competitive for highly constrained problems [Wright, S. J., Primal-Dual Interior Point Methods, SIAM, Philadelphia (1996)]. The simplex method proceeds by moving successively from vertex to vertex with improved objective function values. Methods to solve (3-97) are well implemented and widely used, especially in planning and logistical applications. They also form the basis for MILP methods discussed later. Currently, state-of-the-art LP solvers can handle millions of variables and constraints, and the application of further decomposition methods leads to the solution of problems that are two or three orders of magnitude larger than this. See the general references of Hillier and Lieberman (1974) and Edgar et al. (2002) for more details. Also, the interior point method is described below from the perspective of more general NLP problems.
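The numbers in the worked example for (3-92) and (3-95) can be checked with a short script. The sketch below is plain Python using only the standard library; the language choice and helper names are assumptions of this sketch, not the text's. It applies Newton-Raphson to the first-order conditions (3-87). For the constrained problem it uses the fact that with both constraints active, h(x) = 0 gives x2 = 2x1 and g(x) = 0 gives x1x2 = 4, so x1* = √2. The multipliers then follow from the stationarity condition in (3-96), which is linear in (λ, ν). The constant term of the objective is taken as 7, the value for which the reported optimal values −2.8742 and −1.8421 are reproduced. The constant does not affect x*, λ*, or ν*.

```python
import math

def f(x1, x2):
    """Objective of (3-92) and (3-95), with the constant term taken as 7 (assumption)."""
    return x1**2 - 4*x1 + 1.5*x2**2 - 7*x2 + x1*x2 + 7 - math.log(x1) - math.log(x2)

def grad_f(x1, x2):
    return (2*x1 - 4 + x2 - 1/x1, 3*x2 - 7 + x1 - 1/x2)

def hess_f(x1, x2):
    return ((2 + 1/x1**2, 1.0), (1.0, 3 + 1/x2**2))

def solve2(A, b):
    """Solve the 2x2 linear system A y = b by Cramer's rule."""
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return ((b[0]*A[1][1] - A[0][1]*b[1]) / det,
            (A[0][0]*b[1] - b[0]*A[1][0]) / det)

def is_pos_def(H):
    """A symmetric 2x2 matrix is positive definite iff H11 > 0 and det(H) > 0."""
    return H[0][0] > 0 and H[0][0]*H[1][1] - H[0][1]*H[1][0] > 0

# Unconstrained problem (3-92): Newton-Raphson on grad f(x) = 0, eq. (3-93)
x1, x2 = 1.0, 1.0
for _ in range(20):
    g = grad_f(x1, x2)
    d = solve2(hess_f(x1, x2), (-g[0], -g[1]))
    x1, x2 = x1 + d[0], x2 + d[1]
x_unc = (x1, x2)
H_unc = hess_f(*x_unc)                          # eq. (3-94)

# Constrained problem (3-95): both constraints active at the solution
xc = (math.sqrt(2.0), 2.0 * math.sqrt(2.0))
gf = grad_f(*xc)
# Stationarity in (3-96): grad f + lam*(2, -1) + nu*(-x2, -x1) = 0
A = ((2.0, -xc[1]), (-1.0, -xc[0]))             # columns: grad h, grad g
lam, nu = solve2(A, (-gf[0], -gf[1]))
# Hessian of L = f + h*lam + g*nu (h is linear; g adds -nu off the diagonal)
H_L = ((2 + 1/xc[0]**2, 1 - nu), (1 - nu, 3 + 1/xc[1]**2))

print("unconstrained:", x_unc, f(*x_unc), is_pos_def(H_unc))
print("constrained:  ", xc, f(*xc), lam, nu)
```

Running it gives x* ≈ (1.3475, 2.0470) with a positive definite Hessian for (3-92), and x* ≈ (1.4142, 2.8284), λ* ≈ 1.036, ν* ≈ 1.068 for (3-95); ν* ≥ 0 confirms that treating g as active was consistent.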

Quadratic programs (QPs) represent a slight modification of (3-97)and can be stated as

$$\begin{aligned} \min_x\;& c^T x + \tfrac{1}{2}\, x^T Q x \\ \text{subject to}\;& Ax = b \\ & Cx \le d \end{aligned} \tag{3-98}$$

If the matrix Q is positive semidefinite (positive definite) when projected into the null space of the active constraints, then (3-98) is (strictly) convex and any local solution of the QP is a global (and unique) minimum. Otherwise, (3-98) may have multiple local solutions, and more extensive global optimization methods are needed to obtain the global solution. Like LPs, convex QPs can be solved in a finite number of steps. However, as seen in Fig. 3-57, these optimal solutions can lie on a vertex, on a constraint boundary, or in the interior. A number of active set strategies have been created that solve the KKT conditions of the QP and incorporate efficient updates of active constraints. Popular methods include null space algorithms, range space methods, and Schur complement methods. As with LPs, QP problems can also be solved with interior point methods [see Wright (1996)].
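Inside an active set method, each trial active set turns the QP into an equality-constrained QP, and its KKT conditions form a single linear system. For min cᵀx + ½xᵀQx subject to Ax = b this system is [Q Aᵀ; A 0][x; λ] = [−c; b]. The sketch below is plain Python with dense Gaussian elimination and partial pivoting, and its function names are hypothetical. It assembles and solves that system for a small made-up example. It shows the linear algebra at the core of the strategies named above. It does not show the active set updates or the null space, range space, or Schur complement factorizations themselves.

```python
def solve_linear(M, r):
    """Solve M y = r by Gaussian elimination with partial pivoting (dense, small systems)."""
    n = len(r)
    A = [list(row) + [r[i]] for i, row in enumerate(M)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if abs(A[p][k]) < 1e-14:
            raise ValueError("singular KKT matrix: check rank of A and curvature of Q")
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (A[i][n] - sum(A[i][j] * y[j] for j in range(i + 1, n))) / A[i][i]
    return y

def equality_qp(Q, c, A, b):
    """Minimize c^T x + 0.5 x^T Q x subject to A x = b by solving its KKT system."""
    n, m = len(c), len(b)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = Q[i][j]
        for j in range(m):
            K[i][n + j] = A[j][i]      # A^T block
            K[n + j][i] = A[j][i]      # A block
    rhs = [-ci for ci in c] + list(b)
    sol = solve_linear(K, rhs)
    return sol[:n], sol[n:]            # primal solution x, multipliers lambda

# Made-up example: min x1^2 + x2^2 - 2 x1 - 5 x2  subject to  x1 + x2 = 1
x, lam = equality_qp([[2.0, 0.0], [0.0, 2.0]], [-2.0, -5.0], [[1.0, 1.0]], [1.0])
print(x, lam)   # [-0.25, 1.25] [2.5]
```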

Solving the General NLP Problem Solution techniques for (3-85) deal with satisfaction of the KKT conditions (3-88) through (3-91). Many NLP solvers are based on successive quadratic programming (SQP), as it allows the construction of a number of NLP algorithms based on the Newton-Raphson method for equation solving (see “Numerical Analysis” section). SQP solvers have been shown to require the fewest function evaluations to solve NLP problems [Schittkowski, K., Lecture Notes in Economics and Mathematical Systems, no. 282, Springer-Verlag, Berlin (1987)], and they can be tailored to a broad range of process engineering problems with different structure.

The SQP strategy applies the equivalent of a Newton step to the KKT conditions of the nonlinear programming problem, and this leads to a fast rate of convergence. By adding slack variables s, the first-order KKT conditions can be rewritten as

$$\nabla f(x) + \nabla h(x)\lambda + \nabla g(x)\nu = 0 \tag{3-99a}$$

$$h(x) = 0 \tag{3-99b}$$

$$g(x) + s = 0 \tag{3-99c}$$

$$SVe = 0 \tag{3-99d}$$

$$(s, \nu) \ge 0 \tag{3-99e}$$

where e = [1, 1, . . . , 1]ᵀ, S = diag(s), and V = diag(ν). SQP methods find solutions that satisfy (3-99) by generating Newton-like search directions at iteration k. However, Eqs. (3-99d) and active bounds (3-99e) are dependent at the solution and serve to make the KKT system ill-conditioned near the solution. SQP algorithms treat these conditions in two ways. In the active set strategy, discrete decisions are made regarding the active constraint set i ∈ I = {i | gi(x*) = 0}, and (3-99d) is replaced by si = 0, i ∈ I, and νi = 0, i ∉ I. Determining the active set is a combinatorial problem, and a straightforward way to determine an estimate of the active set [and to satisfy (3-99e)] is to formulate and solve, at a point xk, the following QP at iteration k:

$$\begin{aligned} \min_{p,\,s}\;& \nabla f(x^k)^T p + \tfrac{1}{2}\, p^T \nabla_{xx} L(x^k, \lambda^k, \nu^k)\, p \\ \text{subject to}\;& h(x^k) + \nabla h(x^k)^T p = 0 \\ & g(x^k) + \nabla g(x^k)^T p + s = 0, \quad s \ge 0 \end{aligned} \tag{3-100}$$

FIG. 3-56 Contour plots of linear programs. [Two panels, each marking the minimum: “Linear Program” and “Linear Program (Alternate Optima).”]

The KKT conditions of (3-100) are given by

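$$\nabla f(x^k) + \nabla_{xx} L(x^k, \lambda^k, \nu^k)\, p + \nabla h(x^k)\lambda + \nabla g(x^k)\nu = 0 \tag{3-101a}$$

$$h(x^k) + \nabla h(x^k)^T p = 0 \tag{3-101b}$$

$$g(x^k) + \nabla g(x^k)^T p + s = 0 \tag{3-101c}$$

$$SVe = 0 \tag{3-101d}$$

$$(s, \nu) \ge 0 \tag{3-101e}$$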

where the Hessian of the Lagrange function ∇xxL(x, λ, ν) = ∇xx[f(x) + h(x)ᵀλ + g(x)ᵀν] is calculated directly or through a quasi-Newton approximation (created by differences of gradient vectors). It is easy to show that (3-101a) through (3-101c) correspond to a Newton-Raphson step for (3-99a) through (3-99c) applied at iteration k. Also, selection of the active set is now handled at the QP level by satisfying the conditions (3-101d) and (3-101e). To evaluate and change candidate active sets, QP algorithms apply inexpensive matrix updating strategies to the KKT matrix associated with (3-100). Details of this approach can be found in Nocedal and Wright (1999) and Fletcher (1987).
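The correspondence between the QP (3-100) and a Newton-Raphson step on (3-99a) through (3-99c) can be seen on the worked example (3-95). The sketch assumes that the active set is already known, with both h and g active as found in the example. The inequality is then held as an equality with s = 0, and each QP solution reduces to one linear KKT system in (p, λ, ν). The sketch is plain Python with hypothetical function names. It uses the exact Hessian of the Lagrangian and full steps without any line search, so it is a local illustration only, not a general SQP code.

```python
def solve_linear(M, r):
    """Solve M y = r by Gaussian elimination with partial pivoting (dense, small systems)."""
    n = len(r)
    A = [list(row) + [r[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if abs(A[p][k]) < 1e-14:
            raise ValueError("singular KKT matrix")
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= m * A[k][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (A[i][n] - sum(A[i][j] * y[j] for j in range(i + 1, n))) / A[i][i]
    return y

def sqp_known_active_set(x1, x2, nu=0.0, iters=20, tol=1e-12):
    """Newton (SQP) iterations on the KKT conditions of (3-95) with g held active."""
    lam = 0.0
    for _ in range(iters):
        grad_f = (2*x1 - 4 + x2 - 1/x1, 3*x2 - 7 + x1 - 1/x2)
        grad_h = (2.0, -1.0)                    # h(x) = 2 x1 - x2
        grad_g = (-x2, -x1)                     # g(x) = 4 - x1 x2
        h, g = 2*x1 - x2, 4 - x1*x2
        # Hessian of L = f + h*lam + g*nu; h is linear and g adds -nu off the diagonal
        W = [[2 + 1/x1**2, 1 - nu], [1 - nu, 3 + 1/x2**2]]
        K = [[W[0][0], W[0][1], grad_h[0], grad_g[0]],
             [W[1][0], W[1][1], grad_h[1], grad_g[1]],
             [grad_h[0], grad_h[1], 0.0, 0.0],
             [grad_g[0], grad_g[1], 0.0, 0.0]]
        rhs = [-grad_f[0], -grad_f[1], -h, -g]  # (3-101a)-(3-101c) with s = 0
        p1, p2, lam, nu = solve_linear(K, rhs)  # new multiplier estimates come out directly
        x1, x2 = x1 + p1, x2 + p2
        if abs(p1) + abs(p2) < tol:
            break
    return x1, x2, lam, nu

x1, x2, lam, nu = sqp_known_active_set(1.5, 2.5)
print(x1, x2, lam, nu)   # approx. 1.4142 2.8284 1.036 1.068
```

Because this example has two variables and two active constraints, the step p is fixed by the linearized constraints alone. The Hessian then only affects the multiplier estimates, which is why convergence is so quick here.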

As alternatives that avoid the combinatorial problem of selecting the active set, interior point (or barrier) methods modify the NLP problem (3-85) to form

$$\begin{aligned} \min_{x,\,s}\;& f(x) - \mu \sum_i \ln s_i \\ \text{subject to}\;& h(x) = 0 \\ & g(x) + s = 0 \end{aligned} \tag{3-102}$$

where the solution to (3-102) has s > 0 for the barrier parameter µ > 0, and decreasing µ to 0 leads to the solution of problem (3-85). The KKT conditions for this problem can be written as

$$\begin{aligned} \nabla f(x) + \nabla h(x)\lambda + \nabla g(x)\nu &= 0 \\ h(x) &= 0 \\ g(x) + s &= 0 \\ SVe &= \mu e \end{aligned} \tag{3-103}$$

and for µ > 0, s > 0, and ν > 0, Newton steps generated to solve (3-103) are well behaved and analogous to (3-101), with a modification on the right-hand side of (3-101d). A detailed description of this algorithm, called IPOPT, can be found in Wächter and Biegler [Math. Prog. 106(1), 25–57 (2006)].
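The central-path idea behind (3-102) and (3-103) can be seen on a one-variable problem made up for illustration: minimize (x − 2)² subject to g(x) = x − 1 ≤ 0. With slack s = 1 − x, the barrier problem is min (x − 2)² − µ ln s, and the estimate ν = µ/s satisfies the perturbed complementarity condition sν = µ of (3-103). The sketch below is plain Python with a hypothetical function name. It solves each barrier problem by Newton's method with a simple fraction-to-boundary safeguard and then decreases µ. As µ → 0, x → 1 and ν → 2, which is the KKT multiplier of the original problem, since 2(x − 2) + ν = 0 at x = 1. This is a toy illustration, not a model of IPOPT.

```python
def barrier_path(mu=1.0, shrink=0.1, n_outer=8):
    """Follow the barrier solutions of min (x-2)^2 - mu*ln(s), s = 1 - x, as mu decreases."""
    x = 0.0                                   # strictly feasible start: s = 1 - x > 0
    path = []
    for _ in range(n_outer):
        for _ in range(100):                  # Newton's method on the barrier problem
            s = 1.0 - x
            d1 = 2.0 * (x - 2.0) + mu / s     # first derivative of the barrier function
            d2 = 2.0 + mu / s**2              # second derivative (always positive)
            step = -d1 / d2
            if x + step >= 1.0:               # fraction-to-boundary rule keeps s > 0
                step = 0.99 * (1.0 - x)
            x += step
            if abs(step) < 1e-14:
                break
        s = 1.0 - x
        path.append((mu, x, mu / s))          # nu = mu / s, i.e. s * nu = mu as in (3-103)
        mu *= shrink
    return path

for mu, x, nu in barrier_path():
    print(f"mu={mu:.0e}  x={x:.8f}  nu={nu:.6f}")
```

Each barrier solution can be checked against the closed form s = (−1 + √(1 + 2µ))/2, obtained from 2s² + 2s − µ = 0.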

Both active set and interior point methods possess clear tradeoffs. Interior point methods may require more iterations to solve (3-102) for various values of µ, while active set methods require the solution of the more expensive QP subproblem (3-100). Thus, if there are few inequality constraints or an active set is known (say from a good starting guess, or a known QP solution from a previous iteration), then solving (3-100) is not expensive and the active set method is favored. On the other hand, for problems with many inequality constraints, interior point methods are often faster, as they avoid the combinatorial problem of selecting the active set. This is especially true for large-scale problems and when a large number of bounds are active. Examples that demonstrate the performance of these approaches include the solution of model predictive control (MPC) problems [Rao et al., J. Optim. Theory Appl. 99:723 (1998); Albuquerque et al., Comp. Chem. Eng. 23:283 (1997)] and the solution of large optimal control problems using barrier NLP solvers. For instance, IPOPT allows the solution of problems with more than 1,000,000 variables and up to 50,000 degrees of freedom [see Biegler et al., Chem. Eng. Sci. 57(4):575–593 (2002); Laird et al., ASCE J. Water Resource Management and Planning 131(2):125 (2005)].

Other Gradient-Based NLP Solvers In addition to SQP methods, a number of NLP solvers have been developed and adapted for large-scale problems. Generally these methods require more function evaluations than SQP methods, but they perform very well when interfaced to optimization modeling platforms, where function evaluations are cheap. All these can be derived from the perspective of applying Newton steps to portions of the KKT conditions.

LANCELOT (Conn et al., 2000) is based on the solution of bound-constrained subproblems. Here an augmented Lagrangian is formed from (3-85) and the following subproblem is solved:

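$$\begin{aligned} \min_{x,\,s}\;& f(x) + \lambda^T h(x) + \nu^T[g(x) + s] + \tfrac{1}{2}\rho\left(\|h(x)\|^2 + \|g(x) + s\|^2\right) \\ \text{subject to}\;& s \ge 0 \end{aligned} \tag{3-104}$$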

The above subproblem can be solved very efficiently for fixed values of the multipliers λ and ν and penalty parameter ρ. Here a gradient projection trust region method is applied. Once subproblem (3-104) is solved, the multipliers and penalty parameter are updated in an outer loop, and the cycle repeats until the KKT conditions for (3-85) are satisfied. LANCELOT works best when exact second derivatives are available. This promotes a fast convergence rate in solving each subproblem and allows a bound-constrained trust region method to exploit directions of negative curvature in the Hessian matrix.

FIG. 3-57 Contour plots of convex quadratic programs. [Three panels with convex objective functions and linear constraints, each marking the minimum.]
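The outer/inner structure described for LANCELOT can be illustrated on a toy problem chosen here for illustration: min (x − 2)² subject to g(x) = x − 1 ≤ 0, written with a slack as x − 1 + s = 0, s ≥ 0. The inner loop below minimizes the augmented Lagrangian (3-104) over (x, s) by plain projected-gradient steps. This is a simplification, since LANCELOT itself uses a gradient projection trust region method. The outer loop applies the standard first-order multiplier update ν ← ν + ρ[g(x) + s] with ρ held fixed. The function name, fixed iteration counts, and parameter values are assumptions of this sketch.

```python
def aug_lagrangian(rho=10.0, n_outer=20, n_inner=2000):
    """Method of multipliers for min (x-2)^2 s.t. x - 1 + s = 0, s >= 0."""
    x, s, nu = 0.0, 1.0, 0.0
    step = 1.0 / (2.0 + 2.0 * rho)      # 1/trace of the inner Hessian bounds 1/(largest eigenvalue)
    for _ in range(n_outer):
        for _ in range(n_inner):        # inner problem (3-104) with nu and rho fixed
            r = x - 1.0 + s                      # constraint residual g(x) + s
            gx = 2.0 * (x - 2.0) + nu + rho * r  # d/dx of the augmented Lagrangian
            gs = nu + rho * r                    # d/ds of the augmented Lagrangian
            x -= step * gx
            s = max(0.0, s - step * gs)          # projection onto the bound s >= 0
        nu += rho * (x - 1.0 + s)                # first-order multiplier update (outer loop)
    return x, s, nu

x, s, nu = aug_lagrangian()
print(x, s, nu)   # approaches x = 1, s = 0, nu = 2
```

With ρ fixed, the error in ν shrinks by a factor 2/(2 + ρ) per outer iteration on this problem. This shows why LANCELOT also increases ρ when progress on feasibility stalls.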

Reduced gradient methods are active set strategies that rely on partitioning the variables and solving (3-99) in a nested manner. Without loss of generality, problem (3-85) can be rewritten as Min f(z) subject to c(z) = 0, a ≤ z ≤ b. Variables are partitioned as nonbasic variables (those fixed to their bounds), basic variables (those that can be solved from the equality constraints), and superbasic variables (those remaining variables between bounds that serve to drive the optimization); this leads to zᵀ = [zNᵀ, zBᵀ, zSᵀ]. This partition is derived from local information and may change over the course of the optimization iterations. The corresponding KKT conditions can be written as

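$$\nabla_N f(z) + \nabla_N c(z)\gamma = \beta_a - \beta_b \tag{3-105a}$$

$$\nabla_B f(z) + \nabla_B c(z)\gamma = 0 \tag{3-105b}$$

$$\nabla_S f(z) + \nabla_S c(z)\gamma = 0 \tag{3-105c}$$

$$c(z) = 0 \tag{3-105d}$$

$$z_{N,j} = a_j,\ \beta_{a,j} \ge 0,\ \beta_{b,j} = 0 \quad \text{or} \quad z_{N,j} = b_j,\ \beta_{b,j} \ge 0,\ \beta_{a,j} = 0 \tag{3-105e}$$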

where γ and β are the KKT multipliers for the equality and bound constraints, respectively, and (3-105e) replaces the complementarity conditions (3-90). Reduced gradient methods work by nesting equations (3-105b,d) within (3-105a,c). At iteration k, for fixed values of zNk and zSk, we can solve for zB by using (3-105d) and for γ by using (3-105b). Moreover, linearization of these equations leads to sensitivity information (i.e., constrained derivatives or reduced gradients) that indicates how zB changes with respect to zS and zN. The algorithm then proceeds by updating zS by using reduced gradients in a Newton-type iteration to solve Eq. (3-105c). Following this, bound multipliers β are calculated from (3-105a). Over the course of the iterations, if the variables zB or zS exceed their bounds or if some bound multipliers β become negative, then the variable partition needs to be changed and Eqs. (3-105) are reconstructed. These reduced gradient methods are embodied in the popular GRG2, CONOPT, and SOLVER codes (Edgar et al., 2002). The SOLVER code has been incorporated into Microsoft Excel. CONOPT [Drud, A., ORSA J. Computing 6:207–216 (1994)] is an efficient and widely used code in several optimization modelling environments.
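The nesting of (3-105b,d) inside (3-105c) can be illustrated on a reduced version of problem (3-95). The sketch keeps only the equality h(x) = 2x1 − x2 = 0 and drops the inequality. No variable is at a bound, so the nonbasic set is empty. With x2 basic and x1 superbasic, each evaluation solves c(z) = 0 for the basic variable, obtains γ from (3-105b), and returns the reduced gradient (3-105c). A secant iteration stands in for the Newton-type update mentioned in the text and drives that gradient to zero. The sketch is plain Python, and all names are hypothetical.

```python
def f_grad(x1, x2):
    """Gradient of the objective in (3-95)."""
    return (2*x1 - 4 + x2 - 1/x1, 3*x2 - 7 + x1 - 1/x2)

def c(x1, x2):
    """Equality constraint c(z) = h(x) = 2*x1 - x2."""
    return 2*x1 - x2

DC_DX1, DC_DX2 = 2.0, -1.0       # constant Jacobian entries of c (superbasic, basic)

def solve_basic(x1, x2_guess):
    """Solve c(z) = 0 for the basic variable x2 by Newton's method, eq. (3-105d)."""
    x2 = x2_guess
    for _ in range(20):
        r = c(x1, x2)
        if abs(r) < 1e-14:
            break
        x2 -= r / DC_DX2
    return x2

def reduced_gradient(x1):
    """Return the residual of (3-105c), the basic variable, and gamma from (3-105b)."""
    x2 = solve_basic(x1, x1)
    g1, g2 = f_grad(x1, x2)
    gamma = -g2 / DC_DX2          # (3-105b): g2 + DC_DX2 * gamma = 0
    return g1 + DC_DX1 * gamma, x2, gamma

# Secant iteration on the superbasic variable x1 to drive (3-105c) to zero
a, b = 1.0, 1.2
ra, rb = reduced_gradient(a)[0], reduced_gradient(b)[0]
for _ in range(50):
    if abs(rb) < 1e-12:
        break
    a, b = b, b - rb * (b - a) / (rb - ra)
    ra, rb = rb, reduced_gradient(b)[0]
x1 = b
_, x2, gamma = reduced_gradient(x1)
print(x1, x2, 4 - x1 * x2)
```

At this point x1 ≈ 1.1009 and x2 ≈ 2.2019. Here g(x) = 4 − x1x2 is positive, so the dropped inequality would be violated. This is consistent with that constraint being active in the worked example.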

MINOS (Murtagh and Saunders, Technical Report SOL 83-20R, Stanford University, 1987) is a well-implemented package that offers a variation on reduced gradient strategies. At iteration k, Eq. (3-105d) is replaced by its linearization

$$c(z_N^k, z_B^k, z_S^k) + \nabla_B c(z^k)^T (z_B - z_B^k) + \nabla_S c(z^k)^T (z_S - z_S^k) = 0 \tag{3-106}$$

and Eqs. (3-105a–c, e) are solved with (3-106) as a subproblem by using concepts from the reduced gradient method. At the solution of this subproblem, constraints (3-105d) are relinearized and the cycle repeats until the KKT conditions of (3-105) are satisfied. The augmented Lagrangian function from (3-104) is used to penalize movement away from the feasible region. For problems with few degrees of freedom, the resulting approach leads to an extremely efficient method even for very large problems. MINOS has been interfaced to a number of modelling systems and enjoys widespread use. It performs especially well on large problems with few nonlinear constraints. However, on highly nonlinear problems it is usually less reliable than other reduced gradient methods.

Algorithmic Details for NLP Methods All the above NLP methods incorporate concepts from the Newton-Raphson method for equation solving. Essential features of these methods are that they provide (1) accurate derivative information to solve for the KKT conditions, (2) stabilization strategies to promote convergence of the Newton-like method from poor starting points, and (3) regularization of the Jacobian matrix in Newton’s method (the so-called KKT matrix) if it becomes singular or ill-conditioned.

1. NLP methods provide first and second derivatives. The KKT conditions require first derivatives to define stationary points, so accurate first derivatives are essential to determine locally optimal solutions for differentiable NLPs. Moreover, Newton-Raphson methods that are applied to the KKT conditions, as well as the task of checking second-order KKT conditions, necessarily require second-derivative information. (Note that second-order conditions are not checked by methods that do not use second derivatives.) With the recent development of automatic differentiation tools, many modeling and simulation platforms can provide exact first and second derivatives for optimization. When second derivatives are available for the objective or constraint functions, they can be used directly in LANCELOT as well as SQP and reduced gradient methods. Otherwise, on problems with few superbasic variables, both reduced gradient methods and SQP methods [with reduced gradient methods applied to the QP subproblem (3-100)] can benefit from positive definite quasi-Newton approximations [Nocedal and Wright (1999)] applied to reduced second-derivative quantities (the so-called reduced Hessian). Finally, for problems with least squares functions (see “Statistics” subsection), as in data reconciliation, parameter estimation, and model predictive control, one can often assume that the values of the objective function and its gradient at the solution are vanishingly small. Under these conditions, one can show that the multipliers (λ, ν) also vanish and ∇xxL(x, λ, ν) can be substituted by ∇xx f(x*). For f(x) = ½‖r(x)‖² with residual vector r(x) and Jacobian J(x), the terms of ∇xx f weighted by the residuals then drop out, leaving ∇xx f ≈ J(x)ᵀJ(x), which needs only first derivatives. This Gauss-Newton approximation has been shown to be very efficient for the solution of least squares problems [see Nocedal and Wright (1999)].
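A minimal Gauss-Newton iteration for a least squares objective f(x) = ½‖r(x)‖² solves (JᵀJ)d = −Jᵀr at each step, so only the residual Jacobian J is needed. The sketch below fits the hypothetical model y = a·exp(bt) to noise-free synthetic data. This is a zero-residual case, in which the JᵀJ approximation becomes exact at the solution. The data, model, starting point, and step-halving safeguard are all illustrative assumptions, written in plain Python.

```python
import math

T = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0 * math.exp(-0.5 * t) for t in T]      # synthetic noise-free data: a = 2, b = -0.5

def residuals(a, b):
    return [a * math.exp(b * t) - y for t, y in zip(T, Y)]

def sse(a, b):
    return sum(r * r for r in residuals(a, b))

def gauss_newton(a, b, iters=100, tol=1e-14):
    for _ in range(iters):
        r = residuals(a, b)
        J = [(math.exp(b * t), a * t * math.exp(b * t)) for t in T]   # dr/da, dr/db
        # Gauss-Newton model: (J^T J) d = -J^T r, with J^T J standing in for the Hessian
        A11 = sum(j0 * j0 for j0, _ in J)
        A12 = sum(j0 * j1 for j0, j1 in J)
        A22 = sum(j1 * j1 for _, j1 in J)
        g1 = sum(j0 * ri for (j0, _), ri in zip(J, r))
        g2 = sum(j1 * ri for (_, j1), ri in zip(J, r))
        det = A11 * A22 - A12 * A12
        d1 = (-g1 * A22 + g2 * A12) / det
        d2 = (-g2 * A11 + g1 * A12) / det
        step, f0 = 1.0, sse(a, b)
        while sse(a + step * d1, b + step * d2) > f0 and step > 1e-8:
            step *= 0.5                          # simple safeguard: halve until the fit improves
        a, b = a + step * d1, b + step * d2
        if abs(d1) + abs(d2) < tol:
            break
    return a, b

a, b = gauss_newton(1.5, -0.3)
print(a, b)   # close to 2.0 and -0.5
```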

2. Line search and trust region methods promote convergence from poor starting points. These are commonly used with the search directions calculated from NLP subproblems such as (3-100). In a trust region approach, the constraint ||p|| ≤ ∆ is added, and the iteration step is taken if there is sufficient reduction of some merit function (e.g., the objective function weighted with some measure of the constraint violations). The size of the trust region ∆ is adjusted based on the agreement of the reduction of the actual merit function compared to its predicted reduction from the subproblem (see Conn et al., 2000, for details). Such methods have strong global convergence properties and are especially appropriate for ill-conditioned NLPs. This approach has been applied in the KNITRO code [Byrd, Hribar, and Nocedal, SIAM J. Optimization 9(4):877 (1999)]. Line search methods can be more efficient on problems with reasonably good starting points and well-conditioned subproblems, as in real-time optimization. Typically, once a search direction p is calculated from (3-100) or another related subproblem, a step size α ∈ (0, 1] is chosen so that xk + αp leads to a sufficient decrease of a merit function. As a recent alternative, a novel filter stabilization strategy (for both line search and trust region approaches) has been developed based on a bicriterion minimization, with the objective function and constraint infeasibility as competing objectives [Fletcher et al., SIAM J. Optimization 13(3):635 (2002)]. This method often leads to better performance than that based on merit functions.
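The backtracking (Armijo) rule can be sketched on the unconstrained problem (3-92). There the objective itself serves as the merit function, because there are no constraint violations to weight. In the sketch below, which is plain Python with hypothetical names, the direction is a Newton step and the starting point (4, 0.2) is deliberately poor. Points with x1 ≤ 0 or x2 ≤ 0 get an infinite merit value, so the rule also keeps iterates inside the domain of the logarithms. The constant term of (3-92) is dropped since it does not change the minimizer. The choice η = 10⁻⁴ with step halving is a common textbook setting assumed here.

```python
import math

def f(x):
    """Objective (3-92) without its constant term; +inf outside x > 0 (domain of the logs)."""
    x1, x2 = x
    if x1 <= 0 or x2 <= 0:
        return math.inf
    return x1**2 - 4*x1 + 1.5*x2**2 - 7*x2 + x1*x2 - math.log(x1) - math.log(x2)

def newton_direction(x):
    """Gradient and Newton direction p = -H^{-1} grad f for (3-92)."""
    x1, x2 = x
    g = [2*x1 - 4 + x2 - 1/x1, 3*x2 - 7 + x1 - 1/x2]
    a, b, d = 2 + 1/x1**2, 1.0, 3 + 1/x2**2          # Hessian [[a, b], [b, d]]
    det = a*d - b*b
    p = [-(d*g[0] - b*g[1]) / det, -(a*g[1] - b*g[0]) / det]
    return g, p

def backtracking(phi, x, p, slope, eta=1e-4, tau=0.5, max_cuts=60):
    """Largest alpha in {1, tau, tau^2, ...} with phi(x + alpha p) <= phi(x) + eta*alpha*slope."""
    phi0, alpha = phi(x), 1.0
    for _ in range(max_cuts):
        trial = [xi + alpha*pi for xi, pi in zip(x, p)]
        if phi(trial) <= phi0 + eta*alpha*slope:
            return alpha, trial
        alpha *= tau
    raise RuntimeError("no acceptable step; p may not be a descent direction")

x, alphas = [4.0, 0.2], []                  # deliberately poor starting point
for _ in range(50):
    g, p = newton_direction(x)
    if math.hypot(g[0], g[1]) < 1e-6:
        break
    slope = g[0]*p[0] + g[1]*p[1]           # directional derivative grad f^T p (negative)
    alpha, x = backtracking(f, x, p, slope)
    alphas.append(alpha)
print(x, alphas)
```

The stopping tolerance is kept well above rounding level, because the Armijo test compares function values whose differences would otherwise be dominated by floating-point noise.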

3. Regularization of the KKT matrix for the NLP subproblem is essential for good performance of general-purpose algorithms. For instance, to obtain a unique solution to (3-100), active constraint gradients must be full rank and the Hessian matrix, when projected into the null space of the active constraint gradients, must be positive definite. These properties may not hold far from the solution, and corrections to the Hessian in SQP may be necessary (see Fletcher, 1987). Regularization methods ensure that subproblems such as (3-100) remain well-conditioned; they include addition of positive constants to the diagonal of the Hessian matrix to ensure its positive definiteness, judicious selection of active constraint gradients to ensure that they are linearly independent, and scaling the subproblem to reduce the propagation of numerical errors. Often these strategies are heuristics built into particular NLP codes. While quite effective, most of these heuristics do not provide convergence guarantees for general NLPs.
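The first of these strategies, adding a positive constant to the diagonal, is often implemented by attempting a Cholesky factorization and increasing the shift δ until it succeeds. The sketch below assumes a small dense symmetric matrix. The initial shift, growth factor, and function names are arbitrary choices for illustration. Production codes typically examine the inertia of the whole KKT matrix rather than the Hessian alone.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T, or None if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def regularize(H, delta0=1e-4, growth=10.0, max_tries=30):
    """First delta in {0, delta0, delta0*growth, ...} for which H + delta*I has a Cholesky factor."""
    n, delta = len(H), 0.0
    for _ in range(max_tries):
        shifted = [[H[i][j] + (delta if i == j else 0.0) for j in range(n)] for i in range(n)]
        L = cholesky(shifted)
        if L is not None:
            return delta, L
        delta = delta0 if delta == 0.0 else delta * growth
    raise RuntimeError("could not make the matrix positive definite")

print(regularize([[2.5, -0.068], [-0.068, 3.125]])[0])   # 0.0: already positive definite
print(regularize([[1.0, 3.0], [3.0, 1.0]])[0])          # about 10: eigenvalues are 4 and -2
```

The first call uses the Hessian of the Lagrangian from the worked example (3-95), which needs no shift. The second matrix is indefinite, so the shift must exceed 2.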

From the conceptual descriptions as well as algorithmic details given above, it is clear that NLP solvers are complex algorithms that have required considerable research and development to turn them into reliable and efficient software tools. Practitioners who are confronted with engineering optimization problems should therefore leverage these efforts, rather than write their own codes. Table 3-4 presents a sampling of available NLP codes that represent the above classifications. Much more information on these and other codes can be found on the NEOS server (www-neos.mcs.anl.gov) and the NEOS Software Guide: http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide.


OPTIMIZATION METHODS WITHOUT DERIVATIVES

A broad class of optimization strategies does not require derivative information. These methods have the advantage of easy implementation and require little prior knowledge of the optimization problem. In particular, such methods are well suited for “quick and dirty” optimization studies that explore the scope of optimization for new problems, prior to investing effort for more sophisticated modeling and solution strategies. Most of these methods are derived from heuristics that naturally spawn numerous variations. As a result, a very broad literature describes these methods. Here we discuss only a few important trends in this area.

Classical Direct Search Methods Developed in the 1960s and 1970s, these methods include one-at-a-time search and methods based on experimental designs (EVOP). At that time, these direct search methods were the most popular optimization methods in chemical engineering. Methods that fall into this class include the pattern search of Hooke and Jeeves [J. ACM 8:212 (1961)], the conjugate direction method of Powell (1964), simplex and complex searches, in particular Nelder-Mead [Comput. J. 7:308 (1965)], and the adaptive random search methods of Luus-Jaakola [AIChE J. 19:760 (1973)], Goulcher and Cesares Long [Comp. Chem. Engr. 2:23 (1978)], and Banga et al. [in State of the Art in Global Optimization, C. Floudas and P. Pardalos (eds.), Kluwer, Dordrecht, p. 563 (1996)]. All these methods require only objective function values for unconstrained minimization. Associated with these methods are numerous studies on a wide range of process problems. Moreover, many of these methods include heuristics that prevent premature termination (e.g., directional flexibility in the complex search as well as random restarts and direction generation). To illustrate these methods, Fig. 3-58 shows the performance of a pattern search method and a random search method on an unconstrained problem.
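The pattern search of Hooke and Jeeves is short enough to sketch in full. The version below is plain Python, and its initial step, shrink factor, and tolerance are arbitrary choices. It alternates exploratory moves along each coordinate with pattern moves along the last successful displacement, and halves the step when neither helps. It is applied to the unconstrained problem (3-92), with the constant term dropped since it does not move the minimizer. Using only function values, it approaches x* ≈ (1.3475, 2.0470).

```python
import math

def f(x):
    """Objective (3-92) without its constant term; +inf outside the domain of the logs."""
    x1, x2 = x
    if x1 <= 0 or x2 <= 0:
        return math.inf
    return x1**2 - 4*x1 + 1.5*x2**2 - 7*x2 + x1*x2 - math.log(x1) - math.log(x2)

def explore(f, base, fbase, h):
    """Exploratory moves: try +h, then -h, along each coordinate, keeping any improvement."""
    x, fx = list(base), fbase
    for i in range(len(x)):
        for d in (h, -h):
            trial = list(x)
            trial[i] += d
            ft = f(trial)
            if ft < fx:
                x, fx = trial, ft
                break
    return x, fx

def hooke_jeeves(f, x0, h=0.5, shrink=0.5, tol=1e-6):
    base, fbase = list(x0), f(x0)
    while h > tol:
        x, fx = explore(f, base, fbase, h)
        if fx < fbase:
            # Pattern moves: keep stepping along (new base - old base) while it pays off
            while fx < fbase:
                pattern = [2*xi - bi for xi, bi in zip(x, base)]
                base, fbase = x, fx
                x, fx = explore(f, pattern, f(pattern), h)
        else:
            h *= shrink          # no improvement around the base point: refine the step
    return base, fbase

x_best, f_best = hooke_jeeves(f, [1.0, 1.0])
print(x_best, f_best)
```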

Simulated Annealing This strategy is related to random search methods and derives from a class of heuristics with analogies to the motion of molecules in the cooling and solidification of metals [Laarhoven and Aarts, Simulated Annealing: Theory and Applications, Reidel Publishing, Dordrecht (1987)]. Here a temperature parameter θ can be raised or lowered to influence the probability of accepting points that do not improve the objective function. The method starts with a base point x and objective value f(x). The next point x′ is chosen at random from a distribution. If f(x′) < f(x), the move is accepted with x′ as the new point. Otherwise, x′ is accepted with probability p(θ, x′, x). Options include the Metropolis distribution p(θ, x, x′) = exp{−[f(x′) − f(x)]/θ} and the Glauber distribution p(θ, x, x′) = exp{−[f(x′) − f(x)]/θ}/(1 + exp{−[f(x′) − f(x)]/θ}). The parameter θ is then reduced, and the method continues until no further progress is made.
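A minimal one-dimensional sketch of this procedure with the Metropolis rule follows. Several parts are assumptions made for illustration: the test function f(x) = (x − 3)² + 2 sin 4x, which has several local minima; the Gaussian distribution used to draw x′; the geometric cooling schedule; a fixed number of temperature levels in place of a “no further progress” test; and all parameter values. The best point visited is returned, and a fixed random seed makes the run reproducible.

```python
import math
import random

def f(x):
    """Illustrative multimodal test function (not from the text)."""
    return (x - 3.0)**2 + 2.0 * math.sin(4.0 * x)

def anneal(f, x0, theta0=2.0, cooling=0.95, steps_per_theta=100, n_theta=60, sigma=0.5, seed=0):
    rng = random.Random(seed)                    # fixed seed for reproducibility
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    theta = theta0
    for _ in range(n_theta):
        for _ in range(steps_per_theta):
            x_new = x + rng.gauss(0.0, sigma)    # random trial point x'
            f_new = f(x_new)
            # Metropolis rule: accept improvements; otherwise accept with prob exp(-(f(x') - f(x))/theta)
            if f_new < fx or rng.random() < math.exp(-(f_new - fx) / theta):
                x, fx = x_new, f_new
                if fx < best_f:
                    best_x, best_f = x, fx
        theta *= cooling                         # reduce the temperature parameter
    return best_x, best_f

print(anneal(f, 0.0))
```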

Genetic Algorithms This approach, first proposed in Holland, J. H., Adaptation in Natural and Artificial Systems [University of Michigan Press, Ann Arbor (1975)], is based on the analogy of improving a population of solutions through modifying their gene pool. It also has performance characteristics similar to those of random search methods and simulated annealing. Two forms of genetic modification, crossover and mutation, are used, and the elements of the optimization vector x are represented as binary strings. Crossover deals with random swapping of vector elements (among parents with highest objective function values or other rankings of population) or any linear combinations of two parents. Mutation deals with the addition of a random variable to elements of the vector. Genetic algorithms (GAs) have seen widespread use in process engineering, and a number of codes are available. Edgar et al. (2002) describe a related GA that is available in MS Excel.
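The sketch below illustrates the binary-string variant described above on a one-dimensional test function, f(x) = (x − 3)² + 2 sin 4x on [0, 5], which is an assumption made for illustration. The operator choices are common textbook options assumed here, not prescriptions from the text: binary tournament selection, one-point crossover, bit-flip mutation, and elitism that keeps the best string. All parameter values are likewise assumed. The real-valued variants that add a random variable or take linear combinations of parents are not shown. The code is plain Python with a fixed random seed.

```python
import math
import random

N_BITS, LO, HI = 16, 0.0, 5.0

def f(x):
    """Illustrative multimodal test function (not from the text)."""
    return (x - 3.0)**2 + 2.0 * math.sin(4.0 * x)

def decode(bits):
    """Map a list of 0/1 genes to a real number in [LO, HI]."""
    k = int("".join(str(bit) for bit in bits), 2)
    return LO + (HI - LO) * k / (2**N_BITS - 1)

def cost(bits):
    return f(decode(bits))

def genetic_minimize(pop_size=50, generations=100, p_mut=1.0 / N_BITS, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    best = min(pop, key=cost)

    def tournament():
        a, b = rng.sample(pop, 2)             # binary tournament: better of two random members
        return a if cost(a) < cost(b) else b

    for _ in range(generations):
        children = [best[:]]                  # elitism: keep the best string found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, N_BITS - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - gene if rng.random() < p_mut else gene for gene in child]  # bit-flip mutation
            children.append(child)
        pop = children
        cand = min(pop, key=cost)
        if cost(cand) < cost(best):
            best = cand[:]
    return decode(best), cost(best)

print(genetic_minimize())
```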

Derivative-free Optimization (DFO) In the past decade, the availability of parallel computers and faster computing hardware and the need to incorporate complex simulation models within optimization studies have led a number of optimization researchers to reconsider classical direct search approaches. In particular, Dennis and Torczon [SIAM J. Optim. 1:448 (1991)] developed a multidimensional search algorithm that extends the simplex approach of Nelder and Mead (1965). They note that the Nelder-Mead algorithm fails as the number of variables increases, even for very simple problems. To overcome this, their multidimensional pattern search approach combines reflection, expansion, and contraction steps that act as line search algorithms for a number of linearly independent search directions. This approach is easily adapted to parallel computation, and the method can be tailored to the number of processors available. Moreover, this approach converges to locally optimal solutions for unconstrained problems and exhibits an unexpected performance synergy when multiple processors are used. The work of Dennis and Torczon (1991) has spawned considerable research on the analysis and code development for DFO methods. For instance, Conn et al. [Math. Programming, Series B, 79(3):397 (1997)] constructed a multivariable DFO algorithm that uses a surrogate model for the objective function within a trust region method. Here points are sampled to obtain a well-defined quadratic interpolation model, and descent conditions from trust region methods enforce convergence properties. A number of trust region methods that rely on this approach are reviewed in Conn et al. (1997). Moreover, a number of DFO codes have been developed that lead to black box optimization implementations for large, complex simulation models [see Audet and Dennis, SIAM J. Optim. 13:889 (2003); Kolda et al., SIAM Rev. 45(3):385 (2003)]. These include the DAKOTA package at Sandia National Lab (Eldred, 2002; http://endo.sandia.gov/DAKOTA/software.html) and FOCUS developed at Boeing Corporation (Booker et al., CRPC Technical Report 98739, Rice University, February 1998).

TABLE 3-4 Representative NLP Solvers

Algorithm | Method type | Stabilization | Second-order information
CONOPT (Drud, 1994) | Reduced gradient | Line search | Exact and quasi-Newton
GRG2 (Edgar et al., 2002) | Reduced gradient | Line search | Quasi-Newton
IPOPT | SQP, barrier | Line search | Exact
KNITRO (Byrd et al., 1997) | SQP, barrier | Trust region | Exact and quasi-Newton
LANCELOT | Augmented Lagrangian, bound constrained | Trust region | Exact and quasi-Newton
LOQO | SQP, barrier | Line search | Exact
MINOS | Reduced gradient, augmented Lagrangian | Line search | Quasi-Newton
NPSOL | SQP, active set | Line search | Quasi-Newton
SNOPT | Reduced space SQP, active set | Line search | Quasi-Newton
SOCS | SQP, active set | Line search | Exact
SOLVER | Reduced gradient | Line search | Quasi-Newton
SRQP | Reduced space SQP, active set | Line search | Quasi-Newton

FIG. 3-58 Examples of optimization methods without derivatives. (a) Pattern search method. (b) Random search method. Markers distinguish the first, second (∆), and third (*) search phases.

Direct search methods are easy to apply to a wide variety of problem types and optimization models. Moreover, because their termination criteria are not based on gradient information and stationary points, they are more likely to favor the search for globally optimal rather than locally optimal solutions. These methods can also be adapted easily to include integer variables. However, no rigorous convergence properties to globally optimal solutions have yet been discovered. Also, these methods are best suited for unconstrained problems or for problems with simple bounds. Otherwise, they may have difficulties with constraints, as the only options open for handling constraints are equality constraint elimination and addition of penalty functions for inequality constraints. Both approaches can be unreliable and may lead to failure of the optimization algorithm. Finally, the performance of direct search methods scales poorly (and often exponentially) with the number of decision variables. While performance can be improved with the use of parallel computing, these methods are rarely applied to problems with more than a few dozen decision variables.

GLOBAL OPTIMIZATION

Deterministic optimization methods are available for nonconvex nonlinear programming problems of the form (3-85) that guarantee convergence to the global optimum. More specifically, one can show under mild conditions that they converge to within a distance ε of the global optimum in a finite number of steps. These methods are generally more expensive than local NLP methods, and they require the exploitation of the structure of the nonlinear program.

Because there are no optimality conditions like the KKT conditions for global optimization, these methods work by first partitioning the problem domain (i.e., containing the feasible region) into subregions. Upper bounds on the objective function are computed over all subregions of the problem. In addition, lower bounds can be derived from convex relaxations of the objective function and constraints for each subregion. The algorithm then proceeds to eliminate all subregions that have infeasible constraint relaxations or lower bounds that are greater than the least upper bound. After this, the remaining regions are further partitioned to create new subregions, and the cycle continues until the upper and lower bounds converge.

This basic concept leads to a wide variety of global algorithms, with the following features that can exploit different problem classes. Bounding strategies relate to the calculation of upper and lower bounds. For the former, any feasible point or, preferably, a locally optimal point in the subregion can be used. For the lower bound, convex relaxations of the objective and constraint functions are derived.

The refining step deals with the construction of partitions in the domain and further partitioning them during the search process. Finally, the selection step decides on the order of exploring the open subregions.

For simplicity, consider the problem Min f(x) subject to g(x) ≤ 0, where each function can be defined by additive terms. Convex relaxations for f(x) and g(x) can be derived in the following ways:

• Convex additive terms remain unmodified in these functions.

• Concave additive unary terms are replaced by a linear underestimating function that matches the term at the bounds of the subregion.

• Nonconvex polynomial terms can be replaced by a set of binary terms, with new variables introduced to define the higher-order polynomials.

• Binary terms can be relaxed by using the McCormick underestimator; e.g., the binary term xz is replaced by a new variable w and linear inequality constraints

$$\begin{aligned} w &\ge x^l z + z^l x - x^l z^l \\ w &\ge x^u z + z^u x - x^u z^u \\ w &\le x^u z + z^l x - x^u z^l \\ w &\le x^l z + z^u x - x^l z^u \end{aligned} \tag{3-107}$$

where the subregions are defined by xl ≤ x ≤ xu and zl ≤ z ≤ zu. Thus the feasible region and the objective function are replaced by convex envelopes to form relaxed problems.

Solving these convex relaxed problems leads to global solutions that are lower bounds to the NLP in the particular subregion. Finally, we see that gradient-based NLP solvers play an important role in global optimization algorithms, as they often yield the lower and upper bounds for the subregions. The spatial branch and bound global optimization algorithm can therefore be given by the following steps:

0. Initialize algorithm: calculate upper and lower bounds over the entire (relaxed) feasible region.

For iteration k with a set of partitions $M_k^j$ and bounds $f_L^j$ and $f_U^j$ in each subregion:

1. Bound. Define the best upper bound $f_U = \min_j f_U^j$ and delete (fathom) all subregions j with lower bounds $f_L^j \ge f_U$. If the remaining subregions satisfy $f_L^j \ge f_U - \varepsilon$, stop.

2. Refine. Divide the remaining active subregions into partitions $M_k^{j_1}$ and $M_k^{j_2}$. (Many branching rules are available for this step.)

3. Select. Solve the convex relaxed NLP in the new partitions to obtain $f_L^{j_1}$ and $f_L^{j_2}$. Delete the partition if there is no feasible solution.

4. Update. Obtain upper bounds $f_U^{j_1}$ and $f_U^{j_2}$ for the new partitions, if present. Set k = k + 1, update the partition sets, and go to step 1.
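The four McCormick inequalities (3-107) follow from expanding products that are nonnegative on the box; for example, $(x - x^l)(z - z^l) \ge 0$ expands to $xz \ge x^l z + z^l x - x^l z^l$, and the other three corner combinations give the remaining inequalities. The short Python sketch below (the function name `mccormick_bounds` and the box limits are ours, chosen only for illustration) evaluates the interval that (3-107) allows for w at a point and spot-checks that it always brackets the bilinear term xz, collapsing to xz exactly at a corner of the box.

```python
import random


def mccormick_bounds(x, z, xl, xu, zl, zu):
    """Interval [lower, upper] that (3-107) allows for w = x*z at the point (x, z)."""
    lower = max(xl * z + zl * x - xl * zl,   # from (x - xl)(z - zl) >= 0
                xu * z + zu * x - xu * zu)   # from (xu - x)(zu - z) >= 0
    upper = min(xu * z + zl * x - xu * zl,   # from (xu - x)(z - zl) >= 0
                xl * z + zu * x - xl * zu)   # from (x - xl)(zu - z) >= 0
    return lower, upper


# Spot check on an illustrative box with mixed signs: the relaxation contains x*z.
random.seed(0)
xl, xu, zl, zu = -1.0, 2.0, 0.5, 3.0
for _ in range(10_000):
    x, z = random.uniform(xl, xu), random.uniform(zl, zu)
    lo, hi = mccormick_bounds(x, z, xl, xu, zl, zu)
    assert lo - 1e-12 <= x * z <= hi + 1e-12

# At a corner of the box the envelope is exact: both bounds equal x*z.
print(mccormick_bounds(2.0, 3.0, xl, xu, zl, zu))  # (6.0, 6.0)
```

Away from the corners the interval has positive width, which is why the relaxed problem only gives a lower bound and why the subregions must be refined.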

Example To illustrate the spatial branch and bound algorithm, consider the global solution of

$$
\min f(x) = \tfrac{5}{2}x^4 - 20x^3 + 55x^2 - 57x \quad \text{subject to} \quad 0.5 \le x \le 2.5
\qquad (3\text{-}108)
$$

As seen in Fig. 3-59, this problem has local solutions at x* = 2.5 and at x* = 0.8749. The latter is also the global solution with f(x*) = −19.7. To find the global solution, we note that all but the $-20x^3$ term in (3-108) are convex, so we replace this term by a new variable and a linear underestimator within a particular subregion, i.e.,

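$$
\begin{aligned}
\min\ & f_L(x) = \tfrac{5}{2}x^4 - 20w + 55x^2 - 57x \\
\text{subject to}\ & x^l \le x \le x^u \\
& w = (x^l)^3\,\frac{x^u - x}{x^u - x^l} + (x^u)^3\,\frac{x - x^l}{x^u - x^l}
\end{aligned}
\qquad (3\text{-}109)
$$

Because $x^3$ is convex for x > 0, the secant w lies above $x^3$ on the subregion, so $-20w \le -20x^3$ and $f_L(x) \le f(x)$ there.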

In Fig. 3-59 we also propose subregions that are created by simple bisection partitioning rules, and we use a “loose” bounding tolerance of ε = 0.2. In each partition the lower bound $f_L$ is determined by (3-109) and the upper bound $f_U$ is determined by the local solution of the original problem in the subregion. Figure 3-60 shows the progress of the spatial branch and bound algorithm as the partitions are refined and the bounds are updated. In Fig. 3-60, note the definitions of the partitions for the nodes, and the sequence numbers in each node that show the order in which the partitions are processed. The grayed partitions correspond to the deleted subregions, and at termination of the algorithm we see that $f_L^j \ge f_U - \varepsilon$ (that is, −19.85 ≥ −19.7 − 0.2), with the gray subregions in Fig. 3-59 still active. Further partitioning in these subregions will allow the lower and upper bounds to converge to a tighter tolerance.
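The example can be reproduced approximately with a short Python sketch of steps 0 to 4. Two simplifications are our assumptions, not the text's: both the convex lower-bounding problem (3-109) and the local upper-bounding problem are solved with a golden-section search (exact for the convex $f_L$, but for the nonconvex f it only returns some feasible point, which is still a valid upper bound), and step 2 bisects only the active subregion with the smallest lower bound. The node order and bound values therefore need not match Fig. 3-60, but the incumbent should end near the global solution x* ≈ 0.875, f(x*) ≈ −19.7. All function names are ours.

```python
import math

EPS = 0.2  # bounding tolerance used in the example


def f(x):
    """Objective of (3-108)."""
    return 2.5 * x**4 - 20 * x**3 + 55 * x**2 - 57 * x


def f_lower(x, l, u):
    """Convex underestimator (3-109): x^3 replaced by its secant w on [l, u]."""
    w = (l**3 * (u - x) + u**3 * (x - l)) / (u - l)
    return 2.5 * x**4 - 20 * w + 55 * x**2 - 57 * x


def golden_min(fun, a, b, tol=1e-9):
    """Golden-section search; exact for unimodal functions on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    x = 0.5 * (a + b)
    return x, fun(x)


def bounds(l, u):
    """Lower bound from (3-109) and a feasible upper bound on subregion [l, u]."""
    _, f_lo = golden_min(lambda x: f_lower(x, l, u), l, u)
    x_up, f_up = golden_min(f, l, u)
    return f_lo, f_up, x_up


def spatial_bb(lo=0.5, hi=2.5, eps=EPS, max_iter=1000):
    f_lo, best_f, best_x = bounds(lo, hi)              # step 0
    active = [(f_lo, lo, hi)]
    for _ in range(max_iter):
        active = [p for p in active if p[0] < best_f]  # step 1: fathom
        if all(p[0] >= best_f - eps for p in active):
            break                                      # step 1: converged
        active.sort()                                  # step 2: bisect lowest f_L
        _, l, u = active.pop(0)
        m = 0.5 * (l + u)
        for a, b in ((l, m), (m, u)):                  # steps 3-4: bound children
            f_lo, f_up, x_up = bounds(a, b)
            if f_up < best_f:
                best_f, best_x = f_up, x_up
            active.append((f_lo, a, b))
    return best_x, best_f


x_star, f_star = spatial_bb()
print(f"x* = {x_star:.3f}, f(x*) = {f_star:.2f}")
```

A production code would replace the golden-section searches with a gradient-based NLP solver, as the text notes, and would use one of the many branching rules rather than plain bisection.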

Note that a number of improvements can be made to the bounding, refinement, and selection strategies in the algorithm that accelerate the convergence of this method. A comprehensive discussion of all these options can be found in Floudas (2000), Tawarmalani and Sahinidis (2002), and Horst and Tuy (1993). Also, a number of efficient global optimization codes have recently been developed, including αBB, BARON, LGO, and OQNLP. An interesting numerical comparison of these and other codes can be found in Neumaier et al., Math. Prog. B 103(2): 335 (2005).

MIXED INTEGER PROGRAMMING

Mixed integer programming deals with both discrete and continuous decision variables. For the purpose of illustration we consider discrete decisions as binary variables, that is, $y_i$ = 0 or 1, and we consider the mixed integer problem (3-84). Unlike in local optimization methods, there are no optimality conditions, such as the KKT conditions, that can be applied directly. Instead, as in the global optimization methods, a systematic search of the solution space, coupled with upper and lower bounding information, is applied. As with global optimization problems, large mixed integer programs can be expensive to solve, and some care is needed in the problem formulation.

Mixed Integer Linear Programming If the objective and constraint functions are all linear, then (3-84) becomes a mixed integer linear programming problem given by

$$
\begin{aligned}
\min\ & Z = a^T x + c^T y \\
\text{subject to}\ & Ax + By \le b \\
& x \ge 0,\ y \in \{0, 1\}^t
\end{aligned}
\qquad (3\text{-}110)
$$

Note that if we relax the t binary variables by the inequalities 0 ≤ y ≤ 1, then (3-110) becomes a linear program with a (global) solution that is a lower bound to the MILP (3-110). There are specific MILP classes where the LP relaxation of (3-110) has the same solution as the MILP. Among these problems is the well-known assignment problem. Other MILPs that can be solved with efficient special-purpose methods are the knapsack problem, the set covering and set partitioning problems, and the traveling salesperson problem. See Nemhauser and Wolsey (1988) for a detailed treatment of these problems.

More generally, MILPs are solved with branch and bound algorithms, similar to the spatial branch and bound method of the previous section, that explore the search space. As seen in Fig. 3-61, binary variables are used to define the search tree, and a number of bounding properties can be noted from the structure of (3-110).

Upper bounds on the objective function can be found from any feasible solution to (3-110), with y set to integer values. These can be found at the bottom or “leaf” nodes of a branch and bound tree (and sometimes at intermediate nodes as well). The top, or root, node in the tree is the solution to the linear programming relaxation of (3-110); this is a lower bound to (3-110). On the other hand, as one proceeds down the tree with a partial assignment of the binary variables, a lower bound for any leaf node in that branch can be found from solution of the linear program at this intermediate node with the remaining binary variables relaxed. This leads to the following properties:

• Any intermediate node with an infeasible LP relaxation has infeasible leaf nodes and can be fathomed (i.e., all remaining children of this node can be eliminated).

• If the LP solution at an intermediate node is not less than an existing integer solution, then the node can be fathomed.

These properties lead to pruning of the search tree. Branching then continues in the tree until the upper and lower bounds converge.

FIG. 3-59 Global optimization example with partitions. [Plot of f(x) versus x for 0.5 ≤ x ≤ 2.5, showing the bisection partitions.]

FIG. 3-60 Spatial branch and bound sequence for global optimization example. [Tree of bisected subregions; each node shows its sequence number, $f_L$, and $f_U(x^*)$.]

This basic concept leads to a wide variety of MILP algorithms with the following features. LP solutions at intermediate nodes are relatively easy to calculate with the simplex method. If the solution of the parent node is known, multiplier information from this solution can be used to calculate (via a very efficient pivoting operation) the LP solution at the child node. Branching strategies to navigate the tree take a number of forms. The more common depth-first strategies expand the most recent node to a leaf node or infeasible node and then backtrack to other branches in the tree. These strategies are simple to program and require little storage of past nodes. On the other hand, breadth-first strategies expand all the nodes at each level of the tree, select the node with the lowest objective function, and then proceed until the leaf nodes are reached. Here, more storage is required, but generally fewer nodes are evaluated than in depth-first search. In addition, selection of the binary variable for branching is based on a number of criteria, including choosing the variable with the relaxed value closest to 0 or 1, or the one leading to the largest change in the objective. Additional description of these strategies can be found in Biegler et al. (1997) and Nemhauser and Wolsey (1988).

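Example To illustrate the branch and bound approach, we consider the MILP:

$$
\begin{aligned}
\min\ & Z = x + y_1 + 2y_2 + 3y_3 \\
\text{subject to}\ & -x + 3y_1 + y_2 + 2y_3 \le 0 \\
& -4y_1 - 8y_2 - 3y_3 \le -10 \\
& x \ge 0,\ y_1, y_2, y_3 \in \{0, 1\}
\end{aligned}
\qquad (3\text{-}111)
$$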

The solution to (3-111) is given by x = 4, $y_1$ = 1, $y_2$ = 1, $y_3$ = 0, and Z = 7. Here we use a depth-first strategy and branch on the variables closest to 0 or 1. Fig. 3-61 shows the progress of the branch and bound algorithm as the binary variables are selected and the bounds are updated. The sequence numbers for each node in Fig. 3-61 show the order in which they are processed. The grayed partitions correspond to the deleted nodes, and at termination of the algorithm we see that Z = 7 and an integer solution is obtained at an intermediate node where coincidentally $y_3$ = 0.
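The search in Fig. 3-61 is small enough to replay in a few lines of Python. The sketch below relies on a reduction that holds only for this example: x appears only in the first constraint of (3-111) and has a positive cost, so for any y the best choice is $x = 3y_1 + y_2 + 2y_3$, which leaves Min $Z = 4y_1 + 3y_2 + 5y_3$ subject to $4y_1 + 8y_2 + 3y_3 \ge 10$. The LP relaxation of a single covering constraint with 0 ≤ y ≤ 1 is then solved exactly by a greedy fill in order of increasing cost/weight ratio, which stands in for the simplex method here. With depth-first search and the $y_i$ = 1 child explored first, the relaxations visited are 5, 6.25, 7, infeasible, 6.33, 7.625, infeasible, the same sequence as the nodes in Fig. 3-61. Names such as `lp_relaxation` are ours.

```python
from fractions import Fraction

# Reduced form of (3-111), valid for this example only (x eliminated):
#   Min Z = 4*y1 + 3*y2 + 5*y3   s.t.  4*y1 + 8*y2 + 3*y3 >= 10,  y binary
COST = (4, 3, 5)
WEIGHT = (4, 8, 3)
DEMAND = 10


def lp_relaxation(fixed):
    """Exact LP relaxation with the y_i in `fixed` held at 0 or 1.

    Returns (Z, y), or None if infeasible. A single covering constraint with
    0 <= y <= 1 is solved by filling the demand greedily by cost/weight ratio.
    """
    y = [Fraction(0)] * len(COST)
    need = Fraction(DEMAND)
    for i, v in fixed.items():
        y[i] = Fraction(v)
        need -= WEIGHT[i] * v
    free = sorted((i for i in range(len(COST)) if i not in fixed),
                  key=lambda i: Fraction(COST[i], WEIGHT[i]))
    for i in free:
        if need <= 0:
            break
        y[i] = min(Fraction(1), need / WEIGHT[i])
        need -= WEIGHT[i] * y[i]
    if need > 0:
        return None                       # constraint cannot be met: infeasible
    return sum(c * v for c, v in zip(COST, y)), y


def branch_and_bound():
    best_z, best_y, log = None, None, []
    stack = [{}]                          # depth-first: last in, first out
    while stack:
        fixed = stack.pop()
        sol = lp_relaxation(fixed)
        log.append(None if sol is None else sol[0])
        if sol is None:
            continue                      # fathom: infeasible relaxation
        z, y = sol
        if best_z is not None and z >= best_z:
            continue                      # fathom: no better than incumbent
        frac = [i for i, v in enumerate(y) if v.denominator != 1]
        if not frac:
            best_z, best_y = z, [int(v) for v in y]   # new integer incumbent
            continue
        i = frac[0]
        stack.append({**fixed, i: 0})     # explored second
        stack.append({**fixed, i: 1})     # explored first
    return best_z, best_y, log


z, y, log = branch_and_bound()
x = 3 * y[0] + y[1] + 2 * y[2]            # recover x from the reduction
print(z, y, x)                            # 7 [1, 1, 0] 4
```

Exact fractions are used so that the test for an integral relaxation (`denominator != 1`) is not disturbed by rounding.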

A number of improvements that can be made to the branching rules will accelerate the convergence of this method. A comprehensive discussion of all these options can be found in Nemhauser and Wolsey (1988). Also, a number of efficient MILP codes have recently been developed, including CPLEX, OSL, XPRESS, and ZOOM. All these serve as excellent large-scale optimization codes as well. A detailed description and availability of these and other MILP solvers can be found in the NEOS Software Guide: http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide.

Mixed Integer Nonlinear Programming Without loss of generality, we can rewrite the MINLP in (3-84) as:

$$
\begin{aligned}
\min\ & Z = f(x) + c^T y \\
\text{subject to}\ & g(x) + By \le b \\
& x \ge 0,\ y \in \{0, 1\}^t
\end{aligned}
\qquad (3\text{-}112)
$$

where the binary variables are kept as separate linear terms. We develop several MINLP solution strategies by drawing from the material in the preceding sections. MINLP strategies can be classified into two types. The first deals with nonlinear extensions of the branch and bound method discussed above for MILPs. The second deals with outer approximation decomposition strategies that provide lower and upper bounding information for convergence.

Nonlinear Branch and Bound The MINLP (3-112) can be solved in a similar manner to (3-110). If the functions f(x) and g(x) in (3-112) are convex, then direct extensions to the branch and bound method can be made. A relaxed NLP can be solved at the root node, upper bounds to the solution of (3-112) can be found at the leaf nodes, and the bounding properties due to NLP solutions at intermediate nodes still hold. However, this approach is more expensive than the corresponding MILP method. First, NLPs are more expensive than LPs to solve. Second, unlike with relaxed LP solutions, NLP solutions at child nodes cannot be updated directly from solutions at parent nodes. Instead, the NLP needs to be solved again (but one hopes with a better starting guess). The NLP branch and bound method is used in the SBB code interfaced to GAMS. In addition, Leyffer [Comput. Optim. Appl. 18: 295 (2001)] proposed a hybrid MINLP strategy nested within an SQP algorithm. At each iteration, a mixed integer quadratic program is formed, and a branch and bound algorithm is executed to solve it.

If f(x) and g(x) are nonconvex, additional difficulties can occur. In this case, nonunique, local solutions can be obtained at intermediate nodes, and consequently lower bounding properties would be lost. In addition, the nonconvexity in g(x) can lead to locally infeasible problems at intermediate nodes, even if feasible solutions can be found in the corresponding leaf node. To overcome problems with nonconvexities, global solutions to relaxed NLPs can be solved at the intermediate nodes. This preserves the lower bounding information and allows nonlinear branch and bound to inherit the convergence properties from the linear case. However, as noted above, this leads to much more expensive solution strategies.

FIG. 3-61 Branch and bound sequence for MILP example. [Search tree branching on $y_1$, then on $y_2$ and $y_3$; each node shows its sequence number, Z, and the relaxed values ($y_1$, $y_2$, $y_3$).]

Outer Approximation Decomposition Methods Again, we consider the MINLP (3-112) with convex f(x) and g(x). Note that the NLP with binary variables fixed at $\bar{y}$

$$
\begin{aligned}
\min\ & Z = f(x) + c^T \bar{y} \\
\text{subject to}\ & g(x) + B\bar{y} \le b \\
& x \ge 0
\end{aligned}
\qquad (3\text{-}113)
$$

if feasible, leads to a solution that is an upper bound on the MINLP solution. In addition, linearizations of convex functions lead to underestimation of the function itself:

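$$
\phi(x) \ge \phi(x^k) + \nabla\phi(x^k)^T (x - x^k)
\qquad (3\text{-}114)
$$

for any convex function $\phi(x)$ and any point $x^k$.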

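Consequently, linearization of (3-112) at a point $x^k$, to form the problem

$$
\begin{aligned}
\min\ & Z = f(x^k) + \nabla f(x^k)^T (x - x^k) + c^T y \\
\text{subject to}\ & g(x^k) + \nabla g(x^k)^T (x - x^k) + By \le b \\
& x \ge 0,\ y \in \{0, 1\}^t
\end{aligned}
\qquad (3\text{-}115)
$$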

leads to overapproximation of the feasible region and underapproximation of the objective function in (3-112). Consequently, solution of (3-115) is a lower bound to the solution of (3-112). Adding more linearizations from other points does not change the bounding property, so for a set of points $x^l$, l = 1, . . ., k, the problem

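$$
\begin{aligned}
\min\ & Z = \eta \\
\text{subject to}\ & \eta \ge f(x^l) + \nabla f(x^l)^T (x - x^l) + c^T y, \quad l = 1, \ldots, k \\
& g(x^l) + \nabla g(x^l)^T (x - x^l) + By \le b, \quad l = 1, \ldots, k \\
& x \ge 0,\ y \in \{0, 1\}^t
\end{aligned}
\qquad (3\text{-}116)
$$

where η is a scalar variable, still has a solution that is a lower bound to (3-112). The outer approximation strategy is depicted in Fig. 3-62.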
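The bounding property behind (3-114) and (3-116) is easy to check numerically for one convex function of one variable. In the Python sketch below, `math.exp` is an arbitrary convex function chosen only for illustration, and `cutting_plane_model` is a hypothetical helper name. The maximum of the tangent lines taken at the points $x^l$ never exceeds the function, and adding points can only raise (tighten) the model, which is why the solution of (3-116) remains a valid lower bound as k grows.

```python
import math


def cutting_plane_model(fun, grad, points):
    """Return the model x -> max_l [fun(xl) + grad(xl)*(x - xl)] used in (3-116)."""
    cuts = [(fun(p), grad(p), p) for p in points]
    return lambda x: max(fp + gp * (x - p) for fp, gp, p in cuts)


fun = grad = math.exp          # exp is convex and is its own derivative
grid = [i / 100 for i in range(-300, 301)]

few = cutting_plane_model(fun, grad, [-1.0, 1.0])
more = cutting_plane_model(fun, grad, [-1.0, 0.0, 1.0, 2.0])

# (3-114): every linearization underestimates the convex function, and more
# linearization points give a model that is never lower than before.
assert all(few(x) <= more(x) <= fun(x) + 1e-12 for x in grid)

# The model touches the function at each linearization point.
print(max(fun(x) - more(x) for x in grid), fun(0.0) - more(0.0))
```

For a nonconvex function the tangent lines can cut above the function, which is the loss of the bounding property discussed for nonconvex MINLPs.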

The outer approximation algorithm [Duran, M., and I. E. Grossmann, Math. Programming 36: 307 (1986)] begins by initializing the problem, either with a predetermined starting guess or by solving a relaxed NLP based on (3-112). An upper bound to the solution is then generated by fixing the binary variables to their current values $y^k$ and solving the NLP (3-113). This solution determines the continuous variable values $x^k$ for the MILP (3-116). [If (3-113) is an infeasible problem, any point may be chosen for $x^k$, or the linearizations could be omitted.] Note that this MILP also contains linearizations from previous solutions of (3-113). Finally, the integer cut $\sum_i |y_i - y_i^k| \ge 1$ is added to (3-116) to avoid revisiting previously encountered values of binary variables. (In convex problems the integer cut is not needed, but it helps to accelerate solution of the MILP.) Solution of (3-116) yields new values of y and (without the integer cut) must lead to a lower bound to the solution of (3-112). Consequently, if the objective function of the lower bounding MILP is greater than the least upper bound determined in solutions of (3-113), then the algorithm terminates. Otherwise, the new values of y are used to solve the next NLP (3-113).
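The loop of Fig. 3-62 can be sketched in Python on a deliberately tiny, hypothetical convex instance of (3-112) with one continuous and one binary variable: Min $Z = (x - 3)^2 + 0.5y$ subject to $x^2 - 5y \le 4$, x ≥ 0, y ∈ {0, 1}. Two shortcuts are assumptions made only to keep the example self-contained: the NLP (3-113) has a closed-form solution for this instance, and the lower-bounding MILP (3-116) with integer cuts is solved by enumerating the unvisited values of y and minimizing the piecewise-linear η over x with a golden-section search. A real implementation would call NLP and MILP solvers at these two steps.

```python
import math

# Hypothetical toy instance of (3-112):
#   Min Z = f(x) + C*y   s.t.  g(x) + B*y <= RHS,  x >= 0,  y in {0, 1}
C, B, RHS = 0.5, -5.0, 4.0


def f(x):
    return (x - 3.0) ** 2


def df(x):
    return 2.0 * (x - 3.0)


def g(x):
    return x ** 2


def dg(x):
    return 2.0 * x


def nlp(y):
    """NLP (3-113) with y fixed; closed form for this toy: x = min(3, sqrt(4 + 5y))."""
    x = min(3.0, math.sqrt(RHS - B * y))
    return x, f(x) + C * y


def golden_min(fun, a, b, tol=1e-10):
    """Golden-section search for a unimodal (here convex) function on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    x = 0.5 * (a + b)
    return x, fun(x)


def master(points, visited):
    """MILP (3-116) plus integer cuts, solved here by enumerating unvisited y."""
    best = None
    for y in (0, 1):
        if y in visited:                      # integer cut excludes old y values
            continue
        # Linearized constraints g(p) + dg(p)*(x - p) + B*y <= RHS give x <= x_max
        # (dividing by dg(p) assumes p > 0, true for the points generated here).
        x_max = min((RHS - B * y - g(p) + dg(p) * p) / dg(p) for p in points)
        if x_max < 0.0:
            continue                          # linearized region is empty
        eta = lambda x: max(f(p) + df(p) * (x - p) for p in points) + C * y
        x, val = golden_min(eta, 0.0, x_max)
        if best is None or val < best[0]:
            best = (val, x, y)
    return best


def outer_approximation(y=0, tol=1e-8):
    points, visited = [], set()
    ub, incumbent = math.inf, None
    while True:
        x, z = nlp(y)                         # upper bound with y fixed
        visited.add(y)
        if z < ub:
            ub, incumbent = z, (x, y)
        points.append(x)                      # new linearization point x^k
        lb = master(points, visited)          # lower bound from the MILP
        if lb is None or lb[0] >= ub - tol:   # LB >= UB (or no y left): stop
            return incumbent, ub
        y = lb[2]                             # update y


print(outer_approximation())  # ((3.0, 1), 0.5)
```

Starting from y = 0, the first NLP gives x = 2 and Z = 1; the master built from the linearizations at x = 2 proposes y = 1 with a lower bound of −1; the second NLP gives Z = 0.5, and the master then has no unvisited y left.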

Compared to nonlinear branch and bound, the outer approximation algorithm usually requires very few solutions of the MILP and NLP subproblems. This is especially advantageous on problems where the NLPs are large and expensive to solve. Moreover, there are two variations of outer approximation that may be suitable for particular problem types:

In Generalized Benders Decomposition (GBD) [see Sahinidis and Grossmann, Computers and Chem. Eng. 15: 481 (1991)], the lower bounding problem in Fig. 3-62 is replaced by the following MILP:

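$$
\begin{aligned}
\min\ & \eta \\
\text{subject to}\ & \eta \ge f(x^l) + c^T y + (\nu^l)^T \left[ g(x^l) + By - b \right], \quad l = 1, \ldots, k \\
& \sum_i |y_i - y_i^l| \ge 1, \quad l = 1, \ldots, k
\end{aligned}
\qquad (3\text{-}117)
$$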

where $\nu^l$ is the vector of KKT multipliers from the solution of (3-113) at iteration l. This MILP can be derived through a reformulation of the MILP used in Fig. 3-62 with the inactive constraints from (3-113) dropped. Solution of (3-117) leads to a weaker lower bound than (3-116), and consequently, more solutions of the NLP and MILP subproblems are needed to converge to the solution. However, (3-117) contains only a single continuous variable and far fewer inequality constraints and is much less expensive to solve than (3-116). Thus, GBD is favored over outer approximation if (3-113) is relatively inexpensive to solve or solution of (3-116) is too expensive.

The extended cutting plane (ECP) algorithm [Westerlund and Pettersson, Computers and Chem. Engng. 19: S131 (1995)] is complementary to GBD. While the lower bounding problem in Fig. 3-62 remains essentially the same, the continuous variables $x^k$ are chosen from the MILP solution and the NLP (3-113) is replaced by a simple evaluation of the objective and constraint functions. As a result, only MILP problems [(3-116) plus integer cuts] need be solved. Consequently, the ECP approach has weaker upper bounds than outer approximation and requires more MILP solutions. It has advantages over outer approximation when the NLP (3-113) is expensive to solve.

Additional difficulties arise for the outer approximation algorithm and its GBD and ECP extensions when f(x) or g(x) is nonconvex. Under these circumstances, the lower bounding properties resulting from the linearization and formulation of the MILP subproblem are lost, and the MILP solution may actually exclude the solution of (3-112). Hence, these algorithms need to be applied with care to nonconvex problems. To deal with nonconvexities, one can relax the linearizations in (3-116) through the introduction of additional deviation variables that can be penalized in the objective function. Alternately, the linearizations in (3-116) can be replaced by valid underestimating functions, such as those derived for global optimization [e.g., (3-107)]. However, this requires knowledge of structural information for (3-112) and may lead to weak lower bounds in the resulting MILP.

Finally, the performance of both MILP and MINLP algorithms is strongly dependent on the problem formulations (3-110) and (3-112). In particular, the efficiency of the approach is impacted by the lower bounds produced by the relaxation of the binary variables and subsequent solution of the linear program in the branch and bound tree. A number of approaches have been proposed to improve the quality of the lower bounds, including these:

• Logic-based methods such as generalized disjunctive programming (GDP) can be used to formulate MINLPs with fewer discrete variables that have tighter relaxations. Moreover, the imposition of logic-based constraints prevents the generation of unsuitable alternatives, leading to a less expensive search. In addition, constraint logic programming (CLP) methods offer an efficient search alternative to MILP solvers for highly combinatorial problems. See Raman and Grossmann, Computers and Chem. Engng. 18(7): 563 (1994) for more details.

• Convex hull formulations of MILPs and MINLPs lead to relaxed problems that have much tighter lower bounds. This leads to the examination of far fewer nodes in the branch and bound tree. See Grossmann and Lee, Comput. Optim. Applic. 26: 83 (2003) for more details.

• Reformulation and preprocessing strategies including bound tightening of the variables, coefficient reduction, lifting facets, and special ordered set constraints frequently lead to improved lower bounds and significant performance improvements in mixed integer programming algorithms.

A number of efficient codes are available for the solution of MINLPs. These include the AlphaECP, BARON, DICOPT, MINLP, and SBB solvers; descriptions of these can be found on the http://www.gamsworld.org/minlp/solvers.htm web site.

FIG. 3-62 Outer approximation MINLP algorithm. [Flowchart: initialize $x^0$, $y^0$; upper bound with y fixed, NLP (3-113); lower bound, MILP (3-116) + integer cuts; stop if LB ≥ UB, otherwise update y and repeat.]

DEVELOPMENT OF OPTIMIZATION MODELS

The most important aspect of a successful optimization study is the formulation of the optimization model. These models must reflect the real-world problem so that meaningful optimization results are obtained; they also must satisfy the properties of the problem class. For instance, NLPs addressed by gradient-based methods need to have functions that are defined in the variable domain and have bounded and continuous first and second derivatives. In mixed integer problems, proper formulations are also needed to yield good lower bounds for efficient search. With increased understanding of optimization methods and the development of efficient and reliable optimization codes, optimization practitioners now focus on the formulation of optimization models that are realistic, well posed, and inexpensive to solve. Finally, convergence properties of NLP, MILP, and MINLP solvers require accurate first (and often second) derivatives from the optimization model. If these contain numerical errors (say, through finite difference approximations), then the performance of these solvers can deteriorate considerably. As a result of these characteristics, modeling platforms are essential for the formulation task. These are classified into two broad areas: optimization modeling platforms and simulation platforms with optimization.
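The warning about finite differences can be made concrete with a forward-difference estimate of the derivative of sin x at x = 1 in Python (the test function and step sizes are arbitrary choices for illustration). Shrinking the step h first reduces the truncation error, which is roughly proportional to h, but very small steps amplify floating-point roundoff in the difference f(x + h) − f(x), so the accuracy of a finite-difference gradient depends on a step size that is hard to choose well for every variable and scaling.

```python
import math


def forward_difference(fun, x, h):
    """One-sided finite-difference estimate of fun'(x)."""
    return (fun(x + h) - fun(x)) / h


exact = math.cos(1.0)                      # true derivative of sin at x = 1
errors = {}
for h in (1e-1, 1e-4, 1e-8, 1e-12):
    errors[h] = abs(forward_difference(math.sin, 1.0, h) - exact)
    print(f"h = {h:.0e}   |error| = {errors[h]:.1e}")
```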

Optimization modeling platforms provide general-purpose interfaces for optimization algorithms and remove the need for the user to interface to the solver directly. These platforms allow the general formulation for all problem classes discussed above with direct interfaces to state-of-the-art optimization codes. Three representative platforms are GAMS (General Algebraic Modeling System), AMPL (A Mathematical Programming Language), and AIMMS (Advanced Integrated Multidimensional Modeling Software). All three require problem model input via a declarative modeling language and provide exact gradient and Hessian information through automatic differentiation strategies. Although it is possible, these platforms were not designed to handle externally added procedural models. As a result, these platforms are best applied on optimization models that can be developed entirely within their modeling framework. Nevertheless, these platforms are widely used for large-scale research and industrial applications. In addition, the MATLAB platform allows the flexible formulation of optimization models as well, although it currently has only limited capabilities for automatic differentiation and limited optimization solvers. More information on these and other modeling platforms can be found at http://www-fp.mcs.anl.gov/otc/Guide/SoftwareGuide.

Simulation platforms with optimization are often dedicated, application-specific modeling tools to which optimization solvers have been interfaced. These lead to very useful optimization studies, but because they were not originally designed for optimization models, they need to be used with some caution. In particular, most of these platforms do not provide exact derivatives to the optimization solver; often they are approximated through finite differences. In addition, the models themselves are constructed and calculated through numerical procedures, instead of through an open declarative language. Examples of these include widely used process simulators such as Aspen/Plus, PRO/II, and Hysys. Also note that more recent platforms such as Aspen Custom Modeler and gPROMS include declarative models and exact derivatives.

For optimization tools linked to procedural models, note that reliable and efficient automatic differentiation tools are available that link to models written, say, in FORTRAN and C, and calculate exact first (and often second) derivatives. Examples of these include ADIFOR, ADOL-C, GRESS, Odyssee, and PADRE. When used with care, these can be applied to existing procedural models and, when linked to modern NLP and MINLP algorithms, can lead to powerful optimization capabilities. More information on these and other automatic differentiation tools can be found at http://www-unix.mcs.anl.gov/autodiff/AD_Tools/.
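Forward-mode automatic differentiation can be sketched with dual numbers: each quantity carries its value and its derivative, and every arithmetic operation updates both by the chain rule. The minimal Python class below is our own illustration, not the interface of ADIFOR, ADOL-C, or any other tool named above; it supports only +, −, × and constant powers, which is enough to differentiate the objective of (3-108) exactly, with no step size to choose.

```python
class Dual:
    """Value and derivative carried together (a + b*eps with eps**2 = 0)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        o = self._lift(other)                   # product rule
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __pow__(self, n):                       # constant exponent n only
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)


def derivative(fun, x):
    """Exact derivative of fun at x by forward-mode AD (seed derivative 1)."""
    return fun(Dual(x, 1.0)).der


def f(x):
    """Objective of (3-108)."""
    return 2.5 * x**4 - 20 * x**3 + 55 * x**2 - 57 * x


print(derivative(f, 2.0))  # 3.0
```

Applied to existing code, this approach follows the program's own arithmetic operations, which is why such tools can be attached to procedural models without rewriting them in a modeling language.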

For optimization problems that are derived from (ordinary or partial) differential equation models, a number of advanced optimization strategies can be applied. Most of these problems are posed as NLPs, although recent work has also extended these models to MINLPs and global optimization formulations. For the optimization of profiles in time and space, indirect methods can be applied based on the optimality conditions of the infinite-dimensional problem using, for instance, the calculus of variations. However, these methods become difficult to apply if inequality constraints and discrete decisions are part of the optimization problem. Instead, current methods are based on NLP and MINLP formulations and can be divided into two classes:

• Simultaneous formulations are based on converting the differential equation model to algebraic constraints via discretization (say, with a Runge-Kutta method) and solving a large-scale NLP or MINLP. This approach requires efficient large-scale optimization solvers but also leads to very flexible and efficient problem formulations, particularly for optimal control and state-constrained problems. It also allows the treatment of unstable dynamics. A review of these strategies can be found in Biegler et al. [Chem. Engng. Sci. 57(4): 575 (2002)] and Betts [Practical Methods for Optimal Control Using Nonlinear Programming, SIAM, Philadelphia (2001)].

• Sequential formulations require the linkage of NLP or MINLP solvers to embedded ODE or PDE solvers. This leads to smaller optimization problems only in the decision (or control) variables. However, an important aspect to this formulation is the need to recover accurate gradient information from the differential equation solver. Here, methods based on the solution of direct and adjoint sensitivity equations have been derived and automated [see Cao et al., SIAM J. Sci. Comp. 24(3): 1076 (2003) for a description of these sensitivity methods]. For the former, commercial tools such as Aspen Custom Modeler and gPROMS have been developed.

Finally, the availability of automatic differentiation and related sensitivity tools for differential equation models allows for considerable flexibility in the formulation of optimization models. In van Bloemen Waanders et al. (Sandia Technical Report SAND2002-3198, October 2002), a seven-level modeling hierarchy is proposed that matches optimization algorithms with models that range from completely open (fully declarative model) to fully closed (entirely procedural without sensitivities). At the lowest, fully procedural level, only derivative-free optimization methods are applied, while the highest, declarative level allows the application of an efficient large-scale solver that uses first and second derivatives. This report notes that, based on the modeling level, optimization solver performance can vary by several orders of magnitude.
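For the sequential formulation, the direct (forward) sensitivity approach can be illustrated on a hypothetical one-parameter model dx/dt = −px, x(0) = x₀, chosen only because its solution is known in closed form. Differentiating the model with respect to p gives the sensitivity equation ds/dt = −ps − x with s(0) = 0, where s = ∂x/∂p. The Python sketch below integrates the model and its sensitivity together with a fixed-step classical Runge-Kutta method (a simplification; production ODE solvers use error control) and compares the results with $x = x_0 e^{-pt}$ and $\partial x/\partial p = -t x_0 e^{-pt}$.

```python
import math


def rhs(state, p):
    """Model dx/dt = -p*x together with its sensitivity ds/dt = -p*s - x."""
    x, s = state
    return (-p * x, -p * s - x)


def rk4(state, p, t_end, n_steps=200):
    """Fixed-step classical Runge-Kutta integration of the augmented system."""
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(state, p)
        k2 = rhs(tuple(v + 0.5 * h * k for v, k in zip(state, k1)), p)
        k3 = rhs(tuple(v + 0.5 * h * k for v, k in zip(state, k2)), p)
        k4 = rhs(tuple(v + h * k for v, k in zip(state, k3)), p)
        state = tuple(v + h / 6 * (a + 2 * b + 2 * c + d)
                      for v, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


p, x0, T = 0.7, 2.0, 3.0                 # illustrative values
x_T, dxdp_T = rk4((x0, 0.0), p, T)       # s(0) = 0: x0 does not depend on p
print(x_T, x0 * math.exp(-p * T))
print(dxdp_T, -T * x0 * math.exp(-p * T))
```

The sensitivity s(T) is exactly the gradient an NLP solver needs when x(T) appears in the objective or constraints, obtained here without finite differencing over repeated ODE solutions.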


STATISTICS

REFERENCES: Baird, D. C., Experimentation: An Introduction to Measurement Theory and Experiment Design, 3d ed., Prentice-Hall, Englewood Cliffs, N.J. (1995); Box, G. E. P., J. S. Hunter, and W. G. Hunter, Statistics for Experimenters: Design, Innovation, and Discovery, 2d ed., Wiley, New York (2005); Cropley, J. B., “Heuristic Approach to Complex Kinetics,” pp. 292–302 in Chemical Reaction Engineering—Houston, ACS Symposium Series 65, American Chemical Society, Washington (1978); Lipschutz, S., and J. J. Schiller, Jr., Schaum’s Outline of Theory and Problems of Introduction to Probability and Statistics, McGraw-Hill, New York (1988); Moore, D. S., and G. P. McCabe, Introduction to the Practice of Statistics, 4th ed., Freeman, San Francisco (2003); Montgomery, D. C., and G. C. Runger, Applied Statistics and Probability for Engineers, 3d ed., Wiley, New York (2002); and Montgomery, D. C., G. C. Runger, and N. F. Hubele, Engineering Statistics, 3d ed., Wiley, New York (2004).

INTRODUCTION

Statistics represents a body of knowledge which enables one to deal with quantitative data reflecting any degree of uncertainty. There are six basic aspects of applied statistics. These are:

1. Type of data
2. Random variables
3. Models
4. Parameters
5. Sample statistics
6. Characterization of chance occurrences

From these can be developed strategies and procedures for dealing with (1) estimation and (2) inferential statistics. The following has been directed more toward inferential statistics because of its broader utility.

Detailed illustrations and examples are used throughout to develop basic statistical methodology for dealing with a broad area of applications. However, in addition to this material, there are many specialized topics as well as some very subtle areas which have not been discussed. The references should be used for more detailed information. Section 8 discusses the use of statistics in statistical process control (SPC).

Type of Data In general, statistics deals with two types of data: counts and measurements. Counts represent the number of discrete outcomes, such as the number of defective parts in a shipment, the number of lost-time accidents, and so forth. Measurement data are treated as a continuum. For example, the tensile strength of a synthetic yarn theoretically could be measured to any degree of precision. A subtle aspect associated with count and measurement data is that some types of count data can be dealt with through the application of techniques which have been developed for measurement data alone. This ability is due to the fact that some simplified measurement statistics serve as an excellent approximation for the more tedious count statistics.

Random Variables Applied statistics deals with quantitative data. In tossing a fair coin the successive outcomes would tend to be different, with heads and tails occurring randomly over a period of time. Given a long strand of synthetic fiber, the tensile strength of successive samples would tend to vary significantly from sample to sample.

Counts and measurements are characterized as random variables, that is, observations which are susceptible to chance. Virtually all quantitative data are susceptible to chance in one way or another.

Models Part of the foundation of statistics consists of the mathematical models which characterize an experiment. The models themselves are mathematical ways of describing the probability, or relative likelihood, of observing specified values of random variables. For example, in tossing a coin once, a random variable x could be defined by assigning to x the value 1 for a head and 0 for a tail. Given a fair coin, the probability of observing a head on a toss would be .5, and similarly for a tail. Therefore, the mathematical model governing this experiment can be written as

x    P(x)
0    .5
1    .5

where P(x) stands for what is called a probability function. This term is reserved for count data, in that probabilities can be defined for particular outcomes.

The probability function that has been displayed is a very special case of the more general case, which is called the binomial probability distribution.

For measurement data which are considered continuous, the term probability density is used. For example, consider a spinner wheel which conceptually can be thought of as being marked off on the circumference infinitely precisely from 0 up to, but not including, 1. In spinning the wheel, the probability of the wheel’s stopping at a specified marking point at any particular x value, where 0 ≤ x < 1, is zero, for example, stopping at the value x = .5. For the spinning wheel, the probability density function would be defined by f(x) = 1 for 0 ≤ x < 1. Graphically, this is shown in Fig. 3-63. The relative-probability concept refers to the fact that density reflects the relative likelihood of occurrence; in this case, each number between 0 and 1 is equally likely. For measurement data, probability is defined by the area under the curve between specified limits. A density function always must have a total area of 1.

Example For the density of Fig. 3-63,

P[0 ≤ x ≤ .4] = .4
P[.2 ≤ x ≤ .9] = .7
P[.6 ≤ x < 1] = .4

and so forth. Since the probability associated with any particular point value is zero, it makes no difference whether the limit point is defined by a closed interval (≤ or ≥) or an open interval (< or >).
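Because probability for measurement data is the area under the density, the values in the example can be recovered by numerical integration. The Python sketch below uses the midpoint rule (our choice; any quadrature would do for this piecewise-constant density) and also shows that an interval of zero width, i.e., a single point such as x = .5, has probability zero, while the total area is 1.

```python
def uniform_density(x):
    """Density of Fig. 3-63: f(x) = 1 for 0 <= x < 1, and 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def prob(density, a, b, n=10_000):
    """P[a <= x <= b] as the area under the density, by the midpoint rule."""
    h = (b - a) / n
    return h * sum(density(a + (i + 0.5) * h) for i in range(n))


for a, b in ((0.0, 0.4), (0.2, 0.9), (0.6, 1.0), (0.5, 0.5), (0.0, 1.0)):
    print(f"P[{a} <= x <= {b}] = {prob(uniform_density, a, b):.4f}")
```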

Many different types of models are used as the foundation for statistical analysis. These models are also referred to as populations.

Parameters As a way of characterizing probability functions and densities, certain types of quantities called parameters can be defined. For example, the center of gravity of the distribution is defined to be the population mean, which is designated as µ. For the coin toss µ = .5, which corresponds to the average value of x; i.e., for half of the time x will take on a value 0 and for the other half a value 1. The average would be .5. For the spinning wheel, the average value would also be .5.

Another parameter is called the standard deviation, which is designated as σ. The square of the standard deviation is used frequently and is called the population variance, σ². Basically, the standard deviation is a quantity which measures the spread or dispersion of the distribution from its mean µ. If the spread is broad, then the standard deviation will be larger than if it were more constrained.

For specified probability and density functions, the respective means and variances are defined by the following:

Probability functions

$$E(x) = \mu = \sum_x x\,P(x) \qquad \operatorname{Var}(x) = \sigma^2 = \sum_x (x - \mu)^2 P(x)$$

Probability density functions

$$E(x) = \mu = \int_x x\,f(x)\,dx \qquad \operatorname{Var}(x) = \sigma^2 = \int_x (x - \mu)^2 f(x)\,dx$$

where E(x) is defined to be the expected or average value of x.

Sample Statistics Many types of sample statistics will be defined. Two very special types are the sample mean, designated as x̄, and the sample standard deviation, designated as s. These are, by definition, random variables. Parameters like µ and σ are not random variables; they are fixed constants.

Example In an experiment, six random numbers (rounded to four decimal places) were observed from the uniform distribution f(x) = 1 for 0 ≤ x < 1:

.1009   .3754   .0842   .9901   .1280   .6606

The sample mean corresponds to the arithmetic average of the observations, which will be designated as x1 through x6, where

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{with } n = 6,\ \bar{x} = 0.3899$$

The sample standard deviation s is defined by the computation

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{n\sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}} \qquad (3\text{-}118)$$

FIG. 3-63 Density function.

In effect, this represents the root of a statistical average of the squares. The divisor quantity (n − 1) will be referred to as the degrees of freedom. The sample value of the standard deviation for the data given is .3686.

The value of n − 1 is used in the denominator because the deviations from the sample average must total zero, or

$$\sum_{i=1}^{n}(x_i - \bar{x}) = 0$$

Thus knowing n − 1 values of xi − x̄ permits calculation of the nth value of xi − x̄.

The sample mean and sample standard deviation are obtained by using Microsoft Excel with the commands AVERAGE(B2:B7) and STDEV(B2:B7) when the observations are in cells B2 to B7.
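For readers working outside Excel, the same two statistics can be computed with Python's standard `statistics` module (Python 3.8+ assumed). This sketch also evaluates the computational form of Eq. (3-118) and confirms that the deviations from x̄ sum to zero; it is an illustration, not the handbook's procedure.

```python
import statistics

obs = [0.1009, 0.3754, 0.0842, 0.9901, 0.1280, 0.6606]

xbar = statistics.mean(obs)  # analog of Excel AVERAGE
s = statistics.stdev(obs)    # analog of Excel STDEV (divisor n - 1)

# Computational form of Eq. (3-118): sqrt((n*sum(x^2) - (sum x)^2) / (n(n - 1)))
n = len(obs)
s_alt = ((n * sum(x * x for x in obs) - sum(obs) ** 2) / (n * (n - 1))) ** 0.5

# The n deviations from the sample mean total zero (up to rounding error)
dev_sum = sum(x - xbar for x in obs)

print(round(xbar, 4), round(s, 4), round(s_alt, 4))  # 0.3899 0.3686 0.3686
```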

In effect, the standard deviation quantifies the relative magnitude of the deviation numbers, i.e., a special type of "average" of the distance of points from their center. In statistical theory, it turns out that the corresponding variance quantities s² have remarkable properties which make possible broad generalities for sample statistics and therefore also for their counterparts, the standard deviations.

For the corresponding population, the parameter values are µ = .50 and σ = .2887, which are obtained by calculating the integrals defined above with f(x) = 1 and 0 ≤ x ≤ 1. If, instead of using individual observations only, averages of six were reported, then the corresponding population parameter values would be µ = .50 and σx̄ = σ/√6 = .1179. The corresponding variance for an average will be written occasionally as Var(x̄) = Var(x)/n. In effect, the variance of an average is inversely proportional to the sample size n, which reflects the fact that sample averages will tend to cluster about µ much more closely than individual observations. This is illustrated in greater detail under "Measurement Data and Sampling Densities."
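The parameter values above can be reproduced numerically. The sketch below is only illustrative: it assumes Python 3.8+, evaluates the mean and variance integrals with a midpoint rule (a choice made here), and then uses a seeded simulation to check that averages of six have a standard deviation near σ/√6.

```python
import random
import statistics

# Population moments of f(x) = 1 on [0, 1) by the midpoint rule
n_grid = 100_000
h = 1.0 / n_grid
xs = [(i + 0.5) * h for i in range(n_grid)]
mu = sum(x * h for x in xs)               # integral of x f(x) dx
var = sum((x - mu) ** 2 * h for x in xs)  # integral of (x - mu)^2 f(x) dx
sigma = var ** 0.5
sigma_avg = sigma / 6 ** 0.5              # sigma of an average of six

# Simulation check: spread of many averages of six uniform observations
random.seed(1)                            # fixed seed for repeatability
avgs = [statistics.fmean(random.random() for _ in range(6)) for _ in range(20_000)]

print(round(mu, 4), round(sigma, 4), round(sigma_avg, 4))  # 0.5 0.2887 0.1179
print(round(statistics.stdev(avgs), 4))                    # near .1179
```

The simulated value varies slightly with the seed; the integral-based values do not.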

Characterization of Chance Occurrences To deal with a broad area of statistical applications, it is necessary to characterize the way in which random variables will vary by chance alone. The basic foundation for this characterization is laid through a density called the gaussian, or normal, distribution.

Determining the area under the normal curve is a very tedious procedure. However, by standardizing a random variable that is normally distributed, it is possible to relate all normally distributed random variables to one table. The standardization is defined by the identity z = (x − µ)/σ, where z is called the unit normal. Further, it is possible to standardize the sampling distribution of averages x̄ by the identity z = (x̄ − µ)/(σ/√n).

A remarkable property of the normal distribution is that, almost regardless of the distribution of x, sample averages x̄ will approach the gaussian distribution as n gets large. Even for relatively small values of n, of about 10, the approximation in most cases is quite close. For example, sample averages of size 10 from the uniform distribution will have essentially a gaussian distribution.
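The uniform example can be checked by simulation. The sketch below assumes Python 3.8+ and a fixed random seed. It draws many averages of 10 uniform observations and counts how often they fall within the gaussian 95 percent limits µ ± 1.96σ/√n. A fraction close to .95 is consistent with the near-normality claimed above.

```python
import random
import statistics

random.seed(2)                      # fixed seed so the run is repeatable
n, reps = 10, 20_000
mu, sigma = 0.5, (1 / 12) ** 0.5    # uniform(0, 1) population parameters
half_width = 1.96 * sigma / n ** 0.5

inside = 0
for _ in range(reps):
    xbar = statistics.fmean(random.random() for _ in range(n))
    if abs(xbar - mu) <= half_width:
        inside += 1

print(inside / reps)                # typically close to .95
```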

Also, in many applications involving count data, the normal distribution can be used as a close approximation. In particular, the approximation is quite close for the binomial distribution within certain guidelines.

The normal probability density can be obtained in Microsoft Excel by using the NORMDIST function and supplying the desired mean and standard deviation. The cumulative value can also be determined. In MATLAB, normally distributed random numbers are generated with the command randn.

ENUMERATION DATA AND PROBABILITY DISTRIBUTIONS

Introduction Many types of statistical applications are characterized by enumeration data in the form of counts. Examples are the number of lost-time accidents in a plant, the number of defective items in a sample, and the number of items in a sample that fall within several specified categories.

The sampling distribution of count data can be characterized through probability distributions. In many cases, count data are appropriately interpreted through their corresponding distributions. However, in other situations analysis is greatly facilitated through distributions which have been developed for measurement data. Examples of each will be illustrated in the following subsections.

Binomial Probability Distribution

Nature Consider an experiment in which each outcome is classified into one of two categories, one of which will be defined as a success and the other as a failure. Given that the probability of success p is constant from trial to trial, then the probability of observing a specified number of successes x in n trials is defined by the binomial distribution. The sequence of outcomes is called a Bernoulli process.

Nomenclature
n = total number of trials
x = number of successes in n trials
p = probability of observing a success on any one trial
p̂ = x/n, the proportion of successes in n trials

Probability Law

$$P(x) = \binom{n}{x} p^x (1 - p)^{n - x} \qquad x = 0, 1, 2, \ldots, n$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Properties

$$E(x) = np \qquad \operatorname{Var}(x) = np(1 - p)$$

$$E(\hat{p}) = p \qquad \operatorname{Var}(\hat{p}) = p(1 - p)/n$$
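The probability law and its properties can be confirmed by direct summation. This sketch assumes Python 3.8+ (for `math.comb`); the values n = 20 and p = .90 are illustrative choices, and `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(x, n, p):
    """Binomial probability law: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.9   # illustrative values
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))

# Probabilities sum to 1; mean equals np and variance equals np(1 - p)
print(round(sum(probs), 6), round(mean, 6), round(var, 6))  # 1.0 18.0 1.8
```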

Geometric Probability Distribution

Nature Consider an experiment in which each outcome is classified into one of two categories, one of which will be defined as a success and the other as a failure. Given that the probability of success p is constant from trial to trial, then the probability of observing the first success on the xth trial is defined by the geometric distribution.

Nomenclature
p = probability of observing a success on any one trial
x = the number of trials to obtain the first success

Probability Law

$$P(x) = p(1 - p)^{x - 1} \qquad x = 1, 2, 3, \ldots$$

Properties

$$E(x) = 1/p \qquad \operatorname{Var}(x) = (1 - p)/p^2$$
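A quick numerical check of the geometric properties follows. It is a sketch under stated assumptions: p = .2 is an arbitrary illustrative value, and the infinite sum is truncated at 500 trials, where the remaining tail probability is negligible for this p.

```python
def geom_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return p * (1 - p) ** (x - 1)


p = 0.2                  # illustrative success probability
xs = range(1, 500)       # truncated support; the omitted tail is negligible here
mean = sum(x * geom_pmf(x, p) for x in xs)
var = sum((x - mean) ** 2 * geom_pmf(x, p) for x in xs)

print(round(mean, 6), round(1 / p, 6))          # both 5.0
print(round(var, 6), round((1 - p) / p**2, 6))  # both 20.0
```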

Poisson Probability Distribution

Nature In monitoring a moving threadline, one criterion of quality would be the frequency of broken filaments. These can be identified as they occur through the threadline by a broken-filament detector mounted adjacent to the threadline. In this context, the random occurrences of broken filaments can be modeled by the Poisson distribution. This is called a Poisson process and corresponds to a probabilistic description of the frequency of defects or, in general, what are called arrivals at points on a continuous line or in time. Other examples include:

1. The number of cars (arrivals) that pass a point on a high-speed highway between 10:00 and 11:00 A.M. on Wednesdays
2. The number of customers arriving at a bank between 10:00 and 10:10 A.M.
3. The number of telephone calls received through a switchboard between 9:00 and 10:00 A.M.
4. The number of insurance claims that are filed each week
5. The number of spinning machines that break down during 1 day at a large plant

Nomenclature
x = total number of arrivals in a total length L or total period T
a = average rate of arrivals for a unit length or unit time
λ = aL = expected or average number of arrivals for the total length L
λ = aT = expected or average number of arrivals for the total time T

Probability Law Given that a is constant for the total length L or period T, the probability of observing x arrivals in some period L or T is given by

$$P(x) = \frac{\lambda^x}{x!}\,e^{-\lambda} \qquad x = 0, 1, 2, \ldots$$

Properties

$$E(x) = \lambda \qquad \operatorname{Var}(x) = \lambda$$

Example The number of broken filaments in a threadline has been averaging .015 per yard. What is the probability of observing exactly two broken filaments in the next 100 yd? In this example, a = .015/yd and L = 100 yd; therefore λ = (.015)(100) = 1.5:

$$P(x = 2) = \frac{(1.5)^2}{2!}\,e^{-1.5} = .2510$$
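The filament calculation maps directly onto the Poisson law. The sketch below assumes Python 3 and uses a hypothetical helper `poisson_pmf`. Excel's POISSON(x, λ, 0) plays the same role.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Poisson probability law: lam**x * exp(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)


a, L = 0.015, 100   # broken filaments per yard, yards observed
lam = a * L         # expected number of breaks in 100 yd (1.5)
p2 = poisson_pmf(2, lam)
print(round(p2, 4))
```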

Example A commercial item is sold in a retail outlet as a unit product. In the past, sales have averaged 10 units per month with no seasonal variation. The retail outlet must order replacement items 2 months in advance. If the outlet starts the next 2-month period with 25 items on hand, what is the probability that it will stock out before the end of the second month?

Given a = 10/month, then λ = 10 × 2 = 20 for the total period of 2 months:

$$P(x \ge 26) = \sum_{x=26}^{\infty} P(x) = 1 - \sum_{x=0}^{25} P(x)$$

$$\sum_{x=0}^{25} \frac{20^x}{x!}\,e^{-20} = e^{-20}\left(1 + \frac{20}{1!} + \frac{20^2}{2!} + \cdots + \frac{20^{25}}{25!}\right) = .887815$$

Therefore P(x ≥ 26) = .112185, or roughly an 11 percent chance of a stockout.
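The cumulative sum in the stockout example is tedious by hand but short in code. This is a minimal sketch assuming Python 3. It sums the Poisson terms for 0 through 25 units and takes the complement.

```python
from math import exp, factorial

lam = 10 * 2  # 10 units/month over the 2-month reorder lead time

# P(x <= 25): sum of the Poisson probabilities for 0 through 25 units sold
p_le_25 = sum(lam**x * exp(-lam) / factorial(x) for x in range(26))
p_stockout = 1 - p_le_25  # P(x >= 26)
print(round(p_le_25, 6), round(p_stockout, 6))
```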

Hypergeometric Probability Distribution

Nature In an experiment in which one samples from a relatively small group of items, each of which is classified in one of two categories, A or B, the hypergeometric distribution can be defined. One example is the probability of drawing two red and two black cards from a deck of cards. The hypergeometric distribution is the analog of the binomial distribution when successive trials are not independent, i.e., when the total group of items is not infinite. This happens when the drawn items are not replaced.

Nomenclature
N = total group size
n = sample group size
X = number of items in the total group with a specified attribute A
N − X = number of items in the total group with the other attribute B
x = number of items in the sample with a specified attribute A
n − x = number of items in the sample with the other attribute B

              Population    Sample
Category A    X             x
Category B    N − X         n − x
Total         N             n

Probability Law

$$P(x) = \frac{\binom{X}{x}\binom{N - X}{n - x}}{\binom{N}{n}} \qquad E(x) = \frac{nX}{N} \qquad \operatorname{Var}(x) = nP(1 - P)\,\frac{N - n}{N - 1}$$

where P = X/N.

Example What is the probability that an appointed special committee of 4 has no female members when the members are randomly selected from a candidate group of 10 males and 7 females?

$$P(x = 0) = \frac{\binom{7}{0}\binom{10}{4}}{\binom{17}{4}} = .0882$$

Example A bin contains 300 items, of which 240 are good and 60 are defective. In a sample of 6, what is the probability of selecting 4 good and 2 defective items by chance?

$$P(x = 4) = \frac{\binom{240}{4}\binom{60}{2}}{\binom{300}{6}} = .2478$$
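Both hypergeometric examples can be reproduced with exact binomial coefficients. The sketch below assumes Python 3.8+ (`math.comb`). The helper `hypergeom_pmf` is hypothetical; its argument order follows the nomenclature above (N, X, n), which mirrors Excel's HYPGEOMDIST(x, n, X, N) only in content, not in order.

```python
from math import comb


def hypergeom_pmf(x, N, X, n):
    """Probability of x items with attribute A in a sample of n drawn
    without replacement from N items, X of which have attribute A."""
    return comb(X, x) * comb(N - X, n - x) / comb(N, n)


# Committee of 4 from 7 females and 10 males; x = number of females
p_committee = hypergeom_pmf(0, N=17, X=7, n=4)
# Bin of 300 items (240 good, 60 defective); x = number of good items in 6
p_bin = hypergeom_pmf(4, N=300, X=240, n=6)
print(round(p_committee, 4), round(p_bin, 4))  # 0.0882 0.2478

# Check the moment formulas for the committee case, with P = X/N
N, X, n = 17, 7, 4
probs = [hypergeom_pmf(x, N, X, n) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
P = X / N
print(round(mean, 6), round(n * X / N, 6))
print(round(var, 6), round(n * P * (1 - P) * (N - n) / (N - 1), 6))
```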

Multinomial Distribution

Nature For an experiment in which successive outcomes can be classified into two or more categories and the probabilities associated with the respective outcomes remain constant, the experiment can be characterized through the multinomial distribution.

Nomenclature
n = total number of trials
k = total number of distinct categories
pj = probability of observing category j on any one trial, j = 1, 2, . . . , k
xj = total number of occurrences in category j in n trials

Probability Law

$$P(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}\,p_2^{x_2}\cdots p_k^{x_k}$$

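Example In tossing a die 12 times, what is the probability that each face value will occur exactly twice?

$$p(2, 2, 2, 2, 2, 2) = \frac{12!}{2!\,2!\,2!\,2!\,2!\,2!}\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2\left(\frac{1}{6}\right)^2 = .003438$$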
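The multinomial law can be written as a small function. This is an illustrative sketch assuming Python 3.8+ (`math.prod`); `multinomial_pmf` is a hypothetical name. The coefficient is computed in exact integer arithmetic before multiplying by the probabilities.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """n! / (x1! ... xk!) * p1**x1 * ... * pk**xk."""
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)
    return coef * prod(p**x for p, x in zip(probs, counts))


# Each face of a fair die appearing exactly twice in 12 tosses
print(round(multinomial_pmf([2] * 6, [1 / 6] * 6), 6))  # 0.003438
```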

To compute these probabilities in Microsoft Excel, put the value of x in cell B2, say, and use the functions

Binomial distribution: = BINOMDIST(B2, n, p, 0)
Poisson distribution: = POISSON(B2, λ, 0)
Hypergeometric distribution: = HYPGEOMDIST(B2, n, X, N)

To generate a table of random values from these probability distributions in Microsoft Excel, use the following functions:

Bernoulli random values: = CRITBINOM(1, p, RAND())
Binomial random values: = CRITBINOM(n, p, RAND())

The factorial function is FACT(n) in Microsoft Excel and factorial(n) in MATLAB. Be sure that x is an integer.
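The CRITBINOM(n, p, RAND()) recipe is inverse-CDF sampling: it returns the smallest x whose cumulative binomial probability reaches a uniform random number. A Python sketch of the same idea follows. The function name `critbinom` is hypothetical, and it reproduces the documented behavior of the Excel function rather than its actual implementation.

```python
import random
from math import comb


def critbinom(n, p, alpha):
    """Smallest x whose cumulative binomial probability is >= alpha."""
    cum = 0.0
    for x in range(n + 1):
        cum += comb(n, x) * p**x * (1 - p) ** (n - x)
        if cum >= alpha:
            return x
    return n  # guard against floating-point shortfall when alpha is near 1


random.seed(3)  # fixed seed for repeatability
bernoulli = [critbinom(1, 0.3, random.random()) for _ in range(10)]
binomial = [critbinom(20, 0.9, random.random()) for _ in range(10)]
print(bernoulli)
print(binomial)
```

Feeding uniform random numbers through the inverse cumulative distribution is what makes the generated values follow the binomial law.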

MEASUREMENT DATA AND SAMPLING DENSITIES

Introduction The following example data are used throughout this subsection to illustrate concepts. Consider, for the purpose of illustration, that five synthetic-yarn samples have been selected randomly from a production line and tested for tensile strength on each of 20 production days. For this, assume that each group of five corresponds to a day, Monday through Friday, for a period of 4 weeks:

        Monday   Tuesday   Wednesday   Thursday   Friday   Groups of 25 pooled

Day       1        2          3           4          5
        36.48    38.06      35.28       36.34      36.73
        35.33    31.86      36.58       36.25      37.17
        35.92    33.81      38.81       30.46      33.07
        32.28    30.30      33.31       37.37      34.27
        31.61    35.27      33.88       37.52      36.94
x̄ =     34.32    33.86      35.57       35.59      35.64     35.00
s =      2.22     3.01       2.22        2.92       1.85      2.40

Day       6        7          8           9         10
        38.67    36.62      35.03       35.80      36.82
        32.08    33.05      36.22       33.16      36.49
        33.79    35.43      32.71       35.19      32.83
        32.85    36.63      32.52       32.91      32.43
        35.22    31.46      27.23       35.44      34.16
x̄ =     34.52    34.64      32.74       34.50      34.54     34.19
s =      2.60     2.30       3.46        1.36       2.03      2.35

Day      11       12         13          14         15
        39.63    34.52      36.05       36.64      31.57
        34.38    37.39      35.36       31.18      36.21
        36.51    34.16      35.00       36.13      33.84
        30.00    35.76      33.61       37.51      35.01
        39.64    37.63      36.98       39.05      34.95
x̄ =     36.03    35.89      35.40       36.10      34.32     35.55
s =      4.04     1.59       1.25        2.96       1.75      2.42

Day      16       17         18          19         20
        37.68    35.97      33.71       35.61      36.65
        36.38    35.92      32.34       37.13      37.91
        38.43    36.51      33.29       31.37      42.18
        39.07    33.89      32.81       35.89      39.25
        33.06    36.01      37.13       36.33      33.32
x̄ =     36.92    35.66      33.86       35.27      37.86     35.91
s =      2.38     1.02       1.90        2.25       3.27      2.52

Pooled sample of 100: x̄ = 35.16, s = 2.47
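The daily and weekly summary rows can be recomputed from the raw values. The sketch below uses only the week 1 data from the table, assumes Python 3, and uses the `statistics` module (divisor n − 1, as in Eq. 3-118). The dictionary layout is a choice made here for readability.

```python
import statistics

# Week 1 tensile strengths from the table; each list is one day's five samples
week1 = {
    "Monday":    [36.48, 35.33, 35.92, 32.28, 31.61],
    "Tuesday":   [38.06, 31.86, 33.81, 30.30, 35.27],
    "Wednesday": [35.28, 36.58, 38.81, 33.31, 33.88],
    "Thursday":  [36.34, 36.25, 30.46, 37.37, 37.52],
    "Friday":    [36.73, 37.17, 33.07, 34.27, 36.94],
}
for day, vals in week1.items():
    print(f"{day:<10} xbar = {statistics.mean(vals):.2f}  s = {statistics.stdev(vals):.2f}")

# Pool the 25 values of the week, as in the last column of the table
pooled = [v for vals in week1.values() for v in vals]
print(f"pooled     xbar = {statistics.mean(pooled):.2f}  s = {statistics.stdev(pooled):.2f}")
```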

Even if the process were at steady state, tensile strength, a key property, would still reflect some variation. Steady state, or stable operation of any process, has associated with it a characteristic variation. Superimposed on this is the testing method, which is itself a process with its own characteristic variation. The observed variation is a composite of these two variations.

Assume that the table represents “typical” production-line performance. The numbers themselves have been generated on a computer and represent random observations from a population with µ = 35 and a population standard deviation σ = 2.45. The sample values reflect the way in which tensile strength can vary by chance alone. In practice, a production supervisor unschooled in statistics but interested in high tensile performance would be despondent on the eighth day and exuberant on the twentieth day. If the supervisor were more concerned with uniformity, the lowest and highest points would have been on the eleventh and seventeenth days.

An objective of statistical analysis is to serve as a guide in decision making in the context of normal variation. In the case of the production supervisor, it is to make a decision, with a high probability of being correct, that something has in fact changed the operation.

Suppose that an engineering change has been made in the process and five new tensile samples have been tested with the results:

36.81   38.34   34.87   39.58   36.12        x̄ = 37.14   s = 1.85

In this situation, management would inquire whether the product has been improved by increased tensile strength. To answer this question, in addition to a variety of analogous questions, it is necessary to have some type of scientific basis upon which to draw a conclusion.

A scientific basis for the evaluation and interpretation of data is contained in the accompanying table descriptions. These tables characterize the way in which sample values will vary by chance alone in the context of individual observations, averages, and variances.

Table number   Designated symbol   Variable           Sampling distribution of
3-5            z                   (x − µ)/σ          Observations*
3-5            z                   (x̄ − µ)/(σ/√n)     Averages
3-6            t                   (x̄ − µ)/(s/√n)     Averages when σ is unknown*
3-7            χ²                  (s²/σ²)(df)        Variances*
3-8            F                   s1²/s2²            Ratio of two independent sample variances*

*When sampling from a gaussian distribution.

Normal Distribution of Observations Many types of data follow what is called the gaussian, or bell-shaped, curve; this is especially true of averages. Basically, the gaussian curve is a purely mathematical function which has very special properties. However, owing to some mathematically intractable aspects, primary use of the function is restricted to tabulated values.

Basically, the tabled values represent area (proportions or probability) associated with a scaling variable designated by Z in Fig. 3-64. The normal curve is centered at 0, and for particular values of Z, designated as z, the tabulated numbers represent the corresponding area under the curve between 0 and z. For example, between 0 and 1 the area is .3413. (Get this number from Table 3-5. The value of A includes the area on both sides of zero, so we want A/2. For z = 1, A = 0.6827 and A/2 = 0.3413. For z = 2, A/2 = 0.4772.) The area between 0 and 2 is .4772; therefore, the area between 1 and 2 is .4772 − .3413 = .1359.

Also, since the normal curve is symmetric, areas to the left can be determined in exactly the same way. For example, the area between −2 and +1 would include the area between −2 and 0, .4772 (the same as 0 to 2), plus the area between 0 and 1, .3413, for a total area of .8185.

Any types of observation which are applicable to the normal curve can be transformed to Z values by the relationship z = (x − µ)/σ and, conversely, Z values to x values by x = µ + σz, as shown in Fig. 3-64. For example, for tensile strength, with µ = 35 and σ = 2.45, this would dictate z = (x − 35)/2.45 and x = 35 + 2.45z.

Example What proportion of tensile values will fall between 34 and 36?

z1 = (34 − 35)/2.45 = −.41 z2 = (36 − 35)/2.45 = .41

P[−.41 ≤ z ≤ .41] = .3182, or roughly 32 percent

The value 0.3182 is interpolated from Table 3-4 using z = 0.40, A = 0.3108, and z = 0.45, A = 0.3473.

Example What midrange of tensile values will include 95 percent of the sample values? Since P[−1.96 ≤ z ≤ 1.96] = .95, the corresponding values of x are

x1 = 35 − 1.96(2.45) = 30.2

x2 = 35 + 1.96(2.45) = 39.8

or P[30.2 ≤ x ≤ 39.8] = .95
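Both tensile examples can be computed with Python's `statistics.NormalDist` (Python 3.8+ assumed), which plays a role similar to Excel's NORMDIST here. This is a sketch, not the handbook's method. Because it uses the unrounded z = 1/2.45 ≈ .408, the proportion comes out near .317, slightly below the .3182 obtained from the rounded z = .41.

```python
from statistics import NormalDist

tensile = NormalDist(mu=35, sigma=2.45)

# Proportion of individual tensile values between 34 and 36
p = tensile.cdf(36) - tensile.cdf(34)

# Central 95 percent range: mu -/+ z(.975) * sigma, with z(.975) about 1.96
z = NormalDist().inv_cdf(0.975)
x1 = tensile.mean - z * tensile.stdev
x2 = tensile.mean + z * tensile.stdev
print(round(p, 4), round(x1, 1), round(x2, 1))
```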

Normal Distribution of Averages An examination of the tensile-strength data previously tabulated would show that the range (largest minus the smallest) of tensile strength within days averages 5.12. The average range in x̄ values within each week is 2.37, while the range in the four weekly averages is 1.72. This reflects the fact that averages tend to be less variable in a predictable way. Given that the variance of x is Var(x) = σ², then the variance of x̄ based on n observations is Var(x̄) = σ²/n.

For averages of n observations, the corresponding relationship for the Z scale is

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad \text{or} \qquad \bar{x} = \mu + z\,\frac{\sigma}{\sqrt{n}}$$

The Microsoft Excel function NORMDIST(X, µ, σ, 1) gives the probability that x ≤ X.

The command CONFIDENCE(α, σ, n) gives the half-width of the confidence interval about the mean for a sample size n. To obtain 95 percent confidence limits, use α = .05, where α = 1 − A.


FIG. 3-64 Transformation of z values.


Example What proportion of daily tensile averages will fall between 34 and 36? Using Table 3-5,

z1 = (34 − 35)/(2.45/√5) = −.91    z2 = (36 − 35)/(2.45/√5) = .91

P[−.91 ≤ z ≤ .91] = .637, or roughly 64 percent

Using Microsoft Excel, the more precise answer is

P[34 ≤ x̄ ≤ 36] = 2 NORMDIST(36, 35, 2.45/SQRT(5), 1) − 1
              = 1 − 2 NORMDIST(34, 35, 2.45/SQRT(5), 1) = .6386

Example What midrange of daily tensile averages will include 95 percent of the sample values? Using Table 3-5,

x1 = 35 − 1.96(2.45/√5) = 32.85

x2 = 35 + 1.96(2.45/√5) = 37.15

P[32.85 ≤ x̄ ≤ 37.15] = .95

or, using Microsoft Excel,

35 ± CONFIDENCE(.05, 2.45, 5) = 35 ± 2.15
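The same two answers for daily averages follow from a normal distribution with standard deviation σ/√n. This sketch assumes Python 3.8+. The variable `half` is computed the way the CONFIDENCE half-width is described above, z(.975)·σ/√n, and is not a call to any Excel-compatible library.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 35, 2.45, 5
avg = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of daily averages

p = avg.cdf(36) - avg.cdf(34)          # P[34 <= xbar <= 36]
z = NormalDist().inv_cdf(0.975)
half = z * sigma / sqrt(n)             # half-width of the 95 percent range
print(round(p, 4), round(half, 2), round(mu - half, 2), round(mu + half, 2))
```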

t Distribution of Averages The normal curve relies on a knowledge of σ, or in special cases, when it is unknown, s can be used with the normal curve as an approximation when n > 30. For example, with n > 30 the intervals x̄ ± s and x̄ ± 2s will include roughly 68 and 95 percent of the sample values, respectively, when the distribution is normal.

In applications, sample sizes are usually small and σ unknown. In these cases, the t distribution can be used, where

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \qquad \text{or} \qquad \bar{x} = \mu + t\,\frac{s}{\sqrt{n}}$$

The t distribution is also symmetric and centered at zero. It is said to be robust in the sense that even when the individual observations x are not normally distributed, sample averages of x have distributions which tend toward normality as n gets large. Even for small n of 5 through 10, the approximation is usually relatively accurate. It is sometimes called the Student's t distribution.

In reference to the tensile-strength table, consider the summary statistics x̄ and s by days. For each day, the t statistic could be computed. If this were repeated over an extensive simulation and the resultant t quantities plotted in a frequency distribution, they would match the corresponding distribution of t values summarized in Table 3-6.

Since the t distribution relies on the sample standard deviation s, the resultant distribution will differ according to the sample size n. To designate this difference, the respective distributions are classified according to what are called the degrees of freedom, abbreviated as df. In simple problems, the df are just the sample size minus 1. In more complicated applications the df can be different. In general, degrees of freedom are the number of quantities minus the number of constraints. For example, four numbers in a square which must have row and column sums equal to zero have only one df, i.e., four numbers minus three constraints (the fourth constraint is redundant). The Microsoft Excel function TDIST(X, df, 1) gives the right-tail probability, and TDIST(X, df, 2) gives twice that. The probability that t ≤ X is 1 − TDIST(X, df, 1) when X ≥ 0 and TDIST(ABS(X), df, 1) when X < 0. The probability that −X ≤ t ≤ +X is 1 − TDIST(X, df, 2).

Example For a sample size n = 5, what values of t define a midarea of 90 percent? For 4 df the tabled value of t corresponding to a midarea of 90 percent is 2.132; i.e., P[−2.132 ≤ t ≤ 2.132] = .90.

Using Microsoft Excel, TINV(.1, 4) = 2.132.

Example For a sample size n = 25, what values of t define a midarea of 95 percent? For 24 df the tabled value of t corresponding to a midarea of 95 percent is 2.064; i.e., P[−2.064 ≤ t ≤ 2.064] = .95.

Also, 1 − TDIST(2.064, 24, 2) = .9500.

Using Microsoft Excel, TINV(.05, 24) = 2.064.
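Python's standard library has no t distribution, so the sketch below integrates the Student t density with Simpson's rule and inverts the midarea by bisection. It is an illustrative numerical approach chosen here, not how Excel's TINV is implemented. The names `t_pdf`, `t_midarea`, and `t_for_midarea` are hypothetical. `t_for_midarea(area, df)` corresponds to TINV(1 − area, df).

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Student t density with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_midarea(x, df, n=2000):
    """P[-x <= t <= x]: twice the density integral from 0 to x (Simpson's rule)."""
    h = x / n
    s = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * s * h / 3


def t_for_midarea(area, df):
    """Bisection for the t value whose midarea equals `area`."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_midarea(mid, df) < area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_for_midarea(0.90, 4), 3))   # 2.132
print(round(t_for_midarea(0.95, 24), 3))  # 2.064
```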


TABLE 3-5 Ordinates and Areas between Abscissa Values −z and +z of the Normal Distribution Curve

A = area between µ − σz and µ + σz

z      X            Y          A           1 − A
0      µ            0.399      0.0000      1.0000
.10    µ ± .10σ     .397       .0797       .9203
.20    µ ± .20σ     .391       .1585       .8415
.30    µ ± .30σ     .381       .2358       .7642
.40    µ ± .40σ     .368       .3108       .6892
.50    µ ± .50σ     .352       .3829       .6171
.60    µ ± .60σ     .333       .4515       .5485
.70    µ ± .70σ     .312       .5161       .4839
.80    µ ± .80σ     .290       .5763       .4237
.90    µ ± .90σ     .266       .6319       .3681
1.00   µ ± 1.00σ    .242       .6827       .3173
1.10   µ ± 1.10σ    .218       .7287       .2713
1.20   µ ± 1.20σ    .194       .7699       .2301
1.30   µ ± 1.30σ    .171       .8064       .1936
1.40   µ ± 1.40σ    .150       .8385       .1615
1.50   µ ± 1.50σ    .1295      .8664       .1336
1.60   µ ± 1.60σ    .1109      .8904       .1096
1.70   µ ± 1.70σ    .0940      .9109       .0891
1.80   µ ± 1.80σ    .0790      .9281       .0719
1.90   µ ± 1.90σ    .0656      .9426       .0574
2.00   µ ± 2.00σ    .0540      .9545       .0455
2.10   µ ± 2.10σ    .0440      .9643       .0357
2.20   µ ± 2.20σ    .0355      .9722       .0278
2.30   µ ± 2.30σ    .0283      .9786       .0214
2.40   µ ± 2.40σ    .0224      .9836       .0164
2.50   µ ± 2.50σ    .0175      .9876       .0124
2.60   µ ± 2.60σ    .0136      .9907       .0093
2.70   µ ± 2.70σ    .0104      .9931       .0069
2.80   µ ± 2.80σ    .0079      .9949       .0051
2.90   µ ± 2.90σ    .0060      .9963       .0037
3.00   µ ± 3.00σ    .0044      .9973       .0027
4.00   µ ± 4.00σ    .0001      .99994      .00006
5.00   µ ± 5.00σ    .000001    .9999994    .0000006

z       X             Y        A        1 − A
0.000   µ             .3989    .0000    1.0000
.126    µ ± .126σ     .3958    .1000    .9000
.253    µ ± .253σ     .3863    .2000    .8000
.385    µ ± .385σ     .3704    .3000    .7000
.524    µ ± .524σ     .3477    .4000    .6000
.674    µ ± .674σ     .3178    .5000    .5000
.842    µ ± .842σ     .2800    .6000    .4000
1.036   µ ± 1.036σ    .2331    .7000    .3000
1.282   µ ± 1.282σ    .1755    .8000    .2000
1.645   µ ± 1.645σ    .1031    .9000    .1000
1.960   µ ± 1.960σ    .0584    .9500    .0500
2.576   µ ± 2.576σ    .0145    .9900    .0100
3.291   µ ± 3.291σ    .0018    .9990    .0010
3.891   µ ± 3.891σ    .0002    .9999    .0001

This table is obtained in Microsoft Excel with the function Y = NORMDIST(X, µ, σ, 0). If X > µ, A = 2 NORMDIST(X, µ, σ, 1) − 1. If X < µ, A = 1 − 2 NORMDIST(X, µ, σ, 1).
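Entries of Table 3-5 can be regenerated from the unit normal in the way the footnote describes, which is a convenient check on individual values. This sketch assumes Python 3.8+ and uses `statistics.NormalDist` in place of NORMDIST. The helper `table_row` is hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # unit normal: mu = 0, sigma = 1


def table_row(z):
    """Return (Y, A, 1 - A): ordinate at z and central area between -z and +z."""
    y = std.pdf(z)
    a = 2 * std.cdf(z) - 1
    return round(y, 4), round(a, 4), round(1 - a, 4)


for z in (0.5, 1.0, 1.9, 2.2, 3.0):
    print(z, *table_row(z))

# Inverse direction: the z value that encloses a central area A = .95
print(round(std.inv_cdf((1 + 0.95) / 2), 3))  # 1.96
```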


Example What is the sample value of t for the first day of tensile data?

Sample t = (34.32 − 35)/(2.22/√5) = −.68

Note that on the average 90 percent of all such sample values would be expected to fall within the interval ±2.132.

t Distribution for the Difference in Two Sample Means with Equal Variances The t distribution can be readily extended to the difference in two sample means when the respective populations have the same variance σ²:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}} \qquad (3\text{-}119)$$

where sp² is a pooled variance defined by

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} \qquad (3\text{-}120)$$

In this application, the t distribution has (n1 + n2 − 2) df.

t Distribution for the Difference in Two Sample Means with Unequal Variances When population variances are unequal, an approximate t quantity can be used:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{a + b}} \qquad \text{with } a = s_1^2/n_1,\quad b = s_2^2/n_2$$

and

$$\text{df} = \frac{(a + b)^2}{a^2/(n_1 - 1) + b^2/(n_2 - 1)}$$
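Both forms of the two-sample t can be coded directly from Eqs. (3-119) and (3-120) and the unequal-variance approximation. The sketch assumes Python 3; the function names are hypothetical. Purely to exercise the formulas, it feeds in the post-change sample (n = 5, x̄ = 37.14, s = 1.85) and the pooled sample of 100 (x̄ = 35.16, s = 2.47) from this subsection. No conclusion about the process is implied.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, dmu=0.0):
    """Equal-variance t, Eqs. (3-119) and (3-120); returns (t, df)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
    t = ((x1 - x2) - dmu) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_var_t(x1, s1, n1, x2, s2, n2, dmu=0.0):
    """Approximate t for unequal variances; returns (t, df)."""
    a, b = s1**2 / n1, s2**2 / n2
    t = ((x1 - x2) - dmu) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df


t_eq, df_eq = pooled_t(37.14, 1.85, 5, 35.16, 2.47, 100)
t_uneq, df_uneq = unequal_var_t(37.14, 1.85, 5, 35.16, 2.47, 100)
print(round(t_eq, 3), df_eq)
print(round(t_uneq, 3), round(df_uneq, 2))
```

The unequal-variance df is generally not an integer; tables are then entered at the nearest df.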

Chi-Square Distribution For some industrial applications, product uniformity is of primary importance. The sample standard deviation s is most often used to characterize uniformity. In dealing with this problem, the chi-square distribution can be used, where χ² = (s²/σ²)(df). The chi-square distribution is a family of distributions which are defined by the degrees of freedom associated with the sample variance. For most applications, df is equal to the sample size minus 1.

The probability density function is

$$p(y) = y_0\,y^{(\text{df} - 2)/2}\exp\left(-\frac{y}{2}\right) \qquad y > 0$$

where y0 is chosen such that the integral of p(y) over all y is one.

In terms of the tensile-strength table previously given, the respective chi-square sample values for the daily, weekly, and monthly figures could be computed. The corresponding df would be 4, 24, and 99, respectively. These numbers would represent sample values from the respective distributions which are summarized in Table 3-7.

In a manner similar to the use of the t distribution, chi square can be interpreted in a direct probabilistic sense corresponding to a midarea of (1 − α):

$$P\left[\chi_1^2 \le (s^2/\sigma^2)(\text{df}) \le \chi_2^2\right] = 1 - \alpha$$

where χ1² corresponds to a lower-tail area of α/2 and χ2² to an upper-tail area of α/2.

The basic underlying assumption for the mathematical derivation of chi square is that a random sample was selected from a normal distribution with variance σ². When the population is not normal but skewed, chi-square probabilities could be substantially in error.

Example On the basis of a sample size n = 5, what midrange of values will include the sample ratio s/σ with a probability of 95 percent?

Use Table 3-7 for 4 df and read χ1² = 0.484 for a lower-tail area of 0.05/2, or 2.5 percent, and read χ2² = 11.1 in the 97.5 percent column, i.e., for an upper-tail area of 2.5 percent.

P[.484 ≤ (s²/σ²)(4) ≤ 11.1] = .95

or P[.35 ≤ s/σ ≤ 1.66] = .95

The Microsoft Excel functions CHIINV(.025, 4) and CHIINV(.975, 4) give the same values.

Example On the basis of a sample size n = 25, what midrange of values will include the sample ratio s/σ with a probability of 95 percent?

P[12.4 ≤ (s²/σ²)(24) ≤ 39.4] = .95

or P[.72 ≤ s/σ ≤ 1.28] = .95

This states that the sample standard deviation will be at least 72 percent and not more than 128 percent of the population standard deviation 95 percent of the time. Conversely, 5 percent of the time the sample standard deviation will underestimate or overestimate the population standard deviation by more than the corresponding amount. Even for samples as large as 25, the relative reliability of a sample standard deviation is poor.
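The chi-square percentiles used in both examples can be computed without tables. This sketch assumes Python 3. It evaluates the chi-square CDF through the standard series for the regularized lower incomplete gamma function and inverts it by bisection. Both are numerical choices made here, and the function names are hypothetical. Printed s/σ limits may differ in the last digit from the text, which works from table values rounded to three figures.

```python
from math import exp, lgamma, log


def chi2_cdf(y, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if y <= 0:
        return 0.0
    a, x = df / 2, y / 2
    term = total = 1 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    return exp(a * log(x) - x - lgamma(a)) * total


def chi2_ppf(p, df):
    """Percentile of the chi-square distribution by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:  # expand until the percentile is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 24):
    c1, c2 = chi2_ppf(0.025, df), chi2_ppf(0.975, df)
    # 2.5 and 97.5 percentiles, then the implied limits on s/sigma
    print(df, round(c1, 3), round(c2, 2), round((c1 / df) ** 0.5, 2), round((c2 / df) ** 0.5, 2))
```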

The chi-square distribution can be applied to other types of application which are of an entirely different nature. These include applications which are discussed under "Goodness-of-Fit Test" and "Two-Way Test for Independence of Count Data." In these applications, the mathematical formulation and context are entirely different, but they do result in the same table of values.

F Distribution In reference to the tensile-strength table, the successive pairs of daily standard deviations could be ratioed and squared. These ratios of variance would represent a sample from a distribution called the F distribution or F ratio. In general, the F ratio is defined by the identity

$$F(\gamma_1, \gamma_2) = s_1^2/s_2^2$$

where γ1 and γ2 correspond to the respective df's for the sample variances. In statistical applications, it turns out that the primary area of interest is found when the ratios are greater than 1. For this reason, most tabled values are defined for an upper-tail area. However, defining F2 to be that value corresponding to an upper-tail area of α/2, then F1 for a lower-tail area of α/2 can be determined through the identity

$$F_1(\gamma_1, \gamma_2) = 1/F_2(\gamma_2, \gamma_1)$$

The F distribution, similar to the chi square, is sensitive to the basic assumption that sample values were selected randomly from a normal distribution. The Microsoft Excel function FDIST(X, df1, df2) gives the upper-tail area (the percent point heading of Table 3-8) for a tabular value X, and the function FINV(Percent, df1, df2) gives the table value.

TABLE 3-6 Values of t

df     t.40    t.30    t.20    t.10    t.05    t.025    t.01     t.005
1      0.325   0.727   1.376   3.078   6.314   12.706   31.821   63.657
2      .289    .617    1.061   1.886   2.920   4.303    6.965    9.925
3      .277    .584    0.978   1.638   2.353   3.182    4.541    5.841
4      .271    .569    .941    1.533   2.132   2.776    3.747    4.604
5      .267    .559    .920    1.476   2.015   2.571    3.365    4.032
6      .265    .553    .906    1.440   1.943   2.447    3.143    3.707
7      .263    .549    .896    1.415   1.895   2.365    2.998    3.499
8      .262    .546    .889    1.397   1.860   2.306    2.896    3.355
9      .261    .543    .883    1.383   1.833   2.262    2.821    3.250
10     .260    .542    .879    1.372   1.812   2.228    2.764    3.169
11     .260    .540    .876    1.363   1.796   2.201    2.718    3.106
12     .259    .539    .873    1.356   1.782   2.179    2.681    3.055
13     .259    .538    .870    1.350   1.771   2.160    2.650    3.012
14     .258    .537    .868    1.345   1.761   2.145    2.624    2.977
15     .258    .536    .866    1.341   1.753   2.131    2.602    2.947
16     .258    .535    .865    1.337   1.746   2.120    2.583    2.921
17     .257    .534    .863    1.333   1.740   2.110    2.567    2.898
18     .257    .534    .862    1.330   1.734   2.101    2.552    2.878
19     .257    .533    .861    1.328   1.729   2.093    2.539    2.861
20     .257    .533    .860    1.325   1.725   2.086    2.528    2.845
21     .257    .532    .859    1.323   1.721   2.080    2.518    2.831
22     .256    .532    .858    1.321   1.717   2.074    2.508    2.819
23     .256    .532    .858    1.319   1.714   2.069    2.500    2.807
24     .256    .531    .857    1.318   1.711   2.064    2.492    2.797
25     .256    .531    .856    1.316   1.708   2.060    2.485    2.787
26     .256    .531    .856    1.315   1.706   2.056    2.479    2.779
27     .256    .531    .855    1.314   1.703   2.052    2.473    2.771
28     .256    .530    .855    1.313   1.701   2.048    2.467    2.763
29     .256    .530    .854    1.311   1.699   2.045    2.462    2.756
30     .256    .530    .854    1.310   1.697   2.042    2.457    2.750
40     .255    .529    .851    1.303   1.684   2.021    2.423    2.704
60     .254    .527    .848    1.296   1.671   2.000    2.390    2.660
120    .254    .526    .845    1.289   1.658   1.980    2.358    2.617
∞      .253    .524    .842    1.282   1.645   1.960    2.326    2.576

Above values refer to a single tail outside the indicated limit of t. For example, for 95 percent of the area to be between −t and +t in a two-tailed t distribution, use the values for t0.025, or 2.5 percent for each tail. This table is obtained in Microsoft Excel using the function TINV(2α, df), where α is the single-tail area of the column heading; for example, TINV(.05, df) gives the t.025 column.

Example For two sample variances with 4 df each, what limits will bracket their ratio with a midarea probability of 90 percent?

Use Table 3-8 with 4 df in the numerator and denominator and upper 5 percent points (to get both sides totaling 10 percent). The entry is 6.39. Thus:

P[1/6.39 ≤ s1²/s2² ≤ 6.39] = .90

or P[.40 ≤ s1/s2 ≤ 2.53] = .90

The Microsoft Excel function FINV(.05, 4, 4) gives 6.39, too.
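The upper 5 percent point can also be found numerically from the F density. This sketch assumes Python 3 and numerator df d1 > 2, so that the density is finite and zero at the origin. It integrates with Simpson's rule and inverts by bisection (numerical choices made here; the function names are hypothetical). The lower limit uses the identity F1(γ1, γ2) = 1/F2(γ2, γ1).

```python
from math import exp, lgamma, log


def f_cdf(x, d1, d2, n=4000):
    """CDF of F(d1, d2) at x by Simpson's rule on the density (assumes d1 > 2)."""
    logc = (lgamma((d1 + d2) / 2) - lgamma(d1 / 2) - lgamma(d2 / 2)
            + (d1 / 2) * log(d1 / d2))

    def pdf(t):
        if t <= 0:
            return 0.0
        return exp(logc + (d1 / 2 - 1) * log(t) - (d1 + d2) / 2 * log(1 + d1 * t / d2))

    h = x / n
    s = pdf(0) + pdf(x)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * pdf(i * h)
    return s * h / 3


def f_upper(alpha, d1, d2):
    """Value of F with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1 - alpha:  # expand until bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


d1, d2 = 4, 4
F2 = f_upper(0.05, d1, d2)      # upper 5 percent point
F1 = 1 / f_upper(0.05, d2, d1)  # lower 5 percent point via the reciprocal identity
print(round(F2, 2), round(F1 ** 0.5, 2), round(F2 ** 0.5, 2))
```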

Confidence Interval for a Mean For the daily sample tensile-strength data with 4 df, it is known that P[−2.132 ≤ t ≤ 2.132] = .90. This states that 90 percent of all samples will have sample t values which fall within the specified limits. In fact, for the 20 daily samples exactly 16 do fall within the specified limits (note that the binomial with n = 20 and p = .90 would describe the likelihood of exactly none through 20 falling within the prescribed limits; the sample of 20 is only a sample).

Consider the new daily sample (with n = 5, x̄ = 37.14, and s = 1.85) which was observed after a process change. In this case, the same probability holds. However, the sample value of t cannot be computed, since the new µ under the process change is not known. Therefore $P[-2.132 \le (37.14 - \mu)/(1.85/\sqrt{5}) \le 2.132] = .90$. In effect, this identity limits the magnitude of possible values for µ. The magnitude of µ can be only large enough to retain the t quantity above −2.132 and small enough to retain the t quantity below +2.132. These limits are found by rearranging the quantities within the bracket; i.e., P[35.38 ≤ µ ≤ 38.90] = .90. This states that we are 90 percent sure that the interval from 35.38 to 38.90 includes the unknown parameter µ.

In general,

$P\left[\bar{x} - t\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t\frac{s}{\sqrt{n}}\right] = 1 - \alpha$

where t is defined for an upper-tail area of α/2 with (n − 1) df. In this application, the interval limits x̄ ± ts/√n are random variables which will cover the unknown parameter µ with probability (1 − α). The converse, that we are 100(1 − α) percent sure that the parameter value is within the interval, is not correct. That statement defines a probability for the parameter rather than the probability for the interval.
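The interval formula above can be sketched in Python as follows, applied to the post-change daily sample. The function name `mean_confidence_interval` is illustrative. The t value is passed in from Table 3-6 rather than computed, because the Python standard library has no t quantile function.

```python
import math

def mean_confidence_interval(xbar, s, n, t):
    """Two-sided interval x̄ ± t·s/√n, where t is the tabled value for an
    upper-tail area of α/2 with n − 1 df (supplied by the caller)."""
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# New daily sample after the process change: n = 5, x̄ = 37.14, s = 1.85,
# with t = 2.132 for 4 df and 90 percent two-sided coverage.
low, high = mean_confidence_interval(37.14, 1.85, 5, 2.132)
print(f"{low:.2f} <= mu <= {high:.2f}")  # 35.38 <= mu <= 38.90
```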

Example What values of t define the midarea of 95 percent for weekly samples of size 25, and what is the sample value of t for the second week?

P[−2.064 ≤ t ≤ 2.064] = .95

and $(34.19 - 35)/(2.35/\sqrt{25}) = -1.72$.
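The second-week figure can be checked the same way. This sketch evaluates the t statistic with the values quoted in the example and compares it with the tabled ±2.064. The negative sign follows because 34.19 lies below 35.

```python
import math

# Second weekly sample: n = 25, x̄ = 34.19, s = 2.35, compared with µ = 35.
n, xbar, s, mu = 25, 34.19, 2.35, 35.0
t_sample = (xbar - mu) / (s / math.sqrt(n))
t_crit = 2.064  # t.025 with 24 df (Table 3-6): midarea of 95 percent
print(f"sample t = {t_sample:.2f}; within ±{t_crit}: {abs(t_sample) <= t_crit}")
```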

Example For the composite sample of 100 tensile strengths, what is the 90 percent confidence interval for µ?

Use Table 3-6 for t.05 with df ≈ ∞.

$P\left[35.16 - 1.645\frac{2.47}{\sqrt{100}} < \mu < 35.16 + 1.645\frac{2.47}{\sqrt{100}}\right] = .90$

or P[34.75 ≤ µ ≤ 35.57] = .90
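For df ≈ ∞ the tabled t.05 is the normal deviate, which Python's `statistics.NormalDist` can supply directly. A sketch reproducing the interval:

```python
import math
from statistics import NormalDist

# Composite sample of 100 tensile strengths: x̄ = 35.16, s = 2.47.
n, xbar, s = 100, 35.16, 2.47
# With df ≈ ∞, t.05 equals the normal deviate for a 5 percent upper tail.
z = NormalDist().inv_cdf(0.95)  # about 1.645
half_width = z * s / math.sqrt(n)
print(f"{xbar - half_width:.2f} <= mu <= {xbar + half_width:.2f}")  # 34.75 <= mu <= 35.57
```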

Confidence Interval for the Difference in Two Population Means The confidence interval for a mean can be extended to


TABLE 3-7 Percentiles of the χ² Distribution

Percent

df 0.5 1 2.5 5 10 90 95 97.5 99 99.5

1 0.000039 0.00016 0.00098 0.0039 0.0158 2.71 3.84 5.02 6.63 7.88
2 .0100 .0201 .0506 .1026 .2107 4.61 5.99 7.38 9.21 10.60
3 .0717 .115 .216 .352 .584 6.25 7.81 9.35 11.34 12.84
4 .207 .297 .484 .711 1.064 7.78 9.49 11.14 13.28 14.86
5 .412 .554 .831 1.15 1.61 9.24 11.07 12.83 15.09 16.75

6 .676 .872 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 .989 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.96
9 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19

11 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76
12 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80

16 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
18 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
20 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00
24 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
30 13.79 14.95 16.79 18.49 20.60 40.26 43.77 46.98 50.89 53.67

40 20.71 22.16 24.43 26.51 29.05 51.81 55.76 59.34 63.69 66.77
60 35.53 37.48 40.48 43.19 46.46 74.40 79.08 83.30 88.38 91.95

120 83.85 86.92 91.58 95.70 100.62 140.23 146.57 152.21 158.95 163.64

For large values of degrees of freedom the approximate formula

$\chi^2_\alpha = n\left[1 - \frac{2}{9n} + z_\alpha\sqrt{\frac{2}{9n}}\right]^3$

where z_α is the normal deviate and n is the number of degrees of freedom, may be used. For example,

$\chi^2_{.99} = 60[1 - 0.00370 + 2.326(0.06086)]^3 = 60(1.1379)^3 = 88.4$

for the 99th percentile for 60 degrees of freedom. The Microsoft Excel function CHIDIST(X, df), where X is the table value, gives 1 − Percent. The function CHIINV(1 − Percent, df) gives the table value.
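The large-df approximation is easy to evaluate in code. The sketch below uses a hypothetical function name and takes z_α from `statistics.NormalDist().inv_cdf`. It reproduces the worked value for 60 df. Because it is an approximation, compare its results against Table 3-7 before relying on them at small df.

```python
import math
from statistics import NormalDist

def chi_square_percentile(p, df):
    """Approximate χ² percentile from the large-df formula above:
    χ²_p ≈ n[1 − 2/(9n) + z_p·√(2/(9n))]³, with n = df and z_p the normal deviate."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

# 99th percentile for 60 df; Table 3-7 lists 88.38.
approx = chi_square_percentile(0.99, 60)
print(f"{approx:.1f}")  # 88.4
```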


include the difference between two population means. This interval is based on the assumption that the respective populations have the same variance σ²:

$(\bar{x}_1 - \bar{x}_2) - t s_p\sqrt{1/n_1 + 1/n_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t s_p\sqrt{1/n_1 + 1/n_2}$

Example Compute the 95 percent confidence interval based on the original 100-point sample and the subsequent 5-point sample:

$s_p^2 = \frac{99(2.47)^2 + 4(1.85)^2}{103} = 5.997$

or s_p = 2.45

With 103 df and α = .05, t = 1.96 using t.025 in Table 3-6. Therefore

$(35.16 - 37.14) \pm 1.96(2.45)\sqrt{1/100 + 1/5} = -1.98 \pm 2.20$

or −4.18 ≤ (µ1 − µ2) ≤ .22

Note that if the respective samples had been based on 52 observations each rather than 100 and 5, the uncertainty factor would have been .94 rather than the observed 2.20. The interval width tends to be minimum when n1 = n2.
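The following is a sketch of the pooled two-sample interval. Like the text, it assumes equal population variances; `pooled_sd` and `diff_interval` are illustrative names. Because it carries s_p unrounded, the half-width comes out near 2.1995, which rounds to the 2.20 of the example. The last lines repeat the calculation for a balanced design of 52 observations per sample.

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled standard deviation, assuming equal population variances."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def diff_interval(x1, s1, n1, x2, s2, n2, t):
    """Interval (x̄1 − x̄2) ± t·s_p·√(1/n1 + 1/n2); also returns the half-width."""
    half = t * pooled_sd(n1, s1, n2, s2) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half, (x1 - x2) + half, half

# 100-point sample (35.16, 2.47) versus 5-point sample (37.14, 1.85), t = 1.96.
low, high, half = diff_interval(35.16, 2.47, 100, 37.14, 1.85, 5, t=1.96)
print(f"{low:.2f} <= mu1 - mu2 <= {high:.2f}  (± {half:.2f})")

# Same pooled standard deviation, but 52 observations in each sample.
sp = pooled_sd(100, 2.47, 5, 1.85)
print(f"balanced half-width: {1.96 * sp * math.sqrt(2 / 52):.2f}")
```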

Confidence Interval for a Variance The chi-square distribution can be used to derive a confidence interval for a population variance σ² when the parent population is normally distributed. For a 100(1 − α) percent confidence interval

$\frac{(df)s^2}{\chi_2^2} \le \sigma^2 \le \frac{(df)s^2}{\chi_1^2}$

where χ1² corresponds to a lower-tail area of α/2 and χ2² to an upper-tail area of α/2.

Example For the first week of tensile-strength samples compute the 90 percent confidence interval for σ² (df = 24, corresponding to n = 25, using 5 percent and 95 percent in Table 3-7):

$\frac{24(2.40)^2}{36.4} \le \sigma^2 \le \frac{24(2.40)^2}{13.8}$

3.80 ≤ σ² ≤ 10.02

or 1.95 ≤ σ ≤ 3.17
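The variance interval can be sketched the same way. As in the example, the two chi-square percentiles come from Table 3-7 (13.8 and 36.4 for 24 df). Note that the larger percentile produces the lower limit.

```python
import math

n, s = 25, 2.40
df = n - 1
chi2_lower, chi2_upper = 13.8, 36.4  # 5th and 95th percentiles for 24 df (Table 3-7)

var_low = df * s**2 / chi2_upper   # dividing by the larger χ² gives the lower limit
var_high = df * s**2 / chi2_lower
print(f"{var_low:.2f} <= sigma^2 <= {var_high:.2f}")
print(f"{math.sqrt(var_low):.2f} <= sigma <= {math.sqrt(var_high):.2f}")
```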

TESTS OF HYPOTHESIS

General Nature of Tests The general nature of tests can be illustrated with a simple example. In a court of law, when a defendant is charged with a crime, the judge instructs the jury initially to presume that the defendant is innocent of the crime. The jurors are then presented with evidence and counterargument as to the defendant's guilt or innocence. If the evidence suggests beyond a reasonable doubt that the defendant did, in fact, commit the crime, they have been instructed to find the defendant guilty; otherwise, not guilty. The burden of proof is on the prosecution.

Jury trials represent a form of decision making. In statistics, an analogous procedure for making decisions falls into an area of statistical inference called hypothesis testing.

Suppose that a company has been using a certain supplier of raw materials in one of its chemical processes. A new supplier approaches the company and states that its material, at the same cost, will increase the process yield. If the new supplier has a good reputation, the company might be willing to run a limited test. On the basis of the test results it would then decide whether or not to change suppliers. Good management would dictate that an improvement must be demonstrated


TABLE 3-8 F Distribution

Upper 5% Points (F.95)

Degrees of freedom for numerator (columns); degrees of freedom for denominator (rows)

df 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120 ∞

1 161 200 216 225 230 234 237 239 241 242 244 246 248 249 250 251 252 253 254
2 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4 19.4 19.5 19.5 19.5 19.5 19.5 19.5
3 10.1 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.37

6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54

11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40
12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30
13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21
14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13
15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07

16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01
17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92
19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88
20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84

21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81
22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78
23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76
24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73
25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71

30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51
60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39
120 3.92 3.07 2.68 2.45 2.29 2.18 2.09 2.02 1.96 1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25
∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00

Interpolation should be performed using reciprocals of the degrees of freedom.


(beyond a reasonable doubt) for the new material. That is, the burden of proof is tied to the new material. In setting up a test of hypothesis for this application, the initial assumption would be defined as a null hypothesis and symbolized as H0. The null hypothesis would state that yield for the new material is no greater than for the conventional material. The symbol µ0 would be used to designate the known current level of yield for the standard material and µ for the unknown population yield for the new material. Thus, the null hypothesis can be symbolized as H0: µ ≤ µ0.

The alternative to H0 is called the alternative hypothesis and is symbolized as H1: µ > µ0.

Given a series of tests with the new material, the average yield x̄ would be compared with µ0. If x̄ < µ0, the new supplier would be dismissed. If x̄ > µ0, the question would be: Is it sufficiently greater in the light of its corresponding reliability, i.e., beyond a reasonable doubt? If the confidence interval for µ included µ0, the answer would be no, but if it did not include µ0, the answer would be yes. In this simple application, the formal test of hypothesis would result in the same conclusion as that derived from the confidence interval. However, the utility of tests of hypothesis lies in their generality, whereas confidence intervals are restricted to a few special cases.

Test of Hypothesis for a Mean: Procedure
Nomenclature
µ = mean of the population from which the sample has been drawn
σ = standard deviation of the population from which the sample has been drawn
µ0 = base or reference level
H0 = null hypothesis
H1 = alternative hypothesis
α = significance level, usually set at .10, .05, or .01
t = tabled t value corresponding to the significance level α. For a two-tailed test, each corresponding tail would have an area of α/2, and for a one-tailed test, one tail area would be equal to α. If σ² is known, then z would be used rather than t.
t = (x̄ − µ0)/(s/√n) = sample value of the test statistic

Assumptions
1. The n observations x1, x2, . . . , xn have been selected randomly.
2. The population from which the observations were obtained is normally distributed with an unknown mean µ and standard deviation σ. In actual practice, this is a robust test, in the sense that in most types of problems it is not sensitive to the normality assumption when the sample size is 10 or greater.

Test of Hypothesis
1. Under the null hypothesis, it is assumed that the sample came from a population whose mean µ is equivalent to some base or reference level designated by µ0. This can take one of three forms:

Form 1 Form 2 Form 3

H0: µ = µ0 H0: µ ≤ µ0 H0: µ ≥ µ0

H1: µ ≠ µ0 H1: µ > µ0 H1: µ < µ0

Two-tailed test Upper-tailed test Lower-tailed test

2. If the null hypothesis is assumed to be true, say, in the case of a two-sided test, form 1, then the distribution of the test statistic t is known. Given a random sample, one can predict how far its sample value of t might be expected to deviate from zero (the midvalue of t) by chance alone. If the sample value of t does, in fact, deviate too far from zero, then this is defined to be sufficient evidence to refute the assumption of the null hypothesis. It is consequently rejected, and the converse or alternative hypothesis is accepted.

3. The rule for accepting H0 is specified by selection of the α level as indicated in Fig. 3-65. For forms 2 and 3 the α area is defined to be in the upper or the lower tail respectively. The parameter α is the probability of rejecting the null hypothesis when it is actually true.

4. The decision rules for each of the three forms are defined as follows: If the sample t falls within the acceptance region, accept H0 for lack of contrary evidence. If the sample t falls in the critical region, reject H0 at a significance level of 100α percent.

Example
Application. In the past, the yield for a chemical process has been established at 89.6 percent with a standard deviation of 3.4 percent. A new supplier of raw materials will be used and tested for 7 days.

Procedure
1. The standard of reference is µ0 = 89.6 with a known σ = 3.4.
2. It is of interest to demonstrate whether an increase in yield is achieved with the new material; H0 says it has not; therefore,

H0: µ ≤ 89.6 H1: µ > 89.6

3. Select α = .05. Since σ is known (the new material would not affect the day-to-day variability in yield), the test statistic would be z with a corresponding critical value cv(z) = 1.645 (Table 3-6, df = ∞).

4. The decision rule:

Accept H0 if sample z < 1.645
Reject H0 if sample z > 1.645

5. A 7-day test was carried out, and daily yields averaged 91.6 percent with a sample standard deviation s = 3.6 (this is not needed for the test of hypothesis).

6. For the data, sample z = (91.6 − 89.6)/(3.4/√7) = 1.56.
7. Since the sample z < cv(z), accept the null hypothesis for lack of contrary evidence; i.e., an improvement has not been demonstrated beyond a reasonable doubt.
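The seven steps reduce to a short calculation. The following is a minimal sketch of the upper-tailed z test for this example, with the critical value hard-coded from Table 3-6:

```python
import math

mu0, sigma = 89.6, 3.4   # reference yield and known standard deviation, percent
n, xbar = 7, 91.6        # 7-day trial with the new material
z_crit = 1.645           # upper 5 percent point (Table 3-6, df = ∞)

z = (xbar - mu0) / (sigma / math.sqrt(n))
decision = "reject H0" if z > z_crit else "accept H0"
print(f"sample z = {z:.2f} -> {decision}")  # sample z = 1.56 -> accept H0
```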

Example
Application. In the past, the break strength of a synthetic yarn has averaged 34.6 lb. The first-stage draw ratio of the spinning machines has been increased. Production management wants to determine whether the break strength has changed under the new condition.

Procedure
1. The standard of reference is µ0 = 34.6.
2. It is of interest to demonstrate whether a change has occurred; therefore,

H0: µ = 34.6 H1: µ ≠ 34.6

3. Select α = .05. Because the uniformity might change along with the draw ratio, the sample standard deviation would be used, and therefore t would be the appropriate test statistic.

4. A sample of 21 ends was selected randomly and tested on an Instron with the results x̄ = 35.55 and s = 2.041.

5. For 20 df and a two-tailed α level of 5 percent, the critical values of t are given by ±2.086 with a decision rule (Table 3-6, t.025, df = 20):

Accept H0 if −2.086 < sample t < 2.086
Reject H0 if sample t < −2.086 or > 2.086

6. For the data, sample t = (35.55 − 34.6)/(2.041/√21) = 2.133.
7. Since 2.133 > 2.086, reject H0 and accept H1. It has been demonstrated that the break strength has changed; because x̄ exceeds µ0, the change is an improvement.
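Here is a corresponding sketch for the two-tailed t test on the yarn data, using the tabled t.025 for 20 df:

```python
import math

mu0 = 34.6                     # historical mean break strength, lb
n, xbar, s = 21, 35.55, 2.041  # sample taken after the draw-ratio change
t_crit = 2.086                 # t.025 with 20 df (Table 3-6)

t = (xbar - mu0) / (s / math.sqrt(n))
reject = abs(t) > t_crit       # two-tailed: a value in either tail is critical
print(f"sample t = {t:.3f}, reject H0: {reject}")
```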

Two-Population Test of Hypothesis for Means
Nature Two samples were selected from different locations in a plastic-film sheet and measured for thickness. The thickness of the respective samples was measured at 10 close but equally spaced points in each of the samples. It was of interest to compare the average thickness of the respective samples to detect whether they were significantly different. That is, was there a significant variation in thickness between locations?

From a modeling standpoint statisticians would define this problem as a two-population test of hypothesis. They would define the respective sample sheets as two populations, with 10 thickness determinations measured on each.

In order to compare populations based on their respective samples, it is necessary to have some basis of comparison. This basis is predicated on the distribution of the t statistic. In effect, the t statistic characterizes

FIG. 3-65 Acceptance region for two-tailed test. For a one-tailed test, area = α on one side only.

the way in which two sample means from two separate populations will tend to vary by chance alone when the population means and variances are equal. Consider the following:

Population 1 (normal): mean µ1, variance σ1²    Sample 1: size n1, mean x̄1, variance s1²
Population 2 (normal): mean µ2, variance σ2²    Sample 2: size n2, mean x̄2, variance s2²

Consider the hypothesis µ1 = µ2. If, in fact, the hypothesis is correct, i.e., µ1 = µ2 (under the condition σ1² = σ2²), then the sampling distribution of (x̄1 − x̄2) is predictable through the t distribution. The observed sample values then can be compared with the corresponding t distribution. If the sample values are reasonably close (as reflected through the α level), that is, x̄1 and x̄2 are not "too different" from each other on the basis of the t distribution, the null hypothesis would be accepted. Conversely, if they deviate from each other "too much" and the deviation is therefore not ascribable to chance, the conjecture would be questioned and the null hypothesis rejected.

Example
Application. Two samples were selected from different locations in a plastic-film sheet. The thickness of the respective samples was measured at 10 close but equally spaced points.

Procedure
1. Demonstrate whether the thicknesses of the respective sample locations are significantly different from each other; therefore,

H0: µ1 = µ2 H1: µ1 ≠ µ2

2. Select α = .05.
3. Summarize the statistics for the respective samples:

Sample 1 | Sample 2
1.473 1.367 | 1.474 1.417
1.484 1.276 | 1.501 1.448
1.484 1.485 | 1.485 1.469
1.425 1.462 | 1.435 1.474
1.448 1.439 | 1.348 1.452

x̄1 = 1.434 s1 = .0664 x̄2 = 1.450 s2 = .0435

4. As a first step, the assumption for the standard t test, that σ1² = σ2², can be tested through the F distribution. For this hypothesis, H0: σ1² = σ2² would be tested against H1: σ1² ≠ σ2². Since this is a two-tailed test and conventionally only the upper tail for F is published, the procedure is to use the larger of the two variance ratios and the correspondingly ordered degrees of freedom. This achieves the same end result through one table. However, since the choice of the larger ratio is dictated by the data, it is necessary to define the true α level as twice the tabled value. Therefore, by using Table 3-8 with α = .05, the corresponding critical value F(9,9) = 3.18 would be for a true α = .10. For the sample,

Sample F = (.0664/.0435)² = 2.33

Therefore, the ratio of sample variances is no larger than one might expect to observe when in fact σ1² = σ2². There is not sufficient evidence to reject the null hypothesis that σ1² = σ2².

5. For 18 df and a two-tailed α level of 5 percent the critical values of t are given by ±2.101 (Table 3-6, t0.025, df = 18).
6. The decision rule:

Accept H0 if −2.101 ≤ sample t ≤ 2.101
Reject H0 otherwise

7. For the sample the pooled variance estimate is given by Eq. (3-120).

$s_p^2 = \frac{9(.0664)^2 + 9(.0435)^2}{9 + 9} = \frac{(.0664)^2 + (.0435)^2}{2} = .00315$

or s_p = .056

8. The sample statistic value of t is given by Eq. (3-119).

$\text{Sample } t = \frac{1.434 - 1.450}{.056\sqrt{1/10 + 1/10}} = -.64$

9. Since the sample value of t falls within the acceptance region, accept H0 for lack of contrary evidence; i.e., there is insufficient evidence to demonstrate that thickness differs between the two selected locations.
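Steps 3 through 8 can be reproduced from the raw thickness readings. This sketch assumes the columns are read as laid out in step 3: the first two values in each row belong to sample 1 and the last two to sample 2. That reading reproduces the quoted means and standard deviations. Because the code uses unrounded statistics, F comes out near 2.32 and t near −0.64, which may differ in the last digit from the hand calculation.

```python
import math
from statistics import mean, stdev

sample1 = [1.473, 1.367, 1.484, 1.276, 1.484, 1.485, 1.425, 1.462, 1.448, 1.439]
sample2 = [1.474, 1.417, 1.501, 1.448, 1.485, 1.469, 1.435, 1.474, 1.348, 1.452]

x1, x2 = mean(sample1), mean(sample2)
s1, s2 = stdev(sample1), stdev(sample2)  # n − 1 divisor
n1, n2 = len(sample1), len(sample2)

# Step 4: F test of equal variances, larger variance in the numerator.
f_sample = max(s1, s2) ** 2 / min(s1, s2) ** 2
f_crit = 3.18  # F(9, 9), upper 5 percent point (Table 3-8); true α = .10

# Steps 7 and 8: pooled standard deviation and the t statistic.
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (x1 - x2) / (sp * math.sqrt(1 / n1 + 1 / n2))
t_crit = 2.101  # t.025 with 18 df (Table 3-6)

print(f"x1={x1:.3f} s1={s1:.4f} x2={x2:.3f} s2={s2:.4f}")
print(f"F={f_sample:.2f} (critical {f_crit}), t={t:.2f} (critical ±{t_crit})")
```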

Test of Hypothesis for Paired Observations
Nature In some types of applications, associated pairs of observations are defined. For example, (1) pairs of samples from two populations are treated in the same way, or (2) two types of measurements are made on the same unit. For applications of this type, it is not only more effective but necessary to define the random variable as the difference between the pairs of observations. The difference numbers can then be tested by the standard t distribution.

Examples of the two types of applications are as follows:
1. Sample treatment
a. Two types of metal specimens buried in the ground together in a variety of soil types to determine corrosion resistance
b. Wear-rate test with two different types of tractor tires mounted in pairs on n tractors for a defined period of time
2. Same unit
a. Blood-pressure measurements made on the same individual before and after the administration of a stimulus
b. Smoothness determinations on the same film samples at two different testing laboratories

Test of Hypothesis for Matched Pairs: Procedure
Nomenclature
di = sample difference between the ith pair of observations
s = sample standard deviation of differences
µ = population mean of differences
σ = population standard deviation of differences
µ0 = base or reference level of comparison
H0 = null hypothesis
H1 = alternative hypothesis
α = significance level
t = tabled value with (n − 1) df
t = (d̄ − µ0)/(s/√n), the sample value of t

Assumptions
1. The n pairs of samples have been selected and assigned for testing in a random way.
2. The population of differences is normally distributed with a mean µ and variance σ². As in the previous application of the t distribution, this is a robust procedure, i.e., not sensitive to the normality assumption if the sample size is 10 or greater in most situations.

Test of Hypothesis
1. Under the null hypothesis, it is assumed that the sample came from a population whose mean µ is equivalent to some base or reference level designated by µ0. For most applications of this type, the value of µ0 is defined to be zero; that is, it is generally of interest to demonstrate a difference not equal to zero. The hypothesis can take one of three forms:

Form 1 Form 2 Form 3

H0: µ = µ0 H0: µ ≤ µ0 H0: µ ≥ µ0

H1: µ ≠ µ0 H1: µ > µ0 H1: µ < µ0

Two-tailed test Upper-tailed test Lower-tailed test

2. If the null hypothesis is assumed to be true, say, in the case of a lower-tailed test, form 3, then the distribution of the test statistic t is known at the limiting value µ = µ0 of the null hypothesis. Given a random sample, one can predict how far its sample value of t might be expected to deviate from zero by chance alone when µ = µ0. If the sample value of t is too small, that is, sufficiently negative, then this would be defined as sufficient evidence to reject the null hypothesis.

3. Select α.
4. The critical value or values of t would be defined by the tabled value of t with (n − 1) df corresponding to a tail area of α. For a two-tailed test, each tail area would be α/2, and for a one-tailed test there would be an upper-tail or a lower-tail area of α corresponding to forms 2 and 3 respectively.

5. The decision rule for each of the three forms would be to reject the null hypothesis if the sample value of t fell in that area of the t distribution defined by α, which is called the critical region. Otherwise, the null hypothesis would be accepted for lack of contrary evidence.

Example
Application. Pairs of pipes have been buried in 11 different locations to determine corrosion on nonbituminous pipe coatings for underground use. One type includes a lead-coated steel pipe and the other a bare steel pipe.

Procedure
1. The standard of reference is taken as µ0 = 0, corresponding to no difference in the two types.
2. It is of interest to demonstrate whether either type of pipe has a greater corrosion resistance than the other. Therefore,

H0: µ = 0 H1: µ ≠ 0

3. Select α = .05. Therefore, with n = 11 the critical values of t with 10 df are defined by t = ±2.228 (Table 3-6, t.025).

4. The decision rule:

Accept H0 if −2.228 ≤ sample t ≤ 2.228
Reject H0 otherwise

5. The sample of 11 pairs of corrosion determinations and their differences are as follows:

Soil type | Lead-coated steel pipe | Bare steel pipe | d = difference
A | 27.3 | 41.4 | −14.1
B | 18.4 | 18.9 | −0.5
C | 11.9 | 21.7 | −9.8
D | 11.3 | 16.8 | −5.5
E | 14.8 | 9.0 | 5.8
F | 20.8 | 19.3 | 1.5
G | 17.9 | 32.1 | −14.2
H | 7.8 | 7.4 | 0.4
I | 14.7 | 20.7 | −6.0
J | 19.0 | 34.4 | −15.4
K | 65.3 | 76.2 | −10.9

6. The sample statistics, Eq. (3-118):

d̄ = −6.245 $s^2 = \frac{11\sum d^2 - (\sum d)^2}{11 \times 10} = 52.59$

or s = 7.25

$\text{Sample } t = \frac{-6.245 - 0}{7.25/\sqrt{11}} = -2.86$

7. Since the sample t of −2.86 < tabled t of −2.228, reject H0 and accept H1; that is, it has been demonstrated that, on the basis of the evidence, lead-coated steel pipe has a greater corrosion resistance than bare steel pipe.
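The paired analysis works on the differences alone. This sketch uses `statistics.mean` and `statistics.stdev`; the n − 1 divisor in `stdev` is equivalent to the computing formula in step 6.

```python
import math
from statistics import mean, stdev

lead_coated = [27.3, 18.4, 11.9, 11.3, 14.8, 20.8, 17.9, 7.8, 14.7, 19.0, 65.3]
bare_steel = [41.4, 18.9, 21.7, 16.8, 9.0, 19.3, 32.1, 7.4, 20.7, 34.4, 76.2]

d = [a - b for a, b in zip(lead_coated, bare_steel)]  # paired differences by soil type
n = len(d)
d_bar, s = mean(d), stdev(d)
t = (d_bar - 0.0) / (s / math.sqrt(n))                # reference level µ0 = 0
t_crit = 2.228                                        # t.025 with 10 df (Table 3-6)
print(f"d_bar={d_bar:.3f} s={s:.2f} t={t:.2f} reject H0: {abs(t) > t_crit}")
```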

Example
Application. A stimulus was tested for its effect on blood pressure. Ten men were selected randomly, and their blood pressure was measured before and after the stimulus was administered. It was of interest to determine whether the stimulus had caused a significant increase in the blood pressure.

Procedure
1. The standard of reference was taken as µ0 = 0, corresponding to no increase.
2. It was of interest to demonstrate an increase in blood pressure if in fact an increase did occur. Therefore,

H0: µ ≤ 0 H1: µ > 0

3. Select α = .05. Therefore, with n = 10 the critical value of t with 9 df is defined by t = 1.833 (Table 3-6, t.05, one-sided).

4. The decision rule:

Accept H0 if sample t < 1.833
Reject H0 if sample t > 1.833

5. The sample of 10 pairs of blood pressure and their differences were as follows:

Individual | Before | After | d = difference
1 | 138 | 146 | 8
2 | 116 | 118 | 2
3 | 124 | 120 | −4
4 | 128 | 136 | 8
5 | 155 | 174 | 19
6 | 129 | 133 | 4
7 | 130 | 129 | −1
8 | 148 | 155 | 7
9 | 143 | 148 | 5
10 | 159 | 155 | −4

6. The sample statistics:

d̄ = 4.4 s = 6.85
Sample t = (4.4 − 0)/(6.85/√10) = 2.03

7. Since the sample t = 2.03 > critical t = 1.833, reject the null hypothesis. It has been demonstrated that the population of men from which the sample was drawn tends, as a whole, to have an increase in blood pressure after the stimulus has been given. The distribution of differences d seems to indicate that the degree of response varies by individuals.
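The same calculation applies to the blood-pressure data, here as an upper-tailed test on after-minus-before differences:

```python
import math
from statistics import mean, stdev

before = [138, 116, 124, 128, 155, 129, 130, 148, 143, 159]
after = [146, 118, 120, 136, 174, 133, 129, 155, 148, 155]

d = [a - b for a, b in zip(after, before)]  # after minus before, per individual
t = mean(d) / (stdev(d) / math.sqrt(len(d)))
t_crit = 1.833                              # t.05 with 9 df, one-sided (Table 3-6)
print(f"d_bar={mean(d):.1f} s={stdev(d):.2f} t={t:.2f} reject H0: {t > t_crit}")
```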

Test of Hypothesis for a Proportion
Nature Some types of statistical applications deal with counts and proportions rather than measurements. Examples are (1) the proportion of workers in a plant who are out sick, (2) lost-time worker accidents per month, (3) defective items in a shipment lot, and (4) preference in consumer surveys.

The procedure for testing the significance of a sample proportion follows that for a sample mean. In this case, however, owing to the nature of the problem the appropriate test statistic is Z. This follows from the fact that the null hypothesis requires the specification of the goal or reference quantity p0, and since the distribution is a binomial proportion, the associated variance is [p0(1 − p0)]/n under the null hypothesis. The primary requirement is that the sample size n satisfy normal approximation criteria for a binomial proportion, roughly np > 5 and n(1 − p) > 5.

Test of Hypothesis for a Proportion: Procedure
Nomenclature
p = mean proportion of the population from which the sample has been drawn
p0 = base or reference proportion
[p0(1 − p0)]/n = base or reference variance
p̂ = x/n = sample proportion, where x refers to the number of observations out of n which have the specified attribute
H0 = assumption or null hypothesis regarding the population proportion
H1 = alternative hypothesis
α = significance level, usually set at .10, .05, or .01
z = tabled Z value corresponding to the significance level α. The sample sizes required for the z approximation according to the magnitude of p0 are given in Table 3-6.
z = (p̂ − p0)/√(p0(1 − p0)/n), the sample value of the test statistic

Assumptions
1. The n observations have been selected randomly.
2. The sample size n is sufficiently large to meet the requirement for the Z approximation.

Test of Hypothesis
1. Under the null hypothesis, it is assumed that the sample came from a population with a proportion p0 of items having the specified attribute. For example, in tossing a coin the population could be thought of as having an unbounded number of potential tosses. If it is assumed that the coin is fair, this would dictate p0 = 1/2 for the proportional number of heads in the population. The null hypothesis can take one of three forms:

Form 1 Form 2 Form 3

H0: p = p0 H0: p ≤ p0 H0: p ≥ p0

H1: p ≠ p0 H1: p > p0 H1: p < p0

Two-tailed test Upper-tailed test Lower-tailed test


2. If the null hypothesis is assumed to be true, then the sampling distribution of the test statistic Z is known. Given a random sample, it is possible to predict how far the sample proportion x/n might deviate from its assumed population proportion p0 through the Z distribution. When the sample proportion deviates too far, as defined by the significance level α, this serves as the justification for rejecting the assumption, that is, rejecting the null hypothesis.

3. The decision rule is given by
Form 1: Accept H0 if lower critical z < sample z < upper critical z
Reject H0 otherwise
Form 2: Accept H0 if sample z < upper critical z
Reject H0 otherwise
Form 3: Accept H0 if lower critical z < sample z
Reject H0 otherwise

Example
Application. A company has received a very large shipment of rivets. One product specification required that no more than 2 percent of the rivets have diameters greater than 14.28 mm. Any rivet with a diameter greater than this would be classified as defective. A random sample of 600 was selected and tested with a go–no-go gauge. Of these, 16 rivets were found to be defective. Is this sufficient evidence to conclude that the shipment contains more than 2 percent defective rivets?

Procedure
1. The quality goal is p ≤ .02. It would be assumed initially that the shipment meets this standard; i.e., H0: p ≤ .02.
2. The assumption in step 1 would first be tested by obtaining a random sample. Under the assumption that p ≤ .02, the distribution for a sample proportion would be defined by the z distribution. This distribution would define an upper bound corresponding to the upper critical value for the sample proportion. It would be unlikely that the sample proportion would rise above that value if, in fact, p ≤ .02. If the observed sample proportion exceeds that limit, corresponding to what would be a very unlikely chance outcome, this would lead one to question the assumption that p ≤ .02. That is, one would conclude that the null hypothesis is false. To test, set

H0: p ≤ .02 H1: p > .02

3. Select α = .05.
4. With α = .05, the upper critical value of Z = 1.645 (Table 3-6, t.05, df = ∞, one-sided).
5. The decision rule:

Accept H0 if sample z < 1.645
Reject H0 if sample z > 1.645

6. The sample z is given by

$\text{Sample } z = \frac{(16/600) - .02}{\sqrt{(.02)(.98)/600}} = 1.17$

7. Since the sample z < 1.645, accept H0 for lack of contrary evidence; there is not sufficient evidence to demonstrate that the defect proportion in the shipment is greater than 2 percent.
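The following sketches the one-sample proportion test for the rivet example. It includes the rough normal-approximation check quoted earlier, under which n·p0 and n(1 − p0) should each exceed about 5.

```python
import math

p0, n, x = 0.02, 600, 16          # specification, sample size, defectives found
p_hat = x / n                     # sample proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
z_crit = 1.645                    # upper-tailed test at α = .05 (Table 3-6, df = ∞)

# Rough normal-approximation check from the text.
print(f"n*p0 = {n * p0:.0f}, n*(1-p0) = {n * (1 - p0):.0f}")
print(f"sample z = {z:.2f}, reject H0: {z > z_crit}")
```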

Test of Hypothesis for Two Proportions
Nature In some types of engineering and management-science problems, we may be concerned with a random variable which represents a proportion, for example, the proportional number of defective items per day. The method described previously relates to a single proportion. In this subsection two proportions will be considered.

A certain change in a manufacturing procedure for producing component parts is being considered. Samples are taken by using both the existing and the new procedures in order to determine whether the new procedure results in an improvement. In this application, it is of interest to demonstrate statistically whether the population proportion p2 for the new procedure is less than the population proportion p1 for the old procedure on the basis of a sample of data.

Test of Hypothesis for Two Proportions: ProcedureNomenclaturep1 = population 1 proportionp2 = population 2 proportionn1 = sample size from population 1n2 = sample size from population 2

(16/600) − .02(.02)(.98)/600

x1 = number of observations out of n1 that have the designatedattribute

x2 = number of observations out of n2 that have the designatedattribute

p1 = x1/n1, the sample proportion from population 1p2 = x2/n2, the sample proportion from population 2α = significance level

H0 = null hypothesisH1 = alternative hypothesis

z = tabled Z value corresponding to the stated significance level α

z = , the sample value of Z

Assumptions
1. The respective two samples of n1 and n2 observations have been selected randomly.
2. The sample sizes n1 and n2 are sufficiently large to meet the requirement for the Z approximation; i.e., x1 > 5, x2 > 5.

Test of Hypothesis
1. Under the null hypothesis, it is assumed that the respective two samples have come from populations with equal proportions p1 = p2. Under this hypothesis, the sampling distribution of the corresponding Z statistic is known. On the basis of the observed data, if the resultant sample value of Z represents an unusual outcome, that is, if it falls within the critical region, this would cast doubt on the assumption of equal proportions. Therefore, it will have been demonstrated statistically that the population proportions are in fact not equal. The various hypotheses can be stated:

Form 1             Form 2              Form 3
H0: p1 = p2        H0: p1 ≤ p2         H0: p1 ≥ p2
H1: p1 ≠ p2        H1: p1 > p2         H1: p1 < p2
Two-tailed test    Upper-tailed test   Lower-tailed test

2. The decision rule for form 1 is given by
Accept H0 if lower critical z < sample z < upper critical z
Reject H0 otherwise

Example
Application. A change was made in a manufacturing procedure for component parts. Samples were taken during the last week of operations with the old procedure and during the first week of operations with the new procedure. Determine whether the proportional numbers of defects for the respective populations differ on the basis of the sample information.

Procedure
1. The hypotheses are

H0: p1 = p2    H1: p1 ≠ p2

2. Select α = .05. Therefore, the critical values of z are ±1.96 (Table 3-5, A = 0.9500).
3. For the samples, 75 out of 1720 parts from the previous procedure and 80 out of 2780 parts under the new procedure were found to be defective; therefore,

p̂1 = 75/1720 = .0436    p̂2 = 80/2780 = .0288

4. The decision rule:

Accept H0 if −1.96 ≤ sample z ≤ 1.96
Reject H0 otherwise

5. The sample statistic:

$$\text{Sample } z = \frac{.0436 - .0288}{\sqrt{(.0436)(.9564)/1720 + (.0288)(.9712)/2780}} = 2.53$$

6. Since the sample z of 2.53 > tabled z of 1.96, reject H0 and conclude that the new procedure has resulted in a reduced defect rate.
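Here is a comparable sketch for the two-proportion test, again in stdlib-only Python with a hypothetical helper name. It uses the unpooled standard error exactly as written in the nomenclature. A common variant instead pools (x1 + x2)/(n1 + n2) under H0 to form the standard error, which gives a somewhat different z. Treat this as an illustration of the text's formula rather than the only form of the test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Sample z for H0: p1 = p2, with the unpooled standard error from the nomenclature."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96
z = two_proportion_z(75, 1720, 80, 2780)        # old procedure vs. new procedure
print(f"sample z = {z:.2f}; reject H0: {abs(z) > z_crit}")
```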

Goodness-of-Fit Test
Nature A standard die has six sides numbered from 1 to 6. If one were really interested in determining whether a particular die was well balanced, one would have to carry out an experiment. To do this, it might be decided to count the frequencies of outcomes, 1 through 6, in tossing the die N times. On the assumption that the die is perfectly balanced, one would expect to observe N/6 occurrences each for 1, 2, 3, 4, 5, and 6. However, chance dictates that exactly N/6 occurrences each will not be observed. For example, given a perfectly balanced die, the probability is only 6!/6⁶ = 720/46,656, or about 1 chance in 65, that one will observe 1 outcome each, for 1 through 6, in tossing the die 6 times. Therefore, an outcome different from 1 occurrence each can be expected. Conversely, an outcome of six 3s would seem to be too unusual to have occurred by chance alone.

Some industrial applications involve the concept outlined here. The basic idea is to test whether or not a group of observations follows a preconceived distribution. In the case cited, the distribution is uniform; i.e., each face value should tend to occur with the same frequency.

Goodness-of-Fit Test: Procedure
Nomenclature Each experimental observation can be classified into one of r possible categories or cells.
r = total number of cells
Oj = number of observations occurring in cell j
Ej = expected number of observations for cell j based on the preconceived distribution
N = total number of observations
f = degrees of freedom for the test. In general, this will be equal to (r − 1) minus the number of statistical quantities on which the Ej's are based (see the examples which follow for details).

Assumptions
1. The observations represent a sample selected randomly from a population which has been specified.
2. The number of expectation counts Ej within each category should be roughly 5 or more. If an Ej count is significantly less than 5, that cell should be pooled with an adjacent cell.

Computation for Ej On the basis of the specified population, the probability of observing a count in cell j is defined by pj. For a sample of size N, corresponding to N total counts, the expected frequency is given by Ej = Npj.

Test Statistics: Chi Square

$$\chi^2 = \sum_{j=1}^{r} \frac{(O_j - E_j)^2}{E_j} \quad \text{with } f \text{ df}$$

Test of Hypothesis
1. H0: The sample came from the specified theoretical distribution
H1: The sample did not come from the specified theoretical distribution
2. For a stated level of α,
Reject H0 if sample χ² > tabled χ²
Accept H0 if sample χ² < tabled χ²

Example
Application A production-line product is rejected if one of its characteristics does not fall within specified limits. The standard goal is that no more than 2 percent of the production should be rejected.

Computation
1. Of 950 units produced during the day, 28 units were rejected.
2. The hypotheses:
H0: the process is in control
H1: the process is not in control
3. Assume that α = .05; therefore, the critical value of χ²(1) = 3.84 (Table 3-7, 95 percent, df = 1). One degree of freedom is defined since (r − 1) = 1, and no statistical quantities have been computed for the data.
4. The decision rule:
Reject H0 if sample χ² > 3.84
Accept H0 otherwise
5. Since it is assumed that p = .02, this would dictate that in a sample of 950 there would be on the average (.02)(950) = 19 defective items and 931 acceptable items:

Category          Observed Oj    Expectation Ej = 950pj
Acceptable        922            931
Not acceptable    28             19
Total             950            950

$$\text{Sample } \chi^2 = \frac{(922 - 931)^2}{931} + \frac{(28 - 19)^2}{19} = 4.35 \quad \text{with critical } \chi^2 = 3.84$$

6. Conclusion. Since the sample value exceeds the critical value, it would be concluded that the process is not in control.
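The χ² statistic is a sum over cells, so the in-control check reduces to a few lines. This is a minimal Python sketch in which the expected counts come from Ej = Npj. The function name is hypothetical. The critical value 3.84 is copied from Table 3-7, not computed.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


N, p_reject = 950, 0.02
observed = [922, 28]                                  # acceptable, not acceptable
expected = [N * (1 - p_reject), N * p_reject]         # Ej = N pj, i.e., 931 and 19
chi2 = chi_square_statistic(observed, expected)
chi2_crit = 3.84                                      # Table 3-7, 1 df, alpha = .05
print(f"sample chi2 = {chi2:.2f}; reject H0: {chi2 > chi2_crit}")
```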

Example
Application A frequency count of workers was tabulated according to the number of defective items that they produced. An unresolved question is whether the observed distribution is a Poisson distribution. That is, do observed and expected frequencies agree within chance variation?

Computation
1. The hypotheses:
H0: there are no significant differences, in number of defective units, between workers
H1: there are significant differences
2. Assume that α = .05.
3. Test statistic:

No. of defective units    Oj     Ej
0                         3      2.06    } pooled: Oj = 10, Ej = 8.70
1                         7      6.64    }
2                         9      10.73
3                         12     11.55
4                         9      9.33
5                         6      6.03
6                         3      3.24    }
7                         2      1.50    }
8                         0      .60     } pooled: Oj = 6, Ej = 5.66
9                         1      .22     }
≥10                       0      .10     }
Sum                       52     52

The expectation numbers Ej were computed as follows. For the Poisson distribution, λ = E(x); therefore, an estimate of λ is the average number of defective units per worker, i.e., λ = (1/52)(0 × 3 + 1 × 7 + ⋅⋅⋅ + 9 × 1) = 3.23. Given this approximation, the probability of no defective units for a worker would be [(3.23)⁰/0!]e^(−3.23) = .0396. For the 52 workers, the number of workers producing no defective units would have an expectation E = 52(0.0396) = 2.06, and so forth.

The sample chi-square value is computed from

$$\chi^2 = \frac{(10 - 8.70)^2}{8.70} + \frac{(9 - 10.73)^2}{10.73} + \cdots + \frac{(6 - 5.66)^2}{5.66} = .522$$

4. The critical value of χ² would be based on four degrees of freedom. After pooling there are r = 6 cells, so this corresponds to (r − 1) − 1 = 4, since one statistical quantity λ was computed from the sample and used to derive the expectation numbers.
5. The critical value of χ²(4) = 9.49 (Table 3-7) with α = .05; therefore, accept H0.
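The Poisson example has three steps that are easy to get wrong by hand:

- estimating λ from the data;
- adding a tail cell so the expectations sum to N;
- pooling the small cells.

The Python sketch below follows the pooling shown in the table, {0, 1} and {6, …, ≥10}. It illustrates those choices and is not a general pooling rule. With unrounded Ej values it agrees with the text's .522 to within rounding.

```python
from math import exp, factorial

# Workers producing 0, 1, ..., 9 defective units (no worker produced 10 or more)
observed = [3, 7, 9, 12, 9, 6, 3, 2, 0, 1]
n_workers = sum(observed)                                          # 52
lam = sum(k * o for k, o in enumerate(observed)) / n_workers       # about 3.23

# Poisson expectations for 0..9, plus a ">= 10" tail cell so the Ej sum to 52
expected = [n_workers * exp(-lam) * lam ** k / factorial(k) for k in range(10)]
expected.append(n_workers - sum(expected))
observed_all = observed + [0]

# Pool small cells as in the table: {0, 1} and {6, 7, 8, 9, >= 10}
groups = [[0, 1], [2], [3], [4], [5], [6, 7, 8, 9, 10]]
O = [sum(observed_all[i] for i in g) for g in groups]
E = [sum(expected[i] for i in g) for g in groups]

chi2 = sum((o - e) ** 2 / e for o, e in zip(O, E))
df = (len(groups) - 1) - 1        # one parameter, lambda, estimated from the data
print(f"lambda = {lam:.2f}, chi2 = {chi2:.3f}, df = {df}")
```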

Two-Way Test for Independence for Count Data
Nature When individuals or items are observed and classified according to two different criteria, the resultant counts can be statistically analyzed. For example, a market survey may examine whether a new product is preferred and if it is preferred due to a particular characteristic.

Count data, based on a random selection of individuals or items which are classified according to two different criteria, can be statistically analyzed through the χ² distribution. The purpose of this analysis is to determine whether the respective criteria are dependent. That is, is the product preferred because of a particular characteristic?

Two-Way Test for Independence for Count Data: Procedure
Nomenclature
1. Each observation is classified according to two categories:
a. The first one into 2, 3, . . . , or r categories
b. The second one into 2, 3, . . . , or c categories
2. Oij = number of observations (observed counts) in cell (i, j) with i = 1, 2, . . . , r and j = 1, 2, . . . , c
3. N = total number of observations
4. Eij = computed number for cell (i, j) which is an expectation based on the assumption that the two characteristics are independent
5. Ri = subtotal of counts in row i
6. Cj = subtotal of counts in column j
7. α = significance level
8. H0 = null hypothesis
9. H1 = alternative hypothesis
10. χ² = critical value of χ² corresponding to the significance level α and (r − 1)(c − 1) df
11. Sample χ², computed as

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Assumptions
1. The observations represent a sample selected randomly from a large total population.
2. The number of expectation counts Eij within each cell should be approximately 2 or more for arrays 3 × 3 or larger. If any cell contains a number smaller than 2, appropriate rows or columns should be combined to increase the magnitude of the expectation count. For arrays 2 × 2, approximately 4 or more are required. If the number is less than 4, the exact Fisher test should be used.

Test of Hypothesis Under the null hypothesis, the classification criteria are assumed to be independent, i.e.,

H0: the criteria are independent
H1: the criteria are not independent

For the stated level of α,

Reject H0 if sample χ² > tabled χ²
Accept H0 otherwise

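Computation for Eij Compute Eij across rows or down columns by using either of the following identities:

$$E_{ij} = C_j \left(\frac{R_i}{N}\right) \quad \text{across rows}$$

$$E_{ij} = R_i \left(\frac{C_j}{N}\right) \quad \text{down columns}$$

Sample χ² Value

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

In the special case of r = 2 and c = 2, a more accurate and simplified formula which does not require the direct computation of Eij can be used:

$$\chi^2 = \frac{\left[\,|O_{11}O_{22} - O_{12}O_{21}| - \tfrac{1}{2}N\right]^2 N}{R_1 R_2 C_1 C_2}$$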

Example
Application A market research study was carried out to relate the subjective "feel" of a consumer product to consumer preference. In other words, is the consumer's preference for the product associated with the feel of the product, or is the preference independent of the product feel?

Procedure
1. It was of interest to demonstrate whether an association exists between feel and preference; therefore, assume
H0: feel and preference are independent
H1: they are not independent
2. A sample of 200 people was asked to classify the product according to two criteria:
a. Liking for this product
b. Liking for the feel of the product

                       Like feel
                       Yes     No     Ri
Like product    Yes    114     13     127
                No     55      18     73
                Cj     169     31     200

3. Select α = .05; therefore, with (r − 1)(c − 1) = 1 df, the critical value of χ² is 3.84 (Table 3-7, 95 percent).
4. The decision rule:
Accept H0 if sample χ² < 3.84
Reject H0 otherwise
5. The sample value of χ² by using the special formula is

$$\text{Sample } \chi^2 = \frac{\left[\,|114 \times 18 - 13 \times 55| - 100\right]^2 (200)}{(169)(31)(127)(73)} = 6.30$$

6. Since the sample χ² of 6.30 > tabled χ² of 3.84, reject H0 and accept H1. The relative proportionality of E11 = 169(127/200) = 107.3 to the observed 114 compared with E22 = 31(73/200) = 11.3 to the observed 18 suggests that when the consumer likes the feel, the consumer tends to like the product, and conversely for not liking the feel. The proportions 169/200 = 84.5 percent and 127/200 = 63.5 percent suggest further that there are other attributes of the product which tend to nullify the beneficial feel of the product.
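Both routes to the 2 × 2 result can be checked in Python:

- the general expectation Eij = RiCj/N;
- the special formula with the N/2 correction.

The function names are hypothetical. The counts are those of the feel/preference table.

```python
def expected_counts(table):
    """Eij = Ri * Cj / N under the hypothesis that the two criteria are independent."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]


def yates_2x2(table):
    """Special 2 x 2 formula: N (|O11 O22 - O12 O21| - N/2)^2 / (R1 R2 C1 C2)."""
    (o11, o12), (o21, o22) = table
    n = o11 + o12 + o21 + o22
    r1, r2, c1, c2 = o11 + o12, o21 + o22, o11 + o21, o12 + o22
    return n * (abs(o11 * o22 - o12 * o21) - n / 2) ** 2 / (r1 * r2 * c1 * c2)


# Rows: like product (yes, no); columns: like feel (yes, no)
observed = [[114, 13], [55, 18]]
E = expected_counts(observed)
chi2 = yates_2x2(observed)
print(f"E11 = {E[0][0]:.1f}, E22 = {E[1][1]:.1f}, sample chi2 = {chi2:.2f}")
```

If the −N/2 term in `yates_2x2` is dropped, the function returns the uncorrected statistic, about 7.36 here. That is the value the general sum Σ(Oij − Eij)²/Eij gives for this 2 × 2 table.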

LEAST SQUARES

When experimental data is to be fit with a mathematical model, it is necessary to allow for the fact that the data has errors. The engineer is interested in finding the parameters in the model as well as the uncertainty in their determination. In the simplest case, the model is a linear equation with only two parameters, and they are found by a least-squares minimization of the errors in fitting the data. Multiple regression is just linear least squares applied with more terms. Nonlinear regression allows the parameters of the model to enter in a nonlinear fashion. See Press et al. (1986) for a description of maximum likelihood as it applies to both linear and nonlinear least squares.

In a least-squares parameter estimation, it is desired to find parameters that minimize the sum of squares of the deviation between the experimental data and the theoretical equation.

$$\chi^2 = \sum_{i=1}^{N} [y_i - y(x_i; a_1, a_2, \ldots, a_M)]^2$$

where yi is the ith experimental data point for the value xi, y(xi; a1, a2, . . . , aM) is the theoretical equation at xi, and the parameters a1, a2, . . . , aM are to be determined to minimize χ². This will also minimize the variance of the curve fit

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} [y_i - y(x_i; a_1, a_2, \ldots, a_M)]^2$$

Linear Least Squares When the model is a straight line, one is minimizing

$$\chi^2 = \sum_{i=1}^{N} (y_i - a - b x_i)^2$$

The linear correlation coefficient r is defined by

$$r = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^{N} (x_i - \bar x)^2} \sqrt{\sum_{i=1}^{N} (y_i - \bar y)^2}}$$

and

$$\sigma^2 = \frac{1 - r^2}{N} \sum_{i=1}^{N} (y_i - \bar y)^2$$

where ȳ is the average of the yi values. Values of r near 1 indicate a positive correlation; r near −1 means a negative correlation, and r near 0 means no correlation. These parameters are easily found by using standard programs, such as Microsoft Excel.

Example
Application. Brenner (Magnetic Method for Measuring the Thickness of Non-magnetic Coatings on Iron and Steel, National Bureau of Standards, RP1081, March 1938) suggests an alternative way of measuring the thickness of nonmagnetic coatings of galvanized zinc on iron and steel. This procedure is based on a nondestructive magnetic method as a substitute for the standard destructive stripping method. A random sample of 11 pieces was selected and measured by both methods.

Nomenclature. The calibration between the magnetic and the stripping methods can be determined through the model

y = a + bx + ε

where x = strip-method determination
y = magnetic-method determination

Sample data

Thickness, 10⁻⁵ in
Stripping method, x    Magnetic method, y
104                    85
114                    115
116                    105
129                    127
132                    120
139                    121
174                    155
312                    250
338                    310
465                    443
720                    630

This example is solved by using Microsoft Excel. Put the data into columns A and B as shown, using rows 1 through 11. Then the commands

SLOPE(B1:B11, A1:A11), INTERCEPT(B1:B11, A1:A11), RSQ(B1:B11, A1:A11)

give the slope b, the intercept a, and the value of r². Here they are 0.884, 3.20, and 0.9928, respectively, so that r = 0.9964. By choosing Insert/Chart and Scatter Plot, the data are plotted. Once that is done, place the cursor on a data point and right-click; choose Format Trendline with options selected to display the equation and r², and you get Fig. 3-66. On the Macintosh, use CTRL-click.
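Without a spreadsheet, the slope, intercept, and r can be computed directly from the sums in the definitions of the least-squares line and of r. Below is a minimal Python sketch on the Brenner data; the function name is hypothetical. It should reproduce the values that SLOPE, INTERCEPT, and RSQ report.

```python
from math import sqrt


def linear_fit(x, y):
    """Least-squares line y = a + b x, plus the linear correlation coefficient r."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx                  # slope
    a = ybar - b * xbar            # intercept
    r = sxy / sqrt(sxx * syy)      # linear correlation coefficient
    return a, b, r


x = [104, 114, 116, 129, 132, 139, 174, 312, 338, 465, 720]   # stripping method
y = [85, 115, 105, 127, 120, 121, 155, 250, 310, 443, 630]    # magnetic method
a, b, r = linear_fit(x, y)
print(f"slope b = {b:.4f}, intercept a = {a:.4f}, r^2 = {r * r:.4f}")
```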

Polynomial Regression In polynomial regression, one expands the function in a polynomial in x.

$$y(x) = \sum_{j=1}^{M} a_j x^{j-1}$$

Example
Application. Merriman ("The Method of Least Squares Applied to a Hydraulic Problem," J. Franklin Inst., 233–241, October 1877) reported on a study of stream velocity as a function of relative depth of the stream.

Sample data

Depth*    Velocity, y, ft/s
0         3.1950
.1        3.2299
.2        3.2532
.3        3.2611
.4        3.2516
.5        3.2282
.6        3.1807
.7        3.1266
.8        3.0594
.9        2.9759

*As a fraction of total depth.

The model is taken as a quadratic function of position:

$$\text{Velocity} = a + bx + cx^2$$

The parameters are easily determined by using computer software. In Microsoft Excel, the data are put into columns A and B and the graph is created as for a linear curve fit. This time, though, when adding the trendline, choose the polynomial icon and use 2 (which gives powers up to and including x²). The result is

$$\text{Velocity} = 3.195 + 0.4425x - 0.7653x^2$$

The value of r² is .9993.

Multiple Regression In multiple regression, any set of functions can be used, not just polynomials, such as

$$y(x) = \sum_{j=1}^{M} a_j f_j(x)$$

where the set of functions fj(x) is known and specified. Note that the unknown parameters aj enter the equation linearly. In this case, the spreadsheet can be expanded to have a column for x and then successive columns for fj(x). In Microsoft Excel, you choose Regression under Tools/Data Analysis, and complete the form. In addition to the actual correlation, you get the expected variance of the unknowns, which allows you to assess how accurately they were determined. In the example above, by creating a column for x and x², one obtains an intercept of 3.195 with a standard error of .0039, b = .4416 with a standard error of .018, and c = −.7645 with a standard error of .018.
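Because the aj enter linearly, any basis fj(x) leads to the normal equations (FᵀF)a = Fᵀy, where column j of F holds fj(xi). The stdlib-only Python sketch below solves these equations by Gaussian elimination; the function name is hypothetical. It applies them to the Merriman data with f1 = 1, f2 = x, f3 = x², which is the quadratic of the polynomial-regression example.

```python
def lstsq_basis(x, y, basis):
    """Fit y(x) = sum_j a_j f_j(x) by solving the normal equations (F^T F) a = F^T y."""
    F = [[f(xi) for f in basis] for xi in x]              # column j holds f_j(x_i)
    m = len(basis)
    A = [[sum(row[i] * row[j] for row in F) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * yi for row, yi in zip(F, y)) for i in range(m)]
    M = [A[i] + [rhs[i]] for i in range(m)]               # augmented matrix
    for col in range(m):                                   # elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= factor * M[col][k]
    coef = [0.0] * m
    for i in reversed(range(m)):                           # back substitution
        coef[i] = (M[i][m] - sum(M[i][j] * coef[j] for j in range(i + 1, m))) / M[i][i]
    return coef


depth = [i / 10 for i in range(10)]                        # fraction of total depth
velocity = [3.1950, 3.2299, 3.2532, 3.2611, 3.2516, 3.2282, 3.1807, 3.1266, 3.0594, 2.9759]
a, b, c = lstsq_basis(depth, velocity, [lambda x: 1.0, lambda x: x, lambda x: x * x])
print(f"Velocity = {a:.4f} + {b:.4f}x + ({c:.4f})x^2")
```

Forming FᵀF is adequate for a small, well-scaled problem like this one. For larger or nearly collinear bases, a QR-based least-squares solver is the usual, better-conditioned choice.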

FIG. 3-66 Plot of data and correlating line (x axis: stripping method; y axis: magnetic method; trendline y = 0.8844x + 3.1996, R² = 0.9928).

Nonlinear Least Squares There are no analytic methods for determining the most appropriate model for a particular set of data. In many cases, however, the engineer has some basis for a model. If the parameters occur in a nonlinear fashion, then the analysis becomes more difficult. For example, in relating the temperature to the elapsed time of a fluid cooling in the atmosphere, a model that has an asymptotic property would be the appropriate model (temp = a + b exp(−c time)), where a represents the asymptotic temperature corresponding to t → ∞. In this case, the parameter c appears nonlinearly. The usual practice is to concentrate on model development and computation rather than on statistical aspects. In general, nonlinear regression should be applied only to problems in which there is a well-defined, clear association between the two variables; therefore, a test of hypothesis on the significance of the fit would be somewhat ludicrous. In addition, the generalization of the theory for the associated confidence intervals for nonlinear coefficients is not well developed.

Example
Application. Data were collected on the cooling of water in the atmosphere as a function of time.

Sample data

Time x    Temperature y
0         92.0
1         85.5
2         79.5
3         74.5
5         67.0
7         60.5
10        53.5
15        45.0
20        39.5

Model. MATLAB can be used to find the best fit of the data to the formula y = a + be^(−cx): a = 33.54, b = 57.89, c = 0.11. The value of χ² is 1.83. Using an alternative form, y = a + b/(c + x), gives a = 9.872, b = 925.7, c = 11.27, and χ² = 0.19. Since this model had a smaller value of χ², it might be the chosen one, but it is only a fit of the specified data and may not be generalized beyond that. Both forms give equivalent plots.
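The text does not show how MATLAB performs the fit. As an alternative that needs only the standard library, the Python sketch below exploits the fact that, for fixed c, y = a + be^(−cx) is linear in a and b. Those two parameters therefore come from a straight-line fit against z = e^(−cx), and a golden-section search over c then minimizes χ².

Three choices belong to the sketch, not the source:

- this separable approach;
- the search interval 0.01 ≤ c ≤ 1;
- the assumption that χ² has a single minimum in that interval.

```python
from math import exp

t = [0, 1, 2, 3, 5, 7, 10, 15, 20]
T = [92.0, 85.5, 79.5, 74.5, 67.0, 60.5, 53.5, 45.0, 39.5]


def linear_part(c):
    """For fixed c, fit y = a + b z with z = exp(-c x) by linear least squares; return a, b, chi2."""
    z = [exp(-c * x) for x in t]
    n = len(z)
    zbar, ybar = sum(z) / n, sum(T) / n
    b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, T)) / sum((zi - zbar) ** 2 for zi in z)
    a = ybar - b * zbar
    chi2 = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, T))
    return a, b, chi2


def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:                            # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return (lo + hi) / 2


c = golden_min(lambda c: linear_part(c)[2], 0.01, 1.0)
a, b, chi2 = linear_part(c)
print(f"a = {a:.2f}, b = {b:.2f}, c = {c:.3f}, chi2 = {chi2:.2f}")
```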

ERROR ANALYSIS OF EXPERIMENTS

Consider the problem of assessing the accuracy of a series of measurements. If measurements are for independent, identically distributed observations, then the errors are independent and uncorrelated. Then ȳ, the experimentally determined mean, varies about E(y), the true mean, with variance σ²/n, where n is the number of observations in ȳ. Thus, if one measures something several times today, and each day, and the measurements have the same distribution, then the variance of the means decreases with the number of samples in each day's measurement, n. Of course, other factors (weather, weekends) may make the observations on different days not distributed identically.

Consider next the problem of estimating the error in a variable that cannot be measured directly but must be calculated based on results of other measurements. Suppose the computed value Y is a linear combination of the measured variables yi, Y = α1y1 + α2y2 + . . . . Let the random variables y1, y2, . . . have means E(y1), E(y2), . . . and variances σ²(y1), σ²(y2), . . . . The variable Y has mean

$$E(Y) = \alpha_1 E(y_1) + \alpha_2 E(y_2) + \cdots$$

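and variance (Cropley, 1978)

$$\sigma^2(Y) = \sum_{i=1}^{n} \alpha_i^2 \sigma^2(y_i) + 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \alpha_i \alpha_j \,\mathrm{Cov}(y_i, y_j)$$

If the variables are uncorrelated and have the same variance, then

$$\sigma^2(Y) = \left( \sum_{i=1}^{n} \alpha_i^2 \right) \sigma^2$$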
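The covariance term matters in practice. In the Python sketch below, the function name and the numbers are hypothetical. It computes Y = y1 − y2 once with independent readings and once with positively correlated ones. The covariance sharply reduces the variance of the difference.

```python
def var_linear_combination(alpha, cov):
    """sigma^2(Y) for Y = sum_i alpha_i y_i; cov[i][i] = sigma^2(y_i), cov[i][j] = Cov(y_i, y_j)."""
    n = len(alpha)
    var = sum(alpha[i] ** 2 * cov[i][i] for i in range(n))
    var += 2 * sum(alpha[i] * alpha[j] * cov[i][j]
                   for i in range(n) for j in range(i + 1, n))
    return var


# Hypothetical example: Y = y1 - y2, each reading with variance 0.04
alpha = [1.0, -1.0]
independent = [[0.04, 0.00], [0.00, 0.04]]
correlated = [[0.04, 0.03], [0.03, 0.04]]     # Cov(y1, y2) = 0.03
print(f"{var_linear_combination(alpha, independent):.3f}")   # 0.080
print(f"{var_linear_combination(alpha, correlated):.3f}")    # 0.020
```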

Next suppose the model relating Y to yi is nonlinear, but the errors are small and independent of one another. Then a change in Y is related to changes in yi by

$$dY = \frac{\partial Y}{\partial y_1} dy_1 + \frac{\partial Y}{\partial y_2} dy_2 + \cdots$$

If the changes are indeed small, then the partial derivatives are constant among all the samples. Then the expected value of the change, E(dY), is zero. The variances are given by the following equation (Baird, 1995; Box et al., 2005):

$$\sigma^2(dY) = \sum_{i=1}^{N} \left( \frac{\partial Y}{\partial y_i} \right)^2 \sigma_i^2$$

Thus, the variance of the desired quantity Y can be found. This gives an independent estimate of the errors in measuring the quantity Y from the errors in measuring each variable it depends upon.

Example Suppose one wants to measure the thermal conductivity of a solid (k). To do this, one needs to measure the heat flux (q), the thickness of the sample (d), and the temperature difference across the sample (ΔT). Each measurement has some error. The heat flux (q) may be the rate of electrical heat input (Q) divided by the area (A), and both quantities are measured to some tolerance. The thickness of the sample is measured with some accuracy, and the temperatures are probably measured with a thermocouple to some accuracy. These measurements are combined, however, to obtain the thermal conductivity, and it is desired to know the error in the thermal conductivity. The formula is

$$k = \frac{Qd}{A\,\Delta T}$$

The variance in the thermal conductivity is then

$$\sigma_k^2 = \left(\frac{k}{d}\right)^2 \sigma_d^2 + \left(\frac{k}{Q}\right)^2 \sigma_Q^2 + \left(\frac{k}{A}\right)^2 \sigma_A^2 + \left(\frac{k}{\Delta T}\right)^2 \sigma_{\Delta T}^2$$
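The partial derivatives do not have to be worked out by hand. Below is a minimal Python sketch of first-order propagation that estimates the derivatives by central differences, checked against the analytic expression for k. The helper names, all measurement values, and the standard deviations are hypothetical, chosen only for illustration.

```python
from math import sqrt


def propagate(func, values, sigmas, rel_step=1e-6):
    """First-order propagation, sigma^2 = sum_i (dY/dy_i)^2 sigma_i^2,
    with each partial derivative estimated by a central difference."""
    var = 0.0
    for i, s in enumerate(sigmas):
        h = rel_step * abs(values[i])
        up, down = list(values), list(values)
        up[i] += h
        down[i] -= h
        dY = (func(*up) - func(*down)) / (2 * h)
        var += (dY * s) ** 2
    return var


def conductivity(Q, d, A, dT):
    return Q * d / (A * dT)


# Hypothetical measurements (SI units) and standard deviations, for illustration only
Q, d, A, dT = 10.0, 0.01, 0.01, 5.0            # W, m, m^2, K
sQ, sd, sA, sdT = 0.1, 0.0001, 0.0001, 0.1
k = conductivity(Q, d, A, dT)
var_num = propagate(conductivity, [Q, d, A, dT], [sQ, sd, sA, sdT])
var_exact = (k / d * sd) ** 2 + (k / Q * sQ) ** 2 + (k / A * sA) ** 2 + (k / dT * sdT) ** 2
print(f"k = {k:.3f} W/(m K), sigma_k = {sqrt(var_num):.4f} (analytic {sqrt(var_exact):.4f})")
```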

FACTORIAL DESIGN OF EXPERIMENTS AND ANALYSIS OF VARIANCE

Statistically designed experiments consider, of course, the effect of primary variables, but they also consider the effect of extraneous variables and the interactions between variables, and they include a measure of the random error. Primary variables are those whose effect you wish to determine. These variables can be quantitative or qualitative. The quantitative variables are ones you may fit to a model in order to determine the model parameters (see the section "Least Squares"). Qualitative variables are ones you wish to know the effect of, but you do not try to quantify that effect other than to assign possible errors or magnitudes. Qualitative variables can be further subdivided into Type I variables, whose effect you wish to determine directly, and Type II variables, which contribute to the performance variability and whose effect you wish to average out. For example, if you are studying the effect of several catalysts on yield in a chemical reactor, each different type of catalyst would be a Type I variable because you would like to know the effect of each. However, each time the catalyst is prepared, the results are slightly different due to random variations; thus, you may have several batches of what purports to be the same catalyst. The variability between batches is a Type II variable. Since the ultimate use will require using different batches, you would like to know the overall effect including that variation, since knowing precisely the results from one batch of one catalyst might not be representative of the results obtained from all batches of the same catalyst. A randomized block design, incomplete block design, or Latin square design (Box et al., ibid.), for example, all keep the effect of experimental error in the blocked variables from influencing the effect of the primary variables. Other uncontrolled variables are accounted for by introducing randomization in parts of the experimental design. To study all variables and their interaction requires a factorial design, involving all possible combinations of each variable, or a fractional factorial design, involving only a selected set. Statistical techniques are then used to determine which are the important variables, what are the important interactions, and what the error is in estimating these effects. The discussion here is only a brief overview of the excellent book by Box et al. (2005).

Suppose we have two methods of preparing some product and we wish to see which treatment is best. When there are only two treatments, then the sampling analysis discussed in the section "Two-Population Test of Hypothesis for Means" can be used to deduce if the means of the two treatments differ significantly. When there are more treatments, the analysis is more detailed. Suppose the experimental results are arranged as shown in the table: several measurements for each treatment. The goal is to see if the treatments differ significantly from each other; that is, whether their means are different when the samples have the same variance. The null hypothesis is that the treatments are all the same, and the alternative hypothesis is that they are different. The statistical validity of the hypothesis is determined by an analysis of variance.

Estimating the Effect of Four Treatments

                       Treatment
                       1     2     3     4
                       —     —     —     —
                       —     —     —     —
                       —     —     —     —
                       —     —     —
                       —     —
                       —
Treatment average      —     —     —     —
Grand average          —

The data for k = 4 treatments are arranged in the table. For each treatment, there are nt experiments and the outcome of the ith experiment with treatment t is called yti. Compute the treatment average

$$\bar y_t = \frac{1}{n_t} \sum_{i=1}^{n_t} y_{ti}$$

Also compute the grand average

$$\bar y = \frac{1}{N} \sum_{t=1}^{k} n_t \bar y_t, \qquad N = \sum_{t=1}^{k} n_t$$

Next compute the sum of squares of deviations from the average within the tth treatment

$$S_t = \sum_{i=1}^{n_t} (y_{ti} - \bar y_t)^2$$

Since each treatment has nt experiments, the number of degrees of freedom is nt − 1. Then the sample variances are

$$s_t^2 = \frac{S_t}{n_t - 1}$$

The within-treatment sum of squares is

$$S_R = \sum_{t=1}^{k} S_t$$

and the within-treatment sample variance is

$$s_R^2 = \frac{S_R}{N - k}$$

Now, if there is no difference between treatments, a second estimate of σ² could be obtained by calculating the variation of the treatment averages about the grand average. Thus compute the between-treatment mean square

$$s_T^2 = \frac{S_T}{k - 1}, \qquad S_T = \sum_{t=1}^{k} n_t (\bar y_t - \bar y)^2$$

Basically the test for whether the hypothesis is true or not hinges on a comparison, through the ratio sT²/sR², of the within-treatment estimate sR² (with νR = N − k degrees of freedom) with the between-treatment estimate sT² (with νT = k − 1 degrees of freedom). The test is made based on the F distribution for νR and νT degrees of freedom (Table 3-8).
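The one-way analysis reduces to the quantities just defined. The Python sketch below uses a hypothetical function name and hypothetical data for four treatments with unequal nt. It computes ST, SR, and the ratio sT²/sR², which would then be compared with the F table (Table 3-8) for νT = k − 1 and νR = N − k degrees of freedom.

```python
def one_way_anova(groups):
    """Return S_T, S_R, and F = s_T^2 / s_R^2 for a list of treatment samples."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]                        # treatment averages
    grand = sum(len(g) * m for g, m in zip(groups, means)) / N        # grand average
    S_T = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    S_R = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    F = (S_T / (k - 1)) / (S_R / (N - k))
    return S_T, S_R, F


# Hypothetical data for k = 4 treatments with unequal numbers of runs (N = 24)
groups = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]
S_T, S_R, F = one_way_anova(groups)
print(f"S_T = {S_T:.1f} (3 df), S_R = {S_R:.1f} (20 df), F = {F:.2f}")
```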

Next consider the case that uses randomized blocking to eliminate the effect of some variable whose effect is of no interest, such as the batch-to-batch variation of the catalysts in the chemical reactor example. Suppose there are k treatments and n experiments in each treatment. The results from nk experiments can be arranged as shown in the block design table; within each block, the various treatments are applied in a random order. Compute the block average, the treatment average, as well as the grand average as before.

Block Design with Four Treatments and Five Blocks

Treatment    1     2     3     4     Block average
Block 1      —     —     —     —     —
Block 2      —     —     —     —     —
Block 3      —     —     —     —     —
Block 4      —     —     —     —     —
Block 5      —     —     —     —     —

The following quantities are needed for the analysis of variance table.

$$\begin{array}{lll}
\text{Name} & \text{Formula} & \text{dof} \\
\text{average} & S_A = nk\bar y^2 & 1 \\
\text{blocks} & S_B = k \sum_{i=1}^{n} (\bar y_i - \bar y)^2 & n - 1 \\
\text{treatments} & S_T = n \sum_{t=1}^{k} (\bar y_t - \bar y)^2 & k - 1 \\
\text{residuals} & S_R = \sum_{t=1}^{k} \sum_{i=1}^{n} (y_{ti} - \bar y_i - \bar y_t + \bar y)^2 & (n - 1)(k - 1) \\
\text{total} & S = \sum_{t=1}^{k} \sum_{i=1}^{n} y_{ti}^2 & N = nk
\end{array}$$

The key test is again a statistical one, based on the value of

$$\frac{s_T^2}{s_R^2}, \qquad s_T^2 = \frac{S_T}{k - 1}, \qquad s_R^2 = \frac{S_R}{(n - 1)(k - 1)}$$

and the F distribution for νR and νT degrees of freedom (Table 3-8). The assumption behind the analysis is that the variations are linear. See Box et al. (2005). There are ways to test this assumption as well as transformations to make if it is not true. Box et al. also give an excellent example of how the observations are broken down into a grand average, a block deviation, a treatment deviation, and a residual. For two-way factorial design in which the second variable is a real one rather than one you would like to block out, see Box et al.
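The block-design quantities can be computed the same way. In this Python sketch, the function name and data are hypothetical, and y[i][t] is the result for block i and treatment t. The final assertion checks the decomposition S = SA + SB + ST + SR. That identity holds exactly for a complete block design, so it is a useful check on hand calculations.

```python
def block_anova(y):
    """Randomized block sums of squares; y[i][t] is the result for block i, treatment t."""
    n, k = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n * k)
    block_avg = [sum(row) / k for row in y]
    treat_avg = [sum(y[i][t] for i in range(n)) / n for t in range(k)]
    S_A = n * k * grand ** 2
    S_B = k * sum((b - grand) ** 2 for b in block_avg)
    S_T = n * sum((m - grand) ** 2 for m in treat_avg)
    S_R = sum((y[i][t] - block_avg[i] - treat_avg[t] + grand) ** 2
              for i in range(n) for t in range(k))
    S = sum(v ** 2 for row in y for v in row)
    F = (S_T / (k - 1)) / (S_R / ((n - 1) * (k - 1)))
    return S_A, S_B, S_T, S_R, S, F


# Hypothetical results: 5 blocks (rows) x 4 treatments (columns)
y = [
    [89, 88, 97, 94],
    [84, 77, 92, 79],
    [81, 87, 87, 85],
    [87, 92, 89, 84],
    [79, 81, 80, 88],
]
S_A, S_B, S_T, S_R, S, F = block_anova(y)
print(f"S_B = {S_B:.0f}, S_T = {S_T:.0f}, S_R = {S_R:.0f}, F = {F:.2f}")
# The four components add up to the total sum of squares S
assert abs((S_A + S_B + S_T + S_R) - S) < 1e-9 * S
```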

To measure the effects of variables on a single outcome, a factorial design is appropriate. In a two-level factorial design, each variable is considered at two levels only, a high and low value, often designated as a + and −. The two-level factorial design is useful for indicating trends and showing interactions, and it is also the basis for a fractional factorial design. As an example, consider a 2³ factorial design with 3 variables and 2 levels for each. The experiments are indicated in the factorial design table.

Two-Level Factorial Design with Three Variables

        Variable
Run     1     2     3
1       −     −     −
2       +     −     −
3       −     +     −
4       +     +     −
5       −     −     +
6       +     −     +
7       −     +     +
8       +     +     +

The main effects are calculated by taking the difference between results from all high values of a variable and all low values of a variable; the result is divided by the number of experiments at each level. For example, for the first variable:

$$\text{Effect of variable 1} = \frac{(y_2 + y_4 + y_6 + y_8) - (y_1 + y_3 + y_5 + y_7)}{4}$$

Note that all observations are being used to supply information on each of the main effects and each effect is determined with the precision of a fourfold replicated difference. The advantage over a one-at-a-time experiment is the gain in precision if the variables are additive and the measure of nonadditivity if it occurs (Box et al., 2005).

Interaction effects between variables 1 and 2 are obtained by calculating the difference between the results obtained with the high and low value of 1 at the low value of 2 compared with the results obtained with the high and low value of 1 at the high value of 2. The 12-interaction is

$$\text{12-interaction} = \frac{(y_4 - y_3 + y_8 - y_7) - (y_2 - y_1 + y_6 - y_5)}{2}$$

The key step is to determine the errors associated with the effect of each variable and each interaction so that the significance can be determined. Thus, standard errors need to be assigned. This can be done by repeating the experiments, but it can also be done by using higher-order interactions (such as 123 interactions in a 2⁴ factorial design). These are assumed negligible in their effect on the mean but can be used to estimate the standard error. Then, calculated effects that are large compared with the standard error are considered important, while those that are small compared with the standard error are considered to be due to random variations and are unimportant.
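The sign table makes the effect calculations mechanical: each effect is a signed sum of the eight responses. The Python sketch below uses hypothetical responses and hypothetical helper names. It applies the main-effect divisor 4 and, following the 12-interaction formula in the text, divisor 2 for the interaction. Many texts divide every contrast in a 2³ design by 4, interactions included, which would halve the interaction value. The sketch keeps the text's divisor.

```python
from itertools import product

# Standard-order 2^3 design: variable 1 changes fastest, matching runs 1-8 in the table
runs = [(x1, x2, x3) for x3, x2, x1 in product((-1, 1), repeat=3)]


def contrast(y, signs):
    """Sum of responses weighted by a +/- column."""
    return sum(s * yi for s, yi in zip(signs, y))


def main_effect(y, var):
    col = [r[var] for r in runs]
    return contrast(y, col) / 4                 # (sum at + level - sum at - level) / 4


def interaction_12(y):
    col = [r[0] * r[1] for r in runs]           # sign column for the 12-interaction
    return contrast(y, col) / 2                 # divisor 2, following the formula in the text


y = [60, 72, 54, 68, 52, 83, 45, 80]            # hypothetical responses for runs 1-8
effects = [main_effect(y, v) for v in range(3)]
print("main effects:", effects, " 12-interaction:", interaction_12(y))
```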

In a fractional factorial design one does only part of the possible experiments. When there are k variables, a factorial design requires 2ᵏ experiments. When k is large, the number of experiments can be large; for k = 5, 2⁵ = 32. For a k this large, Box et al. (2005) do a fractional factorial design. In the fractional factorial design with k = 5, only 16 experiments are done. Cropley (1978) gives an example of how to combine heuristics and statistical arguments in application to kinetics mechanisms in chemical engineering.
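The text does not say which 16 of the 32 runs are used. One common choice, assumed here, is the half fraction generated by setting variable 5 equal to the product of variables 1 through 4, which gives the defining relation I = 12345. The Python sketch lists those runs. With this generator, each main effect is aliased only with a four-factor interaction.

```python
from itertools import product

# Assumed generator: variable 5 = (1)(2)(3)(4), i.e., defining relation I = 12345
runs = [(x1, x2, x3, x4, x1 * x2 * x3 * x4)
        for x4, x3, x2, x1 in product((-1, 1), repeat=4)]

print(len(runs), "runs instead of", 2 ** 5)
for run in runs:
    print(" ".join("+" if v > 0 else "-" for v in run))
```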


DIMENSIONAL ANALYSIS

Dimensional analysis allows the engineer to reduce the number of variables that must be considered to model experiments or correlate data. Consider a simple example in which two variables F1 and F2 have the units of force and two additional variables L1 and L2 have the units of length. Rather than having to deduce the relation of one variable on the other three, F1 = fn(F2, L1, L2), dimensional analysis can be used to show that the relation must be of the form F1/F2 = fn(L1/L2). Thus considerable experimentation is saved. Historically, dimensional analysis can be done using the Rayleigh method or the Buckingham pi method. This brief discussion is equivalent to the Buckingham pi method but uses concepts from linear algebra; see Amundson, N. R., Mathematical Methods in Chemical Engineering, Prentice-Hall, Englewood Cliffs, N.J. (1966), p. 54, for further information.

The general problem is posed as finding the minimum number of variables necessary to define the relationship between n variables. Let $Q_i$ represent a set of fundamental units, like length, time, force, and so on. Let $[P_i]$ represent the dimensions of a physical quantity $P_i$; there are n physical quantities. Then form the matrix $\alpha_{ij}$

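$$\begin{array}{c|cccc}
 & [P_1] & [P_2] & \cdots & [P_n] \\ \hline
Q_1 & \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
Q_2 & \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & \vdots & & \vdots \\
Q_m & \alpha_{m1} & \alpha_{m2} & \cdots & \alpha_{mn}
\end{array}$$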

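in which each entry is the exponent with which a fundamental unit appears in the dimensions $[P_i]$ (negative when the unit appears in a denominator); that is, $\alpha_{ji}$ is the exponent of $Q_j$ in $[P_i]$. The dimensions can then be expressed as follows: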

$$[P_i] = Q_1^{\alpha_{1i}}\, Q_2^{\alpha_{2i}} \cdots Q_m^{\alpha_{mi}}$$

Let m be the rank of the α matrix (m can be no larger than the number of fundamental units). Then p = n − m is the number of dimensionless groups that can be formed. One can choose m variables $P_i$ whose columns of α are linearly independent to be the basis and express the other p variables in terms of them, giving p dimensionless quantities.
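As a sketch of this linear-algebra view, the Python below (function names hypothetical) row-reduces α in exact rational arithmetic with `fractions.Fraction`, reads off the rank m, and returns a basis of the null space of α. Each null-space vector x gives exponents for which $P_1^{x_1}P_2^{x_2}\cdots P_n^{x_n}$ is dimensionless, and there are p = n − m of them. The usage lines apply it to the force/length example at the start of this subsection, recovering groups equivalent to $F_1/F_2$ and $L_1/L_2$.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row-echelon form in exact arithmetic; returns (rows, pivot_columns)."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        pivot = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        lead = a[r][c]
        a[r] = [x / lead for x in a[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return a, pivots


def dimensionless_exponents(alpha):
    """Return (m, groups): the rank of alpha and p = n - m null-space exponent vectors."""
    a, pivots = rref(alpha)
    n = len(alpha[0])
    groups = []
    for free in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[free] = Fraction(1)
        # Each pivot variable is fixed by its row of the reduced matrix.
        for row, c in enumerate(pivots):
            x[c] = -a[row][free]
        groups.append(x)
    return len(pivots), groups


if __name__ == "__main__":
    # Columns F1, F2, L1, L2; rows are the fundamental units force and length.
    alpha = [[1, 1, 0, 0],
             [0, 0, 1, 1]]
    m, groups = dimensionless_exponents(alpha)
    print(m, [[int(x) for x in g] for g in groups])
    # 2 [[-1, 1, 0, 0], [0, 0, -1, 1]]   i.e. F2/F1 and L2/L1
```

Any set of p independent combinations of these vectors is equally valid, which is why different choices of basis variables lead to different, but equivalent, sets of groups.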

Example: Buckingham Pi Method, Heat-Transfer Film Coefficient. It is desired to determine a complete set of dimensionless groups with which to correlate experimental data on the film coefficient of heat transfer between the walls of a straight conduit with circular cross section and a fluid flowing in that conduit. The variables and the dimensional constant believed to be involved and their dimensions in the engineering system are given below:

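$$\begin{array}{lll}
\text{Film coefficient} & h & F/(L\theta T) \\
\text{Conduit internal diameter} & D & L \\
\text{Fluid linear velocity} & V & L/\theta \\
\text{Fluid density} & \rho & M/L^3 \\
\text{Fluid absolute viscosity} & \mu & M/(L\theta) \\
\text{Fluid thermal conductivity} & k & F/(\theta T) \\
\text{Fluid specific heat} & c_p & FL/(MT) \\
\text{Dimensional constant} & g_c & ML/(F\theta^2)
\end{array}$$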

The matrix α in this case is as follows.

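$$\begin{array}{c|rrrrrrrr}
Q_j \backslash [P_i] & h & D & V & \rho & \mu & k & c_p & g_c \\ \hline
F & 1 & 0 & 0 & 0 & 0 & 1 & 1 & -1 \\
M & 0 & 0 & 0 & 1 & 1 & 0 & -1 & 1 \\
L & -1 & 1 & 1 & -3 & -1 & 0 & 1 & 1 \\
\theta & -1 & 0 & -1 & 0 & -1 & -1 & 0 & -2 \\
T & -1 & 0 & 0 & 0 & 0 & -1 & -1 & 0
\end{array}$$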

Here m ≤ 5, n = 8, and p ≥ 3. Choose D, V, µ, k, and $g_c$ as the primary variables. The 5 × 5 matrix formed by their columns has a nonzero determinant, so the rank of α is m = 5 and p = 3; these variables are thus a possible basis set. The dimensions of the other three variables, h, ρ, and $c_p$, must be expressed in terms of the primary variables. This can be done by inspection, although linear algebra can be used, too.

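$$[h] = D^{-1}k^{+1}; \quad \text{thus} \quad \frac{h}{D^{-1}k^{+1}} = \frac{hD}{k} \ \text{is a dimensionless group}$$

$$[\rho] = \mu^{1}V^{-1}D^{-1}; \quad \text{thus} \quad \frac{\rho}{\mu^{1}V^{-1}D^{-1}} = \frac{\rho VD}{\mu} \ \text{is a dimensionless group}$$

$$[c_p] = k^{+1}\mu^{-1}; \quad \text{thus} \quad \frac{c_p}{k^{+1}\mu^{-1}} = \frac{c_p\mu}{k} \ \text{is a dimensionless group}$$

Thus, dividing each remaining variable by the product of basis variables that has the same dimensions gives the dimensionless groups

$$\frac{hD}{k}, \qquad \frac{\rho VD}{\mu}, \qquad \frac{c_p\mu}{k}$$

The dimensionless group $hD/k$ is called the Nusselt number, $N_{Nu}$, and the group $c_p\mu/k$ is the Prandtl number, $N_{Pr}$. The group $DV\rho/\mu$ is the familiar Reynolds number, $N_{Re}$, encountered in fluid-friction problems. These three dimensionless groups are frequently used in heat-transfer film-coefficient correlations. Functionally, their relation may be expressed as

$$\varphi(N_{Nu}, N_{Pr}, N_{Re}) = 0 \tag{3-121}$$

or as

$$N_{Nu} = \varphi_1(N_{Pr}, N_{Re})$$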
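The inspection step can be checked with linear algebra: for each remaining variable, solve $Bx = a$, where the columns of B are the dimension vectors of the basis variables D, V, µ, k, $g_c$ and a is the dimension vector of h, ρ, or $c_p$. The sketch below does this with exact fractions and Gauss-Jordan elimination; the function and dictionary names are hypothetical, and the dimension vectors list exponents of F, M, L, θ, T in that order.

```python
from fractions import Fraction

# Exponents of (F, M, L, theta, T) for each variable in the film-coefficient example.
DIMS = {
    "h": [1, 0, -1, -1, -1],
    "D": [0, 0, 1, 0, 0],
    "V": [0, 0, 1, -1, 0],
    "rho": [0, 1, -3, 0, 0],
    "mu": [0, 1, -1, -1, 0],
    "k": [1, 0, 0, -1, -1],
    "cp": [1, -1, 1, 0, -1],
    "gc": [-1, 1, 1, -2, 0],
}
BASIS = ["D", "V", "mu", "k", "gc"]


def exponents_in_basis(target, basis=BASIS, dims=DIMS):
    """Exponents x_j with [target] = prod(basis_j ** x_j), by Gauss-Jordan elimination.

    Assumes a square system: as many basis variables as fundamental units.
    """
    n = len(basis)
    # Augmented matrix [B | a]: one row per fundamental unit.
    a = [[Fraction(dims[b][i]) for b in basis] + [Fraction(dims[target][i])]
         for i in range(n)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if a[i][c] != 0), None)
        if pivot is None:
            raise ValueError("basis variables are not dimensionally independent")
        a[c], a[pivot] = a[pivot], a[c]
        lead = a[c][c]
        a[c] = [x / lead for x in a[c]]
        for i in range(n):
            if i != c and a[i][c] != 0:
                factor = a[i][c]
                a[i] = [x - factor * y for x, y in zip(a[i], a[c])]
    return {b: a[j][n] for j, b in enumerate(basis)}


if __name__ == "__main__":
    for name in ("h", "rho", "cp"):
        nonzero = {b: int(x) for b, x in exponents_in_basis(name).items() if x != 0}
        print(name, nonzero)
    # h {'D': -1, 'k': 1}
    # rho {'D': -1, 'V': -1, 'mu': 1}
    # cp {'mu': -1, 'k': 1}
```

The ValueError branch is what a singular 5 × 5 matrix would trigger, signaling that the chosen primary variables cannot serve as a basis.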

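TABLE 3-9 Dimensionless Groups in the Engineering System of Dimensions

$$\begin{array}{lll}
\text{Biot number} & N_{Bi} & hL/k \\
\text{Condensation number} & N_{Co} & (h/k)(\mu^2/\rho^2 g)^{1/3} \\
\text{Number used in condensation of vapors} & N_{Cv} & L^3\rho^2 g\lambda/k\mu\,\Delta t \\
\text{Euler number} & N_{Eu} & g_c(-dp)/\rho V^2 \\
\text{Fourier number} & N_{Fo} & k\theta/\rho c L^2 \\
\text{Froude number} & N_{Fr} & V^2/Lg \\
\text{Graetz number} & N_{Gz} & wc/kL \\
\text{Grashof number} & N_{Gr} & L^3\rho^2\beta g\,\Delta t/\mu^2 \\
\text{Mach number} & N_{Ma} & V/V_a \\
\text{Nusselt number} & N_{Nu} & hD/k \\
\text{Peclet number} & N_{Pe} & DV\rho c/k \\
\text{Prandtl number} & N_{Pr} & c\mu/k \\
\text{Reynolds number} & N_{Re} & DV\rho/\mu \\
\text{Schmidt number} & N_{Sc} & \mu/\rho D_v \\
\text{Stanton number} & N_{St} & h/cV\rho \\
\text{Weber number} & N_{We} & LV^2\rho/\sigma g_c
\end{array}$$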
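A power-law correlation among the Nusselt, Prandtl, and Reynolds numbers, $N_{Nu} = K N_{Pr}^{a} N_{Re}^{b}$, is linear in the logarithms, $\ln N_{Nu} = \ln K + a \ln N_{Pr} + b \ln N_{Re}$, so K, a, and b can be estimated by ordinary least squares. The sketch below solves the normal equations in plain Python; the function names are hypothetical, and the data are synthetic, generated from assumed constants (K = 0.023, a = 0.4, b = 0.8) rather than taken from any experiment.

```python
import math
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_law(nu, pr, re):
    """Fit Nu = K * Pr**a * Re**b by least squares on the logarithms; returns (K, a, b)."""
    rows = [(1.0, math.log(p), math.log(r)) for p, r in zip(pr, re)]
    ys = [math.log(v) for v in nu]
    # Normal equations (X^T X) c = X^T y for c = (ln K, a, b).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    ln_k, a, b = solve_linear(xtx, xty)
    return math.exp(ln_k), a, b


if __name__ == "__main__":
    # Synthetic "data" from assumed constants with about 2 percent multiplicative scatter.
    rng = random.Random(0)
    pr = [rng.uniform(0.7, 100.0) for _ in range(30)]
    re = [rng.uniform(1e4, 1e5) for _ in range(30)]
    nu = [0.023 * p**0.4 * r**0.8 * math.exp(rng.gauss(0.0, 0.02)) for p, r in zip(pr, re)]
    print(fit_power_law(nu, pr, re))
```

Fitting in log space weights relative rather than absolute deviations; that is a common modeling choice, not the only valid one.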

It has been found that these dimensionless groups may be correlated well by an equation of the type

$$\frac{hD}{k} = K\left(\frac{c_p\mu}{k}\right)^{a}\left(\frac{DV\rho}{\mu}\right)^{b}$$

in which K, a, and b are experimentally determined dimensionless constants. However, any other type of algebraic expression, or perhaps simply a graphical relation among these three groups, that accurately fits the experimental data would be an equally valid way of expressing Eq. (3-121).

Naturally, other dimensionless groups might have been obtained in the example by employing a different set of five repeating quantities that would not form a dimensionless group among themselves. Some of these groups may be found among those presented in Table 3-9. Such a complete set of three dimensionless groups might consist of the Stanton, Reynolds, and Prandtl numbers or of the Stanton, Peclet, and Prandtl numbers. Also, a complete set different from that obtained in the preceding example will result from multiplying appropriate powers of the Nusselt, Prandtl, and Reynolds numbers. For such a set to be complete, however, each of the three dimensionless groups must be independent of the other two.

PROCESS SIMULATION

REFERENCES: Dimian, A., Chem. Eng. Prog. 90: 58–66 (Sept. 1994); Kister, H. Z., “Can We Believe the Simulation Results?” Chem. Eng. Prog., pp. 52–58 (Oct. 2002); Krieger, J. H., Chem. Eng. News 73: 50–61 (Mar. 27, 1995); Mah, R. S. H., Chemical Process Structure and Information Flows, Butterworths (1990); Seader, J. D., Computer Modeling of Chemical Processes, AIChE Monograph Series no. 15 (1985); Seider, W. D., J. D. Seader, and D. R. Lewin, Product and Process Design Principles: Synthesis, Analysis, and Evaluation, 2d ed., Wiley, New York (2004).

CLASSIFICATION

Process simulation refers to the activity in which chemical processes and refineries are modeled with systems of equations, usually solved on the computer. The usual distinction must be made between steady-state models and transient models, following the ideas presented in the introduction to this section. A chemical process is, of course, nearly always in a transient mode at some level of precision, but when the time-dependent fluctuations are below some value, a steady-state model can be formulated. This subsection briefly presents the ideas behind steady-state process simulation (also called flowsheeting), which are embodied in commercial codes. Transient simulations are important for designing the start-up of plants and are especially useful for the operation of chemical plants.

THERMODYNAMICS

The most important aspect of the simulation is that the thermodynamic data of the chemicals be modeled correctly. It is necessary to decide what equation of state to use for the vapor phase (ideal gas, Redlich-Kwong-Soave, Peng-Robinson, etc.) and what model to use for liquid activity coefficients [ideal solutions, solubility parameters, Wilson equation, nonrandom two-liquid (NRTL), UNIFAC, etc.]. See Sec. 4, “Thermodynamics.” It is necessary to consider mixtures of chemicals, and the interaction parameters must be predictable. The best case is to determine them from data, and the next-best case is to use correlations based on the molecular weight, structure, and normal boiling point. To validate the model, the computed vapor-liquid equilibria should be checked against experimental data before the model is used in more complicated computer calculations.

PROCESS MODULES OR BLOCKS

At the first level of detail, it is not necessary to know the internal parameters for all the units, since what is desired is just the overall performance. For example, in a heat exchanger design, it suffices to know the heat duty, the total area, and the temperatures of the output streams; details such as the percentage baffle cut, tube layout, or baffle spacing can be specified later, when the details of the proposed plant are better defined. It is important to realize the level of detail modeled by a commercial computer program. For example, a chemical reactor could be modeled as an equilibrium reactor, in which the input stream is brought to a new temperature and pressure and the output stream is in chemical equilibrium at those new conditions. Or it may suffice simply to specify the conversion, and the computer program will calculate the outlet compositions. In these cases, the model equations are algebraic, and you do not learn the volume of the reactor. A more complicated reactor might be a stirred tank reactor; then you would have to specify kinetic information so that the simulation can be made, and one output would be either the volume of the reactor or the conversion possible in a volume you specify. Such models are also composed of sets of algebraic equations. A plug flow reactor is modeled as a set of ordinary differential equations posed as initial-value problems, and the computer program must use numerical methods to integrate them. See “Numerical Solution of Ordinary Differential Equations as Initial-Value Problems.” Kinetic information must be specified, and one learns the conversion possible in a given reactor volume or, in some cases, the reactor volume that will achieve a given conversion. The simulation engineer determines what a reactor of a given volume will do for the specified kinetics. The design engineer, though, wants to achieve a certain result and wants to know the volume necessary. Simulation packages are best suited for the simulation engineer, and the design engineer must vary specifications to achieve the desired output.

Distillation simulations can be based on shortcut methods, using correlations based on experience, but more rigorous methods involve solving for the vapor-liquid equilibrium on each tray. The shortcut method uses relatively simple equations, and the rigorous method requires solution of huge sets of nonlinear equations. The computation time of the latter is significant, but the rigorous method may be necessary when the chemicals you wish to distill are not well represented in the correlations. Then the designer must specify the number of trays and determine the separation that is possible. This, of course, is not what he or she wants, which is the number of trays needed to achieve a specified objective. Thus, again, some adjustment of parameters is necessary in a design situation.

Absorption columns can be modeled in a plate-to-plate fashion (even if the column is a packed bed) or as a packed bed. The former model is a set of nonlinear algebraic equations, and the latter is an ordinary differential equation. Since streams enter at both ends, the differential equation is a two-point boundary-value problem, and numerical methods are used (see “Numerical Solution of Ordinary Differential Equations as Initial-Value Problems”).

If one wants to model a process unit that has significant flow variation, and possibly some concentration distributions as well, one can consider using computational fluid dynamics (CFD) to do so. These calculations are very time-consuming, though, so they are often left until the mechanical design of the unit. The exception would occur when the flow variation and concentration distribution have a significant effect on the output of the unit, so that mass and energy balances could not be made without them.

The process units are described in greater detail in other sections of the Handbook. In each case, parameters of the unit are specified (size, temperature, pressure, area, and so forth). In addition, in a computer simulation, the computer program must be able to take any input to the unit and calculate the output for those parameters. Since the entire calculation is done iteratively, there is no assurance that the input stream is a “reasonable” one, so the computer codes must be written to give some sort of output even when the input stream is unreasonable. This difficulty makes the iterative process even more complicated.
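The simulation and design questions for a plug flow reactor can be illustrated with a hypothetical first-order, constant-density reaction A → B, for which dX/dV = k(1 − X)/Q (X conversion, V reactor volume, Q volumetric flow, k rate constant; all values below are assumed). The sketch integrates this initial-value problem with the classical fourth-order Runge-Kutta method to answer the simulation question (conversion in a given volume), then wraps it in a bisection search to answer the design question (volume for a target conversion), which mimics a designer repeatedly varying a specification. For this simple rate law the exact result X = 1 − exp(−kV/Q) is available as a check.

```python
import math


def conversion(volume, k=0.5, q=2.0, steps=200):
    """Simulation mode: integrate dX/dV = k(1 - X)/q from V = 0 to `volume` with RK4.

    Assumed units: k in 1/s, q in m^3/s, volume in m^3.
    """
    def rate(x):
        return k * (1.0 - x) / q

    h = volume / steps
    x = 0.0  # no conversion at the reactor inlet
    for _ in range(steps):
        k1 = rate(x)
        k2 = rate(x + 0.5 * h * k1)
        k3 = rate(x + 0.5 * h * k2)
        k4 = rate(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def volume_for(target, k=0.5, q=2.0, v_hi=100.0, tol=1e-8):
    """Design mode: bisect on volume until the simulated conversion reaches `target`."""
    v_lo = 0.0
    if conversion(v_hi, k, q) < target:
        raise ValueError("target conversion not reached within v_hi")
    while v_hi - v_lo > tol:
        v_mid = 0.5 * (v_lo + v_hi)
        if conversion(v_mid, k, q) < target:
            v_lo = v_mid
        else:
            v_hi = v_mid
    return 0.5 * (v_lo + v_hi)


if __name__ == "__main__":
    print("X at V = 4 m^3:", conversion(4.0))
    print("V for X = 0.9:", volume_for(0.9))
```

Each bisection step reruns the whole integration, which is the cost the text attributes to using a simulation package for design.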

PROCESS TOPOLOGY

A chemical process usually consists of a series of units, such as distillation towers, reactors, and so forth (see Fig. 3-67). If the feed to the process is known and the operating parameters of the units are specified by the user, then one can begin with the first unit, take the process input, calculate the unit output, carry that output to the input of the next unit, and continue the process. However, if the process involves a recycle stream, as nearly all chemical processes do, then when the calculation is begun, it is discovered that the recycle stream is unknown. This situation leads to an iterative process: the flow rates, temperature, and pressure of the unknown recycle stream are guessed, and the calculations proceed as before. When one reaches the end of the process, where the recycle stream is formed to return to the first unit, it is necessary to check whether the recycle stream is the same as assumed. If not, an iterative procedure must be used to cause convergence. Possible techniques are described in “Numerical Solutions of Nonlinear Equations in One Variable” and “Numerical Solution of Simultaneous Equations.” The direct method (or successive substitution method) just involves calculating around the process over and over. The Wegstein method accelerates convergence for a single variable, and Broyden’s method does the same for multiple variables. The Newton method can be used provided there is some way to calculate the derivatives (possibly by using a numerical derivative). Optimization methods can also be used (see “Optimization” in this section). In the description given here, the recycle stream is called the tear stream: this is the stream that must be guessed to begin the calculation. When there are multiple recycle streams, convergence is even more difficult, since more guesses are necessary, and what happens in one recycle stream may cause difficulties for the guesses in other recycle streams. See Seader (1985) and Mah (1990).
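A minimal sketch of tear-stream convergence for a flowsheet like Fig. 3-67 follows, assuming a hypothetical single-component recycle: fresh feed F of reactant A enters the mixer, the reactor converts a fraction X of the A it receives, and the separator returns a fraction s of the unreacted A as recycle. Guessing the recycle flow R and calculating around the loop gives g(R) = s(1 − X)(F + R), and convergence means g(R) = R. The sketch compares direct (successive) substitution with Wegstein's method, which replaces the guess by $qR + (1 - q)g(R)$ with $q = a/(a - 1)$ and a a secant estimate of the slope of g. Bounding q to the interval [−5, 0] is a common practical safeguard, assumed here; all names and numbers are illustrative.

```python
def calc_loop(r, feed=100.0, conv=0.6, split=0.9):
    """One pass around the hypothetical mixer -> reactor -> separator loop."""
    reactor_in = feed + r                    # mixer
    unreacted = reactor_in * (1.0 - conv)    # reactor
    return split * unreacted                 # separator: computed recycle stream


def direct_substitution(g, x0, tol=1e-10, max_iter=1000):
    """Successive substitution x <- g(x); returns (x, iterations)."""
    x = x0
    for it in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("direct substitution did not converge")


def wegstein(g, x0, tol=1e-10, max_iter=1000, q_min=-5.0, q_max=0.0):
    """Wegstein acceleration with a bounded q; returns (x, iterations)."""
    x_prev, g_prev = x0, g(x0)
    x = g_prev  # the first step is a plain direct substitution
    for it in range(1, max_iter + 1):
        gx = g(x)
        if abs(gx - x) < tol:
            return gx, it
        slope = (gx - g_prev) / (x - x_prev)
        q = slope / (slope - 1.0) if slope != 1.0 else q_min
        q = min(max(q, q_min), q_max)
        x_prev, g_prev = x, gx
        x = q * x + (1.0 - q) * gx
    raise RuntimeError("Wegstein did not converge")


if __name__ == "__main__":
    print("direct:  ", direct_substitution(calc_loop, 0.0))
    print("Wegstein:", wegstein(calc_loop, 0.0))
```

Because g is linear in this toy loop, the secant slope is exact and Wegstein's method converges almost immediately; on real, nonlinear flowsheets the gain is smaller but often still substantial.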

It is sometimes desired to control some stream by varying an operating parameter. For example, in a reaction/separation system, if there is an impurity that must be purged, a common objective is to set the purge fraction so that the impurity concentration into the reactor is kept at some moderate value. Commercial packages contain procedures for doing this using what are often called control blocks. However, this can also make the solution more difficult to find.
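A control block can be pictured as an outer loop wrapped around the recycle convergence. The sketch below assumes a hypothetical variant of the mixer-reactor-separator loop in which the fresh feed also carries an inert impurity I that the separator sends entirely to the recycle, a fraction φ (`phi`) of the recycle is purged, and the specification is the impurity mole fraction in the reactor feed. For each trial φ the recycle is converged by direct substitution, and bisection adjusts φ until the specification is met. Every outer iteration needs a full inner convergence, which is one reason such blocks make the solution harder; names and numbers are illustrative.

```python
def converge_recycle(phi, feed_a=100.0, feed_i=1.0, conv=0.6, split=0.9,
                     tol=1e-10, max_iter=100_000):
    """Inner loop: direct substitution on the tear stream (recycled A and I)."""
    rec_a, rec_i = 0.0, 0.0
    for _ in range(max_iter):
        in_a, in_i = feed_a + rec_a, feed_i + rec_i   # mixer
        out_a = in_a * (1.0 - conv)                   # reactor; impurity is inert
        new_a = (1.0 - phi) * split * out_a           # separator, then purge
        new_i = (1.0 - phi) * in_i                    # all impurity recycled, then purged
        if abs(new_a - rec_a) < tol and abs(new_i - rec_i) < tol:
            return in_a, in_i
        rec_a, rec_i = new_a, new_i
    raise RuntimeError("recycle did not converge")


def impurity_fraction(phi, **kw):
    """Impurity mole fraction in the converged reactor feed."""
    in_a, in_i = converge_recycle(phi, **kw)
    return in_i / (in_a + in_i)


def purge_for_spec(target, lo=0.01, hi=1.0, tol=1e-8, **kw):
    """Outer loop (the 'control block'): bisect on the purge fraction."""
    if not impurity_fraction(hi, **kw) <= target <= impurity_fraction(lo, **kw):
        raise ValueError("specification not bracketed by the purge limits")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if impurity_fraction(mid, **kw) > target:
            lo = mid   # too much impurity: purge more
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    phi = purge_for_spec(0.05)
    print("purge fraction:", phi, "impurity fraction:", impurity_fraction(phi))
```

Bisection is used because the impurity fraction falls monotonically as the purge fraction rises in this simple model; commercial control blocks may use other update rules.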

An alternative method of solving the equations is to solve them as simultaneous equations. In that case, one can specify the design variables and the desired specifications and let the computer figure out the process parameters that will achieve those objectives. It is possible to overspecify the system or to give impossible conditions. However, the biggest drawback to this method of simulation is that large sets (tens of thousands) of nonlinear algebraic equations must be solved simultaneously. As computers become faster, this is less of an impediment, provided efficient software is available.
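In the simultaneous (equation-oriented) approach, a purge problem of the kind described earlier can be written as one system: the steady-state balances for reactant and impurity entering the reactor plus the specification itself, with the purge fraction φ as an unknown rather than an input. The sketch below solves it with Newton's method and a forward-difference Jacobian. The flowsheet and numbers are hypothetical (fresh feed of 100 units of A and 1 unit of an inert impurity, conversion 0.6, separator recovery 0.9 of unreacted A, target impurity fraction 0.05), as are the function names.

```python
def residuals(v, feed_a=100.0, feed_i=1.0, conv=0.6, split=0.9, target=0.05):
    """Equation-oriented model: unknowns are the reactor-feed flows and the purge fraction."""
    a_in, i_in, phi = v
    return [
        # Reactant balance at the mixer: fresh feed plus unpurged recycle.
        a_in - feed_a - (1.0 - phi) * split * (1.0 - conv) * a_in,
        # Inert impurity: everything is recycled except the purged fraction.
        i_in - feed_i - (1.0 - phi) * i_in,
        # Design specification on the impurity fraction entering the reactor.
        i_in / (a_in + i_in) - target,
    ]


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n + 1):
                m[i][j] -= f * m[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton(f, x0, tol=1e-9, max_iter=50, rel_step=1e-7):
    """Newton's method with a forward-difference Jacobian; returns (x, iterations)."""
    x = list(x0)
    n = len(x)
    for it in range(1, max_iter + 1):
        fx = f(x)
        if max(abs(r) for r in fx) < tol:
            return x, it
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            step = rel_step * max(1.0, abs(x[j]))
            xp = list(x)
            xp[j] += step
            fp = f(xp)
            for i in range(n):
                jac[i][j] = (fp[i] - fx[i]) / step
        dx = solve_linear(jac, [-r for r in fx])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton's method did not converge")


if __name__ == "__main__":
    solution, iterations = newton(residuals, [150.0, 10.0, 0.2])
    print("A in, I in, purge fraction:", solution, "iterations:", iterations)
```

A poor starting guess can make Newton's method diverge, and an impossible specification (for example, a target below the fresh-feed impurity fraction) leaves the system without a solution, which illustrates the overspecification risk mentioned above.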

Dynamic simulations are also possible, and these require solving differential equations, sometimes with algebraic constraints. If some parts of the process change extremely quickly when there is a disturbance, that part of the process may be modeled in the steady state for the disturbance at any instant. Such situations are called stiff, and the methods for them are discussed in “Numerical Solution of Ordinary Differential Equations as Initial-Value Problems.” It must be realized, though, that a dynamic calculation can also be time-consuming, and sometimes the allowable units are lumped-parameter models that are simplifications of the equations used for the steady-state analysis. Thus, as always, the assumptions need to be examined critically before the computer results are accepted.
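Stiffness can be seen on a hypothetical linear system with one fast and one slow mode, $dy_1/dt = -1000y_1 + y_2$ and $dy_2/dt = -y_2$ (eigenvalues −1000 and −1). Explicit Euler is stable only for step sizes below 2/1000, even after the fast mode has died out, while implicit (backward) Euler remains stable with much larger steps. In this sketch the implicit step is solved directly because the system matrix is upper triangular; production stiff solvers use more elaborate implicit methods with Newton iterations. The system and step sizes are assumptions chosen for illustration.

```python
import math


def explicit_euler(h, t_end, y=(1.0, 1.0)):
    """Forward Euler; unstable for this system when h > 2/1000."""
    y1, y2 = y
    for _ in range(round(t_end / h)):
        y1, y2 = y1 + h * (-1000.0 * y1 + y2), y2 + h * (-y2)
    return y1, y2


def implicit_euler(h, t_end, y=(1.0, 1.0)):
    """Backward Euler: solve (I - hA) y_new = y_old; A is upper triangular here."""
    y1, y2 = y
    for _ in range(round(t_end / h)):
        y2 = y2 / (1.0 + h)
        y1 = (y1 + h * y2) / (1.0 + 1000.0 * h)
    return y1, y2


def exact(t, y=(1.0, 1.0)):
    """Analytical solution of the same linear system, for comparison."""
    y10, y20 = y
    c = y20 / 999.0
    return (y10 - c) * math.exp(-1000.0 * t) + c * math.exp(-t), y20 * math.exp(-t)


if __name__ == "__main__":
    print("exact         ", exact(1.0))
    print("implicit h=0.01", implicit_euler(0.01, 1.0))
    print("explicit h=0.01", explicit_euler(0.01, 1.0))   # grows without bound
    print("explicit h=1e-4", explicit_euler(1e-4, 1.0))   # stable but 100x more steps
```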

COMMERCIAL PACKAGES

Computer programs are provided by many companies, and the models range from empirical models to deterministic models. For example, if one wanted to know the pressure drop in a piping network, one would normally use a correlation for the friction factor as a function of Reynolds number to calculate the pressure drop in each segment. A sophisticated turbulence model of fluid flow is not needed in that case. As computers become faster, however, more and more models are deterministic. Since the commercial codes have been used by many customers, the data in them have been verified, but possibly not for the case you want to solve. Thus, you must test the thermodynamics correlations carefully. In 2005, there were a number of computer codes, but the company names change constantly. Here are a few of them for process simulation: Aspen Tech (Aspen Plus), Chemstations (CHEMCAD), Honeywell (UniSim Design), ProSim (ProSimPlus), and SimSci-Esscor (PRO/II). The CAPE-OPEN project is working to make details as transferable as possible.
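As an illustration of the correlation-based approach, the sketch below computes the frictional pressure drop in one straight pipe segment from the Darcy-Weisbach equation, $\Delta P = f (L/D)\rho V^2/2$. It uses the laminar result f = 64/Re below an assumed transition Reynolds number of 2100 and the explicit Swamee-Jain approximation to the Colebrook equation for turbulent flow. These correlations, the cutoff, the default roughness, and the example numbers are assumptions chosen for illustration, not the methods of any particular commercial package; SI units are used throughout.

```python
import math


def darcy_friction_factor(re, rel_roughness):
    """Darcy friction factor: laminar 64/Re, else the Swamee-Jain approximation.

    The transition region (roughly 2100 < Re < 4000) is not modeled accurately.
    """
    if re < 2100.0:
        return 64.0 / re
    return 0.25 / math.log10(rel_roughness / 3.7 + 5.74 / re ** 0.9) ** 2


def segment_pressure_drop(length, diameter, velocity, density, viscosity,
                          roughness=4.5e-5):
    """Frictional pressure drop (Pa) in one straight segment; roughness in m (assumed)."""
    re = density * velocity * diameter / viscosity
    f = darcy_friction_factor(re, roughness / diameter)
    return f * (length / diameter) * density * velocity ** 2 / 2.0


if __name__ == "__main__":
    # Hypothetical network: segments in series as (length m, diameter m, velocity m/s),
    # carrying a water-like fluid (density 998 kg/m^3, viscosity 1.0e-3 Pa s).
    segments = [(100.0, 0.05, 2.0), (40.0, 0.08, 0.78)]
    total = sum(segment_pressure_drop(L, D, V, 998.0, 1.0e-3) for L, D, V in segments)
    print(f"total frictional pressure drop: {total:.0f} Pa")
```

Fittings, elevation changes, and flow splitting in a real network would add terms and require solving the network equations simultaneously.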

FIG. 3-67 Prototype flowsheet. [Figure not reproduced: mixer, reactor, and separator, with streams numbered 1 to 6.]